Underlying all of the other APIs is a basic HTTP web services API. The basic HTTP API performs a POST with attachments. Parameters are passed as fields. For the recognizer, the audio and grammars are passed as attachments. For the synthesizer, the audio is returned in the HTTP response.

HTTP Speech Recognition Request

| Field/Attachment Name | Value | Default | Notes |
|---|---|---|---|
| developerId (field) | String | n/a | Your developer ID |
| DeveloperSecret (field) | String | n/a | Developer Password |
| UserId (field) | String | n/a | User Id (optional) |
| gmode (field) | simple** \| jsgf \| lm | false | If jsgf or simple, include a grammar attachment |
| continuousFlag (field) | true \| false | false | If true, the server keeps recognizing the audio stream, utterance by utterance, until the end of the stream is reached |
| dataMode (field) | Audio \| Feature | Audio | If using the sphinx4 endpointer on the client, selects whether an audio or a feature stream is sent |
| doEndpointing (field) | true \| false | false | Set this field to true to enable endpointing on the server |
| sampleRate (field)* | Integer (Hz) | 8000 | Sample rate of the audio stream |
| Big-endian (field)* | true \| false | true | Endianness of the audio stream |
| bytesPerValue (field)* | Integer | 2 | Size of each data sample in bytes |
| Encoding (field)* | AudioFormat.encoding.toString() | PCM_SIGNED | Encoding of the audio stream |
| outputMode (field) | text \| json | text | In text mode, returns plain text. In json mode, returns confidence, phonetic pronunciations, and word time tags (in lm mode). In jsgf mode, the result also includes the tags and values that can be used for semantic interpretation. |
| lmId (field) | Identifier string | n/a | Language model identifier (not implemented yet) |
| amId (field) | Identifier string | n/a | Acoustic model identifier (not implemented yet) |
| CmnBatchFlag (field) | boolean | true | tbs |
| lmFlag (field) | true \| false | false | Deprecated (use gmode) |
| Audio (attachment) | Audio file to be decoded | audio/x-wav \| audio/s4-audio \| audio/s4-feature | Required attachment |
| Grammar (attachment) | JSGF grammar file | plain/text | Required in grammar mode |
* Use these format fields if you cannot include the format in the audio attachment's header. If none of these format fields are present in the HTTP request, the server attempts to get the format from the attachment.
** Simple grammar mode takes a comma-separated list of words. The grammar is equivalent to one or more of those words spoken in any sequence.
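
The request described above can be sketched with the Python standard library alone. This is a minimal illustration, not a reference client: it assumes the "POST with attachments" is an ordinary `multipart/form-data` body, that the simple-mode word list travels in the Grammar attachment, and it uses a placeholder URL and placeholder file names, none of which are specified in this document. The field and attachment names, and the `plain/text` type for the grammar, are copied from the table.

```python
import uuid
import urllib.request

# Hypothetical endpoint: the real recognizer URL is not given in this document.
RECOGNIZER_URL = "http://example.invalid/recognize"


def build_multipart(fields, attachments):
    """Encode form fields and attachments as a multipart/form-data body.

    fields: dict of name -> str
    attachments: dict of name -> (filename, content_type, bytes)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, (filename, content_type, data) in attachments.items():
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: {content_type}\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_recognition_request(audio_bytes, words, developer_id, developer_secret):
    """Build (but do not send) a simple-grammar recognition request."""
    fields = {
        "developerId": developer_id,
        "DeveloperSecret": developer_secret,
        "gmode": "simple",
        "outputMode": "text",
    }
    attachments = {
        # File names are placeholders chosen for this sketch.
        "Audio": ("utterance.wav", "audio/x-wav", audio_bytes),
        # Assumption: the comma-separated word list is sent as the grammar.
        "Grammar": ("grammar.txt", "plain/text", ",".join(words).encode("utf-8")),
    }
    body, content_type = build_multipart(fields, attachments)
    return urllib.request.Request(
        RECOGNIZER_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending the request would then be a call to `urllib.request.urlopen(request)` against the real endpoint. Because the WAV attachment carries its own header, the sketch leaves out the optional format fields marked with `*`.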

HTTP Speech Recognition Response

| Field | Type | Notes |
|---|---|---|
| oog | boolean | A flag indicating whether the utterance was out of grammar |
| text | string | The recognized text |
| confidence | float | The confidence that the recognized text is correct |
| Word List | Word Structure | A list of words, each containing the word, its phonetic spelling, the confidence of the word, and its start and stop times |
| Tag List | Tag Structure | A list of tags and the values (if jsgf mode was used and tags were included in the grammar) |
These fields are returned for each utterance. In continuous mode, the response contains a sequence of utterances.
In text mode, the result is plain text. In jsgf mode, the tags are appended to the end of the raw text.
Example: this is the raw result <TAG:VALUE>
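
A client that receives this text-mode result in jsgf mode has to separate the raw text from the appended tags. The sketch below shows one way to do that. The tag syntax is inferred only from the single example above (`<NAME:VALUE>`), so the pattern and the function name `parse_text_result` are assumptions, not part of the API.

```python
import re

# Assumed tag syntax, taken from the one example in the text: <NAME:VALUE>.
TAG_PATTERN = re.compile(r"<([^<>:]+):([^<>]*)>")


def parse_text_result(line):
    """Split a text-mode result line into (raw_text, {tag_name: tag_value})."""
    tags = {
        name.strip(): value.strip()
        for name, value in TAG_PATTERN.findall(line)
    }
    # Remove the tags and surrounding whitespace to recover the raw text.
    raw_text = TAG_PATTERN.sub("", line).strip()
    return raw_text, tags


print(parse_text_result("this is the raw result <TAG:VALUE>"))
# prints ('this is the raw result', {'TAG': 'VALUE'})
```

A result without tags, as returned in lm mode, comes back unchanged with an empty dictionary.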

HTTP Speech Synthesis Request

| Field Name | Value | Default | Notes |
|---|---|---|---|
| developerId (field) | String | n/a | Your developer ID |
| DeveloperSecret (field) | String | n/a | Developer Password |
| UserId (field) | String | n/a | User Id (optional) |
| Text | plain text to be synthesized | none | Required |
| Voice | Name of voice used for synthesis | tbs | Required |
| Mime Type | audio/x-wav \| audio/mpeg | audio/x-wav | MIME type of the result |
| sampleRate | Integer (Hz) | 8000 | Samples per second |
| Big-endian | true \| false | true | Endianness |
| bytesPerValue | Integer | 2 | bytes per sample value |
| Encoding | AudioFormat.encoding.toString() | PCM_SIGNED | Encoding |
The response contains the synthesized audio file in the format specified in the request.
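
A synthesis call can be sketched in the same spirit. The endpoint URL, the voice name in the usage comment, and the helper names are hypothetical. The sketch also assumes the synthesizer accepts ordinary URL-encoded form fields (a multipart body may be required instead) and that the wire names match the table exactly, which is uncertain for `Mime Type` in particular.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: the real synthesizer URL is not given in this document.
SYNTHESIZER_URL = "http://example.invalid/synthesize"

# File extensions for the two result MIME types listed in the request table.
EXTENSIONS = {"audio/x-wav": ".wav", "audio/mpeg": ".mp3"}


def build_synthesis_request(text, voice, developer_id, developer_secret,
                            mime_type="audio/x-wav", sample_rate=8000):
    """Build (but do not send) a synthesis request with form-encoded fields."""
    if mime_type not in EXTENSIONS:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    fields = {
        "developerId": developer_id,
        "DeveloperSecret": developer_secret,
        "Text": text,
        "Voice": voice,
        "Mime Type": mime_type,  # name copied from the table; may differ on the wire
        "sampleRate": str(sample_rate),
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(SYNTHESIZER_URL, data=data, method="POST")


def output_filename(stem, mime_type):
    """Choose a file name whose extension matches the requested MIME type."""
    return stem + EXTENSIONS[mime_type]


# Usage against a real endpoint (not executed here; "some-voice" is a placeholder):
#   req = build_synthesis_request("hello world", "some-voice", "my-id", "my-secret")
#   with urllib.request.urlopen(req) as resp:
#       with open(output_filename("hello", "audio/x-wav"), "wb") as out:
#           out.write(resp.read())
```

Since the response body is the audio itself, the client only needs to write the bytes to a file whose extension matches the MIME type it asked for.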
Copyright speechapi.com, 2009-2010.