HTTP API

Underlying all of the other APIs is a basic HTTP web services API. The basic HTTP API performs a POST with attachments. Parameters are passed as fields. For the recognizer, the audio and grammars are passed as attachments. The synthesizer returns audio in the HTTP response.
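As a rough illustration of a POST with fields and attachments, the sketch below builds (but does not send) a recognition request using only the Python standard library. It assumes the server accepts multipart/form-data, that the field and attachment names are spelled exactly as in the request table that follows, and that the endpoint URL is supplied by the caller; none of these details are confirmed by this page. The helper names build_multipart and build_recognition_request are hypothetical.

```python
import urllib.request
import uuid


def build_multipart(fields, attachments, boundary=None):
    """Encode fields and attachments as a multipart/form-data body.

    fields: dict of name -> string value
    attachments: dict of name -> (filename, content_type, data_bytes)
    Returns (content_type_header, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n'
             "\r\n"
             f"{value}\r\n").encode("utf-8")
        )
    for name, (filename, content_type, data) in attachments.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\n'
                  f"Content-Type: {content_type}\r\n"
                  "\r\n").encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


def build_recognition_request(url, developer_id, developer_secret,
                              audio_bytes, grammar_text=None):
    """Build a POST request object; the caller decides when to send it."""
    fields = {
        "developerId": developer_id,
        "DeveloperSecret": developer_secret,
        # Assumption: use jsgf mode when a grammar is supplied, lm otherwise.
        "gmode": "jsgf" if grammar_text is not None else "lm",
        "outputMode": "text",
    }
    attachments = {"Audio": ("audio.wav", "audio/x-wav", audio_bytes)}
    if grammar_text is not None:
        # The MIME type string is copied from the request table as written.
        attachments["Grammar"] = ("grammar.gram", "plain/text",
                                  grammar_text.encode("utf-8"))
    content_type, body = build_multipart(fields, attachments)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the returned object to urllib.request.urlopen would send it; the URL to use is whatever recognition endpoint your server exposes.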


HTTP Speech Recognition Request

Field/Attachment Name     Value                             Default                                           Notes
developerId (field)       String                            n/a                                               Your developer ID
DeveloperSecret (field)   String                            n/a                                               Developer password
UserId (field)            String                            n/a                                               User ID (optional)
gmode (field)             simple** | jsgf | lm              false                                             If jsgf or simple, include a grammar attachment
continuousFlag (field)    true | false                      false                                             If true, the server continues recognizing the audio stream, utterance by utterance, until the end of the stream is reached
dataMode (field)          Audio | Feature                   Audio                                             If using the Sphinx4 endpointer on the client, selects whether to send an audio or a feature stream
doEndpointing (field)     true | false                      false                                             Set this field to true to enable endpointing on the server
sampleRate (field) *      Integer (Hz)                      8000                                              Sample rate of the audio stream
Big-endian (field) *      true | false                      true                                              Endianness of the audio stream
bytesPerValue (field) *   Integer                           2                                                 Size of each data sample in bytes
Encoding (field) *        AudioFormat.encoding.toString()   PCM_SIGNED                                        Encoding of the audio stream
outputMode (field)        text | json                       text                                              Returns plain text in text mode. In json mode, returns confidence, phonetic pronunciations, and word time tags (in lm mode). In jsgf mode, it also includes the tags and values that can be used for semantic interpretation.
lmId (field)              Identifier String                 n/a                                               Language model identifier (not implemented yet)
amId (field)              Identifier String                 n/a                                               Acoustic model identifier (not implemented yet)
CmnBatchFlag (field)      boolean                           true                                              tbs
lmFlag (field)            true | false                      false                                             Deprecated (use gmode)
Audio (attachment)        Audio file to be decoded          audio/x-wav | audio/s4-audio | audio/s4-feature   Required attachment
Grammar (attachment)      JSGF grammar file                 plain/text                                        Required in grammar mode

* Use these format fields if you cannot include the format in the audio attachment's header. If none of these format fields are in the HTTP request, the server will attempt to get the format from the attachments.
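For a client that strips the header and sends raw samples, the format fields could be derived from the original WAV file. The sketch below does this with Python's standard wave module. It assumes the server accepts the lowercase strings true and false and the Java-style encoding names PCM_SIGNED and PCM_UNSIGNED; the function name format_fields_from_wav is hypothetical.

```python
import io
import wave


def format_fields_from_wav(wav_bytes):
    """Read a WAV header and return the matching format fields as strings."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample value
        return {
            "sampleRate": str(wav.getframerate()),
            # PCM data in a RIFF/WAV file is stored little-endian.
            "Big-endian": "false",
            "bytesPerValue": str(sample_width),
            # 8-bit WAV samples are unsigned; wider samples are signed.
            "Encoding": "PCM_UNSIGNED" if sample_width == 1 else "PCM_SIGNED",
        }
```

Note that the resulting Big-endian value differs from the table's default of true, so a client sending raw WAV sample data would probably need to set that field explicitly.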

** Simple grammar mode is a comma-separated list of words. The grammar is equivalent to one or more of those words spoken in any sequence.
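To make the equivalence concrete, the sketch below converts a simple-mode word list into a JSGF grammar that accepts one or more of the words in any order. This only illustrates the meaning of simple mode; how the server actually builds its grammar is not documented here, and the function name and the grammar name "simple" are hypothetical.

```python
def simple_to_jsgf(word_list, name="simple"):
    """Turn 'yes, no, maybe' into a JSGF grammar matching (yes | no | maybe)+."""
    words = [w.strip() for w in word_list.split(",") if w.strip()]
    if not words:
        raise ValueError("word list is empty")
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        # The trailing + means "one or more repetitions" in JSGF.
        f"public <{name}> = ({alternatives})+;\n"
    )
```

For the word list "yes, no, maybe", the public rule would read (yes | no | maybe)+, so an utterance such as "no no yes" would be in grammar.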


HTTP Speech Recognition Response

Field         Type            Notes
oog           boolean         A flag indicating whether the utterance was out of grammar
text          string          The recognized text
confidence    float           The confidence that the recognized text is correct
Word List     Word Structure  A list of words, each containing the word, its phonetic spelling, the confidence of the word, and its start and stop times
Tag List      Tag Structure   A list of tags and their values (if jsgf mode was used and tags were included in the grammar)

These fields are returned for each utterance (note that in continuous mode, you will get a sequence of utterances).

In text mode, the result is plain text, unless jsgf mode is used, in which case the tags are appended to the end of the raw text.

Example: this is the raw result <TAG:VALUE>
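A client could split such a line back into the raw text and its tags. The sketch below assumes every tag has exactly the <TAG:VALUE> shape shown in the example and that several tags, if present, simply follow one another; neither detail is specified beyond the single example, and the function name parse_text_result is hypothetical.

```python
import re

# Matches <NAME:VALUE>; the name may not contain ':' and neither part
# may contain angle brackets.
TAG_PATTERN = re.compile(r"<([^:<>]+):([^<>]*)>")


def parse_text_result(line):
    """Return (raw_text, [(tag, value), ...]) for one text-mode result line."""
    tags = [(name.strip(), value.strip())
            for name, value in TAG_PATTERN.findall(line)]
    # Remove the tags, then collapse the whitespace they leave behind.
    text = " ".join(TAG_PATTERN.sub("", line).split())
    return text, tags
```

A list of pairs is used instead of a dictionary so that a tag name occurring more than once is not silently overwritten.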


HTTP Speech Synthesis Request

Field Name                Value                             Default       Notes
developerId (field)       String                            n/a           Your developer ID
DeveloperSecret (field)   String                            n/a           Developer password
UserId (field)            String                            n/a           User ID (optional)
Text                      Plain text to be synthesized      none          Required
Voice                     Name of voice used for synthesis  tbs           Required
Mime Type                 audio/x-wav | audio/mpeg          audio/x-wav   MIME type of the result
sampleRate                Integer (Hz)                      8000          Samples per second
Big-endian                true | false                      true          Endianness
bytesPerValue             Integer                           2             Bytes per sample value
Encoding                  AudioFormat.encoding.toString()   PCM_SIGNED    Encoding

HTTP Synthesizer Response

The response contains the synthesized audio file in the format specified in the request.
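A synthesis round trip might be sketched as follows, using only the Python standard library. The sketch assumes the fields can be sent as an ordinary URL-encoded form body (this page does not say whether multipart encoding is required when there are no attachments), that the field names are spelled as in the request table, and that the endpoint URL is supplied by the caller. The Mime Type field is left out so that the documented default of audio/x-wav applies, since its exact wire name is not given. Both function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_synthesis_request(url, developer_id, developer_secret, text, voice,
                            sample_rate=8000):
    """Build (but do not send) a POST request for the synthesizer."""
    fields = {
        "developerId": developer_id,
        "DeveloperSecret": developer_secret,
        "Text": text,
        "Voice": voice,
        "sampleRate": str(sample_rate),
    }
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def synthesize_to_file(url, developer_id, developer_secret, text, voice,
                       out_path):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_synthesis_request(url, developer_id, developer_secret,
                                      text, voice)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
    return len(audio)  # number of audio bytes written
```

Since the default result type is audio/x-wav, a file name ending in .wav is a reasonable choice for out_path when no other type is requested.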