Speech recognition is one of those emerging standards that I look forward to seeing more widely implemented, not least because of what it could mean for the accessibility of web apps.

Interestingly, for such a powerful feature, it has an incredibly simple JavaScript API: in just a few lines of code you can get your app to handle speech input.

// Vendor-prefixed constructor
var recognizer = new webkitSpeechRecognition();

recognizer.onresult = function(event) {
  // Most likely alternative of the first result
  console.log(event.results[0][0].transcript);
  console.log(event.results[0][0].confidence);
};
recognizer.start();
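
Since the constructor above is vendor-prefixed, it can be worth checking for support before using it. Below is a minimal sketch of that idea; getSpeechRecognition is a hypothetical helper name of my own, and it assumes the unprefixed SpeechRecognition name may exist alongside the prefixed one.

```javascript
// Hypothetical helper: returns the recognition constructor found on the
// given global object, or null when neither name is present.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a browser, pass window; the typeof guard lets the snippet load elsewhere too.
var Recognition = typeof window !== 'undefined' ? getSpeechRecognition(window) : null;

if (Recognition) {
  var recognizer = new Recognition();
  recognizer.onresult = function (event) {
    console.log(event.results[0][0].transcript);
  };
  recognizer.start();
} else {
  console.log('Speech recognition is not available here.');
}
```

Falling back to a regular text input in the else branch is probably the friendliest option for users on other browsers.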

The user will be prompted to allow or deny access to their microphone. Once access is granted, you can use the onresult callback to go over the transcript that gets returned. If you want to keep listening for voice input continuously, rather than have recognition stop after the first result, set the continuous property to true. By default it is false.

recognizer.continuous = true;
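
If the user declines the microphone prompt, onresult never fires, so it can help to listen for errors as well. The sketch below assumes the error codes defined by the Web Speech API (such as not-allowed and no-speech); describeRecognitionError and handleErrors are hypothetical helper names, and the messages are just placeholders.

```javascript
// Hypothetical helper mapping a few standard error codes to user-facing text.
function describeRecognitionError(code) {
  switch (code) {
    case 'not-allowed':
    case 'service-not-allowed':
      return 'Microphone access was blocked.';
    case 'audio-capture':
      return 'No microphone was found.';
    case 'no-speech':
      return 'No speech was detected.';
    default:
      return 'Speech recognition failed (' + code + ').';
  }
}

// Attaches an onerror handler; the error event carries the code in event.error.
function handleErrors(recognizer, show) {
  recognizer.onerror = function (event) {
    show(describeRecognitionError(event.error));
  };
}

// Possible usage in the browser:
// handleErrors(recognizer, function (message) { console.log(message); });
```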

Likewise, if you intend to deal with longer-form voice input rather than short user commands, you might want to set the interimResults property to true. With this enabled, the transcript is streamed back while the user is still speaking. The recognizer keeps working on the transcription, and earlier words might change based on context until the isFinal property on that result is true.

recognizer.interimResults = true;
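
With interim results enabled, a common pattern is to separate the text that is already final from the text that may still change. Here is a minimal sketch of that; splitTranscript is a hypothetical helper, and it relies on event.resultIndex, which points at the first result that changed in the current event.

```javascript
// Hypothetical helper: split an onresult event into final and interim text.
function splitTranscript(event) {
  var finalText = '';
  var interimText = '';
  // Only look at the results that changed in this event.
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Possible usage in the browser:
// recognizer.onresult = function (event) {
//   var parts = splitTranscript(event);
//   console.log('final:', parts.finalText, 'interim:', parts.interimText);
// };
```

You could, for example, render the interim text in a lighter color so the user can see it is still being worked on.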

Looking at the onresult handler, what you get back in the results property of the event is an array-like list of results, each of which is itself an array-like list of alternatives. With continuous and interimResults both left at false, this list will only have a length of 1; otherwise it holds as many results as were streamed in. Inside each item of the list are one or more recognition alternatives (capped by the maxAlternatives property, which defaults to 1), sorted by the confidence property (which is a floating point number between 0 and 1). If you want the most likely transcript for result i, you just access the first alternative of that result.

event.results[i][0]
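
To make the nesting a bit more concrete, here is a small sketch that walks over every result and keeps only its top alternative. bestAlternatives is a hypothetical helper; it uses an index loop because the results list is array-like rather than a true array.

```javascript
// Hypothetical helper: collect the first (most likely) alternative of each result.
function bestAlternatives(results) {
  var best = [];
  for (var i = 0; i < results.length; i++) {
    var top = results[i][0];
    best.push({ transcript: top.transcript, confidence: top.confidence });
  }
  return best;
}

// Possible usage in the browser:
// recognizer.onresult = function (event) {
//   console.log(bestAlternatives(event.results));
// };
```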

Finally, you can also set the language for the speech recognition using the lang property (en-US, fr-FR, de-DE, nl-NL, ...), for example as specified by the user or taken from the browser locale.

recognizer.lang = 'nl-NL';
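
Choosing between a user preference and the browser locale could look something like the sketch below. pickRecognitionLanguage is a hypothetical helper, the 'en-US' fallback is my own assumption, and the underscore replacement is there because lang expects a hyphenated language tag.

```javascript
// Hypothetical helper: prefer the user's choice, then the browser locale,
// then an assumed default, and normalize "nl_NL" style tags to "nl-NL".
function pickRecognitionLanguage(userChoice, nav) {
  var tag = userChoice || (nav && nav.language) || 'en-US';
  return tag.replace('_', '-');
}

// Possible usage in the browser:
// recognizer.lang = pickRecognitionLanguage(null, navigator);
```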

There are more callbacks you can take advantage of (onaudiostart, onaudioend, onspeechstart, onspeechend, ...), but for most typical use cases what I covered above is more than enough. Below is a simple demo putting together what was discussed here:

See the Pen lxHhn by Peter Elst (@peterelst) on CodePen.
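
If you are curious about the order in which the extra callbacks mentioned above fire, a quick way to find out is to log them all. This is only a sketch: logLifecycle is a hypothetical helper, and the list holds the event names I know from the Web Speech API, all lowercase.

```javascript
// Event names from the Web Speech API; each has a matching "on" + name property.
var LIFECYCLE_EVENTS = ['start', 'audiostart', 'soundstart', 'speechstart',
                        'speechend', 'soundend', 'audioend', 'end'];

// Hypothetical helper: assign a logging handler for each lifecycle event.
function logLifecycle(recognizer, log) {
  LIFECYCLE_EVENTS.forEach(function (name) {
    recognizer['on' + name] = function () {
      log(name);
    };
  });
}

// Possible usage in the browser:
// logLifecycle(recognizer, function (name) { console.log(name); });
```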

Author: Peter
Categories: JavaScript