Live Cross-Browser Speech Recognition in Node with socket.io & Google
TL;DR: An easy-to-set-up playground for cross-device, real-time Google Speech Recognition with a Node server and socket.io. GitHub Repo
We were building a demo with the following use case: transcribing live microphone input with Google Speech Recognition, in a way that works on all devices.
Unfortunately, the browser SpeechRecognition API (part of the Web Speech API) isn't supported anywhere but Chrome on desktop and Android to date. So we needed a different approach, one where we feed raw audio data into the Speech API in real time.
Building such a test case turned out to be a bit of a hassle, as the Google Node.js implementation for live recordings doesn't make it obvious what a basic setup could look like. Setting up the microphone stream correctly, converting it into 16-bit binary data, and finding a proper way of streaming it to Node, then to Google and back made it difficult in the beginning. You'll find a couple of plugins on GitHub and npm, but most are unmaintained, rather old, and unclearly documented. There is just no good overview of how to do it today.
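The 16-bit conversion is the step that is easiest to get subtly wrong. Here is a minimal sketch, assuming the microphone samples arrive as a Float32Array with values between -1 and 1, which is what the Web Audio API delivers. The function name floatTo16BitPCM is a placeholder for illustration and not necessarily what the repository uses.

```javascript
// Convert Web Audio samples (32-bit floats, nominally -1..1)
// into 16-bit signed PCM, the "16-bit binary data" sent to the server.
// floatTo16BitPCM is a hypothetical helper name.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first, since samples can overshoot the nominal range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // The int16 range is asymmetric: -32768..32767.
    // Storing into an Int16Array drops any fractional part.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

In the browser, the input would typically be `event.inputBuffer.getChannelData(0)` inside the audio process handler, and the underlying `pcm.buffer` is what travels over the socket. The sketch does not resample, so the sample rate announced to Google has to match the rate the AudioContext actually runs at.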
This playground is aimed at first-time users who want their own easy, working demo with the ability to start and stop speech recognition. It basically works like this:
- Get access to the user's microphone stream
- Init a recognizeStream for the Google Speech API on your Node server
- Get each AudioProcessData from the AudioContext
- Convert it to 16-bit binary data
- Send it to the Node server
- Pipe it into the Google recognize stream
- Make the Node server collect Google's transcription
- Send it back to the client
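The server-side steps above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the helper name attachSpeechRelay and the event names startStream, binaryData, endStream, speechData and speechError are all made up for this example. The recognize stream is passed in as a factory so the relay logic can be tried without Google credentials.

```javascript
// Relay audio chunks from one socket into one recognize stream and
// send transcription results back. All names here are hypothetical.
//
// socket: anything with .on(event, fn) and .emit(event, payload),
//         such as a socket.io socket.
// createRecognizeStream: factory returning a stream-like object with
//         .write(chunk), .end() and .on("data" | "error", fn).
function attachSpeechRelay(socket, createRecognizeStream) {
  let recognizeStream = null;

  function stop() {
    if (recognizeStream) {
      recognizeStream.end(); // signal that no more audio will follow
      recognizeStream = null;
    }
  }

  socket.on("startStream", () => {
    stop(); // never keep two streams open for the same client
    recognizeStream = createRecognizeStream();
    recognizeStream.on("data", (result) => socket.emit("speechData", result));
    recognizeStream.on("error", (err) => socket.emit("speechError", String(err)));
  });

  socket.on("binaryData", (chunk) => {
    // Audio that arrives while no stream is open is dropped.
    if (recognizeStream) recognizeStream.write(chunk);
  });

  socket.on("endStream", stop);
  socket.on("disconnect", stop);
}

// Possible wiring (assumption, needs socket.io and @google-cloud/speech):
//   const speech = require("@google-cloud/speech");
//   const client = new speech.SpeechClient();
//   const request = {
//     config: { encoding: "LINEAR16", sampleRateHertz: 16000, languageCode: "en-US" },
//     interimResults: true,
//   };
//   io.on("connection", (socket) =>
//     attachSpeechRelay(socket, () => client.streamingRecognize(request)));
```

Because every connection gets its own stream through the factory, starting and stopping recognition maps directly onto the two socket events.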
The playground is far from final code: the credentials should be encrypted, it doesn't handle multiple users, and so on. It's meant as a starting point (without any module loading etc.) that is easy to get up and running. Feel free to add any improvements to the repository!
Check the GitHub Repo to set it up.
The base of the code is greatly inspired by a 2014 blog post from Gabriel Poça, who made it work using the binary.js plugin. The cross-device getUserMedia code comes from this Stack Overflow answer. If you would like to build this into a working product, you should also take a look at the microphone-stream package.