Simon is:

- an open-source speech recognition program and replaces the mouse and keyboard.
- designed to be very flexible and allows customization for any application where speech recognition is needed.
- a potential European project of “e-inclusion” because of the language-independent programming.
- in development for physically disabled people and seniors to give them the possibility to chat, write e-mails, surf the Internet, do Internet-banking and much more.

(from simonlistens.org)

Getting Simon

Simon is built with Qt and depends on Phonon, so it is probably best to install it on a KDE (Plasma)-based system, unless you don’t mind ~300 MB of dependencies being pulled in with the installation (on a Kubuntu 15.04 system, Simon only takes up 55 MB). Simon is in the Ubuntu repositories, so installing it from the package manager is easy on any Ubuntu-based distro. In Linux Mint (or in other distros), if you find that Simon cannot connect after installation, you will have to install libqt4-sql-sqlite. If you want to make use of HTK acoustic models, you also need to install the HTK. This is optional, and if you do not know what this means, or you just don’t need to use it, you can safely skip it. Unfortunately HTK is only available as source code, but these simple instructions will make the installation straightforward.
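As a rough sketch of the installation step, the short Python script below checks whether Simon already seems to be installed and, if not, prints the apt-get command you could run. The package name `simon`, the executable name `simon`, and the SQLite driver package name `libqt4-sql-sqlite` are assumptions here; check them against your own distro's repositories first.

```python
import shutil

# Package names are assumptions; verify them with `apt-cache search simon`
# on your own system before installing anything.
SIMON_PACKAGE = "simon"
SQLITE_DRIVER_PACKAGE = "libqt4-sql-sqlite"


def build_install_command(needs_sqlite_driver=False):
    """Return the apt-get command (as an argument list) that would install Simon."""
    packages = [SIMON_PACKAGE]
    if needs_sqlite_driver:
        # Only needed if Simon cannot connect after installation (e.g. on Linux Mint).
        packages.append(SQLITE_DRIVER_PACKAGE)
    return ["sudo", "apt-get", "install"] + packages


def is_simon_installed():
    """Report whether an executable named `simon` (assumed name) is on the PATH."""
    return shutil.which("simon") is not None


if __name__ == "__main__":
    if is_simon_installed():
        print("Simon appears to be installed already.")
    else:
        # Print the command instead of running it, so nothing changes unasked.
        print(" ".join(build_install_command(needs_sqlite_driver=True)))
```

The script only prints the command, so you can review it before running it yourself in a terminal.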

Simon Listens

Simon is not easy to set up or use. To help you make it listen to you, an assisted setup greets you on the first run, walking you through the process. First you need to set up scenarios.

Scenarios are complete packages that will allow you to use Simon for specific purposes. They also provide what is known as a language model, which describes all the words Simon knows and which sentences are grammatically valid, at least within that scenario. By default you only have a standard scenario installed. From this screen you can create or load more and even download some with the Simon addon-installer (“Open -> Download”).
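To make the idea of a language model more concrete, here is a toy Python sketch. It is not Simon's actual scenario format, and the words, categories and sentence patterns are made up for illustration. It only shows the principle: a vocabulary tags each known word with a category, and a grammar lists which sequences of categories count as valid sentences.

```python
# Toy vocabulary: each known word is tagged with a (hypothetical) category.
VOCABULARY = {
    "open": "Command",
    "close": "Command",
    "browser": "Target",
    "mail": "Target",
}

# Toy grammar: each allowed sentence is a sequence of categories.
GRAMMAR = [
    ("Command",),
    ("Command", "Target"),
]


def is_valid_sentence(sentence):
    """Return True if every word is known and the category sequence is allowed."""
    words = sentence.lower().split()
    try:
        categories = tuple(VOCABULARY[word] for word in words)
    except KeyError:
        return False  # at least one word is not in the vocabulary
    return categories in GRAMMAR


if __name__ == "__main__":
    for phrase in ["open browser", "browser open", "open window"]:
        print(phrase, "->", is_valid_sentence(phrase))
```

In this sketch “open browser” is accepted, while “browser open” fails on grammar and “open window” fails because “window” is not in the vocabulary. Restricting recognition to a small scenario in this way is what keeps the set of possible sentences manageable.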

Next you will need an acoustic or speech model. This basically tells Simon what the individual words sound like.

If you have HTK installed, you can create your own model and train Simon to recognise the very specific way you talk. If you do not have HTK or do not care to use it, you can download static base models (“Open Model -> Download”) which will provide Simon with predefined acoustic patterns.

If you go with a static model, you can choose to adapt it with training samples later. On the server settings page, you can safely leave the default options on if you are going to run the server locally and plan to use it regularly. If the server is located somewhere else, this is where you can define it.
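If Simon cannot connect, it can help to check whether the server address from the settings page is reachable at all. The Python sketch below simply tries to open a TCP connection. The host `localhost` and port `4444` are assumptions for a local setup, so replace them with whatever your server settings page shows.

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Both values are assumptions; use the ones from your server settings page.
    host, port = "localhost", 4444
    state = "reachable" if server_reachable(host, port) else "not reachable"
    print(f"Server at {host}:{port} is {state}")
```

A successful connection only shows that something is listening on that port. It does not prove that the listener is the Simon server.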

You need to set up sound recording and playback devices, test your microphone, and you are ready to use Simon.

From the main screen, you can manage your loaded scenarios or open a specific one.

After opening a scenario, you can view and modify the vocabulary and grammar, train the acoustic model, modify the context dependence of the scenario, and set up direct commands.

The training wizard can also be started from the overview screen; you do not need to go into the scenario settings.

On the overview screen you also have the opportunity to change the audio (hardware) configuration or configure the acoustic (speech) model, including adding new ones.

Further help using Simon

Simon’s usage is far from straightforward. It takes a good amount of learning, training, and getting used to. Fortunately there is quite extensive documentation available online. The Simon Listens blog provides some insight, although the posts are quite outdated (the last entry was in 2013). The site offers some further reading, and you can learn about voice-controlled business solutions powered by Simon on their commercial webpage. Although it might take some time to set up and master, Simon can transform any Linux (and even Windows) computer into a voice-controlled environment, whether you need it because of special needs or just for convenience.