Until a few years ago, the state of the art in speech recognition was a phonetic-based approach with separate components for the pronunciation, acoustic, and language models. Typically, this consists of n-gram language models combined with hidden Markov models (HMMs).

We wanted to start with this as a baseline model, and then explore ways to combine it with newer approaches such as Baidu’s Deep Speech. While summaries exist that explain these baseline phonetic models, there do not appear to be any easily digestible blog posts or papers that compare the tradeoffs of the different freely available tools.

This article reviews the main free toolkits for speech recognition that use traditional HMM and n-gram language models.

This is not an exhaustive list of speech recognition software; most such software is listed here (a list that goes beyond open source).