The history of speech and language technology

Language and speech technology is customarily regarded as part of artificial intelligence research, although its origins are much older. The first attempts at machine-produced speech date back more than 200 years: Farkas Kempelen unveiled his first machine capable of voicing speech-like sounds (a speech synthesizer) in 1791. It was the first speech production machine in the world based on articulatory principles, which put Kempelen far ahead of his time. He summed up the conclusions of his research in his book Mechanismus der menschlichen Sprache (Vienna, 1791); his observations helped found the science of phonetics.

Speech synthesis was developed further at the beginning of the 20th century, in parallel with advances in sound recording and electronics. We are proud that it was also a Hungarian expert who applied for the first patent on an invention connected to machine reading. In 1916 Miklós Bánó applied to the Patent Office for a patent (no. 74361) on a speaking machine capable of reproducing any text. He received the patent in 1919. His machine used an electromechanical solution: it joined together speech sounds played from simultaneously running wax cylinders (a principle somewhat similar to today’s concatenation techniques).

The next step was the development of a fully electronic, hand-controlled speaking machine. Bell Laboratories presented the device known as the Voice Operating Demonstrator (VODER) in 1939: it spoke English and was controlled by a trained technician using a keyboard. VODER could produce continuous speech. The real breakthrough in speech synthesis, however, came in the 1950s, when the control of the synthesizer itself could be automated with the help of computers. The first Hungarian speech synthesizer was developed in 1979 by Gábor Olaszy’s team in the Phonetics Laboratory of the Research Institute of Linguistics (Hungarian Academy of Sciences). This machine is now on display, and can be heard, in the Institute’s permanent exhibition. Research in speech recognition started later and was propelled by the exponential advances in computer science; by now, significant results have been achieved in this field as well.

Computational linguistics was launched in the United States of America in the 1950s, when computers were first used to translate foreign-language academic texts (mainly Russian) into English. Since computers had proved to be much faster than humans at performing mathematical operations, researchers thought that, once the technical background had been clarified, computers would soon be able to handle human languages as well.

The first machine translation experiments, however, turned out to be a disappointment. They failed to produce accurate translations, and researchers had to admit that processing human language was a much more complex task than they had thought. When artificial intelligence research gained momentum in the 1960s, language and speech technology was integrated into this new field, which was concerned with the comprehension and production of natural languages on a fully human scale.

To be able to translate from one language into another, we first have to understand the grammatical systems of both: what sorts of constructions they contain, and how the systems traditionally named morphology and syntax work in them. But to handle syntax successfully, a system also has to be competent in semantics and the operations of the lexicon, and needs at least elementary pragmatic knowledge and information about the actual use of language. The proper treatment of morphology, in turn, has to be built on phonology and phonetics, especially if the goal is to develop applications in speech technology. A new branch of industry thus arose from the initial attempts at translation: it examines how computers can help in representing and processing natural languages.