Popular science

The definition of speech and language technology

The aim of speech and language technology is to make communication among humans, and between humans and machines, more efficient, and to ease human work by providing the technological basis for new, computer-based products and services. Speech technology and language technology are dynamically growing, interdependent new industries that deal with the processing of spoken and written language. They are exceptionally interdisciplinary fields: they rely on foundations in mathematics, information technology, physics, neurology, linguistics, psychology and electrical engineering, so working in them requires high qualifications and substantial investment. Internet search, machine translation and translation support are based on language technology, as are spell-checking and text mining, to mention only the most widely known applications. Speech technology is as yet less developed for commercial use, but it already enables partly automated call centers, voice dialing without training, the reading aloud of text messages, e-mails and screen contents, automated searches in audio and video material, medical dictation applications, and more. It is our hope that speech translation from and into Hungarian can also be achieved in the not too distant future.

The history of speech and language technology

Language and speech technology are customarily regarded as part of artificial intelligence research, although they began much earlier. The first attempts at machine-produced speech date back more than 200 years: Farkas Kempelen unveiled his first machine capable of producing speech-like sounds (a speech synthesizer) in 1791. It was the first speech production machine in the world based on articulatory principles, which put Kempelen far ahead of his time. He summed up the conclusions of his research in his book Mechanismus der menschlichen Sprache (Vienna, 1791); his observations helped found the science of phonetics.

Speech synthesis was developed further at the beginning of the 20th century, in parallel with advances in sound recording and electronics. We are proud that it was also a Hungarian expert who applied for the first patent for an invention connected to machine reading. In 1916 Miklós Bánó applied to the Patent Office with the following invention (no. 74361): a speaking machine capable of reproducing any text. He received the patent in 1919. His machine used an electromechanical solution: it joined together speech sounds coming from simultaneously running wax cylinders (a principle somewhat similar to today’s concatenation techniques).

The next step was to develop a fully electronic, hand-controlled speaking machine. In 1939 Bell Laboratories presented the device known as the Voice Operating Demonstrator (VODER): it spoke English and was operated by a trained technician using a keyboard. VODER could produce continuous speech. The real breakthrough in speech synthesis, however, came in the 1950s, when the control of the synthesizer itself could be automated with the help of computers. The first Hungarian speech synthesizer was developed in 1979 in the Phonetics Laboratory of the Research Institute of Linguistics (Hungarian Academy of Sciences) by Gábor Olaszy’s team. The machine is now on display and can be heard in the Institute’s permanent exhibition. Research in speech recognition started later and was propelled by the exponential advances in computer science; by now, significant results have been achieved in this field as well.

Computational linguistics emerged in the United States in the 1950s, when computers were first used to translate foreign-language academic texts (mainly Russian) into English. Since computers had proved to be much faster than humans at performing mathematical operations, researchers thought that, once the technical background had been clarified, computers would soon be able to handle human languages as well.

The first machine translation experiments, however, turned out to be a disappointment. They failed to produce accurate translations, and researchers had to admit that processing human language was a much more complex task than they had thought. When artificial intelligence research started in the 1960s, language and speech technology were integrated into this new field, which was concerned with the comprehension and production of natural languages on a fully human scale.

To be able to translate from one language into another, we first have to understand the grammatical systems of both: what sorts of constructions they contain, and how the systems traditionally called morphology and syntax work in them. To handle syntax successfully, however, a system also has to be competent in semantics and the workings of the lexicon, and needs at least elementary pragmatic knowledge and information about actual language use. The proper handling of morphology, in turn, has to be built on phonology and phonetics, especially if the goal is to develop applications in speech technology. A new branch of industry thus arose from the initial attempts at translation: it examines how computers can help in representing and processing natural languages.

Specialized fields of language and speech technology

Language and speech technology can roughly be divided into two parts: one working with written characters, and another dealing with speech as an acoustic phenomenon. Computational language technology is mostly concerned with the first of these, while speech technology handles the second. The two of course overlap at certain points, for example when linguistic analysis is used to mark accents in text for the purposes of speech synthesis.

Some of the more important fields of research:
▪ speech recognition and speech synthesis
▪ computer-aided corpus linguistics
▪ processing and grouping of natural language data (e.g. syntactic or morphological analysis, identifying word stems, tokenization)
▪ labeling applications (e.g. part-of-speech tagging, verb argument structure applications)
▪ automatic assignment of sentence-level stress in written text
▪ application of logical and semantic knowledge
▪ general examination of the connection of natural and formal languages
▪ machine translation, translation support
▪ spell-checking and style-checking
▪ text summarization
▪ speech acoustics
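
To make the "processing and grouping of natural language data" item above a little more concrete, here is a minimal Python sketch of two of the steps it names: tokenization and identifying word stems. It is only an illustration under simplifying assumptions. The function names tokenize and naive_stem and the SUFFIXES list are hypothetical, the suffix list covers only a handful of English endings, and real language technology systems rely on full morphological analyzers rather than suffix stripping.

```python
import re

# Hypothetical suffix list, for illustration only; real stemmers and
# morphological analyzers use far richer, language-specific rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("The machines translated texts.")
stems = [naive_stem(t) for t in tokens]

print(tokens)  # ['the', 'machines', 'translated', 'texts', '.']
print(stems)   # ['the', 'machin', 'translat', 'text', '.']
```

The crude stems "machin" and "translat" show why such rules are only a starting point: grouping word forms reliably, especially in a morphologically rich language such as Hungarian, requires proper morphological analysis.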