The main problem areas of speech recognition
The long-term aim of machine-based speech recognition is general speech-to-text conversion, regardless of background noise, speaker, or subject matter. No such ‘all-competent’ speech recognition system exists at the moment in any language of the world. When the task is narrowed, however, several subfields have either made advances with practical applications or have become the focus of more concentrated research.
▪ isolated word (command) recognition: if the commands come from a small, well-defined set of words or phrases, recognition can be fairly reliable.
▪ keyword search: keywords or sets of keywords can be found and signaled even in continuous speech.
▪ recognition of continuous speech (large vocabulary): transcription of continuous speech on a given subject into text (error rates are low only when the subject is narrowly restricted)
▪ speech recognition in morphologically rich languages: research on linguistic and acoustic modeling of the morphological variety of languages such as Hungarian, Finnish, and Turkish, for speech recognition purposes
▪ spontaneous speech recognition: research on speech recognition methods that can handle the characteristics of spontaneous speech
▪ noise-robust speech recognition: research on sound processing units and modeling methods suited to noisy environments (e.g. cars)
▪ speaker recognition: identification of speakers
▪ recognition of the emotional coloring of speech: recognition of emotions (joy, anger, fear, etc.) on the basis of the acoustic characteristics of speech
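To make the first item in the list above more concrete, the following is a minimal sketch of isolated command recognition by template matching. It uses dynamic time warping (DTW), a classic technique chosen here for illustration; the text does not say which method any particular system uses. The function names, the command labels and the one-dimensional "feature" sequences are all made-up assumptions; a real recognizer would work on acoustic feature vectors extracted from audio.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize_command(features, templates):
    """Return the label of the stored template closest to `features`."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


# Hypothetical templates: toy numbers standing in for real acoustic features.
TEMPLATES = {
    "start": [1.0, 3.0, 4.0, 3.0, 1.0],
    "stop": [4.0, 2.0, 1.0, 1.0, 0.0],
}

# A slower utterance of "start": same shape, stretched in time.
utterance = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(recognize_command(utterance, TEMPLATES))
```

Because every command must be compared with every stored template, this kind of approach is only practical for the small, closed command sets described above.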
Some fields and applications of speech synthesis
In research
▪ conveying emotion in synthesized speech
▪ approximating human tones
▪ modeling of human prosodic variety
Development
The speech synthesizer always has to be adapted to its planned application.
▪ production of speech from general text
▪ audiobooks
▪ sound and speech in toys and games
▪ newsreading
▪ weather forecasts
▪ text processing and speech synthesis for specific applications
▪ e-mail reader (reads the text of an electronic message aloud over the phone)
▪ text message reader (text messages can be sent to landline phones; the machine reads them out to the user)
▪ reading out names and addresses from company databases
▪ reading out timetables over the phone and at stations
▪ telebanking systems, invoice readers
▪ time, date, exchange rate, etc. readers (e.g. stock prices)
▪ speaking computer and mobile phone applications for the blind and the visually impaired
▪ speaking information systems for the general public
▪ public drug information system (phone: (06-1)-886-94-90; reads out the text of the information sheet)
▪ speaking ATMs (helping the visually impaired in operating the machine)
▪ phone inquiries (phone: 12-70; reads out the name and address belonging to the phone number given)
▪ medical devices, e.g. MONDOM-2000, a speech-based hearing screening device (it uses a new method, synthetic speech, to measure hearing deficiency; mostly used in kindergartens)
▪ automatic accent check in texts
▪ automatic accentuation of unaccented texts
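As an illustration of the last item, here is a minimal sketch of dictionary-based accent restoration. It assumes that "accent" here means diacritical marks (as in Hungarian é or ő). The tiny lexicon and the function name are hypothetical. A practical system would need a large lexicon and some way of resolving ambiguous forms from context, which this sketch does not attempt.

```python
import re

# Hypothetical mini-lexicon: unaccented form -> assumed accented form.
LEXICON = {
    "beszed": "beszéd",
    "felismeres": "felismerés",
    "gepi": "gépi",
}


def restore_accents(text, lexicon=LEXICON):
    """Replace known unaccented words with their accented forms."""

    def fix(match):
        word = match.group(0)
        accented = lexicon.get(word.lower())
        if accented is None:
            return word  # unknown word: leave it untouched
        if word[0].isupper():
            # Keep an initial capital (all-caps words are not handled).
            accented = accented[0].upper() + accented[1:]
        return accented

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", fix, text)


print(restore_accents("Gepi beszed felismeres"))
```

The same lookup table could also support the accent check mentioned in the item before it, by flagging words whose unaccented form appears in the lexicon.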