Overview

"Language-mind-machine" – New technologies in the information society

The main hall of the Hungarian Academy of Sciences served as the venue for the closing conference of the Platform, "Language-mind-machine" – New technologies in the information society, on October 7, 2010. The conference presented the results of the Platform's two years of operation, gave an overview of recent developments in the field, and discussed the leading role of Platform members in European projects. More than 150 participants attended the conference, many of them from the media and from among policy makers, as well as from partner platforms.

The main focus of the conference was the Implementation Plan, which describes the research efforts that need to be undertaken and the measures that have to be put in place for the sector to maintain its leading role in national and European research and development. After the opening speech by the president of the Platform, which outlined the results of two years of operation, the main talk presented the Implementation Plan. Further presentations described two major European projects coordinated by the coordinating institute of the Platform and gave an overview of speech technology applications. As usual, a demo session closed the conference, offering participants hands-on experience with current applications and giving partner platforms the opportunity to present their activities.

The closing conference proved highly successful, both in terms of media response and as a basis for the further activities outlined in the Implementation Plan.

 

The fields of speech technology

The main problems of speech recognition

The long-term aim of machine-based speech recognition is general speech-to-text conversion regardless of background noise, speaker or subject matter. No such ‘all-competent’ speech recognition system exists at the moment in any language of the world. When the task is narrowed, however, several subfields have either made advances that have practical applications or have become the focus of more concentrated research.
▪ isolated word (command) recognition: if the commands belong to a small, well-defined set of words or phrases, recognition can be fairly efficient.
▪ keyword search: keywords or sets of keywords can be found and signaled even in continuous speech.
▪ recognition of continuous speech (large vocabulary): transcription of continuous speech on a given subject into text (error rates are low only when the subject is very narrowly limited)
▪ speech recognition in morphologically rich languages: research on linguistic and acoustic modeling of the morphological variety of Hungarian, Finnish and Turkish for speech recognition purposes
▪ spontaneous speech recognition: research on speech recognition methods that can handle the characteristics of spontaneous speech
▪ noise-robust speech recognition: research on sound processing units and modeling methods suited to noisy environments (e.g. cars)
▪ speaker recognition: identification of speakers
▪ recognition of the emotional coloring of speech: recognition of emotions (joy, anger, fear, etc.) on the basis of the acoustic characteristics of speech

Some fields and applications of speech synthesis
Research
▪ conveying emotion in synthesized speech
▪ approximating human tones
▪ modeling of human prosodic variety

Development
A speech synthesizer always has to be adapted to its intended application.
▪ production of speech from general text
    ▪ audiobooks
    ▪ sound and speech in toys and games
    ▪ newsreading
    ▪ weather forecasts
▪ text processing and speech synthesis for specific applications
    ▪ mail reader (reads the text of an e-mail message aloud over the phone)
    ▪ text message reader (text messages can be sent to landline phones, where the machine reads them out to the user)
    ▪ reading out names and addresses from company information databases
    ▪ reading out timetables over the phone and at stations
    ▪ telebanking systems, invoice readers
    ▪ time, date, exchange rate, etc. readers (e.g. stock prices)
    ▪ speaking computer and mobile phone applications for the blind and the visually impaired
    ▪ speaking information systems for the general public
    ▪ public drug information system (phone: (06-1)-886-94-90; reads out the text of the information sheet)
    ▪ speaking ATMs (helping the visually impaired in operating the machine)
    ▪ phone inquiries (phone: 12-70; reads out the name and address belonging to the phone number given)
    ▪ medical devices: e.g. MONDOM-2000, a speech-based hearing screening device (uses a new method, synthetic speech, to measure hearing deficiency; mostly used in kindergartens)
    ▪ automatic checking of accent marks in texts
    ▪ automatic restoration of accent marks in unaccented texts