Their Use Of Continuous Voice Recognition Technology
When users speak into the microphone, their words can appear on a computer screen in a word processing format, ready for revision and editing. Voice recognition has gained tremendous popularity over the past few years. It has gone from imagination to rumor to reality, and this trend is not going to stop. This will be better explained by Jeffrey C. Scott from Computer Shopper in the paragraphs below.
"The ability to interact with computers by talking to them may help bridge the gap between human beings and machines. Once deemed a fantastical glimpse into a science-fiction-like future, voice-recognition technology has undergone several significant developments recently, and it now seems destined to move us toward a humanized computer interface. But that's for the future. At present, voice-recognition technology at its best lets us focus on our daily tasks and needs rather than on the computer's commands and syntax. Imagine inputting all your requests, numerous e-mail messages, routine correspondence and numerical data simply by telling them to the computer. Executives with limited typing skills, individuals who are physically challenged, and users who suffer from pain associated with the repetitive motion of typing can all benefit from voice-recognition applications.
In fact, almost anyone can profit from this technology, though it may take some adjustment and a little patience to scale the learning curve. Today's voice-recognition products are notably easier to use than those of the past. In addition, veterans of voice recognition should find that the new and improved dictation systems require substantially less training and customization.
The Virtues of Training: Training a voice-recognition system consists of a defined dictation session where you are prompted to say words, phrases, and sentences. This exercise, which can take anywhere from 20 minutes to an hour and a half, allows the computer to become accustomed to your pronunciation. After this session, the program calculates the results of the speech samples. When the entire process is complete, the speech engine is thoroughly tuned to your particular voice.
You continually train the speech engine by correcting its recognition mistakes. This review focuses on four voice-recognition programs for the Windows environment: Dragon Dictate for Windows 2.01, IBM Voice Type 3.0 for Windows 95, Kurzweil Voice for Windows 2.0, and Listen for Windows 95. These packages all have two basic functions -- command and control, and dictation. Dragon Dictate, Voice Type, and Voice for Windows all seek control of the Windows operating system using dictation modes based on large vocabularies.
Listen for Windows 95 has a more strictly defined dictation capability, focusing on number dictation and certain vocabulary words. The packages were tested on a variety of Pentium and 486-equipped machines having from 8 MB to 32 MB of RAM. As voice-recognition packages are memory- and resource-intensive, superior performance was achieved by the machines having more RAM and faster microprocessors. To compare the programs, we dictated a variety of information, including sample phrases, letters, paragraphs, price lists, forms, and numerical data.
And to measure the recognition engines' robustness, the test passages included groups of similar words, such as 'there,' 'their,' and 'they're,' or 'to,' 'too,' and 'two.' Also, a command and control obstacle course was set up to test the packages' ability to manipulate the Windows interface. An examination of speech-to-text systems over the past several years reveals that prices have decreased dramatically, from thousands to merely hundreds of dollars. At the same time, usability and adaptability have greatly increased. These products have moved from being hardware-dependent to hardware-independent programs, no longer requiring a special or proprietary sound card. Many are also speaker-independent as well, meaning that the programs recognize with minimal training the speech of any user.
Voice-recognition software consists of two broad categories: discrete- and continuous-speech packages. Discrete-speech packages require the speaker to pause between words so that the engine can identify each word accurately. Most users find these short pauses to be relatively minor adjustments. Continuous-speech packages, on the other hand, can recognize your speech in the way you naturally talk -- flowing from word to word without constant interruptions or pauses. Though rare in mainstream software applications, continuous-speech capability can be found in many large-scale telephony operations. (See the sidebar 'The Future of Voice Recognition.')
These four packages, excluding Listen for Windows, use discrete speech for text dictation and continuous speech for commands and numeric dictation. Listen for Windows is a continuous-speech product. Read on as we begin to dictate." (THE VOICES OF AUTOMATION: Once a technology that existed only in fantasy, voice recognition has made itself a reality in modern computing. By Jeffrey C. Scott, Computer Shopper, September 1996)
"In grade school, you might have learned that reading your written creations aloud would help you identify errors.
Now, the computer can help in this area. A number of text-to-speech applications on the market let your computer convert onscreen text into reasonably human-sounding speech. Think of the uses: These systems can read aloud letters, online newspapers, e-mail, and sales forecasts. In the past, speech-generating systems consisted of external voice synthesizers attached to computers. Soon after that, voice-synthesis devices were housed on internal soundboards.
Though there are still many extensive hardware-based systems, a lot of speech-synthesis packages are hardware-independent. In the text-to-speech process, your computer uses a variety of language models and phonetic rules to synthesize sound, based on characters that it previously recognized as binary code. Some text-to-speech programs let you control the characteristics of the voices produced by your computer. Eloquent Technology's text-to-speech package, Eloquence for Windows, produces voices that stress particular words, leading to speech that reflects feelings like boredom or excitement.
Moreover, this program, like other similar systems, has preprogrammed voices with names like Gretchen, Herbert, or Larry; the names attempt to conjure up images for each of the voices. Eloquence for Windows should be available by now. Another text-to-speech application, Digital Equipment Corp.'s Windows NT-based DECtalk, has a comprehensive speech engine and allows for the creation of personalized voices. An included module, Mail Talk, announces the subject and source of incoming e-mail messages -- a welcome bonus for frequent users of e-mail. If you want to hear how the written documents sound read aloud, or if you want to be able to listen to e-mail while you're performing other chores, you may want to purchase a text-to-speech application. Once you're up and running with such a system, you'll likely wonder how you got along without it.
Eloquence by Eloquent Technology lets your computer read the text in its window in a variety of voices.
Product Listing
Eloquence for Windows: Eloquent Technology, 2389 North Triphammer Rd., Ithaca, NY 14850; 607-266-7025; Fax: 607-266-7030. Requires: 4 MB RAM; Windows 3.1, Windows 95, or Windows NT. List Price: TBA.
DECtalk: Digital Equipment Corp., P.O. Box 9501, Merrimack, NH 03054; 800-344-4825. Requires: 8 MB RAM; Windows NT. List Price: $1,195."
(LOOK WHO'S TALKING NOW: By Jeffrey C. Scott, Computer Shopper, September 1996)
The future of voice recognition has unlimited possibilities in the way its applications could be used. "Unlike discrete-speech packages, which are characterized by noticeable pauses between words, continuous-speech programs recognize words as they are spoken naturally. The most advanced form of voice-recognition technology today, continuous-speech engines are still rarely found in mainstream software applications.
But with technological advancements occurring so rapidly, that should change. It was not so long ago that discrete voice-recognition packages were once relegated to large-scale, expensive software implementations. Now they're common commodities. If you're interested in the voice-recognition technology of the future, you can get a glimpse of it in many telephony systems set up by small and large businesses across the country. In fact, its use is becoming so widespread, you might already be familiar with the technology.
If you've ever checked bank or credit card balances by phone, or listened to the verbal litany of seemingly endless options for directing a phone call, you might have been on the receiving end of a continuous voice-recognition system. One such wide-scale application, Speak 2 Banking by Pure Speech (617-441-0000), lets financial institutions use continuous voice-recognition technology to provide common banking services. For instance, when you call your bank to check your balance and make a transfer between two accounts, you can accomplish it without touching the phone's keypad. Once the system begins, you simply state your requests and answer questions in your natural speaking manner.
Your inquiries can be as conversational as 'I want to check the balance in my savings account.' A popular telephony software application that uses a continuous-speech engine is Wildfire by Wildfire Communications (617-674-1500). This system works over phone lines to provide a number of useful call-answering and contact-management services. Wildfire functions as an automated assistant, answering the phone, transferring calls, and providing customized information for callers. It will also inform users of incoming calls even while they're on the phone. Most important, all these services take place during the course of a single call.
The system is available on a user-subscription basis or in custom office configurations. With their use of continuous voice-recognition technology, telephony-based voice-recognition systems provide a high level of functionality and service. But even more important, they give us a glimpse into the not-so-distant future when affordable mainstream software applications will let you and your computer converse, devoid of any unnatural pauses." (The Future of Voice Recognition: By Jeffrey C. Scott, Computer Shopper, September 1996)
You may ask, how does voice recognition work?
I had the same question, so I went to a few local stores (Best Buy and Circuit City). I also e-mailed Dragon Systems and asked a few questions; some were answered, and on others they have not gotten back to me yet. But first, to operate a computer through voice, the user must learn how to dictate in a word-by-word manner known as 'discrete speech.' In other words, the computer cannot recognize individual words if they are spoken the way people usually speak -- in fluent sentences or 'continuous speech.' Next, the user must 'teach' the system to recognize his or her voice through a combination of training and usage. We all pronounce individual words in different ways, and voice recognition software cannot simply recognize everyone's voice right off the bat.
As the user speaks to the system, the software creates a user-specific voice file that contains a lot of information about his or her voice qualities and pronunciations. The system uses this information to make its best guess at what each word is as it is dictated. The process of 'familiarizing' the voice recognition software with an individual voice takes time. When a user takes the time to properly train and use the voice recognition system, which creates a strong and accurate voice file, the system will supply the correct word most of the time.
However, the system will never achieve a 100% accuracy rate in all situations. Sometimes the computer just doesn't get it right and suggests the wrong word. The user must then stop and correct the system. What happens when the computer does not recognize a dictated word correctly?
Because the computer is programmed on the assumption that it will occasionally make mistakes, it generates a list of alternative words for each word it offers as its best guess. In some voice recognition programs, this list appears in a suggestion window on the screen, and the words in it change with each dictated word. The user can correct a mistake by choosing the desired word from this list if it appears there. If the correct word is not in this list of alternatives, the user can spell it aloud, letter by letter, or begin typing the letters on the keyboard. The computer will use this information to predict the right word. A voice recognition system is made up of a computer with system software, voice recognition software, a microphone, and usually a sound card.
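The correction step described above, choosing from the list of alternatives or spelling the word out, can be sketched in a few lines of Python. This is only an illustration under assumptions: the names Recognition, pick_from_list, and narrow_by_spelling, and the tiny vocabulary, are made up for this example and do not come from Dragon Dictate or any other product discussed in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recognition:
    """One dictated word: the engine's best guess plus its other candidates."""
    best_guess: str
    alternatives: List[str] = field(default_factory=list)


def pick_from_list(rec: Recognition, choice: int) -> str:
    """Return the alternative the user selected (1-based, like a numbered on-screen list)."""
    if not 1 <= choice <= len(rec.alternatives):
        raise ValueError("choice is not in the list of alternatives")
    return rec.alternatives[choice - 1]


def narrow_by_spelling(letters: str, vocabulary: List[str]) -> List[str]:
    """Return the vocabulary words that begin with the letters spelled so far."""
    prefix = letters.lower()
    return [word for word in vocabulary if word.lower().startswith(prefix)]


# Hypothetical case 1: the engine guessed "there" but the user meant "their",
# which appears first in the list of alternatives.
rec = Recognition(best_guess="there", alternatives=["their", "they're"])
corrected = pick_from_list(rec, 1)

# Hypothetical case 2: the wanted word is not in the list, so the user spells
# "t", "w" and the system narrows its (toy) vocabulary to matching words.
vocabulary = ["to", "too", "two", "twelve", "there"]
candidates = narrow_by_spelling("tw", vocabulary)

print(corrected)
print(candidates)
```

A real product would presumably rank the candidates using the user's voice file rather than simple vocabulary order; the sketch only shows the two correction paths.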
To use voice recognition for word processing, a word processing program is also needed. Each software program has different hardware requirements, but generally speaking a more powerful computer is needed -- typically with a Pentium or a very fast 486-based CPU and at least 16 MB of RAM. In general, the voice recognition software itself is built on three parts: a large electronic dictionary (e.g., a 150,000-word dictionary from a publisher such as Merriam-Webster), a smaller active dictionary that reflects the user's own usage, and a voice model. The price of voice recognition systems has changed over the past few years.
I have found that in about 1988, a basic system cost $9,000, not including the computer itself, and the computer had to be relatively powerful and therefore costly. Fortunately for all of us, the cost of both the hardware and software has dropped dramatically over the years, and there are a lot more choices. It is now possible to purchase a beginning-level voice recognition system for $100 or less. There are many voice recognition systems on the market today. For IBM and compatible users, the three leading voice recognition systems are Dragon Dictate, IBM Voice Type, and Kurzweil Voice from Kurzweil Applied Intelligence.
For Macintosh users, the primary system is Power Secretary. Below is a product listing of voice recognition software. This product list is from ZDNet Products (Product listing: September 1996).

IBM Voice Type 3.0 for Windows 95
IBM Corp., Old Orchard Rd., Armonk, NY 10504
800-825-5263; 407-443-9250
Requires: P90 processor; 256 K of L2 cache; 16 MB RAM; 30 MB hard drive space; Creative Labs Sound Blaster 16 or compatible sound card; Windows 95
List Price: $699; upgrade, $99

Kurzweil Voice for Windows 2.0
Kurzweil Applied Intelligence, 411 Waverley Oaks Rd., Waltham, MA 02154
800-380-1234; 617-893-5151; Fax: 617-893-6525
Requires: 8 MB RAM for 30,000-word vocabulary, 16 MB RAM for 60,000-word vocabulary; 35 MB hard drive space; 16-bit Sound Blaster-compatible sound card; Windows 95 or Windows 3.x
List Price: $695

Listen for Windows 95
Vertex Voice Systems, 275 Raritan Center Pkwy., Edison, NJ 08837-3613
800-275-8729; 908-225-5225; Fax: 908-225-7764
Requires: 4 MB RAM; 14 MB hard drive space; Sound Blaster-compatible sound card or embedded audio; microphone; Windows 95
List Price: $99; download off the Internet, $59.95

Dragon Dictate for Windows 2.01
Dragon Systems, 320 Nevada St., Newton, MA 02160
800-825-5897; 617-965-5200; Fax: 617-527-0372
Requires: 16 MB RAM; 36 MB hard drive space; 16-bit multimedia sound card or a Dragon Systems-certified DSP audio board; Windows 95 or Windows 3.x
List Price: Personal Edition, $395; Classic Edition, $695; Power Edition, $1,695

But I have found that Dragon Systems offers far more applications than those listed above.
(See the Dragon Systems website for prices and in-depth details of the applications below.)

Consumer and Retail Products
Dragon Naturally Speaking for Teens
Dragon Naturally Speaking Essentials
Dragon Naturally Speaking Standard
Dragon Naturally Speaking Preferred
Dragon Naturally Speaking Preferred USB
Dragon Naturally Speaking Mobile
Dragon Naturally Clear USB System H 100 (Microphone and Sound Card)

Business and Professional Products
Dragon Naturally Speaking Preferred
Dragon Naturally Speaking Professional
Dragon Naturally Speaking Mobile
Dragon Naturally Speaking Mobile Organizer
Dragon Naturally Speaking Mobile Option Kit
Dragon Naturally Speaking Legal Suite
Dragon Naturally Speaking Medical Suite
Dragon Dictate

Software Development Products
Dragon Naturally Speaking Developer Suite
Dragon Naturally Speaking Runtime
DragonXTools(TM) 2.0 for DragonDictate
Dragon Speech Tool (R) for DragonDictate

Telephony Products
Dragon Naturally Speaking Call Center Edition

Dragon Dictate for DOS uses the IBM M-Audio Capture and Playback Adapter (M-ACPA), which is an ISA-based card. This card is used for digitizing the audio and for doing FFTs on the digitized samples to assist recognition and reduce the CPU load.

1) How expensive was voice recognition in the late 80's? a) $15,000 b) $5,000 c) $100,000 d) $9,000
2) Which of the following was not covered in this paper? a) Dragon Dictate b) IBM VoiceType c) Kurzweil d) Listen for Windows 95 e) None of the above
3) Which of the following would not use one of the Dragon Systems software packages? a) Professionals b) Lawyers c) Doctors d) Teenagers e) All of the above
4) How much does Kurzweil Voice for Windows 2.0 cost? a) $200 b) $520 c) $695 d) $1595
5) How much L2 cache do you need for IBM Voice Type 3.0 for Windows 95? a) 50 K b) 256 K c) 186 K d) 320 K
6) Which of the following do users like the most? a) Dragon Dictate b) IBM VoiceType c) Kurzweil d) Listen for Windows 95
7) Do you need a special sound card to run voice recognition software? a) True b) False
8) Can you order voice recognition software over the net from Dragon Systems? a) True b) False
9) Will voice recognition software ever achieve a 100% accuracy rate in every situation? a) Yes it will b) No
10) Voice recognition software is being used today in business. Where is it being used? a) Banking b) Doctors' offices c) Law offices d) Court rooms e) All of the above f) All but d