Table of Contents

Abstract
Overview of the Characteristics of Automatic Speech Recognition Systems
    Number of Words
    Use of Grammar
    Continuous vs. Discrete Speech
    Speaker Dependency
Early Approaches to Automatic Speech Recognition
    Acoustic-Phonetic Approach
    Statistical Pattern Recognition Approach
Modern Approach to Automatic Speech Recognition
    Hidden Markov Models
    Training of an Automatic Speech Recognition System Based on HMMs
    Sub-Word Units
Applications of Automatic Speech Recognition Systems
    Automated Call-Type Recognition
    Data Entry
    Future Applications Using Automatic Speech Recognition Systems
Conclusion
References

Abstract

With the advances of technology, many people may think that integrating the ability to understand human speech into a computer system is a simple matter. However, scientists disagree. Since the early 1950s, scientists have tried to implement the perfect automatic speech recognition system, but without success. They have succeeded in making computers recognise large numbers of words, but to this day, a computer that understands everything, under any conditions, does not exist. Because of the enormous range of applications, a great deal of money and time is spent on improving speech recognition systems.

SPEECH RECOGNITION: PRINCIPLES AND APPLICATIONS

Nowadays, computer systems play a major role in our lives.
They are used everywhere: in homes, offices, restaurants, gas stations, and so on. Nonetheless, for some, computers still represent a machine they will never know how to use. Communicating with a computer is done using a keyboard or a mouse, devices many people are not comfortable using. Speech recognition solves this problem and breaks down the boundaries between humans and computers.
Using a computer would then be as easy as talking with a friend. Unfortunately, scientists have discovered that implementing a perfect speech recognition system is no easy task. This report presents the principles of and major approaches to speech recognition systems, along with some of their applications.

Overview of the Characteristics of Automatic Speech Recognition Systems

How can we evaluate a speech recognition system? Obviously, describing it as good or bad isn't enough, since the performance of such a system may be outstanding in one application and poor in another. In fact, speech recognition systems are designed according to the application. Some of these variable characteristics are presented below.
Number of Words

The major characteristic of a speech recognition system is the number of words it can recognise. The question that comes to mind is: how many words are enough for the performance of a speech recognition system to be acceptable? The answer depends on the application (6, p. 98). Some applications may require a few words, like automated call-type recognition; others may require thousands, like data entry. However, increasing the number of words, or the vocabulary, of a speech recognition system increases its complexity and decreases its performance, since the probability of error is higher (6, p. 98). Systems with large vocabularies are also slower, since more time is needed to search for a word in a large vocabulary.
Increasing the number of words isn't enough, because the speech recognition system is unable to differentiate words like 'to' and 'two' or 'right' and 'write' (6, p. 98).

Use of Grammar

With grammar, differentiating words like 'to' and 'two' or 'right' and 'write' becomes possible. Grammar is also used to speed up a speech recognition system by narrowing the range of the search (6, p. 98). Grammar further increases the performance of a speech recognition system by eliminating inappropriate word sequences. However, grammar doesn't allow random dictation, which is a problem for some applications (6, p. 98).

Continuous vs. Discrete Speech

When speaking to each other, we don't pause between words.
In other words, we use continuous speech. However, speech recognition systems have difficulty dealing with continuous speech (6, p. 98). The easy way out is to use discrete speech, in which we pause between words (6, p. 100). With discrete speech input, the silent gap between words is used to determine the boundary of each word, whereas in continuous speech, the speech recognition system must separate words using an algorithm that is not a hundred per cent accurate.
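The silence-based boundary detection just described can be sketched in a few lines. This is a hypothetical illustration, not taken from the report: the frame energies, the threshold, and the minimum gap length are invented, and real endpoint detectors are considerably more robust.

```python
# Hypothetical sketch: locate word boundaries in discrete speech by finding
# runs of low-energy (silent) frames between words.

def segment_by_silence(energies, threshold=0.1, min_gap=3):
    """Return (start, end) index pairs of speech regions in a list of frame energies."""
    segments, start, gap = [], None, 0
    for i, e in enumerate(energies):
        if e > threshold:
            if start is None:           # a new word begins
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:          # silence long enough: the word has ended
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:               # speech ran to the end of the sample
        segments.append((start, len(energies)))
    return segments

frames = [0.0, 0.5, 0.6, 0.4, 0.0, 0.0, 0.0, 0.7, 0.8, 0.0]
print(segment_by_silence(frames))  # [(1, 4), (7, 10)] -- two words found
```

In continuous speech no such reliable gaps exist, which is why an imperfect separation algorithm must be used instead.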
Still, for a small vocabulary and with the use of grammar, continuous speech recognition systems are available. They are reliable and do not require great computational power (6, p. 100). However, for large vocabularies, continuous speech recognition systems are very difficult to achieve, require huge computational power, and are slow. In fact, processing a speech sample can take three to ten times the time required for a person to say it (6, p. 100).

Speaker Dependency

Speech recognition system designers must consider another important issue: whether their systems are speaker-dependent or speaker-independent.
Each person pronounces a word differently. Although it is easy for humans to recognise the word 'car' whether an American or an Englishman says it, for speech recognition systems this is not the case. Speaker dependency is determined by the application: some applications may require speaker-dependent systems (as in data entry), others speaker-independent systems (as in automated call-type recognition) (6, p. 100). Speaker dependency greatly affects the training of an automatic speech recognition system (4, p. 42).

Early Approaches to Automatic Speech Recognition

When scientists dreamed about a machine capable of understanding spoken language, computers and super-fast integrated circuits were not available. However, they managed to establish the fundamental principles of speech recognition systems.
Several approaches were used, each with advantages and disadvantages. Two of these approaches are discussed below.

Acoustic-Phonetic Approach

The theory behind the acoustic-phonetic approach is acoustic phonetics. This theory assumes that spoken language is composed of a finite set of distinct phonetic units. These phonetic units are distinguished by properties that are apparent in the speech signal (7, pp. 42-43). The process by which speech is recognised is described briefly in what follows: initially, speech is divided into segments.
According to the acoustic properties of each segment, an appropriate phonetic unit is attached to it. The resulting sequence of units is used to formulate a valid word (7, p. 43).

Figure 1: Phonetic sequence for a speech sample (7, p. 43).

As an example, consider the sequence of phonetic units matched with a sample of speech illustrated in figure 1. The symbol 'SIL' indicates a silence, whereas the vertical position of a phonetic unit indicates how well it matches the corresponding segment of speech (the higher, the better the match). After searching, we can match the phonetic sequence SIL-AO-L-AX-B-AW-T with the expression 'all about'.
Notice that the chosen phonemes are not only first choices in the phonetic sequence, but also second (B and AX) and third (L) choices. Therefore, matching a phonetic sequence with a word or a group of words is not straightforward (7, p. 43). In fact, this is the main disadvantage of this approach.

Statistical Pattern Recognition Approach

In statistical pattern recognition, the speech patterns are fed directly into the system and compared with the patterns stored in the system during training (7, p. 43).
Unlike the acoustic-phonetic approach, the speech is neither segmented nor analysed for its properties. If enough patterns are provided to the speech recognition system during training, it will perform better than the acoustic-phonetic approach. In general, the statistical pattern recognition approach is used more than the acoustic-phonetic approach because it is simpler to use, invariant across different speech vocabularies, and more accurate (higher performance) (7, p. 44).

Modern Approach to Automatic Speech Recognition

With the availability of computers and high-speed microprocessors, more research was done using the huge computational power now available to attack the speech recognition problem.
However, scientists still have not found a complete solution. Nevertheless, they have been able to implement new approaches that proved to be much more efficient than earlier methods. Speech recognition systems can now recognise more words, and with greater accuracy (3, p. 115). Some of these approaches are presented below.
Hidden Markov Models (HMMs)

Speech is divided into phonemes. Unfortunately, these phonemes do not remain the same; they change according to the surrounding phonemes (4, p. 44). HMMs are a tool for representing these changes mathematically. A Markov model consists of a number of states linked together, with each state corresponding to a unique output.
Each link between two states is characterised by a probability called the transitional probability (4, p. 44). Moving from one state to another, or remaining in the same state, is a function of the corresponding transitional probability (2, p. 50). A classical example illustrating Markov models is the following: consider a three-state weather system with state one being rainy, state two cloudy, and state three sunny. Such a system is shown in figure 2 (transitional probabilities are added for the explanation below). From the diagram, it is clear that if the current day is sunny, the probability of tomorrow being cloudy is 0.1, of tomorrow being rainy is 0.1, and of tomorrow being sunny is 0.8 (2, p. 50).

Figure 2: Three-state Markov model of the weather (2, p. 51).
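This weather model can be sketched in code. Note the hedge: the transitional probabilities out of the sunny state match those quoted above, but the rows for the cloudy and rainy states are assumed for illustration. Multiplying the transitional probabilities along a sequence gives the probability of that sequence of days.

```python
# Three-state Markov model of the weather (after figure 2). Only the 'sunny'
# row comes from the text; the 'cloudy' and 'rainy' rows are assumed.

TRANSITIONS = {
    "sunny":  {"sunny": 0.8, "cloudy": 0.1, "rainy": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.6, "rainy": 0.2},  # assumed
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},  # assumed
}

def sequence_probability(days):
    """Probability of a sequence of days, given that the first day is known."""
    p = 1.0
    for today, tomorrow in zip(days, days[1:]):
        p *= TRANSITIONS[today][tomorrow]
    return p

# Given a sunny day, the chance the next two days are sunny then cloudy:
print(round(sequence_probability(["sunny", "sunny", "cloudy"]), 4))  # 0.08
```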
This example is an observable Markov model, since we can check the state we are currently in (2, p. 50). Speech recognition systems, however, use hidden Markov models, since the speech fragment is not directly observable by the speech recognition system (2, p. 50). In a hidden Markov model, a state can represent many outputs; therefore, a probability distribution over all possible outputs is associated with each state. A diagram of a three-state HMM is shown in figure 3 (4, p. 44). This figure shows that each state has five possible outputs (A, B, C, D, and E), occurring with probabilities given by b1(s), b2(s), or b3(s). HMMs are doubly probabilistic, since both the transition from one state to another and the output generated at that state are probabilistic (4, p. 44).
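The doubly probabilistic behaviour can be illustrated by generating outputs from such a model. All probabilities below are invented for illustration: at each step, one random draw from the current state's output distribution produces a symbol, and a second random draw chooses the next state.

```python
import random

# Invented three-state HMM: each state has transitional probabilities to the
# states and its own distribution over the outputs A, B, and C.
TRANS = {1: [(1, 0.5), (2, 0.4), (3, 0.1)],
         2: [(1, 0.1), (2, 0.5), (3, 0.4)],
         3: [(1, 0.1), (2, 0.2), (3, 0.7)]}
EMIT = {1: [("A", 0.6), ("B", 0.2), ("C", 0.2)],
        2: [("A", 0.2), ("B", 0.6), ("C", 0.2)],
        3: [("A", 0.2), ("B", 0.2), ("C", 0.6)]}

def draw(pairs):
    """Pick an item at random from (item, probability) pairs."""
    items, weights = zip(*pairs)
    return random.choices(items, weights=weights)[0]

def generate(state, steps):
    """Emit `steps` outputs: draw an output, then transition, repeatedly."""
    outputs = []
    for _ in range(steps):
        outputs.append(draw(EMIT[state]))  # probabilistic output at this state
        state = draw(TRANS[state])         # probabilistic move to the next state
    return outputs

print(generate(1, 5))  # e.g. a sequence of five symbols drawn from {A, B, C}
```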
Thus, if we receive a sequence of outputs from an HMM, we are not able to retrace the sequence of states that the HMM passed through to produce it (4, p. 44). Looking at figure 3, it is evident that an output sequence of A-B-C, for example, can be produced by any sequence of three states; however, each sequence of states has its own probability of occurrence. In speech recognition, each word is represented by a sequence of states (1, p. 53); therefore, it is essential to find this sequence for any sequence of outputs. In fact, finding this sequence is equivalent to solving the speech recognition problem.

Figure 3: Three-state hidden Markov model (4, p. 44).

The sequence of states is determined according to its probability.
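The search for the most probable state sequence can be sketched with the classic dynamic-programming (Viterbi-style) recursion. All probabilities below are invented for illustration, and only three of figure 3's five outputs are used to keep the tables short. The key idea is that only the best path ending in each state needs to be kept and extended one output at a time, so not every one of the 27 possible three-state sequences has to be scored separately.

```python
# Invented three-state HMM over the outputs A, B, and C.
STATES = (1, 2, 3)
START = {1: 0.6, 2: 0.3, 3: 0.1}
TRANS = {1: {1: 0.5, 2: 0.4, 3: 0.1},
         2: {1: 0.1, 2: 0.5, 3: 0.4},
         3: {1: 0.1, 2: 0.2, 3: 0.7}}
EMIT = {1: {"A": 0.6, "B": 0.2, "C": 0.2},
        2: {"A": 0.2, "B": 0.6, "C": 0.2},
        3: {"A": 0.2, "B": 0.2, "C": 0.6}}

def viterbi(outputs):
    """Most probable state sequence for `outputs`."""
    # best[s] = (probability, path) of the best partial path ending in state s
    best = {s: (START[s] * EMIT[s][outputs[0]], [s]) for s in STATES}
    for symbol in outputs[1:]:
        best = {s: max(((p * TRANS[prev][s] * EMIT[s][symbol], path + [s])
                        for prev, (p, path) in best.items()),
                       key=lambda t: t[0])
                for s in STATES}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["A", "B", "C"]))  # [1, 2, 3] with these particular tables
```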
However, checking the probabilities of all possible sequences can be very time consuming, especially in speech recognition HMMs, which are much more complicated than our three-state example in figure 3. This problem was solved using an algorithm that exploits the fact that the probability of being in a certain state relies only on the previous state (4, p. 44).

Training of an Automatic Speech Recognition System Based on HMMs

As mentioned earlier, major components of an HMM-based system are the transitional probabilities between states and the probability distribution of each state. To obtain a good speech recognition system, these probabilities must be adapted to factors like the language, the possible number of speakers, and so on (3, p. 115). Determining these probabilities is part of what is known as training the speech recognition system.
This training process depends on whether we are dealing with a speaker-dependent or a speaker-independent speech recognition system. In the first case, speech samples are taken from the user and the probabilities are determined accordingly. In the second case, speech samples are accumulated from many speakers, in addition to the text of what was said. Here, the training process is much more complicated, since the spectrogram (a measure of frequency vs. time) of the same word depends on the speaker. Training also consists of implementing a dictionary holding the vocabulary, along with a grammar of permitted word sequences (4, p. 42).

Sub-Word Units

In HMMs, each word is represented by a sequence of states (1, p. 53).
A word is recognised from the sequence of states most probably associated with a sequence of outputs. Therefore, the unit for such HMMs is the word. Many scientists believe that using sub-words instead of words may improve the quality of speech recognition (1, p. 50). To implement sub-word HMMs, a system of sub-word units must be selected. The simplest form of sub-word unit is the phone. Using phones as units for an HMM seems to be the right choice, since phones are small in number and easily trained, but the performance of such an HMM is poor, since a phone is affected by the surrounding phones (1, p. 53).
Another choice of sub-word unit is the syllable. Like phones, syllables are affected by the surrounding syllables, but their number is much greater than that of phones (around 20 000 in English), which makes them hard to train (1, p. 53). A newer sub-word unit, known as the triphone, seems to be the most successful. Triphones solve the problem of influence between sub-word units and their surroundings by modelling each phone according to its right and left neighbours (1, p. 53). As an example, the 't' in 'still' is modelled by the 's-t-i' triphone (1, p. 53). The immediate problem one can think of is the large number of triphones, since we are taking each phone and combining it with all possible left and right phone neighbours.
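The expansion from phones to context-dependent triphones can be sketched as follows, using the report's 's-t-i' example. The phone spelling of 'still' and the 'SIL' padding at word edges are assumptions made for illustration.

```python
# Hypothetical sketch: label each phone with its left and right neighbours,
# producing triphones such as 's-t-i' for the 't' in 'still'. Word edges
# are padded with 'SIL' (silence) as context.

def to_triphones(phones):
    """Expand a phone sequence into left-phone-right triphone labels."""
    padded = ["SIL"] + phones + ["SIL"]
    return [f"{padded[i - 1]}-{p}-{padded[i + 1]}"
            for i, p in enumerate(phones, start=1)]

print(to_triphones(["s", "t", "i", "l"]))
# ['SIL-s-t', 's-t-i', 't-i-l', 'i-l-SIL']
```

Since each of the N phones can in principle combine with N left and N right contexts, up to N cubed distinct labels are possible, which is the combinatorial growth described above.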
This problem can be resolved by exploiting the fact that some triphones are very similar, since many neighbouring phones affect a phone in the same way (1, pp. 53-54). For example, the effect on the 't' in 'still' is similar to the one in 'steal' (1, pp. 53-54). Even though the performance of the recognition system is affected by such approximations, it remains within acceptable standards (1, p. 54).

Applications of Automatic Speech Recognition Systems

With all the time and money spent on speech recognition research, one may wonder about the applications of speech recognition. This part presents some of the currently available applications, along with some future applications of automatic speech recognition systems.

Automated Call-Type Recognition

An interesting and relatively simple application of speech recognition systems is automated call-type recognition.
In pay phones, operators are needed to determine the call type requested by the caller (7, p. 490). Speech recognition may be used instead of operators. Five types of calls are available: 'collect', 'calling card', 'operator' for operator-assisted calls, 'third number' for third-party billing calls, and 'person' for person-to-person calls (7, p. 490). For this application, the speech recognition system must be speaker-independent and capable of recognising and spotting the five key words mentioned above in a speech sample (2, p. 52). The problem in this application is the high amount of background noise, since pay phones are usually located in public places; however, this problem can be solved using appropriate speech recognition systems (low-level speakers, etc.) (2, p. 52).

Data Entry

Entering data using speech recognition is very practical when performing a manual task (6, p. 102).
A speech recognition system for this application is highly complex and structured, since it must contain a large vocabulary. For data entry, both speaker-dependent and speaker-independent speech recognition systems are available, even though speaker-dependent systems perform better than speaker-independent ones. They are also available for discrete or continuous speech (6, p. 102). Data entry applications are still limited, since the performance of speech recognition systems in this field is still limited.

Future Applications Using Automatic Speech Recognition Systems

With the increasing performance of automatic speech recognition systems, companies are becoming more interested in integrating speech recognition into their products. Car manufacturers are interested in replacing all the levers, knobs, and buttons with a speech recognition system capable of doing everything, from raising the temperature to locking the doors and turning on the radio (5, p. 49).
In this way, the electronic content of the car is increased, whereas the mechanical content is reduced. This makes the car easier to design and build, and therefore cheaper (5, p. 49). Others think of applying speech recognition to kitchen appliances such as dishwashers, ovens, and refrigerators. Air conditioners might some day be voice controlled (5, p. 49).

Conclusion

The gradual but inevitable development of speech recognition systems will surely lead to a system that will one day compare to the perfect speech recognition device: the human being. New methods and algorithms are researched every day to improve the performance of speech recognition systems.
Will we reach a stage where keyboards, buttons, and all other input devices become obsolete? Time will tell.
References

1. Holmes, W.J., & Pearce, D.J.B. (1993). Sub-word units for automatic speech recognition of any vocabulary. GEC Journal of Research, 11(1), 49-58.
2. Juang, B.H., Perdue, R.J., Jr., & Thomson, D.L. (1995, March/April). Deployable automatic speech recognition systems: Advances and challenges. AT&T Technical Journal, 45-54.
3. Kay, R. (1998, January). Do you hear what I say? Byte, 115-116.
4. Makhoul, J.F., & Schwartz, R. (1997, December). The voice of the computer is heard in the land (and it listens too!). Spectrum, 39-47.
5. Manner, G. (1995, July). Machines that listens. Popular Mechanics, 47-49.
6. Markowitz, J. (1995, December). Talking to machines. Byte, 97-104.
7. Rainer, L., & Juang, B.H. (1993).