Workshop on Acoustic Voice Analysis

SUMMARY STATEMENT

BY INGO R. TITZE, PH.D.

National Center for Voice and Speech

The National Center for Voice and Speech is a multi-site, interdisciplinary organization dedicated to delivering state-of-the-art voice and speech research to practitioners, trainees and the general public. Members of the consortium are The University of Iowa, The Denver Center for the Performing Arts, The University of Wisconsin-Madison and The University of Utah.

The NCVS gratefully acknowledges its source of support: Grant P60 DC00976 from the National Institute on Deafness and Other Communication Disorders, a division of the National Institutes of Health.


FOREWORD

A workshop was held on the 17th and 18th of February, 1994, in Denver, Colorado, to reach better agreement on the purpose and methods of acoustic analysis of voice signals. It was sponsored by the National Center for Voice and Speech, a research and training center funded by the National Institute on Deafness and Other Communication Disorders, and by The Denver Center for the Performing Arts. Topics included definitions and nomenclature in voice analysis, algorithms for extraction of parameters, high fidelity recording of microphone signals, computer file structures, sharing of databases, and development of test signals. Attendance and contributions were by invitation, keeping in mind a balance between industry and academia. The following contributors were present:

David Berry, Ph.D., University of Iowa and NCVS
Timothy Curran, M.S., Private Voice Consultant
Dimitar Deliyski, Ph.D., Kay Elemetrics
Bruce Gerratt, Ph.D., UCLA VA Hospital
Wolfgang Hess, Dr.-Ing., University of Bonn, Germany
Yoshiyuki Horii, Ph.D., University of Colorado and NCVS
David Huang, Ph.D., University of Washington and Tiger Electronics
Jack Jiang, M.D., Ph.D., Northwestern University
Issam Kheirallah, M.A.Sc., University of Western Ontario and Avaaz Innovations, Inc.
Jody Kreiman, Ph.D., UCLA VA Hospital
Jon Lemke, Ph.D., University of Iowa
Martin Milder, B.S., University of Iowa and NCVS
Paul Milenkovic, Ph.D., University of Wisconsin, CSpeech, and NCVS
Fred Minifie, Ph.D., University of Washington and Tiger Electronics
Ed Neuberg, M.S., Institute for Defense Analysis
Ying Yong Qi, Ph.D., University of Arizona
David Talkin, B.E.S., Entropic
Ingo Titze, Ph.D., University of Iowa and NCVS
William Winholtz, A.A.S., WJ Gould Voice Research Center (1), Wintronix and NCVS
Darrell Wong, Ph.D., WJ Gould Voice Research Center and NCVS

Dr. Wong, Coordinator of Technology Transfer at the National Center for Voice and Speech, acted as chairman of the workshop and editor of the proceedings. Dr. Titze, Director of the National Center for Voice and Speech and Executive Director of the WJ Gould Voice Research Center, led most of the discussions and served as author of the Summary Statement. In this Summary Statement, only the Recommendations (pp. 26-30) should be viewed as majority opinion. All other materials are explanatory and the opinion of the author. The full proceedings may be obtained by writing to the National Center for Voice and Speech, Wendell Johnson Speech and Hearing Center, The University of Iowa, Iowa City, Iowa 52242.

(1) The Wilbur James Gould Voice Research Center is a division of The Denver Center for the Performing Arts.

INTRODUCTION

Analysis of acoustic signals of the human voice has many purposes. From a technological standpoint, there is an ever-growing need to store, code, transmit, and synthesize voice signals. The telecommunications industry has dichotomized transmission of information into either voice or data, suggesting that voice signals are a class of their own. From a basic science standpoint, investigators have traditionally studied the microphone signal to understand speech production and perception, given that the acoustic signal is the common link between them. Finally, from a health science standpoint, the human voice has been shown to carry much information about the general health and well-being of an individual. Our voice reveals who we are and how we feel, giving considerable insight into the structure and function of certain parts of the body.

This workshop was limited to voice analysis rather than speech analysis, the focus being on the extraction of information about the source of sound from a microphone signal. Thus, no attempt was made to discuss or summarize general speech analysis dealing with vocal tract information. For a complete review of speech analysis, the reader is referred to the three volumes of selected papers published by the Acoustical Society of America (Miller et al., 1991; Atal et al., 1991; Kent et al., 1991).

More specifically, the workshop was a response to an urgency expressed by a group of voice scientists, voice clinicians, and manufacturers of instrumentation to reach some consensus on utility, feasibility, and standardization of voice perturbation methods. There has been much expectation and much disappointment in what perturbation analysis can offer for diagnosis and assessment of voice disorders. This workshop gives some of the underlying reasons for both the high expectation and the limited success.

Perturbation analysis is based on the premise that small fluctuations in frequency, amplitude, and waveshape are always present in a voice signal, reflecting the internal “noises” of the human body. Every attempt on the part of the speaker to produce a perfectly steady sound results in an aperiodic waveform. Movements of tissue and air are modulated by the irregular internal motion of electrical impulses, fluids, and cells within an organ. Thus, what might appear to be steady movement or posture on a macroscopic scale is often pulsatile movement on a microscopic scale, as evidenced by twitching of muscles, expansion and contraction of blood vessels, and beating of cilia to transport fluids. If we could shrink to microscopic dimension and travel through the human body, we would see that much of the physical plant (the hydraulic, electrical, and chemical systems) exhibits complex back-and-forth motions (oscillations). These micromovements impose fluctuations on what would otherwise be smooth and steady activity.

Voice production can be thought of as the activation of an entire system of coupled oscillators. The intent to vocalize activates motor commands that are responsible for the neural inputs to an array of biomechanical, neural, and acoustic oscillators (large box in Figure 1). The vocal folds are the primary oscillating system that produces what we might call the carrier signal (the glottal airflow). All other oscillators can then be thought of as modulators of the carrier signal. Some of the modulations are nearly sinusoidal (respiratory, heartbeat) but many are high dimensional (action potentials of muscles, air vortices, mucus in motion). Yet others are passive oscillators (tracheal resonator, supraglottal vocal tract, various sinuses) that can influence the primary oscillating system.

We can assume that the system of coupled oscillators contains and releases information about the human body; in particular, about its genetics, development, age, disease, language, culture, food and drug intake, and response to the environment (Figure 1). Voice perturbation analysis has the goal of extracting some of this information from the voice signal. The goal is not unlike that of extracting information about the universe from cosmic radiation, or the earth’s interior from seismic signals. In all cases, the procedure is extremely difficult and usually requires considerable a priori knowledge about the modulations to be extracted.

Therein lies the primary problem of voice perturbation analysis in its present state. We don’t know how to measure or classify the multiplicity of perturbations and modulations that are observed simultaneously. Many studies are needed to isolate the individual contributions of each oscillator. Some of these studies are underway (Orlikoff, 1990; Titze, 1991). We also don’t know how to apply simple concepts of periodicity and aperiodicity to voice signals. Learning how to quantify aperiodicity is a central focus of this document.

An abundance of terminology tends to mystify what is known about irregularity in voice production. It is appropriate, therefore, to establish working definitions of a few commonly used vocal terms. A more general glossary of terms is included at the end of this summary statement.

Figure 1. A list of biological oscillators involved in voice production and factors that may influence them.

A problem has arisen in trying to make a precise mathematical definition stick for jitter or shimmer. What is meant by short term, for example, and what kind of variability measure should be adopted? There are many ways of quantifying a deviation from an expected pattern or trend. This has led to a proliferation of mathematical definitions for jitter and shimmer. We believe that it is best to leave the terms as they are (as generic descriptors of fundamental frequency and amplitude variability) and use more standard terminology of engineering and statistics to quantify error measurements (see the later section on perturbation measures).

An unfortunate misunderstanding can arise for singing teachers who use the term shimmer to describe a beautiful bell-like vocal quality. A shimmering voice is aesthetically most pleasing in this context. As a random short-term amplitude perturbation, however, shimmer is not particularly pleasing to listen to. It is usually perceived as a crackling or buzzing sound, and in extreme cases, it can become very unpleasant and rough. It is important to communicate, therefore, the context in which the term shimmer is used.

Tremor is a low-frequency fluctuation in amplitude or frequency (or both). Its origin is usually neurologic. Physiologic tremors in the body have fluctuation rates between 0 and 15 Hz, but not all are perceived the same way auditorily when they are part of the vocal signal. Thus, a low-frequency tremor (0-3 Hz) is perceived as a wow. This is also the term used by the recording industry to describe variability in the speed of the tape drive of an audio recorder. A companion term, flutter, describes the variability associated with tape contact on the recording head. In the voice literature, flutter has been used to describe neurologic fluctuations in the 9-15 Hz range (Aronson et al., 1992). Flutter appears to be associated with rapid onset and offset of phonation, reflecting the natural oscillating frequency of the adductor-abductor control system in phonation. Some singers tend to cultivate this natural frequency in the production of trillo, a fast, fluttering ornament typically used in Renaissance music (Hakes et al., 1990).

In the mid-range rate (4-8 Hz), vocal tremor is part of the natural quality of the human voice, provided its extent does not exceed certain limits. Synthesis has shown that without a small degree of tremor, steady vowel production has a buzzy quality. There is something about a low frequency fluctuation in the voice that makes it warm and acceptable. An exaggerated extent of vocal tremor, on the other hand, is considered pathologic (Koda & Ludlow, 1992).

The origin of vocal vibrato is not completely understood, but some evidence is beginning to show that vocal vibrato may be a stabilized physiologic tremor in the laryngeal muscles (Niimi et al., 1988; Ramig & Shipp, 1987). It is conceivable, though speculative at this point, that a natural vocal vibrato can be cultivated from a 4 to 6 Hz physiologic tremor in the cricothyroid and thyroarytenoid muscles. This would require some mechanical load or reflex loop to stabilize irregular movement (Titze et al., 1994).

For the description of pathologic voices, several terms have found universal appeal. Roughness refers to an uneven, bumpy quality. It results from irregularity in the energy contained in a critical band of the auditory system (Terhardt, 1974). Periodic sounds (such as vocal fry) can have roughness, but more often there is a lack of periodicity. Breathiness is a vocal quality that contains the sound of breathing (expiration, in particular) during phonation. Acoustically, there is a significant component of noise in the signal due to glottal air turbulence. Sometimes the term hoarseness is used to describe the combination of roughness and breathiness.

The terms described thus far (perturbation, fluctuation, variability, jitter, shimmer, tremor, wow, vibrato, flutter, roughness, breathiness, hoarseness, and several others defined in the glossary) have no mathematical definitions. No numbers or physical units of measurement need to be attached to them, although some of them can be rated psychophysically. Nevertheless, they serve a purpose in describing vocal phenomena and the associated physical processes. At this point, some additional terms will be reviewed that have mathematical definitions.

Periodicity, Subharmonics, and Modulation

A series of events is termed periodic if the events cannot be distinguished from one another by shifting time forward or backward by a specific interval nTo,

$$f(t + nT_o) = f(t) \qquad (1)$$

where n is any positive integer and To is the period. To must be the smallest value possible to be deemed the fundamental period. Equation (1) can never be strictly satisfied in a voice signal. All vocal events tend to be aperiodic. The term quasi-periodic is sometimes used to suggest that there is only a small deviation from periodicity. It must be kept in mind, however, that quasi-periodicity is simply a special case of aperiodicity. Furthermore, in physics the term quasiperiodic has the special meaning of the superposition of two or more periodic signals with incommensurate (non-integer ratio) frequencies. Hence, we prefer not to use the term, but adopt nearly-periodic to avoid confusion.

A series of events is termed cyclic if the events recur, but not necessarily in periodic fashion. A cyclic event is recognized on the basis of a pattern that involves neighboring points on a waveform (e.g., a zero crossing, a maximum value, a minimum value). A cyclic parameter is a construct of cyclic events (e.g., inter-pulse-interval, open quotient, skewing quotient, peak-to-zero amplitude, peak-to-peak amplitude, maximum flow declination rate). Some of these parameters are identifiable only after the acoustic waveform has been inverse filtered, which is the process of removing the vocal tract resonances from the waveform to obtain the glottal airflow (Rothenberg, 1973).

In a sinusoidal waveform, the amplitude A, the period T, and the frequency 1/T are obvious cyclic parameters and have precise definitions. In a complex periodic waveform, the fundamental period To and fundamental frequency Fo = 1/To also have exact definitions (equation 1), but amplitude can be defined in a variety of ways. Traditionally, the peak value (maximum positive or negative) and the peak-to-peak value (maximum positive to maximum negative) have been used. As alternatives, Hillenbrand (1987) used the root-mean-squared (RMS) intensity in

one burst of instability in the middle of the contour. Over the rest of the utterance, the Fo variation was considerably smaller. (Other graphs in Figure 2 will be discussed later.)

Now let x_i represent an arbitrary cyclic parameter, for which some stylized contours are illustrated in Figure 3. Part (a) shows an irregular contour, similar to that of Figure 2 just discussed, but with fewer cycles. Part (b) shows a regular “up-down” pattern that is often seen in voice signals, and parts (c) and (d) show a linear and sinusoidal trend, respectively.

The “up-down” pattern in part (b) suggests the presence of a subharmonic frequency Fo/2, or a period doubling 2To. Clearly, if only every other point were plotted in the contour, a constant would result and periodicity would be achieved. Thus, the true period is doubled. In equation (1), period doubling is represented by using only the even values of n.
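The period-doubling argument can be illustrated with a minimal sketch. The contour values below are hypothetical cycle periods in milliseconds, not data from the workshop, and the helper names are illustrative only. The sketch simply shows that an alternating contour is not constant, while each of its every-other-point sub-contours is.

```python
from statistics import mean


def split_alternating(contour):
    """Return the even-indexed and odd-indexed sub-contours."""
    return contour[0::2], contour[1::2]


def is_constant(values, tol=1e-9):
    """True when every value lies within tol of the first value."""
    return all(abs(v - values[0]) <= tol for v in values)


# Hypothetical period-2 ("up-down") contour of cycle periods, in ms.
contour = [5.1, 4.9] * 6

even, odd = split_alternating(contour)
average_period = mean(contour)             # the "on average" period To
pattern_period = contour[0] + contour[1]   # the pattern repeats every 2 cycles

print(is_constant(contour))                # the full contour alternates
print(is_constant(even), is_constant(odd)) # every other point is constant
print(average_period, pattern_period)
```

Under these assumptions the repeating pattern spans two cycles, so its period is about twice the average cycle period, which corresponds to the Fo/2 subharmonic described above.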

Figure 3. Modulations of a cyclic parameter x_i around the mean value: (a) random, (b) alternating, (c) linear trend, and (d) sinusoidal.

The “up-down” sequence is also referred to as a period-2 sequence in nonlinear mechanics. This nomenclature can be extended to define a period-3 sequence (the pattern would be high-low-middle) or a period-4 sequence (high-low-very high-very low), and so on. The terms diplophonia, triplophonia, and quadruplophonia have also been used in the description of these sequences, but the terminology has not been universally adopted. In general, a period-n sequence in the parameter contour would be called multiplophonia if it were important to retain reference to the word “phonation” in the nomenclature. However, “period-n phonation” or “phonation with an Fo/n subharmonic” accomplishes the same objective.

But why isn’t Fo/n simply redefined as the fundamental frequency? That depends on the relative energy contained in the subharmonic. Often the period-n variations of a cyclic parameter are small, suggesting that “on average” the cyclic parameter has not changed. Furthermore, the auditory perception of the cyclic parameter (e.g., pitch in the case of Fo or loudness in the case of amplitude) may not have changed, but rather a dimension of roughness or some other quality has been added. The subharmonic frequencies are commensurate (in integer ratio) with the primary frequencies and may or may not be perceived as separate pitches.

In contrast to period-n phonation or multiplophonia, the term multiphonia is used to suggest the presence of several independent phonations (sound sources). Thus, biphonia would contain two independent sources, such as the true vocal folds and the false vocal folds, and triphonia would contain three independent sound sources (perhaps the addition of a glottal whistle). Their frequencies would not have to be commensurate. However, different modes within the same sound source may also generate independent frequencies, making the identification of the sound sources a nontrivial matter.

The term modulation is used to quantify the systematic change of a cyclic parameter (usually frequency or amplitude) of a periodic signal. The periodic signal (usually a sinusoid) is called the carrier in communication theory. In phonation, the carrier is the sequence of periodic airflow pulses emitted from the glottis, and the modulation is the slower variation of cyclic parameters discussed in the previous section. In radio communication, the entire voice signal modulates an electronically generated sinusoid for wireless transmission (typically in the MHz range), suggesting that modulations can be stacked up (layered) upon each other. The carrier of one signal becomes the modulation of another.
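As an illustration of the carrier and modulation vocabulary, the following minimal sketch builds a sinusoidal carrier whose amplitude is slowly modulated, then reads back the peak amplitude of each carrier cycle as a cyclic parameter. The carrier frequency, modulation frequency, modulation depth, and sampling rate are assumed values chosen for convenience, and a sinusoid stands in for the glottal pulse train.

```python
import math


def amplitude_modulated(fc, fm, depth, fs, duration):
    """Sinusoidal carrier at fc Hz with a sinusoidal amplitude modulation at fm Hz."""
    n = int(fs * duration)
    signal = []
    for k in range(n):
        t = k / fs
        envelope = 1.0 + depth * math.sin(2 * math.pi * fm * t)
        signal.append(envelope * math.sin(2 * math.pi * fc * t))
    return signal


def cycle_peaks(signal, fs, fc):
    """Peak absolute value in each carrier cycle (cycle length assumed known)."""
    samples_per_cycle = int(round(fs / fc))
    last_start = len(signal) - samples_per_cycle
    return [max(abs(v) for v in signal[i:i + samples_per_cycle])
            for i in range(0, last_start + 1, samples_per_cycle)]


# Assumed values: 200 Hz carrier, 5 Hz modulation, 20% depth, 0.4 s of signal.
signal = amplitude_modulated(fc=200, fm=5, depth=0.2, fs=20000, duration=0.4)
peaks = cycle_peaks(signal, fs=20000, fc=200)

print(len(peaks))              # one amplitude value per carrier cycle
print(min(peaks), max(peaks))  # contour swings between about 0.8 and 1.2
```

The list `peaks` is an amplitude contour of the kind discussed above: the carrier cycles are fast, and the contour traces the slower modulation.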

Perturbation Functions

As before, let x_i be a cyclic variable of vocal fold vibration that has been extracted from the i-th vibratory cycle. A window of observation is defined, containing N cycles of vibration, so that the subscript i can range from 1 to N in the observation window. The mean value of the cyclic variable over the window of observation is defined as

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (4)$$

If the mean value is intended to be a constant, as in steady vowel phonation, then a zeroth-order perturbation of the i-th cycle can be defined as

$$P_{0i} = x_i - \bar{x} \qquad (5)$$

(The term zeroth-order is used because a constant is basically a zero-order pattern or trend.) Higher-order perturbation functions are defined as the following finite differences:

In general, since the first subscript represents the order n of the perturbation function and the second subscript represents the i-th cycle, higher-order (n+1) perturbation functions are generated recursively as

where K is a normalization factor that keeps the coefficient of x_i positive and unity in each perturbation function. Note that with this normalization, all perturbation functions are zero when x_i is a constant.

The perturbation functions can be used to remove known or assumed trends in the cyclic parameter contour. The zeroth-order perturbation function removes nothing, the first order perturbation function removes a constant (the mean value x̄), the second order function removes a linear trend, the third order function removes a quadratic trend, and so on. In general, the n-th order perturbation function removes a polynomial trend of order n-1 in the contour.

Consider a linear trend as shown in Figure 3c. It is represented by the relation

where k is the rise per cycle. It is easily seen from equation (6) that P1i = k and that all higher-order perturbation functions in this example are zero. Thus, the first order perturbation function extracts the linear trend, whereas the higher order perturbation functions remove it.

The second graph from the top in Figure 2 shows a second order perturbation function computed from a human voice. The scaling is smaller than that of the contour because it is an absolute scaling (+10% deviation from the mean value). Note that the short-term fluctuations of the contour are retained, but the long-term trends are removed. For example, the gradual downward slope of the Fo contour in the beginning one-third of the utterance has been removed. So has the tremorous variation that is most noticeable in the middle of the contour. All that is left in the second-order perturbation is the short-term “noise”.

If a linear trend is deliberately produced by the voice, such as a uniform Fo glide between two pitches in a specified amount of time, then k is a known quantity. It can simply be inserted into the perturbation formulas. For example, the first-order perturbation then becomes

which is now known as the deviation from a linear trend. If a linear trend is suspected as an inherent pattern, but k is not known, it can be computed from the data by linear regression. This is a well-known statistical procedure (Hays, 1988). Furthermore, all patterns with forward predictability (e.g., a sinusoid, a damped sinusoid, an exponential) can collectively be removed by linear predictive coding (LPC), with only random (or unpredictable) events remaining in the residual perturbation function. LPC analysis is based on the assumption that x_i can be predicted from a weighted sum of M previous samples,

where the a’s (the predictor coefficients) are determined by a linear least squares fit to the contour (Markel & Gray, 1976).
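The trend-removal ideas in this section can be sketched in a few lines. This is a sketch under stated assumptions rather than the workshop's formulas: the first-order perturbation is taken here as the plain backward difference x_i - x_(i-1) minus a known rise per cycle k, the slope is estimated by ordinary least squares, and the predictor coefficients are obtained from the normal equations of a least squares fit. All function names and test contours are hypothetical.

```python
import math


def first_order_perturbation(x, k=0.0):
    """Backward difference minus a known rise per cycle k (assumed form)."""
    return [x[i] - x[i - 1] - k for i in range(1, len(x))]


def fit_linear_trend(x):
    """Least-squares slope (rise per cycle) and intercept of a contour."""
    n = len(x)
    i_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((i - i_mean) ** 2 for i in range(n))
    sxy = sum((i - i_mean) * (v - x_mean) for i, v in enumerate(x))
    k = sxy / sxx
    return k, x_mean - k * i_mean


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = aug[r][m] - sum(aug[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = acc / aug[r][r]
    return sol


def lpc_residual(x, order):
    """Fit x[i] ~ sum_j a[j] * x[i-1-j] by least squares; return (a, residual)."""
    rows = [[x[i - 1 - j] for j in range(order)] for i in range(order, len(x))]
    targets = x[order:]
    # Normal equations: (R^T R) a = R^T y
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(order)]
           for p in range(order)]
    aty = [sum(r[p] * y for r, y in zip(rows, targets)) for p in range(order)]
    a = solve(ata, aty)
    residual = [y - sum(c * v for c, v in zip(a, r))
                for r, y in zip(rows, targets)]
    return a, residual


# Hypothetical linear glide: 100 Hz rising by 0.5 Hz per cycle.
glide = [100 + 0.5 * i for i in range(20)]
k_est, intercept = fit_linear_trend(glide)
deviation = first_order_perturbation(glide, k=k_est)

# Hypothetical sinusoidal contour: fully predictable from two previous samples.
wave = [math.sin(0.3 * i) for i in range(40)]
coeffs, residual = lpc_residual(wave, order=2)

print(k_est, max(abs(d) for d in deviation))
print(coeffs, max(abs(r) for r in residual))
```

For the noise-free glide, the deviation from the fitted linear trend vanishes, and for the noise-free sinusoid a second-order predictor leaves essentially no residual. With real contours, whatever remains in `deviation` or `residual` is the unpredictable part.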


Finally, the power spectrum of the parameter contour (bottom right) is a useful display of the dominant frequencies that modulate the contour. Note that a frequency of about 5 Hz stands out in this spectrum. This is the frequency of the small amount of vibrato in the voice. All other peaks in the power spectrum are at least 10 dB lower and do not represent significant components. Again, subharmonics, tremors, or any other modulations can easily be detected in this type of display. In summary, a cyclic parameter profile of the type shown in Figure 2 is a useful tool in voice analysis. It helps to quantify visually what is perceived aurally. A similar profile can be constructed for amplitude variation or for any other cyclic parameter (open quotient, maximum flow declination, skewing quotient, etc.).
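A power spectrum of a parameter contour can be sketched as follows. The contour here is synthetic: a 200 Hz mean Fo with an assumed 5 Hz vibrato component and a weaker 12 Hz component, sampled once per cycle. Treating the cycles as uniformly spaced in time at the mean cycle rate is a simplifying assumption, and the plain discrete Fourier transform is used only to keep the example self-contained.

```python
import cmath
import math


def power_spectrum(contour, sample_rate):
    """Power of the mean-removed contour at each positive DFT frequency."""
    n = len(contour)
    mean = sum(contour) / n
    centered = [v - mean for v in contour]
    freqs, power = [], []
    for k in range(1, n // 2):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


# Synthetic Fo contour: 400 cycles at an assumed uniform rate of 200 cycles/s.
rate = 200
contour = [200
           + 2.0 * math.sin(2 * math.pi * 5 * i / rate)
           + 0.4 * math.sin(2 * math.pi * 12 * i / rate)
           for i in range(400)]

freqs, power = power_spectrum(contour, rate)
peak_index = max(range(len(power)), key=power.__getitem__)
dominant = freqs[peak_index]
level_12hz_db = 10 * math.log10(power[freqs.index(12.0)] / power[peak_index])

print(dominant)        # dominant modulation frequency in Hz
print(level_12hz_db)   # level of the 12 Hz component relative to the peak, in dB
```

In this synthetic case the 5 Hz component dominates and the 12 Hz component sits roughly 14 dB below it, mirroring the kind of reading described for the profile display.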

Perturbation Measures

A perturbation measure is an effective value of the overall perturbation in the cyclic contour. For example, the standard deviation from the mean is

$$\sigma_0 = \left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^{1/2}$$

This measure can also be identified as the root-mean-squared (RMS) value of the zeroth-order perturbation function (recall equation 5).

The mean rectified value, or mean absolute value, of the zeroth-order perturbation is defined as

$$\delta_0 = \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \bar{x} \right|$$

This measure of perturbation is fundamentally not much different from σ0, but it is a little easier to compute because it does not involve squares and square roots. Also, it does not weight outliers (large deviations from the mean) as heavily as σ0 because first-power terms rather than second-power terms are used in the summation. In general, a collection of perturbation measures can be written as

with δ1 being the most frequently used measure in the literature. In Figure 2, σ0 has the value of 0.832%, σ1 has the value of 0.419%, and δ2 has the value of 0.316%.
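The two zeroth-order measures can be compared with a minimal sketch. It assumes the RMS and mean absolute value are taken over all N cycles and that the percentages are expressed relative to the mean of the contour; both are assumptions about normalization, and the contours are made up for illustration.

```python
import math


def perturbation_measures(x):
    """Return (sigma0, delta0) of a contour as percentages of its mean."""
    n = len(x)
    mean = sum(x) / n
    p0 = [v - mean for v in x]                       # zeroth-order perturbation
    sigma0 = math.sqrt(sum(p * p for p in p0) / n)   # RMS value
    delta0 = sum(abs(p) for p in p0) / n             # mean rectified value
    return 100 * sigma0 / mean, 100 * delta0 / mean


# Hypothetical contours in Hz: a steady alternation, and one with an outlier.
alternating = [99, 101, 99, 101]
with_outlier = [100, 100, 100, 100, 110]

sigma_alt, delta_alt = perturbation_measures(alternating)
sigma_out, delta_out = perturbation_measures(with_outlier)

print(sigma_alt, delta_alt)   # equal when all deviations have the same size
print(sigma_out, delta_out)   # sigma0 exceeds delta0 when an outlier is present
```

When every deviation has the same magnitude the two measures coincide. A single large deviation raises σ0 more than δ0, which is the outlier weighting noted above.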

Both δn and σn are magnitude perturbation measures only. The squaring and absolute magnitude operations remove all information about the direction in which the cyclic variable deviates from the mean value. Consider again the four contours shown in Figure 3. They appear quite different visually but could all produce rather similar perturbation measures. The magnitude perturbation measures σn and δn tell us little about the patterns in the perturbation functions. They are totally insensitive to any regularity that may exist in the deviations. Indeed, the only pattern they relate to is a constant, the mean value x̄. This is a serious limitation for many applications in voice perturbation analysis because the patterns may reveal more about the nature of a disorder, or special voice characteristic, than a simple magnitude error measure. (For a more detailed discussion of magnitude versus directional perturbation measures, see Pinto & Titze, 1990.)

Several investigators have used a harmonics-to-noise ratio (Yumoto et al., 1982; Cox, 1989), a signal-to-noise ratio (Klingholz, 1987), or a normalized noise energy (Kasuya et al., 1986) to quantify the aperiodic portion of the voice signal. The harmonic energy is first defined as

where N is the number of cycles, T is the greatest period found among the N cycles, and f_A is the average acoustic waveform per cycle (obtained by padding all cycles to the maximum period with zeros and averaging point by point from event marker to event marker). The noise energy is then defined as

where f_i is the waveform in the i-th cycle, and the harmonics-to-noise ratio is

If the HNR is used as a perturbation measure, it needs to be noted that this measure is not specific to any cyclic parameter. Therein lies its asset as well as its liability. One cannot tell if the period, the amplitude, or the waveshape is perturbed. Simple Gaussian noise added to a periodic waveform can decrease the HNR, as will jitter or shimmer. Thus, the measure correlates best with an overall perception of “noisiness and roughness” in the signal, regardless of what the source might be. New approaches described by Qi (1992) and Qi et al. (1995) include a time-base correction that minimizes the effect of jitter as a contributor to noise. Thus, these approaches begin to separate the sources of noise in the HNR measure.

The following classification scheme is adopted here to recognize the nature of bifurcations in voice signals. The classification is central to all other considerations in acoustic voice analysis. It follows the general principles of nonlinear dynamics of coupled oscillators.

Type 1 signals - nearly-periodic signals that display no qualitative changes in the analysis segment; if modulating frequencies or subharmonics are present, their energies are an order of magnitude below the energy of the fundamental frequency.

Type 2 signals - signals with qualitative changes (bifurcations) in the analysis segment, or signals with subharmonic frequencies or modulating frequencies whose energies approach the energy of the fundamental frequency; there is therefore no obvious single fundamental frequency throughout the segment.

Type 3 signals - signals with no apparent periodic structure.
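One part of the type 1 criterion, the energy of subharmonics relative to the fundamental, lends itself to a rough numerical check. The sketch below is not a classifier: it ignores bifurcations within the segment, assumes Fo is already known, inspects only the Fo/2, Fo/3, and Fo/4 subharmonics, and reads "an order of magnitude" as an energy ratio below 0.1. All names, signals, and the threshold are hypothetical.

```python
import cmath
import math


def energy_at(signal, fs, freq):
    """Squared magnitude of the Fourier sum of the signal at one frequency."""
    coeff = sum(v * cmath.exp(-2j * math.pi * freq * i / fs)
                for i, v in enumerate(signal))
    return abs(coeff) ** 2


def subharmonic_ratio(signal, fs, fo, max_n=4):
    """Largest energy among Fo/2 .. Fo/max_n, relative to the energy at Fo."""
    fundamental = energy_at(signal, fs, fo)
    strongest = max(energy_at(signal, fs, fo / n) for n in range(2, max_n + 1))
    return strongest / fundamental


def meets_type1_energy_criterion(signal, fs, fo, threshold=0.1):
    """True when subharmonic energy is at least ten times below the fundamental."""
    return subharmonic_ratio(signal, fs, fo) < threshold


def synth(fs, fo, sub_amplitude, n_samples):
    """Sinusoid at fo plus an Fo/3 component of the given relative amplitude."""
    return [math.sin(2 * math.pi * fo * i / fs)
            + sub_amplitude * math.sin(2 * math.pi * (fo / 3) * i / fs)
            for i in range(n_samples)]


fs, fo = 6000, 300
weak = synth(fs, fo, sub_amplitude=0.1, n_samples=1200)
strong = synth(fs, fo, sub_amplitude=0.8, n_samples=1200)

print(subharmonic_ratio(weak, fs, fo), meets_type1_energy_criterion(weak, fs, fo))
print(subharmonic_ratio(strong, fs, fo), meets_type1_energy_criterion(strong, fs, fo))
```

A segment that passes this check may still be type 2 if it contains a bifurcation, so a visual display of the whole segment remains necessary.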

A spectrogram is useful in making the classification. For example, Figure 6 shows a spectrogram of a patient with hyperfunctional childhood dysphonia. The fundamental frequency is 300 Hz. Bifurcations can be seen to occur around 400 ms (the beginning of a period-3 phonation), around 900 ms (return to the original), and around 1100-1200 ms (beginning of a mixture between period-3 and period-4 phonation). The signal is therefore classified as type 2.

Figure 6. Narrow-band computer spectrogram for a patient with hyperfunctional childhood dysphonia. Abrupt transitions to different phonatory regimes are visible, indicating bifurcations in vocal fold vibration.

A fundamental frequency profile, similar to that of Figure 2, is shown for this dysphonic patient in Figure 7. Note that bifurcations can be identified in the Fo contour as segments where the Fo extractor is uncertain about the constant 298 Hz value. In two cycles the extracted Fo drops down to 98 Hz, close to the Fo/3 subharmonic. In one case, the extracted Fo jumps to 420 Hz. In general, Fo is extracted reliably only in the three segments where the waveform is nearly periodic. The second-order perturbation function has wild fluctuations. It is clear from this display that a single perturbation measure for the entire segment is meaningless and that the visual displays carry more information than can be characterized by a single number.

As another example, analysis was performed on the waveform of a patient with unilateral laryngeal nerve paralysis (Figure 8). The waveform itself shows intermittent segments of low frequency modulation (segments b and d). The fundamental frequency is 285 Hz and the modulation frequency is 32 Hz. If only segments a, c, and d had been acquired and analyzed, the signal would have been classified type 1. As it is, it is clearly a type 2 signal.

Figure 7. Fundamental frequency (Fo) profile for the patient with hyperfunctional childhood dysphonia.
