



This tutorial provides an introduction to speech analysis, focusing on oscillograms, fundamental frequency, and spectrograms. It explains the physical representation of speech signals, the importance of instrumental analysis, and the use of these representations to understand speech sounds.




http://www.ling.lu.se/research/speechtutorial/tutorial.html
Denes, P.B., Pinson, E.N., The speech chain, 1973.
Ladefoged, P., A course in phonetics, 1982.
Catford, J.C., A practical introduction to phonetics, 1982.
Phonetics is part of the linguistic sciences. It is concerned with the sounds produced by the human vocal organs and, more specifically, with the sounds used in human speech. One important aspect of phonetic research is the instrumental analysis of speech, often referred to as experimental phonetics or machine phonetics.
Instrumental analysis is performed using one or more of the available instruments. These include X-ray photography and film, air-flow tubes, electromyography (EMG), spectrographs, mingographs, laryngographs, etc. Most of these methods aim to visualize the speech signal in some way, capturing some aspects of it on paper or on a computer screen. Today the computer is the most readily available and most widely used tool. With a computer the analysis is much simpler and usually faster than with other tools; however, it does not necessarily produce a result of higher quality. In this tutorial we look at and try to explain the most common ways of analyzing and visualizing speech.
OSCILLOGRAM (WAVEFORM)
Physically, the speech signal (like all sound) is a series of pressure changes in the medium between the sound source and the listener. The most common representation of the speech signal is the oscillogram, often called the waveform. Here time runs along the horizontal axis from left to right, and the curve shows how the pressure in the signal increases and decreases. The utterance used for demonstration is "phonetician", in American English, spoken by an adult male. The utterance has been transcribed using the IPA phonetic alphabet, which is the most commonly used. The signal has also been segmented so that each phoneme in the transcription is aligned with its corresponding sound event. Note that the nine vertical lines are not part of the speech signal; they mark the segmentation points.
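On a computer the waveform is stored as a sequence of pressure samples, so the time axis and the segmentation points both reduce to sample indices. The following is a minimal sketch under stated assumptions: the 16 kHz sample rate, the 100 Hz sine used as a stand-in signal, and the two boundary times are all hypothetical and are not taken from the "phonetician" recording.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value, not from the tutorial)


def time_axis(n_samples, sample_rate):
    """Return the time in seconds of each sample (the horizontal axis)."""
    return [n / sample_rate for n in range(n_samples)]


def split_at(samples, boundaries_s, sample_rate):
    """Cut the waveform into segments at the given boundary times (seconds)."""
    cuts = [0] + [round(t * sample_rate) for t in boundaries_s] + [len(samples)]
    return [samples[a:b] for a, b in zip(cuts, cuts[1:])]


# Stand-in signal: half a second of a 100 Hz sine instead of real speech.
samples = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(8000)]
times = time_axis(len(samples), SAMPLE_RATE)

# Hypothetical segmentation points at 0.10 s and 0.25 s.
segments = split_at(samples, [0.10, 0.25], SAMPLE_RATE)
print([len(s) for s in segments])  # [1600, 2400, 4000]
```

Plotting samples against times would give the oscillogram; the boundaries only mark where one segment ends and the next begins, and do not alter the signal itself.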
SPECTRUM
According to Fourier's theorem, any periodic waveform can be described as the sum of a number of simple sine waves, each with a particular amplitude, frequency and phase. The spectrum shows the distribution of amplitude over frequency at a moment in time. Note that this picture has no time scale: the horizontal axis represents frequency and the vertical axis amplitude. To plot the spectrum as a function of time we need a way of representing a three-dimensional diagram; one such representation is the spectrogram. The picture shows the spectrum 0.15 seconds into the utterance, at the beginning of the "o" vowel.
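The relation between a waveform and its spectrum can be tried out numerically. The sketch below builds a periodic signal from three sine waves and then recovers their frequencies and amplitudes with a naive discrete Fourier transform. The component values and the 8 kHz sample rate are illustrative assumptions; this is not the method used to produce the picture in the tutorial.

```python
import cmath
import math


def synthesize(components, sample_rate, n_samples):
    """Sum sine waves given as (frequency_hz, amplitude, phase_rad) tuples."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * n / sample_rate + phase)
            for freq, amp, phase in components)
        for n in range(n_samples)
    ]


def amplitude_spectrum(signal, sample_rate):
    """Naive DFT returning (frequency_hz, amplitude) pairs from 0 Hz to Nyquist."""
    n = len(signal)
    result = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        # Interior bins share their amplitude with a mirrored negative-frequency bin.
        scale = 1 if k == 0 or 2 * k == n else 2
        result.append((k * sample_rate / n, scale * abs(coeff) / n))
    return result


SAMPLE_RATE = 8000  # assumed
# Hypothetical components: a 100 Hz fundamental and two weaker harmonics.
components = [(100, 1.0, 0.0), (200, 0.5, 0.0), (300, 0.25, 0.3)]
signal = synthesize(components, SAMPLE_RATE, 160)  # 160 samples = two full periods

peaks = [(f, round(a, 3)) for f, a in amplitude_spectrum(signal, SAMPLE_RATE) if a > 0.01]
print(peaks)  # [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]
```

The result has no time axis, only frequency and amplitude, and the phase of each component does not show up in the amplitude spectrum. Real speech is only approximately periodic, so in practice a short stretch of signal is analyzed and the peaks are broader than in this idealized case.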
SPECTROGRAM
In the spectrogram, time is on the horizontal axis and frequency on the vertical axis. The third dimension, amplitude, is represented by shades of darkness. The spectrogram can be thought of as a number of spectra in a row, viewed "from above", with the peaks in the spectra shown as dark spots. The picture makes clear how much speech sounds differ from a spectral point of view. In unvoiced fricatives, the energy is concentrated high up in the frequency band and looks quite disorganized (noise-like). In other unvoiced sounds, e.g. the plosives, much of the sound actually consists of silence until strong energy appears in many frequency bands at once, as an "explosion". The voiced sounds appear more organized: the spectral peaks (dark spots) form horizontal bands across the spectrogram. These bands represent frequencies at which the shape of the mouth gives resonance to sounds. The bands are called formants and are numbered from the bottom up as F1, F2, F3, etc. The positions of the formants differ from sound to sound and can often be predicted for each phoneme.
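The idea of "a number of spectra in a row" can be sketched directly: cut the signal into short overlapping frames, compute one spectrum per frame, and map amplitude to a shade. The code below is a simplified illustration, not a production analysis. The frame length, hop size, Hann window, text shading and the two-tone test signal are all assumptions chosen for the example.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one Hann-windowed frame (naive DFT, 0 Hz to Nyquist)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len, hop):
    """One spectrum per frame: result[t][k] is the magnitude at frame t, bin k."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [frame_spectrum(signal[s:s + frame_len]) for s in starts]


def render(spec, shades=" .:*#"):
    """Text picture: time left to right, frequency bottom to top, darker = stronger."""
    peak = max(max(row) for row in spec)
    lines = []
    for k in reversed(range(len(spec[0]))):
        lines.append("".join(shades[int(row[k] / peak * (len(shades) - 1))] for row in spec))
    return "\n".join(lines)


SAMPLE_RATE = 8000  # assumed
FRAME_LEN = 80      # 10 ms frames, so frequency bins are 100 Hz apart
HOP = 40            # frames overlap by half

# Hypothetical test signal: 500 Hz for the first 50 ms, then 1500 Hz.
signal = [math.sin(2 * math.pi * (500 if n < 400 else 1500) * n / SAMPLE_RATE)
          for n in range(800)]

spec = spectrogram(signal, FRAME_LEN, HOP)
bin_hz = SAMPLE_RATE / FRAME_LEN
strongest = [row.index(max(row)) * bin_hz for row in spec]
print(strongest[0], strongest[-1])  # 500.0 1500.0
print(render(spec))
```

In the printed picture the dark band sits low in the first half and jumps higher in the second half, which is the same kind of horizontal band that formants produce in a real spectrogram. Longer frames would give finer frequency resolution at the cost of coarser time resolution.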