Acoustic phonetics

• The science that describes sound is known as acoustics: the study of the physical properties of sound waves.

• Acoustic phonetics is concerned with describing the different kinds of acoustic signal that the movement of the vocal organs gives rise to in the production of speech.

Speech Production system

COMPONENTS OF SPEECH PRODUCTION

1. The system below the larynx (SUBGLOTTAL). 2. The larynx and the surrounding structures. 3. The structures and the airways above the larynx (SUPRAGLOTTAL).

THE SUBGLOTTAL SYSTEM • The trachea: cross-sectional area 2.5 cm², length 10-12 cm (adults) • The bronchi: two in number • Alveolar sacs: lie within the lungs • Lungs: vital capacity (the maximum range of lung volume available) 3000-5000 cm³; excursions during normal breathing 1000 cm³

INSPIRATION The principal muscles for inspiration are: 1. Diaphragm: contraction lowers the diaphragm 2. External intercostals: contraction raises the ribcage

EXPIRATION The principal muscles for expiration are: 1. Internal intercostals: contraction pulls the ribcage downwards 2. Abdominal muscles

The elastic recoil of the lungs always contributes an expiratory force, but this force is augmented or reduced by the action of the expiratory or inspiratory muscles.

THE LARYNX • The principal structures in the larynx that play a direct role in the production of speech are the vocal folds. • Vocal folds: 2 bands or cordlike segments of tissue • Length: 1.0 to 1.5 cm • Thickness: 2 to 3 mm • Roughly parallel to each other in an antero-posterior direction.

• The vocal folds and ventricular folds can be adducted (approximated) or abducted (separated); abduction leaves a space between the vocal folds.

SOUNDS • Sound is a pattern of pressure variation that moves in waves away from a source. • Sound waves are the means of acoustic energy transmission between a sound source and a sound receiver. • Pressure fluctuations move through space, but each particle moves only a small distance.

• Sound is experienced when pressure fluctuations reach the eardrum and the auditory system translates these movements into neural impulses. • Sound pressures perceived by the human ear range from 20 µPa to 20 Pa.
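The pressure range quoted above is easier to grasp on a logarithmic scale. The sketch below converts a pressure in pascals to a sound pressure level in decibels, assuming the usual convention of taking 20 µPa as the reference pressure. That convention and the function name `spl_db` are brought in for illustration and are not stated in the slides.

```python
import math

# Assumed reference pressure: 20 µPa, the lower limit quoted in the text.
P_REF_PA = 20e-6


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to the 20 µPa reference."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


if __name__ == "__main__":
    # The two limits of the range given in the text.
    for pressure in (20e-6, 20.0):
        print(f"{pressure:g} Pa -> {spl_db(pressure):.0f} dB")
```

Under that assumption the lower limit corresponds to 0 dB and the upper limit to 120 dB.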

Propagation of sound • A sound produced at a source sets up a sound wave that travels through the acoustic medium. • Sound waves are small differences in air pressure which spread in all directions. • An acoustic waveform is a record of sound-producing pressure fluctuations over time.

Types of waves • Transverse wave: the motion of the individual particles is perpendicular to the motion of the wave, e.g. the Mexican wave. • Longitudinal wave: the motion of the individual particles is parallel to the motion of the wave, e.g. sound waves.

• Sound waves consist of air pressure variations. • The speed at which these air pressure variations spread through a space is called the speed of sound. • The speed of sound depends on the density and the elasticity of the medium. • The speed of sound is around 344 m/s in dry air at 21 °C.
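As a rough check on the figure above, the speed of sound in dry air can be approximated from temperature alone. The sketch below uses the common ideal-gas approximation c ≈ 331.3 · sqrt(1 + T/273.15) m/s, with T in degrees Celsius. This formula is an assumption added here, not something the slides give.

```python
import math


def speed_of_sound_dry_air(temp_celsius: float) -> float:
    """Approximate speed of sound in dry air (m/s), ideal-gas approximation."""
    # 331.3 m/s is the approximate speed at 0 °C; 273.15 converts to kelvin.
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)


if __name__ == "__main__":
    for temp in (0.0, 21.0):
        print(f"{temp:.0f} °C -> {speed_of_sound_dry_air(temp):.1f} m/s")
```

At 21 °C the approximation lands close to the 344 m/s quoted in the text.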

Types of sound 1. Periodic sound: periodic sounds have a pattern that repeats at regular intervals. 2. Aperiodic sound: aperiodic sounds do not have a regularly repeating pattern; they have either a random waveform or a pattern that doesn't repeat.

Periodic Sounds

1. Simple periodic sounds. 2. Complex periodic sounds.

Simple periodic waves • Simple periodic waves are also called sine waves. • The name comes from the fact that a wave of this shape graphs the trigonometric sine function of an angle as it moves from 0° to 360° (one cycle). • A cosine wave has the same shape as a sine wave, but begins at the maximum value (1) rather than 0.

• Any wave that has the shape of a sine wave, regardless of differences in phase, is called a sinusoidal wave. • Sinusoidal waves result from simple harmonic motion.

• A representation in which the sound pressure is plotted vertically against a horizontal time axis is called an oscillogram or waveform.

• In order to define a sine wave, we need to know three principal dimensions (properties):

1. Frequency 2. Amplitude 3. Phase

What is frequency? • The number of times the sinusoidal pattern repeats per unit of time. • Each repetition of the pattern is called a cycle. • The duration of a cycle is its period. • Frequency is expressed in cycles per second, which by convention are called hertz (Hz).

[Figure: one cycle of a sine wave, with its maximum and minimum marked]

• How do we get the frequency of a sine wave in hertz? • Divide one second by the period (the duration of one cycle).

f = 1/T, where f is the frequency in hertz and T is the period in seconds
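The relation f = 1/T can be tried out on a sampled waveform. The sketch below is one simple way to do that, not a standard analysis method. It measures the average period between upward zero crossings and inverts it. The helper names, the 8000 Hz sample rate and the 100 Hz test tone are all illustrative assumptions.

```python
import math


def sine_samples(freq_hz, duration_s, sample_rate_hz):
    """Sample a unit-amplitude sine wave (parameter values are illustrative)."""
    n = round(duration_s * sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def estimate_frequency(samples, sample_rate_hz):
    """Estimate frequency as 1/T, with T measured between upward zero crossings."""
    # Indices where the waveform passes from negative to non-negative values.
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0.0 <= samples[i]]
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    # Average period in seconds over all complete cycles found.
    cycles = len(crossings) - 1
    period_s = (crossings[-1] - crossings[0]) / cycles / sample_rate_hz
    return 1.0 / period_s


if __name__ == "__main__":
    wave = sine_samples(100.0, 0.1, 8000)
    print(f"estimated frequency: {estimate_frequency(wave, 8000):.1f} Hz")
```

Because crossings are located only to the nearest sample, the estimate is approximate; a higher sample rate or a longer stretch of signal tightens it.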

Amplitude • Amplitude describes the displacement of the vibrating medium from its rest position. • The maximal displacement from the zero line is known as the amplitude. • It shows the vertical range of the waveform.

• The distance between a maximum and the next minimum is called the peak-to-peak amplitude. • The higher the peak-to-peak amplitude, the larger the difference between the air pressure maxima and the air pressure minima. • This means that the acoustic signal is perceived as being louder. • Amplitude is commonly expressed in decibels (dB).

• Damping: the gradual loss of energy (and amplitude) from cycle to cycle is known as damping.

Phase • Phase: the position of a specific point within the cycle of a waveform. • It is measured in degrees.
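The three properties introduced above (frequency, amplitude and phase) are enough to compute a sinusoid at any moment in time. The following is a minimal sketch with illustrative function names. It also measures peak-to-peak amplitude over one sampled cycle, and it shows that a 90° phase shift turns a sine into a cosine, which starts at its maximum.

```python
import math


def sinusoid(t_s, freq_hz, amplitude, phase_deg=0.0):
    """Value at time t_s of a sinusoid with the given frequency, amplitude and phase."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t_s + math.radians(phase_deg))


def peak_to_peak(freq_hz, amplitude, phase_deg=0.0, points=1000):
    """Peak-to-peak amplitude measured over one sampled cycle."""
    period_s = 1.0 / freq_hz
    values = [sinusoid(i * period_s / points, freq_hz, amplitude, phase_deg)
              for i in range(points)]
    return max(values) - min(values)


if __name__ == "__main__":
    # A sine starts at zero; the same wave shifted by 90 degrees starts at its peak.
    print(sinusoid(0.0, 100.0, 1.0, phase_deg=0.0))
    print(sinusoid(0.0, 100.0, 1.0, phase_deg=90.0))
    print(peak_to_peak(100.0, 2.0))
```

The peak-to-peak value comes out at about twice the amplitude, as expected for a wave that swings equally far above and below the zero line.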

Why are frequency and amplitude important for acoustic phonetics? • Any oscillating system whose motion is captured by a sine wave, with period and frequency in the inverse relationship defined above, is in simple harmonic motion. • Systems that oscillate in simple harmonic motion produce a simple tone.

• The mathematics of sinusoidal motion is well understood. • Sinusoidal waves can be described in terms of their frequency, amplitude and phase. • Phase is not usually that important for speech analysis. • So, if we know the frequency and amplitude of a sinusoid, we know everything important there is to know about anything vibrating in simple harmonic motion.

• The French mathematician Jean Baptiste Joseph Fourier showed in 1807 that every kind of vibration (including all complex speech sounds) can be described as the sum of a set of simple sinusoids of varying frequencies and amplitudes. • An understanding of sinusoidal motion, defined by frequency and amplitude, is the key to understanding all speech sounds.

Complex periodic waves • The result of adding sinusoids (simple periodic waves) is a complex wave. • A complex wave is not itself sinusoidal, but it is periodic. • As a complex wave is made up of a number of component frequencies, the basic frequency, i.e. the rate at which the whole pattern repeats, is called the fundamental frequency (F0).

• F0 determines the pitch of a sound wave. • The loudness of the sound depends on both frequency and amplitude. • For a given F0, the greater the overall amplitude, the louder the sound. • The component frequencies are called harmonics. • The different frequencies and amplitudes of the component harmonics give the sound its quality.

• The fundamental frequency is always equal to the greatest common factor of the component frequencies. • Component waves of 50 Hz, 150 Hz, and 250 Hz will have an F0 of 50 Hz.
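The 50/150/250 Hz example can be checked numerically. The sketch below adds three sinusoids at those frequencies and computes F0 as their greatest common factor. The amplitudes are made up for illustration, and the frequencies are assumed to be whole numbers of hertz so that an integer GCD applies.

```python
import math
from functools import reduce


def fundamental_frequency(component_hz):
    """Greatest common factor of integer component frequencies in Hz."""
    return reduce(math.gcd, component_hz)


def complex_wave(t_s, components):
    """Sum of sinusoids; components is a list of (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2 * math.pi * f * t_s) for f, a in components)


# Frequencies from the text; the amplitudes are hypothetical.
COMPONENTS = [(50, 1.0), (150, 0.5), (250, 0.25)]
F0 = fundamental_frequency([f for f, _ in COMPONENTS])

if __name__ == "__main__":
    print(f"F0 = {F0} Hz")
    # The summed wave repeats after one fundamental period (1/F0 seconds).
    t = 0.003
    print(complex_wave(t, COMPONENTS), complex_wave(t + 1.0 / F0, COMPONENTS))
```

The two printed wave values agree up to floating-point rounding, which is what it means for the whole pattern to repeat at the fundamental frequency.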

Aperiodic waves • The moment-to-moment pressure variations are random; there is no repeating pattern. • A special category of aperiodic sound is the transient. • Transient sounds are instantaneous: there is a momentary disturbance that is not drawn out or repeated, e.g. a knock on the table or the slamming of a door.

Resonance • The reinforcement or prolongation of sound by reflection from a surface or by the synchronous vibration of a neighboring object. • Natural resonant frequency : Every object has a basic frequency, or a set of frequencies at which it will naturally oscillate when energy is applied.

• If an input frequency is synchronized with the natural frequency of an object, the two systems are in resonance. • When energy is applied in resonance with a natural frequency, the amplitude of movement at that frequency is increased, because the two forces are acting together. • When energy is applied that is not in resonance with a natural frequency, that energy is quickly dissipated because the forces cancel each other out, and the amplitude at that frequency dies out.

• Objects do not vibrate freely; they are tuned to resonate only to a narrow frequency band. • If the frequency of the sound from a source happens to match the natural resonant frequency of the object, the object will vibrate in resonance with the sound, passing along the pattern of vibration at a high amplitude; otherwise the sound energy dissipates and the vibration dies out.

• The resonating body thus acts as a filter, allowing only some frequencies to get through: resonant frequencies are amplified and the other frequencies are lost.
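The filtering behaviour described above can be illustrated with a textbook model. The sketch below assumes a driven, damped harmonic oscillator and uses its steady-state amplitude formula. The natural frequency of 500 Hz, the damping value and the function name are hypothetical choices, not values from the slides.

```python
import math


def response_amplitude(drive_hz, natural_hz, damping=50.0, force_per_mass=1.0):
    """Steady-state amplitude of a driven, damped harmonic oscillator.

    damping (1/s) and force_per_mass are illustrative values only.
    """
    w = 2 * math.pi * drive_hz      # driving angular frequency
    w0 = 2 * math.pi * natural_hz   # natural angular frequency
    return force_per_mass / math.sqrt((w0 ** 2 - w ** 2) ** 2 + (damping * w) ** 2)


if __name__ == "__main__":
    # Hypothetical resonator with a natural frequency of 500 Hz.
    for drive in (250.0, 500.0, 1000.0):
        print(f"{drive:6.0f} Hz -> amplitude {response_amplitude(drive, 500.0):.3e}")
```

Input near the natural frequency produces a much larger amplitude than input well above or below it, so the resonator behaves like a filter that passes a narrow band.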

The vocal tract as a sound-producing device: source-filter theory • The vocal tract is a resonating system. • In a vowel sound, the vibrating vocal folds provide the driving force, which induces resonance in the air trapped in the vocal tract. • The energy is output as sound. • This is known as the source-filter theory of speech sounds.

• The vocal tract sound source may be periodic or aperiodic. • The vibrating vocal folds provide a periodic source, which dominates in sonorants. • An aperiodic source is most important for obstruents. • The turbulence created in a fricative or in aspiration is sustained aperiodic noise. • The release burst of a stop is a transient.

Source • Given the right amount of tension and the right amount of egressive airstream, the vocal folds vibrate. • The opening and closing of the vocal folds in the air column provides repeated bursts of air pressure. • The complex vibration of the vocal folds provides a rich source and generates waveforms composed of multiple harmonic frequencies.

• The complex movements of the vocal folds lead to complex signals, which carry frequencies far above the fundamental frequency. • This "richness" of the source signal allows us to produce many different speech sounds from the same source signal by filtering it with the vocal tract.

Vocal Tract Filter • The vocal tract can be approximated by a cylindrical tube, which is open at one end (the lips) and virtually closed at the other (the glottis). • The length and width of the tube determine its acoustic properties. • The bending of the tube has little effect on the acoustic properties.

• The resonance frequencies of the vocal tract are very important and are called the formant frequencies. • The formant frequencies are numbered and are named F1, F2, F3, etc. • The numbering of the formant frequencies has nothing to do with the fundamental frequency. • F0 is a property of the vocal fold vibration (the voice source), and the formant frequencies (F1, F2, F3, …) are properties of the vocal tract (the filter).
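For the idealized tube described above, open at the lips and closed at the glottis, the resonances follow the standard quarter-wavelength relation F_n = (2n - 1) · c / (4L). The sketch below applies it with the 344 m/s speed of sound quoted earlier and an assumed tube length of 0.17 m. That length is a hypothetical round figure for an adult vocal tract, not a value from the slides.

```python
def tube_resonances_hz(length_m, speed_of_sound_m_s=344.0, count=3):
    """Resonances of a uniform tube closed at one end and open at the other."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    # Quarter-wavelength resonator: odd multiples of c / (4 * L).
    return [(2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # 0.17 m is an assumed tube length used only for illustration.
    for number, freq in enumerate(tube_resonances_hz(0.17), start=1):
        print(f"F{number} ≈ {freq:.0f} Hz")
```

With these assumptions the first three resonances fall at roughly 500, 1500 and 2500 Hz. A shorter tube gives proportionally higher values, since length is the only tube dimension in the formula.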

Formants • Formants are a property of the vocal tract itself, independent of whether a laryngeal source signal is present or not. • The shape of the vocal tract determines the formants, whether there is a source signal or not. • Formant frequencies do not always correspond to the harmonics of the laryngeal signal.

• The position of the articulators determines the location of the formants. • Since the formant frequencies depend on the vocal tract, it is possible to formulate some general rules about how the position of the articulators influences the formant frequencies on the basis of perturbation theory (Chiba and Kajiyama, 1941). • Perturbation theory is a way to compute (explain) whether the resonance frequencies of an arbitrarily constricted tube are higher or lower than those of an unconstricted cylindrical tube.

• As a rule of thumb, low vowels in the vowel quadrilateral have a high F1 and high vowels have a low F1. • Similarly, front vowels have a high F2 and back vowels have a low F2. • The terms low, high, front and back refer to positions in the vowel quadrilateral that reflect idealized tongue positions, i.e. an articulatory description. • Formant frequency values can serve as a basis for a rough classification of different vowels.

• This rough classification holds across different speakers, languages, and dialects (Peterson & Barney, 1952).

Acoustics of vowels

• F1 correlates with the size of the pharyngeal cavity and the degree of lip opening, i.e. vowel openness or height (when the tongue is high, as in [i], the pharyngeal cavity is larger, resulting in a lower F1). • F2 correlates with the length of the oral cavity, i.e. frontness/backness (the longer the oral cavity, due to a more retracted tongue, the lower F2). • Lip rounding lengthens the oral cavity and thus lowers F2.

Identifying vowel quality based on formant frequencies • F1 is inversely related to height • F2 is related to frontness/backness • (Lip) rounding lowers formant values (esp. F2)
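The first two rules above can be written as a small comparison between two vowel tokens. The sketch below is only a reading of those rules of thumb. The function name is illustrative and the formant values in the usage example are hypothetical, not measurements.

```python
def compare_vowels(a, b):
    """Describe vowel a relative to vowel b; each is an (F1_hz, F2_hz) pair."""
    f1_a, f2_a = a
    f1_b, f2_b = b
    # F1 is inversely related to height: a lower F1 means a higher vowel.
    if f1_a < f1_b:
        height = "higher"
    elif f1_a > f1_b:
        height = "lower"
    else:
        height = "same height"
    # A higher F2 means a more front vowel.
    if f2_a > f2_b:
        backness = "more front"
    elif f2_a < f2_b:
        backness = "more back"
    else:
        backness = "same backness"
    return height, backness


if __name__ == "__main__":
    # Hypothetical (F1, F2) values in Hz, chosen only to exercise the rules.
    print(compare_vowels((300, 2300), (700, 1200)))
```

Comparing tokens relative to each other avoids fixing absolute thresholds, which the text does not provide and which would in any case differ between speakers.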

Acoustic properties of consonants

Four acoustic properties of plosives 1. Duration of stop gap: the silent period in the closure phase; the closure duration of /p, t, k/ is longer than that of /b, d, g/. 2. Voicing bar: a dark bar at low frequencies, usually below 200 Hz, seen only for the voiced plosives /b, d, g/. It is a primary indicator of voicing in the spectrogram; all kinds of voiced sounds, including vowels, show this voicing bar at such low frequencies.

3. Release burst: a strong vertical spike; in general, the spike is stronger for /p, t, k/ than for /b, d, g/. 4. Aspiration: a short frication noise before the vowel formants begin, usually lasting about 30 ms; it occurs with /p, t, k/ in initial position in a stressed syllable, e.g. [pʰ] in pin.

• Aspiration is not the same as the release burst. The period of aspiration (which only some voiceless plosives have) is much longer than the very short release burst (which all released plosives have). • The high-intensity noise of /p/ and /b/ appears in the range of 3,000-5,000 Hz.

Voiced stops identified using formant transitions

Fricatives • Fricatives can be divided into sibilants and non-sibilants. • Sibilants include [s, ʃ, z, ʒ]. Sibilants involve a turbulent airstream that strikes an obstacle, such as the teeth. • Non-sibilants involve turbulence at the site of constriction. • Sibilants tend to be louder than non-sibilants.

• Most of the acoustic energy of sibilants occurs at higher frequencies, e.g. the bulk of the turbulence of both /s/ and /z/ occurs above 3500 Hz, reaching as high as 10,000 Hz, and /ʃ/ has most of its acoustic energy from around 2000 Hz up to 10,000 Hz. • Voiced fricatives show aspects of both regular vocal fold vibration and a randomly turbulent airstream. Unlike their voiceless counterparts, the voiced fricatives have a substantial voicing bar occupying approximately the lower 400 Hz.

• The typical properties of /f/ include high-frequency turbulence concentrated between 3000 and 4000 Hz. • The voiced labiodental fricative /v/ also shows high-frequency turbulence, focused above 4,000 Hz, but it is no stronger than that of /f/. • There is no voicing bar with /f/, but there is a substantial voicing bar with /v/, occupying approximately the lower 400 Hz.

• The glottal fricative /h/ is voiceless. There is no voicing bar for /h/, and its turbulence appears to be strongest around 1000 Hz.

Approximants • Like vowels, approximants are: • highly resonant • produced with a relatively open vocal tract • characterised by identifiable formant structures • continuant sounds, since there is no occlusion or momentary stoppage of the airstream • non-turbulent, due to the lack of constriction • oral sounds

• They have faint formant structures, and all have a low F1 (below 1000 Hz) as they are voiced consonants. • For /w/, a large downward transition of F2 is characteristic, due to the back tongue constriction. • Lip rounding lowers the intensity of all formants, particularly F3. • So /w/ has F1 (250-450 Hz), F2 (600-850 Hz), and F3 (2000-2400 Hz).

• For /j/, the tongue is in the position for a front half-close to close vowel (depending on the degree of openness of the following sound). • Therefore it has a formant pattern similar to that of /i/. • The lips are neutral to spread, but rounded in anticipation of rounded vowels. • It has a low F1 (200-300 Hz), a high F2 (1850-2100 Hz), and F3 at 2620-3050 Hz.

• /r/ is characterized by a very low F3 due to its retroflex articulation: F3 is usually below 2000 Hz, sometimes falling as low as 1500 Hz. • The frequency of F1 appears to be related to lip rounding, i.e. low F1 = rounded lips. • /r/ normally has F1 (300-350 Hz), F2 (1000-1200 Hz) and F3 (1600-1750 Hz).

• For /l/, F1 is low and there is no continuous transition at vowel junctures. • F1 is approx. 200-400 Hz and rises to all vowel targets except high front ones. • F2 is approx. 950-1500 Hz (lowest for back vowels). • F3 is approx. 2700-3200 Hz.
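The formant ranges listed above for /w/, /j/, /r/ and /l/ can be collected into a small lookup. The sketch below is a simplification, not an established classifier. It treats each range as a hard boundary and reports which approximants are compatible with a measured F1, F2 and F3. The function and table names are illustrative.

```python
# Formant ranges in Hz, copied from the text above.
APPROXIMANT_RANGES_HZ = {
    "w": {"F1": (250, 450), "F2": (600, 850), "F3": (2000, 2400)},
    "j": {"F1": (200, 300), "F2": (1850, 2100), "F3": (2620, 3050)},
    "r": {"F1": (300, 350), "F2": (1000, 1200), "F3": (1600, 1750)},
    "l": {"F1": (200, 400), "F2": (950, 1500), "F3": (2700, 3200)},
}


def matching_approximants(f1_hz, f2_hz, f3_hz):
    """Return the approximants whose three ranges all contain the measured values."""
    measured = {"F1": f1_hz, "F2": f2_hz, "F3": f3_hz}
    return [label for label, ranges in APPROXIMANT_RANGES_HZ.items()
            if all(low <= measured[name] <= high
                   for name, (low, high) in ranges.items())]


if __name__ == "__main__":
    # Hypothetical measurement with a very low F3.
    print(matching_approximants(320, 1100, 1700))
```

Real tokens vary with speaker and context, so an empty result or several candidates should be read as a prompt to inspect the spectrogram rather than as a decision.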

Nasals

• The formants of all three nasals are not as dark as they are in vowels. • The frequency of F1 is very low (200-450 Hz) and F3 is more visible (2500 Hz). F2 is generally not visible. • [m] shows a fairly level F1 with a downward-sloping F2. • [n] shows a downward slope for both F1 and F2. • [ŋ] shows an upward direction for F2 and a downward direction for F3.