Machines Without Screens
Part of the Topics in Computing Series of Lectures
Dr. D. Fitzpatrick
Friday, 2 November 2007
In this Lecture:
• How some of our senses work
• Synthetic Speech and how it works
• Describing Maths using speech
• 3D audio, and Force Feedback
How the Eye Works
• Light rays enter the eye through the cornea.
• The cornea takes widely diverging rays of light
and bends them through the pupil.
• The lens of the eye is located immediately behind
the pupil. The purpose of the lens is to bring the
light into focus upon the retina, the membrane
containing photoreceptor nerve cells that lines the
inside back wall of the eye.
• The photoreceptor nerve cells of the retina change
the light rays into electrical impulses and send
them through the optic nerve to the brain.
How People Read
• not a linear progression
• use of saccades and fixations
• consequence? The eye tracks to highlighted or
other key portions of the page.
How the Ear Works
• The outer ear, or pinna (plural pinnae), leads
to the auditory canal, or meatus.
• The auditory canal terminates at the ear
drum, or tympanic membrane.
• Beyond the ear drum lie the middle ear and the inner ear,
the hidden parts of the ear, encased
in bone.
How the Ear Works II
• There are the semicircular canals, three liquid-filled passages that are associated with equilibrium
rather than hearing.
• They tell us about the orientation of the head
• cause us to get dizzy when they are
malfunctioning
• cause some of us to get seasick when the head,
body and eyes undergo motional disturbances.
• The three little bones of the air-filled middle ear,
which are attached to the eardrum, excite
vibrations in the cochlea, the liquid-filled inner
ear.
How we hear Sound
• In the cochlea the vibrations of sound are
converted into nerve impulses which travel along
the auditory nerve toward the brain.
• The purpose of the auditory canal is to guide
sound waves to the ear drum. The pinna acts as a
collector of sound from the outside world, and
also acts as a directional filter.
• The intensity of a sound wave in the auditory
canal is proportional to the intensity of the sound
wave that approaches the listener.
Sound Waves
• We are immersed in an ocean of air.
• The snapping of fingers, speaking, singing,
plucking a string or blowing a horn set up a
vibration in the air.
• The sound wave travels outward from the source as a spherical wavefront
– It is a longitudinal wave
– In contrast, waves in a stretched string are transverse waves
How fast does the sound wave travel?
• If the air temperature is 20 degrees Celsius,
a sound wave travels at a velocity of about 344
metres (1,128 feet) per second
• Sound travels in helium almost 3 times as
fast as in air, and longitudinal sound waves
can travel through metals and other solids
far faster.
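The figure quoted for 20 degrees Celsius can be checked against a common ideal-gas approximation. The sketch below is only an illustration: the formula, the constant 331.3 m/s for 0 degrees Celsius, and the function name are assumptions that do not come from the lecture.

```python
import math

METRES_PER_FOOT = 0.3048

def speed_of_sound(temp_celsius: float) -> float:
    """Approximate speed of sound in dry air (m/s), ideal-gas assumption."""
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)

c = speed_of_sound(20.0)
print(f"{c:.1f} m/s, {c / METRES_PER_FOOT:.0f} ft/s")
```

The result lands within about one metre per second of the value on the slide; small differences of this kind depend on the constants and on humidity.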
How Do We Hear?
• The sound waves that travel through the air cause
components of our ears to vibrate in a manner
similar to that of the sound source.
• What we hear grows weaker with distance from
the source, because the area of the spherical wave
front increases as the square of the distance from
the source, and the power of the source is spread
over that increasing surface.
• What actually reaches our ears is complicated by
reflections from the ground and other objects.
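The weakening with distance described above can be sketched numerically. This assumes an idealised point source radiating equally in all directions, with no reflections; the function names and the 1 watt source are illustrative only.

```python
import math

def intensity(power_watts: float, distance_m: float) -> float:
    """Intensity (W/m^2): source power spread over a sphere of area 4*pi*r^2."""
    return power_watts / (4.0 * math.pi * distance_m ** 2)

def level_drop_db(near_m: float, far_m: float) -> float:
    """Drop in level (dB) when moving from near_m to far_m from the source."""
    return 10.0 * math.log10(intensity(1.0, near_m) / intensity(1.0, far_m))

# Doubling the distance spreads the same power over four times the area.
print(intensity(1.0, 1.0) / intensity(1.0, 2.0))
print(round(level_drop_db(1.0, 2.0), 2))
```

Under these assumptions, each doubling of distance costs roughly 6 dB.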
Role of Speech
• Primary mode of communication
• Convey emotional content
Problems with synthetic speech
• Monotonous; basically uninflected speech
• Not possible to convey emotional content
• Consequence? very boring...
What is speech?
• Speech can be decomposed into three primary
components:
– frequency, amplitude and time.
• “Frequency is the term used to describe the
vibration of air molecules caused by a vibrating
object...which are set in motion by an egressive
flow of air during phonation.” It is measured in
Hertz (Hz).
• Speech is not as simple as other acoustic sounds: it can
contain many elements vibrating at different
frequencies.
• The frequency of repetition is referred to as the
fundamental frequency, f0.
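To illustrate a sound that contains several frequencies but has one frequency of repetition, the sketch below builds a tone from three harmonics and recovers f0 by autocorrelation. Autocorrelation is one common estimation technique, chosen here for illustration; it is not named in the lecture, and the sample rate, harmonic weights and search range are assumptions.

```python
import math

def synth_tone(f0, sample_rate, n_samples, harmonics=(1.0, 0.5, 0.25)):
    """Sum of sinusoids at integer multiples of f0 (a crude voiced sound)."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / sample_rate)
            for k, a in enumerate(harmonics))
        for n in range(n_samples)
    ]

def estimate_f0(signal, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate f0 as sample_rate divided by the best-matching lag."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)

    def autocorr(lag):
        # Similarity between the signal and a copy of itself shifted by lag.
        return sum(signal[i] * signal[i + lag]
                   for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

tone = synth_tone(200.0, 8000, 400)
print(estimate_f0(tone, 8000))
```

The tone contains energy at 200, 400 and 600 Hz, yet the waveform repeats every 1/200 of a second, and that repetition rate is what the estimate picks out.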
What is Speech? II
• Amplitude: The acoustic component which gives
the perception of loudness.
– “the maximal displacement of a particle from its place
of rest”
– measured in decibels
• Duration: the third component in the acoustic
view.
– The measurement of the speech signal along
the time-line
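Amplitude and duration can both be read directly off a sampled signal. The following is a minimal sketch, assuming the signal is a list of samples in the range -1 to 1 and that levels are expressed relative to a reference amplitude of 1.0; both conventions are assumptions made for the example.

```python
import math

def peak_amplitude(samples):
    """Maximal displacement from the rest position (zero)."""
    return max(abs(s) for s in samples)

def level_db(samples, reference=1.0):
    """Peak level in decibels relative to the reference amplitude."""
    return 20.0 * math.log10(peak_amplitude(samples) / reference)

def duration_seconds(samples, sample_rate):
    """Length of the signal along the time-line."""
    return len(samples) / sample_rate

sample_rate = 8000
loud = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(4000)]
quiet = [0.5 * s for s in loud]

print(duration_seconds(loud, sample_rate))
print(round(level_db(loud) - level_db(quiet), 2))
```

Halving the amplitude lowers the level by about 6 dB, while the duration is unchanged.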
Introducing Prosody
• Simple description: Inflection
• that set of features which lasts longer than a
single speech sound.
• “. . . those auditory components of an
utterance which remain, once segmental as
well as non-linguistic as well as
paralinguistic vocal effects have been
removed”
What will it Sound Like?
• The aim is to discard the monotone
• E.g.: if emboldened text is found:
1. Rate will slow (Duration)
2. Pitch range will increase (F0)
3. Volume will increase (Amplitude)
• Most structural and font information will be
conveyed by prosody
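The three changes listed for emboldened text can be sketched as a transformation on a set of prosodic parameters. The class, the field names and the scaling factors below are hypothetical, chosen only to show the direction of each change; the lecture gives no numeric values.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Prosody:
    rate_wpm: float        # speaking rate (relates to duration)
    pitch_range_hz: float  # spread of F0 around the baseline
    volume: float          # relative amplitude, 1.0 = default

# Hypothetical scaling factors, for illustration only.
BOLD_RATE, BOLD_PITCH, BOLD_VOLUME = 0.85, 1.3, 1.2

def embolden(p: Prosody) -> Prosody:
    """Slow the rate, widen the pitch range and raise the volume."""
    return replace(
        p,
        rate_wpm=p.rate_wpm * BOLD_RATE,
        pitch_range_hz=p.pitch_range_hz * BOLD_PITCH,
        volume=p.volume * BOLD_VOLUME,
    )

default = Prosody(rate_wpm=180.0, pitch_range_hz=40.0, volume=1.0)
print(embolden(default))
```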
Speaking Text Attributes
• Major headings are read as “section x”.
– Slower rate, lower average pitch, lower baseline
fall.
• Minor headings are read as “x.x”.
– Same slower rate, lower average pitch, lower baseline
fall.
• Emphasis: increase pitch range, increase accent
height, minimise smoothness, maximise richness,
increase amplitude where possible.
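A small sketch of how the heading rules might be applied follows. The function name, the tuple representation of heading numbers and the hint labels are assumptions made for the example; a real synthesiser would need concrete parameter values in place of the labels.

```python
def speak_heading(numbers, title):
    """Return (spoken text, prosody hints) for a heading.

    numbers is a tuple such as (3,) for a major heading
    or (3, 2) for a minor heading.
    """
    label = ".".join(str(n) for n in numbers)
    if len(numbers) == 1:
        spoken = f"section {label}: {title}"
    else:
        spoken = f"{label}: {title}"
    # Both heading levels share the same prosodic changes.
    hints = {"rate": "slower", "average_pitch": "lower",
             "baseline_fall": "lower"}
    return spoken, hints

print(speak_heading((3,), "Speaking Maths")[0])
print(speak_heading((3, 2), "Linearity")[0])
```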
Speaking Maths
• We intend to use prosodic changes to
convey equations:
1. The prosodic system is already familiar
2. Prosody is capable of expressing
mathematical material
3. All we have to do is...match the prosody
to the maths!!
Mathematical Prosody
• Equations resemble a tree when broken
down
• Nested levels conveyed by:
1. use of parentheses or brackets
2. juxtaposition of symbols; vertical &
horizontal
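The tree structure of an equation can be made visible with a short sketch. It uses Python's own expression parser as a stand-in for a maths parser, which is an assumption of the example; only the four binary operators, names and numbers are handled.

```python
import ast

def outline(node, level=0):
    """Return one indented line per node; deeper nesting means more indent."""
    pad = "  " * level
    if isinstance(node, ast.BinOp):
        return ([pad + type(node.op).__name__]
                + outline(node.left, level + 1)
                + outline(node.right, level + 1))
    if isinstance(node, ast.Name):
        return [pad + node.id]
    if isinstance(node, ast.Constant):
        return [pad + repr(node.value)]
    raise ValueError("unsupported expression")

tree = ast.parse("b * (c - d)", mode="eval").body
print("\n".join(outline(tree)))
```

The parentheses in the source text do not appear as nodes; they survive only as an extra level of nesting, which is the information a spoken rendering has to convey.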
Linearity
• Only the simplest math is linear
– a=b*c
– This is easy to represent in a linear fashion
– Unfortunately, math doesn’t stop here, though many
wish it did!
– Now try
a=b*c-d
– Still linear - well sort of!
Linearity
• Using implicit hierarchy rules it would be
understood to be
a = (b * c) - d
• But what if we really wanted
a = b * (c - d)
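The implicit hierarchy rules can be made explicit by a program. The sketch below, which again borrows Python's expression parser as an assumed stand-in, writes out every grouping that the precedence rules imply; the sample values for b, c and d are arbitrary.

```python
import ast

SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def parenthesize(node):
    """Rebuild an expression with every grouping written out."""
    if isinstance(node, ast.BinOp):
        op = SYMBOLS[type(node.op)]
        return f"({parenthesize(node.left)} {op} {parenthesize(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError("unsupported expression")

def explicit(source):
    return parenthesize(ast.parse(source, mode="eval").body)

print(explicit("b * c - d"))
print(explicit("b * (c - d)"))

# The two readings give different values for the same symbols.
b, c, d = 2, 3, 4
print(b * c - d, b * (c - d))
```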
Linearity
• These simple equations are still considered to
be linear in nature
• But linearity has a very short half-life when
learning math
• Math rapidly becomes two-dimensional
• Representing that non-visually becomes
difficult
Linearity
• The relatively simple equation a = √((x² − y) / z) could be represented linearly as
a = sqrt(((x super 2 base) - y) / z)
• Essentially, using parentheses, we can represent
ANY equation in a linear fashion, BUT???
• Speech is basically a linear representation
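The fully parenthesised form can be produced mechanically from a tree. In this sketch the equation is held as nested tuples of the form (operator, operand, ...), a representation assumed for the example, and the "super ... base" wording follows the linear form shown on the slide.

```python
def linearize(node):
    """Flatten a nested-tuple expression into a parenthesised string."""
    if isinstance(node, str):
        return node  # a symbol or a number
    op, *args = node
    if op == "sqrt":
        inner = linearize(args[0])
        # Reuse the operand's own parentheses when it already has them.
        return "sqrt" + (inner if inner.startswith("(") else f"({inner})")
    if op == "super":
        return f"({linearize(args[0])} super {linearize(args[1])} base)"
    left, right = args
    return f"({linearize(left)} {op} {linearize(right)})"

rhs = ("sqrt", ("/", ("-", ("super", "x", "2"), "y"), "z"))
print("a = " + linearize(rhs))
```

Every level of the tree costs a pair of parentheses, so the listener's bookkeeping grows with the depth of the equation.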
Designing a Browser?
• What is the goal of a Math Browser?
– To allow users to traverse an equation
• In whole or in parts
• Forwards or backwards
• Upwards or downwards
• Under user control
• To convey structure and semantics
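The traversal goals can be sketched as a cursor that moves over an equation tree. The class name, the method names and the nested-tuple representation are all hypothetical; this is not the design of any particular browser.

```python
class EquationBrowser:
    """Cursor over a nested-tuple tree: (operator, child, child, ...)."""

    def __init__(self, tree):
        self.tree = tree
        self.path = []  # child indices from the root to the current node

    def _node(self, path):
        node = self.tree
        for i in path:
            node = node[i + 1]  # slot 0 holds the operator
        return node

    def current(self):
        return self._node(self.path)

    def down(self):
        """Move to the first child, if the current node has children."""
        if isinstance(self.current(), tuple):
            self.path.append(0)
        return self.current()

    def up(self):
        """Move to the parent, if there is one."""
        if self.path:
            self.path.pop()
        return self.current()

    def forward(self):
        return self._sibling(+1)

    def backward(self):
        return self._sibling(-1)

    def _sibling(self, step):
        """Move to a neighbouring child of the same parent, if it exists."""
        if self.path:
            parent = self._node(self.path[:-1])
            i = self.path[-1] + step
            if 0 <= i < len(parent) - 1:
                self.path[-1] = i
        return self.current()

equation = ("=", "a", ("sqrt", ("/", ("-", ("super", "x", "2"), "y"), "z")))
browser = EquationBrowser(equation)
print(browser.down())     # left-hand side
print(browser.forward())  # right-hand side as a whole
```

Each move returns the sub-expression under the cursor, which could then be spoken in whole or explored in parts.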
3D Audio
• Surround sound has amazed film watchers for
years, immersing the audience in a full experience.
• However, surround sound systems do not provide
a true 3D reconstruction of a setting.
• Surround sound provides the location of sources
within a single plane; a true reconstruction would
include all possible planes, the entire sound field.
3D Audio II
• Research has shown that the
physical shape of the ears and head affects how we
perceive sound.
• To perfectly record a sound field, microphones
must either be placed in the ears, or a model of the
human head complete with ear canals can be
created with microphones inside the ear cavities.
• These two audio channels can then be fed to
headphones, creating a very life-like reproduction
of the 3D sound environment.
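One way the head shapes what each ear receives is through a difference in arrival time between the two ears. The sketch below uses Woodworth's spherical-head approximation, which is not mentioned in the lecture; the head radius is an assumed average and the function name is illustrative.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, the value given for air at 20 degrees Celsius
HEAD_RADIUS = 0.0875    # m, an assumed average; real heads differ

def interaural_time_difference(azimuth_deg):
    """Arrival-time difference between the ears (seconds), for a source
    0 to 90 degrees to one side of straight ahead, on a spherical head."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

for angle in (0, 30, 60, 90):
    print(angle, round(interaural_time_difference(angle) * 1e6), "microseconds")
```

Under these assumptions the difference is zero for a source straight ahead and well under a millisecond for a source directly to one side. Delays of this size between the two channels are one of the cues that a binaural recording preserves.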