
Aims and objectives
History of Computer Graphics
Applications of Computer Graphics
Computer Aided Design
Computer Aided Manufacturing
Medical Content Creation
Visualizing Complex Systems
Graphical User Interface
Three-dimensional Graphical User Interfaces
Let us Sum Up
Lesson-end Activities
Points for Discussion
Model answers to “Check your Progress”
Aims and Objectives
The aim of this lesson is to introduce computer graphics and to survey its history and
major applications.
The objectives of this lesson are to make the student aware of the following concepts:
a. History of Computer Graphics
b. Applications of Computer Graphics
c. Graphical User Interface
Computer graphics is the discipline of producing pictures or images using a computer.
It includes the modeling, creation, manipulation and storage of geometric objects;
rendering, i.e. converting a scene to an image through transformations, rasterization,
shading and illumination; and the animation of images. Computer graphics is widely
used in graphics presentation, paint systems, computer-aided design (CAD), image
processing, simulation, etc. From the earliest text-character images on non-graphic
mainframe computers to the latest photographic-quality images on high-resolution
personal computers, from vector displays to raster displays, and from 2D input to 3D
input and beyond, computer graphics has gone through a short, rapidly changing history.
From games to virtual reality and 3D active desktops, and from immersive home
environments to scientific and business applications, computer graphics technology has
touched almost every aspect of our lives. Before we get into the details, let us take a
short tour through the history of computer graphics.
History of Computer Graphics
In the 1950’s, output was via teletypes, line printers and the Cathode Ray Tube (CRT);
using dark and light characters, a picture could be reproduced. The 1960’s saw the
beginnings of modern interactive graphics, with vector graphics and interactive output.
One of the worst problems was the cost and inaccessibility of machines. In the early
1970’s, output began to use raster displays, though graphics capability was still fairly
chunky. In the 1980’s, output used built-in raster graphics, bitmapped images and pixels.
Personal computer costs decreased drastically, and the trackball and mouse became the
standard interactive devices. In the 1990’s, with the introduction of VGA and SVGA,
personal computers could easily display photo-realistic images and movies. 3D image
rendering became the main advance, and it stimulated cinematic graphics applications.
Table 1 gives a general history of computer graphics.
Table 1: General History of Computer Graphics
Inventions, discoveries and findings
Ben Laposky created the first graphic images generated by an electronic (analog)
machine, using an oscilloscope. The images were produced by manipulating electron
beams and recording them onto high-speed film.
1) UNIVAC-I: the first general-purpose commercial computer, with crude hardcopy
devices and line-printer pictures.
2) MIT – the Whirlwind computer, the first to display real-time video, capable of
displaying real-time text and graphics on a large oscilloscope screen.
William Fetter coined the term “computer graphics” to describe new design methods.
Steve Russell developed Spacewar!, the first video/computer game
1) Douglas Engelbart developed the first mouse
2) Ivan Sutherland developed Sketchpad, an interactive CG system: a man-machine
graphical communication system with pop-up menus, constraint-based drawing,
hierarchical modeling, and a light pen for interaction. He formulated the idea of using
primitives (lines, polygons, arcs, etc.) and constraints on them; he developed the
dragging, rubber-banding and transforming algorithms; and he introduced data structures
for storing them. He is considered the founder of computer graphics.
William Fetter developed the first computer model of a human figure
Jack Bresenham designed his line-drawing algorithm
1) Tektronix – a special CRT, the direct-view storage tube, with keyboard and mouse: a
simple computer interface for $15,000, which made graphics affordable
2) Ivan Sutherland developed the first head-mounted display
John Warnock – area-subdivision hidden-surface algorithm
Bell Labs – the first framebuffer, containing 3 bits per pixel
Nolan Kay Bushnell – Pong, the video arcade game
John Whitney Jr. and Gary Demos – “Westworld”, the first film with computer graphics
Edwin Catmull – texture mapping and the Z-buffer hidden-surface algorithm
James Blinn – curved surfaces, refinement of texture mapping
Phong Bui-Tuong – specular highlighting (the Phong illumination model)
Martin Newell – famous CG teapot, using Bezier patches
Benoit Mandelbrot – fractal/fractional dimension
James Blinn – environment mapping and bump mapping
Steve Wozniak – Apple II, a personal computer with color graphics
Roy Trubshaw and Richard Bartle – MUD, the first multi-user dungeon
Steven Lisberger – “Tron”, the first Disney movie to make extensive use of 3D graphics
Tom Brigham – “morphing”: the first film sequence in which a female character deforms
and transforms herself into the shape of a lynx
John Walker and Dan Drake – AutoCAD
Jaron Lanier – the DataGlove, a glove-based virtual reality input device
Wavefront Technologies – early commercial 3D graphics software
Pixar Animation Studios – “Luxo Jr.”; later, “Tin Toy”
NES – Nintendo home game system
IBM – VGA, the Video Graphics Array, introduced
Video Electronics Standards Association (VESA) formed – SVGA, Super VGA
Hanrahan and Lawson – RenderMan
Disney and Pixar – “Beauty and the Beast”: CGI was widely used, and the RenderMan
system provided fast, accurate and high-quality digital effects
Silicon Graphics – OpenGL specification
University of Illinois – Mosaic, the first graphical Web browser
Steven Spielberg – “Jurassic Park”, a successful CG fiction film
Buena Vista Pictures – “Toy Story”, the first full-length computer-generated feature film
NVIDIA Corporation – GeForce 256; GeForce 3 (2001)
id Software – the Doom 3 graphics engine
Applications of Computer Graphics
Let us take a short tour through the applications of computer graphics.
1.4.1 Computer Aided Design
Computer-aided design (CAD) is the use of a wide range of computer-based tools that
assist engineers, architects and other design professionals in their design activities. It is
the main geometry-authoring tool within the Product Lifecycle Management process and
involves both software and sometimes special-purpose hardware. Current packages range
from 2D vector-based drafting systems to 3D solid and surface modellers.
The CAD Process
CAD is used to design, develop and optimize products, which can be goods used
by end consumers or intermediate goods used in other products. CAD is also extensively
used in the design of tools and machinery used in the manufacture of components, and in
the drafting and design of all types of buildings, from small residential types (houses) to
the largest commercial and industrial structures (hospitals and factories).
CAD is mainly used for detailed engineering of 3D models and/or 2D drawings of
physical components, but it is also used throughout the engineering process from
conceptual design and layout of products, through strength and dynamic analysis of
assemblies to definition of manufacturing methods of components.
CAD has become an especially important technology, within the scope of
Computer Aided technologies, with benefits such as lower product development costs
and a greatly shortened design cycle. CAD enables designers to lay out and develop work
on screen, print it out and save it for future editing, saving time on their drawings.
The capabilities of modern CAD systems include:
Wireframe geometry creation
3D parametric feature-based modeling and solid modeling
Freeform surface modeling
Automated design of assemblies, which are collections of parts and/or other assemblies
Creation of engineering drawings from the solid models
Reuse of design components
Ease of modification of the design model and production of multiple versions
Automatic generation of standard components of the design
Validation/verification of designs against specifications and design rules
Simulation of designs without building a physical prototype
Output of engineering documentation, such as manufacturing drawings and Bills of
Materials (BOM) reflecting what is required to build the product
Import/export routines to exchange data with other software packages
Output of design data directly to manufacturing facilities
Output directly to a rapid prototyping or rapid manufacturing machine for industrial
prototypes
Libraries of parts and assemblies
Calculation of mass properties of parts and assemblies
Visualization aids such as shading, rotation and hidden-line removal
Bi-directional parametric association (modification of any feature is reflected in all
information relying on that feature: drawings, mass properties, assemblies, etc., and
vice versa)
Kinematics, interference and clearance checking of assemblies
Sheet-metal design
Hose/cable routing
Electrical component packaging
Inclusion of programming code in a model to control and relate desired attributes of
the model
Programmable design studies and optimization
Sophisticated visual analysis routines for draft, curvature and curvature continuity
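Bi-directional parametric association, listed above, can be illustrated with a small
sketch. The Python below is purely illustrative: the Plate class, its dimensions and the
assumed steel density are hypothetical and do not correspond to any real CAD API. The
point is that derived quantities such as mass are recomputed from the driving dimensions
on every query, so a change to one feature is reflected everywhere that depends on it.

```python
# Minimal sketch (hypothetical, not a real CAD API) of bi-directional
# parametric association: dependent quantities are recomputed whenever
# a driving dimension changes.

class Plate:
    """A rectangular plate whose mass properties are derived from its
    driving dimensions."""

    DENSITY = 7850.0  # assumed material: steel, kg/m^3

    def __init__(self, length_m, width_m, thickness_m):
        self.length = length_m
        self.width = width_m
        self.thickness = thickness_m

    # Derived (associative) properties: always consistent with the
    # current dimensions, so drawings, BOMs and mass properties that
    # query them can never go stale.
    @property
    def volume(self):
        return self.length * self.width * self.thickness

    @property
    def mass(self):
        return self.volume * self.DENSITY

plate = Plate(1.0, 0.5, 0.01)
print(plate.mass)        # mass for the initial dimensions
plate.thickness = 0.02   # modify one feature...
print(plate.mass)        # ...and every dependent value reflects it
```

A real CAD kernel generalizes this idea to arbitrary feature graphs, but the principle
(derived data always recomputed from driving parameters) is the same.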
Originally, software for CAD systems was developed with computer languages
such as Fortran, but with the advancement of object-oriented programming methods this
has radically changed. Typical modern parametric feature-based modelers and freeform
surface systems are built around a number of key C programming language modules with
their own APIs.
Today most CAD computer workstations are Windows-based PCs; some CAD
systems also run on one of the Unix operating systems, and a few on Linux. Some CAD
systems, such as NX, provide multiplatform support including Windows, Linux, Unix
and Mac OS X.
(Figures: CAD model of a jet engine; CAD and rapid prototyping; parachute modeling
and simulation; virtual 3D interiors (virtual environment); a CAD design; CAM in the
jewelry industry; a CAD robot.)
Generally no special hardware is required, with the exception of a high-end
OpenGL-based graphics card; however, for complex product design, machines with
high-speed (and possibly multiple) CPUs and large amounts of RAM are recommended. The
human-machine interface is generally via a computer mouse but can also be via a pen and
digitizing graphics tablet. Manipulation of the view of the model on the screen is also
sometimes done with the use of a spacemouse/SpaceBall. Some systems also support
stereoscopic glasses for viewing the 3D model.
1.4.2 Computer Aided Manufacturing
Since the age of the Industrial Revolution, the manufacturing process has
undergone many dramatic changes. One of the most dramatic of these changes is the
introduction of Computer Aided Manufacturing (CAM), a system of using computer
technology to assist the manufacturing process.
Through the use of CAM, a factory can become highly automated, through
systems such as real-time control and robotics. A CAM system usually seeks to control
the production process through varying degrees of automation. Because each of the many
manufacturing processes in a CAM system is computer controlled, a high degree of
precision can be achieved that is not possible with a human interface.
The CAM system, for example, sets the toolpath and executes precision machine
operations based on the imported design. Some CAM systems bring in additional
automation by also keeping track of materials and automating the ordering process, as
well as tasks such as tool replacement.
Computer Aided Manufacturing is commonly linked to Computer Aided Design
(CAD) systems. The resulting integrated CAD/CAM system takes the computer-generated
design and feeds it directly into the manufacturing system; the design is then
converted into multiple computer-controlled processes, such as drilling or turning.
Another advantage of Computer Aided Manufacturing is that it can be used to
facilitate mass customization: the process of creating small batches of products that are
custom designed to suit each particular client. Without CAM, and the CAD process that
precedes it, customization would be a time-consuming, manual and costly process.
However, CAD software allows for easy customization and rapid design changes: the
automatic controls of the CAM system make it possible to adjust the machinery
automatically for each different order.
Robotic arms and machines are commonly used in factories, but these still
require human workers. The nature of those workers' jobs changes, however: repetitive
tasks are delegated to machines, and the human workers' job descriptions move more
towards set-up, quality control, using CAD systems to create the initial designs, and
machine maintenance.
1.4.3 Entertainment
One of the main goals of today's special-effects producers and animators is to
create images with the highest levels of photorealism. Volume graphics is a key
technology for providing full immersion in upcoming virtual worlds, e.g. movies or
computer games. Real-world phenomena can be realized best with true physics-based
models, and volume graphics is the tool to generate, visualize and even feel these
models. Movies like Star Wars Episode I, Titanic and The Fifth Element have already
started employing true physics-based effects.
1.4.4 Medical Content Creation
Medical content creation has become more and more important in entertainment
and education in recent years. For instance, virtual anatomical atlases on CD-ROM and
DVD have been built on the basis of the NIH Visible Human Project data set, and
various kinds of simulation and training software have been built using volume
rendering techniques. Volume Graphics' products, such as the VGStudio software, are
dedicated to use in the field of medical content creation. VGStudio provides powerful
tools to manipulate and edit volume data. An easy-to-use keyframer tool allows the
generation of animations, e.g. flights through any kind of volume data. In addition,
VGStudio provides very high image quality and performance, even on a PC.
Images of a fetus rendered by a V.G. Studio MAX user.
1.4.5 Advertisement
Voxel data can be used to visualize some of the most fascinating and complex
subjects in the world; the visualization of the human body for medical content creation
is one example. Voxel data sets such as CT or MRI scans, or the Visible Human data,
show everything from the finest details up to the gross structures of the human anatomy.
Images rendered by Volume Graphics' 3D graphics software are already used for US TV
productions as well as for advertising; Volume Graphics cooperates with companies
specializing in video and TV production as well as with advertising agencies.
Neutron Radiography of a car engine
1.4.6 Visualization
Visualization is any technique for creating images, diagrams, or animations to
communicate a message. Visualization through visual imagery has been an effective way
to communicate both abstract and concrete ideas since the dawn of man.
Visualization today has ever-expanding applications in science, engineering,
product visualization, all forms of education, interactive multimedia, medicine, etc.
A typical visualization application is the field of computer graphics. The invention of
computer graphics may be the most important development in visualization, and the
development of animation has also helped visualization advance.
Visualization of how a car deforms in an
asymmetrical crash using finite element analysis.
Visualization is the process of representing data as descriptive images and,
subsequently, interacting with these images in order to gain additional insight into the
data. Traditionally, computer graphics has provided a powerful mechanism for creating
and manipulating these representations. Graphics and visualization research addresses the
problem of converting data into compelling and revealing images that suit users’ needs.
Research includes developing new representations of 3D geometry, choosing appropriate
graphical realizations of data, strategies for collaborative visualization of
three-dimensional data in a networked environment, and designing software systems that
support a full range of display formats, from PDAs to immersive multi-display
visualization environments.
1.4.7 Visualizing Complex Systems
Graphic images and models are proving not only useful, but crucial in many
contemporary fields dealing with complex data. Only by graphically combining millions
of discrete data items, for example, can meteorologists track weather systems, including
hurricanes that may threaten thousands of lives. Theoretical physicists depend on images
to think about events like collisions of cosmic strings at 75 percent of the speed of light,
and chaos theorists require pictures to find order within apparent disorder.
Computer-aided design systems are critical to the design and manufacture of an
extensive range of contemporary products, from silicon chips to automobiles, in fields
ranging from space technology to clothing design.
Computer systems, on which we all increasingly depend, are also becoming more
and more visually oriented. Graphical user interfaces are the emerging standard, and
graphic tools are the heart of contemporary systems analysis, identifying and preventing
critical errors and omissions that might otherwise not be evident until the system is in
daily use. Graphic computer-aided systems engineering (CASE) tools are now used to
build other computer systems. Recent research indicates that visual computer
programming produces better comprehension and accuracy than do traditional
programming languages based on words, and commercial visual programming packages
are now on the market.
Medical research and practice offer many examples of the use of graphic tools
and images. Conceptualizing the deoxyribonucleic acid (DNA) double helix permitted
dramatic advances in genetic research years before the structure could actually be seen.
Computerized imaging systems like computerized tomography (CT) and magnetic
resonance imaging (MRI) have produced dramatic improvements in the diagnosis and
treatment of serious illness, and a project compiling a three-dimensional cross-section of
the human body provides a new approach to the study of anatomy. X-rays, venerable
medical imaging tools, are now being combined with expert systems to help physicians
identify other cases similar to those they are handling, suggesting additional diagnostic
and treatment information relevant to patients.
Sociologists and social psychologists use graphic tools extensively in their
research programs. They often turn to sociograms and other visual tools to present and
explain concepts extracted from complex statistical analyses and to identify meaningful
patterns in the data. Graphic depiction of exchange networks permits the study of changes
among groups over time. Another useful approach is Bales's Systematic Multiple Level
Observation of Groups (SYMLOG), which provides a three-dimensional graphic
representation of friendliness, instrumental-versus-expressive orientation, and dominance
in small groups.
Graphic visualization has demonstrated utility for organizing information
effectively and coherently in a broad range of fields dealing with complex data. Social
work deals with similarly (and sometimes more) complex patterns and contextual
situations, and, in fact, social work and related disciplines have discovered the utility of
images for conceptualizing and communicating about clinical practice.
1.5 Graphical user interface
A graphical user interface (GUI) is a type of user interface which allows people to
interact with a computer and computer-controlled devices. It employs graphical icons,
visual indicators or special graphical elements called "widgets", along with text, labels
or text navigation, to represent the information and actions available to a user. The
actions are usually performed through direct manipulation of the graphical elements.
The precursor to graphical user interfaces was invented by researchers at the
Stanford Research Institute, led by Douglas Engelbart, who developed the use of
text-based hyperlinks manipulated with a mouse for the On-Line System. The concept of
hyperlinks was further refined and extended to graphics by researchers at Xerox PARC,
who went beyond text-based hyperlinks and used a GUI as the primary interface for the
Xerox Alto computer. Most modern general-purpose GUIs are derived from this system.
As a result, some people call this class of interface a PARC User Interface (PUI); note
that PUI is also an acronym for perceptual user interface.
Following PARC, the first commercially successful GUI-centric computer
operating environments were that of the Apple Lisa and, more successfully, the
Macintosh System graphical environment. The graphical user interfaces familiar to most people
today are Microsoft Windows, Mac OS X, and the X Window System interfaces. IBM
and Microsoft used many of Apple's ideas to develop the Common User Access
specifications that formed the basis of the user interface found in Microsoft Windows,
IBM OS/2 Presentation Manager, and the Unix Motif toolkit and window manager. These
ideas evolved to create the interface found in current versions of the Windows operating
system, as well as in Mac OS X and various desktop environments for Unix-like systems.
Thus most current graphical user interfaces have largely common idioms.
Graphical user interface design is an important adjunct to application
programming. Its goal is to enhance the usability of the underlying logical design of a
stored program. The visible graphical interface features of an application are sometimes
referred to as "chrome". They include graphical elements (widgets) that may be used to
interact with the program. Common widgets are: windows, buttons, menus, and scroll
bars. Larger widgets, such as windows, usually provide a frame or container for the main
presentation content such as a web page, email message or drawing. Smaller ones usually
act as a user-input tool.
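The containment relationship described above, with larger widgets framing the main
content and smaller ones acting as input tools, can be sketched as a simple tree. The
Python below is an illustrative toy, not the API of any real GUI toolkit; all class and
widget names are invented for the example.

```python
# Illustrative sketch (not a real toolkit API): widgets form a
# containment hierarchy in which container widgets such as windows
# frame the content, and small widgets act as input tools.

class Widget:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        """Place a child widget inside this container and return it."""
        self.children.append(child)
        return child

    def tree(self, depth=0):
        """Render the containment hierarchy as indented lines."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.tree(depth + 1))
        return lines

window = Widget("window")                 # large container widget
window.add(Widget("menu"))
window.add(Widget("scrollbar"))
content = window.add(Widget("content area"))
content.add(Widget("button"))             # small user-input widget

print("\n".join(window.tree()))
```

Real toolkits add event handling, layout and drawing on top of this hierarchy, but the
parent/child containment structure is the common core.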
The widgets of a well-designed system are functionally independent from and
indirectly linked to program functionality, so the graphical user interface can be easily
customized, allowing the user to select or design a different skin at will.
Some graphical user interfaces are designed for the rigorous requirements of vertical
markets. These are known as "application specific graphical user interfaces." Examples of
application specific graphical user interfaces:
Touch screen point of sale software used by wait staff in busy restaurants
Self-service checkouts used in some retail stores.
Airline self-ticketing and check-in
Information kiosks in public spaces like train stations and museums
Monitor/control screens in embedded industrial applications which employ a real
time operating system (RTOS).
The latest cell phones and handheld game systems also employ application-specific
touch-screen graphical user interfaces. Cars have graphical user interfaces too: for
example, GPS navigation systems, touch-screen multimedia centers, and even the
dashboards of newer cars.
(Figures: the Metisse 3D window manager; residents training in a videoendoscopic
surgery laboratory; the XGL 3D desktop.)
1.5.1 Three-dimensional graphical user interfaces
For typical computer displays, "three-dimensional" is a misnomer: the displays
themselves are two-dimensional, and three-dimensional images are projected onto them
in two dimensions. Since this technique has been in use for many years, the recent use
of the term three-dimensional must be considered a declaration by equipment marketers
that the speed of three-dimension-to-two-dimension projection is now adequate for use
in standard graphical user interfaces.
Screenshot showing the 'cube' plugin of Compiz on Ubuntu
Three-dimensional graphical user interfaces are common in science-fiction
literature and movies; Jurassic Park, for example, features Silicon Graphics'
three-dimensional file manager.
In science fiction, three-dimensional user interfaces are often immersive
environments like William Gibson's Cyberspace or Neal Stephenson's Metaverse.
Three-dimensional graphics are currently mostly used in computer games, art and
computer-aided design (CAD). A three-dimensional computing environment could also
be used for collaborative work: for example, scientists could study three-dimensional
models of molecules in a virtual reality environment, or engineers could work on
assembling a three-dimensional model of an airplane.
Let us Sum Up
In this lesson we have learnt about the following
a) Introduction to computer graphics
b) History of computer graphics and
c) Applications of computer graphics
Lesson-end Activities
After learning this lesson, try to discuss the following questions among your
friends and answer them to check your progress.
The need of Computer Graphics in the modern world
The use of Computer Graphics in the modern world
Points for Discussion
Try to discuss the following
Computer aided design
Computer aided manufacturing
Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
a) Discuss about the application of computer graphics in entertainment
b) Discuss about the application of computer graphics in visualization
Chapter 1 of William M. Newman and Robert F. Sproull, “Principles of Interactive
Computer Graphics”, Tata McGraw-Hill, 2000
Chapter 1 of Donald Hearn and M. Pauline Baker, “Computer Graphics, C
Version”, Pearson Education, 2007
Chapters 1, 2 and 3 of ISRD Group, “Computer Graphics”, McGraw-Hill, 2006
Chapter 1 of J.D. Foley, A. van Dam, S.K. Feiner and J.F. Hughes, “Computer
Graphics: Principles and Practice”, Addison-Wesley, 1997
Aims and Objectives
Computer Display
Random Scan
Raster Scan
Pixel Values
Raster Memory
Key attributes of Raster Displays
Display Processor
Let us Sum Up
Lesson-end Activities
Points for Discussion
Model answers to “Check your Progress”
2.1 Aims and Objectives
The aim of this lesson is to learn the concepts of computer display, random scan and
raster scan systems.
The objectives of this lesson are to make the student aware of the following concepts
a) Display systems
b) Cathode ray tube
c) Random Scan
d) Raster Scan and
e) Display processor
2.2 Introduction
Graphics Terminal: An interactive computer graphics terminal comprises distinct output
and input devices. Aside from power supplies and enclosures, these usually connect only
via the computer to which both are attached.
output: A display system presenting rapidly variable (not just hard-copy)
graphical output;
input: Some input device(s), e.g. keyboard + mouse. These may provide
graphical input:
o A mouse provides graphical input the computer echoes as a graphical
cursor on the display.
o A keyboard typically provides graphical input located at a separate text
cursor position.
There may be other I/O devices, e.g. a scanner and/or printer, microphone(s)
and/or speakers.
A Display System typically comprises:
A display device such as a CRT (cathode ray tube), liquid crystal display, etc.
o Most have a screen which presents a 2D image;
o Stereoscopic displays show distinct 2D images to each eye (head-mounted
/ special glasses);
o Displays with true 3D images are available.
A display processor controlling the display according to digital instructions about
what to display.
Memory for these instructions or image data, possibly part of a computer's
ordinary RAM.
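The amount of memory such a display system needs for one frame of image data is easy to
estimate: a raster display needs width × height × bits-per-pixel of storage per frame.
A small back-of-envelope sketch in Python (the resolutions and bit depths chosen below
are illustrative, not from the text):

```python
# Back-of-envelope framebuffer sizing: one frame of raster image data
# occupies width * height * bits_per_pixel bits, i.e. / 8 bytes.

def framebuffer_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

# A 1024x768 display at 24 bits per pixel (true colour):
print(framebuffer_bytes(1024, 768, 24))   # bytes for one full frame

# The same resolution at only 3 bits per pixel, as in very early
# framebuffers, needs one eighth as much:
print(framebuffer_bytes(1024, 768, 3))
```

Doubling the linear resolution quadruples this figure, which is why framebuffer memory
was a major cost factor in early raster displays.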
2.3 Computer display
A computer display monitor, usually called simply a monitor, is a piece of
electrical equipment which displays viewable images generated by a computer without
producing a permanent record. The word "monitor" is used in other contexts; in particular
in television broadcasting, where a television picture is displayed to a high standard. A
computer display device is usually either a cathode ray tube or some form of flat panel
such as a TFT LCD. The monitor comprises the display device, circuitry to generate a
picture from electronic signals sent by the computer, and an enclosure or case. Within the
computer, either as an integral part or a plugged-in interface, there is circuitry to convert
internal data to a format compatible with a monitor.
The CRT, or cathode ray tube, is the picture tube of a monitor. The back of the
tube has a negatively charged cathode. The electron gun shoots electrons down the tube
and onto a charged screen. The screen is coated with a pattern of dots that glow when
struck by the electron stream. Each cluster of three dots, one of each color, is one pixel.
The image on the monitor screen is usually made up from at least tens of
thousands of such tiny dots glowing on command from the computer. The closer together
the pixels are, the sharper the image on screen. The distance between pixels on a
computer monitor screen is called its dot pitch and is measured in millimeters. Most
monitors have a dot pitch of 0.28 mm or less.
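As a rough approximation (ignoring that dot pitch is usually measured between
like-coloured dots and that subpixel geometry varies between tubes), the visible width of
an image can be estimated from the pixel count and the dot pitch. A quick Python sketch
with illustrative numbers:

```python
# Rough estimate: N pixels at a given dot pitch span about
# N * pitch millimetres across the screen.

def image_width_mm(pixels_across, dot_pitch_mm):
    return pixels_across * dot_pitch_mm

# 1024 pixels across at a 0.28 mm dot pitch:
print(image_width_mm(1024, 0.28))   # approximate visible width in mm
```

This is why, at a fixed dot pitch, higher resolutions require physically larger tubes;
conversely, a sharper image at the same size requires a smaller dot pitch.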
There are two electromagnets around the collar of the tube which deflect the
electron beam. The beam scans across the top of the monitor from left to right, is then
blanked and moved back to the left-hand side slightly below the previous trace (on the
next scan line), scans across the second line and so on until the bottom right of the screen
is reached. The beam is again blanked, and moved back to the top left to start again. This
process draws a complete picture, typically 50 to 100 times a second. The number of
times in one second that the electron gun redraws the entire image is called the refresh
rate and is measured in hertz (cycles per second). It is common, particularly in lowerpriced equipment, for all the odd-numbered lines of an image to be traced, and then all
the even-numbered lines; the circuitry of such an interlaced display need be capable of
only half the speed of a non-interlaced display. An interlaced display, particularly at a
relatively low refresh rate, can appear to some observers to flicker, and may cause
eyestrain and nausea.
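The figures in the paragraph above can be turned into simple arithmetic. The sketch
below (the resolution and refresh rate are illustrative values, not taken from the text)
shows why an interlaced display's circuitry needs only half the pixel rate of a
progressive one at the same refresh rate:

```python
# Pixels the electron gun must trace per second: a progressive scan
# draws every line each pass, while an interlaced scan draws only the
# odd or only the even lines each pass (half the pixels).

def pixels_per_second(width, height, refresh_hz, interlaced=False):
    per_pass = width * height // 2 if interlaced else width * height
    return per_pass * refresh_hz

# Progressive 1024x768 at 75 Hz:
print(pixels_per_second(1024, 768, 75))
# Interlaced scan at the same refresh rate, half the circuitry speed:
print(pixels_per_second(1024, 768, 75, interlaced=True))
```

The trade-off, as noted above, is that each full interlaced image is completed only every
other pass, which is what some observers perceive as flicker.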
CRT computer monitor
As with television, several different hardware technologies exist for displaying
computer-generated output:
Liquid crystal display (LCD). LCDs are the most popular display device for new
computers in the Western world.
Cathode ray tube (CRT)
o Vector displays, as used on the Vectrex, many scientific and radar
applications, and several early arcade machines (notably Asteroids); these are
always implemented using CRT displays, due to the requirement for a
deflection system, though they can be emulated on any raster-based display.
o Television receivers were used by most early personal and home
computers, connecting composite video to the television set using a
modulator. Image quality was reduced by the additional steps of
composite video → modulator → TV tuner → composite video.
Plasma display
Surface-conduction electron-emitter display (SED)
Video projector - implemented using LCD, CRT, or other technologies. Recent
consumer-level video projectors are almost exclusively LCD based.
Organic light-emitting diode (OLED) display
The performance parameters of a monitor are:
Luminance, measured in candelas per square metre (cd/m²).
Size, measured diagonally. For CRTs the viewable size is one inch (25 mm)
smaller than the tube itself.
Dot pitch. Describes the distance between pixels of the same color in millimetres.
In general, the lower the dot pitch (e.g. 0.24 mm, which is also 240 micrometres),
the sharper the picture will appear.
Response time. The amount of time a pixel in an LCD monitor takes to go from
active (black) to inactive (white) and back to active (black) again. It is measured
in milliseconds (ms). Lower numbers mean faster transitions and therefore fewer
visible image artifacts.
Refresh rate. The number of times in a second that a display is illuminated.
Power consumption, measured in watts (W).
Aspect ratio, which is the horizontal size compared to the vertical size, e.g. 4:3 is
the standard aspect ratio, so that a screen with a width of 1024 pixels will have a
height of 768 pixels. A widescreen display can have an aspect ratio of 16:9, which
means a display that is 1024 pixels wide will have a height of 576 pixels.
Display resolution. The number of distinct pixels in each dimension that can be displayed.
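The aspect-ratio arithmetic above can be checked with a small helper; `height_for_width` is an illustrative name, not something from the text:

```c
/* Derive the pixel height of a display from its width and its aspect
   ratio (horizontal:vertical). E.g. a 1024-pixel-wide 4:3 screen is
   768 pixels high; at 16:9 the same width gives 576 pixels. */
int height_for_width(int width, int aspect_h, int aspect_v)
{
    return width * aspect_v / aspect_h;
}
```

This matches the two examples in the list: 1024 × 3 / 4 = 768 and 1024 × 9 / 16 = 576.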
A fraction of all LCD monitors are produced with "dead pixels"; to protect
their profit margins, most manufacturers sell monitors with dead pixels.
Almost all manufacturers have clauses in their warranties stating that a
monitor with fewer than some number of dead pixels is not considered broken
and will not be replaced. Dead pixels usually have their red, green, or blue
subpixels individually stuck always on or always off. As with image
persistence, this can sometimes be partially or fully reversed using the
method described below, though the chance of success is far lower than with
a "stuck" pixel.
Screen burn-in, where a static image left on the screen for a long time embeds the
image into the phosphor that coats the screen, is an issue with CRT and Plasma computer
monitors and televisions. Phosphor burn-in results in "ghostly" images of the
static object, visible even when the screen content has changed or the screen
is off. This effect usually
fades after a period of time. LCD monitors, while lacking phosphor screens and thus
immune to phosphor burn-in, have a similar condition known as image persistence, where
the pixels of the LCD monitor "remember" a particular color and become "stuck" and
unable to change. Unlike phosphor burn-in, however, image persistence can sometimes
be reversed partially or completely. This is accomplished by rapidly displaying varying
colors to "wake up" the stuck pixels. Screensavers that use moving images prevent both of
these conditions from happening by constantly changing the display. Newer monitors are
more resistant to burn-in, but it can still occur if static images are left displayed for long
periods of time.
Most modern computer displays can show thousands or millions of different
colors in the RGB color space by varying red, green, and blue signals in continuously
variable intensities.
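The "millions of colors" figure follows directly from the encoding: three 8-bit channels give 2^24 = 16,777,216 combinations. A minimal sketch of the packing (`pack_rgb` is an illustrative name, not from the text):

```c
#include <stdint.h>

/* Pack 8-bit red, green and blue intensities into one 24-bit
   true-colour pixel value. With 256 levels per channel there are
   256 * 256 * 256 = 16,777,216 distinct colours. */
uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```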
Many monitors accept analog input signals, but some more recent models (mostly
LCD screens) support digital input signals. It is a common misconception that all
computer monitors are digital. For several years, televisions, composite monitors, and
computer displays have been significantly different. However, as TVs have become more
versatile, the distinction has blurred.
Some users use more than one monitor. The displays can operate in multiple
modes. One of the most common spreads the entire desktop over all of the monitors,
which thus act as one big desktop. The X Window System refers to this as Xinerama.
Two Apple flat-screen monitors used as dual display
Display systems use either random or raster scan:
Random scan displays, often termed vector displays, came first and are still used
in some applications. Here the electron gun of a CRT illuminates points and/or
straight lines in any order. The display processor repeatedly reads a variable
'display file' defining a sequence of X,Y coordinate pairs and brightness or colour
values, and converts these to voltages controlling the electron gun.
A Random Scan Display (outline)
Raster scan displays, also known as bit-mapped or raster displays, are more
demanding: their whole display area is updated many times a second from
image data held in raster memory. The rest of this handout concerns hardware
and software aspects of raster displays.
2.4 Random Scan Systems
A two-dimensional random-scan video data acquisition system comprises: a
video detector that scans a visual scene; a controller that generates
scan-pattern instructions; a system interface that selects at least one scan
pattern for acquiring video data from the scene, chosen from a set of such
patterns according to the scan-pattern instructions; and a scan-video
interface whose random-scan driver generates scan control signals for the
selected pattern. The video detector scans the scene under these control
signals and delivers its output to the system interface, where an intensity
data map is stored; the controller then processes that map according to a
predetermined set of video-data characteristics.
2.5 Raster Scan
A Raster scan, or raster scanning, is the pattern of image detection and
reconstruction in television, and is the pattern of image storage and transmission used in
most computer bitmap image systems. The word raster comes from the Latin word for a
rake, as the pattern left by a rake resembles the parallel lines of a scanning raster.
In a raster scan, an image is cut up into successive samples called pixels, or
picture elements, along scan lines. Each scan line can be transmitted as it is read from the
detector, as in television systems, or can be stored as a row of pixel values in an array in a
computer system. On a television receiver or computer monitor, each scan line
is turned back into a line across the image, in the same order. After each scan line, the position of the
scan line is advanced, typically downward across the image in a process known as
vertical scanning, and a next scan line is detected, transmitted, stored, retrieved, or
displayed. This ordering of pixels by rows is known as raster order, or raster scan order.
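Raster order can be sketched as a simple index computation; `raster_index` is an illustrative helper, not from the text:

```c
/* In raster (row-major) order, the pixel at column x of scan line y
   is stored at offset y * width + x in a one-dimensional array:
   whole scan lines follow one another, top to bottom. */
int raster_index(int x, int y, int width)
{
    return y * width + x;
}
```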
2.5.1 Rasters
Lexically, a raster is a series of adjacent parallel 'lines' which together form an
image on a display screen. In early analogue television sets each such line was
scanned continuously, not broken up into distinct units. In computer or digital
displays these lines
are composed of independently coloured pixels (picture elements).
Mathematically we consider a raster to be a rectangular grid or array of pixels.
A Raster
Pixel positions have X,Y coordinates. Usually Y points down. This may reflect
early use to display text to western readers. Also when considering 3D, right-handed
coordinates imply Z represents depth.
2.5.2 Pixel Values
The colour of each pixel of a display is controlled by a distinct digital memory
element. Each such element holds a pixel value encoding a monochrome brightness or
colour to be displayed.
Monochrome displays are of two types. Bi-level displays have 1-bit pixels; they
have been produced in green or orange as well as black-and-white. Greyscale
displays usually have 8- to 16-bit pixel values encoding brightness.
Non-monochrome displays also have different types. True-colour displays have pixel
values divided into three component intensities, usually red, green and blue, often of 8
bits each. This used to be very costly. Alternatively the pixel values may index into a
fixed or variable colour map defining a limited colour palette. Pseudo-colour displays
with 8-bit pixels indexing a variable colour map of 256 colours have been common.
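The pseudo-colour scheme described above can be sketched as a table lookup. The names and the assumption that each colour-map entry holds a 24-bit RGB value are illustrative, not from the text:

```c
#include <stdint.h>

/* Pseudo-colour lookup: an 8-bit pixel value indexes a 256-entry
   colour map whose entries hold true-colour (24-bit RGB) values.
   The display hardware performs this lookup on every refresh. */
uint32_t lookup_colour(const uint32_t colour_map[256], uint8_t pixel_value)
{
    return colour_map[pixel_value];
}
```

Changing one colour-map entry instantly recolours every pixel that indexes it, which is why palette animation was cheap on such displays.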
2.5.3 Raster Memory
Pixmap: A pixmap is storage for a whole raster of pixel values. Usually a contiguous
area of memory, comprising one row (or column) of pixels after another.
Bitmap: Technically a bitmap is a pixmap with 1 bit per pixel, i.e. boolean colour values,
e.g. for use in a black-and-white display. But 'bitmap' is often misused to mean any
pixmap - please try to avoid this!
Pixrect: A pixrect is any 'rectangular area' within a pixmap. A pixrect thus typically
refers to a series of equal-sized fragments of the memory within a pixmap, one for each
row (or column) of pixels.
Frame Buffer: In a bit-mapped display, the display processor refreshes the screen 25 or
more times per second, a line at a time, from a pixmap termed its frame buffer. In each
refresh cycle, each pixel's colour value is 'copied' from the frame buffer to the screen.
Frame buffers are often special two-ported memory devices ('video memory') with one
port for writing and another for concurrent reading. Alternatively they can be
part of the ordinary fast RAM of a computer, which allows them to be extensively
reconfigured by software.
Additional raster memory may exist 'alongside' that for colour values. For example there
may be an 'alpha channel' (transparency values) a z-buffer (depth values for hidden object
removal), or an a-buffer (combining both ideas). The final section of these notes will
return to this area, especially use of a z-buffer.
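A minimal sketch of a pixmap acting as a frame buffer with a z-buffer alongside it, assuming raster-order storage and a "smaller z is closer" convention (all names are illustrative, not from the text):

```c
#include <stdint.h>

/* A pixmap used as a frame buffer, with a z-buffer 'alongside' the
   colour values. A depth test decides whether an incoming pixel
   replaces the stored one (hidden-surface removal). */
typedef struct {
    int width, height;
    uint32_t *pixels;   /* colour values, raster order        */
    float    *depth;    /* z-buffer, same raster-order layout */
} Pixmap;

void write_pixel(Pixmap *pm, int x, int y, uint32_t colour, float z)
{
    int i = y * pm->width + x;
    if (z < pm->depth[i]) {   /* closer than what is already stored */
        pm->depth[i]  = z;
        pm->pixels[i] = colour;
    }
}
```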
2.5.4 Key Attributes of Raster Displays
Major attributes that vary between different raster displays include the following:
'Colour': bi-level, greyscale, pseudo-colour, true colour: see 'pixel values' above;
Size: usually measured on the diagonal: inches or degrees;
Aspect ratio: now usually 5:4 or 4:3 (625-line TV: 4:3; HDTV: 16:9);
Resolution: e.g. 1024×1280 (pixels). Multiplying these numbers together we can
say e.g. 'a 1.25 Mega-pixel display'. Avoid terms such as low/medium/high
resolution which may change over time.
Pixel shape: now usually square; other rectangular shapes have been used.
Brightness, sharpness, contrast: possibly varying significantly with respect to
view angle.
Speed, interlacing: now usually 50 Hz or more and flicker-free to most humans;
Computational features, as discussed below...
Since the 1970s, raster display systems have evolved to offer increasingly powerful
facilities, often packaged in optional graphics accelerator boards or chips. These facilities
have typically consisted of hardware implementation or acceleration of computations
which would otherwise be coded in software, such as:
Raster-ops: fast 2D raster-combining operations;
2D scan conversion, i.e. creating raster images required by 2D drawing
primitives such as:
o 2D lines, e.g. straight/circular/elliptical lines, maybe spline curves (based
on several points);
o 2D coloured areas, e.g. polygons or just triangles, possibly with colour
interpolation;
o Text (often copied from rasterised fonts using raster-ops);
3D graphics acceleration, now often including 3D scan conversion, touched on
later in these notes.
It is useful for graphics software developers to be aware of such features and how
they can be accessed, and to have insight into their cost in terms of time taken as a
function of length or area.
2.6 Display Processor
One design for a display processor displays data in one or more windows on a
display screen. It divides the screen into a number of horizontal strips, each
further subdivided into tiles; each tile represents a portion of a window to be
displayed. Every tile is defined by tile descriptors, which include the memory
addresses of the data to be displayed in that tile. The descriptors need only
be changed when the arrangement of the windows on the screen changes, or when
the mapping of any window into the bit-map changes. Such a display processor
does not require a bit-mapped frame buffer before displaying windowed data on
the screen. Each horizontal strip may be as thin as one pixel, which allows
windows of irregular shapes, such as circles.
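The tile scheme can be sketched in code. The struct fields and the tile-indexing helper below are illustrative assumptions (fixed-size tiles and strips), not details taken from the design described above:

```c
#include <stdint.h>

/* One descriptor per screen tile: it records where in memory the
   tile's pixel data starts, so the display processor can fetch each
   tile directly instead of composing windows into a full-screen
   frame buffer first. Field names are illustrative. */
typedef struct {
    uint32_t pixel_addr;  /* start address of this tile's pixel data */
    uint16_t window_id;   /* which window this tile belongs to       */
    uint16_t width;       /* tile width in pixels                    */
} TileDescriptor;

/* Which descriptor covers pixel (x, y), assuming fixed-size tiles
   arranged in horizontal strips across the screen. */
int tile_index(int x, int y, int screen_width, int tile_width, int strip_height)
{
    int tiles_per_strip = screen_width / tile_width;
    return (y / strip_height) * tiles_per_strip + (x / tile_width);
}
```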
Let us Sum Up
In this lesson we have learnt about random scan, raster scan, and the display processor.
Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
 Discuss about raster memory
 Discuss about the key attributes of raster displays
Points for Discussion
Discuss the following
 Display processor
Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
 Performance parameters of a monitor
1. Chapter 1, 26 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
3. Chapter 2 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 4 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
3.1 Aims and Objectives
3.2 Introduction
3.3 Graphics Kernel System
3.5 OpenGL
3.6 Let us Sum Up
3.7 Lesson-end Activities
3.8 Points for Discussion
3.9 Model answers to “Check your Progress”
3.10 References
3.1 Aims and Objectives
The aim of this lesson is to learn the concept of graphics software standards.
The objectives of this lesson are to make the student aware of the following concepts
a) Graphics Kernel System
c) OpenGL
3.2 Introduction
A list of graphics standards is given below:
CGI - the computer graphics interface - which is the low-level interface between
GKS and the hardware.
CGM - the computer graphics metafile - which is defined as the means of
communicating between different software packages.
3D-GKS - the three-dimensional extension of GKS.
PHIGS - the Programmers Hierarchical Interactive Graphics System - another
three-dimensional standard (based on the old SIGGRAPH core).
3.3 Graphical Kernel System
The Graphical Kernel System (GKS) is accepted as an international standard for
two-dimensional graphics (although it is largely ignored in the USA).
The two-dimensional computer graphics primitives are closely related to the six
output functions of GKS. These are:
1. Polyline. Draws one or more straight lines through the coordinates supplied.
2. Polymarker. Draws a symbol at each of the coordinates supplied. The software
allows the choice of one of the five symmetric symbols, namely: x + * 0
3. Text. This allows a text string to be output in a number of ways, starting at the
coordinate given.
4. Fill-area. This allows a polygon to be drawn and filled, using the coordinates
given. Possible types of fill include hollow, solid and a variety of hatching
styles.
5. Cell-array. This allows a pattern to be defined and output in the rectangle
defined by the coordinates given. This is discussed in the section "Patterns &
6. Generalised Drawing Primitive (GDP). This allows the provision of a variety of
other facilities. Most systems include software for arcs of circles or ellipses and
the drawing of a smooth curve through a set of points (I have called this
"polysmooth" elsewhere in this text).
Following the acceptance of GKS as an international standard, work commenced
on two related standards, namely CGI and CGM. The "Computer Graphics Interface"
provides a low-level standard between the actual hardware and GKS and specifies how
device-drivers should be written. The "Computer Graphics Metafile" is used to
transfer graphics segments from one computer system to another.
The Programmer's Hierarchical Interactive Graphics System (PHIGS) is a 3D
graphics standard which was developed within ISO in parallel to GKS-3D. The PHIGS
standard defines a set of functions and data structures to be used by a programmer to
manipulate and display 3-D graphical objects. It was accepted as a full International
Standard in 1988. A great deal of PHIGS is identical to GKS-3D, including the
primitives, the attributes, the workstation concept, and the viewing and input models.
However, PHIGS has a single Central Structure Store (CSS), unlike the separate
Workstation Dependent and Workstation Independent Segment Storage (WDSS and
WISS) of GKS. The CSS contains Structures which can be configured into a hierarchical
directed-graph database, and within the structures themselves are stored the graphics
primitives, attributes, and so forth. PHIGS is aimed particularly at highly interactive
applications with complicated hierarchical data structures, for example: Mechanical
CAD, Molecular Modelling, Simulation and Process Control.
At the end of 1991, CERN acquired an implementation of PHIGS in a portable
machine-independent version (i.e. it did not consider hardware-dependent packages
supplied by the hardware manufacturers). The package is from the French company G5G
-- Graphisme 5eme Generation --. This specific implementation of PHIGS, the only one
officially supported at CERN, is called GPHIGS. The package is available on the
following platforms: VAX VMS, HP (HP/UX), Silicon Graphics, IBM RISC 6000, SUN
(SunOS and Solaris), DEC Station (Ultrix), DEC ALPHA (OpenVMS and OSF/1). Both
the FORTRAN and C bindings are available. The following driver interfaces are
available: X-Window, DEC-Windows, GL, Starbase, XGL, HP GL, CGM, and
PostScript. A new version (3.1) is now available, as announced in CNL 216.
3.5 OpenGL
OpenGL is a standard interface developed by Silicon Graphics and subsequently
endorsed by Microsoft. OpenGL is a widely accepted standard API for high-end graphics
applications. For example, code written in OpenGL would typically include subroutine
calls to do things like "draw a triangle." The details of exactly how the triangle is drawn
are inside OpenGL and are hidden from the applications programmer. This leaves open
the possibility of having different implementations of OpenGL, all of which work with
the application because they all have the same subprogram calls.
Different implementations of OpenGL are written for different graphics
accelerators. If a computer running Microsoft software does not have a graphics
accelerator, Microsoft provides a software implementation that runs on the CPU. If the
computer is upgraded with a hardware accelerator, the maker of the accelerator board
may supply a version of OpenGL that routes the OpenGL commands to the board,
converting the control sequences to commands appropriate to that particular hardware.
3.6 Let us Sum Up
In this lesson we have learnt about
a) GKS
c) OpenGL
3.7 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Various graphics standards
3.8 Points for Discussion
Discuss about the following
a) Graphics metafile
3.9 Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
a) Discuss about PHIGS
b) Discuss about GKS
3.10 References
Chapter 1 of William M. Newman, Robert F. Sproull, “Principles of Interactive
Computer Graphics”, Tata-McGraw Hill, 2000
Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”,
Pearson Education, 2007
Chapter 1, 2, 17 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
Chapter 1, 2, 4, 7 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
4.1 Aims and Objectives
4.2 Introduction
4.3 Keyboard
4.4 Mouse
4.5 Data gloves
4.6 Graphics Tablets
4.7 Scanner
4.8 Joy Stick
4.9 Light Pen
4.10 Let us Sum Up
4.11 Lesson-end Activities
4.12 Points for Discussion
4.13 Model answers to “Check your Progress”
4.14 References
4.1 Aims and Objectives
The aim of this lesson is to learn about some of the important input devices
needed for computer graphics.
The objectives of this lesson are to make the student aware of some of the important
input devices.
4.2 Introduction
In the following subsection we will learn about the following input devices
Data gloves
Graphics Tablet
Light Pen
4.3 Keyboard
A keyboard is a peripheral partially modeled after the typewriter keyboard.
Keyboards are designed to input text and characters, as well as to operate a computer.
Physically, keyboards are an arrangement of rectangular buttons, or "keys". Keyboards
typically have characters engraved or printed on the keys; in most cases, each press of a
key corresponds to a single written symbol. However, to produce some symbols requires
pressing and holding several keys simultaneously or in sequence; other keys do not
produce any symbol, but instead affect the operation of the computer or the
keyboard itself.
Keyboard keys
Roughly 50% of all keyboard keys produce letters, numbers or signs (characters).
Other keys can produce actions when pressed, and other actions are available by the
simultaneous pressing of more than one action key.
A keyboard and a foldable keyboard
4.4 Mouse
A mouse (plural mice or mouses) functions as a pointing device by detecting
two-dimensional motion relative to its supporting surface. Physically, a mouse consists of
a small case, held under one of the user's hands, with one or more buttons. It sometimes
features other elements, such as "wheels", which allow the user to perform various
system-dependent operations, or extra buttons or features can add more control or
dimensional input. The mouse's motion typically translates into the motion of a pointer
on a display.
The name mouse, coined at the Stanford Research Institute, derives from the
resemblance of early models (which had a cord attached to the rear part of the device,
suggesting the idea of a tail) to the common eponymous rodent.
The first marketed integrated mouse — shipped as a part of a computer and
intended for personal computer navigation — came with the Xerox 8010 Star Information
System in 1981.
A contemporary computer mouse
The first computer mouse, held by
inventor Douglas Engelbart
An optical mouse uses a light-emitting diode and photodiodes to detect
movement relative to the underlying surface, rather than moving some of its parts — as in
a mechanical mouse.
4.5 Data gloves
A data glove is a glove equipped with sensors that detect the movements of the
hand and interface those movements with a computer. Data gloves are commonly
used in virtual reality environments, where the user sees an image of the glove
and can manipulate objects in the virtual environment using it.
4.6 Graphics tablet
A graphics tablet is a computer input device that allows one to hand-draw
images and graphics, similar to the way one draws images with a pencil and paper. A
Graphics tablet consists of a flat surface upon which the user may "draw" an image using
an attached stylus, a pen-like drawing apparatus. The image generally does not appear on
the tablet itself but, rather, is displayed on the computer monitor.
A graphics tablet
4.7 Scanner
A scanner is a device that analyzes images, printed text, or handwriting, or an
object (such as an ornament) and converts it to a digital image. Most scanners today are
variations of the desktop (or flatbed) scanner. The flatbed scanner is the most common in
offices. Hand-held scanners, where the device is moved by hand, were briefly popular
but are now not used due to the difficulty of obtaining a high-quality image. Both these
types of scanners use charge-coupled device (CCD) or Contact Image Sensor (CIS) as the
image sensor, whereas older drum scanners use a photomultiplier tube as the
image sensor.
Another category of scanner is a rotary scanner used for high-speed document
scanning. This is another kind of drum scanner, but it uses a CCD array instead
of a photomultiplier tube.
Other types of scanners are planetary scanners, which take photographs of books
and documents, and 3D scanners, for producing three-dimensional models of objects, but
this type of scanner is considerably more expensive relative to other types of scanners.
Another category of scanner is the digital camera scanner, based on the concept
of reprographic cameras. Owing to their increasing resolution and new features
such as anti-shake, digital cameras have become an attractive alternative to
regular scanners. While
still containing disadvantages compared to traditional scanners, digital cameras offer
unmatched advantages in speed and portability.
Desktop scanner, with the lid raised
Scan of the jade rhinoceros
4.8 Joystick
A joystick is a personal computer peripheral or general control device consisting
of a handheld stick that pivots about one end and transmits its angle in two or three
dimensions to a computer.
Joysticks are often used to control video games, and usually have one or more
push-buttons whose state can also be read by the computer. The term joystick has become
a synonym for game controllers that can be connected to the computer since the computer
defines the input as a "joystick input".
Apart from controlling games, joysticks are also used for controlling machines
such as aircraft, cranes, trucks, powered wheelchairs and some zero turning radius lawn
mowers. More recently miniature joysticks have been adopted as navigational devices for
smaller electronic equipment such as mobile phones.
There has been a recent and very significant drop in joystick popularity in the
gaming industry. This is primarily due to the shrinkage of the flight simulator genre, and
the almost complete disappearance of space-based simulators.
Joysticks can be used within first-person shooter games, but are significantly less
accurate than a mouse and keyboard. This is one of the fundamental reasons why multiplayer
console games are not compatible with PC versions of the same game. A handful of
recent games, including Halo 2 and Shadowrun, have allowed console-PC matchings, but
have significantly handicapped PC users by requiring them to use the auto-aim feature.
Joystick elements: 1. Stick 2. Base 3. Trigger 4. Extra buttons 5. Autofire switch 6.
Throttle 7. Hat Switch (POV Hat) 8. Suction Cup
4.9 Light Pen
A light pen is a computer input device in the form of a light-sensitive wand used
in conjunction with the computer's CRT monitor. It allows the user to point to displayed
objects, or draw on the screen, in a similar way to a touch screen but with greater
positional accuracy. A light pen can work with any CRT-based monitor, but not with
LCD screens, projectors and other display devices.
A light pen is fairly simple to implement. The light pen works by sensing the
sudden small change in brightness of a point on the screen when the electron gun
refreshes that spot. By noting exactly where the scanning has reached at that moment, the
X,Y position of the pen can be resolved. This is usually achieved by the light pen causing
an interrupt, at which point the scan position can be read from a special register, or
computed from a counter or timer. The pen position is updated on every refresh
of the screen.
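Assuming the scan position is available as a count of pixels scanned since the top of the frame, recovering the pen's X,Y position is one division and one remainder (an illustrative sketch; the names are not from the text):

```c
/* Recover the light pen's screen position from the raster scan
   counter: the counter holds how many pixels have been scanned this
   frame, so dividing by the line width gives the scan line and the
   remainder gives the column. */
void pen_position(long scan_counter, int width, int *x, int *y)
{
    *x = (int)(scan_counter % width);
    *y = (int)(scan_counter / width);
}
```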
The light pen became moderately popular during the early 1980s. It was notable
for its use in the Fairlight CMI, and the BBC Micro. Even some consumer products were
given light pens. For example, the Toshiba DX-900 VHS HiFi/PCM Digital VCR came
with one. However, due to the fact that the user was required to hold his or her arm in
front of the screen for long periods of time, the light pen fell out of use as a general
purpose input device.
The first light pen was used around 1957 on the Lincoln TX-0 computer at the
MIT Lincoln Laboratory. Contestants on the game show Jeopardy! use a light pen to
write down their answers and wagers for the Final Jeopardy! round. Light pens are used
country-wide in Belgium for voting.
4.10 Let us Sum Up
In this lesson we have learnt about various input devices needed for computer
graphics.
4.11 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
Explain about Joystick
Explain about Data glove
4.12 Points for Discussion
Discuss about the following
4.13 Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
Discuss about Mouse
Discuss about Graphics Tablets
4.14 References
Chapter 11 of William M. Newman, Robert F. Sproull, “Principles of Interactive
Computer Graphics”, Tata-McGraw Hill, 2000
Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”,
Pearson Education, 2007
Chapter 2 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
Chapter 8 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
5.1 Aims and Objectives
5.2 Introduction
5.3 Points and Lines
5.4 Rasterization
5.5 Digital Differential Analyzer (DDA) Algorithm
5.6 Bresenham’s Algorithm
5.7 Properties of Circles
5.8 Properties of ellipse
5.9 Pixel Addressing
5.10 Let us Sum Up
5.11 Lesson-end Activities
5.12 Points for Discussion
5.13 Model answers to “Check your Progress”
5.14 References
5.1 Aims and Objectives
The aim of this lesson is to learn the concept of output primitives
The objectives of this lesson are to make the student aware of the following concepts
points and lines
DDA and Bresenham’s algorithm
Properties of circle and ellipse
Pixel addressing
5.2 Introduction
The basic elements constituting a graphic are called output primitives. Each
output primitive has an associated set of attributes, such as line width and line color for
lines. The programming technique is to set values for the output primitives and then call a
basic function that will draw the desired primitive using the current settings for the
attributes. Various graphics systems have different graphics primitives. For example
GKS defines five output primitives namely, polyline (for drawing contiguous line
segments), polymarker (for marking coordinate positions with various symmetric text
symbols), text (for plotting text at various angles and sizes), fill area (for plotting
polygonal areas with solid or hatch fill), cell array (for plotting portable raster images).
At the same time, GRPH1 has the output primitives Polyline, Polymarker, Text and
Tone, and other secondary primitives besides these, namely Line and Arrow.
5.3 Points and Lines
In a CRT monitor, the electron beam is turned on to illuminate the screen
phosphor at the selected location. Depending on the display technology, the positioning
of the electron beam changes. In a random-scan (vector) system, point-plotting
instructions are stored in a display list, and the coordinate values in these
instructions are converted to deflection voltages that position the electron
beam at the specified screen locations.
A low-level procedure for plotting a point on the screen at (x, y) with
intensity “I” can be given as a call such as setPixel(x, y, I), which writes the
intensity into the corresponding frame-buffer location.
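A minimal sketch of such a procedure, assuming the frame buffer is a raster-order array of 8-bit intensities (the layout and the name setPixel are assumptions, not given in the text):

```c
#include <stdint.h>

/* Plot the point (x, y) with intensity I by writing I into the
   frame-buffer element for that pixel, assuming raster-order
   (row-major) storage. */
void setPixel(uint8_t *frame_buffer, int width, int x, int y, uint8_t I)
{
    frame_buffer[y * width + x] = I;
}
```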
A line is drawn by calculating the intermediate positions between the two end
points and displaying the pixels at those positions.
5.4 Rasterization
Rasterization is the process of converting a vertex representation to a pixel
representation; rasterization is also called scan conversion. Included in this definition are
geometric objects such as circles, where you are given a center and a radius. In
this lesson I will cover:
The digital differential analyzer (DDA), which introduces the basic concepts of
incremental scan conversion;
Bresenham's algorithm, which improves on the DDA.
Scan conversion algorithms use incremental methods that exploit coherence. An
incremental method computes a new value quickly from an old value, rather than
computing the new value from scratch, which can often be slow. Coherence in space or
time is the term used to denote that nearby objects (e.g., pixels) have qualities similar to
the current object.
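The difference can be sketched for the line equation y = mx + b: the incremental version replaces a multiply-and-add per pixel with a single addition (an illustrative helper, not from the text):

```c
/* Fill ys[0..n-1] with y = m*x + b for x = 0..n-1, computed
   incrementally: each value is the previous one plus m, exploiting
   the coherence between neighbouring pixels. */
void incremental_line_ys(double m, double b, int n, double ys[])
{
    double y = b;               /* y at x = 0 */
    for (int i = 0; i < n; ++i) {
        ys[i] = y;
        y += m;                 /* one addition instead of m*x + b */
    }
}
```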
5.5 Digital Differential Analyzer (DDA) Algorithm
In this algorithm, the line is sampled at unit intervals in one coordinate, and
the corresponding values nearest to the line path are found for the other
coordinate. For a line with positive slope less than one, dx > dy (where
dx = x2 - x1 and dy = y2 - y1). Hence we sample at unit x intervals and compute
each successive y value as
yk+1 = yk + m
Each sampled position (xi, yi) is plotted at the pixel (xi, Round(yi)).
For lines with positive slope greater than one, dy > dx. Hence we sample at
unit y intervals and compute each successive x value as
xk+1 = xk + 1/m
Since the slope, m, can be any real number, the calculated values must be
rounded to the nearest integer.
For a line with negative slope, if the absolute value of the slope is less than one,
we make unit increments in the x direction and calculate the y values as
yk+1 = yk + m
For a line with negative slope, if the absolute value of the slope is greater than
one, we make unit decrements in the y direction and calculate the x values as
xk+1 = xk - 1/m
[Note :- for all four cases above it is assumed that the first point is on the left and the
second point is on the right.]
DDA Line Algorithm
void myLine(int x1, int y1, int x2, int y2)
{
    int length, i;
    double x, y;
    double xincrement;
    double yincrement;

    length = abs(x2 - x1);
    if (abs(y2 - y1) > length) length = abs(y2 - y1);
    xincrement = (double)(x2 - x1) / (double)length;
    yincrement = (double)(y2 - y1) / (double)length;
    x = x1 + 0.5;
    y = y1 + 0.5;
    for (i = 1; i <= length; ++i)
    {
        myPixel((int)x, (int)y);
        x = x + xincrement;
        y = y + yincrement;
    }
}
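The same routine can be made self-contained for testing by collecting the plotted pixels into arrays instead of calling myPixel (the collection arrays and the guard for coincident endpoints are additions for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/* DDA line from (x1, y1) to (x2, y2); stores up to max pixels in px[]/py[]
   and returns the number of pixels generated. */
int ddaLine(int x1, int y1, int x2, int y2, int px[], int py[], int max)
{
    int length = abs(x2 - x1);
    int i, n = 0;
    double x, y, xincrement, yincrement;

    if (abs(y2 - y1) > length)
        length = abs(y2 - y1);
    if (length == 0) {                    /* degenerate: a single point */
        if (max > 0) { px[0] = x1; py[0] = y1; n = 1; }
        return n;
    }
    xincrement = (double)(x2 - x1) / (double)length;
    yincrement = (double)(y2 - y1) / (double)length;
    x = x1 + 0.5;                         /* bias so truncation rounds */
    y = y1 + 0.5;
    for (i = 1; i <= length && n < max; ++i) {
        px[n] = (int)x;
        py[n] = (int)y;
        ++n;
        x += xincrement;
        y += yincrement;
    }
    return n;
}
```

Note that, like the listing above, this generates length pixels starting at the first endpoint, so the pixel of the second endpoint itself is not emitted.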
5.6 Bresenham’s Algorithm
In this method, developed by Jack Bresenham, we look at just the center of the
pixels. We determine d1 and d2, the "errors", i.e., the distances of the two candidate
pixels from the "true" line.
Steps in the Bresenham algorithm:
1. Determine the error terms
2. Define a relative error term such that the sign of this term tells us which pixel to
choose
3. Derive an equation to compute successive error terms from the first
4. Compute the first error term
Now the y coordinate on the mathematical line at pixel position xi+1 is calculated as
y = m(xi+1) + b
And the distances are calculated as
d1 = y - yi = m(xi +1) + b - yi
d2 = (yi +1) - y = yi +1 -m(xi +1) - b
d1 - d2 = 2m(xi +1) - 2yi + 2b - 1
Now define pi = dx(d1 - d2) = relative error of the two pixels.
Note: pi < 0 if yi pixel is closer, pi >= 0 if yi+1 pixel is closer. Therefore we only need to
know the sign of pi .
With m = dy/dx and substituting in for (d1 - d2) we get
pi = 2 * dy * xi - 2 * dx * yi + 2 * dy + dx * (2 * b - 1)    ---- (1)
Let C = 2 * dy + dx * (2 * b - 1)
Now look at the relation of p's for successive x terms.
pi+1 = 2dy * xi+1 - 2 * dx * yi+1 + C
pi+1 - pi = 2 * dy * (xi+1 - xi) - 2 * dx * ( yi+1 - yi)
with xi+1 = xi + 1 and yi+1= yi + 1 or yi
pi+1 = pi + 2 * dy - 2 * dx(yi+1 -yi)
Now compute p1 at (x1, y1) from (1), where b = y1 - (dy / dx) * x1:
p1 = 2dy * x1 - 2dx * y1 + 2dy + dx * (2y1 - 2(dy / dx) * x1 - 1)
   = 2dy * x1 - 2dx * y1 + 2dy + 2dx * y1 - 2dy * x1 - dx
   = 2dy - dx
If pi < 0, plot the pixel (xi+1, yi) and the next decision parameter is pi+1 = pi + 2dy;
else plot the pixel (xi+1, yi+1) and the next decision parameter is pi+1 = pi + 2dy - 2dx.
Bresenham Algorithm for 1st octant:
1. Enter endpoints (x1, y1) and (x2, y2).
2. Display x1, y1.
3. Compute dx = x2 - x1 ; dy = y2 - y1 ; p1 = 2dy - dx.
4. If p1 < 0.0, display (x1 + 1, y1), else display (x1 + 1, y1 + 1).
5. If p1 < 0.0, p2 = p1 + 2dy, else p2 = p1 + 2dy - 2dx.
6. Repeat steps 4 and 5 until x2, y2 is reached.
Note: only integer addition and multiplication by 2 are required. Notice we always
increment x by 1. For a generalized Bresenham algorithm we must look at the behavior
in the different octants.
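Following the steps above, a first-octant sketch of Bresenham's algorithm (the pixel-collection arrays are an addition for testing; a screen version would call a pixel routine instead):

```c
#include <assert.h>

/* Bresenham line for the first octant (0 <= dy <= dx); stores pixels in
   px[]/py[] and returns the count. */
int bresenhamLine(int x1, int y1, int x2, int y2, int px[], int py[], int max)
{
    int dx = x2 - x1, dy = y2 - y1;
    int p = 2 * dy - dx;           /* initial decision parameter p1 */
    int x = x1, y = y1, n = 0;

    while (x <= x2 && n < max) {
        px[n] = x;
        py[n] = y;
        ++n;
        if (p < 0) {
            p += 2 * dy;           /* keep y: next pixel (x+1, y) */
        } else {
            p += 2 * dy - 2 * dx;  /* step up: next pixel (x+1, y+1) */
            ++y;
        }
        ++x;
    }
    return n;
}
```

Only integer additions and comparisons appear inside the loop, which is the main advantage over the DDA.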
5.7 Properties of Circles
The points on the circumference of a circle are all at equal distance r from the
centre (xc, yc), and this relation is given by the Pythagorean theorem as
(x - xc)^2 + (y - yc)^2 = r^2
The points on the circumference of the circle can be calculated by unit increments
in the x direction from xc - r to xc + r, and the corresponding y values can be obtained as
y = yc ± sqrt(r^2 - (xc - x)^2)
The major problem here is that the spacing between the points will not be the same.
It can be adjusted by interchanging x and y whenever the absolute value of the slope of
the circle is greater than 1.
The unequal spacing can be eliminated by using polar coordinates:
x = xc + r cos θ
y = yc + r sin θ
The major problem in the above two methods is the computational time. The
computational time can be reduced by considering the symmetry of circles. The shape of
the circle is similar in each quadrant. Thinking one step further shows that there is
symmetry between octants too.
Midpoint circle algorithm
To simplify the function evaluation that takes place on each iteration of our
circle-drawing algorithm, we can use the midpoint circle algorithm.
The equation of the circle can be expressed as a function as given below:
f(x, y) = x^2 + y^2 - r^2
If the point is inside the circle then f(x,y) < 0, if it is outside then f(x,y) > 0, and if
the point is on the circumference of the circle then f(x,y) = 0.
Thus the circle function is the decision parameter in the midpoint algorithm.
Assume that we have just plotted (xk, yk); we have to decide whether the point
(xk + 1, yk) or (xk + 1, yk - 1) is nearer to the circle. We consider the midpoint between
these two points and define the decision parameter as
pk = f(xk + 1, yk - 1/2) = (xk + 1)^2 + (yk - 1/2)^2 - r^2
Similarly, for the next step,
pk+1 = f(xk+1 + 1, yk+1 - 1/2) = (xk+1 + 1)^2 + (yk+1 - 1/2)^2 - r^2
Now by subtracting the above two equations we get
pk+1 = pk + 2(xk + 1) + (yk+1^2 - yk^2) - (yk+1 - yk) + 1
where yk+1 is either yk or yk - 1 depending on the sign of pk.
The initial decision parameter is obtained by evaluating the circle function at the
starting position (0, r):
p0 = f(1, r - 1/2) = 1 + (r - 1/2)^2 - r^2 = 5/4 - r
Hence the algorithm for the first octant is as given below:
1. Calculate p0
2. k = 0
3. While x ≤ y:
a) if pk < 0 then plot the pixel (xk+1, yk) and find the next decision parameter as
pk+1 = pk + 2xk+1 + 1
b) else plot the pixel (xk+1, yk - 1) and find the next decision parameter as
pk+1 = pk + 2xk+1 + 1 - 2yk+1
c) k = k + 1
where 2xk+1 = 2xk + 2 and 2yk+1 = 2yk - 2
5.8 Properties of ellipse
An ellipse is defined as the set of points such that the sum of the distances from
two fixed positions (foci) is the same for all points. If the distances to the two foci from
any point P = (x,y) on the ellipse are labeled d1 and d2 then, the general equation of an
ellipse can be stated as
d1 + d2 = constant
Let the focal coordinates be F1 = (x1, y1) and F2 = (x2, y2). Then by substituting the
values of d1 and d2 we get
sqrt((x - x1)^2 + (y - y1)^2) + sqrt((x - x2)^2 + (y - y2)^2) = constant
The general equation of the ellipse can be written as
A x^2 + B y^2 + C xy + D x + E y + F = 0
where the coefficients A, B, C, D, E, and F are evaluated in terms of the focal
coordinates and the dimensions of the major and minor axis of the ellipse.
If the major and minor axes are aligned with the x-axis and y-axis, then the
equation of the ellipse can be given by
((x - xc) / rx)^2 + ((y - yc) / ry)^2 = 1
where rx and ry are the semi-major and semi-minor axes respectively.
The polar form of the ellipse can be given by
x = xc + rx cos θ
y = yc + ry sin θ
5.9 Pixel Addressing
The Pixel Addressing feature controls the number of pixels that are read from the
Region of interest (ROI). Pixel Addressing is controlled by two parameters – a Pixel
Addressing mode and a value. The mode of Pixel Addressing can be decimate (0),
averaging (1), binning (2) or resampling (3).
With a Pixel Addressing value of 1, the Pixel Addressing mode has no effect and
all pixels in the ROI will be returned. For Pixel Addressing values greater than 1, the
number of pixels will be reduced by the square of the value. For example, a Pixel
Addressing value of 2 will result in ¼ of the pixels.
The Pixel Addressing mode determines how the number of pixels is reduced. The
Pixel Addressing value can be considered as the size of a block of pixels made up of 2x2
groups. For example, a Pixel Addressing value of 3 will reduce a 6 x 6 block of pixels to
a 2 x 2 block – a reduction of 4/36 or 1/9.
The decimate mode will drop all the pixels in the block except for the top-left
group of four. At the highest Pixel Addressing value of 6, a 12 x 12 block of pixels is
reduced to 2 x 2. At this level of reduction, detail in the scene can be lost and color
artifacts introduced.
The averaging mode will average pixels of similar color within the block,
resulting in a 2x2 Bayer pattern. This allows detail in the blocks to be retained and
reduces the effects of the color artifacts.
The binning mode will sum pixels with similar color within the block reducing
the block to a 2x2 Bayer pattern. Unlike binning with CCD sensors, this summation
occurs after the image is digitized so no increase in sensitivity will be noticed but a dark
image will appear brighter.
The resampling mode uses a different approach involving the conversion of the
Bayer pattern in the blocks to RGB pixels. With a Pixel Addressing value of 1,
resampling has no effect. With a Pixel Addressing mode of 2 or more, resampling will
convert the block of 10-bit pixels to one 30-bit RGB pixel by averaging the red, green
and blue channels. Setting the video format to YUV422 mode will result in the best
image quality while resampling. Resampling will create images with the highest quality
and the least artifacts.
Pixel Addressing will reduce the amount of data coming from the camera.
However, only the Decimate mode will permit an increase in the frame rate. Averaging,
binning and resampling modes will have the same frame rate as if the Pixel Addressing
value was 1 (no decimation.). Pixel Addressing works in the same fashion with color or
monochrome sensors. The pixel-addressing parameters of a typical camera are
summarized in the following table.
[Table: camera pixel-addressing parameters, listing for each mode (0: Decimate,
1: Average, 2: Bin, 3: Resample) the control type (auto/manual/one-time/off) and the
minimum, maximum, default, and step values.]
5.10 Let us Sum Up
In this lesson we have learnt about
a) points and lines
b) DDA and Bresenhams algorithm
c) Properties of circle and ellipse
d) Pixel addressing
5.11 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
Discuss about midpoint circle algorithm
Discuss about advantages of Bresenhams algorithm over DDA
5.12 Points for Discussion
Discuss the following
Polar equation of circle
Pixel addressing
5.13 Model answers to “Check your Progress”
In order to check your progress, try to answer the following
a) Algorithm for DDA
b) Algorithm for Bresenhams algorithm
5.14 References
1. Chapter 2 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 3 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
3. Chapter 4 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 3 of J.D. Foley, A.Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – principles and practice”, Addison-Wesley, 1997
6.1 Aims and Objectives
6.2 Introduction
6.3 Representation of Points/Objects
6.4 Translation
6.5 Scaling
6.6 Rotation
6.7 Shear
6.8 Combining Transformations
6.9 Homogeneous Coordinates
Let us Sum Up
Lesson-end Activities
Points for Discussion
Model answers to “Check your Progress”
6.1 Aim and Objectives
The aim of this lesson is to learn the concept of two dimensional transformations.
The objectives of this lesson are to make the student aware of the following concepts
Homogenous coordinates systems
6.2 Introduction
Transformations are a fundamental part of computer graphics. Transformations
are used to position objects, to shape objects, to change viewing positions, and even to
change how something is viewed. There are 4 main types of transformations that one can
perform in 2 dimensions.
6.3 Representation of Points/Objects
A point p in 2D is represented as a pair of numbers: p = (x, y), where x is the
x-coordinate of the point p and y is the y-coordinate of p. 2D objects are often represented
as a set of points (vertices), {p1,p2,...,pn}, and an associated set of edges {e1,e2,...,em}. An
edge is defined as a pair of points e = {pi,pj}.
For example the three points and three edges of the triangle given here are
p1=(1,0), p2=(1.5,2), p3=(2,0), e1={p1,p2}, e2={p2,p3}, and e3={p3,p1}.
We can also write points in vector/matrix notation as the column vector p = [x  y]^T.
6.4 Translation
Assume you are given a point at (x,y)=(2,1). Where will the point be if you move
it 3 units to the right and 1 unit up?
The Answer is (x',y') = (5,2).
How was this obtained? This is obtained by (x',y') = (x+3,y+1). That is, to move a point
by some amount dx to the right and dy up, you must add dx to the x-coordinate and add
dy to the y-coordinate.
For example to move the green triangle, represented by 3 points given below, to the red
triangle we need dx = 3 and dy = -5.
greentriangle = { p1=(1,0), p2=(2,0), p3=(1.5,2) }
Matrix/Vector Representation of Translations
A translation can also be represented by a pair of numbers, t = (tx, ty), where tx is the
change in the x-coordinate and ty is the change in the y-coordinate. To translate the point
p by t, we simply add to obtain the new (translated) point q = p + t:

q = p + t = | x | + | tx | = | x + tx |
            | y |   | ty |   | y + ty |
6.5 Scaling
Suppose we want to double the size of a 2-D object. What do we mean by double?
Double in size, width only, height only, along some line only? When we talk about
scaling we usually mean some amount of scaling along each dimension. That is, we must
specify how much to change the size along each dimension. Below we see a triangle and
a house that have been doubled in both width and height (note that the area is then four
times the original).
The scaling for the x dimension does not have to be the same as the y dimension. If these
are different, then the object is distorted. What is the scaling in each dimension of the
pictures below?
And if we double the size, where is the resulting object? In the pictures above, the
scaled object is always shifted to the right. This is because it is scaled with respect to the
origin. That is, the point at the origin is left fixed. Thus scaling by more than 1 moves the
object away from the origin and scaling of less than 1 moves the object toward the origin.
This is because of how basic scaling is done. The above objects have been scaled
simply by multiplying each of its points by the appropriate scaling factor. For example,
the point p = (1.5, 2) scaled by 2 along x and 0.5 along y gives the new point
q = (2 * 1.5, 0.5 * 2) = (3, 1).
Matrix/Vector Representation of Scaling
Scaling transformations are represented by matrices. For example, the above scaling of 2
and 0.5 is represented as the matrix

S = | sx   0 | = | 2    0  |
    |  0  sy |   | 0   0.5 |

and a point is scaled by the matrix product

q = S p = | sx * x |
          | sy * y |
Scaling about a Particular Point
What do we do if we want to scale the objects about their center as show below?
Let the fixed point (xf, yf) be the center of the object; then the equations for scaling with
respect to (xf, yf) are
x' = (x - xf) * sx + xf
y' = (y - yf) * sy + yf
6.6 Rotation
Consider rotation of a point (x, y) about the origin in the anticlockwise
direction. Let (x', y') be the new point after rotation, and let the angular displacement
(i.e., the angle of rotation) be θ, as shown in the figure.
Let ρ be the distance of the point from the origin, and let φ be the angle
between the x-axis and the line joining the point (x, y) to the origin.
Now applying trigonometric identities, we get the following equations for (x', y'):
x' = ρ cos(φ + θ) = ρ cos φ cos θ - ρ sin φ sin θ
y' = ρ sin(φ + θ) = ρ cos φ sin θ + ρ sin φ cos θ
---- (a)
Similarly for (x, y), we get
x = ρ cos φ
y = ρ sin φ
---- (b)
Substituting (b) in (a), we get the equations for rotating a point about the origin:
x' = x cos θ - y sin θ
y' = x sin θ + y cos θ
Matrix/Vector Representation of Rotations

| x' |   | cos θ   -sin θ | | x |
| y' | = | sin θ    cos θ | | y |
Now suppose we want to rotate an object with respect to some fixed point (xf,yf)
as shown in the following figure. Then what will be the equation for rotation for a point
with respect to the fixed point (xf,yf).
The equation for rotation of a point with respect to a fixed point (xf, yf) can be given as
x' = xf + (x - xf) cos θ - (y - yf) sin θ
y' = yf + (x - xf) sin θ + (y - yf) cos θ
6.7 Shear
A transformation that distorts the shape of an object such that the transformed
shape appears as if the object were composed of internal layers that had been caused to
slide over each other is called a shear.
An x-direction shear relative to the x-axis can be given as
x' = x + shx * y
y' = y
Similarly, a y-direction shear relative to the y-axis can be given as
x' = x
y' = y + shy * x
Matrix/Vector Representation of Shearing
In matrix representation, the x-direction shear equation can be given as

| x' |   | 1   shx | | x |
| y' | = | 0    1  | | y |

Similarly, the y-direction shear can be given as

| x' |   | 1    0 | | x |
| y' | = | shy  1 | | y |
6.8 Combining Transformations
We saw that the basic scaling and rotating transformations are always with respect
to the origin. To scale or rotate about a particular point (the fixed point) we must first
translate the object so that the fixed point is at the origin. We then perform the scaling or
rotation and then the inverse of the original translation to move the fixed point back to its
original position. For example, if we want to scale the triangle by 2 in each direction
original position. For example, if we want to scale the triangle by 2 in each direction
about the point fp = (1.5, 1), we first translate all the points of the triangle by
T = (-1.5, -1), scale by 2 (S), and then translate back by -T = (1.5, 1). Mathematically
this looks like

q = | x' | = | 2  0 | ( | x | + | -1.5 | ) + | 1.5 |
    | y' |   | 0  2 | ( | y |   | -1.0 | )   | 1.0 |
Order Matters!
Notice the order in which these transformations are performed. The first
(rightmost) transformation is T and the last (leftmost) is -T. If you apply these
transformations in a different order then you will get very different results. For example,
what happens when you first apply T followed by -T followed by S? Here T and -T
cancel each other out and you are simply left with S
Sometimes (but be careful) order does not matter, For example, if you apply multiple 2D
rotations, order makes no difference:
R1 R2 = R2 R1
But this will not necessarily be true in 3D!!
6.9 Homogeneous Coordinates
In general, when you want to perform a complex transformation, you usually
make it by combining a number of basic transformations. The above equation for q,
however, is awkward to read because scaling is done by matrix multiplication and
translation is done by vector addition. In order to represent all transformations in the
same form, computer scientists have devised what are called homogeneous coordinates.
Do not try to apply any exotic interpretation to them. They are simply a mathematical
trick to make the representation be more consistent and easier to use.
Homogeneous coordinates (HC) add an extra virtual dimension. Thus 2D HC are
actually 3D and 3D HC are 4D. Consider a 2D point p = (x,y). In HC, we represent p as p
= (x,y,1). An extra coordinate is added whose value is always 1. This may seem odd but
it allows us to now represent translations as matrix multiplication instead of as vector
addition. A translation (dx, dy) which would normally be performed as

q = | x | + | dx | = | x + dx |
    | y |   | dy |   | y + dy |
is now written as

    | x' |         | 1  0  dx | | x |
q = | y' | = T p = | 0  1  dy | | y |
    | 1  |         | 0  0  1  | | 1 |
Now, we can write the scaling about a fixed point as simply a matrix multiplication:
q = (-T) S T p = A p,
where A = (-T) S T
The matrix A can be calculated once and then applied to all the points in the
object. This is much more efficient than our previous representation. It is also easier to
identify the transformations and their order when everything is in the form of matrix
multiplication.
The matrix for scaling in HC is

    | sx   0   0 |
S = |  0  sy   0 |
    |  0   0   1 |

and the matrix for rotation is

    | cos θ   -sin θ   0 |
R = | sin θ    cos θ   0 |
    |   0        0     1 |
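A minimal sketch of composing HC matrices for scaling about a fixed point, as described above (struct and function names are illustrative):

```c
#include <assert.h>
#include <math.h>

/* 3x3 homogeneous 2D transform, m[row][column]. */
typedef struct { double m[3][3]; } Mat3;

Mat3 identity(void)
{
    Mat3 r = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    return r;
}

Mat3 translation(double dx, double dy)
{
    Mat3 r = identity();
    r.m[0][2] = dx;
    r.m[1][2] = dy;
    return r;
}

Mat3 scaling(double sx, double sy)
{
    Mat3 r = identity();
    r.m[0][0] = sx;
    r.m[1][1] = sy;
    return r;
}

Mat3 multiply(Mat3 a, Mat3 b)              /* returns a * b */
{
    Mat3 r;
    int i, j, k;
    for (i = 0; i < 3; ++i)
        for (j = 0; j < 3; ++j) {
            r.m[i][j] = 0.0;
            for (k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
        }
    return r;
}

/* Apply the transform to the point (x, y, 1). */
void apply(Mat3 a, double x, double y, double *xr, double *yr)
{
    *xr = a.m[0][0] * x + a.m[0][1] * y + a.m[0][2];
    *yr = a.m[1][0] * x + a.m[1][1] * y + a.m[1][2];
}
```

Composing A = T(1.5, 1) S(2, 2) T(-1.5, -1) scales by 2 about the fixed point (1.5, 1): the fixed point stays put and every other point moves away from it.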
Let us Sum Up
In this lesson we have learned about two dimensional geometric transformations.
Lesson-end Activities
Do it yourself: - What are the points and edges in this picture of a house? What are
the transformations required to move this house so that the peak of the roof is at the
origin? What is required to move the house as shown in animation?
Points for Discussion
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Define two dimensional translation
b) Discuss about the rotation with respect to a fixed point
Model answers to “Check your Progress”
To check your progress, try to answer the following questions
a) Define scaling with respect to a fixed point
b) What is the need of homogenous coordinate systems
1. Chapter 4 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 4 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 5 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
4. Chapter 6 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 5 of J.D. Foley, A.Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – principles and practice”, Addison-Wesley, 1997
7.1 Aim and Objectives
7.2 Introduction
7.3 Line Clipping
Clipping Individual Points
Simultaneous Equations
Cohen-Sutherland Line Clipping
Liang-Barsky Line Clipping
7.4 Viewing
Window To Viewport Transformation
Viewport to Physical Device Transformation
7.5 Let us Sum Up
7.6 Lesson-end Activities
7.7 Points for Discussion
7.8 Model answers to “Check your Progress”
7.9 References
7.1 Aims and Objectives
The aim of this lesson is to learn the concepts of two-dimensional viewing and line
clipping.
The objectives of this lesson are to make the student aware of the following concepts
a. Window to viewport transformation
b. Line clipping
7.2 Introduction
Clipping refers to the removal of part of a scene. Internal clipping removes parts of a
picture outside a given region; external clipping removes parts inside a region. We'll
explore internal clipping; external clipping can almost always be accomplished with the
same techniques.
There is also the question of which primitive types we can clip. We will consider line
clipping and polygon clipping. A line-clipping algorithm takes as input the two endpoints
of a line segment and returns one (or more) line segments. A polygon clipper takes as
input the vertices of a polygon and returns one (or more) polygons.
There are other issues in clipping and some of these are:
Text character clipping
Scissoring -- clips the primitive during scan conversion to pixels
Bit (Pixel) block transfers (bitblts/pixblts)
o Copy a 2D array of pixels from a large canvas to a destination window
o Useful for text characters, pulldown menus, etc.
7.3 Line Clipping
Line clipping is the process of removing lines or portions of lines outside of an
area of interest. Typically, any line or part thereof which is outside of the viewing area is
removed.
specialized algorithms for rectangle and polygon clipping, it is important to note that
other graphic primitives can be clipped by repeated application of the line clipper.
7.3.1 Clipping Individual Points
Before we discuss clipping lines, let's look at the simpler problem of clipping
individual points.
If the x coordinate boundaries of the clipping rectangle are Xmin and Xmax, and
the y coordinate boundaries are Ymin and Ymax, then the following inequalities must be
satisfied for a point at (X,Y) to be inside the clipping rectangle:
Xmin ≤ X ≤ Xmax
and Ymin ≤ Y ≤ Ymax
If any of the four inequalities does not hold, the point is outside the clipping rectangle.
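These inequalities amount to a one-line test; a sketch (treating boundary points as inside, and with an illustrative name):

```c
#include <assert.h>

/* Returns 1 if (x, y) lies inside the clipping rectangle, 0 otherwise. */
int pointInside(double x, double y,
                double xmin, double ymin, double xmax, double ymax)
{
    return x >= xmin && x <= xmax && y >= ymin && y <= ymax;
}
```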
7.3.2 Simultaneous Equations
To clip a line, we need to consider only its endpoints, not its infinitely many
interior points. If both endpoints of a line lie inside the clip rectangle, the entire line lies
inside the clip rectangle and can be trivially accepted. If one endpoint lies inside and one
outside, the line intersects the clip rectangle and we must compute the intersection point.
If both endpoints are outside the clip rectangle, the line may or may not intersect with the
clip rectangle, and we need to perform further calculations to determine whether there are
any intersections.
The brute-force approach to clipping a line that cannot be trivially accepted is to
intersect that line with each of the four clip-rectangle edges to see whether any
intersection points lie on those edges; if so, the line cuts the clip rectangle and is partially
inside. For each line and clip-rectangle edge, we therefore take the two mathematically
infinite lines that contain them and intersect them. Next, we test whether this intersection
point is "interior" -- that is, whether it lies within both the clip rectangle edge and the
line; if so, there is an intersection with the clip rectangle.
7.3.3 Cohen-Sutherland Line Clipping
The Cohen-Sutherland algorithm clips a line to an upright rectangular window.
The algorithm extends window boundaries to define 9 regions:
top-left, top-center, top-right,
center-left, center, center-right,
bottom-left, bottom-center, and bottom-right.
See figure 1 below. These 9 regions can be uniquely identified using a 4 bit code, often
called an outcode. We'll use the order: left, right, bottom, top (LRBT) for these four bits.
In particular, for each point
Left (first) bit is set to 1 when p lies to left of window
Right (second) bit is set to 1 when p lies to right of window
Bottom (third) bit is set to 1 when p lies below window
Top (fourth) bit is set to 1 when p lies above window
The LRBT (Left, Right, Bottom, Top) order is somewhat arbitrary, but once an order is
chosen we must stick with it. Note that points on the clipping window edge are
considered inside (the bits are left at 0).
Figure 1: The nine regions defined by an upright window and their outcodes.
Given a line segment with endpoints p0 and p1, here is the basic flow of the
Cohen-Sutherland algorithm:
1. Compute 4-bit outcodes LRBT0 and LRBT1 for each end-point
2. If both outcodes are 0000, the trivially visible case, pass end-points to draw
routine. This occurs when the bitwise OR of outcodes yields 0000.
3. If both outcodes have 1's in the same bit position, the trivially invisible case, clip
the entire line (pass nothing to the draw routine). This occurs when the bitwise
AND of outcodes is not 0000.
4. Otherwise, the indeterminate case: the line may be partially visible or not visible.
Analytically compute the intersection of the line with the appropriate window edge.
Let's explore the indeterminate case more closely. First, one of the two end-points must
be outside the window; pretend it is p1.
1. Read P1's 4-bit code in order, say left-to-right.
2. When a set bit (1) is found, compute intersection point I of corresponding
window edge with line from p0 to p1.
As an example, pretend the right bit is set, so we want to compute the intersection with
the right clipping window edge; also, pretend we've already done the homogeneous
divide, so the right edge is x = 1 and we need to find y. The y value of the intersection is
found by substituting x = 1 into the equation of the line from p0 to p1 and solving for y.
Other cases are handled similarly.
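The outcode computation and the two trivial tests can be sketched as follows (the particular bit values assigned to L, R, B, T are a choice; only consistency matters):

```c
#include <assert.h>

/* Outcode bits in LRBT order, as in the text. */
#define OC_LEFT   8   /* 1000 */
#define OC_RIGHT  4   /* 0100 */
#define OC_BOTTOM 2   /* 0010 */
#define OC_TOP    1   /* 0001 */

int outcode(double x, double y,
            double xmin, double ymin, double xmax, double ymax)
{
    int code = 0;
    if (x < xmin)      code |= OC_LEFT;
    else if (x > xmax) code |= OC_RIGHT;
    if (y < ymin)      code |= OC_BOTTOM;
    else if (y > ymax) code |= OC_TOP;
    return code;
}

/* 1 = trivially visible, -1 = trivially invisible, 0 = indeterminate. */
int trivialTest(int code0, int code1)
{
    if ((code0 | code1) == 0) return 1;   /* both inside */
    if ((code0 & code1) != 0) return -1;  /* both beyond one edge */
    return 0;
}
```

In the indeterminate case the algorithm clips against one boundary at a time, recomputes the outcode of the moved endpoint, and repeats.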
7.3.4 Liang-Barsky Line Clipping
Liang and Barsky have created an algorithm that uses floating-point arithmetic
but finds the appropriate end points with at most four computations. This algorithm uses
the parametric equations for a line and solves four inequalities to find the range of the
parameter for which the line is in the viewport.
Let P = (x1, y1) and Q = (x2, y2) be the line segment we want to study. The parametric
equation of the line segment from P to Q gives x-values and y-values for every point in
terms of a parameter t that ranges from 0 to 1. The equations are
x = x1 + (x2 - x1) * t = x1 + dx * t
y = y1 + (y2 - y1) * t = y1 + dy * t
We can see that when t = 0, the point computed is P(x1, y1); and when t = 1, the point
computed is Q(x2, y2).
1. Set tmin = 0 and tmax = 1.
2. Calculate the values of tL, tR, tT, and tB (the t-values at which the line crosses the
left, right, top, and bottom window edges). If a t-value is outside the range [0, 1],
ignore it and go to the next edge; otherwise classify the t-value as an entering or
exiting value (using the inner product with the edge normal to classify). If t is an
entering value, set tmin = t; if t is an exiting value, set tmax = t.
3. If tmin < tmax, then draw a line from (x1 + dx*tmin, y1 + dy*tmin) to
(x1 + dx*tmax, y1 + dy*tmax).
4. If the line crosses the window, (x1 + dx*tmin, y1 + dy*tmin) and
(x1 + dx*tmax, y1 + dy*tmax) are the intersections between the line and the
window edges.
Example 1 - Line Passing Through Window
Consider the line from P = (-5, 3) to Q = (15, 9), with the window's left edge at x = 0
and right edge at x = 10 (as implied by the t-values below). We consider whether each
t-value is entering or exiting by using the inner product.
(Q - P) = (15 + 5, 9 - 3) = (20, 6)
At the left edge, (Q - P) · nL = (20, 6) · (-10, 0) = -200 < 0: entering, so we set tmin = 1/4.
At the right edge, (Q - P) · nR = (20, 6) · (10, 0) = 200 > 0: exiting, so we set tmax = 3/4.
Then we draw a line from (-5 + 20*(1/4), 3 + 6*(1/4)) = (0, 4.5) to
(-5 + 20*(3/4), 3 + 6*(3/4)) = (10, 7.5).
Example 2 - Line Not Passing Through Window
Consider the line from P = (-8, 2) to Q = (2, 14), with the window's top edge at y = 10
and left edge at x = 0. Again we consider whether each t-value is entering or exiting by
using the inner product.
(Q - P) = (2 + 8, 14 - 2) = (10, 12)
At the top edge, (Q - P) · nT = (10, 12) · (0, 10) = 120 > 0: exiting, so we set tmax = 8/12.
At the left edge, (Q - P) · nL = (10, 12) · (-10, 0) = -100 < 0: entering, so we set tmin = 8/10.
Because tmin > tmax, we don't draw a line.
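The algorithm is commonly implemented with the equivalent p/q formulation, where for each edge p is the denominator and q the numerator of the crossing parameter; a sketch (the 10 x 10 test window is an assumption for illustration):

```c
#include <assert.h>
#include <math.h>

/* Liang-Barsky clipping of the segment (x1, y1)-(x2, y2) against the window
   [xmin, xmax] x [ymin, ymax]. Returns 1 and writes the clipped endpoints
   on success; returns 0 if the segment lies entirely outside. */
int liangBarsky(double x1, double y1, double x2, double y2,
                double xmin, double ymin, double xmax, double ymax,
                double *cx1, double *cy1, double *cx2, double *cy2)
{
    double dx = x2 - x1, dy = y2 - y1;
    double tmin = 0.0, tmax = 1.0;
    double p[4], q[4];
    int i;

    p[0] = -dx; q[0] = x1 - xmin;   /* left   */
    p[1] =  dx; q[1] = xmax - x1;   /* right  */
    p[2] = -dy; q[2] = y1 - ymin;   /* bottom */
    p[3] =  dy; q[3] = ymax - y1;   /* top    */

    for (i = 0; i < 4; ++i) {
        if (p[i] == 0.0) {            /* parallel to this edge */
            if (q[i] < 0.0) return 0; /* and outside it */
        } else {
            double t = q[i] / p[i];
            if (p[i] < 0.0) {         /* entering */
                if (t > tmin) tmin = t;
            } else {                  /* exiting */
                if (t < tmax) tmax = t;
            }
        }
    }
    if (tmin > tmax) return 0;
    *cx1 = x1 + dx * tmin; *cy1 = y1 + dy * tmin;
    *cx2 = x1 + dx * tmax; *cy2 = y1 + dy * tmax;
    return 1;
}
```

With the numbers of Example 1 and a window of [0, 10] x [0, 10], this yields the clipped segment (0, 4.5) to (10, 7.5).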
7.4 Viewing
When we define an image in some world coordinate system, to display that image
we must map the image to the physical output device. This is a two stage process. For 3
dimensional images we must first determine the 3D camera viewpoint, called the View
Reference Point (VRP) and orientation. Then we project from 3D to 2D, since our display
device is 2 dimensional. Next, we must map the 2D representation to the physical device.
We will first discuss the concept of a Window on the world (WDC), and then a Viewport
(in NDC), and finally the mapping WDC to NDC to PDC.
7.4.1 Window
When we model an image in World Device Coordinates (WDC) we are not
interested in the entire world but only a portion of it. Therefore we define the portion of
interest which is a polygonal area specified in world coordinates, called the "window".
Example: Want to plot x vs. cos(x) for x between 0.0
and 2Pi. Now cos x will be between -1.0 and +1.0. So
we want the window as shown here.
The command to set a window is Set_window2( Xwmin, Xwmax, Ywmin, Ywmax ). So
for plot above use the following command:
Set_window2(0, 6.28, -1.0, +1.0 )
We can use the window to change the apparent size and/or location of objects in
the image. Changing the window affects all of the objects in the image. These effects are
called "Zooming" and "Panning".
a) Zooming
Assume you are drawing a house:
Now increase the window size and the
house appears smaller, i.e., you have
zoomed out:
Set_window( -60, +60, -30, +30 )
If you decrease the window size the
house appears larger, i.e., you have
zoomed in:
Set_window( -21, +21, -11, +11 )
So we can change the apparent size of an image, in this case a house, by changing the
window size.
b) Panning
What about the position of the image?
A. Set_window(-40, +20,-15,+15)
B. Set_window(-20,+40,-15,+15)
Moving all objects in the scene by changing the window is called "panning".
7.4.2 Viewport
The user may want to create images on different parts of the screen so we define a
viewport in Normalized Device Coordinates (NDC). Using NDC also allows for output
device independence. Later we will map from NDC to Physical Device Coordinates
Normalized Device Coordinates: Let the entire display
surface have coordinate values 0.0 <= x,y <= 1.0
Command: Set_viewport2(Xvmin,Xvmax,Yvmin,Yvmax)
To draw in bottom 1/2 of screen Set_viewport2( 0.0, 1.0, 0.0, 0.5)
To draw in upper right hand corner: Set_viewport2( 0.5, 1.0, 0.5,
1.0 )
We can also display multiple images in different viewports:
Set_window( -30, +30, -15, +15);
Set_viewport(0.0, 0.5, 0.0, 0.5); -- lower left
Set_viewport(0.5, 1.0, 0.0, 0.5); -- lower right
Set_viewport(0.0, 0.5, 0.5, 1.0); -- upper left
Set_viewport(0.5, 1.0, 0.5, 1.0); -- upper right
This gives the image as shown here.
7.4.3 2D Window To Viewport Transformation
The 2D viewing transformation performs the mapping from the window (WDC)
to the viewport (NDC) and to the physical output device (PDC). Usually all objects are
clipped to the window before the viewing transformation is performed.
We want to map a point from WDC to NDC, as shown below:
We can see from above that to maintain relative position we must have the
following relationship:
( Xv - Xvmin ) / ( Xvmax - Xvmin ) = ( Xw - Xwmin ) / ( Xwmax - Xwmin )
( Yv - Yvmin ) / ( Yvmax - Yvmin ) = ( Yw - Ywmin ) / ( Ywmax - Ywmin )
We can rewrite the above as
Xv = Sx * ( Xw - Xwmin ) + Xvmin
Yv = Sy * ( Yw - Ywmin ) + Yvmin
where
Sx = ( Xvmax - Xvmin ) / ( Xwmax - Xwmin )
Sy = ( Yvmax - Yvmin ) / ( Ywmax - Ywmin )
Note that Sx and Sy are "scaling" factors. If Sx = Sy the objects will retain the same
shape, otherwise they will be distorted.
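As an illustrative sketch (the function name and tuple layout are my own, not part of any graphics standard), these equations translate directly into code:

```python
def window_to_viewport(xw, yw, win, vp):
    """Map a world point (xw, yw) from the window to the viewport.

    win and vp are (xmin, xmax, ymin, ymax) tuples in WDC and NDC."""
    xwmin, xwmax, ywmin, ywmax = win
    xvmin, xvmax, yvmin, yvmax = vp
    sx = (xvmax - xvmin) / (xwmax - xwmin)   # x scaling factor
    sy = (yvmax - yvmin) / (ywmax - ywmin)   # y scaling factor
    xv = sx * (xw - xwmin) + xvmin
    yv = sy * (yw - ywmin) + yvmin
    return xv, yv

# Map the centre of a (-30..30, -15..15) window into the lower-left viewport.
print(window_to_viewport(0, 0, (-30, 30, -15, 15), (0.0, 0.5, 0.0, 0.5)))
```

If Sx and Sy differ, as they do whenever the window and viewport have different aspect ratios, the mapped objects are stretched in one direction.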
7.4.4 Viewport to Physical Device Transformation
Now we need to transform to Physical Device Coordinates (PDC), which we can
do by just multiplying the Normalized Device Coordinates (NDC) by the resolution in
each direction:
Xp = Xv * Xnum
Yp = Yv * Ynum
Note: Remember the aspect ratio problem, e.g., for CGA mode 6 (640 x 200),
2.4 horizontal pixels span the same physical distance as 1 vertical pixel.
Therefore 200 vertical pixels correspond to 480 horizontal pixels, so use
Ynum = 199 {0 => 199}
Xnum = 479 {0 => 479}
There is also the problem of (0, 0) being at the upper left rather than the lower left, so
the actual equations used are:
Xp = Xv * Xnum
Yp = Ynum - Yv * Ynum
As a check if
Xv = 0.0 => Xp = 0 ( left )
Xv = 1.0 => Xp = 479 ( right )
Yv = 0.0 => Yp = 199 - 0 = 199 (Bottom)
Yv = 1.0 => Yp = 199 - 199 = 0 (Top)
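A minimal sketch of this NDC-to-pixel mapping, assuming the 480 x 200 CGA resolution used above (the function name is illustrative):

```python
def ndc_to_pixel(xv, yv, xnum=479, ynum=199):
    """Map Normalized Device Coordinates (0..1) to physical pixels.

    The y term is flipped because pixel (0, 0) is at the upper left."""
    xp = round(xv * xnum)
    yp = ynum - round(yv * ynum)
    return xp, yp

print(ndc_to_pixel(0.0, 0.0))  # left, bottom
print(ndc_to_pixel(1.0, 1.0))  # right, top
```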
7.5 Let us Sum Up
In this lesson we have learnt about two dimensional viewing and line clipping.
7.6 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Need of line clipping
b) How window to viewport transformation is done
7.7 Points for Discussion
Discuss the following
a) Liang-Barsky Line Clipping
b) Cohen-Sutherland
7.8 Model answers to “Check your Progress”
In order to check your progress, try to answer the following
a) Define Viewport
b) Define Window
c) Discuss about window to viewport transformation
7.9 References
1. Chapter 5 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 6 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 6 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
4. Chapter 5 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 5 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
8.1 Aims and Objectives
8.2 Introduction
8.3 Modes of Input
Request Mode
Sample Mode
Event Mode
8.4 Classes of Logical Input
8.5 Software Techniques
Modular Constraints
Directional Constraints
Gravity Field Effect
Scales and Guidelines
8.6 Let us Sum Up
8.7 Lesson-end Activities
8.8 Points for Discussion
8.9 Model answers to “Check your Progress”
8.10 References
8.1 Aims and Objectives
The aim of this lesson is to learn the concept of GUI and interactive methods of input.
The objectives of this lesson are to make the student aware of the following concepts
Modes of input
Classes of logical input
Software techniques
8.2 Introduction
Most of the programs written today are interactive to some extent. The days when
programs were punched on cards and left in a tray to be collected and run by the
computer operators, who then returned cards and printout to the users' pigeonholes
several hours later, are now past. `Batch-processing', as this rather slow and tedious
process was called, may be a very efficient use of machine time but it is very wasteful of
programmers' time, and as the cost of hardware falls and that of personnel rises, so
installations move from batch to interactive use. Interactive use generally results in a less
efficient use of the mainframe computer, but gives the programmer a much faster
response time, and so speeds up the development of software.
If you are not sharing a mainframe, but using your own microcomputer, then for
most of the time the speed is limited by the human response time, not that of the
computer. When you come to graphics programs, there are some additional modes of
graphics input in addition to the normal interactive input you have used before. For
example, GKS has three modes of interactive input and six classes of logical input. These
are described here, since they are typical of the type of reasoning required to write such
programs.
8.3 Modes of Input
8.3.1 Request Mode
This is the mode you will find most familiar. The program issues a request for
data from a device and then waits until it has been transferred. It might do this by using a
`Read' statement to transfer characters from the keyboard, in which case the program will
pause in its execution and wait until the data has been typed on the keyboard and the
return key pressed to indicate the end of the input. The graphical input devices such as
mouse, cursor or digitizing tablet can also be programmed in this way.
8.3.2 Sample Mode
In this case the input device sends a constant stream of data to the computer and
the program samples these values as and when it is ready. The excess data is overwritten
and lost. A digitising tablet may be used in this way - it will continually send the latest
coordinates of its puck position to the buffer of the serial port and the program may copy
values from this port as often as needed.
8.3.3 Event Mode
This is similar to sample mode, but no data is lost. Each time the device transmits
a value, the program must respond. It may do so by placing the value in an event queue
for later processing, so that the logic of the program is very similar to sample mode, but
there may also be some data values which cause a different response. This type of
interrupt can be used to provide a very powerful facility.
8.4 Classes of Logical Input
8.4.1 Locator
This inputs the (x,y) coordinates of a position. It usually comes from a cursor,
controlled either by keys or by a mouse, and has to be transferred from Device
Coordinates to Normalised Device Coordinates to World Coordinates. If you have several
overlapping viewports, they must be ordered so that the one with the highest priority can
be used to calculate these transformations. Each pixel position on the screen must
correspond to a unique value in world coordinates. It need not remain the same
throughout the running of the program, since the priorities of the viewports may be
changed. At every moment there must be a single unambiguous path from cursor position
to world coordinates.
8.4.2 Pick
This allows the user to identify a particular object or segment from all those
displayed on the screen. It is usually indicated by moving the cursor until it coincides
with the required object, and then performing some other action such as pressing a mouse
button or a key on the keyboard to indicate that the required object is now identified. The
value transferred to this program is usually a segment identifier.
8.4.3 Choice
This works in a very similar manner to the pick input. You now have a limited set
of choices, as might be displayed in a menu, and some means of indicating your choice.
Only one of the limited list of choices is acceptable as input, any attempt to choose some
other segment displayed on the screen will be ignored.
8.4.4 Valuator
This inputs a single real number by some means, the simplest method being
typing it in from the keyboard.
8.4.5 String
This inputs a string of characters, again the simplest method is to type them in
from the keyboard.
8.4.6 Stroke
This inputs a series of pairs of (x,y) coordinates. The combination of Stroke input
and Sample Mode from a digitising tablet is a very fast method of input.
Most of the terminals or microcomputers you will meet will have some form of
cursor control for graphic input. You can write your programs using which ever
combination of logical input class and mode is most convenient. Alternatively, you could
ignore all forms of graphic input and merely rely on `Read' statements and data typed
from the keyboard. The choice is yours.
8.5 Software Techniques
8.5.1 Locating
Probably you have all used software in which the cursor is moved around the
screen by means of keys or a mouse. The program may well give the impression that the
cursor and mouse are linked together so that any movement of the mouse is automatically
indicated by movement of the cursor on the screen. In fact, this effect is achieved by
means of a graphics program which has to read in the new coordinates indicated by the
mouse, delete the previous drawing of the cursor and then redraw it at the new position.
This small program runs very quickly and gives the impression of a continuous process.
Usually this software also contains a test for input from the keyboard and when a
suitable key is pressed, the current position of the cursor is recorded. This allows fast
input of a number of points to form a picture or diagram on the screen. Some means of
storing the data and terminating the program is also required.
Such points are recorded to the full accuracy of the screen, which has both
advantages and disadvantages. If you are using a digitising tablet instead of a mouse, then
the accuracy is even greater and the resulting problems even more extreme. You very
seldom want to record information to the nearest 0.1mm, usually to the nearest millimetre
is quite sufficient. Problems arise when you want to select the same point a second time.
Whatever accuracy you have chosen, you must be able to indicate the point to this
accuracy in order to reselect it, as you might need to do if you had several lines meeting
at a point. To achieve this more easily, software involving the use of various types of
constraint may be used to speed up the input process.
8.5.2 Modular Constraints
In this case, you should imagine a grid, which may be visible or invisible, placed
across the screen. Now, whenever you indicate a position with the cursor, the actual
coordinates are replaced by the coordinates of the nearest point on the grid. So to indicate
the same point a second time, you merely have to get sufficiently close to the same grid
point. Provided the grid allows enough flexibility to choose the shapes required in the
diagram, this gives much faster input.
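A modular constraint can be sketched in a few lines; `snap_to_grid` is a hypothetical helper, not from any particular package:

```python
def snap_to_grid(x, y, spacing):
    """Replace a cursor position by the nearest point on a grid of the
    given spacing (the grid itself may be visible or invisible)."""
    return (round(x / spacing) * spacing,
            round(y / spacing) * spacing)

# Two nearby clicks both land on the same grid point, so the same
# position can easily be selected twice.
print(snap_to_grid(12.3, 7.8, 5))
print(snap_to_grid(11.9, 8.2, 5))
```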
8.5.3 Directional Constraints
These can be useful when you want some lines to be in a particular direction, such
as horizontal or vertical. You can write software to recalculate the coordinates so that a
line close to vertical becomes exactly vertical. You can choose whether this is done
automatically for every line within a few degrees of vertical or only applied when
requested by the user. If the constraint is applied automatically, then you can choose how
close the line must be to the required direction before it is moved and how the
recalculation is computed. You may wish to move both vertices by a small amount, or
one vertex by a larger amount, and if you are only moving one, you must specify some
rule or rules to decide which one.
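One possible automatic version of such a constraint is sketched below; the tolerance and the rule of moving only the second vertex are arbitrary design choices of the kind just described:

```python
import math

def constrain_direction(x1, y1, x2, y2, tolerance_deg=5.0):
    """Snap a nearly horizontal or nearly vertical line to the exact
    direction by moving only the second vertex (one possible rule)."""
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180
    if angle < tolerance_deg or angle > 180 - tolerance_deg:
        return x2, y1              # nearly horizontal: make it horizontal
    if abs(angle - 90) < tolerance_deg:
        return x1, y2              # nearly vertical: make it vertical
    return x2, y2                  # leave other directions alone
```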
8.5.4 Gravity Field Effect
The name implies that the line should be visualised as lying at the bottom of a
gravity well and points close to the line slide down on to it. As each line is added to the
diagram, a small area is defined which surrounds it. When a new point is defined which
lies inside the area, its actual coordinates are replaced by the coordinates of the nearest
point on the line.
There are two commonly used shapes for this area. In each case, along most of the
line, two parallel lines are drawn, one each side of the line and a small distance t from it.
In the one shape, each vertex at the end of the line is surrounded by a semi-circle of
radius t. In the other shape, each vertex is surrounded by a circle of radius greater than t,
giving a dumb-bell shape to the entire area. This second case expresses the philosophy
that users are much more likely to want to connect other lines to the vertices than to
points along the line.
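The first, semicircle-capped area can be implemented with a standard nearest-point-on-segment calculation; this sketch uses a uniform distance t rather than the larger vertex circles of the dumb-bell variant:

```python
def gravity_snap(px, py, x1, y1, x2, y2, t):
    """If (px, py) lies within distance t of the line segment from
    (x1, y1) to (x2, y2), return the nearest point on the segment;
    otherwise return the point unchanged."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        u = 0.0   # degenerate segment: its single point is the nearest point
    else:
        # Parameter of the foot of the perpendicular, clamped to the segment.
        u = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    nx, ny = x1 + u * dx, y1 + u * dy
    if (px - nx) ** 2 + (py - ny) ** 2 <= t * t:
        return nx, ny
    return px, py
```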
8.5.5 Scales and Guidelines
Just as you may use a ruler when measuring distances on a piece of paper, so you
may wish to include software to calculate and display a ruler on the screen. The choice of
scales and the way in which the ruler is positioned on the screen must be decided when
the software is written.
8.5.6 Rubber-banding
This is the name given to the technique where a line connects the previous point to
the present cursor position. This line expands or contracts like a rubber band as the cursor
is moved. To produce this effect, the lines must be deleted and re-drawn whenever the
cursor is moved.
8.5.7 Menus
Many programs display a menu of choices somewhere on the screen and allow the
user to indicate a choice of option by placing the cursor over the desired symbol.
Alternatively, the options could be numbered and the choice could be indicated by typing
a number on the keyboard. In either case, the resulting action will depend on the program.
8.5.8 Dragging
Many software packages provide a selection of commonly used shapes, and allow
the user to select a shape and use the cursor to drag a copy of the shape to any required
position in the drawing. Some packages continually delete and redraw the shape as it is
dragged, others only redraw it when the cursor halts or pauses.
8.5.9 Inking-in
Another type of software imitates the use of pen or paintbrush in leaving a track
as it is drawn across the paper. These routines allow the user to set the width and colour
of the pen and some also allow patterned `inks' in two or more colours. Then as the
cursor is moved across the screen, a large number of coordinates are recorded and the
lines joining these points are drawn as required.
All these techniques may be coded, using a combination of graphical input and
output. The success of such software depends very much on the user-interface. If it is
difficult or inconvenient to use, then as soon as something better comes along, the
previous software will be ignored. When designing your own graphical packages, you
need to have a clear idea of the purpose for which your package is designed and also the
habits and experience of the users for whom it is intended.
8.6 Let us Sum Up
In this lesson we have learnt about GUI and interactive input methods.
8.7 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Explain about pointing
b) Explain about inking
8.8 Points for Discussion
Discuss the following
a) Locator
b) Stroke
8.9 Model answers to “Check your Progress”
In order to check your progress, try to answer the following
a) Modes of input
b) Classes of logical input
8.10 References
1. Chapter 11, 12, 13, 14 of William M. Newman, Robert F. Sproull, “Principles
of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 7 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 8 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
4. Chapter 3 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 8, 9, 10 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – Principles and Practice”, Addison-Wesley, 1997
9.1 Aims and Objectives
9.2 Introduction
9.3 Descriptions of 3D Objects
9.4 Three-dimensional Drawings
Intensity Cues
Hidden-line and hidden-surface removal
Kinetic Depth Effect
Perspective Projections
Stereographic Projection
9.5 Projections into Two-dimensional Space
Parallel Projections
Isometric Projection
Perspective Projections
9.6 Let us Sum Up
9.7 Lesson-end Activities
9.8 Points for Discussion
9.9 Model answers to “Check your Progress”
9.10 References
9.1 Aims and Objectives
The aim of this lesson is to learn the concept of three dimensional graphics
The objectives of this lesson are to make the student aware of the following concepts
a) Description of 3D objects
b) Issues in 3D drawings
c) Projections
9.2 Introduction
In the following sections, we shall discuss the topics as though we were dealing
with idealised mathematical objects, points with position but no size and lines and planes
of zero thickness. Obviously this does not correspond to the real world where even the
thinnest plane is hundreds of atoms in thickness. However the ideas can be developed
without bothering about the effects of thickness, the need to specify whether we are
discussing the centre of the line or one of its outer edges, and these extra complications
can be considered later.
When we come to draw the resulting diagrams on paper or a computer screen, we
shall have to move from the mathematical ideal to a pattern of lines of known thickness
or of pixels on a screen which may be interpreted by those looking at them as a
representation of the mathematical ideal. In addition, we are attempting to represent the
idea of three-dimensional objects by a pattern of lines or dots in two-dimensions. There
are certain well-known techniques (you can call them tricks if you are feeling unkind)
which fool the human eye into imagining a solid three-dimensional object. These will
also be discussed in this section.
When dealing with three-dimensional graphics, three coordinates are needed to
specify a point uniquely. These are usually the coordinates (x,y,z) relating to a set of
Cartesian coordinates, but this is not essential. For example, polar coordinates may be
used and values quoted for latitude, longitude and radius. These will give a unique
position for each point and once again three values are needed to specify it.
Right and Left Handed Axes
The three-dimensional Cartesian coordinates may be either right-handed or
left-handed. To visualize this, you should hold up your right or left hand, with the thumb
and first two fingers at right angles to each other, and this will demonstrate the direction
of these axes.
When we come to use these coordinates on a computer terminal, most systems
still use the two-dimensional version with the origin in the bottom left-hand corner of the
screen, the x-axis running from left to right along the bottom of the screen and the y-axis
from bottom to top up the left-hand side of the screen. A right-handed set of axes then has
the z-axis coming out of the screen towards the user and a left-handed set has the z-axis
going into the screen away from the user.
Most software then has some means of reducing the three-dimensional object to a
two-dimensional drawing in order to view the object. Possible means are to ignore the
z-value, thus giving a parallel or orthogonal projection onto the screen; to calculate a
perspective projection onto the screen; or to take a slice through the object in the plane of
the screen. The projections may or may not have any provision for hidden-line or
hidden-surface removal.
9.3 Descriptions of 3D Objects.
Consider a simple example of an object in three-dimensions, namely the cube
shown below:
Object in 3 Dimensions
To store sufficient information to produce this wire-frame drawing of the object,
you need to store the coordinates of the vertices and some means of indicating which
vertices are connected by straight lines. So one essential requirement is an array
containing the coordinates of the vertices and in the example given, the coordinates
describe a cube centred on the origin. To obtain any other cube, you need to apply shift,
scale and rotation transformations to the vertices before re-drawing the cube in its new
position. These transformations will be discussed in the next section.
Array of coordinates: the [x,y,z] values for each vertex, listed by Name (A, B, ...) and
Index (1, 2, ...).
The remaining information can be coded in many ways, one of which is as an
array of edges. For each vertex, you store the index numbers of the other vertices to
which it is joined and in this case when all edges are straight lines, there is no need to
store information on the type of curve used to join the vertices.
Array of edges: for each vertex (From), the index numbers of the vertices (To) that it is
joined to.
The first row of this array indicates that vertex 1(A) is joined to vertex 2(B), to
vertex 4(D) and to vertex 5(E). The second row indicates that vertex 2(B) is joined to
vertex 1(A), which is certainly correct, but wasteful, as it implies that this line is drawn
twice. Examination of the array shows that the same is true of all other lines in the
diagram. To save time by only drawing it once, you need to draw only those cases where
the order of the vertices is increasing, that is you omit all the connections marked with a
star when drawing the object.
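The two arrays can be sketched as follows for a cube of side 2 centred on the origin. The vertex layout (A–D on one face, E–H above them) is an assumption consistent with the connections described above, and storing each edge only once, with increasing indices, draws every line exactly once:

```python
# Vertices A..H (0-based indices 0..7): A-D form one face, E-H the
# opposite face, with E above A, F above B, and so on (assumed layout).
vertices = [(-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),
            (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1)]

# Each edge stored once, as a pair (i, j) with i < j: A joins B, D and E, etc.
edges = [(0, 1), (0, 3), (0, 4), (1, 2), (1, 5), (2, 3),
         (2, 6), (3, 7), (4, 5), (4, 7), (5, 6), (6, 7)]

def wireframe_segments(vertices, edges):
    """Yield the endpoint coordinates of each edge, each drawn only once."""
    for i, j in edges:
        yield vertices[i], vertices[j]

print(len(edges))  # a cube has 12 edges
```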
These two arrays are sufficient to produce a wire-frame drawing such as that
shown in the above figure. However if you wish to discuss solid faces, or use any form of
shading or texturing, you will need to move to a more complex representation such as
that for boundary representation models used in geometric modelling. In this case, the
following data structure is appropriate.
1) Vertex.
Set of [x,y,z] coordinates.
2) Edge.
Start and end vertices.
Equation of curve joining them.
3) Loop.
List of edges making up the loop.
Direction in which they are drawn.
4) Face.
List of loops forming boundary.
Equation of surface.
Direction of outward normal(s).
5) Shell.
List of faces making up the shell.
6) Object.
List of shells making up surface of object.
One is identified as the `outer shell'.
When you come to consider the cube, this is a very simple object. All the edges
are straight lines and all the faces are planes. If you choose to define the loops as the
square outline made up of 4 edges, then each face has one loop as its boundary.
Alternatively, you could have two edges to a loop and then each face would require two
loops to specify its boundary. When you come to study geometric modelling, you will
find that there are often several equally correct solutions to any given problem.
Vertex: The array of coordinates for the 8 vertices has already been described.
Edge: There are 12 edges, all straight lines joining pairs of vertices. They may be
traversed in either direction.
Loop: You may choose to specify 6 loops, each consisting of 4 edges. As an example,
the top face may be bounded by the loop consisting of edges [ AE, EF, FB, BA ]. You
will again find that each edge is traversed twice, once in each direction, in the complete
description of the object.
Face: All solutions will have 6 faces and in one choice of loop, each face will be
bounded by one loop. It is usual to adopt a standard convention connecting the direction
of circulation of the loops and the directions of the outward-facing normals to the face.
In this example, each face will be a plane and will have a single direction for its outward
normal.
Shell: This consists of the 6 faces.
Object: This is made up of the single shell and the volume contained within it.
9.4 Three-dimensional Drawings
When a three-dimensional object is represented by a two-dimensional drawing,
various techniques may be used to indicate depth. We shall think initially of wire-frame
drawings, but many of the same ideas can apply to shaded drawings of solid objects.
9.4.1 Intensity Cues
The points or lines closer to the viewpoint appear brighter if drawn on the screen
and are drawn with thicker lines when output to the plotter. A shaded drawing on the
screen can adjust the intensity, pixel by pixel, giving a result similar to a grey-scale
photograph.
9.4.2 Hidden-line and hidden-surface removal
Hidden lines may be removed or indicated with dotted lines, thus leading to an
easier understanding of the shape.
9.4.3 Kinetic Depth Effect
Rotation of the object, combined with hidden-line removal gives a very realistic
effect. It is probably the best representation, but can only be produced at a
special-purpose graphics workstation since it requires considerable computing power to carry out
the hidden-line calculations in real time.
9.4.4 Perspective Projections
If we have some means of knowing the relative size of the objects, then the fact
that the perspective transformation makes the closer objects appear larger will give a
good effect of depth. If the objects are easily recognized then knowledge about their
relative sizes (e.g. a hill is usually larger than a house or a tree) will be interpreted as
information about their distance from the viewer. It is only when we have a number of
objects, such as cubes or spheres, which are completely separate in space and we have no
information on their relative size, that the perspective transformation cannot be
interpreted by the viewer in this way.
9.4.5 Stereographic Projection
In this case, we have two perspective projections, one for each eye. We need
some method of ensuring that each eye sees only its own view and then we can rely on
the human brain to merge the views together and let us see a three-dimensional object.
One method is to produce separate views at the correct distance and scale for use
with a stereographic viewer. This allows for black-and-white or colour drawings to be
seen in their true colour.
The other method is necessarily polychrome. It requires two perspective
projections, one from each eye position. Let us assume the view for the left eye is drawn
in one colour (e.g. blue) and the view for the right eye is drawn in another colour (e.g. red).
Each eye must see only its own view. So if the view from the left eye is drawn in
blue, and the right eye views the drawings through a blue filter then the blue lines will be
invisible to the right eye since they will blend into the white background when viewed
through a blue filter.
Similarly if the drawing for the right eye is in red, and the left eye has a filter of
the same colour, then the drawing for the right eye will be invisible to the left eye.
In the figure, the eyes are assumed to be a distance 2e apart (usually about 3 inches)
and the plane onto which the pictures are projected is a distance d from the eyes (frequently
12 to 15 inches). So for the left eye, we need to move the axes a distance e in the
x-direction, then project onto the plane, and finally shift the drawing back again. Thus
the projection for the left eye means that the point (x,y,z) becomes the point
((x+e)*d/z - e, y*d/z, 0).
For the right eye, the axes must be moved a distance -e and then the point (x,y,z)
is projected onto the plane and becomes ((x-e)*d/z + e, y*d/z, 0).
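A sketch of the two projections as code (the function name and the default values of e and d are illustrative, chosen from the typical figures quoted above):

```python
def stereo_pair(x, y, z, e=1.5, d=14.0):
    """Project a point once for each eye: eye separation 2e, viewing
    plane at distance d, following the two formulas in the text."""
    left = ((x + e) * d / z - e, y * d / z)
    right = ((x - e) * d / z + e, y * d / z)
    return left, right

# A point on the axis at the viewing distance projects to the origin
# for both eyes; more distant points separate, giving the depth cue.
```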
When this has been done for all the vertices, they are joined up and the object is
drawn in the appropriate colours.
9.5 Projections into Two-dimensional Space
We are dealing with objects defined in three-dimensional space, but all the
graphics devices to which we have access are two-dimensional. This means that we
require some way of representing (i.e. drawing) the three-dimensional objects in two
dimensions in order to see the results. It is possible to calculate the intersection of any
plane with the object and draw a succession of slices, but it is usually easier to understand
what is going on if we calculate the projection of the three-dimensional object onto a
given plane and then draw its projection.
There are two types of projection, parallel (usually orthographic) and perspective.
We shall discuss both of these in the remainder of this chapter and also consider some
other ways of giving the impression of a three-dimensional object in two-dimensions.
One of the most important is the removal of hidden lines or surfaces, and this is discussed
in another section.
9.5.1 Parallel Projections
The simplest example of an orthographic projection occurs when you project onto
the plane z=0. You achieve this by ignoring the values of the z-coordinates and drawing
the object in terms of its x and y coordinates only.
In general, an orthographic projection is carried out by drawing lines normal to
the specified plane from each of the vertices of the object and the projected points are the
intersections of these lines with the plane. Then the projected vertices are joined up to
give the two-dimensional drawing. (It is also possible to project the drawing onto a plane
which is not orthogonal (at right angles) to the direction of projection.)
Since calculating the intersections of general lines and planes is somewhat
tedious, you may instead apply transformations to the objects so that the plane onto
which you wish to project the drawing becomes the plane z=0. Then your final
transformation into two-dimensions is obtained by discarding the z-coordinates. A simple
example of an orthographic projection is shown in the figure below.
Orthographic Projection
There are several types of `axonometric' or parallel projections commonly in use.
Let us look at some of them:
Orthographic: the coordinate axes remain orthogonal when projected.
Dimetric: two of the three axes are equally foreshortened when projected.
Isometric: all three axes are equally foreshortened when projected.
The diagram below shows an example of a surface drawn using an isometric
projection. It used a right-handed set of axes with the Ox axis to the right and inclined at
30 degrees to the horizontal, the Oy axis to the left and inclined at the same angle, while
the Oz axis is vertically upwards. The Oz axis has been drawn at the edges of the picture
to avoid over-writing the graph.
Example of an Isometric Drawing of a Surface
9.5.2 Isometric Projection
There are three stages in this projection.
1). Rotate through angle A about Oy axis.
2). Rotate through angle B about Ox axis.
3). Project onto plane z=0.
After this transformation, the unit vectors along the three axes must still be equal in
length. Writing points as row vectors (x, y, z, 1), the two rotations are

Rotation about Oy             Rotation about Ox
[  cosA  0  -sinA  0 ]        [ 1    0      0    0 ]
[   0    1    0    0 ]        [ 0  cosB   sinB   0 ]
[  sinA  0   cosA  0 ]        [ 0  -sinB  cosB   0 ]
[   0    0    0    1 ]        [ 0    0      0    1 ]

and multiplying them together, then projecting onto z = 0, gives the combined matrix

[ cosA   sinA.sinB   0  0 ]
[  0       cosB      0  0 ]
[ sinA  -cosA.sinB   0  0 ]
[  0        0        0  1 ]

Apply this transformation to the three unit vectors, namely xT = (1,0,0,1),
yT = (0,1,0,1) and zT = (0,0,1,1), and you get the vectors (cosA, sinB.sinA, 0, 1),
(0, cosB, 0, 1) and (sinA, -sinB.cosA, 0, 1). The magnitudes of the three vectors must
be equal after the transformation, which gives the following equations for the length L:

L = SQRT( cos^2 A + sin^2 A . sin^2 B )
  = SQRT( cos^2 B )
  = SQRT( sin^2 A + sin^2 B . cos^2 A )

These equations can be re-arranged and solved for A and B. Eventually they give
B = 35.26439 degrees, since sin^2 B = 1/3, and A = 45 degrees, since cosA = 1/SQRT(2).
From these values of A and B we can calculate the transformation matrix

[ 0.7071   0.4082  0  0 ]
[ 0.0000   0.8165  0  0 ]
[ 0.7071  -0.4082  0  0 ]
[ 0.0000   0.0000  0  1 ]

which is the transformation matrix for an isometric projection.
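This result can be checked numerically. The sketch below composes the two rotations directly (with one common choice of rotation sign conventions, which does not affect the projected lengths) and verifies that the three unit axis vectors project to equal lengths:

```python
import math

A = math.radians(45.0)                 # rotation about Oy
B = math.asin(1.0 / math.sqrt(3.0))    # rotation about Ox, about 35.26439 degrees

def isometric(x, y, z):
    """Rotate by A about Oy, then by B about Ox, then project onto z = 0."""
    x, z = x * math.cos(A) - z * math.sin(A), x * math.sin(A) + z * math.cos(A)
    y, z = y * math.cos(B) - z * math.sin(B), y * math.sin(B) + z * math.cos(B)
    return x, y                        # dropping z is the projection

# The images of the three unit axis vectors all have the same length.
lengths = [math.hypot(*isometric(*v)) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
print(lengths)
```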
9.5.3 Perspective Projections
Perspective projections are often preferred because they make the more distant
objects appear smaller than those closer to the viewpoint. They involve more calculation
than parallel projections, but are often preferred for their greater realism. Note that a
parallel projection may be considered as a special case of the perspective projection
where the viewpoint is at infinity.
A perspective transformation produces a projection from a viewpoint E onto a
given plane. Because you can always move the axes to ensure that the plane coincides
with z=0 and the normal from the plane through the point E lies along the z-axis, you
may restrict the discussion to this simple case.
Perspective Projection
The above figure shows an example of the perspective projection from the point E
at (0,0,-d) to the z=0 plane.
The projection is obtained by joining E to each vertex in turn and finding the
intersection of this line with the plane z=0. The vertices are then joined by straight lines
to give the wire-frame drawing of the object in the plane.
This method of drawing the object, makes use of some of the well-known
properties of perspective projections, namely that straight lines are projected into straight
lines and facets ( A facet is a closed sequence of co-planar line segments, a polygon in
other words) are projected into facets. Parallel sets of lines may be projected into a set of
parallel lines or into a set of lines meeting at the `vanishing point'.
We may consider the equation of the projection either as the result of the
transformation matrix or derive it from the following diagram.
Consider the diagram first. This shows the y=0 plane with the Ox and Oz axes.
The point of projection is E at the point (0,0,-d) on the z-axis, so the distance OE is of
length d. The point P (with values x and -z) projects into the point P' while the point Q
(with values X and Z) projects into the point Q'.
From the first set of similar triangles, we can see that d/x' = (d-z)/x and so x' = d*x/(d-z).
From the second set of similar triangles, we can see that d/X' = (d+Z)/X and so X' = d*X/(d+Z).
Thus if we are careful to take the correct sign for z in each case, we can quote the general result
x' = d*x/(d+z)
and we have a similar position for the y-coordinate when looking at the x=0 plane.
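The projection equations above translate directly into a few lines of code (a minimal sketch; the function name project is illustrative, not from the text):

```python
def project(x, y, z, d):
    """Project the point (x, y, z) from the viewpoint E = (0, 0, -d)
    onto the plane z = 0 using x' = d*x/(d+z) and y' = d*y/(d+z)."""
    w = d + z          # distance factor; grows with the depth z
    return (d * x / w, d * y / w)

# A point three times as far from E as another is drawn smaller:
print(project(2.0, 2.0, 2.0, 2.0))   # (1.0, 1.0)
print(project(2.0, 2.0, 6.0, 2.0))   # (0.5, 0.5)
```

Note how the same x and y shrink as z grows, which is exactly the divergence of the lines from E described below.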
Now let us turn to the transformation matrix. In this case (writing r = 1/d) it becomes

    | 1  0  0  0 |
    | 0  1  0  0 |
    | 0  0  0  r |
    | 0  0  0  1 |

so that (x, y, z, 1) is mapped to (x, y, 0, rz + 1), which gives the four equations
X = x, Y = y, Z = 0 and H = (z+d)/d.
To get back to the homogeneous coordinates, we need to make H = 1 and so we have to
divide throughout by (z+d)/d. This gives:
X = d*x/(z+d),
Y = d*y/(z+d),
Z = 0 and H = 1
Hence we get the same expression for this derivation.
The closer the point of projection, E, is to the object, the more widely divergent are the
lines from E to the vertices and the greater the change in size of the projected object.
Conversely, the further away we move E, the closer the lines get to a parallel set and the
smaller the change in size of the object. Thus we may think of the parallel projection as
being an extreme case of perspective when the point of projection E is an infinite distance
from both the object and the plane.
This perspective projection is an example of a `single-point perspective' and the
consequence of this is shown in the next figure.
The one set of parallel lines forming edges of the cube meet at the vanishing point, while
the other sets meet at infinity (i.e. they remain parallel). The transformation matrix for
this projection may be written in the form given below, where r = 1/d.
T1 =
    | 1  0  0  0 |
    | 0  1  0  0 |
    | 0  0  0  r |
    | 0  0  0  1 |
When we come to deal with two or three point perspectives, then we have two or
three sets of parallel lines meeting at their respective vanishing points. The matrices for
these are given below:
T2 =
    | 1  0  0  p |
    | 0  1  0  0 |
    | 0  0  0  r |
    | 0  0  0  1 |

T3 =
    | 1  0  0  p |
    | 0  1  0  q |
    | 0  0  0  r |
    | 0  0  0  1 |

(Here p and q play the same role for the additional vanishing points as r does for the first.)
The following figure shows an example of a three-point perspective.
We now have enough information to specify the form of a general transformation matrix.
This divides into four areas, each of which relates to a different form of transformation.
    | 3x3 block:           | 3x1 block T2:    |
    | rotation, scale &    | perspective,     |
    | shear                | usually zero     |
    | 1x3 block:           | 1x1 block:       |
    | translation          | overall scaling  |

T2 is zero for all affine transformations and, when we are dealing with perspective
projections, the number of non-zero elements in T2 will tell us whether it is a one-,
two- or three-point perspective.
9.6 Let us Sum Up
In this lesson we have learnt about three dimensional concepts, object representation
and projections
9.7 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
Define three types of projections
How to draw 3D objects in 2D screen
9.8 Points for Discussion
Discuss the following
How to represent 3D objects
Perspective Projections
9.9 Model answers to “Check your Progress”
In order to check your progress try to answer the following
Issues in three-dimensional Drawings
Parallel Projections
9.10 References
1. Chapter 20, 21, 22 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 8 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 9, 10, 11, 12 of Donald Hearn, M. Pauline Baker, “Computer
Graphics – C Version”, Pearson Education, 2007
4. Chapter 7 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 6 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – principles and practice”, Addison-Wesley, 1997
6. Computer Graphics by Susan Laflin. August 1999.
Aims and Objectives
Intersection Test
Angle Test
Linear Algorithm for Polygon Shading
Floodfill Algorithm for Polygon Shading
Polygon in detail
Plane Equations
Polygon meshes
Curved lines and surfaces
Let us Sum Up
Lesson-end Activities
Points for Discussion
Model answers to “Check your Progress”
10.1 Aims and Objectives
The aim of this lesson is to learn the concept of polygons, curved lines and surfaces.
The objectives of this lesson are to make the student aware of the following concepts
10.2 Introduction
A polygon is an area enclosed by a sequence of linear segments. There is no
restriction on the complexity of the shape produced by these segments, but the last point
must always be connected to the first one giving a closed boundary. This differs from
Polyline which may produce an open curve with the first and last points being any
distance apart. Since a polygon defines an area, it is possible to decide whether any
other point lies inside or outside the polygon, and there are two very simple tests to
determine this.
The word polygon is a combination of two Greek words: "poly" means many and
"gon" means angle. Along with its angles, a polygon also has sides and vertices. "Tri"
means "three," so the simplest polygon is called the triangle, because it has three angles.
It also has three sides and three vertices. A triangle is always coplanar, which is
not true of many of the other polygons.
A regular polygon is a polygon with all angles and all sides congruent, or equal.
Here are some regular polygons.
We can use a formula to find the sum of the interior angles of any polygon. In this
formula, the letter n stands for the number of sides, or angles, that the polygon has.
sum of angles = (n – 2)180°
Let's use the formula to find the sum of the interior angles of a triangle.
Substitute 3 for n. We find that the sum is 180 degrees. This is an important fact
to remember.
sum of angles = (n – 2)180°
= (3 – 2)180° = (1)180° = 180°
To find the sum of the interior angles of a quadrilateral, we can use the formula
again. This time, substitute 4 for n. We find that the sum of the interior angles of a
quadrilateral is 360 degrees.
sum of angles = (n – 2)180°
= (4 – 2)180° = (2)180° = 360°
Polygons can be separated into triangles by drawing all the diagonals that can be
drawn from one single vertex. Let's try it with the quadrilateral shown here. From vertex
A, we can draw only one diagonal, to vertex D. A quadrilateral can therefore be separated
into two triangles.
If you look back at the formula, you'll see that n – 2 gives the number of triangles
in the polygon, and that number is multiplied by 180, the sum of the measures of all the
interior angles in a triangle. Do you see where the "n – 2" comes from? It gives us the
number of triangles in the polygon. How many triangles do you think a 5-sided polygon
will have?
Here's a pentagon, a 5-sided polygon. From vertex A we can draw two diagonals
which separates the pentagon into three triangles. We multiply 3 times 180 degrees to
find the sum of all the interior angles of a pentagon, which is 540 degrees.
sum of angles = (n – 2)180°
= (5 – 2)180° = (3)180° = 540°
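The interior-angle formula is easily checked in code (a trivial sketch; the function name is illustrative):

```python
def interior_angle_sum(n):
    """Sum of the interior angles of an n-sided polygon, in degrees:
    (n - 2) * 180."""
    return (n - 2) * 180

# The three worked examples above: triangle, quadrilateral, pentagon
for n, name in [(3, "triangle"), (4, "quadrilateral"), (5, "pentagon")]:
    print(name, interior_angle_sum(n))   # 180, 360, 540
```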
The GKS Fillarea function has the same parameters as polyline, but will always
produce a closed polygon. The filling of this polygon depends on the setting of the
appropriate GKS parameter. The FillAreaStyles are hollow, solid, pattern and hatch. The
hollow style produces a closed polygon with no filling. Solid fills with a solid colour.
Pattern uses whatever patterns are offered by the particular system. Hatch will fill it with
lines in one or two directions. Algorithms for hatching and cross-hatching are described
in this section.
10.2.1 Intersection Test
Consider the situation illustrated in the figure below. To determine which of the
points Pi, (i=1,2 or 3) lie inside the polygon, it is necessary to draw a line from Pi in
some direction and project it beyond the area covered by the polygon. If the number of
intersections of the line from Pi with the sides of the polygon is an even number, then the
point lies outside the polygon. (Note that the line must start at Pi and may not extend
back behind Pi - it is a half-infinite line from Pi). This means that the triangle to the
right of the line Q4 Q5 in the figure counts as outside the polygon. If you wished to make
it doubly inside, you would have to introduce a parameter equal to the minimum number
of intersections of all half-infinite lines through Pi.
Intersection Test
It is then easy to see that the figure gives values of 0, 2 or 4 for lines through P1
with a minimum of 0. For P2, there are values of 1 or 3 with the minimum=1. All lines
from P3 have a value 2.
One possible problem arises when lines pass through a vertex. The line P2 Q5
must count the vertex Q5 as two intersections while P2 Q2 must only count Q2 once. The
easy way to avoid this problem is to omit all lines which pass through vertices. This still
leaves plenty of lines to test the position.
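The intersection test can be sketched as follows (inside_even_odd is an invented name; instead of omitting lines through vertices, this sketch uses the common half-open comparison on y, so each vertex is counted exactly once and the doubly-inside case is folded into the parity):

```python
def inside_even_odd(px, py, poly):
    """Even-odd (intersection-count) test: cast a half-infinite ray from
    (px, py) towards +x and count its crossings with the polygon's edges.
    An odd count means the point is inside.  poly is a list of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Half-open test: the edge straddles the ray's level, and a shared
        # vertex is counted for exactly one of its two edges.
        if (y1 > py) != (y2 > py):
            # x-coordinate where this edge crosses the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside_even_odd(2, 2, square))   # True  (inside)
print(inside_even_odd(5, 2, square))   # False (outside)
```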
10.2.2 Angle Test
Here the point Pi is joined to all the vertices and the sum of the angles Qk Pi
Q(k+1) is calculated. If counter-clockwise angles are positive and clockwise ones are
negative, then for a point Pi outside the polygon there will be some positive angles and
some negative and the resulting sum will be zero.
For a point Pi inside the polygon, the angles will be either all positive or all
negative and the sum will have a magnitude of 360 degrees. The next figure illustrates
this for the same polygon as the previous figure and a point P inside the triangle (but
outside the polygon).
Angle Test
Here the sum of angles is 2 * 360 degrees, thus implying that it is doubly inside
the polygon. To give consistency with the Intersection Test, this test must be carefully
worded. Having evaluated the sum of angles, it will be n * 360 degrees. If n is an even
number, then the point P lies outside the polygon, while if n is an odd number, then P lies
inside the polygon.
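The angle test can be sketched by summing signed angles with atan2 (winding_number is an invented name; it returns n, the number of full 360-degree turns):

```python
import math

def winding_number(px, py, poly):
    """Angle test: sum the signed angles Qk-P-Q(k+1) around P = (px, py),
    counter-clockwise positive.  The sum is n*360 degrees; this returns n,
    so n = 0 means outside and n odd means inside."""
    total = 0.0
    m = len(poly)
    for i in range(m):
        x1, y1 = poly[i][0] - px, poly[i][1] - py
        x2, y2 = poly[(i + 1) % m][0] - px, poly[(i + 1) % m][1] - py
        # Signed angle between the two vectors from P to consecutive vertices
        total += math.atan2(x1 * y2 - x2 * y1, x1 * x2 + y1 * y2)
    return round(total / (2 * math.pi))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(winding_number(2, 2, square))   # 1 -> inside
print(winding_number(5, 2, square))   # 0 -> outside
```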
Once a unique method of deciding whether a point is inside or outside a polygon
has been agreed, it then becomes possible to derive algorithms to shade the inside of a
polygon. The two main methods here are linear, which is similar to shading the polygon
by drawing parallel lines with a crayon, and floodfill, which is similar to starting with a
blob of wet paint at some interior point and spreading it out to fill the polygon.
10.3 Linear Algorithm for Polygon Shading
Hatching a Triangle
This involves shading the polygon by drawing a series of parallel lines throughout
the interior. The lines may be close enough to touch, giving a solid fill or they may be a
noticeable distance apart, giving a hatched fill. If you have two sets of hatched lines at
right angles to each other, this gives a "cross-hatched" result. The figure shows a triangle
in the process of being hatch-filled with horizontal lines. For each horizontal scan-line,
the following process must be applied.
1. Assume (or check) that the edge of the screen is outside the polygon.
2. Calculate the intersections Pi of this horizontal line with each edge of the polygon and
store the coordinates of these intersections in an array.
3. Sort them into increasing order of one coordinate.
4. Draw the segments of the hatch-line from P1 to P2, P3 to P4 and so on. Do not draw
the intervening segments.
5. Repeat this process for each scan-line.
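Steps 2 to 4 can be sketched for a single scan-line as follows (hatch_spans is an invented name; the vertex special cases the text goes on to discuss are sidestepped here by a half-open comparison on y):

```python
def hatch_spans(poly, y):
    """One scan-line of the linear shading algorithm: intersect the
    horizontal line at height y with every polygon edge (step 2), sort
    the intersection x-values (step 3) and pair them into the spans
    P1-P2, P3-P4, ... to be drawn (step 4).  poly is a list of (x, y)."""
    xs = []
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):               # edge crosses the scan-line
            xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
    xs.sort()
    return list(zip(xs[0::2], xs[1::2]))       # drawn segments only

triangle = [(0, 0), (8, 0), (4, 6)]
print(hatch_spans(triangle, 3.0))   # [(2.0, 6.0)]
```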
Problems at Vertices
(Note that this does not rely on the lines being horizontal, although scan-lines
parallel to one of the axes make the calculation of the intersection points very much
easier.) The figure shows one problem with this approach. The scan-line s1 will work
correctly since it has four distinct intersections, but the scan-line s2 has two coincident
intersection points at the vertex Q6. This is detectable since the number of intersection
points will be an odd number.
Looking at the vertices, you can see that moving from Q5 to Q6, y decreases as x
decreases and from Q6 to Q1, y increases as x decreases. In this case, the hatch lines have
the equation y=constant and so this reversal in the direction of y indicates a vertex which
must be included twice and consequently known as a Type 2 vertex. Q1 on the other hand
is a Type 1 vertex since y continues to increase when going from Q6 through Q1 to Q2. If
the shading uses vertical lines (x = constant) then it is necessary to study the behaviour of
x to determine the types of vertex.
If you have an odd number of intersections and only one of them coincides with a
vertex, then it is usually safe to assume that this value needs to be included twice. This
may save some time in your algorithm, and will shade most polygons successfully. The
full method, testing the type of vertex whenever a vertex is included in the intersection
list, will successfully shade even the few cases when two Type 2 vertices appear in the
intersection list, thus giving an even number of points and at least one segment incorrectly
drawn.
The other problem case occurs when one of the sides of the polygon is parallel to
the direction of shading. Mathematically this has an infinite number of intersection
points, but computationally only the two end points should be entered in the array so that
the whole line is shaded as part of the interior.
10.4 Floodfill Algorithm for Polygon Shading
This works in terms of pixels and is applied after the lines forming the boundary
have been converted into pixels by a DDA algorithm. The background of the screen has
one pixel-value, called "old-value" and the points forming the boundary have another,
called "edge-value". The aim of the algorithm is to change all interior pixels from "oldvalue" to "new-value". (e.g. from black to red) Assume the following software is
a) A function Read-Pixel(x,y) which takes device coordinates (x,y) and returns the value
of the pixel at this position.
b) A routine Write-Pixel(x,y,p) which sets the new value p to the pixel at the position
(x,y) in device coordinates.
Then, starting at the designated seed point, the algorithm moves out from it in all
directions, stopping when an "edge value" is found. Each pixel with value "old value" is
changed to "new-value". The recursive method stops when all directions have come to an
"edge value".
Because this method is applied to pixels on the screen or in the display buffer, it
may run into problems arising from the quantization into pixels of a mathematical line
which is infinitely thin and recorded to the full accuracy of a floating-point number
within the computer.
Intersecting Lines
One such problem concerns the method of identifying an intersection of two lines.
If you calculate it mathematically, then the equation will give a correct result unless the
lines are parallel or nearly parallel. On the other hand, on some hardware it may be
quicker to check whether the two lines have any pixels in common and this can be
dangerously misleading in some cases. The previous figure shows two lines, one at an
angle of 45° and the other at an angle of 135° which cross near the centre of the diagram
without having any pixels in common. This type of problem is unlikely to affect the
Floodfill routine given above, since the scan-lines move parallel to the x and y axes and
the DDA algorithm described earlier ensures that every line has at least one pixel
illuminated on each scan-line.
However the next figure illustrates another possible problem. Note that in a
complex polygon with sides crossing each other, you will need one seed point in each
section of the interior to floodfill the whole area. This also occurs in a polygon as shown
below, even though it does not have any of its sides, indicated by the lines in the
figure, crossing each other.
Quantisation of Pixels
Mathematically it is all one contiguous area and any tests on the equations of the
sides for intersections will confirm this. However two of the lines are nearly parallel and
very close together and consequently although both lines are quite separate and distinct in
their mathematical equations, they lead to the same row of pixels after quantisation. The
scale of this figure has been enlarged so that the quantisation into pixels appears very
coarse in order to emphasize this problem. This polygon will require two seed points in
order to shade it completely using the Floodfill algorithm. A little thought will allow you
to produce many other similar examples and these can readily be studied by drawing the
polygons on squared paper and then marking in the pixel patterns which result.
This approach remains of interest in spite of its problems because some terminals
provide a very fast hardware polygon fill from a given seed point. Similarly, some
microcomputers provide a function to fill the interior of a triangle. To use this facility,
you must first split the polygon into triangles and while this is easy for a convex polygon
(one whose internal angles are all less than 180°) it is very much more difficult for the
general case where you may have sides crossing each other and holes inside the polygon.
10.5 Polygon in detail
What is a Polygon?
A closed plane figure made up of several line segments that are joined together. The sides
do not cross each other. Exactly two sides meet at every vertex.
Types of Polygons
Regular - all angles are equal and all sides are the same length. Regular polygons are
both equiangular and equilateral.
Equiangular - all angles are equal.
Equilateral - all sides are the same length.
Convex - a straight line drawn through a convex polygon crosses at most
two sides. Every interior angle is less than 180°.
Concave - you can draw at least one straight line through a concave
polygon that crosses more than two sides. At least one interior angle is
more than 180°.
Polygon Formulas
(N = number of sides and S = length from center to a corner)
Area of a regular polygon = (1/2) N sin(360°/N) S²
Sum of the interior angles of a polygon = (N - 2) × 180°
Number of diagonals in a polygon = (1/2) N(N - 3)
Number of triangles (when you draw all the diagonals from one vertex) in a
polygon = N - 2
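These formulas translate directly into code (a small sketch; the function names are illustrative):

```python
import math

def regular_polygon_area(n, s):
    """Area of a regular polygon with n sides, where s is the length from
    the center to a corner: (1/2) n sin(360°/n) s^2."""
    return 0.5 * n * math.sin(math.radians(360.0 / n)) * s * s

def diagonals(n):
    """Number of diagonals of an n-sided polygon: n(n-3)/2."""
    return n * (n - 3) // 2

print(regular_polygon_area(4, 1))   # square with center-to-corner 1: area 2
print(diagonals(5))                 # a pentagon has 5 diagonals
```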
Polygon Parts
Side - one of the line segments that make up the polygon.
Vertex - point where two sides meet. Two or more
of these points are called vertices.
Diagonal - a line connecting two vertices that isn't
a side.
Interior Angle - Angle formed by two adjacent
sides inside the polygon.
Exterior Angle - Angle formed by two adjacent
sides outside the polygon.
Special Polygons
Special Quadrilaterals - square, rhombus, parallelogram, rectangle, and the trapezoid.
Special Triangles - right, equilateral, isosceles, scalene, acute, obtuse.
Names
The generally accepted names are listed below; names for other polygons have also been
proposed.
Sides  Names
9      Nonagon, Enneagon
11     Undecagon, Hendecagon
13     Tridecagon, Triskaidecagon
14     Tetradecagon, Tetrakaidecagon
15     Pentadecagon, Pentakaidecagon
16     Hexadecagon, Hexakaidecagon
17     Heptadecagon, Heptakaidecagon
18     Octadecagon, Octakaidecagon
19     Enneadecagon, Enneakaidecagon
100    Hectogon, Hecatontagon
To construct a name, combine the prefix + suffix.
Sides Prefix
20    Icosikai...
30    Triacontakai...
40    Tetracontakai...
Sides Suffix
+1    ...henagon
+2    ...digon
+3    ...trigon
+4    ...tetragon
+5    ...pentagon
+6    ...hexagon
+7    ...heptagon
+8    ...octagon
+9    ...enneagon
46 sided polygon - Tetracontakaihexagon
28 sided polygon - Icosikaioctagon
However, many people use the form n-gon, as in 46-gon or 28-gon, instead of these names.
10.6 Plane Equations
This is another useful way to describe planes. It is known as the cartesian form
of the equation of a plane because it is in terms of the cartesian coordinates x, y and z.
The working below follows on from the pages in this section on finding vector equations
of planes and equations of planes using normal vectors.
The form Ax + By + Cz = D is particularly useful because we can arrange things so that
D gives the perpendicular distance from the origin to the plane.
To get this nice result, we need to work with the unit normal vector. This is the vector
of unit length which is normal to the surface of the plane. (There are two choices here,
depending on which direction you choose, but one is just minus the other).
I'll call this unit normal vector n.
Next we see how using n will give us D, the perpendicular distance from the origin to the
plane. In the picture below, P is any point in the plane. It has position vector r from the
origin O.
Now we work out the dot product of r and n. This gives us r.n = |r||n|cos A, where A is
the angle between r and n.
But |n| = 1 so we have r.n = |r|cos A = D. This will be true wherever P lies in the plane.
Next, we split both r and n into their components.
We write
r = xi + yj + zk and n = n1i + n2j + n3k.
r.n = (xi + yj + zk) . (n1i + n2j + n3k) = D
r.n = xn1 + yn2 + zn3 = D.
We see that n1, n2 and n3 (the components of the unit surface normal vector) give us the
A, B and C in the equation Ax + By + Cz = D.
A numerical example
I've put this in here so that you can see everything actually happening and see how it ties
back to the earlier pages in this section.
We start with the plane I show below.
We'll let
s = i - 6j + 2k and t = 2i - 2j - k
We'll take m, the position vector of the known point M in the plane, to be
m = 2i + 3j + 5k.
P is any point in the plane, with OP = r = xi + yj + zk.
First, we find N, a normal vector to the plane, by working out the cross product of s and t.
This gives s x t = 10i + 5j + 10k = N.
The length of this vector is given by the square root of (10² + 5² + 10²) = √225 = 15.
So the unit normal vector, n, is given by
n = 1/15(10i + 5j + 10k) = 2/3i +1/3j + 2/3k.
Now we use n.r = n.m = D to get the equation of the plane.
This gives us
(2/3i +1/3j + 2/3k).(xi + yj + zk) = (2/3i +1/3j + 2/3k).(2i + 3j + 5k)
or 2/3x + 1/3y + 2/3z = 4/3 + 3/3 + 10/3 = 17/3.
The perpendicular distance of this plane from the origin is 17/3 units.
So what would have happened if we had found the equation of the plane using the first
normal vector we found?
Using N.r = N.m gives
(10i + 5j + 10k).(xi + yj + zk) = (10i + 5j + 10k).(2i + 3j + 5k)
or 10x + 5y + 10z = 20 + 15 + 50 = 85.
It is exactly the same equation as the one we found above except that it is
multiplied through by a factor of 15, and 85 gives us 15 times the perpendicular distance
of the origin from the plane.
Also, are you confident that you will get the same equation for the plane if you start out
with the position vector of a different known point in it?
The point L also lies in this plane. Its position vector l is given by l = 7i - 7j + 5k.
Check that working with l instead of m does give you the same equation for the plane.
Geometrically, you can see that this will be so.
L and M are both just possible positions of P, so that both n.l and n.m give the distance D.
Try one for yourself!
The two vectors s = 4i + 3k and t = 8i - j + 3k lie in plane Q.
The point M also lies in Q and its position vector from the origin is given by
m = 2i + 4j + 7k.
Show that the perpendicular distance of the origin to this plane is 2 units and find its
equation.
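The cross-product recipe used throughout this section can be checked numerically against this exercise (plane_from is an invented helper, and the tuple representation of vectors is an assumption of this sketch):

```python
import math

def plane_from(s, t, m):
    """Given two vectors s and t lying in the plane and the position
    vector m of a known point in it, return (A, B, C, D) such that
    Ax + By + Cz = D, with D the perpendicular distance from the origin."""
    # N = s x t, a (not necessarily unit) normal to the plane
    N = (s[1] * t[2] - s[2] * t[1],
         s[2] * t[0] - s[0] * t[2],
         s[0] * t[1] - s[1] * t[0])
    length = math.sqrt(sum(c * c for c in N))
    n = tuple(c / length for c in N)            # unit normal vector
    D = sum(ni * mi for ni, mi in zip(n, m))    # D = n . m
    return (*n, D)

# The exercise's data: s = 4i + 3k, t = 8i - j + 3k, m = 2i + 4j + 7k
A, B, C, D = plane_from((4, 0, 3), (8, -1, 3), (2, 4, 7))
print(D)   # 2.0 -- the plane is indeed 2 units from the origin
```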
The general case
This is how the working goes with letters taking the place of the numbers we have used
in the numerical example.
m is the position vector of the known point in the plane.
n is the unit surface normal to the plane.
We'll let m = x0i + y0j + z0k and n = Ai + Bj + Ck.
The position vector of the general point P in the plane is given by
r = xi + yj + zk where the values of x, y and z vary according to the particular P chosen.
Now we use n.r = n.m = D to write down the equation of the plane. This gives us
(Ai + Bj + Ck) . (xi + yj + zk) = (Ai + Bj + Ck) . (x0i + y0j + z0k) = D
so Ax + By + Cz = Ax0 + By0 + Cz0 = D
or, if you prefer, you can write
A(x-x0) + B(y-y0) + C(z-z0) = 0.
If you have found a normal vector which is not of unit length, you will first need to scale
it down.
Suppose you have found N = N1i + N2j + N3k.
Then the length of N is given by |N| = √(N1² + N2² + N3²),
and n, the unit normal vector, is given by n = N/|N|.
Now, putting n = Ai + Bj + Ck, we have A = N1/|N|, B = N2/|N| and C = N3/|N|.
10.7 Polygon meshes
A polygon mesh or unstructured grid is a collection of vertices and polygons that
defines the shape of a polyhedral object in 3D computer graphics.
Meshes usually consist of triangles, quadrilaterals or other simple convex
polygons, since this simplifies rendering, but they can also contain objects made of
general polygons with optional holes.
Example of a triangle mesh representing a dolphin.
Examples of internal representations of an unstructured grid:
Simple list of vertices with a list of indices describing which vertices are linked to
form polygons; additional information can describe a list of holes
List of vertices + list of edges (pairs of indices) + list of polygons that link edges
Winged edge data structure
The choice of the data structure is governed by the application: it's easier to deal with
triangles than general polygons, especially in computational geometry. For optimized
algorithms it is necessary to have fast access to topological information such as edges
or neighboring faces; this requires more complex structures such as the winged-edge
data structure.
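A minimal sketch of the first representation above, a vertex list plus a list of indices (all names are illustrative, not from any particular library):

```python
# Shared vertices are stored once; faces refer to them by index.
vertices = [(0.0, 0.0, 0.0),   # 0
            (1.0, 0.0, 0.0),   # 1
            (1.0, 1.0, 0.0),   # 2
            (0.0, 1.0, 0.0)]   # 3

# Two triangles sharing the diagonal 0-2 make up a unit square.
faces = [(0, 1, 2), (0, 2, 3)]

def face_coords(face):
    """Resolve a face's vertex indices into coordinates for rendering."""
    return [vertices[i] for i in face]

print(face_coords(faces[0]))   # the three corners of the first triangle
```

Moving vertex 2 automatically updates both triangles, which is the point of sharing vertices through indices.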
10.8 Curved lines and surfaces
Curved surfaces are one of the most popular ways of implementing scalable
geometry. Games applying curved surfaces look fantastic. UNREAL's characters looked
smooth whether they were a hundred yards away or coming down on top of you. QUAKE
3: ARENA screen shots show organic levels with stunning smooth, curved walls and
tubes. There are a number of benefits to using curved surfaces. Implementations can be
very fast, and the space required to store the curved surfaces is generally much smaller
than the space required to store either a number of LOD models or a single very high
detail model.
The industry demands tools that can make creation and manipulation of curves
more intuitive. A Bezier curve is a good starting point, because it can be represented and
understood with a fair degree of ease. To be more specific, we choose cubic Bezier
curves and bicubic Bezier patches for the reason of simplicity.
Bezier Curves
A cubic Bezier curve is simply described by four ordered control points, p0, p1,
p2, and p3. It is easy enough to say that the curve should "bend towards" the points. It has
three general properties:
1. The curve interpolates the endpoints: we want the curve to start at p0 and end at p3.
2. The control points have local control: we'd like the curve near a control point to move
when we move the control point, but have the rest of the curve not move as much.
3. The curve stays within the convex hull of the control points. It can be culled against
quickly for visibility culling or hit testing.
A set of functions, called the Bernstein basis functions, satisfies the three general
properties of cubic Bezier curves:

    f(u) = sum for i = 0 to n of C(n,i) * u^i * (1-u)^(n-i) * pi

If we were considering general Bezier curves, we'd have to calculate C(n,i), "n choose i".
Since we are only considering cubic curves, though, n = 3, and i is in the range [0,3].
Then we further note that n choose i is the ith element of the nth row of Pascal's
triangle, {1,3,3,1}. This value is hardcoded rather than computed in the demo program.
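The cubic case can be sketched as follows, with the {1,3,3,1} coefficients hardcoded as described (function names are illustrative):

```python
def bernstein3(i, u):
    """Cubic Bernstein basis: B_i(u) = C(3,i) * u^i * (1-u)^(3-i),
    with C(3,i) taken from Pascal's triangle row {1, 3, 3, 1}."""
    coeff = (1, 3, 3, 1)[i]
    return coeff * (u ** i) * ((1 - u) ** (3 - i))

def bezier(points, u):
    """Evaluate the cubic Bezier curve defined by four (x, y) control
    points at parameter u in [0, 1]."""
    x = sum(bernstein3(i, u) * points[i][0] for i in range(4))
    y = sum(bernstein3(i, u) * points[i][1] for i in range(4))
    return (x, y)

ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
print(bezier(ctrl, 0.0))   # (0.0, 0.0) -- the curve interpolates p0
print(bezier(ctrl, 1.0))   # (4.0, 0.0) -- the curve interpolates p3
```

Evaluating at a run of u values between 0 and 1 and joining the results with straight lines draws the curve.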
Bezier Patches
Since a Bezier curve was a function of one variable, f(u), it's logical that a surface
would be a function of two variables, f(u,v). Following this logic, since a Bezier curve
had a one-dimensional array of control points, it makes sense that a patch would have a
two-dimensional array of control points. The phrase "bicubic" means that the surface is a
cubic function in two variables - it is cubic along u and also along v. Since a cubic Bezier
curve has a 1x4 array of control points, a bicubic Bezier patch has a 4x4 array of control
points.
To extend the original Bernstein basis function into two dimensions, we evaluate
the influence of all 16 control points:

    f(u,v) = sum for i = 0 to 3 and j = 0 to 3 of B_i(u) * B_j(v) * pij

where B_i(u) and B_j(v) are the cubic Bernstein basis functions.
The extension from Bezier curves to patches still satisfies the three properties:
1. The patch interpolates p00, p03, p30, and p33 as endpoints.
2. Control points have local control: moving a point over the center of the patch will most
strongly affect the surface near that point.
3. The patch remains within the convex hull of its control points.
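Patch evaluation, summing the influence of all 16 control points, can be sketched as follows (names are illustrative):

```python
def bezier_patch(ctrl, u, v):
    """Evaluate a bicubic Bezier patch at (u, v).  ctrl is a 4x4 array of
    (x, y, z) control points; each point p_ij is weighted by
    B_i(u) * B_j(v), the product of two cubic Bernstein basis values."""
    def b3(i, t):   # cubic Bernstein basis, coefficients {1, 3, 3, 1}
        return (1, 3, 3, 1)[i] * t ** i * (1 - t) ** (3 - i)
    point = [0.0, 0.0, 0.0]
    for i in range(4):
        for j in range(4):
            w = b3(i, u) * b3(j, v)
            for k in range(3):
                point[k] += w * ctrl[i][j][k]
    return tuple(point)

# A flat 4x4 grid of control points at z = 0 gives back a flat patch
flat = [[(i, j, 0.0) for j in range(4)] for i in range(4)]
print(bezier_patch(flat, 0.0, 0.0))   # (0.0, 0.0, 0.0) -- corner p00
print(bezier_patch(flat, 1.0, 1.0))   # (3.0, 3.0, 0.0) -- corner p33
```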
10.9 Let us Sum Up
In this lesson we have learnt about
a) Polygon surfaces
b) Curved lines and surfaces
10.10 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Intersection Test
b) Angle Test
10.11 Points for Discussion
Discuss the following
a) Importance of Polygon surface in Computer Graphics
b) Discuss about drawing curved lines
10.12 Model answers to “Check your Progress”
In order to check your progress try to answer the following questions
a) Plane Equations
b) Polygon meshes
10.13 References
1. Chapter 21 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 11 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 10 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
4. Chapter 12 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 11 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – principles and practice”, Addison-Wesley, 1997
11.1 Aims and Objectives
11.2 Introduction
11.3 Definition of an H. S. R. Algorithm
11.4 Taxonomy of Hidden Surface Removal Algorithms
11.5 Back face detection
11.6 Depth-Buffer Method
11.7 Let us Sum Up
11.8 Lesson-end Activities
11.9 Points for Discussion
11.10 Model answers to “Check your Progress”
11.11 References
11.1 Aims and Objectives
The aim of this lesson is to learn the concept of surface detection methods
The objectives of this lesson are to make the student aware of the following concepts
a) classification of surface detection algorithms
b) Back face detection and
c) Depth buffer algorithms
11.2 Introduction
Computer Graphics attempts to represent objects in the general three-dimensional
universe. Most objects are not transparent and so we are interested in their outer surfaces,
which have properties such as shape, colour and texture which affect the graphical
representation. A wire-frame drawing of a solid object is less realistic because it includes
parts of the object which are hidden in reality, and this generates a need for some form of
hidden-line or hidden-surface removal. It is important to realise that there is no single
algorithm which works equally well in all cases. Most algorithms achieve greater speed
and efficiency by taking the format of the data into account and this automatically
restricts their use.
Since the amount of data needed to store the position of every point on the surface
of even quite a small object is impossibly large, we have to make some simplifying
assumptions. The choice of these simplifications will decide the form of data structure
used to store the objects and will also restrict the choice of hidden-surface algorithm
available. A typical set of simplifying assumptions might be those given below.
a) Divide the surface of the object into a number of faces surrounded by "boundary
curves" or "contours". The contours may be any closed curves and the faces may be
curved, so some means of specifying the equations of the surfaces is needed.
b) Restrict the description to allow only flat or planar faces. The contours must now be
closed polygons in the plane. (Since two planes must intersect in a straight line, an object
without any holes must have its edge curves made up of straight lines.)
c) Subdivide the polygons until they are all convex.
d) Subdivide the polygons until the object is described in terms of triangular facets.
At each simplification, the amount of data needed to describe one face is reduced.
This should also reduce the time taken for the related calculations. However some objects
require many more faces to give an acceptable approximation to the object. A simple
example of an object which requires very many triangular facets to give an acceptable
approximation is a sphere.
11.3 Definition of an H. S. R. Algorithm
One of the earliest attempts to produce a rigorous definition of these algorithms
occurs in the text by Giloi. Since it is still relevant, let us consider it: a hidden-surface
algorithm is said to consist of five components, namely:
The set of objects in 3D space (input data).
The set of objects in 2D space (results).
The set of intermediate representations (workspace). Some algorithms require
little or no intermediate storage. These representations, if used, may be in either
2D or 3D.
The set of transition functions, usually implemented as subroutines or
procedures. The following five transition functions will be required in some combination:
PM = Projective Mapping.
IS = Inter-Section Function.
CT = Containment Test.
DT = Depth Test.
VT = Visibility Test.
Strategy Function or Overall Method. This specifies the order in which the
transition functions are applied to the input data to produce the results and it
may also include instructions to display the results.
In fact this may be an oversimplification, since we frequently find that the
transition functions are used in combination with each other. For example, to decide
whether one point in 3D space is hidden by another, it will usually be necessary to apply
a combination of both Projective Mapping and Depth Test. Projective Mapping may be
either Perspective or Orthogonal. Consider the situation where we have a view point V on
one side of the objects to be drawn and are projecting them on to a plane on the far side
of the objects. Now assume we have rotated the entire figure so that the plane is z=0, the
viewpoint is on the z-axis (coordinates 0,0,Z), and all other z values lie between 0 and Z.
Now let us consider two overlapping graphical elements, to determine which of
the two is closer to the viewpoint and hence which covers the other when the diagram is
drawn. In the following discussion, it is assumed that the graphical elements are plane
polygonal facets and each vertex of one facet is tested against all vertices of the other.
a) Perspective projection.
Consider any two points P1 and P2. P2 is hidden by P1 if and only if
(i) V, P1 and P2 are collinear, and
(ii) P1 is closer to V, i.e. VP1 < VP2.
Consider the test-point P1 (usually a vertex of the first facet F1) and facet F2. Connect
the viewpoint V and test-point P1 and calculate P2, the intersection of the line VP1
(continued if necessary) and the facet F2. Calculate the lengths of VP1 and VP2.
Then if VP2 is greater than VP1, P1 is not hidden by F2.
If VP1 = VP2, then the two points coincide and we may choose which of them to
consider visible.
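The two conditions of the perspective test can be sketched in Python as follows (an illustrative sketch, not code from the text; the names and the floating-point tolerance eps are our own assumptions):

```python
import math

def is_hidden(V, P1, P2, eps=1e-9):
    """Return True if P2 is hidden by P1 as seen from viewpoint V.

    P2 is hidden iff (i) V, P1, P2 are collinear and (ii) P1 is
    strictly closer to V. If VP1 == VP2 the points coincide and we
    choose to report P2 as not hidden.
    """
    v1 = [p - v for p, v in zip(P1, V)]   # vector V -> P1
    v2 = [p - v for p, v in zip(P2, V)]   # vector V -> P2
    # Collinearity: the cross product of v1 and v2 must be (near) zero.
    cross = (v1[1] * v2[2] - v1[2] * v2[1],
             v1[2] * v2[0] - v1[0] * v2[2],
             v1[0] * v2[1] - v1[1] * v2[0])
    if max(abs(c) for c in cross) > eps:
        return False
    # Same side of V (positive dot product) and strictly closer: VP1 < VP2.
    d1 = math.sqrt(sum(c * c for c in v1))
    d2 = math.sqrt(sum(c * c for c in v2))
    return sum(a * b for a, b in zip(v1, v2)) > 0 and d1 < d2
```

In practice the test-point P1 would be a vertex of facet F1 and P2 the computed intersection of the line VP1 with facet F2, as described above.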
b) Orthogonal Projection.
Again the viewer is at the height V and looking down on the plane z=0, but now all the
lines are parallel. Indeed for an orthogonal projection, they are all perpendicular to the
plane z=0 and so parallel to the z-axis. So the point P2 is hidden by P1 if and only if
(i) P1 and P2 have the same x and y coordinates, and
(ii) the z-coordinate P1(z) > P2(z).
This is equivalent to moving the point V a very large distance from the plane z=0 ("V
tends to infinity").
Consider the projection onto the plane z=0 and use the values of z to assign priorities to
the faces. In this case, we wish to compare facets F1 and F2. After projection onto the
plane z=0, F1 is projected on to S1 and F2 is projected onto S2. The intersection of S1
and S2 is called S. If S is empty, then the projections do not overlap and the priority is immaterial.
Any point (x,y,0), lying in S, corresponds to the point (x,y,z1) in F1 and the point
(x,y,z2) in F2. If z1 is greater than z2 for all these points, then "F1 has priority over F2".
However if this is true for some points in S and false for others, then the two facets
intersect each other and we cannot assign priorities. It will be necessary to calculate the
line of intersection of the two facets and split one of them along this line. If F1 is split
into F1a and F1b, then we can number them so that F1a has priority over F2 and F2 has
priority over F1b.
Using these priorities, it is possible to get a unique ordering, showing which
facets lie in front of which others and use this to provide the correct output to draw the
visible facets.
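For planar facets the comparison of z1 and z2 at a point of S reduces to evaluating each facet's plane equation. The sketch below is illustrative (our own names; planes are given as coefficient tuples (A, B, C, D) of A·x + B·y + C·z + D = 0, and larger z is taken to mean closer to the viewer, as in the discussion above):

```python
def z_on_plane(plane, x, y):
    """z of the plane A*x + B*y + C*z + D = 0 at (x, y); C must be non-zero,
    i.e. the facet must not be edge-on to the z-axis."""
    A, B, C, D = plane
    return -(A * x + B * y + D) / C

def priority(plane1, plane2, point):
    """Compare facets F1 and F2 at an (x, y) point of their overlap S.
    Returns 'F1' if F1 has priority over F2 there, 'F2' if the reverse,
    and 'tie' if the facets touch at that point."""
    z1 = z_on_plane(plane1, *point)
    z2 = z_on_plane(plane2, *point)
    if z1 > z2:
        return 'F1'
    if z1 < z2:
        return 'F2'
    return 'tie'
```

If this comparison gives different answers at different points of S, the facets intersect and one must be split along the line of intersection, as the text explains.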
Intersection Function
Note that in each of these cases, it was necessary to discuss the intersection of the
projection of a vertex of one facet and the projection of the other facet. This may be dealt
with by use of the Intersection Function, which defines how to calculate the intersection
of two graphic elements. Other examples are the intersection of two lines, the intersection
of two segments (lines of fixed length) or the intersection of a line and a plane. In this
actual case, it is probably more relevant to use a Containment Test.
Containment Test
The Containment Test considers the question "Does the point P lie inside the
polygon F ?" and returns the result "true" or "false". It is usually applied after projection
into two dimensions and so the methods discussed in that section are immediately
applicable. Either the angle test or the intersection test may be used.
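The intersection (crossing-number) form of the Containment Test can be sketched as follows (an illustrative sketch with our own names, assuming the polygon is given as an ordered list of 2-D vertices after projection):

```python
def contains(polygon, p):
    """Even-odd (crossing-number) containment test: cast a ray from p in the
    +x direction and count how many polygon edges it crosses; an odd count
    means p lies inside. `polygon` is a list of (x, y) vertices in order."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does edge (x1,y1)-(x2,y2) straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside
```

The angle test (summing the angles subtended at p by each edge) gives the same answer for simple polygons but involves trigonometric functions, so the crossing test is usually preferred.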
Visibility Tests
So far, we have discussed the special case of one or more objects defined as plane
facets and considered whether or not one of the facets obscures another. This is a very
slow process, especially when all of the very large number of facets have to be compared
with all the others. There is one very simple consideration which will about halve the
number of facets to be considered. If we assume that the facets form the outer surfaces of
one or more solid objects, then those facets on the back of the object (relative to the
viewing position) cannot be seen and so a test to identify these will remove them from the
testing early in the process.
This uses a "Visibility Test" which is applied to solid objects, to distinguish
between the potentially visible "front faces" and the invisible "back faces". If our picture
consists of a single convex object, then all the potentially visible faces are visible and the
object may be drawn very quickly and easily.
If a perspective projection is being used, then the "line of sight" is the line from
the viewpoint V to the point on the surface. However, if a parallel projection is being
used, then the relevant line is one parallel to the viewing direction which passes through
the point on the surface. In either case, let this direction be denoted by the vector d.
The surface normal, denoted by n, is the outward-pointing normal to the surface
at this point. To decide whether the plane is potentially visible or
always invisible, it is necessary to consider the angle between d and n and we may use
the dot product d.n to decide this. Let A be the angle between these vectors. If A is
greater than 90 degrees, then the surface is potentially visible, otherwise the surface is
invisible. If both vectors are scaled to have unit length, so that we are dealing with
direction cosines, then the dot product gives the value of cosA. Thus the face is
potentially visible if the dot product is negative.
In the above figure, the parallel projection has d = [1,0,0] and face A has outward
normal n1 = [-1,2,0] while face B has outward normal n2 = [1,0,0]. The dot product of
the visible face A has value -1 and the dot product of the invisible face has the value 1.
When the dot product is zero, the face is "edge-on" to the viewing direction and may be treated as invisible.
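The dot-product visibility test, using the numbers quoted from the figure, can be written directly (illustrative code; the function name is ours):

```python
def is_front_face(d, n):
    """A face is potentially visible (a front face) when the viewing
    direction d and the outward normal n point against each other,
    i.e. when the dot product d . n is negative."""
    dot = sum(a * b for a, b in zip(d, n))
    return dot < 0

# Figures from the text: parallel projection along d = [1, 0, 0];
# face A has outward normal [-1, 2, 0], face B has [1, 0, 0].
```

With these values, face A gives d.n = -1 and is a front face, while face B gives d.n = 1 and is culled, agreeing with the discussion above.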
Strategy Function or Overall Method.
In discussing the method used by any Hidden-surface Removal Algorithm, we
shall also need to discuss the input data (O) and the results (S) and indeed it may be
useful to classify these algorithms according to the form of input data they can handle.
The set (I) of intermediate representations may be important in deciding practical aspects
such as the total amount of storage needed by the algorithm, and will be closely
connected with the precise form of some of the Transition functions used. Let us consider
a number of algorithms, classified according to the form of their input data. Because the
same general method may have considerable variation in the details, we shall tend to get
groups of algorithms which differ only in their fine detail.
11.4 Taxonomy of Hidden Surface Removal Algorithms
The text by Giloi includes a classification based on the form of the input data and
provides examples of algorithms for some of these. This classification has been
simplified slightly (four classes reduced to three) and the algorithms identified. It is not
complete: other algorithms do not fall into these categories, and other methods of
classifying these algorithms are also possible.
a) Class One
These include `solids' made up of plane polygonal facets. The resulting object
may be represented as a set of `contour lines' and `lines of intersection' or it may be
output as a shaded object. e.g. Appel's method, or Watkin's method, or Encarnacao's
Priority Method (requires input data as triangles).
b) Class Two
These are `surfaces' made up of curved faces. The resulting object is represented
as a net of grid lines. e.g. Encarnacao's Scan-Grid Method.
c) Class Three
These are general objects defined analytically. No example of an algorithm for
this class is given in Giloi.
Alternatively the methods may be grouped according to the type of method. This
gives the following:
a) Scan-line Methods
These include Watkin's method and a number of others. These work in terms of
scan lines with the pixel-colour at each point along the line calculated and output. If there
is enough storage to hold a copy of the entire screen, instead of just one line across it, we
may use a `Z-buffer' algorithm, in which the z value corresponding to each pixel is used
to decide on the colour of that pixel. The polygons may be added in any order, but the z-value is used to decide whether a pixel should be changed or not as the next polygon is
added. Again coherence may be used to reduce the number of tests needed.
When we come to consider the special case of drawing an isometric projection
drawing of a surface (fishnet output), one of the methods of deciding which parts of the
drawing should be visible is the `Template method'. Here the order of output is chosen to
work from the front of the surface and calculate each section of the drawing in turn. In
parallel with this, for each x-value on the screen, a y-value is stored indicating the largest
value currently output. This builds up a `template' of the area of screen currently covered
by the drawing. New lines are only drawn if they lie outside this area (i.e. if the new y-values are greater than those previously stored). This allows fast, accurate output of the
drawing of the surface.
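One step of the Template method can be sketched as follows (our own illustrative names; `columns` plays the role of the template, mapping each screen x to the largest y-value output so far):

```python
def template_visible(columns, new_y):
    """One front-to-back step of the template method: a new point at
    (x, new_y[x]) is output only if it rises above the template value
    stored for that column. Returns the visible x positions and updates
    `columns` in place."""
    visible = []
    for x, y in new_y.items():
        if y > columns.get(x, float('-inf')):
            visible.append(x)
            columns[x] = y
    return visible
```

Because sections are processed from the front of the surface backwards, anything that falls below the template is known to be hidden without any facet-against-facet comparisons.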
b) List-Priority Methods
Depth-sort or Painters' Algorithm.
This relies on the polygons for output being sorted into order, so that the polygon
furthest from the viewer is output first. It also assumes that output of a second polygon on
top of the first will overwrite it and none of the earlier output will remain. This is true on
most screens, but not on most printers or plotters. It is similar to the method used by
painters in situations where the latest coat of paint conceals the ones below.
This may also be used for the case where the output is an image of the isometric
projection of a surface. In this case, it is easy to output the patches of the surface with
those furthest from the viewpoint being output first and the later ones drawn on top.
Another method of this type is Encarnacao's Priority Method .
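The core of the depth-sort idea above is just an ordering step (a minimal sketch under our own naming; `depth` is assumed to give one representative distance per polygon, and the cyclic-overlap cases that require splitting are ignored):

```python
def painters_order(polygons, depth):
    """Depth-sort (painters') ordering: list the farthest polygon first so
    that nearer polygons, output later, overwrite it. `depth` maps each
    polygon to its distance from the viewer; larger means farther away."""
    return sorted(polygons, key=lambda p: depth[p], reverse=True)
```

Painting the returned list in order then produces the correct picture on any device where later output overwrites earlier output, which, as noted, holds for screens but not for most plotters.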
c) Ray-tracing Methods.
These use the idea of dropping a line, or ray, from the viewpoint (or eye of the
viewer) onto parts of the objects and on to the viewing plane. Appel's method of hidden
surface removal introduces the concept of `quantitative invisibility' (counting the number
of faces between the surface being tested and the viewer) and uses coherence to reduce
the number of tests to give the correct output.
d) Methods for Curved Surfaces.
One example of such a method is Encarnacao's Scan-Grid Method .
11.5 Back face detection
A fast and simple object-space method for identifying the back faces of a polyhedron is
based on the "inside-outside" test. A point (x,y,z) is 'inside' a polygon surface with
plane parameters A, B, C and D if
Ax + By + Cz + D < 0
When an inside point is along the line of sight to the surface, the polygon must be a back face.
If V is a vector in the viewing direction from the eye position and N is the normal vector
to a polygon surface, then the polygon is a back face if
V . N > 0
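Both conditions translate directly into code (an illustrative sketch with our own names; planes are coefficient tuples (A, B, C, D) and vectors are 3-tuples):

```python
def is_inside(plane, point):
    """Inside-outside test: (x, y, z) is 'inside' the polygon surface with
    plane parameters A, B, C, D when A*x + B*y + C*z + D < 0."""
    A, B, C, D = plane
    x, y, z = point
    return A * x + B * y + C * z + D < 0

def is_back_face(V, N):
    """The polygon is a back face when the viewing-direction vector V and
    the surface normal N satisfy V . N > 0."""
    return sum(a * b for a, b in zip(V, N)) > 0
```

For a closed polyhedron this test removes roughly half the faces before any facet-against-facet comparison is attempted.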
11.6 Depth-Buffer Method
A commonly used image space approach to detecting visible surfaces is the depth
buffer method, which compares surface depths at each pixel position on the projection plane.
A depth buffer is used to store depth values for each (x,y) position as surfaces are
processed, and the refresh buffer stores the intensity values for each position. Initially,
all positions in the depth buffer are set to 0 (minimum depth) and the refresh buffer is
initialized to the background intensity. Each surface listed in the polygon tables is then
processed, one scan line at a time, calculating the depth (z value) at each (x,y) pixel
position. The calculated depth is compared to the value previously stored in the depth
buffer at that position. If the calculated depth is greater than the value stored in the depth
buffer, the new depth value is stored, and the surface intensity at that position is
determined and placed in the same xy location in the refresh buffer.
1. Initialize the depth buffer and refresh buffer so that for all buffer positions (x,y)
Depth(x,y) = 0
Refresh(x,y) = Ibackgnd
2. For each position on each polygon surface, compare depth values to previously
stored values in the depth buffer to determine visibility.
Calculate the depth z for each (x,y) position on the polygon
If z > depth(x,y), then set
Depth(x,y) = z,
Refresh(x,y) = Isurf(x,y)
where Ibackgnd is the value for the background intensity, and Isurf(x,y) is the
projected intensity value for the surface at pixel position (x,y). After all surfaces
have been processed, the depth buffer contains depth values for the visible
surfaces and the refresh buffer contains the corresponding intensity values for
those surfaces.
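The two steps above can be sketched in Python (an illustrative simplification, not the polygon-table scan conversion of a real implementation: each surface is modelled here as a function returning (z, intensity) where it covers a pixel and None elsewhere; depth starts at 0 and larger z means closer, matching the convention in the text):

```python
def depth_buffer_render(width, height, surfaces, background=0):
    """Depth-buffer method: keep, at every pixel, the intensity of the
    closest surface seen so far. Surfaces may be processed in any order."""
    # Step 1: initialise the depth buffer and refresh buffer.
    depth = [[0.0] * width for _ in range(height)]
    refresh = [[background] * width for _ in range(height)]
    # Step 2: for each covered position on each surface, compare depths.
    for surface in surfaces:
        for y in range(height):
            for x in range(width):
                sample = surface(x, y)
                if sample is None:
                    continue          # surface does not cover this pixel
                z, intensity = sample
                if z > depth[y][x]:   # closer than anything stored so far
                    depth[y][x] = z
                    refresh[y][x] = intensity
    return refresh
```

Note that, unlike the depth-sort method, no ordering of the surfaces is needed: the per-pixel depth comparison resolves visibility regardless of the order of processing.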
11.7 Let us Sum Up
In this lesson we have learnt about various surface detection methods.
11.8 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
Classification of surface detection algorithms
Depth buffer method
11.9 Points for Discussion
Discuss the following
a) Containment Test
b) Visibility Tests
11.10 Model answers to “Check your Progress”
In order to check your progress, try to answer the following
a) Painters' Algorithm
b) Ray-tracing Methods
11.11 References
1. Chapter 24 of William M. Newman, Robert F. Sproull, “Principles of
Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 9 of Steven Harrington, “Computer Graphics – A programming
approach”, McGraw Hill, 1987
3. Chapter 13 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C
Version”, Pearson Education, 2007
4. Chapter 9 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 15 of J.D. Foley, A.Dam, S.K. Feiner, J.F. Hughes, “Computer
Graphics – principles and practice”, Addison-Wesley, 1997
6. Susan Laflin, "Computer Graphics", August 1999.
12.1 Aims and Objectives
12.2 Introduction
12.3 History of Multimedia Systems
12.4 Trends in Multimedia
12.5 Applications
12.6 Let us Sum Up
12.7 Lesson-end Activities
12.8 Points for Discussion
12.9 Model answers to “Check your Progress”
12.10 References
12.1 Aims and Objectives
The aim of this lesson is to learn the concept of multimedia
The objectives of this lesson are to make the student aware of the following concepts
a) Introduction to multimedia
b) History and
c) applications
12.2 Introduction
Multimedia is the field concerned with the computer-controlled integration of text,
graphics, drawings, still and moving images (Video), animation, audio, and any other
media where every type of information can be represented, stored, transmitted and
processed digitally. A Multimedia Application is an Application which uses a collection
of multiple media sources e.g. text, graphics, images, sound/audio, animation and/or
video. Hypermedia can be considered as one of the multimedia applications.
Multimedia is the combination of text, animated graphics, video, and
sound. It presents information in a way that is more interesting and easier to
grasp than text alone. It has been used for education at all levels, job training,
and games and by the entertainment industry. It is becoming more readily
available as the price of personal computers and their accessories declines.
Multimedia as a human-computer interface was made possible some half-dozen
years ago by the rise of affordable digital technology. Previously, multimedia
effects were produced by computer-controlled analog devices, like videocassette
recorders, projectors, and tape recorders. Digital technology's exponential
decline in price and increase in capacity has enabled it to overtake analog
technology. The Internet is the breeding ground for multimedia ideas and the
delivery vehicle of multimedia objects to a huge audience. This unit reviews the
uses of multimedia, the technologies that support it, and the larger architectural
and design issues.
Nowadays, multimedia generally indicates a rich sensory interface between
humans and computers or computer-like devices--an interface that in most cases gives the
user control over the pace and sequence of the information. We all know multimedia
when we see and hear it, yet its precise boundaries elude us. For example, movies on
demand, in which a viewer can select from a large library of videos and then play, stop,
or reposition the tape or change the speed is generally considered multimedia. However,
watching the movie on a TV set attached to a videocassette recorder (VCR) with the
same abilities to manipulate the play is not considered multimedia. Unfortunately, we
have yet to find a definition that satisfies all experts.
Recent multimedia conferences, such as the IEEE International Conference on
Multimedia Computing and Systems, ACM Multimedia, and Multimedia Computing and
Networking, provide a good start for identifying the components of multimedia. The
range of multimedia activity is demonstrated in papers on multimedia authoring (i.e.,
specification of multimedia sequences), user interfaces, navigation (user choices),
effectiveness of multimedia in education, distance learning, video conferencing,
interactive television, video on demand, virtual reality, digital libraries, indexing and
retrieval, and support of collaborative work. The wide range of technologies is evident in
papers on disk scheduling, capacity planning, resource management, optimization,
networking, switched Ethernet LANs, Asynchronous Transfer Mode (ATM) networking,
quality of service in networks, Moving Picture Experts Group (MPEG) encoding,
compression, caching, buffering, storage hierarchies, video servers, video file systems,
machine classification of video scenes, and Internet audio and video.
Multimedia systems need a delivery system to get the multimedia objects to the
user. Magnetic and optical disks were the first media for distribution. The Internet, as
well as the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite or
NetBIOS on isolated or campus LANs, became the next vehicles for distribution. The
rich text and graphics capabilities of the World Wide Web browsers are being augmented
with animations, video, and sound. Internet distribution will be augmented by distribution
via satellite, wireless, and cable systems.
12.3 History of Multimedia Systems
Newspapers were perhaps the first mass communication medium to employ
Multimedia -- they used mostly text, graphics, and images.
In 1895, Guglielmo Marconi sent his first wireless radio transmission at Pontecchio, Italy.
A few years later (in 1901) he detected radio waves beamed across the Atlantic. Initially
invented for telegraphy, radio is now a major medium for audio broadcasting.
Television was the new medium of the 20th century. It brought video and has
since changed the world of mass communications.
Some of the important events in relation to Multimedia in Computing include:
1945 - Bush wrote about Memex
1967 - Negroponte formed the Architecture Machine Group at MIT
1969 - Nelson & Van Dam hypertext editor at Brown
Birth of The Internet
1971 - Email
1976 - Architecture Machine Group proposal to DARPA: Multiple Media
1980 - Lippman & Mohl: Aspen Movie Map
1983 - Backer: Electronic Book
1985 - Negroponte, Wiesner: opened MIT Media Lab
1989 - Tim Berners-Lee proposed the World Wide Web to CERN
(the European Organization for Nuclear Research)
1990 - K. Hooper Woolsey, Apple Multimedia Lab, 100 people, educ.
1991 - Apple Multimedia Lab: Visual Almanac, Classroom MM Kiosk
1992 - the first M-bone audio multicast on the Net
1993 - U. Illinois National Center for Supercomputing Applications:
NCSA Mosaic
1994 - Jim Clark and Marc Andreessen: Netscape
1995 - JAVA for platform-independent application development. Duke is
the first applet.
1996 - Microsoft, Internet Explorer.
12.4 Trends in Multimedia
Current big application areas in Multimedia include
World Wide Web
-- Hypermedia systems -- embrace nearly all multimedia technologies and
application areas. Ever increasing popularity.
-- Multicast Backbone: Equivalent of conventional TV and Radio on the Internet.
Enabling Technologies
-- developing at a rapid rate to support ever increasing need for Multimedia.
Carrier, Switching, Protocol, Application, Coding/Compression, Database,
Processing, and System Integration Technologies at the forefront of this.
12.5 Applications
Examples of Multimedia Applications include:
World Wide Web
Hypermedia courseware
Video conferencing
Interactive TV
Home shopping
Virtual reality
Digital video editing and production systems
Multimedia Database systems
Multimedia applications are primarily existing applications that can be made less
expensive or more effective through the use of multimedia technology. In addition, new,
speculative applications, like movies on demand, can be created with the technology. We
present here a few of these applications.
Video on demand (VOD), also called movies on demand, is a service that provides
movies on an individual basis to television sets in people's homes. The movies are stored
in a central server and transmitted through a communication network. A set-top box
(STB) connected to the communication network converts the digital information to
analog and inputs it to the TV set. The viewer uses a remote control device to select a
movie and manipulate play through start, stop, rewind, and visual fast forward buttons.
The capabilities are very similar to renting a video at a store and playing it on a VCR.
The service can provide indices to the movies by title, genre, actors, and director. VOD
differs from pay per view by providing any of the service's movies at any time, instead of
requiring that all purchasers of a movie watch its broadcast at the same time. Enhanced
pay per view, also a broadcast system, shows the same movie at a number of staggered
starting times.
Home shopping and information systems - Services to the home that provide
video on demand will also provide other, more interactive, home services. Many kinds of
goods and services can be sold this way. The services will help the user navigate through
the available material to plan vacations, renew driver's licenses, purchase goods, etc.
Networked games - The same infrastructure that supports home shopping could be
used to temporarily download video games with graphic-intensive functionality to the
STB, and the games could then be played for a given period of time. Groups of people
could play a game together, competing as individuals or working together in teams.
Action games would require a very fast, or low-latency , network.
Video conferencing - Currently, most video conferencing is done between two
specially set-up rooms. In each room, one or more cameras are used, and the images are
displayed on one or more monitors. Text, images, and motion video are compressed and
sent through telephone lines. Recently, the technology has been expanded to allow more
than two sites to participate. Video conferences can also be connected through LANs or
the Internet. In time, video conferences will be possible from the home.
Education - A wide range of individual educational software employing
multimedia is available on CD-ROM. One of the chief advantages of such multimedia
applications is that the sequence of material presented is dependent upon the student's
responses and requests. Multimedia is also used in the classroom to enhance the
educational experience and augment the teacher's work. Multimedia for education has
begun to employ servers and networks to provide for larger quantities of information and
the ability to change it frequently.
Distance learning - Distance learning is a variation on education in which not all
of the students are in the same place during a class. Education takes place through a
combination of stored multimedia presentations, live teaching, and participation by the
students. Distance learning involves aspects of both teaching with multimedia and video conferencing.
Just-in-time training - Another variation on education, called just-in-time
training, is much more effective because it is done right when it is needed. In an industry
context, this means that workers can receive training on PCs at their own workplaces at
the time of need or of their choosing. This generally implies storing the material on a
server and playing it through a wide-area network or LAN.
Digital libraries - Digital libraries are a logical extension of conventional
libraries, which house books, pictures, tapes, etc. Material in digital form can be less
expensive to store, easier to distribute, and quicker to find. Thus digital technology can
save money and provide better capabilities. The Vatican Library has an extraordinary
collection of 150 000 manuscripts, including early copies of works by Aristotle, Dante,
Euclid, Homer, and Virgil. However, only about 2000 scholars a year are able to
physically visit the library in Rome. Thus, the IBM Vatican Library Project, which makes
digitized copies of some of the collection available to scholars around the world, is a very
valuable service, especially if the copies distributed are of high quality.
Virtual reality - Virtual reality provides a very realistic effect through sight and
sound, while allowing the user to interact with the virtual world. Because of the ability of
the user to interact with the process, realistic visual effects must be created ``on the fly.''
Telemedicine - Multimedia and telemedicine can improve the delivery of health
care in a number of ways. Digital information can be centrally stored, yet simultaneously
available at many locations. Physicians can consult with one another using video
conference capabilities, where all can see the data and images, thus bringing together
experts from a number of places in order to provide better care. Multimedia can also
provide targeted education and support for the patient and family.
12.6 Let us Sum Up
In this lesson we have learnt about
a) introduction
b) History and
c) Applications of multimedia
12.7 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
Define multimedia
Discuss the history of multimedia
12.8 Points for Discussion
Discuss the following
a) Application of multimedia in medicine
b) Application of multimedia in education
12.9 Model answers to “Check your Progress”
To check your progress, try to answer the following
Video on demand
Digital libraries
12.10 References
Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
Z.S. Bojkovic, D.A. Milovanovic, "Multimedia Communication Systems", PHI.
S.J. Gibbs, D.C. Tsichritzis, "Multimedia Programming", Addison-Wesley, 1995
J.F. Koegel, "Multimedia Systems", Pearson Education, 2001
13.1 Aims and Objectives
13.2 Introduction
13.3 Building blocks of Multimedia
13.4 What is HyperText and HyperMedia?
13.5 Characteristics of a Multimedia System
13.6 Challenges for Multimedia Systems
13.7 Desirable Features for a Multimedia System
13.8 Components of a Multimedia System
13.9 Multimedia technology
13.10 Multimedia architecture
13.11 Let us Sum Up
13.12 Lesson-end Activities
13.13 Points for Discussion
13.14 Model answers to “Check your Progress”
13.15 References
13.1 Aims and Objectives
The aim of this lesson is to learn the concept of multimedia building blocks.
The objectives of this lesson are to make the student aware of the following concepts
building blocks
13.2 Introduction
Multimedia is obviously a fertile ground for both research and the development of
new products, because of the breadth of possible usage, the dependency on a wide range
of technologies, and the value of reducing cost by improving the technology. Now that
the technology has been developed, however, the marketplace will determine future
direction. The technology will be used when clear value is found. For example,
multimedia is widely used on PCs using CDs to store the content. The CDs are
inexpensive to reproduce, and the players are standard equipment on most PCs purchased
today. The acceptance caused a greater demand for players, which, in turn, caused greater
production and further reduced prices.
The computer industry is providing demand, and an expanding market, for the key
hardware technologies that underlie multimedia. These include solid-state memory, logic,
microprocessors, modems, switches, and disk storage. The price declines of 30-60% per
year that we have seen for several decades will continue into the foreseeable future. As a
result, the application of multimedia, which appears expensive now, will become less
expensive and more attractive. An exception to this fast rate of improvement is the cost of
data communications. Communications depend both on technology with rapidly
decreasing cost and on mundane and basically unchanging tasks such as laying cable with
the help of a backhoe or stringing cables from poles. The cost of communication is not
likely to decline significantly for quite a while.
We feel that multimedia will spread from low-bit-rate to high-bit-rate, and will
begin on established intranets first, move to the Internet, and finally be transmitted on
broadband connections (ADSL or cable modems) to the home.
The initial uses will be information dissemination, education, and training on
campus LANs. Multimedia will be used in education, government, and business over
campus LANs, with low-bit-rate video that will not place excessive stress on the
infrastructure. The availability of switched LAN technology and faster LANs will allow
increases in both the bit rate per user and the number of users. As the cost of
communications decreases, the cost for Internet attachment for servers will decline, and
higher-quality video will be used on the Internet. Multimedia will be a compelling
interface for commerce and advertising on the Internet. Eventually, cable modems and/or
ADSL will provide bandwidth for movies to the home, and the declining computer and
switching costs will allow a cost-effective service. The winner between ADSL and cable
modems will have as much to do with the ability of cable companies and RBOCs to raise
capital as with the inherent cost and value of the two technologies.
IBM researchers continue to play an active role in developing technology,
including MPEG encoding and decoding, video servers, delivery systems, digital
libraries, applications for indexing and searching for content, and collaboration.
Researchers are also engaged in many uses of multimedia technology and in building
advanced systems with IBM customers.
13.3 Building blocks of Multimedia
The building blocks of multimedia includes
Hyper Text and Hypermedia
13.4 What is HyperText and HyperMedia?
Hypertext is text which contains links to other texts. The term was invented by
Ted Nelson around 1965. Hypertext is therefore usually non-linear (as indicated below).
HyperMedia is not constrained to be text-based. It can include other media, e.g.,
graphics, images, and especially the continuous media - sound and video. Apparently,
Ted Nelson was also the first to use this term.
The World Wide Web (WWW) is the best example of hypermedia applications.
13.5 Characteristics of a Multimedia System
A Multimedia system has four basic characteristics:
Multimedia systems must be computer controlled.
Multimedia systems are integrated.
The information they handle must be represented digitally.
The interface to the final presentation of media is usually interactive.
13.6 Challenges for Multimedia Systems
Multimedia systems may have to render a variety of media at the same instant -- a
distinction from normal applications. There is a temporal relationship between many
forms of media (e.g. video and audio). Two forms of problem arise here:
 Sequencing within the media -- playing frames in correct order/time frame
in video
 Synchronisation -- inter-media scheduling (e.g. Video and Audio). Lip
synchronisation is clearly important for humans to watch playback of
video and audio and even animation and audio. Ever tried watching an out
of (lip) sync film for a long time?
The key issues multimedia systems need to deal with here are:
 How to represent and store temporal information.
 How to strictly maintain the temporal relationships on play back/retrieval
 What processes are involved in the above.
Data has to be represented digitally, so most initial sources of data need to be digitised
-- translated from an analog source to a digital representation. This will involve scanning
(graphics, still images) and sampling (audio/video), although digital cameras now allow
direct scene-to-digital capture of images and video.
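The sampling step can be sketched in a few lines; the tone frequency, sample rate, and bit depth below are illustrative choices, not tied to any particular hardware:

```python
import math

def digitise(frequency_hz, sample_rate_hz, duration_s, bits):
    """Sample an analog sine wave and quantize each sample to `bits` bits."""
    levels = 2 ** bits
    samples = []
    for n in range(int(sample_rate_hz * duration_s)):
        t = n / sample_rate_hz                              # sample instant
        value = math.sin(2 * math.pi * frequency_hz * t)    # analog value in [-1, 1]
        quantized = round((value + 1) / 2 * (levels - 1))   # map to 0 .. levels-1
        samples.append(quantized)
    return samples

# A 1 kHz tone, telephone-style 8 kHz sampling, 8 bits per sample
pcm = digitise(1000, 8000, 0.001, 8)
print(len(pcm))   # 8 samples in one millisecond
```

Each stored sample is an integer in 0..255; reconstruction reverses the mapping.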
13.7 Desirable Features for a Multimedia System
Given the above challenges, the following features are desirable (if not prerequisites)
for a multimedia system:
Very High Processing Power -- needed to deal with large data processing and real time
delivery of media.
Multimedia Capable File System -- needed to deliver real-time media, e.g.
video/audio streaming. Special hardware/software is needed, e.g. RAID technology.
Data Representations/File Formats that support multimedia -- Data
representations/file formats should be easy to handle yet allow for
compression/decompression in real-time.
Efficient and High I/O -- input and output to the file subsystem needs to be efficient and
fast. Needs to allow for real-time recording as well as playback of data. e.g. Direct to
Disk recording systems.
Special Operating System -- to allow access to file system and process data efficiently
and quickly. Needs to support direct transfers to disk, real-time scheduling, fast interrupt
processing, I/O streaming etc.
Storage and Memory -- large storage units (of the order of 50-100 GB or more) and
large memory (50-100 MB or more). Large caches are also required, frequently in a
Level 2 and 3 hierarchy, for efficient management.
Network Support -- Client-server systems and distributed systems may be supported
Software Tools -- user friendly tools needed to handle media, design & develop
applications, and deliver media.
13.8 Components of a Multimedia System
Now let us consider the Components (Hardware and Software) required for a multimedia system:
Capture devices
-- Video Camera, Video Recorder, Audio Microphone, Keyboards, mice, graphics
tablets, 3D input devices, tactile sensors, VR devices, digitising/sampling hardware
Storage Devices
-- Hard disks, CD-ROMs, Jaz/Zip drives, DVD, etc
Communication Networks
-- Ethernet, Token Ring, FDDI, ATM, Intranets, Internets.
Computer Systems
-- Multimedia Desktop machines, Workstations, MPEG/VIDEO/DSP Hardware
Display Devices
-- CD-quality speakers, HDTV,SVGA, Hi-Res monitors, Colour printers etc.
13.9 Multimedia technology
A wide variety of technologies contribute to multimedia. Some of the
technologies are going through rapid improvement and deployment because of demand
for PCs and workstations. As a result, multimedia benefits from lower-cost, better-performance microprocessors, memory chips, and disk storage. Other technologies are
being developed specifically for multimedia systems.
13.9.1 Networks
Telephone networks dedicate a set of resources that forms a complete path from
end to end for the duration of the telephone connection. The dedicated path guarantees
that the voice data can be delivered from one end to the other end in a smooth and timely
way, but the resources remain dedicated even when there is no talking. In contrast, digital
packet networks, for communication between computers, use time-shared resources
(links, switches, and routers) to send packets through the network. The use of shared
resources allows computer networks to be used at high utilization, because even small
periods of inactivity can be filled with data from a different user. The high utilization and
shared resources create a problem with respect to the timely delivery of video and audio
over data networks. Current research centers around reserving resources for timesensitive data, which will make digital data networks more like telephone voice networks.
13.9.2 Internet
The Internet and intranets, which use the TCP/IP protocol suite, are the most
important delivery vehicles for multimedia objects. TCP provides communication
sessions between applications on hosts, sending streams of bytes for which delivery is
always guaranteed by means of acknowledgments and retransmission. User Datagram
Protocol (UDP) is a ``best-effort'' delivery protocol (some messages may be lost) that
sends individual messages between hosts. Internet technology is used on single LANs
and on connected LANs within an organization, which are sometimes called intranets,
and on ``backbones'' that link different organizations into one single global network.
Internet technology allows LANs and backbones of totally different technologies to be
joined together into a single, seamless network.
Part of this is achieved through communications processors called routers.
Routers can be accessed from two or more networks, passing data back and forth as
needed. The routers communicate information on the current network topology among
themselves in order to build routing tables within each router. These tables are consulted
each time a message arrives, in order to send it to the next appropriate router, eventually
resulting in delivery.
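The table consultation can be sketched as a longest-prefix match; the addresses and router names below are invented purely for illustration:

```python
import ipaddress

# A toy routing table: destination prefix -> next-hop router.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "router-a",
    ipaddress.ip_network("10.1.0.0/16"): "router-b",
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
}

def next_hop(destination):
    """Consult the table and pick the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(next_hop("10.1.2.3"))   # router-b (the /16 beats the /8)
print(next_hop("192.0.2.1"))  # default-gateway
```

A real router repeats this lookup for every arriving packet and keeps the table updated from information exchanged with neighbouring routers.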
Token ring is a hardware architecture for passing packets between stations on a
LAN. Since a single circular communication path is used for all messages, there must be
a way to decide which station is allowed to send at any time. In token ring, a ``token,''
which gives a station the right to transmit data, is passed from station to station. The data
rate of a token ring network is 16 Mb/s.
Ethernet LANs use a common wire to transmit data from station to station.
Mediation between transmitting stations is done by having stations listen before sending,
so that they will not interfere with each other. However, two stations could begin to send
at the same time and collide, or one station could start to send significantly later than
another but not know it because of propagation delay. In order to detect these
situations, stations continue to listen while they transmit and determine whether their
message was possibly garbled by a collision. If there is a collision, a retransmission takes
place (by both stations) a short but random time later. Ethernet LANs can transmit data at
10 Mb/s. However, when multiple stations are competing for the LAN, the throughput
may be much lower because of collisions and retransmissions.
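The "short but random time" is chosen by truncated binary exponential backoff; the sketch below assumes the standard 10 Mb/s Ethernet slot time of 51.2 microseconds:

```python
import random

SLOT_TIME_US = 51.2   # Ethernet 10 Mb/s slot time in microseconds

def backoff_delay(attempt):
    """After the n-th collision, wait a random number of slot times drawn
    from 0 .. 2**min(n, 10) - 1 (truncated binary exponential backoff)."""
    k = min(attempt, 10)
    slots = random.randrange(2 ** k)
    return slots * SLOT_TIME_US

random.seed(1)
for attempt in (1, 2, 3):
    print(f"collision {attempt}: wait {backoff_delay(attempt):.1f} us")
```

Because the range doubles after each collision, repeated collisions rapidly spread the stations' retransmissions apart in time.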
Switched Ethernet - Switches may be used at a hub to create many small LANs
where one large one existed before. This reduces contention and permits higher
throughput. In addition, Ethernet is being extended to 100Mb/s throughput. The
combination, switched Ethernet, is much more appropriate to multimedia than regular
Ethernet, because existing Ethernet LANs can support only about six MPEG video
streams, even when nothing else is being sent over the LAN.
Asynchronous Transfer Mode(ATM) is a new packet-network protocol designed
for mixing voice, video, and data within the same network. Voice is digitized in
telephone networks at 64 Kb/s (kilobits per second), which must be delivered with
minimal delay, so very small packet sizes are used. On the other hand, video data and
other business data usually benefit from quite large block sizes. An ATM packet consists
of 48 octets (the term used in communications for eight bits, called a byte in data
processing) of data preceded by five octets of control information. An ATM network
consists of a set of communication links interconnected by switches. Communication is
preceded by a setup stage in which a path through the network is determined to establish
a circuit. Once a circuit is established, 53-octet packets may be streamed from point to point.
ATM networks can be used to implement parts of the Internet by simulating links
between routers in separate intranets. This means that the ``direct'' intranet connections
are actually implemented by means of shared ATM links and switches.
ATM, both between LANs and between servers and workstations on a LAN, will
support data rates that will allow many users to make use of motion video on a LAN.
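The fixed 5 + 48 octet cell format described above implies a fixed overhead, which a short calculation makes concrete:

```python
import math

HEADER_OCTETS = 5
PAYLOAD_OCTETS = 48
CELL_OCTETS = HEADER_OCTETS + PAYLOAD_OCTETS   # the 53-octet ATM cell

overhead = HEADER_OCTETS / CELL_OCTETS
print(f"header overhead per cell: {overhead:.1%}")

# Cells needed to carry a 1500-octet data block (a common Ethernet frame size)
data_octets = 1500
cells = math.ceil(data_octets / PAYLOAD_OCTETS)
print(f"{data_octets} octets -> {cells} cells")
```

The last cell is padded when the data does not fill it, which is part of the price ATM pays for the small, fixed cell size that keeps voice delay low.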
13.9.3 Data-transmission techniques
a) Modems - Modulator/demodulators, or modems, are used to send digital data over
analog channels by means of a carrier signal (sine wave) modulated by changing the
frequency, phase, amplitude, or some combination of them in order to represent digital
data. (The result is still an analog signal.) Modulation is performed at the transmitting end
and demodulation at the receiving end. The most common use for modems in a computer
environment is to connect two computers over an analog telephone line. Because of the
quality of telephone lines, the data rate is commonly limited to 28.8 Kb/s. For
transmission of customer analog signals between telephone company central offices, the
signals are sampled and converted to ``digital form'' (actually, still an analog signal) for
transmission between offices. Since the customer voice signal is represented by a stream
of digital samples at a fixed rate (64 Kb/s), the data rate that can be achieved over analog
telephone lines is limited.
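The 64 Kb/s figure follows directly from the standard telephone sampling parameters (8000 samples per second, 8 bits per sample):

```python
SAMPLE_RATE = 8000     # voice samples per second in the telephone network
BITS_PER_SAMPLE = 8

rate_bps = SAMPLE_RATE * BITS_PER_SAMPLE
print(rate_bps // 1000, "Kb/s")   # 64 Kb/s per voice channel
```

Since a modem's analog signal must itself survive this 64 Kb/s sampled representation between central offices, that figure bounds what analog telephone lines can carry.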
b) ISDN - Integrated Service Digital Network (ISDN) extends the telephone company
digital network by sending the digital form of the signal all the way to the customer.
ISDN is organized around 64Kb/s transmission speeds, the speed used for digitized voice.
An ISDN line was originally intended to simultaneously transmit a digitized voice signal
and a 64Kb/s data stream on a single wire. In practice, two channels are used to produce a
128Kb/s line, which is faster than the 28.8Kb/s speed of typical computer modems but
not adequate to handle MPEG video.
c) ADSL - Asymmetric Digital Subscriber Lines (ADSL) extend telephone company
twisted-pair wiring to yet greater speeds. The lines are asymmetric, with an outbound
data rate of 1.5 Mb/s and an inbound rate of 64 Kb/s. This is suitable for video on
demand, home shopping, games, and interactive information systems (collectively known
as interactive television), because 1.5 Mb/s is fast enough for compressed digital video,
while a much slower ``back channel'' is needed for control. ADSL uses very high-speed
modems at each end to achieve these speeds over twisted-pair wire.
ADSL is a critical technology for the Regional Bell Operating Companies (RBOCs),
because it allows them to use the existing twisted-pair infrastructure to deliver high data
rates to the home.
d) Cable systems - Cable television systems provide analog broadcast signals on a coaxial
cable, instead of through the air, with the attendant freedom to use additional frequencies
and thus provide a greater number of channels than over-the-air broadcast. The systems
are arranged like a branching tree, with ``splitters'' at the branch points. They also require
amplifiers for the outbound signals, to make up for signal loss in the cable. Most modern
cable systems use fiber optic cables for the trunk and major branches and use coaxial
cable for only the final loop, which services one or two thousand homes. The root of the
tree, where the signals originate, is called the head end.
e) Cable modems are used to modulate digital data, at high data rates, into an analog 6-MHz-bandwidth TV-like signal. These modems can transfer 20 to 40 Mb/s in a frequency
bandwidth that would have been occupied by a single analog TV signal, allowing
multiple compressed digital TV channels to be multiplexed over a single analog channel.
The high data rate may also be used to download programs or World Wide Web content
or to play compressed video. Cable modems are critical to cable operators, because they
enable them to compete with the RBOCs using ADSL.
f) Set-top box - The STB is an appliance that connects a TV set to a cable system,
terrestrial broadcast antenna, or satellite broadcast antenna. The STB in most homes has
two functions. First, in response to a viewer's request with the remote-control unit, it
shifts the frequency of the selected channel to either channel 3 or 4, for input to the TV
set. Second, it is used to restrict access and block channels that are not paid for.
Addressable STBs respond to orders that come from the head end to block and unblock channels.
g) Admission control - Digital multimedia systems that are shared by multiple clients
can deliver multimedia data to a limited number of clients. Admission control is the
function which ensures that once delivery starts, it will be able to continue with the
required quality of service (ability to transfer isochronous data on time) until completion.
The maximum number of clients depends upon the particular content being used and
other characteristics of the system.
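The admission decision can be sketched as a simple reservation check; the capacity and stream rates below are illustrative, not taken from any particular product:

```python
LINK_CAPACITY_KBPS = 16_000   # e.g. a 16 Mb/s token ring (illustrative)

class AdmissionController:
    def __init__(self, capacity_kbps):
        self.capacity = capacity_kbps
        self.reserved = 0

    def admit(self, stream_kbps):
        """Admit a stream only if its rate still fits; once admitted, the
        bandwidth stays reserved until the stream completes."""
        if self.reserved + stream_kbps > self.capacity:
            return False
        self.reserved += stream_kbps
        return True

    def release(self, stream_kbps):
        self.reserved -= stream_kbps

ac = AdmissionController(LINK_CAPACITY_KBPS)
admitted = sum(ac.admit(1500) for _ in range(12))   # 1.5 Mb/s MPEG streams
print(admitted)   # 10 streams fit in 16 Mb/s; the 11th and 12th are refused
```

Refusing the extra requests up front is what lets the admitted streams keep their quality of service until completion.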
h) Digital watermarks - Because it is so easy to transmit perfect copies of digital objects,
many owners of digital content wish to control unauthorized copying. This is often to
ensure that proper royalties have been paid. Digital watermarking consists of making
small changes in the digital data that can later be used to determine the origin of an
unauthorized copy. Such small changes in the digital data are intended to be invisible
when the content is viewed. This is very similar to the ``errors'' that mapmakers introduce
in order to prove that suspect maps are copies of their maps. In other circumstances, a
visible watermark is applied in order to make commercial use of the image impractical.
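One simple way to make such invisible changes is to alter only the least-significant bit of each pixel value; the sketch below illustrates the idea (real watermarking schemes are considerably more robust against cropping and recompression):

```python
def embed_watermark(pixels, mark_bits):
    """Hide one watermark bit in the least-significant bit of each pixel.
    Changing the LSB alters a 0-255 intensity by at most 1 -- invisible."""
    out = list(pixels)
    for i, bit in enumerate(mark_bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_watermark(pixels, length):
    return [p & 1 for p in pixels[:length]]

image = [200, 201, 202, 203, 204, 205, 206, 207]   # toy 8-pixel "image"
mark = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_watermark(image, mark)
print(extract_watermark(marked, 8) == mark)            # the mark survives
print(max(abs(a - b) for a, b in zip(image, marked)))  # largest pixel change
```

An unauthorized copy of `marked` still carries the bit pattern, which identifies its origin.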
i) Authoring systems - Multimedia authoring systems are used to edit and arrange
multimedia objects and to describe their presentation. The authoring package allows the
author to specify which objects may be played next. The viewer dynamically chooses
among the alternatives. Metadata created during the authoring process is normally saved
as a file. At play time, an ``execution package'' reads the metadata and uses it as a script
for the playout.
Authoring systems, as well as systems for gathering information for multimedia
presentations (scanning, classifying, indexing and processing images, audio, and video)
are very active research areas. Particularly challenging, and also very useful, are
techniques that can be applied to compressed data. Entirely new techniques are required,
and the human factors involved in the processing of this new data must be understood.
13.10 Multimedia architecture
In this section we show how the multimedia technologies are organized in order
to create multimedia systems, which in general consist of suitable organizations of
clients, application servers, and storage servers that communicate through a network.
Some multimedia systems are confined to a stand-alone computer system with content
stored on hard disks or CD-ROMs. Distributed multimedia systems communicate through
a network and use many shared resources, making quality of service very difficult to
achieve and resource management very complex.
• Single-user stand-alone systems - Stand-alone multimedia systems use CD-ROM disks
and/or hard disks to hold multimedia objects and the scripting metadata to orchestrate the
playout. CD-ROM disks are inexpensive to produce and hold a large amount of digital
data; however, the content is static--new content requires creation and physical
distribution of new disks for all systems. Decompression is now done by either a special
decompression card or a software application that runs on the processor. The technology
trend is toward software decompression.
• Multi-user systems
Video over LANs - Stand-alone multimedia systems can be converted to
networked multimedia systems by using client-server remote-file-system technology to
enable the multimedia application to access data stored on a server as if the data were on
a local storage medium. This is very convenient, because the stand-alone multimedia
application does not have to be changed. LAN throughput is the major challenge in these
systems. Ethernet LANs can support less than 10 Mb/s, and token rings 16 Mb/s. This
translates into six to ten 1.5Mb/s MPEG video streams. Admission control is a critical
problem. The OS/2* LAN server is one of the few products that support admission
control. It uses priorities with token-ring messaging to differentiate between multimedia
traffic and lower-priority data traffic. It also limits the multimedia streams to be sure that
they do not sum to more than the capacity of the LAN. Without some type of resource
reservation and admission control, the only way to give some assurance of continuous
video is to operate with small LANs and make sure that the server is on the same LAN as
the client. In the future, ATM and fast Ethernet will provide capacity more appropriate to multimedia.
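The six-to-ten-streams figure quoted above is simple arithmetic over the nominal LAN rates:

```python
MPEG_STREAM_KBPS = 1500   # 1.5 Mb/s per compressed MPEG stream

capacities_kbps = {"Ethernet": 10_000, "token ring": 16_000}
streams = {lan: kbps // MPEG_STREAM_KBPS for lan, kbps in capacities_kbps.items()}
print(streams)   # Ethernet carries at most ~6 such streams, token ring ~10
```

In practice Ethernet delivers fewer, since collisions keep throughput below the nominal 10 Mb/s.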
Direct Broadcast Satellite - Direct Broadcast Satellite (DBS), which broadcasts
up to 80 channels from a satellite at high power, arrived in 1995 as a major force in the
delivery of broadcast video. The high power allows small (18-inch) dishes with line-of-sight to the satellite to capture the signal. MPEG compression is used to get the maximum
number of channels out of the bandwidth. The RCA/Hughes service employs two
satellites and a backup to provide 160 channels. This large number of channels allows
many premium and special-purpose channels as well as the usual free channels. Many
more pay-per-view channels can be provided than in conventional cable systems. This
allows enhanced pay-per-view, in which the same movie is shown with staggered starting
times of half an hour or an hour.
DBS requires a set-top box with much more function than a normal cable STB.
The STB contains a demodulator to reconstruct the digital data from the analog satellite
broadcast. The MPEG compressed form is decompressed, and a standard TV signal is
produced for input to the TV set. The STB uses a telephone modem to periodically verify
that the premium channels are still authorized and report on use of the pay-per-view
channels so that billing can be done.
Interactive TV and video to the home - Interactive TV and video to the home
allow viewers to select, interact with, and control video play on a TV set in real time. The
user might be viewing a conventional movie, doing home shopping, or engaging in a
network game. The compressed video flowing to the home requires high bandwidth, from
1.5 to 6 Mb/s, while the return path, used for selection and control, requires far lower bandwidth.
The STB used for interactive TV is similar to that used for DBS. The
demodulation function depends upon the network used to deliver the digital data. A
microprocessor with memory for limited buffering as well as an MPEG decompression
chip is needed. The video is converted to a standard TV signal for input to the TV set.
The STB has a remote-control unit, which allows the viewer to make choices from a
distance. Some means are needed to allow the STB to relay viewer commands back to the
server, depending upon the network being used.
Cable systems appear to be broadcast systems, but they can actually be used to
deliver different content to each home. Cable systems often use fiber optic cables to send
the video to converters that place it on local loops of coaxial cable. If a fiber cable is
dedicated to each final loop, which services 500 to 1500 homes, there will be enough
bandwidth to deliver an individual signal to many of those houses. The cable can also
provide the reverse path to the cable head end. Ethernet-like protocols can be used to
share the same channel with the other STBs in the local loop. This topology is attractive
to cable companies because it uses the existing cable plant. If the appropriate amplifiers
are not present in the cable system for the back channel, a telephone modem can be used
to provide the back channel.
As mentioned above, the asymmetric data rates of ADSL are tailored for
interactive TV. The use of standard twisted-pair wire, which has been brought to virtually
every house, is attractive to the telephone industry. However, the twisted pair is a more
noisy medium than coaxial cable, so more expensive modems are needed, and distances
are limited. ADSL can be used at higher data rates if the distance is further reduced.
Interactive TV architectures are typically three-tier, in which the client and server
tiers interact through an application server. (In three-tier systems, the tier-1 systems are
clients, the tier-2 systems are used for application programs, and the tier-3 systems are
data servers.) The application tier is used to separate the logic of looking up material in
indexes, maintaining the shopping state of a viewer, interacting with credit card servers,
and other similar functions from the simple function of delivering multimedia objects.
The key research questions about interactive TV and video-on-demand are not
computer science questions at all. Rather, they are the human-factors issues concerning
ease of the on-screen interface and, more significantly, the marketing questions regarding
what home viewers will find valuable and compelling.
Internet over cable systems - World Wide Web browsing allows users to see a
rich text, video, sound, and graphics interface and allows them to access other
information by clicking on text or graphics. Web pages are written in HyperText Markup
Language (HTML) and use an application communications protocol called HTTP. The
user responses, which select the next page or provide a small amount of text information,
are normally quite short. On the other hand, the graphics and pictures require many times
the number of bytes to be transmitted to the client. This means that distribution systems
that offer asymmetric data rates are appropriate.
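The asymmetry can be seen with a rough back-of-envelope calculation; the byte counts below are illustrative, not measurements:

```python
# A click sends a short HTTP request but pulls back a page full of images.
request_bytes = 400        # one GET request line plus headers (illustrative)
response_bytes = 60_000    # HTML page plus inline images (illustrative)

ratio = response_bytes / request_bytes
print(f"downstream/upstream ratio: {ratio:.0f}:1")
```

A link whose downstream rate is two orders of magnitude above its upstream rate therefore matches Web browsing well.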
Cable TV systems can be used to provide asymmetric Internet access for home
computers in ways that are very similar to interactive TV over cable. The data being sent
to the client is digitized and broadcast over a prearranged channel over all or part of the
cable system. A cable modem at the client end tunes to the right channel and demodulates
the information being broadcast. It must also filter the information destined for the
particular station from the information being sent to other clients. The low-bandwidth
reverse channel is the same low-frequency band that is used in interactive TV. As with
interactive TV, a telephone modem might be used for the reverse channel. The cable head
end is then attached to the Internet using a router. The head end is also likely to offer
other services that Internet Service Providers sell, such as permanent mailboxes. This
asymmetric connection would not be appropriate for a Web server or some other type of
commerce server on the Internet, because servers transmit too much data for the low-speed return path. The cable modem provides the physical link for the TCP/IP stack in
the client computer. The client software treats this environment just like a LAN
connected to the Internet.
Video servers on a LAN - LAN-based multimedia systems go beyond the simple,
client-server, remote file system type of video server, to advanced systems that offer a
three-tier architecture with clients, application servers, and multimedia servers. The
application servers provide applications that interact with the client and select the video
to be shown. On a company intranet, LAN-based multimedia could be used for just-in-time education, on-line documentation of procedures, or video messaging. On the
Internet, it could be used for a video product manual, interactive video product support,
or Internet commerce. The application server chooses the video to be shown and causes it
to be sent to the client.
There are three different ways that the application server can cause playout of the
video: By giving the address of the video server and the name of the content to the client,
which would then fetch it from the video server; by communicating with the video server
and having it send the data to the client; and by communicating with both to set up the transfer.
The transmission of data to the client may be in push mode or pull mode. In push
mode, the server sends data to the client at the appropriate rate. The network must have
quality-of-service guarantees to ensure that the data gets to the client on time. In pull
mode, the client requests data from the server, and thus paces the transmission.
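Pull-mode pacing can be sketched as a client loop; `fetch_chunk` and the chunk sizes are hypothetical stand-ins for a real transport:

```python
import time

def pull_playback(fetch_chunk, chunk_bytes, bitrate_bps):
    """Pull mode: the client requests each chunk just in time, so the
    client -- not the server -- paces the transfer."""
    interval = chunk_bytes * 8 / bitrate_bps   # seconds of media per chunk
    received = 0
    while True:
        chunk = fetch_chunk(chunk_bytes)       # blocking request to the server
        if not chunk:
            break                              # empty reply: end of stream
        received += 1
        # decode/display the chunk here, then wait before requesting more
        time.sleep(interval)
    return received

# A stand-in server holding three 1000-byte chunks of a 1.5 Mb/s stream.
chunks = [b"x" * 1000, b"x" * 1000, b"x" * 1000]
def fake_fetch(n):
    return chunks.pop(0) if chunks else b""

print(pull_playback(fake_fetch, 1000, 1_500_000))   # 3 chunks played
```

In push mode the roles reverse: the server runs the timing loop, so the network itself must guarantee timely delivery.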
The current protocols for Internet use are TCP and UDP. TCP sets up sessions,
and the server can push the data to the client. However, the ``sliding-window'' algorithm
of TCP, which prevents client buffer overrun, creates acknowledgments that pace the
sending of data, thus making it in effect a pull protocol. Another issue in Internet
architecture is the role of firewalls, which are used at the gateway between an intranet
and the Internet to keep potentially dangerous or malicious Internet traffic from getting
onto the intranet. UDP packets are normally never allowed in. TCP sessions are allowed,
if they are created from the inside to the outside. A disadvantage of TCP for isochronous
data is that error detection and retransmission is automatic and required--whereas it is
preferable to discard garbled video data and just continue.
Resource reservation is just beginning to be incorporated on the Internet and
intranets. Video will be considered to have higher priority, and the network will have to
ensure that there is a limit to the amount of high-priority traffic that can be admitted. All
of the routers on the path from the server to the client will have to cooperate in the
reservation and the use of priorities.
Video conferencing - Video conferencing, which will be used on both intranets
and the Internet, uses multiple data types, and serves multiple clients in the same
conference. Video cameras can be mounted near a PC display to capture the user's
picture. In addition to the live video, these systems include shared white boards and show
previously prepared visuals. Some form of mediation is needed to determine which
participant is in control. Since the type of multimedia data needed for conferencing
requires much lower data rates than most other types of video, low-bit-rate video, using
approximately eight frames per second and requiring tens of kilobits per second, will be
used with small window sizes for the ``talking heads'' and most of the other visuals.
Scalability of a video conferencing system is important, because if all participants send to
all other participants, the traffic goes up as the square of the number of participants. This
can be made linear by having all transmissions go through a common server. If the
network has a multicast facility, the server can use that to distribute to the participants.
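The scaling argument can be made concrete, assuming each participant sends one stream to every receiver in a full mesh, versus one stream to and from a common server:

```python
def mesh_streams(n):
    """Full mesh: every participant sends to every other participant."""
    return n * (n - 1)

def server_streams(n):
    """Star through a common server: each participant sends one stream to
    the server and receives one mixed/forwarded stream back."""
    return 2 * n

for n in (4, 10, 50):
    print(n, "participants:", mesh_streams(n), "mesh streams vs",
          server_streams(n), "via a server")
```

The mesh grows quadratically while the server arrangement grows linearly, which is why conference servers (and multicast, where available) matter as conferences get larger.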
13.11 Let us Sum Up
In this lesson we have learnt about multimedia building blocks.
13.12 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Multimedia architecture
b) Multimedia building blocks
13.13 Points for Discussion
Discuss the following
Characteristics of a Multimedia System
Challenges of Multimedia System
13.14 Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
a) Data-transmission techniques
b) Desirable Features for a Multimedia System
13.15 References
1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A. Milovanovic, Multimedia Communication Systems, PHI
3. S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley
4. J.F. Koegel, Multimedia Systems, Pearson Education, 2001
5. R.J. Flynn and W.H. Tetzlaff, “Multimedia -- An Introduction”, IBM Journal of Research and Development, Vol. 42, No. 2, 1998
14.1 Aims and Objectives
14.2 Introduction to Text
14.3 Multimedia Sound
14.4 The MIDI Format
14.5 The RealAudio Format
14.6 The AU Format
14.7 The AIFF Format
14.8 The SND Format
14.9 The WAVE Format
14.10 The MP3 Format (MPEG)
14.11 Let us Sum Up
14.12 Lesson-end Activities
14.13 Points for Discussion
14.14 Model answers to “Check your Progress”
14.15 References
14.1 Aims and Objectives
The aim of this lesson is to learn the concept of text and sound in multimedia
The objectives of this lesson are to make the student aware of the following concepts
a) text
b) sound
c) sound formats
14.2 Introduction to Text
Text is the most widely used and flexible means of presenting information on
screen and conveying ideas. The designer should not necessarily try to replace textual
elements with pictures or sound, but should consider how to present text in an acceptable
way and supplementing it with other media. For a public system, where the eyesight of its
users will vary considerably, a clear reasonably large font should be used. Users will also
be put off by the display of large amounts of text and will find it hard to scan. To present
tourist information about a hotel, for example, information should be presented concisely
under clear separate headings such as location, services available, prices, and contact details.
Guidelines
Conventional upper and lower case text should be used for the presentation,
since reading is faster compared to all upper case text.
All upper case can be used if a text item has to attract attention, as in warnings and alarm messages.
The length of text lines should be no longer than around 60 characters to achieve optimal
reading speed.
Only one third of a display should be filled with text.
Proportional spacing and ragged lines also minimizes unpleasant visual effects.
12 point text is the practical minimum to adopt for PC based screens, with the use of 14
point or higher for screens of poorer resolution than a normal desktop PC
If the users do not have their vision corrected for VDU use (e.g. the public), text of 16
point is recommended, so that it is usable by people with visual impairments.
Sentences should be short and concise and not be split over pages.
Technical expressions should be used only where the user is familiar with them from
their daily routine, and should be made as understandable as possible, e.g. "You are now
connected to Paul Andrews" rather than "Connection to Multipoint Control Unit".
The number of abbreviations used in an application should be kept to a minimum. They
should be used only when the abbreviation is routinely used and where the shorter words
lead to a reduction of information density.
Abbreviations should be used in a consistent way throughout an entire multimedia application.
An explanation of the abbreviations used in the system should be readily available to the
user through on-line help facilities or at least through written documentation.
Strictly speaking, text is created on a computer, so it doesn't really extend a computer
system the way audio and video do. But, understanding how text is stored will set the
scene for understanding how multimedia is stored. Interestingly, when computers were
first developed, it was thought that their major use would be processing numbers (called
number-crunching). This is not the major use of computers today. Processing words (not
called word-crunching!) is the major use.
Question: So, how are words stored?
Answer: Character by character.
Characters can be more than letters - they can be digits, punctuation. Even the carriage
return, when you hit the return key, is stored as a character. Computers deal with all data
by turning switches off and on in a sequence. We look at this by calling an off switch "0"
and an on switch "1". These 0's and 1's are called bits. Everything in a computer is
ultimately represented by sequences of 0's and 1's - bits. If the sequence were of length 2,
we could have 00, 01, 10, or 11. Four items. Similarly, we find that a sequence of length
3 can represent 8 items (000, 001, 010, ...). A sequence of length 4 can represent 16
things (0000, 0001, 0010, ...). There are about 128 characters that a computer has to
store. This should take a sequence of length 7. In reality, 8 bits are used instead of 7 (the
8th bit is used to check on the data). The point to remember here is that: n bits can
represent 2^n items
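The 2^n rule and the character-by-character storage described above are easy to verify with a few lines of Python:

```python
# n bits can name 2**n distinct items.
for n in (2, 3, 4, 7, 8):
    print(n, "bits ->", 2 ** n, "distinct values")

# Text is stored character by character; each character
# becomes an 8-bit pattern (here shown for ASCII codes):
for ch in "Hi!":
    print(ch, "->", format(ord(ch), "08b"))
```

Note that 2^7 = 128, which is exactly why 7 bits are enough for the roughly 128 basic characters, with the 8th bit left over for checking.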
14.3 Multimedia Sound
Multimedia Sound is a CD-ROM resource for Physics education. It is a
collection of real sounds generated by musical instruments, laboratory sound sources and
everyday objects such as glass beakers and plastic straws. Tools provided on the disc
allow students to compare and contrast the waveforms and frequency spectra generated
by the sound sources, and to measure amplitudes and frequencies as functions of time.
The linear dimensions of the sound sources can be determined from calibrated photographs,
enabling investigation of, for example, the relationship between the length of a string or a
pipe and the fundamental frequency of the sound it generates. An audio narration
describes the key concepts illustrated by each example and a supporting text file provides
essential data, poses challenging questions and suggests possible investigations.
Additional features of the disc include: a six component sound synthesiser which
students can use to generate their own sound samples; the facility to import sounds
recorded with a microphone plugged into the PC's sound card or taken from an audio CD;
extensive help files outlining the fundamental Physics of sound.
14.4 The MIDI Format
The MIDI (Musical Instrument Digital Interface) is a format for sending music
information between electronic music devices like synthesizers and PC sound cards.
The MIDI format was developed in 1982 by the music industry. The MIDI format is very
flexible and can be used for everything from very simple tunes to professional music production.
MIDI files do not contain sampled sound, but a set of digital musical instructions
(musical notes) that can be interpreted by your PC's sound card.
The downside of MIDI is that it cannot record sounds (only notes). Or, to put it another
way: It cannot store songs, only tunes.
The upside of the MIDI format is that since it contains only instructions (notes), MIDI
files can be extremely small. A file of only 23K, for example, can play for
nearly 5 minutes.
The MIDI format is supported by many different software systems over a large range of
platforms. MIDI files are supported by all the most popular Internet browsers.
Sounds stored in the MIDI format have the extension .mid or .midi.
14.5 The RealAudio Format
The RealAudio format was developed for the Internet by RealNetworks. The format also
supports video.
The format allows streaming of audio (on-line music, Internet radio) with low
bandwidths. Because of the low bandwidth priority, quality is often reduced.
Sounds stored in the RealAudio format have the extension .rm or .ram.
14.6 The AU Format
The AU format is supported by many different software systems over a large range of platforms.
Sounds stored in the AU format have the extension .au.
14.7 The AIFF Format
The AIFF (Audio Interchange File Format) was developed by Apple.
AIFF files are not cross-platform and the format is not supported by all web browsers.
Sounds stored in the AIFF format have the extension .aif or .aiff.
14.8 The SND Format
The SND (Sound) was developed by Apple.
SND files are not cross-platform and the format is not supported by all web browsers.
Sounds stored in the SND format have the extension .snd.
14.9 The WAVE Format
The WAVE (waveform) format was developed by IBM and Microsoft.
It is supported by all computers running Windows, and by all the most popular web browsers.
Sounds stored in the WAVE format have the extension .wav.
14.10 The MP3 Format (MPEG)
MP3 files are actually MPEG files. But the MPEG format was originally developed for
video by the Moving Pictures Experts Group. We can say that MP3 files are the sound
part of the MPEG video format.
MP3 is one of the most popular sound formats for music recording. The MP3 encoding
system combines good compression (small files) with high quality. Expect all your future
software systems to support it.
Sounds stored in the MP3 format have the extension .mp3, or .mpga (for MPEG Audio).
What Format To Use?
The WAVE format is one of the most popular sound formats on the Internet, and it is
supported by all popular browsers. If you want recorded sound (music or speech) to be
available to all your visitors, you should use the WAVE format.
14.11 Let us Sum Up
In this lesson we have learnt about multimedia text and sound.
14.12 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Sound
b) Text
14.13 Points for Discussion
Discuss about the following
Various text formats
Various sound formats
14.14 Model answers to “Check your Progress”
In order to check your progress, try to answer the following
a) MIDI format
b) MP3 format
14.15 References
Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
Z.S. Bojkovic, D.A. Milovanovic, Multimedia Communication Systems, PHI, 2002
S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
J.F. Koegel, Multimedia Systems, Pearson Education, 2001
15.1 Aims and Objectives
15.2 Introduction
15.3 Different Graphic Formats?
15.4 Pixels and the Web
15.5 Meta/Vector Image Formats
15.6 What's A Bitmap?
15.7 Compression
15.8 The GIF Image Formats
15.9 Animation
15.10 Transparency
15.11 Interlaced vs. Non-Interlaced GIF
15.12 JPEG Image Formats
15.13 Progressive JPEGs
15.14 Which image do I use where?
15.15 How do I save in these formats?
15.16 Do you edit and create images in GIF or JPEG?
15.17 Animation
15.18 Multimedia Animation
15.19 Let us Sum Up
15.20 Lesson-end Activities
15.21 Points for Discussion
15.22 Model answers to “Check your Progress”
15.23 References
15.1 Aims and Objectives
The aim of this lesson is to learn the concept of images and animations in multimedia.
The objectives of this lesson are to make the student aware of the following concepts
various imaging formats
15.2 Introduction
If you really want to be strict, computer pictures are files, the same way word
documents or solitaire games are files. They're all a bunch of ones and zeros all in a row.
But we do have to communicate with one another so let's decide.
Image. We'll use "image". That seems to cover a wide enough topic range.
I went to my reference books and there I found that "graphic" is more of an adjective,
as in "graphic format." You see, we denote images on the Internet by their graphic
format. GIF is not the name of the image. GIF is the name of the compression scheme used
to create the raster format set up by CompuServe. (More on that in a moment.)
So, they're all images unless you're talking about something specific.
15.3 Different Graphic Formats?
It does seem like a big number, doesn't it? In reality, there are not 44 different graphic
format names. Many of the 44 are different versions under the same compression
umbrella, interlaced and non-interlaced GIF, for example.
There actually are only two basic methods for a computer to render, or store and
display, an image. When you save an image in a specific format you are creating either a
raster or meta/vector graphic format. Here's the lowdown:
Raster image formats (RIFs) should be the most familiar to Internet users. A Raster
format breaks the image into a series of colored dots called pixels. The number of ones
and zeros (bits) used to create each pixel denotes the depth of color you can put into your image.
If your pixel is denoted with only one bit-per-pixel then that pixel must be black or
white. Why? Because that pixel can only be a one or a zero, on or off, black or white.
Bump that up to 4 bits-per-pixel and you're able to set that colored dot to one of 16
colors. If you go even higher to 8 bits-per-pixel, you can save that colored dot at up to
256 different colors.
Does that number, 256, sound familiar to anyone? That's the upper color level of a GIF
image. Sure, you can go with less than 256 colors, but you cannot have over 256.
That's why a GIF image doesn't work overly well for photographs and larger images.
There are a whole lot more than 256 colors in the world. Images can carry millions. But if
you want smaller icon images, GIFs are the way to go.
Raster image formats can also save at 16, 24, and 32 bits-per-pixel. At the two highest
levels, the pixels themselves can carry up to 16,777,216 different colors. The image looks
great! Bitmaps saved at 24 bits-per-pixel are great quality images, but of course they also
run about a megabyte per picture. There's always a trade-off, isn't there?
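The bits-per-pixel arithmetic above can be sketched in a few lines of Python. The 640 x 480 dimensions below are my own illustrative choice, not from the text, but they show where the "about a megabyte per picture" figure comes from:

```python
def colors(bits_per_pixel):
    """Number of distinct colors a pixel of this depth can take."""
    return 2 ** bits_per_pixel

def raw_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed storage for a raster image, in bytes."""
    return width_px * height_px * bits_per_pixel // 8

print(colors(1), colors(4), colors(8), colors(24))  # 2 16 256 16777216
# A 640 x 480 image at 24 bits-per-pixel is about 0.9 MB uncompressed:
print(raw_bytes(640, 480, 24))  # 921600 bytes
```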
The three main Internet formats, GIF, JPEG, and Bitmap, are all Raster formats.
Some other Raster formats include the following:
Windows Clipart
ZSoft Paintbrush
OS/2 Warp format
Kodak's FlashPix
GEM Paint format
JPEG Related Image format
MAC MacPaint
MacPaint New Version
Macintosh PICT format
ZSoft Paintbrush
Portable Pixel Map (UNIX)
Paint Shop Pro format
RAW Unencoded image format
(Used to lower image bit rates)
TIFF Aldus Corporation format
WPG WordPerfect image format
15.4 Pixels and the Web
Since I brought up pixels, I thought now might be a pretty good time to talk about
pixels and the Web. How much is too much? How many is too few?
There is a delicate balance between the crispness of a picture and the number of pixels
needed to display it. Let's say you have two images, each is 5 inches across and 3 inches
down. One uses 300 pixels to span that five inches, the other uses 1500. Obviously, the
one with 1500 uses smaller pixels. It is also the one that offers a more crisp, detailed
look. The more pixels, the more detailed the image will be. Of course, the more pixels the
more bytes the image will take up.
So, how much is enough? That depends on whom you are speaking to, and right now
you're speaking to me. I always go with 100 pixels per inch. That creates a ten-thousand
pixel square inch. I've found that allows for a pretty crisp image without going overboard
on the bytes. It also allows some leeway to increase or decrease the size of the image and
not mess it up too much.
The lowest I'd go is 72 pixels per inch, the agreed upon low end of the image scale. In
terms of pixels per square inch, it's a whale of a drop to 5184. Try that. See if you like it,
but I think you'll find that lower definition monitors really play havoc with the image.
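The pixel counts discussed above work out like this (a quick Python check of the 5 x 3 inch example and the 100 vs. 72 pixels-per-inch figures):

```python
def pixels_for(width_in, height_in, ppi):
    """Total pixels needed to cover an image at a given pixels-per-inch."""
    return round(width_in * ppi) * round(height_in * ppi)

# The 5 x 3 inch image from the text:
print(pixels_for(5, 3, 100))  # 150000 pixels at 100 ppi
print(pixels_for(5, 3, 72))   # 77760 pixels at 72 ppi
# Pixels per square inch at the two densities:
print(100 ** 2, 72 ** 2)      # 10000 vs 5184
```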
15.5 Meta/Vector Image Formats
You may not have heard of this type of image formatting, not that you had heard of
Raster, either. This formatting falls into a lot of proprietary formats, formats made for
specific programs. CorelDraw (CDR), Hewlett-Packard Graphics Language (HGL), and
Windows Metafiles (EMF) are a few examples.
Where the Meta/Vector formats have it over Raster is that they are more than a simple
grid of colored dots. They're actual vectors of data stored in mathematical formats rather
than bits of colored dots. This allows for a strange shaping of colors and images that can
be perfectly cropped on an arc. A squared-off map of dots cannot produce that arc as
well. In addition, since the information is encoded in vectors, Meta/Vector image formats
can be blown up or down (a property known as "scalability") without looking jagged or
crowded (a property known as "pixelating").
So that I do not receive e-mail from those in the computer image know, there is a
difference in Meta and Vector formats. Vector formats can contain only vector data
whereas Meta files, as is implied by the name, can contain multiple formats. This means
there can be a lovely Bitmap plopped right in the middle of your Windows Meta file.
You'll never know or see the difference, but there it is. I'm just trying to keep everybody informed.
15.6 What's A Bitmap?
I get that question a lot. Usually it's followed with "How come it only works on
Microsoft Internet Explorer?" The second question's the easiest. Microsoft invented the
Bitmap format. It would only make sense they would include it in their browser. Every
time you boot up your PC, the majority of the images used in the process and on the
desktop are Bitmaps.
If you're using an MSIE browser, you can view this first example. The image is St.
Sophia in Istanbul. The picture is taken from the city's hippodrome.
Contrary to what I said above, Bitmaps will display on all browsers, just not in the
familiar <IMG SRC="--"> format we're all used to. I see Bitmaps used mostly as return
images from PERL Common Gateway Interfaces (CGIs). A counter is a perfect example.
Page counters that have that "odometer" effect are Bitmap images created by the server,
rather than as an inline image. Bitmaps are perfect for this process because they're a
simple series of colored dots. There's nothing fancy to building them.
It's actually a fairly simple process. In the script that runs the counter, you "build"
each number for the counter to display. Note the counter is black and white. That's only a
one bit-per-pixel level image. To create the number zero in the counter above, you would
build a grid 7 pixels wide by 10 pixels high. The pixels you want to remain black, you
would denote as zero. Those you wanted white, you'd denote as one.
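The grid-building step above can be sketched directly. The digit pattern below is my own hypothetical "0", following the text's convention that 0 means a black pixel and 1 means a white one:

```python
# A hypothetical "0" digit for a 7 x 10 one-bit counter image.
# Convention from the text: 0 = black pixel, 1 = white pixel.
ZERO = [
    "1111111",
    "1100011",
    "1011101",
    "1011101",
    "1011101",
    "1011101",
    "1011101",
    "1011101",
    "1100011",
    "1111111",
]

# Render it in the terminal: "#" for black, "." for white.
for row in ZERO:
    print("".join("#" if bit == "0" else "." for bit in row))
```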
Bitmaps are good images, but they're not great. If you've played with Bitmaps versus
any other image formats, you might have noticed that the Bitmap format creates images
that are a little heavy on the bytes. The reason is that the Bitmap format is not very
efficient at storing data. What you see is pretty much what you get, one series of bits
stacked on top of another.
15.7 Compression
I said above that a Bitmap was a simple series of pixels all stacked up. But the same
image saved in GIF or JPEG format uses less bytes to make up the file. How?
"Compression" is a computer term that represents a variety of mathematical formats
used to compress an image's byte size. Let's say you have an image where the upper
right-hand corner has four pixels all the same color. Why not find a way to make those
four pixels into one? That would cut down the number of bytes by three-fourths, at least
in the one corner. That's a compression factor.
Bitmaps can be compressed to a point. The process is called "run-length encoding."
Runs of pixels that are all the same color are all combined into one pixel. The longer the
run of pixels, the more compression. Bitmaps with little detail or color variance will
really compress. Those with a great deal of detail don't offer much in the way of
compression. Bitmaps that use the run-length encoding can carry either the common
".bmp" extension or ".rle". Another difference between the two files is that the common
Bitmap can accept 16 million different colors per pixel. Saving the same image in run-length encoding knocks the bits-per-pixel down to 8. That locks the level of color in at no
more than 256. That's even more compression of bytes to boot.
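Run-length encoding is simple enough to sketch in a few lines. This is a generic illustration of the technique, not the exact byte layout of the .rle file format:

```python
def rle_encode(pixels):
    """Run-length encode a row of pixels as (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [tuple(r) for r in runs]

row = ["white"] * 6 + ["black"] * 2 + ["white"] * 4
print(rle_encode(row))  # [('white', 6), ('black', 2), ('white', 4)]
```

Twelve pixels collapse to three runs here; a highly detailed row with few repeated colors would barely shrink at all, which is why detailed bitmaps compress poorly.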
15.8 The GIF Image Formats
So, why wasn't the Bitmap chosen as the King of all Internet Images? Because Bill
Gates hadn't yet gotten into the fold when the earliest browsers started running inline
images. I don't mean to be flippant either; I truly believe that.
GIF, which stands for "Graphic Interchange Format," was first standardized in 1987
by CompuServe, although the patent for the algorithm (mathematical formula) used to
create GIF compression actually belongs to Unisys. The first format of GIF used on the
Web was called GIF87a, representing its year and version. It saved images at 8 bits-per-pixel, capping the color level at 256. That 8-bit level allowed the image to work across
multiple server styles, including CompuServe, TCP/IP, and AOL. It was a graphic for all
seasons, so to speak.
CompuServe updated the GIF format in 1989 to include animation, transparency, and
interlacing. They called the new format, you guessed it: GIF89a.
There's no discernible difference between a basic (known as non-interlaced) GIF in 87
and 89 formats.
15.9 Animation
I remember when animation really came into the mainstream of Web page
development. I was deluged with e-mail asking how to do it. There's been a tutorial up
for a while now; stop by and see it
for instruction on how to create the animations yourself.
What you are seeing in that example are 12 different images, each set one "hour"
farther ahead than the one before it. Animate them all in a row and you get that stopwatch effect.
The concept of GIF89a animation is much the same as a picture book with small
animation cells in each corner. Flip the pages and the images appear to move. Here, you
have the ability to set the cell's (technically called an "animation frame") movement
speed in 1/100ths of a second. An internal clock embedded right into the GIF keeps count
and flips the image when the time comes.
The animation process has been bettered along the way by companies who have found
their own method of compressing the GIFs further. As you watch an animation you might
notice that very little changes from frame to frame. So, why put up a whole new GIF
image if only a small section of the frame needs to be changed? That's the key to some of
the newer compression factors in GIF animation. Less changing means fewer bytes.
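The "only send what changed" idea can be sketched as frame differencing. This is a generic illustration of the concept, not the exact method any particular GIF optimizer uses:

```python
def changed_pixels(prev, curr):
    """Return only the pixels that differ between two frames,
    as (row, col, new_value) triples."""
    diffs = []
    for r, (row_a, row_b) in enumerate(zip(prev, curr)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                diffs.append((r, c, b))
    return diffs

frame1 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
frame2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(changed_pixels(frame1, frame2))  # [(1, 1, 0), (1, 2, 1)]
```

Only two pixels out of nine changed, so storing just the diff takes far fewer bytes than storing the whole second frame.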
15.10 Transparency
Again, if you'd like a how-to, I have one for you. A transparent GIF is fun but limited in
that only one color of the 256-shade palette can be made transparent.
As you can see, the bytes came out the same after the image was put through the
transparency filter. The process is best described as similar to the weather forecaster on
your local news. Each night they stand in front of a big green (sometimes blue) screen
and deliver the weather while that blue or green behind them is "keyed" out and replaced
by another source. In the case of the weather forecaster, it's usually a large map with lots
of Ls and Hs.
The process in television is called a "chroma key." A computer is told to home in on a
specific color, let's say it's green. Chroma key screens are usually green because it's the
color least likely to be found in human skin tones. You don't want to use a blue screen
and then chroma out someone's pretty blue eyes. That chroma (color) is then "erased" and
replaced by another image.
Think of that in terms of a transparent GIF. There are only 256 colors available in the
GIF. The computer is told to home in on one of them. It's done by choosing a particular
red/green/blue shade already found in the image and blanking it out. The color is
basically dropped from the palette that makes up the image. Thus whatever is behind it
shows through.
The shape is still there though. Try this: Get an image with a transparent background
and alter its height and width in your HTML code. You'll see what should be the
transparent color seeping through.
Any color that's found in the GIF can be made transparent, not just the color in the
background. If the background of the image is speckled then the transparency is going to
be speckled. If you cut out the color blue in the background, and that color also appears
in the middle of the image, it too will be made transparent.
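The palette-drop behavior just described can be sketched with named colors standing in for pixel values (a toy illustration, not how a GIF decoder is actually implemented):

```python
def apply_transparency(image, transparent_color, background):
    """Replace every occurrence of one color with the background pixel,
    mimicking a GIF's single transparent palette entry."""
    return [
        [background[r][c] if px == transparent_color else px
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]

image = [["green", "red"], ["blue", "green"]]
background = [["sky", "sky"], ["sky", "sky"]]
print(apply_transparency(image, "green", background))
# [['sky', 'red'], ['blue', 'sky']]
```

Note that every "green" pixel lets the background through, wherever it appears in the image, which is exactly the caveat raised above.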
When I put together a transparent image, I make the image first, then copy and paste it
onto a slightly larger square. That square is the most hideous green I can mix up. I'm sure
it doesn't appear in the image. That way only the background around the image will
become clear.
15.11 Interlaced vs. Non-Interlaced GIF
The GIF images of me playing the Turkish Sitar were non-interlaced format images.
This is what is meant when someone refers to a "normal" GIF or just "GIF".
When you do NOT interlace an image, you fill it in from the top to the bottom, one
line after another. The following image is of two men coming onto a boat we used to
cross from the European to the Asian side of Turkey. The flowers they are carrying were
sold in the manner of roses we might buy our wives here in the U.S. I bought one.
Hopefully, you're on a slower connection so you got the full effect of
waiting for the image to come in. It can be torture sometimes. That's where the brilliant
Interlaced GIF89a idea came from.
Interlacing is the concept of filling in every other line of data, then going back to the
top and doing it all again, filling in the lines you skipped. Your television works that way.
The effect on a computer monitor is that the graphic appears blurry at first and then
sharpens up as the other lines fill in. That allows your viewer to at least get an idea of
what's coming up rather than waiting for the entire image, line by line.
Both interlaced and non-interlaced GIFs get you to the same destination. They just do
it differently. It's up to you which you feel is better.
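The interlacing order can be made concrete. The GIF89a specification defines four passes over the image rows (every 8th row, then the rows midway between them, and so on); a short sketch:

```python
def gif_interlace_order(height):
    """Row order for a GIF89a interlaced image: four passes,
    defined as (starting row, step) pairs in the GIF89a spec."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    return [row for start, step in passes
            for row in range(start, height, step)]

print(gif_interlace_order(8))  # [0, 4, 2, 6, 1, 3, 5, 7]
```

Every row is eventually drawn exactly once; the early passes give the blurry preview, and the later passes sharpen it up.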
15.12 JPEG Image Formats
JPEG is a compression algorithm developed by the people the format is named after,
the Joint Photographic Experts Group. JPEG's big selling point is that its compression
factor stores the image on the hard drive in fewer bytes than the image uses when it actually
displays. The Web took to the format straightaway because not only did the image store
in fewer bytes, it transferred in fewer bytes. As the Internet adage goes, the pipeline isn't
getting any bigger so we need to make what is traveling through it smaller.
For a long while, GIF ruled the Internet roost. I was one of the people who didn't
really like this new JPEG format when it came out. It was less grainy than GIF, but it also
caused computers without a decent amount of memory to crash the browser. (JPEGs have
to be "blown up" to their full size. That takes some memory.) There was a time when
people only had 8 or 4 megs of memory in their boxes. Really. It was way back in the
Dark Ages.
JPEGs are "lossy." That's a term that means you trade-off detail in the displayed
picture for a smaller storage file. I always save my JPEGs at 50% or medium
Here's a look at the same image saved in normal, or what's called "sequential"
encoding. That's a top-to-bottom, single-line format, equal to the GIF89a non-interlaced format.
The image is of an open air market in Basra. The smell was amazing. If you like olives,
go to Turkey. Cucumbers, too, believe it or not.
The difference between the 1% and 50% compression is not too bad, but the drop in
bytes is impressive. The numbers I am showing are storage numbers, the amount of hard
drive space the image takes up.
You've probably already surmised that 50% compression means that 50% of the image
is included in the algorithm. If you don't put a 50% compressed image next to an exact
duplicate image at 1% compression, it looks pretty good. But what about that 99%
compression image? It looks horrible, but it's great for teaching. Look at it again. See
how it appears to be made of blocks? That's what's meant by lossy. Detail is sacrificed to
save bytes. You can see where the compression algorithm found groups of pixels
that all appeared to be close in color and just grouped them all together as one. You might
be hard pressed to figure out what the image was actually showing if I didn't tell you.
15.13 Progressive JPEGs
You can almost guess what this is all about. A progressive JPEG works a lot like the
interlaced GIF89a by filling in every other line, then returning to the top of the image to
fill in the remainder.
Obviously, here's where bumping up the compression does not pay off. Rule of
thumb: If you're going to use progressive JPEG, keep the compression up high, 75% or better.
15.14 Which image do I use where?
There's just not a good answer to this question. No matter what I say, someone else
can give you just as compelling a reason why you should do the opposite. I'll tell you the
rules I follow:
Small images, like icons and buttons: GIF (usually non-interlaced)
Line art, grayscale (black and white), cartoons: GIF (usually non-interlaced)
Scanned images and photographs: JPEG. (I prefer sequential. I'm not a fan of progressive.)
Large images or images with a lot of detail: JPEG (I prefer sequential)
That said, I also follow the thinking, "Do people really need to see this image?" Can I
get away with text rather than an image link? Can I make links to images allowing the
viewer to choose whether to look or not? The fewer images I have on a page, the faster it
comes in. I also attempt to have the same images across multiple pages, if possible. That
way the viewer only has to wait once. After that, the images are in the cache and they pop
right up.
15.15 How do I save in these formats?
You have to have an image editor. I own three. Most of my graphic work for the Web
is done in PaintShop Pro. I do that because PaintShop Pro is shareware and you can get
your hands on the same copy I have. That way I know if I can do it, you can do it.
To get these formats, you need to make a point of saving in these formats. When your
image editor is open and you have an image you wish to save, always choose SAVE AS
from the FILE menu. You'll get a dialogue box that asks where you'd like to save the
image. Better yet, somewhere on that dialogue box is the opportunity for you to choose a
different image format. Let's say you choose GIF. Keep looking. Somewhere on the same
dialogue box will be an OPTIONS button (or something close). That's where you'll
choose 87a or 89a, interlaced or non-interlaced, formats.
If you choose JPEG, you'll get the option of choosing the compression rate. You may
not get to play with the sliding scale I get. You may only get a series of compression
choices, high, medium, low, etc. Go high.
15.16 Do you edit and create images in GIF or JPEG?
Neither. I always edit in the PaintShop Pro or Bitmap format. Others have told me that
image creation and editing should only be done in a Vector format. Either way, make a
point of editing with large images. The larger the image, the better chance you have of
making that perfect crop. Edit at the highest color level the image program will allow.
You can always resize and save to a low-byte format after you've finished creating the image.
15.17 Animation
Most Web animation requires special plug-ins for viewing. The exception is the animated
GIF format, which is by far the most prevalent animation format on the Web, followed
closely by Macromedia's Flash format. The animation option of the GIF format combines
individual GIF images into a single file to create animation. You can set the animation to
loop on the page or to play once, and you can designate the duration for each frame in the animation.
Animated GIFs have several drawbacks. One concerns the user interface. GIF animations
do not provide interface controls, so users have no easy way to stop a looping animation
short of closing the browser window. They also lack the means to replay non-looping
animation. Second, the animated GIF format does not perform interframe compression,
which means that if you create a ten-frame animation and each frame is a 20 KB GIF,
you'll be putting a 200 KB file on your page. And the final drawback is a concern that
pertains to animations in general. Most animation is nothing more than a distraction. If
you place animation alongside primary content you will simply disrupt your readers'
concentration and keep them from the objective of your site. If you require users to sit
through your spiffy Flash intro every time they visit your site, you are effectively turning
them away at the door.
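The file-size consequence of missing interframe compression is simple arithmetic, using the ten-frame example from above:

```python
def animated_gif_size_kb(frame_sizes_kb):
    """With no interframe compression, an animated GIF's size is
    roughly the sum of its individual frames."""
    return sum(frame_sizes_kb)

# Ten 20 KB frames -> about a 200 KB file, as noted above:
print(animated_gif_size_kb([20] * 10))  # 200
```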
There is a place for animation on the Web, however. Simple animation on a Web site's
main home page can provide just the right amount of visual interest to invite users to
explore your materials. There, the essential content is typically a menu of links, so the
threat of distraction is less than it would be on an internal content page. Also, subtle
animation such as a rollover can help guide the user to interface elements that they might
otherwise overlook. Animation can also be useful in illustrating concepts or procedures,
such as change over time. When you have animation that relates to the content of your
site, one way to minimize the potential distraction is to present the animation in a
secondary window. This technique offers a measure of viewer control: readers can open
the window to view the animation and then close the window when they're through.
15.18 Multimedia Animation
From the early days of the web, when the only thing that moved on your screen was the
mouse cursor - there's now a bewildering array of methods for animating pages. Here's a rundown:
Shockwave, Flash (formerly FutureSplash). Macromedia's Shockwave plug-ins
and Flash are leaders in plug-in animation.
QuickTime is the multi-platform industry-standard multimedia architecture used
by software tool vendors and content creators to create and deliver synchronized
graphics, sound, video, text and music. FLiC and AVI files require pre-existing
software to be on your computer before you can view them.
mBED. mbedlets are interactive multimedia interfaces within web pages. They
include graphics, animation, sound. They stream data directly off the web as
needed and attempt to use bandwidth as efficiently as possible. They can
communicate back to the server using standard HTTP methods. And they respond
to user actions such as mouse clicks and key events.
Sizzler, another plug-in-based animation format.
Javascript animations require preloading, and users can disable Javascript in their browsers.
Framation (TM) is a technique using a combination of meta-refresh and frames.
GIF animation. Self-contained GIF files are downloaded once and played from
the computer's disk cache. You can download several per page, and even place a
single animated GIF dozens of times on the same page, creating effects that would
not be easy with other solutions. Unlike other movie formats, GIF still supports
transparency, even in animations. They are as simple to use and implement as any
still GIF image. The only thing GIF lacks is sound (and BTW sound has been
added to GIFs in the past) and real-time speed variation (like AVI's ability to skip
frames when on a slow machine).
15.19 Let us Sum Up
In this lesson we have learnt about Images and animation in multimedia.
15.20 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) What are the various image formats?
b) Define animation
15.21 Points for Discussion
Discuss the following
Define Compression
Discuss about JPEG image
15.22 Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
a) Meta/Vector Image Formats
b) Different Graphic Formats?
15.23 References
Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
Z.S. Bojkovic, D.A. Milovanovic, Multimedia Communication Systems, PHI, 2002
S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
J.F. Koegel, Multimedia Systems, Pearson Education, 2001
LESSON – 16:
16.1 Aims and Objectives
16.2 Introduction
16.3 Advantages of Digital Video
16.4 File Size Considerations
16.5 Video Compression
Lossless Compression
Lossy Compression
16.6 Compression Standards
Microsoft’s Video for Windows
Apple’s QuickTime
16.7 Hardware Requirements
16.8 Guidelines for short video sequences
16.9 Multimedia Video Formats
The AVI Format
The Windows Media Format
The MPEG Format
The QuickTime Format
The RealVideo Format
The Shockwave (Flash) Format
16.10 Playing Videos On The Web
Inline Videos
Using A Helper (Plug-In)
Using The <img> Element
Using The <embed> Element
Using The <object> Element
Using A Hyperlink
16.11 Let us Sum Up
16.12 Lesson-end Activities
16.13 Points for Discussion
16.14 Model answers to “Check your Progress”
16.15 References
16.1 Aims and Objectives
The aim of this lesson is to learn the concept of video in multimedia
The objectives of this lesson are to make the student aware of the following concepts
a) Video
b) Various video formats
16.2 Introduction
Video is the most challenging multimedia content to deliver via the Web. One second
of uncompressed NTSC (National Television System Committee) video, the television
and video standard used in North America, requires approximately 27 megabytes of
disk storage space. The amount of scaling and compression required to turn this quantity
of data into something that can be used on a network is significant, sometimes so much
so as to render the material useless. If at all possible, tailor your video content for the Web:
Shoot original video; that way you can take steps to create video that will
compress efficiently and still look good at low resolution and frame rates.
Shoot close-ups. Wide shots have too much detail to make sense at low resolution.
Shoot against a simple monochromatic background whenever possible. This will
make small video images easier to understand and will increase the efficiency of
compression.
Use a tripod to minimize camera movement. A camera locked in one position will
minimize the differences between frames and greatly improve video compression.
Avoid zooming and panning. These can make low frame-rate movies confusing to
view and interpret and can cause them to compress poorly.
When editing your video, use hard cuts between shots. Don't use the transitional
effects offered by video editing software, such as dissolves or elaborate wipes,
because they will not compress efficiently and will not play smoothly on the Web.
If you are digitizing material that was originally recorded for video or film,
choose your material carefully. Look for clips that contain minimal motion and
do not depend on small details. Motion and fine detail are the most obvious
shortcomings of low-resolution video.
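The "27 megabytes per second" figure quoted above is easy to check with simple arithmetic. The sketch below assumes a 640 x 480 frame at 24-bit colour and 30 frames per second; the exact numbers depend on how the NTSC signal is sampled, so these values are illustrative.

```python
# Rough check of the "27 megabytes per second" figure for uncompressed
# NTSC video. Frame size and colour depth are illustrative assumptions.
width, height = 640, 480        # one NTSC-resolution frame
bytes_per_pixel = 3             # 24-bit colour
fps = 30                        # NTSC frame rate (approximately)

bytes_per_frame = width * height * bytes_per_pixel
bytes_per_second = bytes_per_frame * fps

print(bytes_per_frame)                     # 921600 bytes per frame
print(round(bytes_per_second / 2**20, 1))  # 26.4 MB per second, roughly 27 MB
```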
In the past, video has been defined as multimedia. Video makes use of all of the
elements of multimedia, bringing your products and services alive, but at a high cost.
Scripting, hiring actors, set creation, filming, post-production editing and mastering can
add up very quickly. Five minutes of live action video can cost many times more than a
multimedia production.
The embedding of video in multimedia applications is a powerful way to convey
information which can incorporate a personal element which other media lack. Video
enhances, dramatizes, and gives impact to your multimedia application. Your audience
will better understand the message of your application with the adequate and carefully
planned integration of video. Video is an important way of conveying a message to the
MTV generation. But be careful: good-quality digital video clips require a
sophisticated hardware and software configuration and support.
The advantage of integrating video into a multimedia presentation is the capacity
to effectively convey a great deal of information in the least amount of time. Remember
that motion integrated with sound is a key for your audience's understanding. It also
increases the retention of the presented information (knowledge).
The ability to incorporate digitized video into a multimedia title marked an
important achievement in the evolution of the multimedia industry. Video brings a sense
of realism to multimedia titles and is useful in engaging the user and evoking emotion.
There are two basic approaches to delivering video on a computer screen –
analogue and digital video.
Analogue video is essentially a product of the television industry and therefore
conforms to television standards.
Digital video is a product of the computing industry and therefore conforms to
digital data standards.
Video, like audio, is usually recorded and played as an analog signal. It must
therefore be digitized in order to be incorporated into a multimedia title.
The figure below shows the process for digitizing an analog video signal.
A video source, such as video camera, VCR, TV, or videodisc, is connected to a
video capture card in a computer. As the video source is played, the analog signal is sent
to the video card and converted into a digital file that is stored on the hard drive. At the
same time, the sound from the video source is also digitized.
PAL (Phase Alternating Line) and NTSC (National Television System
Committee) are the two video standards of most importance for analogue video.
PAL is the standard for most of Europe and the Commonwealth, NTSC for North
and South America. The standards are inter-convertible, but conversion normally has to
be performed by a facilities house and some quality loss may occur.
Analogue video can be delivered into the computing interface from any
compatible video source (video recorder, videodisc player, live television), provided the
computer is equipped with a special overlay board, which synchronizes video and
computer signals and displays computer-generated text and graphics over the video.
16.3 Advantages of Digital Video
One of the advantages of digitized video is that it can be easily edited. Analog
video, such as a videotape, is linear; there is a beginning, middle, and end. If you want to
edit it, you need to continually rewind, pause, and fast forward the tape to display the
desired frames.
Digitized video, on the other hand, allows random access to any part of the video,
and editing can be as easy as the cut and paste process in a word processing program. In
addition, adding special effects such as fly-in titles and transitions is relatively simple.
Other advantages:
The video is stored as a standard computer file. Thus it can be copied with no loss
in quality, and can also be transmitted over standard computer networks.
Software motion video does not require specialized hardware for playback.
Unlike analog video, digital video requires neither a video board in the computer
nor an external device, such as a videodisc player, which adds extra cost and
complexity.
16.4 File Size Considerations
The embedding of video in multimedia applications is a powerful way to convey
information which can incorporate a personal element which other media lack. Current
technology limits digital video's speed of playback and the size of the window which can
be displayed. When played back from the computer's hard disk, videos are much less
smooth than conventional television images due to the hard disk data transfer rate. Often
compression techniques are used with digital video and as a result resolution is often
compromised. Also, the storage of video files requires a comparatively large amount of
hard disk space.
Digitized video files can be extremely large. A single second of high-quality color
video that takes up only one-quarter of a computer screen can be as large as 1 MB.
Several elements determine the file size; in addition to the length of the video,
these include:
Frame Rate
Image Size
Color Depth
In most cases, a quarter-screen image size (320 x 240), an 8-bit color depth (256
colors), and a frame rate of 15 fps are acceptable for a multimedia title. Even this
minimum results in a very large file size.
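Multiplying the three factors together shows why even these minimum settings produce large files. The sketch below is a back-of-envelope calculation under those assumptions, not a description of any particular file format.

```python
# Per-second data size for the minimum settings quoted above:
# quarter-screen image, 8-bit colour, 15 frames per second.
width, height = 320, 240        # quarter-screen image size
bits_per_pixel = 8              # 256 colours
fps = 15

bytes_per_second = width * height * (bits_per_pixel // 8) * fps
print(bytes_per_second)                    # 1152000 bytes
print(round(bytes_per_second / 2**20, 2))  # about 1.1 MB per second, uncompressed
```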
16.5 Video Compression
Because of the large sizes associated with video files, video
compression/decompression programs, known as codecs, have been developed. These
programs can substantially reduce the size of video files, which means that more video
can fit on a single CD and that the speed of transferring video from a CD to the computer
can be increased.
There are two types of compression:
Lossless compression
Lossy compression
16.5.1 Lossless Compression
Lossless compression preserves the exact image throughout the compression and
decompression process. An example of when this is important is in the use of text
images. Text needs to appear exactly the same before and after file compression. One
technique for text compression is to identify repeating words and assign them a code.
For example, if the word multimedia appears several times in a text file, it would
be assigned a code that takes up less space than the actual word. During decompression,
the code would be changed back to the word multimedia.
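The word-for-code substitution described above can be sketched in a few lines. The table and sample text here are made-up examples; real lossless codecs (such as LZW) build their dictionaries automatically.

```python
# Toy dictionary compression: replace a repeated word with a short code,
# then reverse the mapping exactly on decompression (no data is lost).
def compress(text, table):
    for word, code in table.items():
        text = text.replace(word, code)
    return text

def decompress(text, table):
    for word, code in table.items():
        text = text.replace(code, word)
    return text

table = {"multimedia": "\x01"}  # assign a one-byte code to a repeated word
original = "multimedia titles mix multimedia elements"
packed = compress(original, table)

assert decompress(packed, table) == original  # lossless round trip
print(len(original), len(packed))             # the packed form is shorter
```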
16.5.2 Lossy Compression
Lossy compression actually eliminates some of the data in the image and
therefore provides greater compression ratios than lossless compression. The greater the
compression ratio, however, the poorer the decompressed image. Thus, the trade-off is
file size versus image quality. Lossy compression is applied to video because some drop
in the quality is not noticeable in moving images.
16.6 Compression Standards
Certain standards have been established for compression programs, including
Microsoft’s Video for Windows
Apple’s QuickTime
16.6.1 JPEG
A standard developed by the Joint Photographic Experts Group, based on compression
of still images. Motion JPEG treats each video frame as a still image. This results in large
file sizes or in quality degradation at high compression ratios.
Although JPEG is strictly a still-image compression standard, stills can become a
movie if delivered at 25 (or 30) frames per second. JPEG compression requires hardware,
but decompression can now be achieved in software alone (e.g. under QuickTime and
Video for Windows).
The figure below shows how the JPEG process works. Often areas of an image
(especially backgrounds) contain similar information. JPEG compression identifies these
areas and stores them as blocks of pixels instead of pixel by pixel, thus reducing the
amount of information needed to store the image.
The blocks are then reassembled when the file is decompressed. Rather than
separately storing data for each of the 256 blue pixels in this block of background color,
JPEG eliminates the redundant information and records just the color, size, and location
of the block in the graphic.
A higher number of blocks results in a larger file but better quality.
Fewer blocks make a smaller file but result in lossier compression; data is
irrevocably lost.
Compression ratios of 20:1 can be achieved without substantially affecting
image quality. A 20:1 compression ratio would reduce a 1 MB file to only about 50 KB.
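The block idea can be illustrated with a crude run-length sketch: a uniform region collapses to a colour plus a count. Real JPEG is far more involved (8 x 8 blocks, a frequency transform, quantization), so this is only an analogy, and the pixel values are hypothetical.

```python
# Store (colour, run length) pairs instead of every pixel of a uniform region.
def rle_encode(pixels):
    runs, i = [], 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1                      # extend the run of identical pixels
        runs.append((pixels[i], j - i))
        i = j
    return runs

row = ["blue"] * 256                    # a uniform background row
print(rle_encode(row))                  # [('blue', 256)]

# The 20:1 ratio quoted above: a 1 MB (1024 KB) file shrinks to about 51 KB.
print(1024 / 20)                        # 51.2
```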
16.6.2 MPEG
A standard from the Moving Picture Experts Group that uses an asymmetrical
algorithm: compression takes far longer than decompression and is difficult to perform
in real time. MPEG is based on frame differences, which results in high compression
ratios and small file sizes.
MPEG also adds another process to still-image compression when working
with video: MPEG looks for the changes in the image from frame to frame. Keyframes
are identified every few frames, and the changes that occur from keyframe to keyframe
are recorded.
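The keyframe-plus-differences idea can be sketched as follows. Frames here are short lists of pixel values and the interval of three is an arbitrary choice; real MPEG encoders also use motion estimation, which this sketch omits.

```python
# Store a full frame every few frames ("keyframe") and, in between,
# only the (index, new value) pairs that changed since the previous frame.
def encode(frames, keyframe_interval=3):
    encoded = []
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            encoded.append(("key", frame))
        else:
            prev = frames[i - 1]
            delta = [(j, new) for j, (old, new) in enumerate(zip(prev, frame))
                     if old != new]
            encoded.append(("delta", delta))
    return encoded

frames = [[0, 0, 0], [0, 1, 0], [0, 1, 2]]
print(encode(frames))
# [('key', [0, 0, 0]), ('delta', [(1, 1)]), ('delta', [(2, 2)])]
```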
MPEG can provide greater compression ratios than JPEG, but it requires
hardware (a card inserted in the computer) that is not needed for JPEG compression. This
limits the use of MPEG compression for multimedia titles, because MPEG cards are not
standard on the typical multimedia playback system.
16.6.3 Microsoft’s Video for Windows
Microsoft’s Video for Windows software is based on the .AVI (Audio Video
Interleave) file format, in which the audio and video are interleaved. This keeps the
sound synchronized with the motion of a video file.
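Interleaving simply means alternating chunks of the two streams in the file, so that during playback the matching audio is read at the same time as each video frame. The labels below are purely illustrative; the real AVI chunk layout is more elaborate.

```python
# Alternate video frames with their matching audio chunks.
video = ["V0", "V1", "V2"]   # hypothetical video frames
audio = ["A0", "A1", "A2"]   # hypothetical audio chunks

interleaved = [item for pair in zip(video, audio) for item in pair]
print(interleaved)           # ['V0', 'A0', 'V1', 'A1', 'V2', 'A2']
```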
16.6.4 Apple’s QuickTime
Apple developed software compression for the Macintosh called QuickTime,
which uses the movie format. The movie format has a data structure format used for
production and a compact format for playback. It uses lossy compression coding and can
achieve ratios of 5:1 to 25:1. The QuickTime player includes a volume control.
QuickTime for Windows integrates video, animation, high-quality still images,
and high-quality sound with Windows applications – boosting the impact of all types of
applications.
16.7 Hardware Requirements
When a developer is interested in including digitally recorded video in an
application there are elements of hardware which are essential. To begin with, as with
images, there must be some way of gathering the raw video which will be translated into
digital video.
Video camcorders or video tape recorders can be used for gathering this original
information. Depending on the frequency of use of these pieces of equipment, it may be
necessary to purchase them as part of the multimedia setup, or it may be better to borrow
equipment from an Audio/Visual Services unit near you.
Once this media has been collected, it is necessary to translate it into a digital
format, using a video digitizing card, so it can be used on the computer. PCs do not
generally come with video digitizing cards, but the Macintosh AV series of computers
have them built in.
Video capture boards are designed either to grab still frames or to capture motion
video as input into a computer. In some cases, the video plays through into a window on
the monitor.
16.8 Guidelines for short video sequences
Short video sequences, often accompanied by spoken commentary or
atmospheric music, are an attractive way of presenting information.
1. Care should be taken not to present a video just for the sake of it. For example,
voice output alone can be as effective as, and requires less storage space than, a
video of someone speaking (i.e. a "talking-head").
2. Using video as part of a multimedia application usually requires a quality as high
as that of television sets to fulfil users' expectations.
3. Use of techniques such as cut, fade, dissolve, wipe, overlap, multiple exposure,
should be limited to avoid distracting the user from the content.
4. To make proper use of video sequences in multimedia applications, short
sequences are needed as a part of a greater whole. This is different from watching
a film which usually involves watching it from beginning to end in a single
sequence. Video sequences should be limited to about 45 seconds; longer video
sequences can reduce the user's concentration.
5. Video should be accompanied by a soundtrack in order to give extra information
or to add specific detail to the information.
6. Videos need time and careful direction if they are to present information
effectively.
7. If the lighting conditions under which the video is to be viewed may be poor,
controls may be provided for the user to alter display characteristics such as
brightness, contrast, and colour strength.
8. Provide low quality video within a small window, since full screen video raises
the expectation of the user. Often some kind of stage or other 'decoration', e.g. a
cinema metaphor (i.e. background) may be used to show low resolution video in a
part of a screen.
9. The actual position within the video or animation sequence, and the total length of
the sequence, should be shown on a time scale.
10. The user should be able to interrupt the video (or animation) sequence at any time
and to repeat parts of it. The most important controls to provide are: play, pause,
replay from start. However, a minimum requirement is that users should be able to
cancel the video or animation sequence at any time, and move on to the next part
of the interface.
11. Video controls should be based on the controls of a video cassette recorder
(VCR) or hi-fi, which are familiar to many people.
12. It is also desirable to provide controls to set video characteristics such as
brightness, contrast, colour and hue.
16.9 Multimedia Video Formats
The common digital video formats are :
Moving Picture Experts Group (.MPG)
Quicktime (.MOV)
Video for Windows (.AVI).
Video can be stored in many different formats.
16.9.1 The AVI Format
The AVI (Audio Video Interleave) format was developed by Microsoft.
The AVI format is supported by all computers running Windows and by all the
most popular web browsers. It is a very common format on the Internet, but it is not
always possible to play AVI files on non-Windows computers.
Videos stored in the AVI format have the extension .avi.
16.9.2 The Windows Media Format
The Windows Media format was developed by Microsoft.
Windows Media is a common format on the Internet, but Windows Media movies
cannot be played on non-Windows computers without an extra (free) component installed.
Some later Windows Media movies cannot be played at all on non-Windows computers
because no player is available.
Videos stored in the Windows Media format have the extension .wmv.
16.9.3 The MPEG Format
The MPEG (Moving Picture Experts Group) format is the most popular format on
the Internet. It is cross-platform and supported by all the most popular web browsers.
Videos stored in the MPEG format have the extension .mpg or .mpeg.
16.9.4 The QuickTime Format
The QuickTime format was developed by Apple.
QuickTime is a common format on the Internet, but QuickTime movies cannot be
played on a Windows computer without an extra (free) component installed.
Videos stored in the QuickTime format have the extension .mov.
16.9.5 The RealVideo Format
The RealVideo format was developed for the Internet by RealNetworks.
The format allows streaming of video (on-line video, Internet TV) with low
bandwidths. Because of the low bandwidth priority, quality is often reduced.
Videos stored in the RealVideo format have the extension .rm or .ram.
16.9.6 The Shockwave (Flash) Format
The Shockwave format was developed by Macromedia.
The Shockwave format requires an extra component to play. This component
comes preinstalled with the latest versions of Netscape and Internet Explorer.
Videos stored in the Shockwave format have the extension .swf.
16.10 Playing Videos On The Web
Videos can be played "inline" or by a "helper", depending on the HTML element you use.
16.10.1 Inline Videos
When a video is included in a web page it is called inline video.
Inline video can be added to a web page by using the <img> element.
If you plan to use inline videos in your web applications, be aware that many
people find inline videos annoying. Also note that some users might have turned off the
inline video option in their browser.
Our best advice is to include inline videos only in web pages where the user
expects to see a video. An example of this is a page which opens after the user has
clicked on a link to see the video.
16.10.2 Using A Helper (Plug-In)
A helper application is a program that can be launched by the browser to "help"
play a video. Helper applications are also called plug-ins.
Helper applications can be launched using the <embed> element, the <applet>
element, or the <object> element.
One great advantage of using a helper application is that you can let some (or all)
of the player settings be controlled by the user.
Most helper applications allow manual (or programmed) control over the volume
settings and play functions like rewind, pause, stop and play.
16.10.3 Using The <img> Element
Internet Explorer supports the dynsrc attribute in the <img> element.
The purpose of this element is to embed multimedia elements in a web page:
<img dynsrc="video.avi" />
The code fragment above displays an AVI file embedded in a web page.
Note: The dynsrc attribute is not a standard HTML or XHTML attribute. It is supported
by Internet Explorer only.
16.10.4 Using The <embed> Element
Internet Explorer and Netscape both support an element called <embed>.
The purpose of this element is to embed multimedia elements in a web page:
<embed src="video.avi" />
The code fragment above displays an AVI file embedded in a web page.
A list of attributes for the <embed> element can be found in a later chapter of this
tutorial.
Note: The <embed> element is supported by both Internet Explorer and Netscape,
but it is not a standard HTML or XHTML element. The World Wide Web Consortium
(W3C) recommends using the <object> element instead.
16.10.5 Using The <object> Element
Internet Explorer and Netscape both support an HTML element called <object>.
The purpose of this element is to embed multimedia elements in a web page:
<object data="video.avi" type="video/avi" />
The code fragment above displays an AVI file embedded in a web page.
A list of attributes for the <object> element can be found in a later chapter of this tutorial.
16.10.6 Using A Hyperlink
If a web page includes a hyperlink to a media file, most browsers will use a
"helper application" to play the file:
<a href="video.avi">Click here to play a video file</a>
The code fragment above displays a link to an AVI file. If the user clicks on the link,
the browser will launch a helper application, like Windows Media Player, to play the AVI
file.
16.11 Let us Sum Up
In this lesson we have learnt about
a) Video formats
b) Compression techniques
16.12 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) Discuss the MPEG standard
b) Discuss video file size considerations
16.13 Points for Discussion
Discuss the following
a) Lossless compression
b) Lossy compression
16.14 Model answers to “Check your Progress”
In order to check your progress, try to answer the following questions
a) Discuss the AVI format
b) Discuss the MOV format
16.15 References
1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A Milovanovic, Multimedia Communication Systems, PHI, 2002
3. S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
4. J.F. Koegel, Multimedia Systems, Pearson Education, 2001