5. Robot Vision
5.1 Basic Image Processing
In this section we introduce the basics of image processing and review the most
common image processing algorithms that support robot vision.
A computerized image is a two-dimensional, digital representation of a three-dimensional scene. Typically, these images are made up of a rectangular array
of gray-level or color pixels as shown below.
[Figure: Color Image, Gray-Level Image, Color Image Detail, Gray-Level Image Detail]
We will work with either 8-bit per pixel gray-level images or 24-bit per pixel RGB
(red, green, blue) color images in our image processing applications.
We will start with gray-level images (8 bits/pixel). Each pixel can be set to an
unsigned integer (character) value 0, 1, 2, ..., 255, representing a gray level or
brightness from black to white. An uncompressed (24-bit per pixel) RGB color
image can be converted to a gray-level image by setting the three color values of
each pixel to the same value (usually the average of the three).
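As a minimal sketch of the conversion just described, the Ada fragment below averages the three color values of one pixel. The names Level and To_Gray are illustrative assumptions and are not part of the course code.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Gray_Demo is
   subtype Level is Integer range 0 .. 255;

   -- Average the three color values to obtain a single gray level.
   -- (Hypothetical helper; integer division truncates the result.)
   function To_Gray (R, G, B : Level) return Level is
   begin
      return (R + G + B) / 3;
   end To_Gray;
begin
   Put_Line (Integer'Image (To_Gray (255, 128, 0)));  -- 127
   Put_Line (Integer'Image (To_Gray (10, 10, 10)));   -- 10
end Gray_Demo;
```

Writing the returned value back into all three color bytes of the pixel gives the gray-level version of the image.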
Image Histogram - The image histogram is the count of the number of
occurrences of each gray-level in the image. The image histogram gives us a
measure of the distribution of the various shades of gray in the image. We can
perform a variety of mapping operations on the pixels as the result of
manipulating the image histogram.
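The counting step can be sketched in Ada as follows. The types Gray_Image and Histogram and the function Count_Levels are assumptions made for this illustration; the tiny 2 x 3 sample image is made up.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Histogram_Demo is
   subtype Level is Integer range 0 .. 255;
   type Gray_Image is array (Positive range <>, Positive range <>) of Level;
   type Histogram is array (Level) of Natural;

   -- Count how many pixels have each gray level.
   function Count_Levels (Img : Gray_Image) return Histogram is
      H : Histogram := (others => 0);
   begin
      for Row in Img'Range (1) loop
         for Col in Img'Range (2) loop
            H (Img (Row, Col)) := H (Img (Row, Col)) + 1;
         end loop;
      end loop;
      return H;
   end Count_Levels;

   Sample : constant Gray_Image (1 .. 2, 1 .. 3) :=
     ((10, 10, 200), (10, 200, 255));
   H : constant Histogram := Count_Levels (Sample);
begin
   -- Print only the gray levels that actually occur.
   for V in Level loop
      if H (V) > 0 then
         Put_Line (Integer'Image (V) & " :" & Integer'Image (H (V)));
      end if;
   end loop;
end Histogram_Demo;
```

For the sample image this reports three pixels at level 10, two at level 200 and one at level 255.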
The histogram shown above gives the count for each of the gray levels 0-180 shown
here. When an image has most of its pixels concentrated in a small range of gray levels,
it usually means that there is reduced contrast (a foggy or dark image). We can
redistribute the gray levels to enhance the contrast.
[Figure: original image; image after histogram equalization]
Histogram equalization is a form of redistribution of the gray levels in an image to
increase the contrast as shown in the example above.
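One common way to carry out this redistribution is to map each gray level through the cumulative histogram. The sketch below assumes that formulation; the text does not say which variant was used for the figure. The names Gray_Image, Level_Map and Equalization_Map are hypothetical, and the image is assumed to be non-empty and small enough that 255 times the pixel count fits in an Integer.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Equalize_Demo is
   subtype Level is Integer range 0 .. 255;
   type Gray_Image is array (Positive range <>, Positive range <>) of Level;
   type Level_Map is array (Level) of Level;

   -- Build a lookup table: new level = 255 * (pixels at or below V) / total.
   function Equalization_Map (Img : Gray_Image) return Level_Map is
      Hist  : array (Level) of Natural := (others => 0);
      Total : constant Natural := Img'Length (1) * Img'Length (2);
      Sum   : Natural := 0;
      Map   : Level_Map := (others => 0);
   begin
      for Row in Img'Range (1) loop
         for Col in Img'Range (2) loop
            Hist (Img (Row, Col)) := Hist (Img (Row, Col)) + 1;
         end loop;
      end loop;
      for V in Level loop
         Sum := Sum + Hist (V);            -- cumulative count
         Map (V) := (255 * Sum) / Total;   -- assumes Total > 0
      end loop;
      return Map;
   end Equalization_Map;

   -- A made-up low-contrast image: all pixels lie in 100 .. 103.
   Img : constant Gray_Image (1 .. 2, 1 .. 2) := ((100, 101), (102, 103));
   Map : constant Level_Map := Equalization_Map (Img);
begin
   for Row in Img'Range (1) loop
      for Col in Img'Range (2) loop
         Put (Integer'Image (Map (Img (Row, Col))));
      end loop;
      New_Line;
   end loop;
end Equalize_Demo;
```

The four nearly identical gray levels are spread out to 63, 127, 191 and 255, which is the contrast stretch the section describes.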
Linear Point Operation - Any operation that is performed on every pixel in an
image in the same manner is called a point operation; it is a linear point operation
when the new value is a linear function of the old one, as in the offset and scaling
operations below.
g(n) = h[f(n)]
In this expression we refer to the original image as f(n), where n indexes the
pixels in the image. The operation h[ ] is applied to each
pixel of the image to produce a new image g(n).
Additive (offset) - Adding a fixed value L to each pixel is called an offset or
additive point operation. For positive L this has the effect of lightening the image
(assuming that the maximum gray level in the original image is not greater than 255-L).
g(n) = f(n) + L.
Multiplicative (scaling) - Multiplying each pixel by some fixed value P is called
scaling or a multiplicative point operation.
g(n) = P f(n).
Image Negative - We can invert the grayscale of an image to produce a
negative of the original image. This operation involves both an offset and a
scaling.
g(n) = (-1) f(n) + 255.
Image Differencing - We can take the difference between two images in order to
detect a change.
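The point operations of this section can be sketched for a single pixel as shown below. The helper names are illustrative. Clamping results to 0..255 and using the absolute value for the difference are assumptions made here so that every result remains a valid gray level; the text does not specify either.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Point_Ops_Demo is
   subtype Level is Integer range 0 .. 255;

   -- Keep a result inside the valid gray-level range (an assumption).
   function Clamp (V : Integer) return Level is
   begin
      return Integer'Max (0, Integer'Min (255, V));
   end Clamp;

   -- g(n) = f(n) + L
   function Offset (F : Level; L : Integer) return Level is
   begin
      return Clamp (F + L);
   end Offset;

   -- g(n) = P f(n)
   function Scale (F : Level; P : Float) return Level is
   begin
      return Clamp (Integer (P * Float (F)));
   end Scale;

   -- g(n) = (-1) f(n) + 255
   function Negative (F : Level) return Level is
   begin
      return 255 - F;
   end Negative;

   -- Difference of the same pixel in two images (absolute value assumed).
   function Difference (F1, F2 : Level) return Level is
   begin
      return abs (F1 - F2);
   end Difference;
begin
   Put_Line (Integer'Image (Offset (200, 100)));    -- 255 (clamped)
   Put_Line (Integer'Image (Scale (100, 1.5)));     -- 150
   Put_Line (Integer'Image (Negative (0)));         -- 255
   Put_Line (Integer'Image (Difference (30, 50)));  -- 20
end Point_Ops_Demo;
```

Applying one of these functions inside a loop over all pixels yields the new image g(n).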
5.2 Pattern Recognition
Pattern recognition has a broad range of applications in many fields. Pattern
recognition can be defined as the study of techniques and algorithms for the
detection, prediction and modeling of repeatable structures in data. The source
of these data can be images, sounds, or quantities derived from data reduction
and analysis of any measurements.
The primary function of the human visual cortex is pattern recognition. It is such
an integral part of our daily lives we find it difficult to recognize its importance or
even that we are doing anything at all. Consider the mental image of a cat. Most
of us can recognize any kind of cat when we see a picture of one. But is the
recognition process just recalling memories of cats we have seen before and
comparing them to the one we are looking at now? It is more likely that we have
acquired a mental model of "cat-ness" that we adapt as needed to fit the new
information.
The more difficult problem is to determine the particular combination of features
in our mental model that are essential to the recognition process. Look at the
outline sketches of the heads of two different kinds of cats.
Can you tell what kinds of cats are being represented here? Something about
these sketches suggests large cats rather than house cats but otherwise we
need additional features to be more specific about the types of cats.
The essential features of our mental models of lions and tigers involve the shape
of the head in profile. However there are other characteristics that distinguish the
lion from the tiger. In this example it is not clear how we would go about building
a machine recognition system for cats. The problem is, we don't have a good
understanding of the process of extracting three dimensional information from a
two-dimensional image. We also don't have much of an idea of what our brain is
doing when we "recognize" a lion or a tiger.
5.3 Graphics File Formats
We have been looking at .RAW images. These are bitmap files containing no
header information. This means that the number of rows and columns and the
encoding of color for each pixel are left unspecified. The program must be
provided these values in order to read and/or display a .RAW file correctly. The
advantage of the .RAW format is that, given the file size and configuration, the files
are easy to load and store. However, the .RAW graphics file format is not practical
for most applications. There are many different graphics file formats for color
and grayscale images using both indexed and RGB pixel representations.
Detailed information about graphics file formats can be found online at many web
sites.
Microsoft Windows Bitmap Graphics File Format
Windows bitmap files are stored in a device-independent bitmap (DIB) format
that allows Windows to display the bitmap on any type of display device. The
term "device independent" means that the bitmap specifies pixel color in a form
independent of the method used by a display to represent color. The default
filename extension of a Windows DIB file is .BMP.
Bitmap-File Structures
Each bitmap file contains a bitmap-file header, a bitmap-information header, a
color table, and an array of bytes that defines the bitmap bits. The file has the
following form:
    BITMAPFILEHEADER    bmfh;
    BITMAPINFOHEADER    bmih;
    RGBQUAD             aColors[];
    BYTE                aBitmapBits[];
The bitmap-file header contains information about the type, size, and layout
of a device-independent bitmap file. The header is defined as a
BITMAPFILEHEADER structure.
The bitmap-information header, defined as a BITMAPINFOHEADER structure,
specifies the dimensions, compression type, and color format for the bitmap.
The color table, defined as an array of RGBQUAD structures, contains as many
elements as there are colors in the bitmap. The color table is not present
for bitmaps with 24 color bits because each pixel is represented by 24-bit
red-green-blue (RGB) values in the actual bitmap data area. The colors in the
table should appear in order of importance. This helps a display driver
render a bitmap on a device that cannot display as many colors as there are
in the bitmap. If the DIB is in Windows version 3.0 or later format, the
driver can use the biClrImportant member of the BITMAPINFOHEADER structure
to determine which colors are important.
The BITMAPINFO structure can be used to represent a combined
bitmap-information header and color table. The bitmap bits, immediately
following the color table, consist of an array of BYTE values representing
consecutive rows, or "scan lines," of the bitmap. Each scan line consists of
consecutive bytes representing the pixels in the scan line, in left-to-right
order. The number of bytes representing a scan line depends on the color
format and the width, in pixels, of the bitmap. If necessary, a scan line
must be zero-padded to end on a 32-bit boundary. However, segment
boundaries can appear anywhere in the bitmap. The scan lines in the bitmap are stored
from bottom up. This means that the first byte in the array represents the
pixels in the lower-left corner of the bitmap and the last byte represents
the pixels in the upper-right corner.
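The two layout rules above (scan lines padded to a 32-bit boundary, rows stored bottom-up) reduce to a little arithmetic. The sketch below is illustrative only; Stride and File_Row are hypothetical helper names.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Stride_Demo is
   -- Bytes occupied by one scan line, rounded up to a multiple of 4.
   function Stride (Width, Bit_Count : Positive) return Positive is
   begin
      return ((Width * Bit_Count + 31) / 32) * 4;
   end Stride;

   -- Index of the stored scan line (0 = first in the file) that holds
   -- the given screen row (0 = top of the picture), for a bottom-up DIB.
   function File_Row (Screen_Row : Natural; Height : Positive) return Natural is
   begin
      return Height - 1 - Screen_Row;
   end File_Row;
begin
   Put_Line (Integer'Image (Stride (80, 4)));    -- 40
   Put_Line (Integer'Image (Stride (16, 24)));   -- 48
   Put_Line (Integer'Image (Stride (3, 24)));    -- 12 (9 data + 3 pad)
   Put_Line (Integer'Image (File_Row (0, 75)));  -- 74
end Stride_Demo;
```

The byte offset of a scan line within the bitmap bits is then its stored index times the stride.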
The biBitCount member of the BITMAPINFOHEADER structure determines the
number of bits that define each pixel and the maximum number of colors in the
bitmap. This member can have any of the following values:
Value   Meaning
1       Bitmap is monochrome and the color table contains
        two entries. Each bit in the bitmap array represents
        a pixel. If the bit is clear, the pixel is displayed
        with the color of the first entry in the color
        table. If the bit is set, the pixel has the color of
        the second entry in the table.
4       Bitmap has a maximum of 16 colors. Each pixel in the
        bitmap is represented by a 4-bit index into the
        color table. For example, if the first byte in the
        bitmap is 0x1F, the byte represents two pixels. The
        first pixel contains the color in the second table
        entry, and the second pixel contains the color in
        the sixteenth table entry.
8       Bitmap has a maximum of 256 colors. Each pixel in
        the bitmap is represented by a 1-byte index into the
        color table. For example, if the first byte in the
        bitmap is 0x1F, the first pixel has the color of the
        thirty-second table entry.
24      Bitmap has a maximum of 2^24 colors. The bmiColors
        (or bmciColors) member is NULL, and each 3-byte
        sequence in the bitmap array represents the relative
        intensities of red, green, and blue, respectively,
        for a pixel.
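The 0x1F example for 4-bit bitmaps can be checked with a short sketch. It uses Shift_Right and the modular "and" from the standard package Interfaces; the procedure name Split is an assumption for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Nibble_Demo is
   -- Split one byte of a 4-bit-per-pixel bitmap into two color-table
   -- indexes: high-order nibble first, then low-order nibble.
   procedure Split (B : Unsigned_8; First, Second : out Natural) is
   begin
      First  := Natural (Shift_Right (B, 4));
      Second := Natural (B and 16#0F#);
   end Split;

   P1, P2 : Natural;
begin
   Split (16#1F#, P1, P2);
   -- Index 1 is the second table entry; index 15 is the sixteenth.
   Put_Line ("first pixel index  =" & Natural'Image (P1));
   Put_Line ("second pixel index =" & Natural'Image (P2));
end Nibble_Demo;
```

Because the color table is indexed from zero, indexes 1 and 15 correspond to the second and sixteenth entries named in the table.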
The biClrUsed member of the BITMAPINFOHEADER structure specifies the
number of color indexes in the color table actually used by the bitmap. If the
biClrUsed member is set to zero, the bitmap uses the maximum number of colors
corresponding to the value of the biBitCount member. An alternative form of
bitmap file uses the BITMAPCOREINFO, BITMAPCOREHEADER, and
RGBTRIPLE structures.
Bitmap Compression
Windows versions 3.0 and later support run-length encoded (RLE) formats for
compressing bitmaps that use 4 bits per pixel and 8 bits per pixel.
Compression reduces the disk and memory storage required for a bitmap.
Compression of 8-Bits-per-Pixel Bitmaps
When the biCompression member of the BITMAPINFOHEADER structure is set
to BI_RLE8, the DIB is compressed using a run-length encoded format for a
256-color bitmap. This format uses two modes: encoded mode and absolute
mode. Both modes can occur anywhere throughout a single bitmap.
Encoded Mode
A unit of information in encoded mode consists of two bytes. The first byte
specifies the number of consecutive pixels to be drawn using the color index
contained in the second byte. The first byte of the pair can be set to zero
to indicate an escape that denotes the end of a line, the end of the bitmap,
or a delta. The interpretation of the escape depends on the value of the
second byte of the pair, which must be in the range 0x00 through 0x02.
Following are the meanings of the escape values that can be used in the
second byte:
2nd byte   Meaning
0          End of line.
1          End of bitmap.
2          Delta. The two bytes following the escape contain
           unsigned values indicating the horizontal and
           vertical offsets of the next pixel from the current
           position.
Absolute Mode
Absolute mode is signaled by the first byte in the pair being set to zero and
the second byte to a value between 0x03 and 0xFF. The second byte represents
the number of bytes that follow, each of which contains the color index of a
single pixel. Each run must be aligned on a word boundary. Following is an
example of an 8-bit RLE bitmap (the two-digit hexadecimal values in the
second column represent a color index for a single pixel):
Compressed data         Expanded data
03 04                   04 04 04
05 06                   06 06 06 06 06
00 03 45 56 67 00       45 56 67
02 78                   78 78
00 02 05 01             Move 5 right and 1 down
02 78                   78 78
00 00                   End of line
09 1E                   1E 1E 1E 1E 1E 1E 1E 1E 1E
00 01                   End of RLE bitmap
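The decoding rules for 8-bit RLE can be traced with a small sketch that walks the compressed bytes of the example and prints the expansion. It only prints what a decoder would do; a real decoder would write pixels into an image and move its position on a delta. All names here are illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Rle8_Demo is
   type Byte_Array is array (Positive range <>) of Unsigned_8;

   -- The compressed data of the example above.
   Data : constant Byte_Array :=
     (16#03#, 16#04#, 16#05#, 16#06#,
      16#00#, 16#03#, 16#45#, 16#56#, 16#67#, 16#00#,
      16#02#, 16#78#, 16#00#, 16#02#, 16#05#, 16#01#,
      16#02#, 16#78#, 16#00#, 16#00#,
      16#09#, 16#1E#, 16#00#, 16#01#);

   Hex    : constant String := "0123456789ABCDEF";
   I      : Positive := Data'First;
   Count  : Unsigned_8;
   Value  : Unsigned_8;
   Pixels : Natural := 0;   -- number of pixels expanded so far

   procedure Put_Pixel_Value (B : Unsigned_8) is
   begin
      Put (Hex (Natural (Shift_Right (B, 4)) + 1));
      Put (Hex (Natural (B and 16#0F#) + 1));
      Put (' ');
      Pixels := Pixels + 1;
   end Put_Pixel_Value;
begin
   while I < Data'Last loop
      Count := Data (I);
      Value := Data (I + 1);
      I := I + 2;
      if Count > 0 then                 -- encoded mode: repeat Value
         for K in 1 .. Natural (Count) loop
            Put_Pixel_Value (Value);
         end loop;
         New_Line;
      elsif Value = 0 then
         Put_Line ("End of line");
      elsif Value = 1 then
         Put_Line ("End of RLE bitmap");
         exit;
      elsif Value = 2 then              -- delta: two offset bytes follow
         Put_Line ("Move" & Unsigned_8'Image (Data (I)) & " right and"
                   & Unsigned_8'Image (Data (I + 1)) & " down");
         I := I + 2;
      else                              -- absolute mode: Value literal bytes
         for K in 1 .. Natural (Value) loop
            Put_Pixel_Value (Data (I));
            I := I + 1;
         end loop;
         if Value mod 2 = 1 then
            I := I + 1;                 -- skip the word-alignment pad byte
         end if;
         New_Line;
      end if;
   end loop;
   Put_Line ("pixels expanded =" & Natural'Image (Pixels));
end Rle8_Demo;
```

The output follows the Expanded data column line by line. The pad byte after the three-byte absolute run is the trailing 00 in "00 03 45 56 67 00".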
Compression of 4-Bits-per-Pixel Bitmaps
When the biCompression member of the BITMAPINFOHEADER structure is set
to BI_RLE4, the DIB is compressed using a run-length encoded format for a
16-color bitmap. This format uses two modes: encoded mode and absolute
mode.
Encoded Mode
A unit of information in encoded mode consists of two bytes. The first byte
of the pair contains the number of pixels to be drawn using the color indexes
in the second byte.
The second byte contains two color indexes, one in its high-order nibble
(that is, its high-order 4 bits) and one in its low-order nibble.
The first pixel is drawn using the color specified by the high-order nibble,
the second is drawn using the color in the low-order nibble, the third is
drawn with the color in the high-order nibble, and so on, until all the
pixels specified by the first byte have been drawn.
The first byte of the pair can be set to zero to indicate an escape that
denotes the end of a line, the end of the bitmap, or a delta. The
interpretation of the escape depends on the value of the second byte of the
pair. In encoded mode, the second byte has a value in the range 0x00 through
0x02. The meaning of these values is the same as for a DIB with 8 bits per
pixel.
Absolute Mode
In absolute mode, the first byte contains zero, the second byte contains the
number of color indexes that follow, and subsequent bytes contain color
indexes in their high- and low-order nibbles, one color index for each pixel.
Each run must be aligned on a word boundary.
Following is an example of a 4-bit RLE bitmap (the one-digit hexadecimal
values in the second column represent a color index for a single pixel):
Compressed data         Expanded data
03 04                   0 4 0
05 06                   0 6 0 6 0
00 06 45 56 67 00       4 5 5 6 6 7
04 78                   7 8 7 8
00 02 05 01             Move 5 right and 1 down
04 78                   7 8 7 8
00 00                   End of line
09 1E                   1 E 1 E 1 E 1 E 1
00 01                   End of RLE bitmap
Bitmap Example
The following example is a text dump of a 16-color bitmap (4 bits per pixel):
Win3DIBFile
  BitmapFileHeader
    Type            19778
    Size            3118
    Reserved1       0
    Reserved2       0
    OffsetBits      118
  BitmapInfoHeader
    Size            40
    Width           80
    Height          75
    Planes          1
    BitCount        4
    Compression     0
    SizeImage       3000
    XPelsPerMeter   0
    YPelsPerMeter   0
    ColorsUsed      16
    ColorsImportant 16
  Win3ColorTable
                Blue  Green   Red  Unused
    [00000000]    84    252    84       0
    [00000001]   252    252    84       0
    [00000002]    84     84   252       0
    [00000003]   252     84   252       0
    [00000004]    84    252   252       0
    [00000005]   252    252   252       0
    [00000006]     0      0     0       0
    [00000007]   168      0     0       0
    [00000008]     0    168     0       0
    [00000009]   168    168     0       0
    [0000000A]     0      0   168       0
    [0000000B]   168      0   168       0
    [0000000C]     0    168   168       0
    [0000000D]   168    168   168       0
    [0000000E]    84     84    84       0
    [0000000F]   252     84    84       0
  Image
    .
    .
    .           Bitmap data
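The numbers in this dump can be cross-checked with a few constants. The sketch assumes the standard sizes of 14 bytes for BITMAPFILEHEADER, 40 bytes for BITMAPINFOHEADER (the Size field in the dump) and 4 bytes per RGBQUAD; the constant names are illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Header_Check is
   File_Header_Size : constant := 14;   -- BITMAPFILEHEADER
   Info_Header_Size : constant := 40;   -- BITMAPINFOHEADER
   Colors           : constant := 16;   -- ColorsUsed, 4 bytes each
   Width            : constant := 80;
   Height           : constant := 75;
   Bit_Count        : constant := 4;

   -- "BM" read as a little-endian 16-bit value.
   Type_Field  : constant := Character'Pos ('B') + 256 * Character'Pos ('M');
   -- Scan line length rounded up to a 32-bit boundary.
   Row_Bytes   : constant := ((Width * Bit_Count + 31) / 32) * 4;
   Offset_Bits : constant := File_Header_Size + Info_Header_Size + 4 * Colors;
   Size_Image  : constant := Row_Bytes * Height;
   File_Size   : constant := Offset_Bits + Size_Image;
begin
   Put_Line ("Type       =" & Integer'Image (Type_Field));   -- 19778
   Put_Line ("OffsetBits =" & Integer'Image (Offset_Bits));  -- 118
   Put_Line ("SizeImage  =" & Integer'Image (Size_Image));   -- 3000
   Put_Line ("Size       =" & Integer'Image (File_Size));    -- 3118
end Header_Check;
```

All four values agree with the dump, which is a useful sanity check when writing a reader for a new file.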
5.4 Working with Binary Files
Occasionally we need to access files or create files that cannot be read as text
files. For example, .BMP files contain RGB values that are one byte each and
are in the range 0..255. If we were to attempt to read these bytes as characters,
some of them are unprintable and others are text file format control characters
such as the end-of-line, carriage return or line feed. These values will force the
text file reader to skip over some of the file data. As an alternative we can open
and read binary files using sequential or stream I/O.
BMP File Reader
The following example Ada programs open and read .BMP files (24-bit
color, uncompressed only) and .WAV files (mono, 8-bit). They are provided as
examples, but any other graphics or sound file format can be read and/or
created using stream_io.
with ada.text_io, ada.integer_text_io, ada.short_integer_text_io,
     adagraph, ada.short_short_integer_text_io, ada.streams.stream_io;
use  ada.text_io, ada.integer_text_io, ada.short_integer_text_io,
     adagraph, ada.short_short_integer_text_io, ada.streams.stream_io;

procedure bmp_reader is
   f              : ada.streams.stream_io.file_type;
   s              : stream_access;
   fname          : string(1..30);
   fleng          : integer;
   chr            : character;
   filesize       : integer;
   reserved       : short_integer;
   offset         : integer;
   headersize     : integer;
   numcol, numrow : integer;
   numplanes      : short_integer;
   bitsperpix     : short_integer;
   compression    : integer;
   bitmapsize     : integer;
   hres,vres      : integer;
   numcolors      : integer;
   sigcolors      : integer;
   r,g,b          : short_short_integer;
   dr,dg,db       : integer;
   scanlinepad    : integer;
   pad            : short_short_integer;
   color          : extended_color_type;
begin
   put("Enter bmp file name... ");
   get_line(fname,fleng);
   open(f,ada.streams.stream_io.in_file,fname(1..fleng));
   s:=stream(f);

   -- bitmap-file header: type ("BM"), file size, two reserved words, offset
   put("imagetype = ");
   chr:=character'input(s); put(chr);
   chr:=character'input(s); put(chr); new_line;
   filesize:=integer'input(s);
   put("filesize = "); put(filesize,0); new_line;
   reserved:=short_integer'input(s);
   reserved:=short_integer'input(s);
   offset:=integer'input(s);
   put("offset = "); put(offset,0); new_line;

   -- bitmap-information header
   headersize:=integer'input(s);
   put("headersize = "); put(headersize,0); new_line;
   numcol:=integer'input(s);
   numrow:=integer'input(s);
   put("image width = "); put(numcol,0); new_line;
   put("image height= "); put(numrow,0); new_line;
   numplanes:=short_integer'input(s);
   put("number of image planes = "); put(numplanes,0); new_line;
   bitsperpix:=short_integer'input(s);
   put("bits per pixel = "); put(bitsperpix,0); new_line;
   compression:=integer'input(s);
   put("compression type = "); put(compression,0); new_line;
   bitmapsize:=integer'input(s);
   put("size of bitmap = "); put(bitmapsize,0); new_line;
   hres:=integer'input(s);
   vres:=integer'input(s);
   put("horizontal resolution (pixels/meter) = "); put(hres,0); new_line;
   put("vertical resolution (pixels/meter) = "); put(vres,0); new_line;
   numcolors:=integer'input(s);
   sigcolors:=integer'input(s);
   put("number of colors used = "); put(numcolors,0); new_line;
   put("number of significant colors = "); put(sigcolors,0); new_line;

   open_graph_window(numcol,numrow);

   -- each scan line is padded to a multiple of 4 bytes (0 to 3 pad bytes)
   scanlinepad:=(4-(numcol*3) mod 4) mod 4;

   -- note: scan lines are stored bottom-up, so the first row read
   -- is the bottom row of the image
   for row in 1..numrow loop
      for col in 1..numcol loop
         -- pixel bytes are stored in blue, green, red order;
         -- mod 256 maps the signed byte to the range 0..255
         b:=short_short_integer'input(s);
         db:=integer(b) mod 256;
         g:=short_short_integer'input(s);
         dg:=integer(g) mod 256;
         r:=short_short_integer'input(s);
         dr:=integer(r) mod 256;
         color:=closest_color(intensity(dr),intensity(dg),intensity(db));
         put_pixel(col,row,color);
      end loop;
      -- skip the pad bytes at the end of the scan line
      for i in 1..scanlinepad loop
         pad:=short_short_integer'input(s);
      end loop;
   end loop;

   wait_for_key;
   close_graph_window;
   close(f);
end bmp_reader;
The images shown below are both JPGs but they demonstrate the difference
between the uncompressed .BMP and the image obtained using
adagraph_2000's closest_color( ) function.
[Figure: Original Image; Image Rendered with Adagraph_2000]
WAV File Reader
The program below reads mono-8 bits per sample .WAV files. Simple modifications are
possible that permit reading stereo and/or 16 bits per sample .WAV files using stream_io.
with ada.text_io, ada.integer_text_io,
     ada.streams.stream_io, adagraph;
use  ada.text_io, ada.integer_text_io,
     ada.streams.stream_io, adagraph;

procedure wav_reader is
   f              : ada.streams.stream_io.file_type;
   s              : stream_access;
   fname          : string(1..30);
   fleng          : integer;
   chr            : character;
   size           : integer;
   format_length  : integer;
   channel        : short_integer;
   samp_rate      : integer;
   bytes_per_sec  : integer;
   bytes_per_samp : short_integer;
   bits_per_samp  : short_integer;
   data_leng      : integer;
   a_byte         : short_short_integer;
   delx           : float;
begin
   put("Enter name of file to read... ");
   get_line(fname,fleng);
   open(f,ada.streams.stream_io.in_file,fname(1..fleng));
   s:=stream(f);

   -- reads the characters "RIFF"
   for i in 1..4 loop
      chr:=character'input(s);
      put(chr);
   end loop;
   new_line;

   -- size of package to follow
   size:=integer'input(s);
   put("size = "); put(size,0);
   new_line;

   -- reads the characters "WAVE"
   for i in 1..4 loop
      chr:=character'input(s);
      put(chr);
   end loop;
   new_line;

   -- reads the characters "fmt " (the fourth character is a space)
   for i in 1..4 loop
      chr:=character'input(s);
      put(chr);
   end loop;
   new_line;

   -- reads the length of format segment always 16
   format_length:=integer'input(s);
   put("format length = "); put(format_length,0);
   new_line;

   -- reads the 16 bit value 01
   put("always = ");
   put(integer(short_integer'input(s)),2);
   new_line;

   -- reads number of channels
   channel:=short_integer'input(s);
   put("channel = "); put(integer(channel),0);
   new_line;

   -- reads sample rate
   samp_rate:=integer'input(s);
   put("sample rate (Hz) = "); put(samp_rate,0);
   new_line;

   -- reads bytes per second
   bytes_per_sec:=integer'input(s);
   put("bytes per second = "); put(bytes_per_sec,0);
   new_line;

   -- reads bytes per sample
   bytes_per_samp:=short_integer'input(s);
   put("bytes per sample = "); put(integer(bytes_per_samp),0);
   new_line;

   -- reads bits per sample
   bits_per_samp:=short_integer'input(s);
   put("bits per sample = "); put(integer(bits_per_samp),0);
   new_line;

   -- reads the characters "data"
   for i in 1..4 loop
      chr:=character'input(s);
      put(chr);
   end loop;
   new_line;

   -- reads data length in bytes
   data_leng:=integer'input(s);
   put("data length = "); put(data_leng,0);
   new_line;

   -- plot the samples, scaled to fit 500 pixels across the window
   open_graph_window(600,300);
   clear_window(blue);
   delx:=500.0/float(data_leng);
   goto_xy(50,150);
   for i in 1..data_leng loop
      a_byte:=short_short_integer'input(s);
      -- mod 256 maps the signed byte to the range 0..255
      draw_to(50+integer(float(i)*delx),integer(a_byte) mod 256,yellow);
      if i mod 16 = 0 then
         new_line;
      end if;
   end loop;

   wait_for_key;
   close_graph_window;
   close(f);
end wav_reader;
This demo program reads a mono, 8 bits-per-sample WAV file and sketches the
amplitude of the samples as shown below.
This is from the WAV file called shot.wav provided as part of the Microsoft Media
System files. Any mono 8 bits-per-sample WAV file can be read and displayed,
but details of longer WAV files will be lost due to the limited resolution of the
adagraph graphics window. Alternatively, the graphics section can be replaced
with the code segment below to generate multiple windows for displaying long
WAV files.
   open_graph_window(600,300);
   clear_window(blue);
   goto_xy(50,150);
   for i in 1..data_leng loop
      a_byte:=short_short_integer'input(s);
      -- after every 500 samples, wait for a key and start a fresh window
      if i mod 500 = 0 then
         wait_for_key;
         clear_window(blue);
         goto_xy(50,120);
      end if;
      draw_to(50+(i mod 500),
              integer(a_byte) mod 256,yellow);
   end loop;
   wait_for_key;
   close_graph_window;
5.5 An Example Problem in Image Processing
Once we get past the problem of getting an image into our program we can apply
a wide variety of operations, functions and templates on the individual pixels or
groups of pixels in a practically unlimited variety of ways. But determining what
needs to be done to produce a particular result is not always clear. In this
example we are interested in analyzing leaf shapes on newly sprouted plants.
First the Hard Part
An image is comprised of an array of pixels each with a particular color defined
by varying intensities in the three primary colors: red, green and blue or RGB.
The RGB values can be provided for each pixel or the most popular RGB values
can be stored in a color lookup table and referred to by an index value. We will
deal with uncompressed .BMP files in which each pixel is defined with 3 bytes
(each 0..255), one for each primary color. Microsoft Windows Bitmaps include a
header describing a number of details about the image. The header
information for a 16 x 16 pixel .BMP (uncompressed) image is shown below.
Enter bmp file name... samp1.bmp
imagetype = BM
filesize = 822
offset = 54
headersize = 40
image width = 16
image height= 16
number of image planes = 1
bits per pixel = 24
compression type = 0
size of bitmap = 768
horizontal resolution (pixels/meter) = 0
vertical resolution (pixels/meter) = 0
number of colors used = 0
number of significant colors = 0
The Ada program below uses adagraph_2000, written and distributed by Dr.
Martin Carlisle of the Air Force Academy and available in the useful files directory
of this Web Site. For anyone wishing to translate this code to C++, make note of
the sizes of the Ada data types used in this reader: character = 1 byte, integer
= 4 bytes, short_integer = 2 bytes, short_short_integer = 1 byte.
with ada.text_io, adagraph,ada.streams.stream_io;
use  ada.text_io, adagraph,ada.streams.stream_io;

procedure bmp_reader_demo is
   f              : ada.streams.stream_io.file_type;
   s              : stream_access;
   fname          : string(1..30);
   fleng          : integer;
   chr            : character;
   filesize       : integer;
   reserved       : short_integer;
   offset         : integer;
   headersize     : integer;
   numcol, numrow : integer;
   numplanes      : short_integer;
   bitsperpix     : short_integer;
   compression    : integer;
   bitmapsize     : integer;
   hres,vres      : integer;
   numcolors      : integer;
   sigcolors      : integer;
   r,g,b          : short_short_integer;
   dr,dg,db       : integer;
   scanlinepad    : integer;
   pad            : short_short_integer;
   color          : extended_color_type;
begin
   put("Enter bmp file name... ");
   get_line(fname,fleng);
   open(f,ada.streams.stream_io.in_file,fname(1..fleng));
   s:=stream(f);

   -- bitmap-file header
   chr:=character'input(s);
   chr:=character'input(s);
   filesize:=integer'input(s);
   reserved:=short_integer'input(s);
   reserved:=short_integer'input(s);
   offset:=integer'input(s);

   -- bitmap-information header
   headersize:=integer'input(s);
   numcol:=integer'input(s);
   numrow:=integer'input(s);
   numplanes:=short_integer'input(s);
   bitsperpix:=short_integer'input(s);
   compression:=integer'input(s);
   bitmapsize:=integer'input(s);
   hres:=integer'input(s);
   vres:=integer'input(s);
   numcolors:=integer'input(s);
   sigcolors:=integer'input(s);

   open_graph_window(numcol,numrow);

   -- each scan line is padded to a multiple of 4 bytes (0 to 3 pad bytes)
   scanlinepad:=(4-(numcol*3) mod 4) mod 4;

   for row in 1..numrow loop
      for col in 1..numcol loop
         -- pixel bytes are stored in blue, green, red order;
         -- mod 256 maps the signed byte to the range 0..255
         b:=short_short_integer'input(s);
         db:=integer(b) mod 256;
         g:=short_short_integer'input(s);
         dg:=integer(g) mod 256;
         r:=short_short_integer'input(s);
         dr:=integer(r) mod 256;
         color:=closest_color(intensity(dr),intensity(dg),intensity(db));
         put_pixel(col,row,color);
      end loop;
      -- skip the pad bytes at the end of the scan line
      for i in 1..scanlinepad loop
         pad:=short_short_integer'input(s);
      end loop;
   end loop;

   wait_for_key;
   close_graph_window;
   close(f);
end bmp_reader_demo;
The inner loop (over col) in the program above reads the bitmap in BGR order one byte
at a time. The mod operation converts from an 8-bit signed integer to an
unsigned integer in the range 0..255. In C++ you can specify an unsigned integer
type directly.
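The effect of the mod operation can be seen in isolation with the sketch below. To_Unsigned is a hypothetical helper name, and short_short_integer is assumed to be the 8-bit signed type that the reader uses.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Signed_Byte_Demo is
   -- Map a signed byte (-128 .. 127) to the range 0 .. 255.
   -- In Ada, mod with a positive right operand never returns a
   -- negative value, so -1 becomes 255.
   function To_Unsigned (B : Short_Short_Integer) return Natural is
   begin
      return Integer (B) mod 256;
   end To_Unsigned;
begin
   Put_Line (Integer'Image (To_Unsigned (100)));   -- 100
   Put_Line (Integer'Image (To_Unsigned (-1)));    -- 255
   Put_Line (Integer'Image (To_Unsigned (-128)));  -- 128
end Signed_Byte_Demo;
```

Using rem instead of mod would not work here, because rem keeps the sign of the left operand.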
Another important detail in the graphics file format for .BMP files is that every scan
line must be a multiple of 4 bytes. That is, it must be an integer number of 32-bit
words. When the number of bytes of RGB values in a row (3 bytes per pixel) is not evenly
divisible by 4, a pad of between 1 and 3 bytes is added to each scan line. The loop over
scanlinepad in the program above accounts for this.
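One way to compute the number of pad bytes for a 24-bit image is sketched below. Scan_Line_Pad is a hypothetical helper name, and the widths are arbitrary test values.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Pad_Demo is
   -- Pad bytes needed so that Width pixels of 3 bytes each
   -- fill a whole number of 32-bit words (result is 0 .. 3).
   function Scan_Line_Pad (Width : Positive) return Natural is
   begin
      return (4 - (Width * 3) mod 4) mod 4;
   end Scan_Line_Pad;
begin
   for Width in 1 .. 5 loop
      Put_Line ("width" & Integer'Image (Width)
                & " : data bytes" & Integer'Image (Width * 3)
                & ", pad" & Integer'Image (Scan_Line_Pad (Width)));
   end loop;
end Pad_Demo;
```

Widths 1 to 5 need 1, 2, 3, 0 and 1 pad bytes respectively. Note that the simpler expression (Width * 3) mod 4 gives the right answer only when it is 0 or 2. A 16-pixel-wide image such as samp1.bmp needs no padding, since 48 is already a multiple of 4.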
Preliminary Analysis
Now that we can read an image into a program we need to analyze the images to
be processed to determine what can be done to separate the leaves from the
background or, in this case, the ground.
Shown above are a color and grayscale image of sprouting plants. From this we
can see (if we are not color blind) that color will play an important part in a
program to separate the plant leaves from the background. Before we can begin
building filters we need some understanding of the science of color.
White light is composed of all colors, but our perception of light and color is very
limited in resolution. For example, we cannot distinguish between light of a
single wavelength (say 580 nanometers) and light comprised of two wavelengths of
equal intensity on either side of the single wavelength in the electromagnetic
spectrum (say 680 nm and 480 nm). Do not be too discouraged, because this fact
makes color photography, computer monitors and television CRTs possible.
Digitized images use varying intensities of the three primary colors red, green
and blue to produce color images. We can use these three values to create filters
to separate objects in the image by their colors. We will use the simple image
below as our test image for building color filters.
Since we are interested in separating leaves from other parts of the image, let's
work on a green filter first. Since green is one of our primary colors we can start
by keeping all the pixels in which green is the dominant color.
if green>red and green>blue then
keep this pixel
else
lose this pixel
end if;
Applying this filter we obtain,
which is not too bad. However, we have captured gray areas as well as the
green. If we want to separate out the gray regions we will need to choose pixels
that are more green. In code we could require that the G value be greater than
the sum of the R and B values. In other words the object has to be REALLY
green.
if green>red+blue then
keep this pixel
else
lose this pixel
end if;
This is better but we are picking up pixels that are different shades of green,
yellow and blue. If we want to limit the accepted pixels to pure greens we need
to make sure that the other two colors are nearly equal, say within 20 percent of
each other.
if (green>red+blue) and abs(red-blue)/(red+green+blue)<0.2 then
keep this pixel
else
lose this pixel
end if;
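The three filters can be written as Boolean functions and tried on a few pixels. This is a sketch only: the function names and the three sample pixels are made up for illustration, and the division is done in floating point so that the 0.2 threshold behaves as written.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Green_Filter_Demo is
   subtype Level is Integer range 0 .. 255;

   -- Filter 1: green is the dominant color.
   function Green_Dominant (R, G, B : Level) return Boolean is
   begin
      return G > R and G > B;
   end Green_Dominant;

   -- Filter 2: green exceeds the sum of red and blue.
   function Really_Green (R, G, B : Level) return Boolean is
   begin
      return G > R + B;
   end Really_Green;

   -- Filter 3: as filter 2, and red and blue are nearly equal.
   -- "and then" skips the division when the first test fails,
   -- which also covers the all-zero (black) pixel.
   function Pure_Green (R, G, B : Level) return Boolean is
   begin
      return G > R + B
        and then Float (abs (R - B)) / Float (R + G + B) < 0.2;
   end Pure_Green;

   procedure Show (Name : String; R, G, B : Level) is
   begin
      Put_Line (Name & ": " & Boolean'Image (Green_Dominant (R, G, B))
                & " " & Boolean'Image (Really_Green (R, G, B))
                & " " & Boolean'Image (Pure_Green (R, G, B)));
   end Show;
begin
   Show ("greenish gray", 100, 120, 100);  -- TRUE FALSE FALSE
   Show ("yellow-green ", 90, 200, 10);    -- TRUE TRUE FALSE
   Show ("pure green   ", 20, 200, 30);    -- TRUE TRUE TRUE
end Green_Filter_Demo;
```

Each successive filter accepts fewer pixels, which is why the strictest one can reject real leaves whose green is not pure.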
Now let's try our filter on a "real" image...
Durp! What's the problem? Maybe we are being a bit too strict on the shades of
green that we accept. Returning to the color dominance levels in which we only
required that green be larger than either red or blue may work better...
Success! At least for this particular image.
bmp_reader.adb
samp1.bmp
samp2.bmp
samp3.bmp
samp4.bmp
samp5.bmp
samp6.bmp
rawimage.bmp