791H Senior Project Proposal
Image Filtering and Enhancement of Scanning
Transmission Electron Microscope Images
Submitted for Review to:
Dr. Tom Miller
Submitted by:
Nathan P. Brouwer
University of New Hampshire
College of Engineering and Physical Sciences
Department of Electrical and Computer Engineering
55 Edgewood St.
Durham, New Hampshire 03824
Created: October 10, 2010
REVISED: October 25, 2010
REV: Final
Table of Contents
1 Abstract
2 Project History and Definition
  2.1 Background
  2.2 Problem
  2.3 Project Objective
3 Methodology
  3.1 Three Phase Iterative Approach
4 Significance/Implications
5 Personal Outcome
6 Location
7 Preparation/Experience
8 Time Table
9 Appendices
  9.1 Timeline for Project
  9.2 Budget
  9.3 Budget Explanation
  9.4 References
1 Abstract
ZSGenetics uses a scanning transmission electron microscope (STEM) to
directly image deoxyribonucleic acid (DNA) for research
purposes. Because of the high magnification and the way images are formed,
current images from this process are unclear and difficult to analyze directly. The
proposed solution is to construct and implement a variety of image processing
algorithms to improve and enhance the quality of DNA images and better enable
the extraction of information that can be utilized by the scientists at ZSGenetics.
The project will result in a graphical user interface (GUI) that can be used by
researchers to process these images and make analyses much quicker and more
accurate. This is novel work in an emerging engineering field with great potential
for publication at its conclusion.
2 Project History and Definition
2.1 Background
Since the invention of the electron microscope (EM), scientists have dreamed of
using it to determine the sequence of DNA that is essential to understanding its
role as the “code of life”. DNA is the genetic information necessary for the
development and functioning of all living organisms. DNA is synthesized by the
body as two long polymers of simple repeating units called nucleotides that are
attached by hydrogen bonds to form a double stranded helix. Each nucleotide
consists of a nitrogenous base, a simple 5-carbon sugar called deoxyribose, and
a phosphate group (PO₄³⁻). There are four possible nitrogenous bases in DNA:
adenine, guanine, cytosine, and thymine. It is the pattern of these four bases that
determines the identity, features, and all biological processes of the organism by
encoding the amino acid sequence of every protein in the body. It is also this
pattern that is determined during sequencing using the STEM technique
(Robinson). The full sequence of these bases is unique to the individual and is
the true “fingerprint” of an organism, providing insight into its characteristics
and functional capabilities.
Significant work has been done to understand and sequence DNA, but there are
still many mysteries associated with this process and the molecule itself. A
deeper understanding could be useful to cure diseases and other genetic defects,
as well as further other areas of research such as cloning. Using a variety of
costly and time intensive techniques, scientists have discovered properties of the
structure and how to sequence DNA, without directly viewing a sample at the
atomic level. If there were a way to image DNA, it would be possible to use the
images to sequence the DNA without many of the painstaking processes that are
currently used and with much greater accuracy.
In the past, scientists and engineers have faced numerous difficulties in the direct
imaging of DNA using electron microscopes, which are the only instruments with
sufficient magnification. The two main factors limiting this technique are
resolution and image contrast at very high magnification. In the last several
years, electron microscopes have made enormous technological advances, namely
improving the achievable resolution to under 0.08 nm. This advancement allows
scientists and engineers to view the building blocks of even the smallest
particles. Since the average distance between base pairs is
0.34 nm, new electron microscopes operating at close to ideal conditions have
overcome the problem of resolution (Bell).
The scanning transmission electron microscope (STEM) is a variety of EM that
has recently achieved the resolution necessary to view and sequence DNA. A STEM
works by accelerating a tightly focused, high-energy beam of electrons down
through a sample onto a highly sensitive camera that records the scattering of
those electrons. The beam is raster scanned, line by line, across the sample,
and the camera records the intensity at every beam position, resulting in a
two-dimensional grid in which each pixel is assigned an intensity value. Because
electrons are negatively charged, those passing through the sample are deflected
mainly by the electrostatic (Coulomb) forces of the dense, positively charged
nuclei in the sample. Due to wave-particle duality, a fast electron also behaves
like a wave, and collisions with electrons in the sample's electron cloud are
treated as negligible. Deflected electrons are not detected by the sensor at the
pixel the beam was aimed at. Larger atoms cause a greater number of deflections
and therefore a lower recorded intensity. In a plot of intensity values, large
atoms appear as dark spots, while smaller atoms cannot be distinguished from the
inevitable scatter noise.
Once the problem of resolution is overcome, the problem of contrast still presents
itself. DNA is a very “light” molecule on the atomic scale, meaning its atoms have
relatively low atomic numbers and, therefore, small nuclei. The main elements
that make up DNA are hydrogen, carbon, nitrogen, oxygen, and phosphorus, with an
average atomic number of about 5.5. Simply put, the nuclei are not large enough
to cause enough collisions to produce a perceptible difference in intensity.
ZSGenetics is a biomedical company in Danvers, Massachusetts, working on
imaging DNA with a STEM. They have devised a patented method to bind certain
“heavier” atoms to distinct nucleotide pairs. These are called marker atoms
because they have a large enough nucleus to be recognized and distinguished
from the lighter background atoms by a sensor. If a large marker atom is bound to
a specific nucleotide pair, it is possible to tell the exact positions in a given DNA
sample where that nucleotide pair exists. This labeling technique opens up a
new possibility for sequencing DNA through an image.
2.2 Problem
The problem the scientists are facing is that the pictures are very difficult to
analyze because there is a large amount of cluttering information, or noise, that
interferes with the ability to detect these DNA strands. The camera records the
image on a grey scale with 256 shades of grey. Since the average human can
distinguish only a few dozen shades of grey with the naked eye, it is very difficult
to analyze these images accurately. Because of this human limitation, the
unprocessed images are of little use. With the power of computers and the
advances in digital image processing, it is possible to gather improved data from
the images that can prove useful for human interpretation.
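As a minimal illustration of why processing helps, the sketch below applies a linear contrast stretch, which spreads a narrow band of grey levels across the full 0 to 255 range so that small differences become visible. The function name and the tiny sample image are hypothetical and are not taken from the ZSGenetics data.

```python
def stretch_contrast(image, out_max=255):
    """Linearly rescale pixel values so the darkest pixel maps to 0
    and the brightest pixel maps to out_max."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image carries no contrast to stretch.
        return [[0 for _ in row] for row in image]
    scale = out_max / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]


# Hypothetical 3x3 patch whose grey levels span only 100..110.
raw = [
    [100, 102, 101],
    [103, 110, 104],
    [101, 105, 100],
]
stretched = stretch_contrast(raw)
for row in stretched:
    print(row)
```

After stretching, the ten grey levels in the patch are spread over the whole display range, which is far easier for a human viewer to separate.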
A major anticipated problem is unavoidable noise distortion. When dealing
with samples at the atomic scale, there is bound to be cluttering noise that
interferes with the signal from the actual sample. This scatter problem will be a
major challenge to overcome.
2.3 Project Objective
The project goal is to provide a solution to the DNA imaging problem by using
image processing algorithms and filters to extract information and improve the
images to a point where they can be useful to the scientists at ZSGenetics.
Through a variety of algorithms, it is possible to overcome the scatter problem of
noise, detect marker atoms, and calculate the distance between markers to
determine the number of non-marked base pairs between markers.
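The last step, converting marker spacing into a base-pair count, can be sketched as follows. This is only an illustration under simplifying assumptions: marker positions are already measured in nanometres along a straightened strand, and the 0.34 nm base-pair spacing quoted in the Background section applies uniformly. The function and variable names are hypothetical.

```python
BASE_PAIR_SPACING_NM = 0.34  # average base-pair spacing from the Background section


def unmarked_base_pairs(marker_positions_nm, spacing_nm=BASE_PAIR_SPACING_NM):
    """Estimate how many unmarked base pairs lie between consecutive markers.

    Assumes positions are distances (in nm) along a straight strand.
    """
    positions = sorted(marker_positions_nm)
    counts = []
    for left, right in zip(positions, positions[1:]):
        # Number of base-pair steps separating the two markers.
        steps = round((right - left) / spacing_nm)
        # The markers themselves occupy the end points, so subtract one.
        counts.append(max(steps - 1, 0))
    return counts


# Hypothetical marker positions along a strand, in nm.
markers = [0.0, 1.36, 3.06]
print(unmarked_base_pairs(markers))  # [3, 4]
```

In real images the strand bends and the measured distances carry noise, so the rounding step is where measurement error would turn into miscounted bases.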
This project has the potential to turn into higher-level graduate work as a master's
or even PhD project that may lead to a common industry practice of sequencing DNA
automatically through imaging. This could have a major effect on the medical
community by advancing DNA research aimed at finding cures for genetic diseases.
3 Methodology
3.1 Three Phase Iterative Approach
This project will reach its objective using an iterative three-phase approach. The
preliminary phase, Phase I, will consist of data definition and collection. This will
include travelling to Danvers and Cambridge, Massachusetts, to receive additional
data sets of still images and video sequences of DNA from ZSGenetics. I will
personally receive certified training on how to safely operate the electron
microscope at Harvard University. There will be meetings to learn from the
scientists exactly what they are looking for and how they may want the images
enhanced in order to better comprehend the data set. Phase I will end with the
compilation of pertinent data sets and a clear idea of which algorithms might
produce the desired results.
Some preliminary images from the ZSGenetics and Harvard STEMs have
already been received. Examples of two types of raw images and the results
from basic enhancements are shown below:
[Figure: Raw DNA Strand Image]
[Figure: Enhanced DNA Image]
The enhanced image above was produced using a very rudimentary algorithm called
color space mapping (Gonzalez). The algorithms to be developed and applied in
this work will be significantly more complex and, hopefully, more revealing of the
DNA structure.
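For readers unfamiliar with mapping grey levels to color, the sketch below shows one simple possibility: each grey level from 0 to 255 is assigned a color on a blue to green to red ramp. This particular ramp is an assumption chosen for illustration and is not necessarily the mapping used to produce the enhanced image above.

```python
def pseudo_color(gray):
    """Map a grey level 0..255 to an (r, g, b) tuple.

    Dark pixels become blue, mid-range pixels green, bright pixels red.
    """
    if not 0 <= gray <= 255:
        raise ValueError("gray level must be in 0..255")
    if gray < 128:
        t = gray / 127  # 0.0 at black, 1.0 at mid-grey
        return (0, round(255 * t), round(255 * (1 - t)))
    t = (gray - 128) / 127  # 0.0 at mid-grey, 1.0 at white
    return (round(255 * t), round(255 * (1 - t)), 0)


def colorize(image):
    """Apply pseudo_color to every pixel of a 2D list of grey levels."""
    return [[pseudo_color(p) for p in row] for row in image]


# Hypothetical 1x3 image: black, mid-grey, white.
print(colorize([[0, 127, 255]]))
```

Because the eye separates hues far more readily than neighbouring shades of grey, even this crude ramp can make faint intensity differences stand out.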
[Figure: Raw DNA strand (Dark Field Imaging)]
The image above was taken using a newer technique called dark field imaging.
This technique does not use the STEM camera. Instead, a single concentrated
beam of electrons is shot through the sample, and the deflected electrons strike
a small, donut-shaped metal ring. When the electrons hit the ring, they induce a
current proportional to the number of electrons deflected. The result is a map
of two-dimensional position versus current (intensity), in which the brightest
spots mark the largest atoms and will be the focus of our attention. This
technique is useful because it provides greater contrast than bright field imaging.
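Since the brightest spots in a dark field map are the candidates for marker atoms, a first-pass detector can be sketched as a search for local maxima above a chosen level. The function name, the 3x3 neighbourhood, and the sample values below are assumptions for illustration only; real data would need noise reduction first.

```python
def find_bright_spots(intensity, min_value):
    """Return (row, col, value) for pixels that are at least min_value
    and strictly brighter than every pixel in their 3x3 neighbourhood."""
    rows, cols = len(intensity), len(intensity[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            value = intensity[r][c]
            if value < min_value:
                continue
            neighbors = [
                intensity[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                spots.append((r, c, value))
    return spots


# Hypothetical dark field map (detector current per beam position).
dark_field = [
    [1, 2, 1, 1, 1],
    [2, 9, 2, 1, 1],
    [1, 2, 1, 3, 8],
    [1, 1, 1, 2, 3],
]
print(find_bright_spots(dark_field, min_value=5))  # [(1, 1, 9), (2, 4, 8)]
```

The returned positions could then feed a spacing calculation between neighbouring markers.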
There are a variety of algorithms and extraction techniques to reduce noise and
produce an image in which the marker atoms are easier to find. Some basic
techniques include thresholding, pseudo-color mapping, and stretching the
magnitude to a logarithmic scale. Some higher-level algorithms may also be
useful, such as filtering through time and the maximum entropy method, which
estimates the probabilistic noise based on an array of constraints.
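Three of the techniques named above can be sketched in a few lines each. The version below is a simplified illustration, assuming non-negative pixel values stored as nested lists and a video sequence of equally sized, already aligned frames; filtering through time is reduced here to a plain frame average. The maximum entropy method is considerably more involved and is not shown. All names and sample values are hypothetical.

```python
import math


def average_frames(frames):
    """Filter through time: average each pixel across aligned frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[r][c] for f in frames) / n for c in range(cols)]
        for r in range(rows)
    ]


def log_stretch(image, out_max=255):
    """Compress the magnitude onto a logarithmic scale ending at out_max."""
    peak = max(p for row in image for p in row)
    if peak <= 0:
        return [[0.0 for _ in row] for row in image]
    scale = out_max / math.log1p(peak)
    return [[scale * math.log1p(p) for p in row] for row in image]


def threshold(image, level):
    """Binary mask: 1 where the pixel is at least `level`, else 0."""
    return [[1 if p >= level else 0 for p in row] for row in image]


# Hypothetical sequence of three noisy 2x2 frames.
frames = [
    [[10, 200], [12, 8]],
    [[14, 220], [8, 12]],
    [[12, 210], [10, 10]],
]
averaged = average_frames(frames)
stretched = log_stretch(averaged)
mask = threshold(stretched, level=200)
print(averaged)
print(mask)
```

Averaging first reduces frame-to-frame noise, the logarithmic stretch lifts faint detail relative to the peak, and the threshold leaves only the strongest responses as marker candidates. The order and the threshold level would need to be tuned against real images.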
Once the exact nature of the images that need to be improved is understood, the
best combination of image processing algorithms will be determined and applied.
Phase II will be primarily the application of any algorithms identified in Phase I to
the data set and then modifying them with feedback from the experts at
ZSGenetics.
Phase III will consist of sending the images processed with the current combination
of algorithms back to ZSGenetics for additional feedback. They will evaluate how
successful the attempts were and suggest what needs to be done to the images for
even better clarity. The iterative part of this project will be using the feedback
from ZSGenetics to go back to the drawing board and further improve our process.
A contact from ZSGenetics has already agreed to maintain open communication with
me and Professor Messner’s laboratory, which is a necessity for the project.
Figure 1 below shows graphically how this phased approach will flow.
Research and Data Collection
  -> Algorithms and Testing
  -> Evaluation and Feedback of Algorithms by ZSGenetics
  -> Create Graphical User Interface for use after project completion
  -> Final Report and publication
Figure 1: Project Flow
The end result will be a set of tuned image processing routines and a graphical
user interface (GUI) that scientists and engineers can use for DNA research. We
expect that our end results will be publishable and expect to submit our findings
to an appropriate journal for publication.
NOTE: The work done will be with the images and videos of DNA, but not
the DNA itself. This project will have no interaction with any genetic
material.
4 Significance/Implications
To our knowledge, imaging DNA with a scanning transmission electron microscope in
order to identify the specific DNA sequence in a sample has not been done before. If the
project is successful, it will provide scientists and medical researchers with a
method to extract information directly from the images of DNA. The publication
will be an intellectual contribution, which could have profound practical
implications. This project may prove to be a direct aid to medical science, allowing
the identification of DNA sequences much more precisely and efficiently.
5 Personal Outcome
This project will hopefully result in publishable material that will be submitted to an
appropriate journal. Such a publication at this stage in my career will help in my
desire to perform graduate work. This project will dramatically increase my
background in image processing and will lead to an interesting and major-related
topic for a senior project. One major goal of this project is to set up a path for
graduate school, by continuing this research after the completion of my
undergraduate career. Additionally for personal interest, I hope to learn more
about the DNA structure and how DNA is analyzed from the researchers at
ZSGenetics to gain experience that will make me a better candidate for jobs in
multiple fields of engineering and science.
6 Location
The principal location of the project work will be Professor Messner’s Image
Processing lab in Kingsbury S326, on the University of New Hampshire campus.
Periodic trips to Danvers, MA and Cambridge, MA to exchange data and to get
feedback on our work are essential for this project. ZSGenetics has a partner
program at Harvard University from which we may use additional data sets.
Personal cars will be used and gas has been provided for in the proposed
budget. ZSGenetics has already agreed to work cooperatively with us on this
project as described above.
7 Preparation/Experience
I have research experience that has prepared me for this senior project. Last
summer, I participated in a 10-week undergraduate research program at
Colorado State University in Fort Collins, Colorado. I worked on simulating radar
data from the CHILL radar system in MATLAB. Here I gained exposure to the
research environment and process. I have also taken related classes, ECE 633H
and ECE 634 (Signals and Systems 1 and 2), which have given me an essential
background on the topic to be researched. Also, I am currently enrolled in
ECE 714 (Digital Signal Processing), which provides the fundamentals of
one-dimensional processing.
To further my knowledge, I will be following along with the senior-level digital
image processing course on two-dimensional processing, taught by Professor
Messner. All of the lecture slides are online, so I will review them weekly and
occasionally meet with Professor Messner to discuss topics to ensure
understanding.
8 Time Table
See Figure 2 in the appendices for a Gantt chart that describes the timeline.
The timetable covers September through mid-April. September and October have been
devoted mainly to preliminary research on image processing algorithms and to data
collection. Electron microscope training at Harvard University and imaging of DNA
samples are scheduled for November. November will also begin the first session of
algorithm implementation to see what is successful. In December we will meet again
with ZSGenetics for feedback and to confirm that the work being done is correct
and useful.
The start of the second semester is reserved for continued improvement of
algorithms to solidify the best possible approach. By mid-February, we hope to be
finishing up the algorithm testing and begin creating a graphical user interface to
provide a way for ZSGenetics to use the algorithms in a standardized manner.
Nearly a month is designated at the end of the semester for last minute
alterations and, most importantly, the final report and publication of this research.
In April, there is an Undergraduate Research Conference where this research will
be presented.
9 Appendices
9.1 Timeline for Project
Figure 2: Gantt chart timeline
9.2 Budget
Category        Item                           Detail      Quantity   Cost
Supplies        Paper                                      2 Reams    $35.98
Supplies        Flash Drives                   8 GB        2          $39.98
Travel          Durham-Danvers                 96 mi RT    2 Trips    $48.00
Travel          Durham-Cambridge               138 mi RT   3 Trips    $103.50
Other Expenses  Photo Copies, Color Printing               250        $25.00
Total                                                                 $252.46
Note: SURF grant has awarded $150 for budget
9.3 Budget Explanation
A. Paper - This will cover the paper used for printing and calculations, as well
as the cost of color printing. Any cost above the budgeted amount will be covered
by the ECE department.
B. Flash Drives - An easy and universal way to transfer and store images is
needed. The images will be much too large to send over email, so flash drives will
make them much simpler to transport.
C. Travel - Travel will be necessary for training and data collection in both
Danvers, MA and Cambridge, MA.
D. Photocopies - It will be necessary to reproduce many of the images created. Any
cost above the budgeted amount will be covered by the ECE department.
9.4 References
Bell, David C., Katelyn M. Murtagh, Cheryl A. Dionne, and William R. Glover.
Direct observation of single-atom DNA labels with annular dark-field electron
microscopy. Submitted to Nature (2010).
Gonzalez, Rafael C., and Richard E. Woods. Digital Image Processing. Upper Saddle
River, N.J.: Prentice Hall, 2008.
Nakanishi, Nobuto, Tasutoshi Kotaka, and Takashi Yamazaki. An expanded approach
to noise reduction from high-resolution STEM images based on the maximum
entropy method. Ultramicroscopy 106 (2006): 233-239.
Robinson, Richard. DNA Structure and Function, History. Genetics (2003).