Download Bioinformatics Research - Purdue University :: Computer Science

Bioinformatics
Research
Presented by: Amgad Madkour
Outline
- What is Bioinformatics? (Definition)
- Some important terms
- Major research areas in Bioinformatics
What is Bioinformatics?
- Bioinformatics or computational biology is the use of techniques from applied mathematics, informatics, statistics, and computer science to solve biological problems.
- The terms bioinformatics and computational biology are often used interchangeably, although the latter typically focuses on algorithm development and specific computational methods.
National Institutes of Health definition of the field
- Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
- Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
Some Important Terms
- Protein
- Amino Acid
- DNA
- RNA
- Chromosome
- Gene Expression
- Genetic Code
Protein
- A protein is a complex, high-molecular-weight organic compound that consists of amino acids joined by peptide bonds.
- Many proteins are enzymes or subunits of enzymes, catalyzing chemical reactions.
- Other proteins play structural or mechanical roles, such as those that form the struts and joints of the cytoskeleton, which serves as a biological scaffold for mechanical integrity and tissue signaling.
- Other protein functions include the immune response.
Protein (Function)
- Proteins are involved in practically every function performed by a cell, including regulation of cellular functions such as signal transduction and metabolism.
- Life, chemically speaking, is largely the function of proteins, although the information needed to make each unique protein resides in DNA.
- Proteins control almost all the molecular processes of the body.
- Without such proteins (enzymes in particular), the same reactions would require a different set of conditions, such as high temperature and pressure.
Protein (Types)
- Enzymes, which are responsible for catalyzing the thousands of chemical reactions of the living cell
- Keratin, elastin, and collagen, which are important types of structural, or support, proteins
- Hemoglobin and other gas transport proteins
- Ovalbumin, casein, and other nutrient molecules
- Antibodies, which are molecules of the immune system
- Protein hormones, which regulate metabolism
- Proteins that perform mechanical work, such as actin and myosin, the contractile muscle proteins
Amino Acids
- Amino acids are the basic structural building units of proteins.
- They form short polymer chains called peptides, or longer chains called polypeptides, which in turn form structures called proteins.
- The process by which these chains are formed is known as translation, which is part of protein synthesis.
- Twenty amino acids are encoded by the standard genetic code and are called proteinogenic or standard amino acids.
- Other amino acids contained in proteins are usually formed by post-translational modification, which is modification after translation in protein synthesis. At least two others, selenocysteine and pyrrolysine, are also coded by DNA in a non-standard manner.
DNA
- Deoxyribonucleic acid (DNA) is a nucleic acid, usually in the form of a double helix, that contains the genetic instructions specifying the biological development of all cellular forms of life.
- Two paired bases form a "rung" of the DNA ladder. A DNA nucleotide is made of a molecule of sugar, a molecule of phosphoric acid, and a molecule called a base.
- In DNA, the code letters are A, T, G, and C, which stand for the chemicals adenine, thymine, guanine, and cytosine, respectively.
- In base pairing, adenine always pairs with thymine, and guanine always pairs with cytosine.
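The base-pairing rule can be written as a small lookup table. The following is a minimal Python sketch; the function names `complement` and `reverse_complement` are chosen here for illustration and do not come from the slides.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Because the two strands of the double helix run in opposite directions, the reverse complement is usually the form needed when working with the opposite strand.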
RNA
- Ribonucleic acid (RNA) is a nucleic acid polymer consisting of covalently bound nucleotides.
- RNA nucleotides contain ribose rings and uracil, unlike deoxyribonucleic acid (DNA), which contains deoxyribose and thymine.
- It is transcribed from DNA by enzymes called RNA polymerases and further processed by other enzymes.
- RNA plays several roles in protein synthesis: messenger RNA (mRNA) serves as the template for translating genes into proteins, transfer RNA (tRNA) carries amino acids to the ribosome, and ribosomal RNA (rRNA) within the ribosome carries out the translation of the transcript into protein.
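At the sequence level, transcription can be sketched as replacing thymine with uracil. The Python sketch below assumes the input is the coding (sense) strand, whose sequence matches the mRNA; in the cell, RNA polymerase actually reads the opposite, template strand. The function name `transcribe` is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA that has the same sequence as the DNA coding strand.

    Assumption: the input is the coding (sense) strand, so the only change
    needed is thymine (T) -> uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGTTTAAACCC"))  # AUGUUUAAACCC
```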
Central Dogma
Chromosome
- The DNA that carries genetic information in cells is normally packaged in the form of one or more large macromolecules called chromosomes.
- If you were to stretch out all the DNA from one of your cells, it would be over 3 feet (1 meter) long from end to end! You can think of chromosomes as "DNA packages" that enable all this DNA to fit in the nucleus of each cell. Normally, we have 46 of these packages in each cell; we received 23 from our mother and 23 from our father.
Chromosome Phases
Gene Expression
- Gene expression, also called protein expression or often simply expression, is the process by which a gene's DNA sequence is converted into the structures and functions of a cell.
- Gene expression is a multi-step process that begins with transcription of DNA, which genes are made of, into messenger RNA. It is then followed by post-transcriptional modification and translation into a gene product, followed by folding, post-translational modification, and targeting.
- The amount of protein that a cell expresses depends on the tissue, the developmental stage of the organism, and the metabolic or physiologic state of the cell.
Genetic Code
- The genetic code is a set of rules that maps DNA sequences to proteins in the living cell, and is employed in the process of protein synthesis. Nearly all living things use the same genetic code, called the standard genetic code, although a few organisms use minor variations of the standard code.
- The gene sequence inscribed in DNA, and in RNA, is composed of tri-nucleotide units called codons, each coding for a single amino acid or a stop signal.
- There are 4^3 = 64 different codon combinations. For example, the RNA sequence UUUAAACCC contains the codons UUU, AAA, and CCC, each of which specifies one amino acid, so this RNA sequence represents a protein sequence three amino acids long.
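The codon example can be checked with a short Python sketch. It builds the standard genetic code from the conventional 64-letter table in UCAG order (one-letter amino acid codes, with "*" for a stop codon); the names `CODON_TABLE`, `codons`, and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the conventional UCAG table order.
# One-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# product() varies the third base fastest, matching the table order above.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(rna: str) -> list:
    """Split an RNA sequence into consecutive, complete three-base codons."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def translate(rna: str) -> str:
    """Translate every complete codon into its one-letter amino acid code."""
    return "".join(CODON_TABLE[codon] for codon in codons(rna.upper()))


print(len(CODON_TABLE))        # 64
print(codons("UUUAAACCC"))     # ['UUU', 'AAA', 'CCC']
print(translate("UUUAAACCC"))  # FKP
```

Here F, K, and P are the one-letter codes for phenylalanine, lysine, and proline. This sketch translates every codon it is given and does not stop at a stop codon.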
Major research areas in Bioinformatics
- Sequence analysis
- Genome Annotation
- Computational evolutionary biology
- Measuring biodiversity
- Gene expression analysis
- Regulation Analysis
- Protein Expression Analysis
- Structure prediction
- Comparative Genomics
Sequence Analysis
- Sequence data is analyzed to identify genes that code for proteins, as well as regulatory sequences.
- A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species.
- A variant of sequence alignment is used in the sequencing process itself.
- Genes and regulatory sequences within a genome are searched for automatically. Not all of the nucleotides within a genome are part of genes; within the genomes of higher organisms, large parts of the DNA do not serve any obvious purpose.
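One very simple form of automatic gene search is scanning for open reading frames (ORFs): a start codon followed, in the same reading frame, by a stop codon. The Python sketch below is a toy illustration under strong assumptions (forward strand only, no introns, no statistical scoring); real gene finders are far more elaborate. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Return (start, end, sequence) for each forward-strand ORF.

    An ORF here runs from an ATG to the first in-frame stop codon.
    min_codons counts every codon in the ORF, including start and stop.
    Positions are 0-based; end is exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```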
Genome Annotation (GO)
- Genome annotation is the process of attaching biological information to sequences. It consists of two main steps: first, identifying elements on the genome; second, attaching biological information to these elements.
- Structural annotation consists of identifying genomic elements:
  - gene structure
  - coding regions
  - location of regulatory motifs
- Functional annotation consists of attaching biological information to genomic elements:
  - biochemical function
  - biological function
  - involved regulation and interactions
  - expression
Computational evolutionary biology
Bioinformatics enables researchers to:
- Trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone.
- More recently, compare entire genomes, which permits the study of more complex evolutionary events, such as gene duplication, lateral gene transfer, and the prediction of bacterial speciation factors.
- Build complex computational models of populations to predict the outcome of the system over time.
- Track and share information on an increasingly large number of species and organisms.
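Measuring changes in DNA between organisms can be illustrated with the simplest distance measure, the proportion of differing sites (p-distance) between aligned sequences. The Python sketch below uses made-up sequences and hypothetical species names; it assumes the sequences are already aligned to equal length and ignores refinements such as substitution models.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def closest_pair(sequences: dict) -> tuple:
    """Return (distance, name_1, name_2) for the most similar pair."""
    return min(
        (p_distance(s1, s2), n1, n2)
        for (n1, s1), (n2, s2) in combinations(sequences.items(), 2)
    )


# Made-up aligned DNA fragments, for illustration only.
SEQUENCES = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTTC",
    "species_c": "TCGAACGGTC",
}

print(closest_pair(SEQUENCES))  # (0.1, 'species_a', 'species_b')
```

Repeatedly merging the closest pair is the starting point of distance-based tree-building methods.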
Measuring Biodiversity
- Databases are used to collect the species names, descriptions, distributions, genetic information, status and size of populations, habitat needs, and how each organism interacts with other species.
- Specialized software programs are used to find, visualize, and analyze the information.
- Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in agriculture) or endangered population (in conservation).
- One very exciting potential of this field is that entire DNA sequences, or genomes, of endangered species can be preserved, allowing the results of Nature's genetic experiment to be remembered in silico, and possibly reused in the future, even if that species is eventually lost.
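As a small illustration of a population-dynamics simulation, the Python sketch below uses discrete-time logistic growth. This is one common textbook model chosen here as an assumption; the slides do not name a specific model. All parameter values are made up.

```python
def simulate_logistic(n0: float, growth_rate: float, capacity: float, steps: int) -> list:
    """Simulate discrete-time logistic growth.

    n0: initial population size
    growth_rate: per-step growth rate at low density
    capacity: carrying capacity of the habitat
    Returns the population size at every step, starting with n0.
    """
    sizes = [n0]
    for _ in range(steps):
        n = sizes[-1]
        sizes.append(n + growth_rate * n * (1 - n / capacity))
    return sizes


# Made-up parameters: 10 individuals, 50% growth per step, capacity of 100.
history = simulate_logistic(n0=10, growth_rate=0.5, capacity=100, steps=50)
print(round(history[1], 2))   # 14.5
print(round(history[-1], 2))  # 100.0
```

With a growth rate below 1 the population rises smoothly toward the carrying capacity; a conservation model would add factors such as random variation and genetic diversity.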
Gene Expression Analysis
- The expression of many genes can be determined by measuring mRNA levels with multiple techniques, including microarrays, expressed sequence tag (EST) sequencing of cDNA, and so forth.
- All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput (HT) gene expression studies.
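Two basic statistics used when separating signal from noise in expression data are the log2 fold change between conditions and a t-statistic that scales the difference by replicate variability. The Python sketch below uses Welch's t-statistic, chosen here as an assumption, and made-up expression values; real analyses also need normalization and multiple-testing correction.

```python
import math
from statistics import mean, variance


def log2_fold_change(control: list, treated: list) -> float:
    """log2 of the ratio of mean expression (treated / control)."""
    return math.log2(mean(treated) / mean(control))


def welch_t(control: list, treated: list) -> float:
    """Welch's t-statistic: difference in means over its standard error."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error


# Made-up expression levels for one gene, three replicates per condition.
control = [8, 10, 12]
treated = [16, 20, 24]

print(log2_fold_change(control, treated))   # 1.0
print(round(welch_t(control, treated), 3))  # 3.873
```

A fold change of 1.0 on the log2 scale means expression doubled; the t-statistic indicates how large that change is relative to the scatter between replicates.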
Regulation analysis
- Regulation is the complex orchestra of events starting with an extracellular signal and ultimately leading to the increase or decrease in the activity of one or more protein molecules. Bioinformatics techniques have been applied to explore various steps in this process.
- Promoter analysis involves the elucidation and study of sequence motifs in the genomic region surrounding the coding region of a gene.
- Expression data can be used to infer gene regulation.
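A basic step in promoter analysis is locating occurrences of a known motif in the sequence upstream of a gene. The Python sketch below searches with a regular expression; the motif pattern is a simplified TATA-box-like consensus used only for illustration, and the upstream sequence is made up.

```python
import re


def find_motif(sequence: str, motif_regex: str) -> list:
    """Return 0-based start positions of all motif matches, including overlaps."""
    # A lookahead matches at every position without consuming characters,
    # so overlapping occurrences are all reported.
    pattern = re.compile(f"(?={motif_regex})")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Illustrative TATA-box-like pattern: TATA, then A or T, then A, then A or T.
TATA_LIKE = "TATA[AT]A[AT]"
# Made-up upstream region.
upstream = "GGCTATAAATAGCGTATATATGC"

print(find_motif(upstream, TATA_LIKE))  # [3, 14]
```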
Protein expression analysis
- Bioinformatics is heavily involved in making sense of protein microarray and high-throughput mass spectrometry (HT MS) data.
- The former involves many of the same problems involved in examining microarrays targeted at mRNA.
- The latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete, peptides from each protein are detected.
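The mass-matching problem can be sketched as follows: compute the predicted mass of each candidate peptide from its sequence, then report the candidates that fall within a tolerance of an observed mass. The Python sketch below uses standard monoisotopic residue masses rounded to five decimals; the tolerance, the tiny peptide "database", and the function names are hypothetical, and the sketch ignores charge states and chemical modifications.

```python
# Monoisotopic residue masses in daltons (rounded to five decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(peptide: str) -> float:
    """Predicted neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide.upper()) + WATER


def match_masses(observed: list, database: list, tolerance: float = 0.01) -> dict:
    """Map each observed mass to the peptides predicted within the tolerance."""
    predicted = {peptide: peptide_mass(peptide) for peptide in database}
    return {
        mass: [p for p, pm in predicted.items() if abs(pm - mass) <= tolerance]
        for mass in observed
    }


# Made-up candidate peptides and one observed mass.
print(round(peptide_mass("PEPTIDE"), 4))              # 799.3599
print(match_masses([799.36], ["GAS", "PEPTIDE"]))     # {799.36: ['PEPTIDE']}
```

Leucine and isoleucine have identical masses, which is one reason a mass match alone does not identify a peptide unambiguously.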
Structure prediction
- One of the key ideas in bioinformatics research is the notion of homology.
- In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function.
- In the structural branch of bioinformatics, homology is used to determine which parts of the protein are important in structure formation and interaction with other proteins.
- In a technique called homology modelling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. When these slides were written, this was the only way to predict protein structures reliably.
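The function-inference argument above (gene A known, gene B unknown) can be sketched with percent identity over aligned sequences. In the Python sketch below, the sequences, gene names, and the 40% identity threshold are all made up for illustration; sequence identity is only a proxy for homology, and real pipelines use alignment tools and statistical significance rather than a fixed cutoff.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned positions without gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def transfer_annotation(query: str, annotated: dict, min_identity: float = 40.0):
    """Return (gene, function, identity) for the best match above the threshold.

    annotated maps gene name -> (sequence aligned to the query, known function).
    Returns None when no annotated gene is similar enough.
    """
    best = None
    for name, (sequence, function) in annotated.items():
        identity = percent_identity(query, sequence)
        if identity >= min_identity and (best is None or identity > best[2]):
            best = (name, function, identity)
    return best


# Made-up, pre-aligned protein fragments with hypothetical annotations.
ANNOTATED = {
    "gene_A": ("MKT-AY", "oxygen transport"),
    "gene_C": ("GGGLPP", "kinase"),
}

print(transfer_annotation("MRTLAY", ANNOTATED))
# ('gene_A', 'oxygen transport', 80.0)
```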
Structure Prediction (Example)
- One example is the homology between hemoglobin in humans and the hemoglobin in legumes (leghemoglobin). Both serve the same purpose of transporting oxygen. Although the two proteins have very different amino acid sequences, their protein structures are virtually identical, which reflects their near-identical purposes.
Comparative genomics
- The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms.
- It is these intergenomic maps that make it possible to trace the evolutionary processes responsible for the divergence of two genomes.
- Many of these studies are based on homology detection and the computation of protein families.
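One widely used heuristic for establishing gene correspondence between two genomes is the reciprocal best hit: gene a and gene b are paired when each is the other's highest-scoring match. The slides do not name a specific method, so this Python sketch is an assumption; the gene names and similarity scores are made up.

```python
def reciprocal_best_hits(scores: dict) -> list:
    """Return sorted (gene_a, gene_b) pairs that are each other's best match.

    scores maps (gene_in_genome_A, gene_in_genome_B) -> similarity score,
    where a higher score means a more similar pair.
    """
    best_for_a = {}  # gene in genome A -> (best gene in B, score)
    best_for_b = {}  # gene in genome B -> (best gene in A, score)
    for (gene_a, gene_b), score in scores.items():
        if gene_a not in best_for_a or score > best_for_a[gene_a][1]:
            best_for_a[gene_a] = (gene_b, score)
        if gene_b not in best_for_b or score > best_for_b[gene_b][1]:
            best_for_b[gene_b] = (gene_a, score)
    return sorted(
        (gene_a, gene_b)
        for gene_a, (gene_b, _) in best_for_a.items()
        if best_for_b[gene_b][0] == gene_a
    )


# Made-up similarity scores between genes of two hypothetical genomes.
SCORES = {
    ("a1", "b1"): 90, ("a1", "b2"): 40,
    ("a2", "b1"): 70, ("a2", "b2"): 35,
    ("a3", "b2"): 60,
}

print(reciprocal_best_hits(SCORES))  # [('a1', 'b1'), ('a3', 'b2')]
```

Gene a2 is left unpaired because its best match, b1, is more similar to a1; cases like this are where gene duplication complicates orthology analysis.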
Special Thanks
- I would like to thank my father, Prof. Dr. Magdy Madkour, for his support with the biological background of the subject.
- I would also like to thank my friend Ibrahim Imam for his support with the biological background, which helped me greatly in understanding many of the concepts.
References
- Wikipedia, http://www.wikipedia.org
- EMC, http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookPROTSYn.html
- Fact Monster, http://www.factmonster.com/ce6/sci/A0860558.html
- Molecular Biology of the Gene (Watson, Hopkins, Roberts, Steitz, Weiner)
- University of Utah, http://gslc.genetics.utah.edu/units/disorders/karyotype/whatarechrom.cfm