BCH463 Bioinformatics
Md. Ashrafuzzaman, D.Sc.
Known as: Dr. Ashraf
Email: [email protected]
Emergency contact cell: 0564174931
Office: 2B10, Bldg # 5, KSU
Bioinformatics
Bio-Informatics
Management of biological information using computer technology.
Biological information?
Huge!
What kind of info?
(structure and mechanism)
• Discovered aspects related to biology
• Literature search using various routes
• Data bank exploration from different international sources
• Biological network data
• Biological structure data
• Data that will help understand the working mechanisms of biological systems
• etc.
Searching Data
• Why search?
• How to search?
• Where to search?
• What is usually done with searched data?
• Who should be a bioinformatician?
A case study
• Bioinformatic-driven search for metabolic biomarkers in disease
• http://www.jclinbioinformatics.com/content/1/1/2
The search and validation of novel disease biomarkers requires the complementary power of professional study planning and
execution, modern profiling technologies and related bioinformatics tools for data analysis and interpretation. Biomarkers have
considerable impact on the care of patients and are urgently needed for advancing diagnostics, prognostics and treatment of
disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics,
focusing on the problem of data preprocessing and consolidation, the data-driven search, verification, prioritization and
biological interpretation of putative metabolic candidate biomarkers in disease. In particular, data mining tools suitable for the
application to omic data gathered from most frequently-used type of experimental designs, such as case-control or
longitudinal biomarker cohort studies, are reviewed and case examples of selected discovery steps are delineated in more
detail. This review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery,
translating new innovations and successes in profiling technologies and bioinformatics to clinical application.
Data sequencing-GenBank
What is GenBank?
GenBank® is the National Institutes of Health (NIH) genetic sequence database, an
annotated collection of all publicly available DNA sequences.
GenBank is part of the International Nucleotide Sequence
Database Collaboration, which comprises the DNA DataBank of
Japan (DDBJ), the European Molecular Biology Laboratory
(EMBL), and GenBank at National Center for Biotechnology
Information (NCBI). These three organizations exchange data on
a daily basis.
As of 2008, there are approximately 100 billion bases in 100 million sequences.
Consider the growth rate!
Started in 1982 with 680,338 base pairs in 606 sequences.
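The growth rate can be made concrete with a little arithmetic. The sketch below uses only the two figures quoted above and assumes, as a simplification, that growth was steadily exponential between 1982 and 2008; the variable names are illustrative.

```python
import math

# Figures quoted in the notes
bases_1982 = 680_338      # base pairs when GenBank started
bases_2008 = 100e9        # approximate base count in 2008
years = 2008 - 1982

# Assumption: constant exponential growth over the whole period
fold_increase = bases_2008 / bases_1982
doublings = math.log2(fold_increase)
doubling_time_years = years / doublings

print(f"fold increase: {fold_increase:,.0f}")
print(f"number of doublings: {doublings:.1f}")
print(f"average doubling time: {doubling_time_years:.2f} years")
```

Under that assumption the database doubled roughly every year and a half.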
How GenBank works
Submissions to GenBank
• Many journals require submission of sequence information to a database prior to
publication so that an accession number may appear in the paper. Sequin, NCBI's
stand-alone submission software for MAC, PC, and UNIX platforms, is available. When
using Sequin, the output files for direct submission should be sent to GenBank by
electronic mail.
Updating or Revising a Sequence
• Revisions or updates to GenBank entries can be made at any time and can be
accepted as BankIt or Sequin files or as the text of an e-mail message.
Access to GenBank
• GenBank is available for searching at NCBI via several methods.
• The GenBank database is designed to provide and encourage access within the
scientific community to the most up to date and comprehensive DNA sequence
information. Therefore, NCBI places no restrictions on the use or distribution of the
GenBank data. However, some submitters may claim patent, copyright, or other
intellectual property rights in all or a portion of the data they have submitted.
New Developments
• NCBI is continuously developing new tools and enhancing existing ones to improve
both submission and access to GenBank. The easiest way to keep abreast of these
and other developments is to check the "What's New" section of the NCBI Web page
and to read the NCBI News, which is also available by free subscription.
Various bases of Bioinformatics
• Count Bases at the Fraunhofer IGB, Germany
This system basically consists of modules that cover sequence analysis (Count
Bases – Next-Gen Sequence Assistant), statistics as well as visualization (Count
Bases Viewer)
In a single run, 10^6–10^9 DNA fragments with an average sequence length of 30–800
bases are simultaneously sequenced. This results in huge amounts of data that
require a storage volume of up to 10–100 gigabytes.
Sources: Genome and proteomic data bases
Major research areas
Sequence analysis
Genome annotation
Literature
Analysis of gene expression, regulation
Analysis of protein expression
Mutations in cancer,
Etc.
Organisms in GenBank
• 260,000 different species
• 1000 new species being added per month
• Human (Homo sapiens): 11,551,000 entries with 13,149,000,000 bases
• Mouse (Mus musculus): 7,256,000 entries with 8,361,230,000 bases
Human and mouse are the top two species.
GenBank Format
GenBank format (GenBank Flat File Format) consists of an annotation
section and a sequence section.
Annotation section
The start of the annotation section is marked by a line beginning with
the word "LOCUS".
The only rule now applied in assigning a locus name is that it must be
unique
Sequence section
The start of the sequence section is marked by a line beginning with the
word "ORIGIN" and the end of the section is marked by a line with
only "//".
GenBank Flat File Format
LOCUS       AF068625                 200 bp    mRNA    linear   ROD 06-DEC-1999
DEFINITION  Mus musculus DNA cytosine-5 methyltransferase 3A (Dnmt3a) mRNA,
            complete cds.
ACCESSION   AF068625 REGION: 1..200
VERSION     AF068625.2  GI:6449467
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 200)
            AUTHORS, TITLE, JOURNAL, etc.
REFERENCE   2  (bases 1 to 200)
            AUTHORS, TITLE, JOURNAL, etc.
  REMARK    Sequence update by submitter
COMMENT     On Nov 18, 1999 this sequence version replaced gi:3327977.
FEATURES             Location/Qualifiers
     source          1..200
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /chromosome="12"
                     /map="4.0 cM"
     gene            1..>200
                     /gene="Dnmt3a"
ORIGIN
        1 gaattccggc ctgctgccgg gccgcccgac ccgccgggcc acacggcaga gccgcctgaa
       61 gcccagcgct gaggctgcac ttttccgagg gcttgacatc agggtctatg tttaagtctt
      121 agctcttgct tacaaagacc acggcaattc cttctctgaa gccctcgcag ccccacagcg
      181 ccctcgcagc cccagcctgc
//
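The three markers described above (LOCUS, ORIGIN and "//") are enough to split a record into its annotation and sequence sections. The following is a minimal sketch under that assumption only; the function name `parse_genbank_record` and the shortened demo record are made up for illustration, and the sketch ignores most of the real format (feature tables, multi-record files and so on).

```python
def parse_genbank_record(text):
    """Split one GenBank flat-file record into annotation and sequence parts."""
    locus_name = None
    annotation_lines = []
    sequence_parts = []
    in_sequence = False

    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if line.startswith("ORIGIN"):
            in_sequence = True  # everything after this line is sequence
            continue
        if in_sequence:
            # Sequence lines look like "   61 gcccagcgct gaggctgcac ...":
            # drop the leading position number and the spaces between blocks.
            sequence_parts.extend(tok for tok in line.split() if not tok.isdigit())
        else:
            if line.startswith("LOCUS"):
                locus_name = line.split()[1]
            annotation_lines.append(line)

    return {
        "locus": locus_name,
        "annotation": annotation_lines,
        "sequence": "".join(sequence_parts),
    }


# Hypothetical, shortened record used only to exercise the parser
example = """LOCUS       DEMO0001  20 bp  mRNA  linear  ROD 01-JAN-2000
DEFINITION  Made-up record for illustration.
ORIGIN
        1 gaattccggc ctgctgccgg
//
"""

record = parse_genbank_record(example)
print(record["locus"], len(record["sequence"]), record["sequence"])
```

For real work, an established parser such as Biopython's `Bio.SeqIO` module is a safer choice than a hand-written one.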
GenBank sequence format
It’s a rich format for storing sequences and associated annotations. It shares a feature
table vocabulary and format with the EMBL and DDBJ formats.
LOCUS       CAA89576                 109 aa            linear   PLN 11-AUG-1997
DEFINITION  CYC1 [Saccharomyces cerevisiae].
ACCESSION   CAA89576
VERSION     CAA89576.1  GI:1015707
DBSOURCE    embl locus SCYJR048W, accession Z49548.1
KEYWORDS    5-10 or as many as needed
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (residues 1 to 109)
            AUTHORS, TITLE, JOURNAL, etc.
REFERENCE   2  (residues 1 to 109)
            AUTHORS, TITLE, JOURNAL, etc.
FEATURES             Location/Qualifiers
     source          1..109
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
                     /chromosome="X"
     Protein         1..109
                     /name="CYC1"
     CDS             1..109
                     /gene="CYC1"
                     /coded_by="Z49548.1:954..1283"
                     /note="ORF YJR048w"
                     /db_xref="GOA:P00044"
                     /db_xref="SGD:S0003809"
                     /db_xref="UniProtKB/Swiss-Prot:P00044"
ORIGIN
        1 mtefkagsak kgatlfktrc lqchtvekgg phkvgpnlhg ifgrhsgqae gysytdanik
       61 knvlwdennm seyltnpkky ipgtkmafgg lkkekdrndl itylkkace
//
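The /coded_by qualifier links the protein record back to the nucleotide entry, and the two lengths can be cross-checked. The sketch below assumes that the span is a simple start..end range (no joins or complements) and that it includes one untranslated stop codon; the function name `cds_length_check` is illustrative.

```python
def cds_length_check(coded_by, protein_length):
    """Compare a /coded_by span such as 'Z49548.1:954..1283' with a protein length."""
    accession, span = coded_by.split(":")
    start, end = (int(x) for x in span.split(".."))
    nucleotides = end - start + 1      # locations are 1-based and inclusive
    codons, remainder = divmod(nucleotides, 3)
    return {
        "accession": accession,
        "nucleotides": nucleotides,
        "codons": codons,
        "in_frame": remainder == 0,
        # Assumption: the span includes one stop codon that is not translated.
        "matches_protein": codons - 1 == protein_length,
    }


result = cds_length_check("Z49548.1:954..1283", 109)
print(result)
```

Here 1283 - 954 + 1 gives 330 nucleotides, that is 110 codons, which corresponds to the 109 residues of the record plus a stop codon.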
Online Mendelian Inheritance in Man (OMIM) Database
• OMIM (since the 1960s) catalogues all the known diseases with a genetic component and tries to link them to the relevant genes in the human genome.
• In 2004 there were 15,000 records.
• One can request to download the mim2gene.txt file from OMIM here: http://www.omim.org/downloads
The OMIM code
Every disease and gene is assigned a six-digit number whose first digit classifies
the mode of inheritance.
If the initial digit is 1, the trait is deemed autosomal dominant; if 2, autosomal recessive; if 3,
X-linked. Wherever a trait defined in this dictionary has a MIM number, the number from the
12th edition of MIM is given in square brackets, with or without a prefix symbol as
appropriate. An asterisk (*) indicates that the mode of inheritance is known; a number symbol (#)
before an entry number means that the phenotype can be caused by mutation in any of two or
more genes. For example, Pelizaeus-Merzbacher disease [MIM #312080] is an X-linked recessive
disorder.
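The numbering rule can be written as a small lookup. This is a sketch that encodes only the rules stated in these notes (first digits 1 to 3, and the * and # prefixes); other digits and symbols used by OMIM are deliberately left uninterpreted, and the function name `describe_mim` is illustrative.

```python
INHERITANCE_BY_FIRST_DIGIT = {
    "1": "autosomal dominant",
    "2": "autosomal recessive",
    "3": "X-linked",
}

PREFIX_MEANING = {
    "*": "mode of inheritance is known",
    "#": "phenotype can be caused by mutation in any of two or more genes",
}


def describe_mim(entry):
    """Interpret a MIM entry such as '#312080' using only the rules in these notes."""
    entry = entry.strip()
    prefix = entry[:1] if entry[:1] in PREFIX_MEANING else ""
    number = entry[len(prefix):]
    if len(number) != 6 or not number.isdigit():
        raise ValueError(f"expected a six-digit MIM number, got {entry!r}")
    return {
        "number": number,
        "prefix_note": PREFIX_MEANING.get(prefix),
        # First digits other than 1-3 are not described in the notes, hence None.
        "inheritance": INHERITANCE_BY_FIRST_DIGIT.get(number[0]),
    }


print(describe_mim("#312080"))
```

For the Pelizaeus-Merzbacher example, the leading 3 maps to X-linked, in agreement with the text.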
For further studies visit http://www.omim.org
OMIM
Example: http://www.omim.org/entry/189911
*189911 TRANSFER RNA GLYCINE 1; TRNAG1
Alternative titles; symbols TRANSFER RNA GLYCINE-CCC-1; TRG1
Cytogenetic location: Chr.16 Genomic coordinates (GRCh37): 16:0 - 90,354,753 (from NCBI)
TEXT
Mapping
McBride et al. (1989) assigned a glycine tRNA(CCC) gene (TRG1) to human chromosome 1 (1pter-p34)
on the basis of Southern analysis of a panel of hybrid cell DNAs. They also assigned a cloned DNA fragment
encompassing a glycine tRNA gene (tRNA-GCC) and pseudogene to human chromosome 16 by the same
method.
Evolution
There are about 1,300 tRNA genes in the haploid human genome (Hatlen and Attardi, 1971) encoding
60 to 90 tRNA isoacceptors (Lin and Agris, 1980). The studies by McBride et al. (1989) as well as studies by
others (see, e.g., 180620, 189930, 189920, 180640, 189880) indicated that tRNA genes and pseudogenes
are dispersed on at least 7 human chromosomes and suggested that these sequences would probably be
found on most if not all human chromosomes. McBride et al. (1989) described short, 8-12 nucleotide, direct
terminal repeats flanking many of the dispersed tRNA genes. This finding, combined with the dispersion of
tRNA genes, suggests that many of these genes may have arisen by an RNA-mediated retroposition
mechanism. There may have been selection for reiteration of genes encoding isoaccepting tRNAs, since a
single mutation in a single-copy tRNA gene could be devastating. Moreover, even a mutation in the anticodon
of a single tRNA gene might not be crucial if competition was provided by the normal 'wildtype' tRNA
isoacceptor produced by multiple copies of the normal tRNA gene still present in the genome. Dispersion of
multiple copies of each tRNA gene could provide diversity of 5-prime-flanking sequences, which are known to
modulate the expression of some human tRNA genes. Tissue-specific or differentiation-specific expression of
tRNA isoacceptors might be provided for by this mechanism. The recombination and unequal crossing-over
that can occur with tandem tRNA sequences can result in homogenization of the sequences with disastrous
consequences.
Nucleotide Database
NUCLEOTIDE DATABASES
NCBI's sequence databases accept genome data from sequencing projects from around
the world and serve as the cornerstone of bioinformatics research.
• GenBank: An annotated collection of all publicly available nucleotide and amino acid sequences.
• EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).
• GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.
• HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
• HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.
• SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.
Nucleotide Database
• RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support our data-gathering efforts.
• STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.
• UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
• UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene annotated with mapping and expression information and cross-references to other sources. UniGene computationally identifies transcripts from the same locus; analyzes expression by tissue, age, and health status; and reports related proteins (protEST) and clone resources.
Single Nucleotide Polymorphism (SNP) database
What is it?
The SNP Database (also known as dbSNP) is an archive for genetic variation within and
across different species developed and hosted by NCBI in collaboration with the National
Human Genome Research Institute (NHGRI).
Polymorphism in biology occurs when two or more clearly different phenotypes exist in
the same population of a species. It is related to biodiversity, genetic variation and adaptation.
• dbSNP accepts apparently neutral polymorphisms, polymorphisms corresponding to known phenotypes, and regions of no variation.
• It was created in September 1998 to supplement GenBank (NCBI’s nucleic acid and protein sequences).
Goal
Its goal is to act as a single database that contains all identified genetic variation, which
can be used to investigate a wide variety of genetically based natural phenomena.
Specifically, access to the molecular variation cataloged within dbSNP aids basic
research such as physical mapping, population genetics, investigations into evolutionary
relationships, as well as being able to quickly and easily quantify the amount of variation
at a given site of interest.
Application
Applied research, genetic engineering, drug discovery, etc.
Submitting
Every submitted variation receives a submitted SNP ID number (“ss#”). This accession number is
a stable and unique identifier for that submission.
Unique submitted SNP records also receive a reference SNP ID number (“rs#”; "refSNP cluster").
Section Types for Submissions to dbSNP
Contact
TYPE: CONT
HANDLE: EGREEN
NAME: Eric Green
EMAIL: [email protected]
LAB: Biophysics laboratory
INST: King Saud University
ADDR: PO Box 2455, Riyadh 11451, Kingdom of Saudi Arabia
Publication section
TYPE: PUB
HANDLE: EGREEN
MEDUID: Medline unique identifier. Not obligatory
TITLE: Human chromosome 7 STS
AUTHORS: Ashrafuzzaman,M.
YEAR: 2012
STATUS: 1 (unpublished) / 2 (submitted) / 3 (in press) / 4 (published)
Population class
TYPE: POPULATION
HANDLE: WHOEVER
ID: YOUR_POP
POP_CLASS: EUROPE
POPULATION: Continent:Europe
Nation: Some Nation
Phenotype: You name it
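Each section above is a list of "KEY: value" lines, so it can be read into a dictionary. The sketch below assumes nothing beyond that layout; the function name `parse_section` is illustrative, and the full dbSNP submission specification has rules this simplified reader ignores.

```python
def parse_section(text):
    """Turn the 'KEY: value' lines of one submission section into a dictionary."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or free-text lines
        key, value = line.split(":", 1)  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


# Contact section copied from the example above (e-mail line omitted)
contact = """TYPE: CONT
HANDLE: EGREEN
NAME: Eric Green
LAB: Biophysics laboratory
INST: King Saud University
"""

section = parse_section(contact)
print(section["TYPE"], section["HANDLE"], section["INST"])
```

Splitting on the first colon only keeps values that themselves contain a colon, such as "Continent:Europe", intact.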
How to Submit
To submit variations to dbSNP, one must first acquire a submitter handle, which identifies the laboratory
responsible for the submission. Next, the author is required to complete a submission file containing the relevant
information and data. Submitted records must contain the ten essential pieces of information listed in the following
table. Other information required for submissions includes contact information, publication information (title,
journal, authors, year), molecule type (genomic DNA, cDNA, mitochondrial DNA, chloroplast DNA), and organism.
A sample submission sheet can be found at:
(http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=how_to_submit#SECTION_TYPES)
Element: Explanation
• Flanking DNA (region of DNA that is not transcribed to RNA, region of DNA adjacent to 5’ end of the gene): Variations from assays must have 25 bp of flanking sequence on either side of the polymorphism and must be 100 bp overall.
• Alleles: Alleles must be defined using A, G, C, or T nomenclature; IUPAC nomenclature will only be accepted in flanking regions. See: http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp
• Method: A description of how the variation was detected (e.g. DNA sequencing) or how the allele frequencies were calculated. A table of method classes is provided.
• Population: A description of the initial group from which the variation was found or from which the allele frequency was calculated. A table of population classes is provided.
• Sample size: The number of chromosomes used to find the variation and the number of chromosomes used to calculate allele frequencies.
• Population-specific allele frequency: The allele frequency of the surveyed population.
• Population-specific genotype frequency: The genotype frequency of the surveyed population.
• Population-specific heterozygosity: The proportion of individuals who are heterozygous for the variation.
• Individual genotypes: The genotype of individuals from the study.
• Validation information: The validation status lists the categories of evidence supporting the variation.
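Several of the elements above (sample size, allele frequency, genotype frequency, heterozygosity) can be derived from the individual genotypes. The following sketch shows one way to compute them, assuming diploid individuals and genotypes written as "A/G"; the function name and the ten genotypes are made up for illustration and are not real data.

```python
from collections import Counter


def summarize_genotypes(genotypes):
    """Compute allele frequency, genotype frequency and heterozygosity.

    `genotypes` is a list of strings such as "A/G", one per diploid individual.
    """
    # Treat "A/G" and "G/A" as the same genotype by sorting the two alleles.
    genotype_counts = Counter("/".join(sorted(g.split("/"))) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g.split("/"))

    n_individuals = len(genotypes)
    n_chromosomes = sum(allele_counts.values())  # two per diploid individual

    heterozygotes = sum(
        count for g, count in genotype_counts.items() if len(set(g.split("/"))) > 1
    )
    return {
        "sample_size_chromosomes": n_chromosomes,
        "allele_frequency": {a: c / n_chromosomes for a, c in allele_counts.items()},
        "genotype_frequency": {g: c / n_individuals for g, c in genotype_counts.items()},
        "heterozygosity": heterozygotes / n_individuals,
    }


# Made-up genotypes for ten individuals (not real data)
sample = ["A/A", "A/G", "G/A", "G/G", "A/A", "A/G", "A/A", "G/G", "A/G", "A/A"]
summary = summarize_genotypes(sample)
print(summary)
```

Note that the sample size is counted in chromosomes, as the table requires, so ten diploid individuals contribute twenty.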
Example of SNP submission
View SNP Submission Batch
Submitter Handle: OMIM-CURATED-RECORDS
Submitter Batch ID: 590095_batch
Submitter Method ID: CLINICAL_SNP_SUBMISSION
Citation: not supplied
Comment: not supplied
Batch Total SubSNP(ss) Count: 4
Columns: SubSNP(ss), Submitter SNP_ID, SNP Allele, Samplesize, RefSNP(rs), ss2rs Orien, Chr, ChrPos, Contig Accession, Contig Pos
ss49214876 8804 6 A/G N.D rs19947467 3 0 MT 5521 NC_012920.1 5521
ss49214877 8805 0 A/G N.D rs19947467 4 0 MT 5532 NC_012920.1 5532
ss49214876 8803 2 AG/T N.D rs19947467 2 0 MT 5537 NC_012920.1 5537
ss49214875 8802 3 A/G N.D rs19947467 1 0 MT 5549 NC_012920.1 5549
Entrez records
Homo sapiens
Taxonomy ID: 9606
Genbank common name: human
Inherited blast name: primates
Rank: species
Genetic code: Translation table 1 (Standard)
Mitochondrial genetic code: Translation table 2 (Vertebrate Mitochondrial)
Other names: common name: man; authority: Homo sapiens Linnaeus, 1758
Database name | Subtree links | Direct links
Nucleotide | 9,892,226 | 9,892,201
Nucleotide EST | 8,315,296 | 8,315,296
Nucleotide GSS | 1,695,452 | 1,694,126
Protein | 599,454 | 599,358
Structure | 19,444 | 19,444
Genome | 51 | 50
Popset | 22,309 | 22,309
SNP | 60,480,978 | 60,480,978
Domains | 10 | 10
GEO Datasets | 402,695 | 402,695
UniGene | 129,493 | 129,493
UniSTS | 328,584 | 328,584
PubMed Central | 11,220 | 11,214
Gene | 42,139 | 42,102
HomoloGene | 18,431 | 18,431
SRA Experiments | 72,649 | 72,647
Probe | 9,033,473 | 9,033,473
Bio Project | 694 | 693
Bio Sample | 550,346 | 550,343
Bio Systems | 2,219 | 2,219
dbVar | 795,936 | 795,936
Epigenomics | 1,987 | 1,987
GEO Profiles | 27,034,750 | 27,034,750
Protein Clusters | 13 | 13
Taxonomy | 2 | 1
Protein structure-presentation
• Ribbon diagram
PyMol ribbon of the unusual structure of the "tubby" brain protein
Computer-drawn ribbon diagram of two CuZn superoxide dismutase dimers.
Hollow 1.1 – Illustration software for Proteins
HOLLOW facilitates the production of surface images of proteins. Hollow generates fake
atoms that identify voids, pockets, channels and depressions in a protein structure
specified in the PDB format.
interior pathway surfaces
channel surfaces (and
electrostatic surfaces)
ligand-binding surfaces
Software helps address protein functions
Molecular dynamics (MD)
(mimicking the structure/conformations)
Purpose: To understand the statistical nature of conformations
MD requires the following parameters:
• i. Dimensions and parameters related to the state of the platform (initial conditions)
• ii. Dimensions of the participating atoms
• iii. Structure of the individual molecules or sections of the whole structure
• iv. Physical properties like charges on the atoms
MD allows one to locate agents/atoms involved in a structure by
providing the following:
• i. Coordinates (in most cases time dependent)
• ii. Projection
MD results can importantly be converted into energetics:
• i. Interactions between participating agents/atoms
• ii. Interactions with the background
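As a rough illustration of how coordinates and charges turn into an interaction energy, the sketch below sums the Coulomb term over all atom pairs of a single snapshot. It is not how any particular MD package works: real force fields add bonded and van der Waals terms, cutoffs and solvent effects. The three-atom snapshot is hypothetical, and the constant is a commonly quoted value for energies in kcal/mol with distances in Angstrom and charges in units of e.

```python
import math
from itertools import combinations

# Commonly quoted conversion factor, kcal*Angstrom/(mol*e^2); treated here as an assumption
COULOMB_CONSTANT = 332.06


def coulomb_energy(atoms):
    """Sum q_i*q_j/r_ij over all atom pairs for one snapshot of coordinates."""
    total = 0.0
    for (pos_i, q_i), (pos_j, q_j) in combinations(atoms, 2):
        distance = math.dist(pos_i, pos_j)  # Angstrom
        total += COULOMB_CONSTANT * q_i * q_j / distance
    return total


# Hypothetical three-atom snapshot: ((x, y, z) in Angstrom, charge in e)
snapshot = [
    ((0.0, 0.0, 0.0), +1.0),
    ((3.0, 0.0, 0.0), -1.0),
    ((0.0, 4.0, 0.0), +0.5),
]

energy = coulomb_energy(snapshot)
print(f"{energy:.2f} kcal/mol")
```

Repeating the calculation for every time step of a trajectory gives the energy as a function of time, which is the kind of conversion from coordinates to energetics described above.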
MD on DNA-lipid interaction
An example of MD on interactions between biomolecules
Important illustration in drug discovery
Certain programs can convert these data into energy
Information
Energy
Swiss Prot Database
UniProtKB/Swiss-Prot
• UniProtKB/Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB).
• It is a high quality annotated and non-redundant protein sequence database, which brings together experimental results, computed features and scientific conclusions.
• Since 2002, it has been maintained by the UniProt consortium and is accessible via the UniProt website http://www.uniprot.org/ .
• It deals with interactions, protein modelling, proteomics, protein structure & function, genome analysis & annotation, etc.
UniProtKB
• The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation.
The UniProt Knowledgebase consists of two sections:
a section containing manually-annotated records with information extracted from
literature and curator-evaluated computational analysis,
and a section with computationally analyzed records that await full manual annotation.
For the sake of continuity and name recognition, the two sections are referred to as
"UniProtKB/Swiss-Prot" (reviewed, manually annotated) and "UniProtKB/TrEMBL"
(unreviewed, automatically annotated), respectively.
• Why is UniProtKB composed of 2 sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?
• Where do the UniProtKB protein sequences come from?
About 85% of the protein sequences provided by UniProtKB are derived from the translation of
the coding sequences (CDS) which have been submitted to the public nucleic acid databases,
the EMBL-Bank/GenBank/DDBJ databases (INSDC). All these sequences, as well as the
related data submitted by the authors, are automatically integrated into UniProtKB/TrEMBL.
• Does UniProtKB contain all protein sequences?
• What are the differences between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?
UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally
generated annotation and large-scale functional characterization. UniProtKB/Swiss-Prot
(reviewed) is a high quality manually annotated and non-redundant protein sequence database,
which brings together experimental results, computed features and scientific conclusions.
PCR-Polymerase Chain Reaction
Polymerase Chain Reaction
• Polymerase chain reaction (PCR) enables researchers to produce millions of copies of a specific DNA sequence in approximately two hours. This automated process bypasses the need to use bacteria for amplifying DNA.
• PCR is a scientific technique in molecular biology to amplify a single or a few copies of a piece of DNA across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence.
Developed in 1983 by Kary Mullis,[1] PCR is now a common and often indispensable technique
used in medical and biological research labs for a variety of applications.[2][3] These include
DNA cloning for sequencing, DNA-based phylogeny, or functional analysis of genes; the
diagnosis of hereditary diseases; the identification of genetic fingerprints (used in forensic
sciences and paternity testing); and the detection and diagnosis of infectious diseases. In 1993,
Mullis was awarded the Nobel Prize in Chemistry along with Michael Smith for his work on
PCR.[4]
The method relies on thermal cycling, consisting of cycles of repeated heating and cooling of
the reaction for DNA melting and enzymatic replication of the DNA. Primers (short DNA
fragments) containing sequences complementary to the target region along with a DNA
polymerase (after which the method is named) are key components to enable selective and
repeated amplification. As PCR progresses, the DNA generated is itself used as a template for
replication, setting in motion a chain reaction in which the DNA template is exponentially
amplified. PCR can be extensively modified to perform a wide array of genetic manipulations.
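The exponential amplification can be checked with a short calculation. This sketch assumes the textbook model in which each cycle multiplies the number of copies by (1 + efficiency), with efficiency 1.0 meaning perfect doubling; the function names and the 90% efficiency figure are illustrative assumptions, and real reactions eventually plateau.

```python
import math


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles


def cycles_needed(target_copies, start_copies=1, efficiency=1.0):
    """Smallest whole number of cycles that reaches `target_copies`."""
    return math.ceil(math.log(target_copies / start_copies, 1 + efficiency))


print(copies_after(30))                          # ideal doubling for 30 cycles
print(cycles_needed(1_000_000))                  # cycles to pass one million copies
print(cycles_needed(1_000_000, efficiency=0.9))  # same target at 90% efficiency
```

With ideal doubling, a single template molecule passes one million copies after 20 cycles, which is why a few dozen thermal cycles are enough in practice.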
• http://www.youtube.com/DNALearningCenter
FASTA and BLAST
• FASTA is a suite of programs for searching the EBI protein databases using local or global similarity.
• In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid
sequences of different proteins or the nucleotides of DNA sequences. A BLAST
search enables a researcher to compare a query sequence with a library or database
of sequences, and identify library sequences that resemble the query sequence
above a certain threshold. Different types of BLASTs are available according to the
query sequences. For example, following the discovery of a previously unknown
gene in the mouse, a scientist will typically perform a BLAST search of the human
genome to see if humans carry a similar gene; BLAST will identify sequences in the
human genome that resemble the mouse gene based on similarity of sequence. The
BLAST program was designed by Stephen Altschul, Warren Gish, Webb Miller,
Eugene Myers, and David J. Lipman at the NIH and was published in the Journal of
Molecular Biology in 1990.
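The idea of scoring local similarity and keeping library sequences above a threshold can be sketched with the Smith-Waterman dynamic programming algorithm. This is not BLAST itself: BLAST uses faster heuristics to approximate this kind of local alignment on large databases. The scoring values, the threshold and the tiny library below are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def search_library(query, library, threshold):
    """Return (score, name) pairs scoring at least `threshold`, best first."""
    scored = [(smith_waterman_score(query, seq), name) for name, seq in library.items()]
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


# Made-up query and library sequences
library = {"seq1": "TTGATTACATT", "seq2": "CCCCCCCC", "seq3": "GATCACA"}
hits = search_library("GATTACA", library, threshold=10)
print(hits)
```

Only the two library sequences that resemble the query pass the threshold; the unrelated one is filtered out.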
Phylogenetic tree tutorial
All life on Earth is united by evolutionary history; we are all evolutionary cousins — twigs on the tree
of life. Phylogenetic systematics is the formal name for the field within biology that reconstructs
evolutionary history and studies the patterns of relationships among organisms. Unfortunately,
history is not something we can see. It has only happened once and only leaves behind clues
as to what happened. Systematists use these clues to try to reconstruct evolutionary history.
See the attached tutorial: pdf file provided
A phylogeny, or evolutionary tree, represents the evolutionary relationships
among a set of organisms or groups of organisms, called taxa (singular: taxon).
The tips of the tree represent groups of descendant taxa (often species) and the
nodes on the tree represent the common ancestors of those descendants. Two
descendants that split from the same node are called sister groups. In the tree
below, species A & B are sister groups — they are each other's closest relatives.
Many phylogenies also include an outgroup — a taxon outside the group of
interest. All the members of the group of interest are more closely related to
each other than they are to the outgroup. Hence, the outgroup stems from the
base of the tree. An outgroup can give you a sense of where on the bigger tree
of life the main group of organisms falls. It is also useful when constructing
evolutionary trees.
Evolutionary trees depict clades. A clade is a group of organisms that includes an
ancestor and all descendants of that ancestor. You can think of a clade as a branch on the
tree of life.
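These definitions map naturally onto a nested data structure. In the sketch below a tree is written as nested tuples, with each tuple standing for an internal node (a common ancestor) and each string for a tip. The tree itself is hypothetical: A and B are sister groups, C joins them in the group of interest, and O is the outgroup.

```python
# Hypothetical tree: ((A, B), C) is the group of interest, O is the outgroup.
tree = ((("A", "B"), "C"), "O")


def leaves(node):
    """All tip taxa descending from a node."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in leaves(child)]


def clades(node):
    """Every clade: each internal node together with all of its descendants."""
    if isinstance(node, str):
        return []
    found = [sorted(leaves(node))]
    for child in node:
        found.extend(clades(child))
    return found


def sister_groups(node):
    """Pairs of lineages that split from the same node."""
    if isinstance(node, str):
        return []
    pairs = []
    if len(node) == 2:
        pairs.append((sorted(leaves(node[0])), sorted(leaves(node[1]))))
    for child in node:
        pairs.extend(sister_groups(child))
    return pairs


print(clades(tree))
print(sister_groups(tree))
```

A and C together do not appear among the clades, because their common ancestor also gave rise to B; a clade must contain all descendants of its ancestor.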