Spring 2007
Bioinformatics
Ch. 6 - Genomics
Spring 2009
Bioinformatics
Completed genomes
• http://www.genomesonline.org
•Avg. genome = 5 Mb
•Typical sequence coverage = 20X, therefore approx. 100 Mb of DNA sequenced per genome
•Avg. English word size = 5 letters
•Avg. words per page = 250, therefore 1250 letters per page
•Avg. book size = 200 pages, therefore 250,000 letters per book
•Approximately 400 books per genome
•958 completed genomes as of January 1, 2009
•Approximately 383,200 books worth of genomic information
•MSU library holdings: 182,000
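The book comparison above is plain arithmetic, so it can be checked in a few lines. This is a minimal sketch: the numbers come from the slide, while the constant names are only illustrative.

```python
# Figures taken from the slide; the variable names are illustrative.
AVG_GENOME_BP = 5_000_000      # 5 Mb average genome
COVERAGE = 20                  # 20X sequence coverage
LETTERS_PER_WORD = 5
WORDS_PER_PAGE = 250
PAGES_PER_BOOK = 200
COMPLETED_GENOMES = 958        # as of January 1, 2009
MSU_HOLDINGS = 182_000

sequenced_bp = AVG_GENOME_BP * COVERAGE                # bases read per genome
letters_per_page = LETTERS_PER_WORD * WORDS_PER_PAGE   # letters on one page
letters_per_book = letters_per_page * PAGES_PER_BOOK   # letters in one book
books_per_genome = sequenced_bp // letters_per_book    # one base = one letter
total_books = books_per_genome * COMPLETED_GENOMES

print("bases sequenced per genome:", sequenced_bp)
print("letters per book:", letters_per_book)
print("books per genome:", books_per_genome)
print("books for all completed genomes:", total_books)
print("ratio to MSU holdings:", round(total_books / MSU_HOLDINGS, 2))
```

Under these assumptions the sequenced genomes amount to roughly twice the library holdings quoted on the slide.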
Approaches to Genome Sequencing
• Whole Genome Sequencing
• Shotgun Sequencing
• Expressed Sequence Tags
• Comparative Genomics
• Metagenomics
Overview of Genome Sequencing
Isolate Genomic DNA
Genomic DNA
Create Genomic Library
BAC Clones
Construction of Genome Map
DNA Sequencing and Assembly
Isolating Genomic DNA
à la Qiagen’s DNeasy kit
Lysis:
• Proteinase K digestion
• Lysis by chaotropic salt
Purification:
• DNA negatively charged
• Bind positively charged column
• Wash (EtOH) away impurities
Elution:
• Removal of DNA
• Disrupt ionic interaction with high salt buffer
Preservation:
• Store at -20°C to -160°C
• Tris•EDTA buffer [pH 8.0]
Sephadex Structure
Creating a Genomic Library
Cut Genomic DNA:
• Partial Restriction Digest
•EcoRI & EcoRI methylase
• Mechanical Shearing
• Determine Avg. fragment size
Clone Fragments into BAC vectors:
• Properties of BACs
BAC Clones
Transform E. coli:
• Electroporation
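The "cut, then determine average fragment size" steps can be illustrated in silico. The sketch below assumes a complete EcoRI digest of a linear sequence (EcoRI recognizes GAATTC and cuts after the G). A partial digest, as on the slide, would leave some sites uncut and give larger fragments. The function names and the toy sequence are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the cut falls between G and AATTC

def ecori_fragments(seq):
    """Return fragment lengths from a complete EcoRI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cuts.append(pos + 1)                 # cut position is just after the G
        pos = seq.find(ECORI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def average_fragment_size(lengths):
    """Mean fragment length in bases."""
    return sum(lengths) / len(lengths)

# Toy sequence (made up for illustration) with two EcoRI sites.
toy = "AAA" + "GAATTC" + "CCCC" + "GAATTC" + "TT"
fragments = ecori_fragments(toy)
print(fragments, average_fragment_size(fragments))
```

In the lab the average size is read off a gel rather than computed, as the pulsed-field slides that follow show.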
Pulsed-Field Gel Electrophoresis
Average Insert Size by Pulsed-Field Gel Electrophoresis
Average Insert Size in Human BACs
Bacterial Artificial Chromosome
• Derived from F plasmids
• Multiple cloning site
• Selectable Marker
  – Antibiotic resistance gene, e.g., Cm (chloramphenicol)
• Ori S - unidirectional
• Par genes
  – Partitioning genes
  – Maintain single copy of BAC
Construction of Genome Map
Transformed E. coli:
Plasmid Miniprep
BAC Clones
Construction of Genome Map
• BAC end sequencing
• Identify overlapping BACs
• Subclone BACs into plasmids
DNA Sequencing and Assembly
Genome Assembly and Annotation
Overview of Shotgun Sequencing
Isolate Genomic DNA
Genomic DNA
Create Genomic Library
Plasmid Clones
DNA Sequencing and Assembly
Construction of Genome Map
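To make the "Sequencing and Assembly" step concrete, here is a toy greedy assembler that repeatedly merges the two reads with the longest suffix/prefix overlap. It is only a sketch: it assumes error-free reads from one strand and no repeats, which real assemblers cannot assume. The function names, the `min_len` threshold, and the reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by best overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))       # drop exact duplicate reads
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break                            # nothing overlaps; keep separate contigs
        merged = reads[i] + reads[j][n:]     # join the pair, overlap counted once
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Toy reads (made up) drawn from the sequence ATGGCGTACCTTGA.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "ACCTTGA"])
print(contigs)
```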
Overview of EST Sequencing
Isolate mRNA
Create cDNA
Create Genomic Library
DNA Sequencing
Comparative Genomics
Isolate mRNA and create cDNA
Create Genomic Library
BAC Clones
Construction of Genome Map
DNA Sequencing and Assembly
Synteny - same gene order preserved between species
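Synteny as defined above can be illustrated by scanning two gene orders for shared runs. The sketch assumes each gene name occurs once per genome and only looks for blocks in the same orientation, so inverted blocks are ignored. The gene names and `syntenic_blocks` are hypothetical.

```python
def syntenic_blocks(order_a, order_b, min_genes=2):
    """Return runs of genes that are adjacent and in the same order in both lists."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        extends = (
            bool(current)
            and gene in pos_b
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)             # gene order is preserved, grow the block
            continue
        if len(current) >= min_genes:
            blocks.append(current)           # close the finished block
        current = [gene] if gene in pos_b else []
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks

# Toy gene orders (made up): species B carries a rearrangement and one extra gene.
species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = ["g4", "g5", "g6", "x", "g1", "g2", "g3"]
print(syntenic_blocks(species_a, species_b))
```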
Comparative Genomics BAC array
Comparative Genome Hybridization
Bordetella phylogeny
Metagenomic analysis
• What is metagenomics?
– Metagenomics is the genomic analysis of the collective
genomes of an assemblage of organisms from a defined
environment.
» Handelsman, et al, 2002
– a.k.a., community genomics, environmental genomics
– Derived from tools, techniques and models used in genomics.
• Why do metagenomic analysis?
– Genomic content of all eukaryotes, bacteria, archaea and
viruses in an environment.
– Provides a picture of genetic/functional potential of the
community.
Metagenomics
Venter’s Trip
Yooseph, et al, PLOS biology, 2007
Creation of Fosmid Libraries
Preliminary Categorization of 263 ORFs from a Fosmid Library of Subgingival Plaque
Category          Percentage of library
Eucaryotic        34%
Bacterial         21%
Archaeal          1.1%
Viral¹            0.8%
Bacteriophage     2%
Unidentified      41%
¹ not bacteriophage
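Because the table reports percentages of 263 ORFs, approximate ORF counts per category can be recovered. This is a rough sketch: the percentages on the slide are rounded, so the counts below are estimates rather than the original data, and the variable names are illustrative.

```python
TOTAL_ORFS = 263

# Percentages copied from the table above.
percent_of_library = {
    "Eucaryotic": 34,
    "Bacterial": 21,
    "Archaeal": 1.1,
    "Viral": 0.8,            # not bacteriophage
    "Bacteriophage": 2,
    "Unidentified": 41,
}

# Estimated number of ORFs per category, rounded to whole ORFs.
approx_counts = {
    category: round(TOTAL_ORFS * pct / 100)
    for category, pct in percent_of_library.items()
}

for category, count in approx_counts.items():
    print(f"{category:<14} ~{count} ORFs")
print("percentages sum to", round(sum(percent_of_library.values()), 1))
```

The percentages sum to 99.9, so rounding accounts for the small gap from 100%.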
Genome Annotation
Genome Assembly and Annotation
RefSeq db
Caveats
• Finding genes involves computational methods as well as experimental validation
• Computational methods are often inadequate and often generate erroneous (false-positive) ‘gene’ sequences that:
  – Are missing exons
  – Have incorrect exons
  – Over-predict genes
  – Are missing the 5’ and 3’ UTRs
Things we are looking to annotate?
• CDS
• mRNA
• Alternative RNA
• Promoter and Poly-A Signal
• Pseudogenes
• ncRNA
• Repeat elements
• G+C content
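Of the items above, G+C content is the simplest to compute directly from sequence. A minimal sketch follows. The helper names are illustrative, and ambiguous bases such as N are simply ignored, which is one convention among several.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

def gc_windows(seq, window, step):
    """G+C fraction in successive windows, useful for spotting local shifts."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]

print(gc_content("ATGC"))                # 2 of 4 bases are G or C
print(gc_windows("GGGGAAAA", 4, 4))      # a GC-rich window, then an AT-rich one
```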
Pseudogenes
• As many as 20-30% of all genomic sequence predictions could be pseudogenes
• Non-functional copy of a gene
  – Processed pseudogene
    • Retro-transposon derived
    • No 5’ promoters
    • No introns
    • Often includes poly-A tail
  – Non-processed pseudogene
    • Gene duplication derived
  – Both include events that make the gene non-functional
    • Frameshift
    • Stop codons
• We assume pseudogenes have no function, but we
really don’t know!
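The two disabling events named above, frameshifts and stop codons, suggest a crude computational screen for candidate pseudogenes. The sketch below assumes a coding sequence given in frame from its start codon and uses the standard genetic code's stop codons. A length that is not a multiple of three is treated as a hint of a frameshift, not proof. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def premature_stops(cds):
    """0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

def looks_disrupted(cds):
    """True if the length hints at a frameshift or a premature stop is present."""
    return len(cds) % 3 != 0 or bool(premature_stops(cds))

print(looks_disrupted("ATGAAATAA"))      # intact toy ORF
print(premature_stops("ATGTAGAAATAA"))   # stop codon in the second position
print(looks_disrupted("ATGAATAA"))       # one base deleted from the toy ORF
```

A hit from such a screen is only a candidate; as the slide notes, whether a pseudogene truly lacks function is not known.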
Noncoding RNA (ncRNA)
• tRNA – transfer RNA: involved in translation
• rRNA – ribosomal RNA: structural component
of ribosome, where translation takes place
• snRNA – small nuclear RNA:
functional/catalytic in RNA maturation
• Antisense RNA - gene regulation
• siRNA - gene silencing
Noncoding RNA (ncRNA)
• ncRNA represent 80-98% of all transcripts in cell
• ncRNA have not been taken into account in gene counts
  – cDNA
  – ORF computational prediction
  – Comparative genomics looking at ORF
• ncRNA can be:
– Structural
– Catalytic
– Regulatory
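"ORF computational prediction" in its simplest form is a scan for a start codon followed in frame by a stop codon, which is why genes without an ORF, such as ncRNA genes, go uncounted. The sketch below is a deliberately naive illustration: it scans the forward strand only, uses the first ATG in each frame, and its `min_codons` cutoff and names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based with an exclusive end; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs

# Toy sequence (made up): one four-codon ORF in the third reading frame.
print(find_orfs("CC" + "ATGAAACCCTAA" + "GG"))
```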
GenBank Features
-10_signal
-35_signal
3'clip
3'UTR
5'clip
5'UTR
attenuator
CAAT_signal
CDS
conflict
C_region
D-loop
D_segment
enhancer
exon
GC_signal
gene
iDNA
intron
J_segment
LTR
mat_peptide
misc_binding
misc_difference
misc_feature
misc_recomb
misc_RNA
misc_signal
misc_structure
modified_base
mRNA
N_region
old_sequence
polyA_signal
polyA_site
precursor_RNA
primer_bind
prim_transcript
promoter
protein_bind
RBS
repeat_region
repeat_unit
rep_origin
rRNA
satellite
scRNA
sig_peptide
snoRNA
snRNA
S_region
stem_loop
STS
TATA_signal
terminator
transit_peptide
tRNA
unsure
variation
V_region
V_segment
LOCUS       NG_005487               1850 bp    DNA     linear   ROD 14-FEB-2006
DEFINITION  Mus musculus ubiquitin-conjugating enzyme E2 variant 2 pseudogene
            (LOC625221) on chromosome 6.
ACCESSION   NG_005487
VERSION     NG_005487.1  GI:87239965
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1850)
  AUTHORS   Wilson,R.
  TITLE     Mus musculus BAC clone RP24-201D17 from 6
  JOURNAL   Unpublished (2003)
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AC121925.2.
FEATURES             Location/Qualifiers
     source          1..1850
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="6"
                     /note="AC121925.2 32277..34126"
     gene            101..1750
                     /gene="LOC625221"
                     /pseudo
                     /db_xref="GeneID:625221"
     repeat_region   1792..1827
                     /rpt_family="ID"
ORIGIN
        1 tcttctgcct caattcctca agtgctagta tcatatgccc atgccattat ttttaactcc
       61 cctttttcat gctaagaatt gaacacacgg ccctgcgtgc ggtggtgcgt ctggtagcag
      121 gagaagatgg cggtctccac aggagttaaa gttcctcgta attttcgctt gttggaagaa
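The feature table in this record uses 1-based, inclusive `start..end` locations, so feature lengths follow directly. The sketch below handles only that plain form (no `join`, `complement`, or partial-end markers). The locations are copied from the record, while the function names are illustrative.

```python
import re

# Feature keys and locations copied from the record above.
FEATURES = [
    ("source", "1..1850"),
    ("gene", "101..1750"),
    ("repeat_region", "1792..1827"),
]

def parse_simple_location(location):
    """Parse a plain 'start..end' location into 1-based inclusive integers."""
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2))

def feature_length(location):
    """Number of bases covered by an inclusive start..end location."""
    start, end = parse_simple_location(location)
    return end - start + 1

for key, location in FEATURES:
    print(f"{key:<14} {location:<12} {feature_length(location)} bp")

# The /note on the source feature gives the span in the parent BAC record.
print(feature_length("32277..34126"))
```

The span given in the source note has the same length as the record itself, consistent with the sequence having been derived from that region of AC121925.2.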
The ideal annotation of “MyGene”
All clones
All SNPs
Promoter(s)
MyGene
All mRNAs
All proteins
All structures
• All protein modifications
• Ontologies
• Interactions (complexes, pathways, networks)
• Expression (where and when, and how much)
• Evolutionary relationships