For Bioinformatics, Start with:
Genomics:
READING genome sequences
carry out dideoxy sequencing
ASSEMBLY of the sequence
connect seqs. to make whole chromosomes
ANNOTATION of the sequence
find the genes!
2 ways to annotate eukaryotic genomes:
-ab initio gene finders:
Work on basic biological principles:
Open reading frames
Consensus splice sites
Met start codons
…..
-Genes based on previous knowledge….EVIDENCE of message
cDNA sequence of the gene’s message
cDNA of a closely related gene’s message sequence
Protein sequence of the known gene
Same gene’s
Same gene’s from another species
Related gene’s protein…….
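Two of the ab initio principles listed above (Met start codons and open reading frames) can be sketched in a few lines. This is only a toy illustration, not how a real eukaryotic gene finder works: it scans one strand in three frames and ignores introns and splice sites. The function name `find_orfs`, the `min_codons` parameter and the toy sequence are hypothetical.

```python
# Minimal ORF scan: one strand, three reading frames, no introns.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for ATG...stop ORFs; end is exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons including the stop codon itself.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

toy = "CCATGAAATTTGGGTAACC"  # made-up sequence for illustration
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

Evidence-based annotation would instead align known cDNA or protein sequences to the genome, which needs an alignment tool rather than a scan like this.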
[Figure labels: start and stop site predictions; unique identifiers; information for ab initio gene finding; splice site predictions; homology-based exon predictions; computational exon predictions; tracking information; consensus gene structure (both strands); automatically generated annotation]
A zebrafish hit shows a gene-model protein encoded by a 6-exon gene.
This gene structure (intron/exon) is seen in other species, as is the protein size.
The proteins, if corresponding to MSP in S. gal., must be heavily glycosylated (likely).
At least some have a signal peptide.
The zebrafish hit can be viewed at higher resolution, and…
The zebrafish hit can be viewed down to nucleotide resolution
Sarin et al
Is there linkage between a mutant gene/phenotype and a SNP?
USE standard genetic mapping technique,
with SNP alternative sequences as “phenotype”
B = bad hair, Dominant
SNP1  ..ACGTC..     SNP1’  ..ACGCC..
SNP2  ..GCTAA..     SNP2’  ..GCAAA..
SNP3  ..GTAAC..     SNP3’  ..GTCAC..
START with inbred lines: SNPs are homozygosed
SNP1’/SNP1’ (..ACGCC..)    X    SNP1/SNP1 (..ACGTC..)
SNP2’/SNP2’ (..GCAAA..)         SNP2/SNP2 (..GCTAA..)
SNP3’/SNP3’ (..GTCAC..)         SNP3/SNP3 (..GTAAC..)
F1: carries B
Testcross of the F1 (B 2’ / b 2):
B/b 1’/1 2’/2 3’/3    X    b/b 1/1 2/2 3/3
Progeny:
        SNP1          SNP2          SNP3
B/b     1’/1  25%     2’/2  47%     3’/3  25%
B/b     1/1   25%     2/2    3%     3/3   25%
b/b     1’/1  25%     2’/2   3%     3’/3  25%
b/b     1/1   25%     2/2   47%     3/3   25%
SO…B is 6 cM from SNP2, and is unlinked to SNP 1 or 3
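The arithmetic behind that conclusion can be sketched as follows. The percentages are the progeny classes from the slide; the parental classes are B with the primed allele and b with the unprimed allele, since the F1 is B 2’ / b 2. The function name and the simple "50% means unlinked" rule are assumptions for illustration; real data would call for a statistical test, and recombination percent only approximates cM at short distances.

```python
def recombination_percent(parental, recombinant):
    """Percent of testcross progeny in the recombinant classes."""
    total = sum(parental) + sum(recombinant)
    return 100.0 * sum(recombinant) / total

# Progeny percentages from the slide, per SNP:
# parental    = (B with primed allele, b with unprimed allele)
# recombinant = (B with unprimed allele, b with primed allele)
progeny = {
    "SNP1": {"parental": (25, 25), "recombinant": (25, 25)},
    "SNP2": {"parental": (47, 47), "recombinant": (3, 3)},
    "SNP3": {"parental": (25, 25), "recombinant": (25, 25)},
}

for snp, classes in progeny.items():
    rf = recombination_percent(**classes)
    # Simplified rule: 50% recombinants is what independent assortment gives.
    verdict = "unlinked" if rf >= 50 else f"linked, about {rf:.0f} cM from B"
    print(f"{snp}: {rf:.0f}% recombinants -> {verdict}")
```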
We have the ENTIRE genome sequence of mouse,
so we know where the SNPs are
Now, do this while checking the sequence of THOUSANDS of SNPs
Genomics:
READING genome sequences
carry out dideoxy sequencing
ASSEMBLY of the sequence
connect seqs. to make whole chromosomes
ANNOTATION of the sequence
find the genes!
But Bioinformatics is more…
TRANSCRIPTOMICS:
cDNAs &
ESTs: Expressed Sequence Tags
[Figure labels: RNA target sample; cDNA library; primer; SEQUENCE; end reads (mates)]
Each cDNA provides sequence from the two ends – two ESTs
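A rough sketch of the "two ends, two ESTs" idea: given one cDNA insert, take a read from each end. It assumes the 3' read comes off the opposite strand, so it is reported as a reverse complement; the function names, the read length and the toy cDNA are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def end_reads(cdna, read_length):
    """Return (5' EST, 3' EST) for one cDNA clone, each written 5'->3' as read."""
    five_prime = cdna[:read_length]
    # Assumption: the 3' end is sequenced from the opposite strand.
    three_prime = reverse_complement(cdna[-read_length:])
    return five_prime, three_prime

cdna = "ATGGCCAAGTTGACCTGA"  # made-up cDNA for illustration
est5, est3 = end_reads(cdna, 6)
print("5' EST:", est5)
print("3' EST:", est3)
```

The two reads are "mates": they are known to come from the same clone even though the sequence between them is not read.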
Protein sequence: from peptide sequencing, or
from translation of sequenced nucleic acids
!!AA_SEQUENCE 1.0
ab025413 peptide
tenm4.pep  Length: 2771  May 12, 1999 09:34  Type: P  Check: 2254  ..
   1  MDVKERKPYR SLTRRRDAER RYTSSSADSE EGKGPQKSYS SSETLKAYDQ
  51  DARLAYGSRV KDMVPQEAEE FCRTGTNFTL RELGLGEMTP PHGTLYRTDI
 101  GLPHCGYSMG ASSDADLEAD TVLSPEHPVR LWGRSTRSGR SSCLSSRANS
 151  NLTLTDTEHE NTETDHPSSL QNHPRLRTPP PPLPHAHTPN QHHAASINSL
 201  NRGNFTPRSN PSPAPTDHSL SGEPPAGSAQ EPTHAQDNWL LNSNIPLETR
 251  NLGKQPFLGT LQDNLIEMDI LSASRHDGAY SDGHFLFKPG GTSPLFCTTS
 301  PGYPLTSSTV YSPPPRPLPR STFSRPAFNL KKPSKYCNWK CAALSAILIS
 351  ATLVILLAYF VAMHLFGLNW HLQPMEGQMQ MYEITEDTAS SWPVPTDVSL
 401  YPSGGTGLET PDRKGKGAAE GKPSSLFPED SFIDSGEIDV GRRASQKIPP
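"Translation of sequenced nucleic acids" can be sketched with the standard genetic code. The coding sequence below is made up so that it encodes the first five residues shown above (MDVKE); it is not the real tenm4 cDNA, and the function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a coding sequence from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)

toy_cds = "ATGGATGTTAAAGAATAA"  # hypothetical sequence, for illustration only
print(translate(toy_cds))
```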
Structural proteomics:
Coordinates, rather than 1D sequence, are saved
TRANSCRIPTOMICS (Arrays)
Where? When? Who?
are the RNAs
RNA for ALL C. elegans genes
MICROARRAY ANALYSIS
Figure 4.15 Microarray Technique
Array analysis: see animation from Griffiths
Figure 4.16(1) Microarray Analysis of Those Genes
Whose Expression in the Early Xenopus Embryo Is
Caused by the Activin-Like Protein Nodal-Related 1
(Xnr1)
Figure 4.16(2) Microarray Analysis of Those Genes
Whose Expression in the Early Xenopus Embryo Is
Caused by the Activin-Like Protein Nodal-Related 1
(Xnr1)
RNAi for every C. elegans
gene too!
-results on the web
Projects to systematically Knock-out (or pseudo-knockout)
every gene, in order to establish phenotype of each gene
-> function of each gene
Figure 4.23(1) Use of Antisense RNA to Examine the Roles of Genes
in Development (here fly)
Figure 4.23(2) Use of Antisense RNA to Examine the
Roles of Genes in Development (here fly)
RNAi for ALL C. elegans genes
Figure 4.24 Injection of dsRNA for E-Cadherin into the Mouse Zygote
Blocks E-Cadherin Expression
MODENCODE
MODENCODE was from the Drosophila paper:
Nature. 2011 Mar 24;471(7339):527-31. doi: 10.1038/nature09990.
A cis-regulatory map of the Drosophila genome.
Nègre N et al.
KNOCK-OUTS OF ALL ESSENTIAL GENES – RANDOM
MUTAGENESIS ATTEMPT – using transposon mobilization
Followed by INVERSE PCR to recover sequence adjacent to insertion.
Then compare to the complete Drosophila genome sequence to know which ORF was “Hit”
About 10% of All Assumed genes “Hit” (~10/100 per interval) on
Drosophila X chromosome. 1 series of random insertion experiments.
ALL insertion sites known, thanks to INVERSE PCR
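The "compare to the genome to know which ORF was hit" step can be sketched as a lookup: find where the flanking sequence recovered by inverse PCR sits in the genome, then check which annotated ORF interval contains that position. Everything below is a toy under stated assumptions: the genome string, the ORF names and coordinates, and both function names are hypothetical, and exact string matching stands in for a real alignment search.

```python
def locate_flank(genome, flank):
    """Return the 0-based position of the flank in the genome, or None."""
    pos = genome.find(flank)
    return None if pos == -1 else pos

def orf_hit(position, orfs):
    """Return the name of the ORF whose [start, end) interval contains position."""
    for name, (start, end) in orfs.items():
        if start <= position < end:
            return name
    return None

# Toy genome: spacer, orfA, spacer, orfB, spacer (all made up).
genome = "TTGACC" + "ATGAAACCCGGGTAA" + "CGTACG" + "ATGTTTCATTGA" + "GG"
orfs = {"orfA": (6, 21), "orfB": (27, 39)}

for flank in ("CCCGGG", "CGTACG"):
    pos = locate_flank(genome, flank)
    print(flank, "at", pos, "->", orf_hit(pos, orfs) or "no ORF hit")
```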
2-hybrid reaction between one protein and all 6000+
potential interactors in Yeast Genome
Figure 1 The two-hybrid assay carried out by screening a protein
array. a, The array of 6,000 haploid yeast transformants plated on
medium lacking leucine, which allows growth of all transformants.
Each transformant expresses one of the yeast ORFs expressed as a
fusion to the Gal4 activation domain. b, Two-hybrid positives from
a screen of the array with a Gal4 DNA-binding domain fusion of
the Pcf11 protein, a component of the pre-mRNA cleavage and
polyadenylation factor IA, which also consists of four other
polypeptides (ref. 36). Diploid colonies are shown after two weeks of
growth on medium lacking tryptophan, leucine and histidine and
supplemented with 3 mM 3-amino-1,2,4-triazole, thus allowing
growth only of cells that express the HIS3 two-hybrid reporter
gene. Three other components of factor IA, Rna14, Rna15 and
Clp1, were identified as Pcf11 interactors. Positives that do not
appear in Table 2 were either not reproducible or are false
positives that occurred in many screens.
Osprey: integrate all 2-hybrid interactions between all 6000+ proteins
in Yeast Genome (Proteome)
Figure 2 Visualization of combined, large-scale interaction data sets in yeast. A total of 14,000 physical interactions
obtained from the GRID database were represented with the Osprey network visualization system (see
http://biodata.mshri.on.ca/grid). Each edge in the graph represents an interaction between nodes, which are
coloured according to Gene Ontology (GO) functional annotation. Highly connected complexes within the data set,
shown at the perimeter of the central mass, are built from nodes that share at least three interactions within other
complex members. The complete graph contains 4,543 nodes of 6,000 proteins encoded by the yeast genome, 12,843
interactions and an average connectivity of 2.82 per node. The 20 highly connected complexes contain 340 genes,
1,835 connections and an average connectivity of 5.39.
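The connectivity figures in this caption appear to be interactions divided by nodes (12,843 / 4,543 and 1,835 / 340, each truncated to two decimals). A minimal sketch of that calculation on an edge list, using the Pcf11 interactors named in the two-hybrid caption as a toy network; the function name is hypothetical, and note that this ratio is half of the mean node degree as usually defined in graph theory.

```python
def edges_per_node(edges):
    """Interactions divided by the number of distinct proteins involved."""
    nodes = {protein for edge in edges for protein in edge}
    return len(edges) / len(nodes)

# Toy network: the three Pcf11 interactors named in the two-hybrid caption.
toy_edges = [("Pcf11", "Rna14"), ("Pcf11", "Rna15"), ("Pcf11", "Clp1")]
print(edges_per_node(toy_edges))

# The same ratio computed from the totals quoted in the caption.
print(12843 / 4543)  # complete graph
print(1835 / 340)    # 20 highly connected complexes
```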