For Bioinformatics, Start with:

Genomics:
READING genome sequences: carry out dideoxy sequencing
ASSEMBLY of the sequence: connect sequences to make whole chromosomes
ANNOTATION of the sequence: find the genes!

Reading: Shotgun DNA Sequencing of whole genome (WGS)
[Figure: DNA target sample -> SHEAR -> LIGATE & CLONE (vector) -> SEQUENCE (primer) -> reads]

Reading to Assembly:

Assembly: The challenge of eukaryotic genomes
E. coli genome: 4 million bp
The human genome: 3 billion bp
50% of genome is repeat sequences!
Assembly of sequence of each chromosome from end to end

END, Jan 14
begin

Annotation:

Genomics:
READING genome sequences: robotically do dideoxy-dye data collection
ASSEMBLY of the sequence: whole genome shotgun OR ordered clones
ANNOTATION of the sequence: find the genes!

Annotation: 10/1/5
ANNOTATION of the sequence: find the genes!
1. ab initio
2. by evidence

Annotation: For bacterial genomes, ab initio is adequate
ab initio: "from the beginning" (Hebrew: יש מאין, "something from nothing"), from first principles…
ORFs are MOST of the prokaryotic genome

Annotation: ab initio – finding ORFs
- 85-88% of the nucleotides are associated with coding sequence in the bacterial genomes that have been completely sequenced.
- Example: in Escherichia coli there are 4288 genes that have an average of 950 bp of coding sequence and are separated by an average of just 118 bp.
So first, to find genes in prokaryotic DNA, search for ORFs!!
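The ORF search described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular gene finder works: it assumes the standard start codon ATG and stop codons TAA, TAG and TGA, scans all six reading frames, and keeps only ORFs at least `min_codons` long. The function names and the `min_codons` cutoff are hypothetical choices made for this example.

```python
# Minimal six-frame ORF scan (illustrative sketch; names are hypothetical).
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard stop codons (assumed)
START_CODON = "ATG"                   # standard start codon (assumed)


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=60):
    """Find ATG...stop ORFs in all six reading frames.

    Returns a list of (strand, start, end, n_codons) tuples. Coordinates
    are 0-based, end-exclusive, and relative to the scanned strand;
    n_codons counts codons from the ATG up to but excluding the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
    # A tiny cutoff is used here only so the toy sequence yields a hit.
    print(find_orfs(toy, min_codons=2))
```

A length cutoff matters because short ATG-to-stop stretches occur by chance in any sequence; only long open frames are likely to be genes.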
Annotation: ab initio – beyond ORFs
- Prokaryotes have short, simple promoters that are easy to recognize.
- Transcriptional terminators often consist of short inverted repeats followed by a run of Ts.
- Therefore, programs that find prokaryotic genes search for:
    ORFs 60 or more codons long, and codon usage
    Promoters at the 5' end
    Terminators at the 3' end
    Homology to known genes from other prokaryotes
    Shine-Dalgarno sequences

Annotation: ab initio – automated
Prokaryotic gene finder examples:
Glimmer: Interpolated Markov Model method
GrailII: Neural Network method
(See BioInfo text – Fig 8.8)

Annotation: results

Annotation: Multicellular eukaryotes
Done too 10/1/5

Annotation: 2 ways to annotate eukaryotic genomes:
- ab initio gene finders: work on basic biological principles:
    Open reading frames
    Codon usage
    Consensus splice sites
    Met start codons
    …..
- Genes based on previous knowledge… EVIDENCE:
    cDNA sequence of the gene's message
    cDNA sequence of a closely related gene's message
    Protein sequence of the known gene
    Same gene's from another species
    Related gene's protein…….

[Figure: Automatically generated annotation. Labels: unique identifiers; tracking information; start and stop site predictions; splice site predictions; homology based exon predictions; computational exon predictions; consensus gene structure (both strands)]

A zebrafish hit shows a gene model: a protein encoded by a 6 exon gene. This gene structure (intron/exon) is seen in other species, as is the protein size.
The proteins, if corresponding to MSP in S. gal., must be heavily glycosylated (likely). At least some have a signal peptide.

The zebrafish hit can be viewed at higher resolution, and…
The zebrafish hit can be viewed down to nucleotide resolution

Genomics:
READING genome sequences: carry out dideoxy sequencing (each read 700 bp, MAX)
ASSEMBLY of the sequence: connect sequences to make whole chromosomes
ANNOTATION of the sequence: find the genes!

Annotation: cDNAs & ESTs: Expressed Sequence Tags
[Figure: RNA target sample -> cDNA Library -> SEQUENCE (primer) -> End Reads (Mates)]
Each cDNA provides sequence from the two ends – two ESTs

Who Gets Sequenced?
Models
Pathogens
Agriculturals

Array analysis: see animation from Griffiths

Protein Structure Database
See Swiss-pdb viewer

RNA for ALL C. elegans genes
RNAi for every C. elegans gene too!
- results on the web

Projects to systematically knock out (or pseudo-knockout) every gene, in order to establish the phenotype of each gene -> function of each gene