Biological Sequence Analysis 140.638.01

The materials used in this class are made possible by:
• Zhiping Weng, http://zlab.bu.edu
• Wenyi Wang
• Zhijin Wu
• Garland Publishing, Alberts’s The Cell
• And the wealth of internet resources

Who are we?
• Sining Chen
• Carlo Colantuoni
• Giovanni Parmigiani

Who are you?
• Field of research
• Stats & computing background
• Register or audit
• Why are you taking this course
• Specific topics you are interested in

Administrative Details
http://astor.som.jhmi.edu/~sining/BSA/syllabus.htm

The MHS program in Bioinfo
• Jointly offered by Dept. Biostatistics and Molecular Microbiology and Immunology
• An intensive one-year program that emphasizes biology, statistical methods, and computing

Goal of the class
• Learn to look at biological sequences from a probabilistic point of view
• Understand algorithms behind routine operations, e.g. BLAST
• Be able to build statistical models to solve problems involving sequences

Biological Sequence Analysis: Basic Biological Concepts
Carlo Colantuoni
Clinical Brain Disorders Branch, NIMH, NIH
Dept. Biostatistics, JHSPH
[email protected] [email protected]

Molecular Cell Biology: Central Dogma
DNA (replication) -> transcription -> RNA -> translation -> protein
Sequence analysis is important at all 3 levels

The Human Genome
Genomic Content:
• 3.3 billion bases
• ~30K genes
• 23 chromosomes (22+X/Y)
• Millions of variants
• 2 copies in every cell (46 chr)
• One copy from each parent
• Each parent passes on a “mixed copy”
[Figure: DAD, MOM, YOU]

Nucleotides are the chemical building blocks of Nucleic Acids: DNA and RNA

From Genomic DNA to mRNA Transcripts
[Figure: EXONS, INTRONS]
Protein-coding genes are not easy to find - gene density is low, and exons are interrupted by introns.
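To make the exon/intron point concrete, here is a minimal sketch of how a spliced transcript could be assembled from genomic DNA once the exon coordinates are already known. The toy sequence, the coordinates, and the function name `splice_exons` are all invented for illustration; in practice the coordinates are unknown, and inferring them is the hard part of gene finding.

```python
def splice_exons(genomic_dna, exons):
    """Join exon segments in genomic order, dropping the introns between them.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive.
    """
    return "".join(genomic_dna[start:end] for start, end in sorted(exons))


# Made-up toy gene: uppercase = exon, lowercase = intron.
gene = "ATGGCC" + "gtaagtttcag" + "AAATTT" + "gtacgcag" + "GGGTAA"

# Hypothetical exon coordinates matching the uppercase segments above.
exons = [(0, 6), (17, 23), (31, 37)]

mature = splice_exons(gene, exons)
print(mature)  # ATGGCCAAATTTGGGTAA
```

Only 18 of the 37 bases in this toy gene end up in the spliced sequence, a small-scale picture of why coding sequence is sparse in genomic DNA.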
[Figure labels: ~30K, >30K; Promoters, Alternative splicing, Poly-Adenylation]

Molecular Cell Biology: Components of the Central Dogma
[Figure: Genomic DNA (3.3 Gb) -> Transcription -> mRNA (5’ UTR, START, protein coding, STOP, 3’ UTR, AAAAA) -> Translation -> Protein]

Translation - Protein Synthesis: Every 3 nucleotides (codon) are translated into one amino acid
• DNA: A T G C (Replication)
• Transcription, 1:1
• RNA: A U G C
• Translation, 3:1
• Protein: 20 amino acids

Translation - Protein Synthesis
• RNA 5’ -> 3’ : Protein N-term -> C-term
• Nucleotide sequence determines the amino acid sequence

The Human Genome
Genomic Content:
• 3.3 billion bases
• ~30K genes
• 23 chromosomes (22+X/Y)
• 2 copies in every cell
• One copy from each parent
• Each parent passes on a “mixed copy”
• Deletions, Insertions, Mutations
• Evolutionary Scale
[Figure: DAD, MOM, YOU]

Biological Sequence Analysis: Primary Concepts
• Identity
• Homologue & Paralogue
• Similarity
• Ortholog
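The 1:1 and 3:1 mappings from the translation slides above can be sketched in a few lines of Python. This is a simplified illustration, not part of the course material. It assumes the input is the coding (sense) strand, that translation starts at the first AUG and stops at the first in-frame stop codon, and it uses the standard genetic code. The function names `transcribe` and `translate` are made up for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in the conventional U, C, A, G table order.
# "*" marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """1:1 mapping: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """3:1 mapping: read codons 5' -> 3' from the first AUG to the first stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)  # written N-terminus to C-terminus


mrna = transcribe("ATGGCCAAATTTGGGTAA")
protein = translate(mrna)
print(mrna)     # AUGGCCAAAUUUGGGUAA
print(protein)  # MAKFG
```

Building the table from a 64-character string keeps the example short. The 64 codons map to only 20 amino acids plus stop, so several codons encode the same amino acid.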