Bioinformatics Overview
Carlow IT, September 2006

What, me?
• Andrew Lloyd
• [email protected]
• 087-225-9850
• 053-9255717
• Director INCBI 1993-2000
• Population genetics, evolution
• Whole genome analysis
• Immunology, chickens, FIRM
• http://ercbinfo1.ucd.ie/itc/

Definition/scope
• Storage, retrieval and analysis of biological (sequence) information.
• Insert better definition here
• Case can be made for microarray analysis
• NOT
  – Ecoinformatics (ecology)
  – Image analysis
  – Bar-coding hospital sheets

Subtext
Critical thinking: crap detecting
• Is it true?
• Supporting evidence?
• Assumptions made?
• Alternative explanations?
• Is statement testable? Has it been tested?
• More information necessary?
• Consequences? Predictions?

Philosophy
• “Nothing worth learning can be taught” (Oscar Wilde)
• Read the handout before class
• Finish the exercises out of class
• Read science, talk science, think science
• You de punter!
• Stop me !!!!

Exams
• The most boring, stressful and hateful part of the course (for me)
• 30% is continuous assessment
  – Easy marks, forward planning
  – Practical exams
  – Gene day presentations
• Exams: any extra info, original thorts = BONUS points

Getting bioinformation
• Type it in: A,T,C,C,G,T,C,A (1991)
• Access databases
  – Literature (Medline/PubMed)
  – Medical (OMIM)
  – DNA sequence (EMBL/GenBank)
  – Protein sequence (UniProt, SwissProt, PIR)
  – 3-D structure (PDB)

Annotation
• In any DB, half is data and half context.
  – Parsing sequence (ORF, RBS, intron, α-helix)
  – Recognising similar sequences (evolution!)
  – Complementary info: DB cross-referencing
• (DNA -> Protein -> 3D structure -> motifs)

Secondary databases
• Protein motifs, domains, families
• RNA structures (16S ribosomal RNA…)
• Taxonomy/classification
• Metabolic pathways
• Enzymes
• SNPs, mutations and variants
• Disease DBs
• Immuno, epitope DBs

Complete genomes
• Ensembl (complex, basically vertebrate)
  – Uniform look-and-feel; cross-refs
  – See also UCSC GoldenPath browser
• Plants
• Bacterial genomes
  – Mitochondrial, chloroplast
  – Eubacteria, archaea
  – Each idiosyncratic & in its own place
  – Some meta-DBs

Annotated/known genes
• What does my gene do?
• Blast (fasta) against the DB
• SRS/Entrez to access databases
  – Neighboring (similar things in same DB)
• DB cross-references
  – Full picture of attributes
  – What biochemical pathway?

The territory
[Diagram of linked databases: OMIM; Maps & Genomes; Full-text journals; GenBank/EMBL (DNA sequence); PubMed; UniProt (protein sequence); Prosite; Pfam; Taxonomy; PSSM; PDB (3-D structure)]

Databases
• BIG
• EMBL/GenBank: 400GB, 60m entries, 2500 complete genomes, 200K species
• Encyclopedia Britannica: 180m letters, 1.3 m
• EMBL = 3 km of Britannica volumes
• Doubling every 14-18 months
• Human genome is ?

New Unknown Gene
• Blast homology searching
• Genomic location/neighboring genes
• Where is it expressed?
• How regulated (control sequences)
• Intron/exon structure
• Domain structure
• Restriction sites etc.
• Primer design

DNA/gene structure
• Four bases A T C G U
  – 2 pyrimidine, 2 purine
  – LOTS of them: how many?
• Open reading frame
• 5’ signals, 3’ signals
• Introns/exons
• Neighbours (operons)

Two sequences
• Alignment
  – Local
  – Global
• Dotplot
• Threading

One seq vs many
• Homology search vs database
• Special case of 2-seq alignment
• Blast vs fasta
• Limit by species/taxon
• Substitution matrices
• Low complexity masking

Multiple sequence alignment
• MSA
• Progressive alignment
• ClustalW or (better) T-Coffee

Phylogenetic trees
• Computationally intensive
• Distance matrix methods
  – Neighbor-joining (NJ)
  – UPGMA
• Minimum evolution
• Maximum parsimony
• Maximum likelihood
  – Bayesian methods

Genefinding
• Special case of DNA analysis
• How to annotate a genome
• Bacterial
  – Find open reading frames (ORFs)
  – With start/stop codons
  – With promoter, RBS, CAAT, TATA
• Eukaryotic
  – As above PLUS
  – Introns/exons
  – Alternative splicing

Protein substructure
• DNA makes protein and protein (enzymes) make everything else.
• 20 amino acids
• Amino acid properties
• Motifs
• Domains
• Biological units

Amino acid properties again

… and again and again

Protein 3-D structure
• Relationship between sequence & structure
• Secondary structure
  – Alpha helix
  – Beta sheet
  – Coil
  – Turn
• Threading sequence to homologous structure

Gene Expression
• EST
• SAGE
• MicroArray
• Clustering of same expressed genes

Genomics
• Complete DNA seq for a species
• Gene order
• Gene clusters/operons
  – Missing operons
• Gene duplication
• Whole genome duplication (WGD)

SNPs
• Key issue in genetics is that two organisms are both the same and different:
  – Humans vs chimps vs mouse
  – Parent vs offspring vs co-national vs human
• Single nucleotide polymorphisms
• Variation between individuals
• Pharmacogenetics
  – Personal tailored medicine

Summary/take home
• Course designed to give you access to databases, software tools
• …and ways of thinking about data
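
Worked example: finding ORFs
The Genefinding slide gives the bacterial recipe as "find open reading frames (ORFs) with start/stop codons". The sketch below is one minimal way to try that by hand, not the method used by any particular genefinder. Assumptions that are not in the slides: the function name find_orfs and its min_codons parameter are made up for illustration, ATG is treated as the only start codon, the stops are TAA, TAG and TGA (the standard genetic code), and only the three forward-strand frames are scanned. Promoter and RBS signals are ignored.

```python
# Minimal ORF scan on the forward strand (illustrative sketch only).
# Assumes the standard genetic code: ATG start; TAA, TAG, TGA stops.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for every ATG...stop stretch.

    start and end are 0-based, end is exclusive and includes the stop
    codon. min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    # ATG AAA TTT TAG sits in frame 2 of this toy sequence.
    print(find_orfs("CCATGAAATTTTAGGG"))  # prints [(2, 14, 2)]
```

To cover the other three frames, the same function could be run on the reverse complement of the sequence. Real bacterial genefinders also accept alternative start codons such as GTG and score the upstream signals, which this sketch leaves out.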