Introduction to bioinformatics
Bioinformatics course 10.09.15

Proposed room and time
1. Thursday 12-14 room 107 and Thursday 14-16 room 224 in Jakobi 2
2. Monday 8-10 and 10-12 room 405 in Liivi 2
3. Wednesday 14-16 and 16-18 room 405 in Liivi 2

Life on earth
LIFE: the condition that distinguishes animals and plants from inorganic matter, including the capacity for growth, reproduction, functional activity, and continual change preceding death.
https://letstalkaboutscience.wordpress.com/2012/07/30/understanding-deep-time/

Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, evolution, distribution, and taxonomy.
AEROBIOLOGY, AGRICULTURE, ANATOMY, ASTROBIOLOGY, BIOCHEMISTRY, BIOENGINEERING, BIOINFORMATICS, BIOMATHEMATICS OR MATHEMATICAL BIOLOGY, BIOMECHANICS, BIOMEDICAL RESEARCH, BIOPHYSICS, BIOTECHNOLOGY, BUILDING BIOLOGY, BOTANY, CELL BIOLOGY, CONSERVATION BIOLOGY, CRYOBIOLOGY, DEVELOPMENTAL BIOLOGY, ECOLOGY, EMBRYOLOGY, ENTOMOLOGY, ENVIRONMENTAL BIOLOGY, EPIDEMIOLOGY, ETHOLOGY, EVOLUTIONARY BIOLOGY, GENETICS, HERPETOLOGY, HISTOLOGY, ICHTHYOLOGY, INTEGRATIVE BIOLOGY, LIMNOLOGY, MAMMALOGY, MARINE BIOLOGY, MICROBIOLOGY, MOLECULAR BIOLOGY, MYCOLOGY, NEUROBIOLOGY, OCEANOGRAPHY, ONCOLOGY, ORNITHOLOGY, POPULATION BIOLOGY, POPULATION ECOLOGY, POPULATION GENETICS, PALEONTOLOGY, PATHOBIOLOGY OR PATHOLOGY, PARASITOLOGY, PHARMACOLOGY, PHYSIOLOGY, PHYTOPATHOLOGY, PSYCHOBIOLOGY, SOCIOBIOLOGY, STRUCTURAL BIOLOGY, VIROLOGY

Stamp collecting
Domain - Eukaryota
Kingdom - Animalia
Phylum - Chordata
Subphylum - Vertebrata
Class - Mammalia
Order - Primates
Suborder - Anthropoidea
Superfamily - Hominoidea
Family - Hominidae
Genus - Homo
Species - sapiens

Species
• Defined as a group of living organisms consisting of similar individuals capable of exchanging genes or interbreeding
http://www.nature.com/news/2011/110823/full/news.2011.498.html

Evolution
Connection between species
Life evolved from “simple” into more complex systems

Levels of complexity
http://www.nature.com/scitable/topicpage/biological-complexity-and-integrative-levels-of-organization-468#

Biology
• CELL - basic unit of life
• GENE - basic unit of heredity
• EVOLUTION - driving engine
https://en.wikipedia.org/wiki/Biology

Molecular biology

Cell size
http://learn.genetics.utah.edu/content/cells/scale/

Eukaryotic cell
https://bhavanajagat.files.wordpress.com/2012/02/cell-structure-and-functions.jpg

DNA
The Watson and Crick paper entitled “A Structure for Deoxyribose Nucleic Acid” was written on 2 April 1953 and published in “Nature” on 25 April.
http://www.ba-education.com/for/science/dnadiscovery.html

Bioinformatics
[Diagram: bioinformatics at the overlap of biology and computer science]

Definitions of Bioinformatics
• The term bioinformatics was coined in 1978
• Bioinformatics is the application of information technology and computer science to the field of molecular biology
• The science of using / developing computer software and algorithms to record, analyse and merge biologically related data
• Using computer technology to manage large amounts of biological data
• Bioinformatics involves the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually on the molecular level
http://www.google.com/search?q=define%3ABioinformatics

Definitions of Bioinformatics
• The collection, organisation, storage, analysis, and integration of large amounts of biological data using networks of computers and databases
• Bioinformatics involves the integration of computers, software tools, and databases in an effort to address biological questions
• In summary, the use of computer science to solve biological problems
http://www.google.com/search?q=define%3ABioinformatics

Bioinformatic focus
ANALYSIS AND INTERPRETATION OF VARIOUS TYPES OF BIOLOGICAL DATA INCLUDING: NUCLEOTIDE AND AMINO ACID SEQUENCES, PROTEIN DOMAINS, AND PROTEIN STRUCTURES.
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/

Bioinformatic focus
DEVELOPMENT OF NEW ALGORITHMS AND STATISTICS WITH WHICH TO ACCESS BIOLOGICAL INFORMATION, SUCH AS RELATIONSHIPS AMONG MEMBERS OF LARGE DATA SETS.
http://www.nature.com/msb/journal/v3/n1/images/msb4100163-f4b.jpg
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/

Bioinformatic focus
DEVELOPMENT AND IMPLEMENTATION OF TOOLS THAT ENABLE EFFICIENT ACCESS AND MANAGEMENT OF DIFFERENT TYPES OF INFORMATION, SUCH AS VARIOUS DATABASES AND INTEGRATED MAPPING INFORMATION.
http://www.jofwidata.com/images/database-design-development.jpg
http://wolfson.huji.ac.il/expression/detective.jpg
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/
M. Alroy Mascrenghe

Bioinformatic challenges
• Explosion of information
  • Need for faster, automated analysis to process large amounts of data
  • Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels etc.)
  • Need for “smarter” software to identify interesting relationships in very large data sets
• Lack of “bioinformaticians”
  • Software needs to be easier to access, use and understand
  • Biologists need to learn about the software, its limitations, and how to interpret its results

Examples of biological data
Name the numbers

Examples of biological data
Central dogma of molecular biology
http://www2.chemistry.msu.edu/faculty/reusch/VirtTxtJml/nucacids.htm
http://www.uic.edu/classes/bios/bios100/lectures/centraldogma.jpg

Examples of biological data
Genome
• Is the entirety of an organism’s hereditary information
• The genome includes both the genes and non-coding sequences of DNA/RNA
• In July 1995, Haemophilus influenzae became the first free-living organism to have its genome sequenced
• 1,830,140 base pairs of DNA in a single circular chromosome that contains 1740 protein-coding genes, 58 transfer RNA genes and 18 other RNA genes

Genome sizes

Completely sequenced genomes

Human genome

Relative proportions (%) of bases in DNA

DNA vs RNA
http://www2.chemistry.msu.edu/faculty/reusch/VirtTxtJml/Images3/dna_rna1.gif

• Raw DNA sequence
• Coding or non-coding
• Parses into genes
• 4 nucleotide bases ATGC

>ENST00000539570 cdna:known chromosome:GRCh37:15:63889592:63893885:1 gene:ENSG00000259662 gene_biotype:protein_coding transcript_biotype:protein_coding
ATGTGGCCACTGCTCACCATGCACATAACCCAGCTCAACCGGGAGTGCCTGCTGCACCTCTTCTCCTTCCTA
GACAAGGACAGCAGGAAGAGCCTTGCCAGGACCTGCTCCCAGCTCCACGACGTGTTTGAGGACCCCGCA
CTCTGGTCCCTGCTGCACTTCCGTTCCCTCACTGAACTCCAGAAGGACAACTTCCTCCTGGGCCCGGCACTC
CGCAGCCTCTCCATCTGCTGGCACTCCAGCCGCGTGCAGGTGTGCAGCATTGAGGACTGGCTCAAGAGTG
CCTTCCAGAGAAGCATCTGCAGCCGGCACGAGAGCCTGGTCAATGATTTCCTCCTCCGGGTGTGCGACAG
GCTTTCTGCTGTGCGCTCCCCACGGAGGCGGGAGGCGCCTGCACCGTCCTCGGGGACTCCGATCGCCGTT
GGACCGAAATCACCTCGGTGGGGAGGACCTGACCACTCGGAGTTCGCCGACTTGCGCTCGGGGGTGACG
GGGGCCAGGGCTGCCGCGCGCAGGGGTCTGGGGAGCCTCCGGGCGGAGCGACCCAGCGAGACCCCGC
CGGCTCCCGGAGTGTCCTGGGGACCGCCACCTCCAGGAGCCCCGGTGGTGATCTCGGTGAAGCAGGAGG
AGGGGAAGCAGGGGCGCACGGGCAGAAGGAGCCACCGAGCCGCTCCTCCTTGCGGTTTTGCCCGCACG
CGCGTCTGCCCGCCCACCTTTCCTGGGGCGGATGCGTTCCCGCAGTGA

A Gene

DNA
• Protein-coding genes cover only about 1.5% of the human genome
• What does the rest do?
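The cDNA record shown earlier is in FASTA format: a header line starting with ">" followed by lines of sequence. As a rough sketch of how such a record could be read and how the relative proportions of the four bases could be computed, the Python below may help. The function names `parse_fasta` and `base_composition` and the toy record are hypothetical and chosen only for illustration; they are not part of the course material.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without the leading '>' and the sequence
    lines of each record are joined into one upper-case string.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def base_composition(sequence):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return {}
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    # Toy record, made up for this example (not a real transcript).
    toy = ">toy_record example:only\nATGC\nGGCC\n"
    for name, seq in parse_fasta(toy):
        print(name, len(seq), base_composition(seq))
```

Splitting the header on spaces, and then on ":", would be one way to recover fields such as the gene identifier or the chromosome coordinates from an Ensembl-style header like the one above.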
DNA
• Simple sequence analysis
  • database searching
  • pairwise analysis…
• Regulatory regions
• Gene finding
• Whole genome annotations
• Comparative genomics (analysis between species and strains)
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/introbioinfo11.htm

Examples of biological data
Transcription
http://www.youtube.com/watch?v=ztPkv7wc3yU

Alternative splicing

Types of RNA

RNA
• Splice variants
• Tissue-specific expression
• Structure
• Single gene analysis (various cloning techniques…)
• Experimental data involving thousands of genes simultaneously
• DNA chips, microarray and expression array analyses

Examples of biological data
From transcription to translation

Translation

Translation initiation

Translation termination

Amino acids - the protein building blocks (IUPAC nomenclature)
http://biology.stackexchange.com/questions/19314/essential-amino-acid-codons
http://mcmanuslab.ucsf.edu/node/276

Amino acids
Codon Wheel (T == U)
http://sciencewords.tumblr.com/post/78190871261/x-for-all-you-biochemists-this-should-help

Protein sequence
>sp|P48431|SOX2_HUMAN Transcription factor SOX-2 OS=Homo sapiens GN=SOX2 PE=1 SV=1
MYNMMETELKPPGPQQTSGGGGGNSTAAAAGGNQKNSPDRVKRPMNAFMVWSRGQRRKMA
QENPKMHNSEISKRLGAEWKLLSETEKRPFIDEAKRLRALHMKEHPDYKYRPRRKTKTLM
KKDKYTLPGGLLAPGGNSMASGVGVGAGLGAGVNQRMDSYAHMNGWSNGSYSMMQDQLGY
PQHPGLNAHGAAQMQPMHRYDVSALQYNSMTSSQTYMNGSPTYSMSYSQQGTPGMALGSM
GSVVKSEASSSPPVVTSSSHSRAPCQAGDLRDMISMYLPGAEVPEPAAPSRLHMSQHYQS
GPVPGTAINGTLPLSHM

Protein domains

Protein
• Proteome of an organism
• Mass spectrometry
• 2D structure
• 3D structure
• 4D structure (interactions)

Summary
GENE TRANSCRIPTION, TRANSLATION AND PROTEIN SYNTHESIS
http://compbio.pbworks.com/f/central_dogma.jpg
Central Dogma

Bioinformatic questions
• To identify an unknown gene of interest
• Sequence matching
• Is there a match to a known sequence in the database?
• Which protein family does it match?
• How do I identify more family members?
• I have a similar structure; how do I identify its potential ligands?
• How do I find out whether my gene/protein is also present in other species?
• How can I identify genes that are inherited together in a specific region?

Bioinformatic questions
• I have to construct an artificial gene; how do I design the primers, and how do I check that I have the right sequence?
• To determine the structure of a poorly expressed RNA sequence
• To identify the structure and function of a protein sequence
• To cluster protein sequences into families of related sequences and develop models
• To generate phylogenetic trees to identify the evolutionary relationships using similar proteins/DNA
• To identify which other proteins interact with the sequence of interest

Bioinformatic questions
• Find genes that have similar expression in specific conditions
• Find transcription factors that regulate specific genes
• Visualise different gene and protein networks
• Describe the regulation of genes

Practice session
Make a script or two that take in either a DNA or an mRNA sequence and perform the “translation” step, i.e. output the protein sequence (in single-letter code).
For example, use http://www.ncbi.nlm.nih.gov/nuccore/NC_007362.1?from=22&to=1728&report=fasta
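One possible starting point for the practice task is sketched below in Python. It assumes the standard genetic code, reads the sequence in the first reading frame from its first base, treats U and T as equivalent (as on the codon wheel), and writes "X" for any codon it cannot decode. The names `translate` and `read_fasta_sequence` are illustrative choices, not a prescribed solution.

```python
from itertools import product

# Standard genetic code, with amino acids listed for codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta_sequence(text):
    """Join the sequence lines of a single FASTA record, skipping '>' headers."""
    lines = (line.strip() for line in text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def translate(sequence, to_stop=True):
    """Translate a DNA or mRNA sequence into a single-letter protein string.

    Whitespace is removed, U is read as T, and trailing bases that do not
    form a full codon are ignored. Unknown codons become 'X'. With
    to_stop=True, translation ends at the first stop codon.
    """
    seq = "".join(sequence.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTGGCCACTGTGA"))  # DNA input  -> MWPL
    print(translate("AUGUGGCCACUGUGA"))  # mRNA input -> MWPL
```

To try it on the suggested record, the FASTA text downloaded from the link above could be passed through `read_fasta_sequence` and then `translate`. This sketch does not search for a start codon or try other reading frames, which would be natural extensions.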