Medical Biochemistry and Molecular Biology Department
MEDICAL BIOCHEMISTRY AND MOLECULAR BIOLOGY DEPARTMENT
PRACTICAL GUIDE NOTES ON MOLECULAR BIOLOGY PRACTICAL COURSE
MOLECULAR BIOLOGY TECHNIQUES
BIOINFORMATICS
BIOINFORMATICS
ILO of the current topic:
By the end of this topic, the student will be able to:
Gather, organize and appraise information including the use of information technology where applicable.
What Is Bioinformatics?
Many definitions apply; all involve the use of computers and software to store, analyze, and interpret biological data.
• Bioinformatics is the rapidly developing area of computer science devoted to collecting, organizing, and analyzing DNA and protein sequences.
• Bioinformatics can be defined as the interface between the biological and computational sciences; it deals with the computational management of all kinds of biological information about genes and their products.
• Bioinformatics is the unified discipline formed from the combination of biology, computer science, software engineering, mathematics, and molecular biology.
• Bioinformatics is "the mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information" (Frank Tekaia).
A Molecular Alphabet
• Most large biological molecules are polymers: ordered chains of simple molecules called monomers.
• All monomers of a given polymer belong to the same general class, but there are several types with distinct and well-defined characteristics.
• Many monomers can be joined to form a single, large macromolecule; the order of the monomers in the macromolecule encodes information, just like the letters of an alphabet.
• These alphabets are the nucleotide bases in the case of nucleic acids (A, G, C, and T in DNA or U in RNA) and the amino acids in the case of proteins (single-letter abbreviations such as A, G, V, L, P, etc.).
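As a small illustration of the alphabet idea, the sketch below guesses which alphabet a sequence is written in. The function name `classify` and the three letter sets are illustrative assumptions, not part of any standard tool, and a short sequence made only of A, C, and G is reported as DNA even though it could equally be RNA or protein.

```python
# Letter sets for the three molecular alphabets (standard one-letter codes).
DNA = set("ACGT")
RNA = set("ACGU")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")


def classify(sequence):
    """Return a best guess of the alphabet a sequence is written in."""
    letters = set(sequence.upper())
    if not letters:
        return "empty"
    # Check the smallest alphabets first, because A, C, and G are also
    # valid amino acid letters.
    if letters <= DNA:
        return "DNA"
    if letters <= RNA:
        return "RNA"
    if letters <= PROTEIN:
        return "protein"
    return "unknown"


print(classify("AUGGUGCAUCUG"))  # start of the beta-globin mRNA
print(classify("MVHLTPEEK"))     # start of the beta-globin protein
```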
Genomic sequencing
• The genome of every living organism contains many elements, including genes coding for proteins.
• Scientists are working to sequence and assemble the genomes of these organisms.
• The goal is to obtain the genomic sequence and to identify the complete set of genes that code for proteins.
• A vast amount of DNA sequence has already been determined, and the pace at which new sequences are characterized is continuously accelerating.
• Computers are necessary to store and distribute this enormous volume of data.
Bioinformatics Techniques:
Reviews of bioinformatics are most often technology centered, focusing on the techniques that have evolved rapidly in this new discipline for ever more sophisticated analysis of sequences and structures. As a consequence of the large amount of data produced in the field of molecular biology, most current bioinformatics projects deal with the structural and functional aspects of genes and proteins.
• First, the data produced by thousands of research teams all over the world are collected and organized in specialized databases.
• Next, computational tools are needed to analyze the collected data in the most efficient manner.
• Computational tools have also been developed to integrate the information into new types of web resources.
• Through these web sites, molecular cell biologists throughout the world access different databases such as GenBank, the Protein Data Bank (PDB), etc. (www.ncbi.nlm.nih.gov)
• Bioinformatics can be used to suggest the functions of newly identified genes and proteins.
Proteins with similar functions often contain homologous amino acid sequences that correspond to important functional domains in the three-dimensional structure of the proteins. The function of a protein that has not been isolated can therefore often be predicted from the homology of its gene or cDNA with DNA sequences encoding proteins of known function. This is done by identifying and cloning the gene that encodes the protein of unknown function and then comparing the newly derived sequence with previously determined sequences stored in data banks to search for similar sequences, called homologous sequences.
Protein-coding regions can be translated into amino acid sequences, which can also be compared. Because of the degeneracy of the genetic code (several codons can specify the same amino acid), related proteins often exhibit more homology than the genes encoding them.
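The effect of degeneracy can be seen in a minimal sketch. The two short coding sequences `gene_a` and `gene_b` below are made up for illustration (they are not taken from any database); the codon table is the standard genetic code, built from the usual U, C, A, G ordering. The two sequences share only two of nine nucleotides, yet they encode the same peptide.

```python
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(rna):
    """Translate an RNA string codon by codon, starting at its first base."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical example sequences using different synonymous codons.
gene_a = "CUGUCUCGU"  # CUG UCU CGU
gene_b = "UUAAGCAGA"  # UUA AGC AGA

protein_a = translate(gene_a)
protein_b = translate(gene_b)

print(protein_a, protein_b)
print("nucleotide identity:", identity(gene_a, gene_b))
print("protein identity:", identity(protein_a, protein_b))
```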
• Computational programs used for searching sequence databases
As mentioned above, the discovery of sequence homology to a known protein or family of proteins often provides the first clues about the function of a newly sequenced gene. As the DNA and amino acid sequence databases continue to grow, they become increasingly useful in the analysis of newly sequenced genes and proteins because of the greater chance of finding such homology. There are a number of software tools for searching sequence databases, but all use some measure of similarity between sequences to distinguish biologically significant relationships from chance similarities.
Basic Local Alignment Search Tool (BLAST) Program:
BLAST is the best-known user-friendly web-based tool (www.ncbi.nlm.nih.gov/blast/blast.cgi) for rapid searching of nucleotide and protein sequences, such as those obtained from a DNA sequencer or from a nucleic acid translation program.
It directly approximates alignments between the novel sequences (queries) and the previously characterized genes (databases) that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Sequence alignment provides a powerful way to compare novel sequences with previously characterized genes, and BLAST provides a method for rapid searching of nucleotide and protein databases.
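To make the MSP score concrete, the sketch below finds the highest-scoring ungapped segment pair between two sequences by scanning every diagonal exhaustively. It only illustrates what the score measures: BLAST itself uses a word-based heuristic to approximate this search rather than an exhaustive scan, and the function name `msp_score` and the match/mismatch values (+5/-4) are assumptions chosen for illustration.

```python
def msp_score(query, subject, match=5, mismatch=-4):
    """Return (score, query_start, subject_start, length) of the best
    ungapped local segment pair, using 0-based start positions."""
    best = (0, 0, 0, 0)
    # Each diagonal pairs query[i] with subject[i + offset].
    for offset in range(-(len(query) - 1), len(subject)):
        i = max(0, -offset)
        j = max(0, offset)
        run_score = 0
        run_start = (i, j)
        while i < len(query) and j < len(subject):
            # A segment whose running score has dropped to zero or below
            # cannot help a later segment, so start a new one here.
            if run_score <= 0:
                run_score = 0
                run_start = (i, j)
            run_score += match if query[i] == subject[j] else mismatch
            if run_score > best[0]:
                best = (run_score, run_start[0], run_start[1], i - run_start[0] + 1)
            i += 1
            j += 1
    return best


# The query occurs exactly once inside the subject, starting at index 2.
print(msp_score("GAGGAG", "UUGAGGAGUU"))
```

With these scores, an exact six-letter match contributes 6 x 5 = 30, which is the largest score this query can reach.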
Bioinformatics Lab Activity
1. Go to http://www.ncbi.nlm.nih.gov/
2. Choose: Blast
3. Choose: Human
4. Feed your sequence
5. Choose: MegaBlast or BlastN
6. Run Blast and wait.
Normal Hemoglobin beta subunit mRNA
AUGGUGCAUCUGACUCCUGAGGAGAAGUCUGCCGUUACUGCCCUGUGGGGCAAGGUGAACGUGGAUGA
AGUUGGUGGUGAGGCCCUGGGCAGGCUGCUGGUGGUCUACCCUUGGACCCAGAGGUUCUUUGAGUCCU
UUGGGGAUCUGUCCACUCCUGAUGCUGUUAUGGGCAACCCUAAGGUGAAGGCUCAUGGCAAGAAAGUG
CUCGGUGCCUUUAGUGAUGGCCUGGCUCACCUGGACAACCUCAAGGGCACCUUUGCCACACUGAGUGAG
CUGCACUGUGACAAGCUGCACGUGGAUCCUGAGAACUUCAGGCUCCUGGGCAACGUGCUGGUCUGUGU
GCUGGCCCAUCACUUUGGCAAAGAAUUCACCCCACCAGUGCAGGCUGCCUAUCAGAAAGUGGUGGCUGG
UGUGGCUAAUGCCCUGGCCCACAAGUAUCACUAA
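Before pasting a sequence into a nucleotide search form, it can be convenient to join the wrapped lines and write the mRNA in the DNA alphabet (T instead of U). The helper below is a minimal sketch of that preparation step; the name `mrna_to_fasta`, the record name, and the 60-character line width are illustrative assumptions.

```python
def mrna_to_fasta(name, mrna, width=60):
    """Join wrapped mRNA lines, convert U to T, and format as FASTA text."""
    # str.split() with no arguments removes spaces and line breaks.
    dna = "".join(mrna.split()).upper().replace("U", "T")
    lines = [dna[i:i + width] for i in range(0, len(dna), width)]
    return "\n".join([">" + name] + lines)


# First 30 bases of the beta-globin mRNA, wrapped over two lines as in the handout.
fragment = """AUGGUGCAUCUGACU
CCUGAGGAGAAGUCU"""
print(mrna_to_fasta("HBB_fragment", fragment))
```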
To translate:
http://expasy.org/tools/dna.html
Normal Hemoglobin beta protein sequence
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALA
HKYH
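The translation step can also be reproduced offline. The following is a minimal sketch, assuming the standard genetic code and an input containing only the letters A, C, G, and U (or T); the function name `translate` is illustrative. It reads from the first AUG and stops at the first stop codon.

```python
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = "".join(mrna.split()).upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # stop codon: UAA, UAG, or UGA
            break
        protein.append(residue)
    return "".join(protein)


# First ten codons of the normal beta-globin mRNA from this handout.
print(translate("AUGGUGCAUCUGACUCCUGAGGAGAAGUCU"))
```

Pasting the full mRNA in place of the ten-codon fragment translates the whole chain in the same way.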
Hemoglobin S beta chain
Codon GAG became GUG in the mRNA (GTG in the DNA)
AUGGUGCAUCUGACUCCUGUGGAGAAGUCUGCCGUUACUGCCCUGUGGGGCAAGGUGAACGUGGAUGA
AGUUGGUGGUGAGGCCCUGGGCAGGCUGCUGGUGGUCUACCCUUGGACCCAGAGGUUCUUUGAGUCCU
UUGGGGAUCUGUCCACUCCUGAUGCUGUUAUGGGCAACCCUAAGGUGAAGGCUCAUGGCAAGAAAGUG
CUCGGUGCCUUUAGUGAUGGCCUGGCUCACCUGGACAACCUCAAGGGCACCUUUGCCACACUGAGUGAG
CUGCACUGUGACAAGCUGCACGUGGAUCCUGAGAACUUCAGGCUCCUGGGCAACGUGCUGGUCUGUGU
GCUGGCCCAUCACUUUGGCAAAGAAUUCACCCCACCAGUGCAGGCUGCCUAUCAGAAAGUGGUGGCUGG
UGUGGCUAAUGCCCUGGCCCACAAGUAUCACUAA
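Glutamic acid at position 6 is replaced by valine. Position 6 is counted from the valine that follows the initiator methionine, so it is the seventh letter of the sequence below.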
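The changed codon can be located by comparing the two mRNAs codon by codon. The sketch below uses only the first ten codons of each sequence from this handout; the function name `codon_differences` is an illustrative assumption. Codons are numbered from the AUG start codon, so codon 7 corresponds to residue 6 of the chain once the initiator methionine is not counted.

```python
def codon_differences(mrna_a, mrna_b):
    """List (codon_number, codon_a, codon_b) for every codon that differs.
    Codon numbers are 1-based and count the start codon as codon 1."""
    if len(mrna_a) != len(mrna_b):
        raise ValueError("sequences must have the same length")
    differences = []
    for pos in range(0, len(mrna_a) - 2, 3):
        codon_a = mrna_a[pos:pos + 3]
        codon_b = mrna_b[pos:pos + 3]
        if codon_a != codon_b:
            differences.append((pos // 3 + 1, codon_a, codon_b))
    return differences


normal = "AUGGUGCAUCUGACUCCUGAGGAGAAGUCU"  # normal beta chain, first 10 codons
hbs = "AUGGUGCAUCUGACUCCUGUGGAGAAGUCU"     # hemoglobin S, first 10 codons

print(codon_differences(normal, hbs))  # [(7, 'GAG', 'GUG')]
```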
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALA
HKYH
To align two sequences:
http://xylian.igh.cnrs.fr/bin/align-guess.cgi
Normal Hemoglobin beta protein sequence
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALA
HKYH
Hemoglobin S beta chain
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALA
HKYH
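What a pairwise alignment tool does with two such sequences can be sketched with a basic global alignment (the Needleman-Wunsch dynamic programming method). This is a teaching sketch, not the method used by the web tool above, which is not documented here; the scores (match +1, mismatch -1, gap -2) are illustrative assumptions, and real protein aligners normally use substitution matrices instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


# First ten residues of the normal and hemoglobin S beta chains.
aligned_normal, aligned_hbs, total = needleman_wunsch("MVHLTPEEKS", "MVHLTPVEKS")
print(aligned_normal)
print(aligned_hbs)
print("score:", total)
```

For these two fragments the alignment contains no gaps, and the only mismatched column is the E/V pair at the seventh letter.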
Normal Hemoglobin beta subunit mRNA
AUGGUGCAUCUGACUCCUGAGGAGAAGUCUGCCGUUACUGCCCUGUGGGGCAAGGUGAACGUGGAUGA
AGUUGGUGGUGAGGCCCUGGGCAGGCUGCUGGUGGUCUACCCUUGGACCCAGAGGUUCUUUGAGUCCU
UUGGGGAUCUGUCCACUCCUGAUGCUGUUAUGGGCAACCCUAAGGUGAAGGCUCAUGGCAAGAAAGUG
CUCGGUGCCUUUAGUGAUGGCCUGGCUCACCUGGACAACCUCAAGGGCACCUUUGCCACACUGAGUGAG
CUGCACUGUGACAAGCUGCACGUGGAUCCUGAGAACUUCAGGCUCCUGGGCAACGUGCUGGUCUGUGU
GCUGGCCCAUCACUUUGGCAAAGAAUUCACCCCACCAGUGCAGGCUGCCUAUCAGAAAGUGGUGGCUGG
UGUGGCUAAUGCCCUGGCCCACAAGUAUCACUAA
Hemoglobin S beta chain
Codon GAG became GUG in the mRNA (GTG in the DNA)
AUGGUGCAUCUGACUCCUGUGGAGAAGUCUGCCGUUACUGCCCUGUGGGGCAAGGUGAACGUGGAUGA
AGUUGGUGGUGAGGCCCUGGGCAGGCUGCUGGUGGUCUACCCUUGGACCCAGAGGUUCUUUGAGUCCU
UUGGGGAUCUGUCCACUCCUGAUGCUGUUAUGGGCAACCCUAAGGUGAAGGCUCAUGGCAAGAAAGUG
CUCGGUGCCUUUAGUGAUGGCCUGGCUCACCUGGACAACCUCAAGGGCACCUUUGCCACACUGAGUGAG
CUGCACUGUGACAAGCUGCACGUGGAUCCUGAGAACUUCAGGCUCCUGGGCAACGUGCUGGUCUGUGU
GCUGGCCCAUCACUUUGGCAAAGAAUUCACCCCACCAGUGCAGGCUGCCUAUCAGAAAGUGGUGGCUGG
UGUGGCUAAUGCCCUGGCCCACAAGUAUCACUAA