April 5, 2006
Bio/CS-251
Creating Phylogenetic Trees
With NCBI BLAST and ClustalW
Laboratory 9
Objective: This laboratory serves as an introduction to phylogenetic analysis, the
scientific procedure that allows you to make informed hypotheses about the
evolutionary history of a group of organisms or sequences. We will begin by
identifying one gene and then look for related orthologs using BLAST. After
choosing a group of ortholog sequences, we will use ClustalW to compute ‘distances’
between the sequences and, using a neighbor-joining technique, create a possible
phylogenetic tree.
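To make the idea of a ‘distance’ between sequences concrete, here is a minimal Python sketch. It
assumes the simplest possible measure: the fraction of aligned, ungapped positions at which two
sequences differ. The toy sequences and the names p_distance and distance_matrix are
hypothetical, and ClustalW's own calculation may differ in detail.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns with no gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def distance_matrix(aligned):
    """Pairwise distances for a dict of {label: aligned sequence}."""
    names = list(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y]) for x in names for y in names}


# Hypothetical toy alignment (not real pyruvate kinase data).
toy = {
    "seq1": "MKKTKIVCTI",
    "seq2": "MKKTKIVCTL",
    "seq3": "MRKT-IVATL",
}
dist = distance_matrix(toy)
```

A distance of 0 means the two sequences are identical at every compared position; larger values
mean more differences.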
In this lab we also encourage you to develop independence in your investigative
techniques, in preparation for your larger project, which will be assigned for the end of the
course.
Laboratory manual: For this lab, refer to Bioinformatics for Dummies (BFD),
pp. 382-397.
The target gene: In this investigation we will consider various homologs of the pyruvate
kinase gene. As usual, we will begin with the E. coli pyruvate kinase gene.
Activity 1: Go to the PubMed portion of the ncbi.nlm.nih.gov web site, look up a
reference on pyruvate kinase, and give a brief, one-sentence description of the function of
this gene.
Assembling our cohort of genes: This portion of the lab will be the most time-consuming
part of the activity. Our procedure for accumulating a file of FASTA sequences will differ
from the one followed in Lab 6.
Go to www.ncbi.nlm.nih.gov/entrez and choose “gene” in the Search box. So that we all
have the same E. coli pyruvate kinase, enter the accession number AAC74746 in the “Search
for” box at the top of the page. When you arrive at the default view of the information
about this gene, which contains the protein sequence, temporarily switch the view to FASTA
to display the amino acid sequence in FASTA format. Open a WORD document and
paste all lines of this display into it. You will be pasting 16 more
sequences into this document.
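The lab uses the NCBI web pages, but the same FASTA record can also be requested from NCBI's
E-utilities efetch service. The following Python sketch is an optional alternative, not part of
the assigned procedure; the function names fasta_url and fetch_fasta are hypothetical, and
fetch_fasta needs a network connection.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession, db="protein"):
    """Build the efetch URL that returns one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the FASTA text for one accession (requires network access)."""
    with urlopen(fasta_url(accession)) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network; paste it into a browser to see the record.
url = fasta_url("AAC74746")
```

Calling fetch_fasta("AAC74746") would return the text you would otherwise copy from the FASTA
view. If you fetch many records this way, space the requests out.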
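Activity 2: Erase the description line of the amino acid sequence (the first line, beginning
with “>”) and replace it with:
>E_coli
(Use an underscore, not a space, so that the whole label is a single word.)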
Press the Back button and return to your data page about the gene. Copy the protein
sequence and proceed to
www.ncbi.nlm.nih.gov/BLAST
Choose protein-to-protein BLAST and paste the sequence in the window. Go down to
the options portion of the window and choose to display 1000 sequences and 500
alignments. Now BLAST away!
Activity 3: Complete the following table using the results from BLAST. HINT: To search
these results, you can use the “Find” option under Edit in the browser's pull-down menu.
Enter the first part of the Latin name. In a few cases you may have to use “Find Next”,
i.e., repeat the search.
Species                      Type                  Accession #   E-value   %ID   % Similarity
Salmonella typhimurium       Bacterium
Yersinia pestis              Bacterium
Bacillus anthracis Ames*     Bacterium
Nostoc sp.                   Blue-green alga
Glycine max                  Soybean
Solanum tuberosum            Potato
Aspergillus niger            Filamentous fungus
Aspergillus nidulans         Filamentous fungus
Agaricus bisporus            Mushroom fungus
Mus musculus                 Mouse
Rattus norvegicus            Rat
Homo sapiens                 Human
Xenopus laevis               African clawed frog
Anopheles gambiae            Mosquito
Drosophila melanogaster      Fruit fly
Gallus gallus                Chicken
* This is the strain of anthrax that infected individuals via the mail after 9/11.
Activity 4: Use either the accession number or the reference number in the BLAST report's
listing of sequences to go to the appropriate page for each of the above
pyruvate kinase genes, and paste the FASTA sequence into the WORD document. Once
again, remove the first line(s) and type in substitute lines such as >Frog, >Mouse, etc. For
the bacteria, the alga, and some fungi, use the Latin names to distinguish them. ClustalW
will pick up only the first word of each description; thus, if you want to label the sequence
as Fruit Fly, write Fruit_Fly.
Save the WORD document in your Lab 8 folder, but keep it handy for the next phase of
the lab.
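The relabeling step can be sketched in Python as follows. The helper clustal_label models the
first-word behavior described above (an assumption taken from this handout, not from the
ClustalW source), and relabel swaps the header line for a short label, joining words with
underscores. The record shown is a hypothetical placeholder, not a real database entry.

```python
def clustal_label(header):
    """First whitespace-delimited word of a FASTA header line, without the '>'."""
    words = header.lstrip(">").split()
    return words[0] if words else ""


def relabel(fasta_text, new_label):
    """Replace the header of a single-record FASTA string with '>new_label'."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    safe = "_".join(new_label.split())  # "Fruit Fly" becomes "Fruit_Fly"
    return "\n".join([">" + safe] + lines[1:]) + "\n"


# Hypothetical placeholder record; the accession and residues are made up.
record = (
    ">XX00000.1 hypothetical pyruvate kinase [Drosophila melanogaster]\n"
    "MKKTKIVCTI\n"
    "GPKTESEEML\n"
)
renamed = relabel(record, "Fruit Fly")
```

With a space, the label seen by ClustalW would be just "Fruit"; with the underscore it is
"Fruit_Fly".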
Now you are ready to use ClustalW to create a phylogenetic tree. The steps listed at the
bottom of page 394 and on pages 395-397 of BFD are very explicit. Follow these steps. If
you are unfamiliar with using ClustalW, refer to Chapter 9 in your manual and last
week’s lab.
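For a sense of what the neighbor-joining step does with a distance matrix, the Python sketch
below carries out only the first join of the standard textbook algorithm: it computes the
Q value for every pair, picks the pair with the smallest Q, and computes the branch lengths
from that pair to their new parent node. The five labels and distances are hypothetical, the
function name nj_first_join is made up for illustration, and this is not ClustalW's actual code.

```python
def nj_first_join(names, d):
    """Return the first pair joined by neighbor joining, its Q value,
    and the branch lengths from each member of the pair to the new node."""
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three sequences")
    totals = [sum(row) for row in d]  # summed distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - total(i) - total(j)
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    len_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return (names[i], names[j]), q, (len_i, len_j)


# Hypothetical symmetric distance matrix for five sequences.
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q, lengths = nj_first_join(names, d)
```

Note that the pair joined first (a and b) is not the pair with the smallest raw distance (d and
e, at 3). Neighbor joining corrects each distance for how far the two sequences are from
everything else. The full algorithm replaces the joined pair with the new node and repeats until
the tree is complete.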
Activity 5: Save a copy of the Multiple Sequence Alignment in your Lab 8 folder as a
Web Page (Complete).
Activity 6: Save a copy of the ClustalW Guide Tree and also the Phylogenetic Tree by
pasting them into a WORD document. For the Phylogenetic Tree, use the
Phylogram View of the tree. This gives you a better feeling for distance from the root of
the tree. To get a copy of the tree, center the part of the web page containing the
tree in your browser window. Now press the Alt and Print Screen keys
simultaneously. This will copy that window to the clipboard. Go to the WORD document and
paste this image into the document. If you want something neater, you can eliminate the
web page borders by first pasting the screen capture into Paint, cutting out just the tree,
and pasting it into the WORD document.
Discussion 7: Analyzing your Phylogenetic Tree.
Answer the following questions about phylogenetic trees:
1. What is a node? What does it represent?
2. What does it mean to say that the branch length is scaled in your tree?
3. Is your tree rooted or unrooted? What does this mean?
4. From your phylogenetic tree, identify two clades, and list the leaves, or OTUs, that
belong to each of the clades that you have chosen.
Answer the following questions about your tree:
5. Generally, how many different clade groups (groups of closely related sequences)
appear to occur in this dataset? List the species in each clade group. Do these clade
groups appear to make evolutionary sense?
6. Green plants versus fungi: Which group appears to be more closely related to animals
(flies, mosquitoes, chickens, frogs, mammals)? How can you tell?
7. Do any of the sequences appear to represent an outgroup? What is meant by this
term?
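If you want to explore the vocabulary above (nodes, leaves, clades) outside the browser, the
Python sketch below reads a tree written in Newick format and lists the leaves under each
internal node. It assumes your tree is available as a Newick string; the toy tree with labels A
to E and its branch lengths are hypothetical, and the parser handles only simple unquoted labels.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (label, length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1  # skip '('
            children.append(node())
            while peek() == ",":
                pos += 1  # skip ','
                children.append(node())
            if peek() != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # skip ')'
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        length = None
        if peek() == ":":
            pos += 1  # skip ':'
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (label, length, children)

    return node()


def leaves(tree_node):
    """Labels of all leaves below (or at) this node."""
    label, _, children = tree_node
    if not children:
        return [label]
    return [name for child in children for name in leaves(child)]


def clades(tree_node):
    """Leaf lists for every internal node, starting with the top of the tree."""
    _, _, children = tree_node
    if not children:
        return []
    found = [leaves(tree_node)]
    for child in children:
        found.extend(clades(child))
    return found


# Hypothetical toy tree; labels and branch lengths are invented.
tree = parse_newick("((A:0.1,B:0.1):0.2,(C:0.3,D:0.3):0.1,E:0.9);")
groups = clades(tree)
```

Here groups holds one list per internal node: the full set of leaves at the top, then the
smaller groupings nested inside it.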