BIO/CS 251 Bioinformatics final project
Spring 2006
Dr. James

Genehunt in a newly released fungal genome

Scientific objective:
You will use bioinformatic approaches to identify, map, and analyze the genes contained in an uncharacterized chunk of the genome of a dangerous pathogenic fungus, Histoplasma capsulatum. H. capsulatum is a dimorphic fungus that can exist in either filamentous or yeastlike form: it can form cells in long, branching chains (filamentous), or it can switch its lifestyle to grow as single, yeastlike cells. The yeastlike form is the one it takes when it establishes infections in the lungs or other tissues, where it grows and spreads, causing great damage.

More information on H. capsulatum and photographs of the organism can be found here:
http://botit.botany.wisc.edu/toms_fungi/jan2000.html
http://www.doctorfungus.org/mycoses/human/histo/histo_index.htm
http://www.doctorfungus.org/thefungi/histoplasma.htm
http://www.doctorfungus.org/mycoses/human/histo/histoplamosis_c.htm

You will be issued a 50,000 bp (50 kb) segment of the recently sequenced genome of H. capsulatum. This genome sequencing effort was performed by the Broad Institute at the Massachusetts Institute of Technology (MIT), as part of the Fungal Genome Initiative (FGI):
http://www.broad.mit.edu/annotation/fungi/fgi/

The initial sequence release for this genome was in September 2005. The H. capsulatum genome sequence is so new that it has not yet been annotated, meaning that no one has systematically identified and mapped all of its genes. Therefore, you may be the first human to discover the genes in this stretch of “virgin” DNA.

Project format:
Your portal of entry into the H. capsulatum genome is the Histoplasma capsulatum Database at
http://www.broad.mit.edu/annotation/fungi/histoplasma_capsulatum/
This site is designed in much the same way as the Aspergillus nidulans website that you worked with in Laboratory 9 (http://www.broad.mit.edu/annotation/fungi/aspergillus_nidulans/).
Refer to this lab exercise for general information about navigating the H. capsulatum website, discovering genes, etc.

You may choose one of the following 50 kb segments for your project:
   1. Supercontig 1.8, nt 205000 – 255000 (rif1)
   2. Sc 1.2, 340000 – 390000 (cdc7)
   3. Sc 1.4, 60000 – 115000 (rad53)
   4. Sc 1.7, 190000 – 240000 (rap1)
   5. Sc 1.1, 2965000 – 3115000 (taz1)
   6. Sc 1.7, 615000 – 665000 (nimO)

Project objectives:

A. Locate and identify all of the bona fide genes in the 50 kb stretch of H. capsulatum genomic DNA. Produce a scale map showing the following information:
   1. Gene location and gene direction.
   2. Number of exons composing each gene.
   3. Identity, or probable identity, of each gene. If the sequence is novel, list it as a novel hypothetical protein.

B. For each gene:
   1. Present the predicted amino acid sequence. If the gene is distributed into multiple exons, edit the files so that they are merged to form a single, full-length protein sequence.
   2. Present an alignment of the edited full-length protein sequence with (1) its closest relative (ortholog) in Aspergillus nidulans, and (2) an ortholog that has a clear identity and function ascribed to it. [(1) and (2) may be the same alignment; in the case where the best alignment is to a protein of unknown function, show a second alignment to a protein of known function.] Make sure that the alignments include the E-value, % identity, and % similarity.
   3. Identify any conserved domains, and briefly explain the nature and likely function of each conserved domain.
   4. List the name or names of the protein encoded by the gene, and give a brief 2-3 line description of the protein’s function.

C. Run a tblastn search of the H. capsulatum genome with each gene that you discovered:
   1. Determine whether each gene is unique, or whether it is paralogous with other members of a gene family.
   2. Provide the output from each of the paralog searches, including alignments.

D. Choose one gene family identified in C above and use it to create a phylogram. What is the minimum number of gene duplication events leading to the chosen family of genes in H. capsulatum?

E. Orthology search: Choose a newly discovered H. capsulatum gene that is unique, i.e., for which no similar or paralogous genes exist in H. capsulatum, but for which a blastp search of GenBank reveals orthologous genes from other species. Determine the approximate time of origin of your chosen gene by establishing the range of organisms in which it can be found. In other words, is this a gene that appears to be fungal-specific? Present only in fungi and animals? Universal in eukaryotes but absent from prokaryotes? Or universal to all life? Depending upon the level of conservation and degree of functional constraint, it may be necessary to perform iterative blastp searches. In other words, you may need to use the H. capsulatum gene to obtain the orthologous worm protein, but then use the worm protein to find the human protein, etc.
   1. Choose representative orthologs spanning the widest range of organisms possible, and present a multiple alignment of the H. capsulatum gene and the chosen orthologs.
   2. Create a phylogram that includes all of the chosen orthologs.

F. Choose one conserved gene that has an ortholog in the budding yeast, Saccharomyces cerevisiae, and perform in silico microarray analysis of the yeast ortholog, using the microarrays available at the Saccharomyces Genome Database (http://www.yeastgenome.org/), as follows:
   1. Search for the S. cerevisiae orthologs using the “Search SGD” box at http://seq.yeastgenome.org/cgi-bin/nph-blast2sgd, and then use the ‘GO annotation’ + ‘Function Junction’ to gather information about the function of your chosen ortholog in budding yeast. This information will be located on the SGD BASIC INFORMATION page for your chosen yeast ortholog.
   2. Genomic and proteomic analysis of one gene in Saccharomyces cerevisiae.
   Your entry point for each of these questions will be the SGD BASIC INFORMATION page for your chosen gene.
   3. Is this gene essential in budding yeast? What is the phenotype of a systematic deletion, or null allele?
   4. Expression analysis (DNA microarray analysis): in yeast, DNA microarray analysis is like performing 5000+ Northern blots simultaneously, on a surface no larger than a microscope slide. This allows one to assay the expression of 5000+ different genes in a single experiment. Use the SGD to analyze the expression of your chosen gene. Use ‘Functional Analysis’ to perform these analyses. Include in your answers the links to each microarray experiment. How does the expression of the gene vary under the following conditions, which have been assayed for all or nearly all budding yeast genes?
      -- expression in response to alpha factor (treatment of cells with alpha factor synchronizes them at G1 phase of the cell cycle)
      -- expression in response to agents that damage DNA
      -- expression during diauxic shift (what is a diauxic shift?)
      -- expression in response to environmental changes
      -- expression during the cell cycle
      -- expression during sporulation (= meiosis)
   5. Protein-protein interaction: does your protein interact with any other proteins in S. cerevisiae?
      -- Use ‘Interactions’ on the BASIC INFORMATION page, and examine the BIND, DIP, and GRID databases to learn whether your protein has interacting partners. For each protein-protein interaction, list the method used to detect the interaction (two-hybrid, affinity chromatography, synthetic lethality, etc.).
      -- Go to the following site: http://portal.curagen.com/pathcalling_portal/index.htm
         Click on ‘PATHCALLING’, then ‘YEAST DATABASE’, enter your ‘Gene/Keyword’, and click enter. If an entry is found for the gene, click on the link under “__ entry(ies) found for that keyword”.
         This link will take you to a two-dimensional interaction map showing the various interactions between your protein and its interacting partners.
      -- Print out the interaction map, and incorporate it into your final project report.

Protein families involved in secretory function in Aspergillus nidulans:
Your task is to discover and characterize the following gene families according to their function and phylogeny, using the guidelines presented above:
   1. Rab family GTP-binding proteins: these are involved in membrane trafficking, i.e., docking and fusion of transport vesicles and membranes.
   2. SNARE family of proteins: these interact with Rabs and with vesicles to facilitate membrane fusion events.
   3. Mannose-6-phosphate receptors: these act as address labels to ensure that lysosomal enzymes synthesized in the ER and Golgi arrive at the correct destination (the lysosome).
   4. KDEL-containing proteins: proteins carrying a C-terminal KDEL (Lys-Asp-Glu-Leu) address label are confined to the lumen of the Endoplasmic Reticulum. They include a diverse variety of enzymes and chaperones whose task is to modify and fold proteins that will eventually be secreted from the cell. KDEL proteins often hop a ride to the Golgi Apparatus, in which case these errant children are rounded up by KDEL receptor proteins, who escort them back to the ER.
   5. ARF family of GTP-binding proteins: these proteins act as adaptors to facilitate the formation/pinching off of vesicles from the Golgi Apparatus.
   6. Do filamentous fungi possess Clathrin-coated vesicles and Golgin proteins?
   7. Identify and characterize Golgi-specific enzymes that function in protein glycosylation.
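
For the KDEL-containing proteins in the list above, one possible first-pass screen is to check whether each predicted protein sequence ends in the KDEL tetrapeptide. The following is only a minimal sketch of that idea, not a prescribed method for this project. The function names, the FASTA handling, and the toy sequences are all hypothetical. The `motifs` parameter is there because some organisms use variants of the signal (HDEL in budding yeast, for example); which variants are relevant for A. nidulans is left as an open assumption. A hit from a screen like this would still need to be confirmed by alignment and conserved-domain analysis as described in the project objectives.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record identifier.
            fields = line[1:].split()
            current = fields[0] if fields else f"record_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def has_c_terminal_motif(protein, motifs=("KDEL",)):
    """Return True if the protein ends with one of the given motifs.

    A trailing '*' (stop-codon symbol used by some translation tools)
    is ignored. Motifs are assumed to be upper-case one-letter codes.
    """
    seq = protein.strip().upper().rstrip("*")
    return seq.endswith(tuple(motifs))


def find_kdel_candidates(fasta_text, motifs=("KDEL",)):
    """List record ids whose sequence carries a C-terminal motif."""
    return [
        name
        for name, seq in parse_fasta(fasta_text).items()
        if has_c_terminal_motif(seq, motifs)
    ]


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real proteins.
    demo = ">protA toy example\nMKTLLVAG\nSSEEKDEL*\n>protB\nMKDELAAA\n"
    print(find_kdel_candidates(demo))
```

Note that the check is anchored to the C-terminus on purpose: a KDEL string in the middle of a sequence (as in the toy record `protB`) is not reported, since the text describes the label as C-terminal.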