LABORATORY OF GENETIC METHODOLOGIES AND TECHNOLOGIES
Bioinformatics Exercise

INTRODUCTION: Bioinformatics is defined as an interdisciplinary science that brings together biology, computer science, mathematics, and statistics for the analysis of biological sequences and genomes and for the prediction of the function and structure of macromolecules. Bioinformatics emerged at the end of the 1970s, alongside the development of recombinant DNA technologies and the publication of the first nucleic acid sequences. Computing technologies became important for decoding, through the implementation of algorithms that describe biological rules, the messages encrypted in bio-sequences: DNA, RNA, or protein sequences. In this context, progress in computing technologies facilitated the archiving of large amounts of data and the dissemination of information over computer networks.

The discipline took off with the sequencing of the entire genomes of many prokaryotic and eukaryotic organisms. Foremost among these is the human genome, whose nearly complete sequence was made available to the scientific community by an International Public Consortium and by Celera Genomics in February 2001. One of the main functions of bioinformatics is therefore to develop suitable systems for collecting and querying the enormous volume of biological data produced every day. Bioinformatics also deals with all issues concerning the design, implementation, and application of mathematical and statistical methods aimed at the functional characterization of biological sequences, at studies of molecular evolution, and at structural studies of nucleic acids and proteins. These latter aspects, although properly connected with computational problems, are often assigned to the field of computational biology rather than to bioinformatics, which is instead sometimes regarded as a technology supporting research rather than a discipline in its own right.

OMIM (Online Mendelian Inheritance in Man, a database of genetic diseases and disorders) is the online electronic version of the work begun by Victor A. McKusick at the Johns Hopkins University School of Medicine in the early 1960s. It was made available internationally in 1987 by the NLM (National Library of Medicine, USA), which managed its distribution, and was later distributed electronically by NCBI. The definition of OMIM as a single point of access says a great deal about its potential as a database of Mendelian genetics, rich as it is in specific information on phenotypes and genotypes, as well as on gene sequence, structure, and function.

In its first printed edition of 1966, then still called MIM, the number of entries was limited to 1487, rising to about 9000 in the twelfth edition of 1997 and to 11005 in December 1999. The transition from MIM to OMIM also marks a turning point in how the work is conceived, as is already evident from the way the definition of MIM changed over the years. At first, the printed editions were regarded as catalogs of autosomal dominant, autosomal recessive, and X-linked phenotypes; only in 1992 did the subtitle of MIM become catalogs of human genes and genetic disorders. From 1999 onward the database grew exponentially: on 1 October 2001 it contained 13005 entries, and recently (September 2004) 15593.
EXERCISE 1: guided use of OMIM

Searching OMIM
Finding information about genes, traits, and disorders

This tutorial serves as a basic introduction to using Online Mendelian Inheritance in Man (OMIM), a large, searchable, current database of human genes, genetic traits, and hereditary disorders available from the National Center for Biotechnology Information (NCBI) Web site.

There are 3 different interfaces available for accessing records in OMIM:
• Gene Map - Lists genes in OMIM by cytogenetic location
• Morbid Map - Alphabetically lists the genetic disorders in OMIM
• Search - Provides options for searching by keyword(s)

Each OMIM record summarizes the published scientific research relating to a particular gene, trait, or disorder. OMIM records link to the citations and abstracts of the sources for this research. If available, links to related records in other NCBI databases also are provided.

Since OMIM was the source for the genes, traits, and disorders on each chromosome of the Human Genome Landmarks poster, it is a key resource for finding more information about what is listed on the poster. For disorders and traits, you can identify associated genes. For each gene, learn about its normal biological function, and how mutations in the gene can keep it from carrying out this function.

Contents of this tutorial:
• Using OMIM's Gene Map
• Using OMIM's Morbid Map
• Searching OMIM
• Examining an OMIM record

Tutorial Tips
One option for following along with the steps described in this tutorial is to open two browser windows at once (one for the tutorial and one for OMIM) and toggle between these two windows as needed. Another option would be to print this tutorial out and then go to OMIM.

Using OMIM's Gene Map

What is Gene Map?
Gene Map is one of three different ways to access records in OMIM.
With Gene Map, users can browse a table of genes organized by cytogenetic map location starting with the p telomere of chromosome 1, continuing through the q telomere of chromosome 22, and ending with genes from the p telomere of X through the q telomere of Y. The genes, traits, and disorders listed on the Human Genome Landmarks poster were selected from Gene Map.

Why would I use Gene Map?
Use Gene Map to see the order of genes on each chromosome. It is a simple format for seeing which genes precede and follow other genes. Each Gene Map entry links to the OMIM record for a particular gene.

How do I search Gene Map?
At the OMIM home page select Search Gene Map from the blue navigation menu on the left. At the Gene Map home page, type hemochromatosis into the search box and click the Find button.

Some Gene Map search tips:

To see an ordered listing of all genes mapped to a particular chromosome, simply enter the chromosome number in the search box. For the X and Y chromosomes, be sure to capitalize X or Y.

It is best to search Gene Map by chromosome number, chromosomal location, or gene symbol.

Gene Map's search feature does not support searching by phrase. To search by disorder keyword, you can only enter a single term. For example, you would need to enter alzheimer instead of alzheimer disease.

Gene Map searches will take you to the search term's first instance in the tabular file of genes and display 20 entries at a time. Clicking on the Find Next button will take you to the search term's next instance in Gene Map.

The first three results from the search for hemochromatosis are shown below.
1q21, HFE2A to 1q21, RFH1

Location | Symbol | Title | MIM # | Disorder | Comments | Method | Mouse
1q21 | HFE2A | Hemochromatosis, type 2A | 602390 | Hemochromatosis, type 2A (2) | between D1S442 and D1S2347 | Fd |
1q21 | IL6R | Interleukin-6 receptor | 147880 | | IL6R-like gene on chr.9 | REa, A |
1q21 | IRTA1 | Immunoglobulin superfamily receptor translocation-associated gene 1 | 605876 | | fused with IGHA1 in multiple myeloma | REc |

The first occurrence of the search term hemochromatosis in Gene Map's tabular file is for HFE2A, a gene on chromosome 1 associated with a type of juvenile hemochromatosis. This is not the most prevalent form of hemochromatosis. Click the Find Next button beside the search box on Gene Map until you find the Hemochromatosis gene with 6p21.3 as its location (see the results below). This is the gene for the most common type of hemochromatosis.

6p21.3, HFE to 6p21.3, HSPA1A

Location | Symbol | Title | MIM # | Disorder | Comments | Method | Mouse
6p21.3 | HFE, HLA-H, HFE1 | Hemochromatosis gene | 235200 | Hemochromatosis (3); Porphyria variegata, 176200 (3) | | LD, F | 13(Mr2, Hfe)
6p21.3 | MHC | MAJOR HISTOCOMPATIBILITY COMPLEX | | | class I distal to class II | F, S, A, RE, Ch, D, Fd |
6p21.3 | HLA-A | Major histocompatibility complex, class I, A | 142800 | | | F | 17(H2)

Fields of each entry in Gene Map:

Location - The cytogenetic map location of each gene. For the location 6p21.3, 6 is the chromosome number, p indicates the short arm of the chromosome, and 21.3 is the number assigned to a particular band on a chromosome. When chromosomes are stained in the lab, light and dark bands appear, and each band is numbered. The higher the number, the farther away the band is from the centromere. The location field of each entry in Gene Map links to NCBI's Map Viewer tool, where you can view chromosome and gene maps.

Symbol - The official symbol for each gene and other symbols associated with the gene.
In most cases, the first symbol listed is the official, unique symbol for the gene that has been approved by the HUGO Gene Nomenclature Committee.

Title - The complete name of a gene.

MIM# - The unique six-digit number assigned to each record in OMIM. The MIM# in each Gene Map entry links to a gene's full record in OMIM.

Disorder - Names of disorders that have been linked to a particular gene. If a disorder has its own record in OMIM, a link to that record is provided.

Comments - Additional gene information. Some comments may point out similarities or differences a gene has with other genes.

Method - Symbols that represent the different methods used to map a particular gene. This field would be most meaningful to scientists. For explanations of these symbols, click on the Method link at the top of this column.

Mouse - The cytogenetic location of the mouse ortholog (a similar sequence that is present in the mouse genome). The mouse map location links to the Mouse Genome Database.

Using OMIM's Morbid Map

What is Morbid Map?
Morbid Map is a table of all the genetic disorders featured in OMIM.

Why would I use OMIM's Morbid Map?
Use Morbid Map to browse an alphabetical listing of human disorders. Find symbol(s) for the gene(s) associated with each disorder, the cytogenetic location of each disorder gene, and links to OMIM records.

How do I search Morbid Map?
At the OMIM home page select Search Morbid Map from the blue navigation menu on the left. At the Morbid Map home page, type hemochromatosis into the search box and click the Find button.

Some Morbid Map search tips:

It is best to search Morbid Map by gene symbol or disorder name.

While Gene Map does not support phrase searching, Morbid Map does. With Morbid Map you can search for multiple word disorder names like cystic fibrosis or Duchenne muscular dystrophy.

Morbid Map does not recognize non-alphanumeric characters such as dashes, commas, punctuation marks, or apostrophes.
When you search Morbid Map, you will be taken to the search term's first instance in the tabular file of disorder names. Twenty entries are displayed at a time. Clicking on the Find Next button will take you to the search term's next instance.

The first 10 results from the Morbid Map search for hemochromatosis are shown below.

Disorder | Symbol(s) | OMIM | Location
Hemochromatosis (3) | HFE, HLA-H, HFE1 | 235200 | 6p21.3
Hemochromatosis, juvenile, 602390 (3) | HAMP, LEAP1, HEPC, HFE2 | 606464 | 19q13
Hemochromatosis, type 2A (2) | HFE2A | 602390 | 1q21
Hemochromatosis, type 3, 604250 (3) | TFR2, HFE3 | 604720 | 7q22
Hemochromatosis, type 4, 606069 (3) | SLC11A3, FPN1, IREG1, HFE4 | 604653 | 2q32
Hemodialysis-related amyloidosis (1) | B2M | 109700 | 15q21-q22
Hemoglobin H disease (3) | HBA2 | 141850 | 16pter-p13.3
Hemolytic anemia due to ADA excess (1) | ADA | 102700 | 20q13.11
Hemolytic anemia due to G6PD deficiency (3) | G6PD, G6PD1 | 305900 | Xq28
Hemolytic anemia due to PGK deficiency (3) | PGK1, PGKA | 311800 | Xq13

Fields of each entry in Morbid Map:

Disorder - The complete name of each disorder in OMIM. If there are separate OMIM records for a disorder and its associated gene(s), this field will contain the link to the disorder's OMIM record.

Symbol(s) - The official symbol for each gene and other symbols associated with the gene. In most cases, the first symbol listed is the official, unique symbol for the gene that has been approved by the HUGO Gene Nomenclature Committee.

OMIM - The unique six-digit number assigned to each record in OMIM. The MIM number in this field links to the OMIM record for the gene associated with each disorder.

Location - The cytogenetic map location of the gene associated with a particular disorder. For the location 6p21.3, 6 is the chromosome number, p indicates the short arm of the chromosome, and 21.3 is the number assigned to a particular band on a chromosome. When chromosomes are stained in the lab, light and dark bands appear, and each band is numbered.
The higher the number, the farther away the band is from the centromere.

Notice that hemochromatosis has more than one entry because there is more than one type of hemochromatosis, each associated with a different gene. For example, the most common form of hereditary hemochromatosis is associated with the HFE gene on chromosome 6, while juvenile hemochromatosis is associated with the HAMP gene on chromosome 19.

Although the most common type of hereditary hemochromatosis is caused by a defect in a single gene (HFE), most hereditary diseases are multigene disorders (disorders caused by mutations in more than one gene). Colon cancer, like other cancers, is a multigene disorder. When you search for colon cancer in Morbid Map, there will be several different entries for colon cancer where each entry corresponds to a different gene that has been linked to the development of colon cancer.

It is important to understand that just because a person has a variant form of a gene that has been linked to the development of a genetic disorder does not necessarily mean that that person will develop the disorder. Other genes, as well as a variety of genetic and environmental factors are involved in the development of most genetic disorders.

Searching OMIM

Searching by keyword is the way most users find records in OMIM. OMIM provides three different levels of searching: basic, advanced, and complex Boolean.

Basic searching is done by simply typing text into the search box at the top of the OMIM home page.

Advanced searching involves the use of Limits, Preview/Index, and History options available below the OMIM search box. With these options users can specify which fields of an OMIM record to search, browse the index of a particular field, or combine different searches.

The most powerful option for searching OMIM is the complex Boolean option.
Rather than selecting search fields and other criteria from the Limits page, complex Boolean searching involves the use of a command language to limit searches to specific fields. By adding search field qualifiers in square brackets to each search term and combining terms using Boolean operators (OR, AND, or NOT), a user can execute a much more specific search in a single step. See NCBI's Entrez Help for more information about Boolean operators.

This section of the tutorial will demonstrate how to use some of NCBI's search field qualifiers to design more effective search strategies in OMIM. While hemochromatosis from chromosome 6 of the Human Genome Landmarks poster has been selected for use in this tutorial, the same steps can be followed for any disorder listed on the poster. If you do not have a printed copy of the Human Genome Landmarks poster, use the online version to select another disorder of interest.

Basic searching

A common assumption made by many Web users is that all they have to do to find the information they need is type a few key words into the search box and click a button to submit the search. Unfortunately, this does not always produce the best results. Let's see how the results from basic term searching differ from the results of a targeted search using field qualifiers.

Type hemochromatosis into the search box at the top of the OMIM home page, and click Go to submit your search. This search returns 46 results. Which result is the one you want? By simply searching for "hemochromatosis," OMIM returns all results that contain "hemochromatosis" anywhere within a record. "Hemochromatosis" could be in the record title or just mentioned briefly in the text of a record.

Searching with field qualifiers

All of the genes, disorders and traits listed on the Human Genome Landmarks (HGL) poster were taken from the title fields of OMIM records. The field qualifier for the title field is [TI] or [TITL].
Since we selected our disorder from the HGL poster, we also know that hemochromatosis is found on chromosome 6. The field qualifier for specifying a particular chromosome is [CH] or [CHR].

To use a field qualifier in your search, simply add the qualifier to the end of your search term. For example, to search for hemochromatosis on chromosome 6 enter hemochromatosis[TI] AND 6[CHR]. Be sure to capitalize any Boolean operator (AND, OR, and NOT) you use in your search statements. Click Go to submit your search.

The search should return only one result. Clicking on the MIM number *235200 opens the full OMIM record for hemochromatosis, which is examined in the next section of this tutorial. For more information about searching with field qualifiers, see the search fields section of OMIM Help.

Examining an OMIM record

Let's examine the OMIM record for hemochromatosis a little more closely.

• Each record features a blue navigation menu on the left with quick links to different sections within the record.

• Each OMIM record is assigned a unique six-digit MIM number located at the top of each entry. Clicking on the MIM number link will open the record up in a simpler, frame-free format that is more suitable for printing. For a description of what the asterisk in the MIM number represents, see OMIM FAQs.

• Below the MIM number, you will find the disorder or gene name and the official gene symbol. Since hemochromatosis is a simple disorder caused by mutations in only one gene, the official gene symbol is included with the disorder name at the top of the record. For complex genetic disorders, such as breast cancer, the official symbols of genes linked to the disorder will be identified in the text of the record. For hemochromatosis, the gene is named for the disorder to which it is linked. The gene that causes hemochromatosis is called the "hemochromatosis gene."
This is misleading because it implies that the function of this gene is to cause hemochromatosis. In fact, the disorder only develops if an individual has two copies of a mutated version of this gene. The gene in its normal, non-mutated form codes for a protein that is involved with cellular uptake of iron. The official gene symbol, which is HFE for hemochromatosis, serves as a unique identifier for a gene. To be "official" a gene symbol must have been approved by the HUGO Gene Nomenclature Committee. If you want to search OMIM by gene symbol use the Gene Name search field [GN] or [GENE]. For example, an alternative method for searching for the hemochromatosis gene by symbol would be to enter HFE[GN] into the search box.
*The gene symbol is especially useful when searching other databases (such as sequence, genome-mapping, and structure databases) for gene-specific information.

• The gene map locus describes where a gene can be found on a chromosome. For the gene locus 6p21.3, 6 is the chromosome number, p indicates the short arm of the chromosome, and 21.3 is the number assigned to a particular band on a chromosome. The gene map locus links to OMIM's Gene Map.

• The amount of text within an OMIM record varies depending upon what is known about a particular gene, disorder, or trait. Since hemochromatosis is well studied, there is a lot of information about this disorder and its gene. Some of the different types of information that may be included in an OMIM record are: disorder description, nomenclature, clinical features, heterogeneity, mapping, biochemical features, genotype/phenotype correlations, animal models, and several others.

• Although it is not a part of every OMIM record, another important part of many records is the ALLELIC VARIANTS section. This section typically describes some of the most common mutations associated with the development of disorders.
• Some other features of each OMIM record are the references (with links to article citations and abstracts in MEDLINE), a list of contributors, creation date, and edit history (to see when the entry was last updated).

The length of each OMIM record depends on how much information pertaining to a particular gene or disorder has been published and how much has been reviewed by OMIM staff. For example, the OMIM entry for the HFE gene is more than 50 printed pages long, while an OMIM entry for another condition that researchers know little about may only be 1 or 2 pages long.

EXERCISE 2: "free" use of OMIM (Online Mendelian Inheritance in Man, a database of genetic diseases and disorders)

Choose a genetic disease and use OMIM following the suggestions from the previous exercise. Gather as much information as possible (e.g. genes involved, chromosomal location, frequency in the population, any available treatment, etc.).
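
When working through Exercise 2 for a disease of your choice, it can help to write the search statement and the chromosomal location down in a systematic way. The Python sketch below is only an illustration of the two conventions described in Exercise 1: appending a field qualifier in square brackets and joining terms with a capitalized Boolean operator, and reading a cytogenetic location such as 6p21.3 as chromosome, arm, and band. The function names (qualified, combine, parse_locus) and the dictionary keys are hypothetical, the code does not contact OMIM or NCBI, and the location pattern is a simplification that does not handle ranges such as 15q21-q22.

```python
import re

# Field qualifiers mentioned in the tutorial ([TI], [CHR], [GN]).
# The dictionary keys are hypothetical labels chosen for this sketch.
QUALIFIERS = {"title": "TI", "chromosome": "CHR", "gene": "GN"}


def qualified(term: str, field: str) -> str:
    """Append a field qualifier in square brackets, e.g. HFE[GN]."""
    return f"{term}[{QUALIFIERS[field]}]"


def combine(operator: str, *terms: str) -> str:
    """Join search terms with a capitalized Boolean operator."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(terms)


# Simplified pattern (an assumption of this sketch): chromosome 1-22, X or Y,
# then the arm (p or q), then the band number with an optional sub-band.
LOCUS = re.compile(r"^([1-9]|1\d|2[0-2]|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_locus(locus: str) -> dict:
    """Split a single cytogenetic location into chromosome, arm, and band."""
    match = LOCUS.match(locus)
    if match is None:
        raise ValueError(f"unrecognized location: {locus}")
    chrom, arm, band, sub = match.groups()
    return {
        "chromosome": chrom,
        "arm": "short" if arm == "p" else "long",
        "band": band if sub is None else f"{band}.{sub}",
    }


# The search statement used in the tutorial for hemochromatosis on chromosome 6.
query = combine("and", qualified("hemochromatosis", "title"),
                qualified("6", "chromosome"))
print(query)  # hemochromatosis[TI] AND 6[CHR]

# The gene map locus of the HFE gene, split into its parts.
print(parse_locus("6p21.3"))
```

The resulting string can be pasted into the OMIM search box as described in the section on searching with field qualifiers. For another disease, replace the title term and chromosome number with the ones you find on the poster or in Morbid Map.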