LABORATORY OF GENETIC METHODOLOGIES AND TECHNOLOGIES
(LABORATORIO DI METODOLOGIE E TECNOLOGIE GENETICHE)
Bioinformatics Exercise
INTRODUCTION:
Bioinformatics is defined as an interdisciplinary science that draws on biology, computer
science, mathematics, and statistics to analyze biological sequences and genomes and to
predict the function and structure of macromolecules.
Bioinformatics emerged at the end of the 1970s, alongside the development of recombinant
DNA technologies and the resulting publication of the first nucleic acid sequences.
Information technology became important for decoding the messages encrypted in
biosequences (DNA, RNA, or protein sequences) through the implementation of algorithms
that describe biological rules.
In this context, progress in information technology has made it easier to store large
amounts of data and to distribute information over computer networks.
The field expanded dramatically with the sequencing of the entire genomes of many
prokaryotic and eukaryotic organisms. Foremost among these is the human genome, whose
nearly complete sequence was made available to the scientific community by an
International Public Consortium and by Celera Genomics in February 2001.
One of the main functions of bioinformatics is therefore to develop suitable systems for
collecting and querying the enormous volume of biological data produced every day.
Bioinformatics also covers the design, implementation, and application of mathematical and
statistical methods aimed at the functional characterization of biological sequences, at
studies of molecular evolution, and at structural studies of nucleic acids and proteins.
These latter aspects, although properly tied to computational problems, are often assigned
to computational biology rather than to bioinformatics, which is instead sometimes regarded
as a technology supporting research rather than a discipline in its own right.
OMIM (Online Mendelian Inheritance in Man, a database of genetic diseases and disorders)
is the online electronic version of the work begun by Victor A. McKusick at the Johns
Hopkins University School of Medicine in the early 1960s. It was made available
internationally in 1987 by the NLM (National Library of Medicine, USA), which handled its
distribution, and was later distributed electronically by NCBI.
OMIM's role as a single point of access says a great deal about its potential as a database
of Mendelian genetics, rich as it is in specific information on phenotypes and genotypes as
well as on gene sequence, structure, and function. The first print edition of 1966, at the
time still called MIM, was limited to 1487 entries; the number rose to about 9000 in the
twelfth edition of 1997 and to 11005 in December 1999. The transition from MIM to OMIM
also marks a turning point in how the work is framed, which is already evident from the way
MIM was described over the years. At first, the printed editions were regarded as catalogs
of autosomal dominant, autosomal recessive, and X-linked phenotypes; only in 1992 did the
subtitle of MIM become catalogs of human genes and genetic disorders. Since 1999 the
database has grown exponentially: it contained 13005 entries on 1 October 2001 and, more
recently (September 2004), 15593.
EXERCISE 1: guided use of OMIM
Searching OMIM
Finding information about genes, traits, and disorders
This tutorial serves as a basic introduction to using Online Mendelian Inheritance in Man (OMIM), a
large, searchable, current database of human genes, genetic traits, and hereditary disorders available
from the National Center for Biotechnology Information (NCBI) Web site.
There are 3 different interfaces available for accessing records in OMIM:
• Gene Map - Lists genes in OMIM by cytogenetic location
• Morbid Map - Alphabetically lists the genetic disorders in OMIM
• Search - Provides options for searching by keyword(s)
Each OMIM record summarizes the published scientific research relating to a particular gene, trait, or
disorder. OMIM records link to the citations and abstracts of the sources for this research. If available,
links to related records in other NCBI databases also are provided.
Since OMIM was the source for the genes, traits, and disorders on each chromosome of the Human
Genome Landmarks poster, it is a key resource for finding more information about what is listed on the
poster. For disorders and traits, you can identify associated genes. For each gene, learn about its
normal biological function, and how mutations in the gene can keep it from carrying out this function.
Contents of this tutorial:
• Using OMIM's Gene Map
• Using OMIM's Morbid Map
• Searching OMIM
• Examining an OMIM record
Tutorial Tips
One option for following along with the steps described in this tutorial is to open two browser windows
at once (one for the tutorial and one for OMIM) and toggle between these two windows as needed.
Another option would be to print this tutorial out and then go to OMIM.
Using OMIM's Gene Map
What is Gene Map?
Gene Map is one of three different ways to access records in OMIM. With Gene Map, users can
browse a table of genes organized by cytogenetic map location starting with the p telomere of
chromosome 1, continuing through the q telomere of chromosome 22, and ending with genes from the
p telomere of X through the q telomere of Y. The genes, traits, and disorders listed on the Human
Genome Landmarks poster were selected from Gene Map.
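The ordering described above (chromosomes 1 through 22, then X and Y, each running from the p telomere to the q telomere) can be sketched as a sort key. This is only an illustration, not how Gene Map itself is implemented: the function name gene_map_sort_key is hypothetical, only simple locations such as 6p21.3 are handled, and band numbers are compared as decimal values, which is an approximation.

```python
import re

# Chromosome order used by Gene Map: 1..22, then X, then Y.
CHROMOSOME_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y"]

# Matches simple locations such as "6p21.3" or "Xq28".
# Ranges like "15q21-q22" are outside the scope of this sketch.
LOCATION_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")


def gene_map_sort_key(location):
    """Hypothetical helper: order locations from p telomere to q telomere."""
    match = LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    chromosome, arm, band = match.groups()
    band_value = float(band)
    chromosome_index = CHROMOSOME_ORDER.index(chromosome)
    if arm == "p":
        # On the p arm, high band numbers lie near the telomere, so they sort first.
        return (chromosome_index, 0, -band_value)
    # On the q arm, band numbers increase toward the telomere.
    return (chromosome_index, 1, band_value)


locations = ["Xq28", "6p21.3", "1q21", "19q13", "7q22", "2q32", "Xq13"]
ordered = sorted(locations, key=gene_map_sort_key)
print(ordered)
# ['1q21', '2q32', '6p21.3', '7q22', '19q13', 'Xq13', 'Xq28']
```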
Why would I use Gene Map?
Use Gene Map to see the order of genes on each chromosome. It is a simple format for seeing which
genes precede and follow other genes. Each Gene Map entry links to the OMIM record for a particular
gene.
How do I search Gene Map?
At the OMIM home page select Search Gene Map from the blue navigation menu on the left.
At the Gene Map home page, type hemochromatosis into the search box (as shown below) and click
the Find button.
Some Gene Map search tips:
To see an ordered listing of all genes mapped to a particular chromosome, simply
enter the chromosome number in the search box. For the X and Y chromosomes, be
sure to capitalize X or Y.
It is best to search Gene Map by chromosome number, chromosomal location, or
gene symbol. Gene Map's search feature does not support searching by phrase. To
search by disorder keyword, you can only enter a single term. For example, you
would need to enter alzheimer instead of alzheimer disease.
Gene Map searches will take you to the search term's first instance in the tabular file of genes and
display 20 entries at a time. Clicking on the Find Next button will take you to the search term's next
instance in Gene Map.
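The Find / Find Next behavior described above amounts to scanning an ordered table for the next row that contains the search term. A minimal sketch, assuming a case-insensitive substring match (the tutorial does not say how Gene Map matches terms) and using hypothetical helper names find_next and page_from:

```python
PAGE_SIZE = 20  # Gene Map displays 20 entries at a time, per the tutorial.


def find_next(entries, term, start=0):
    """Return the index of the first entry at or after `start` that contains
    `term` (case-insensitive), or -1 if there is no further match."""
    needle = term.lower()
    for index in range(start, len(entries)):
        if needle in entries[index].lower():
            return index
    return -1


def page_from(entries, index):
    """Return the block of up to PAGE_SIZE entries beginning at `index`."""
    return entries[index:index + PAGE_SIZE]


# Gene titles taken from the Gene Map excerpts in this tutorial. The real table
# has many more rows between them; this list is shortened for illustration.
entries = [
    "Hemochromatosis, type 2A",
    "Interleukin-6 receptor",
    "Immunoglobulin superfamily receptor translocation-associated gene 1",
    "Hemochromatosis gene",
    "Major histocompatibility complex, class I, A",
]

first = find_next(entries, "hemochromatosis")                    # like "Find"
second = find_next(entries, "hemochromatosis", start=first + 1)  # like "Find Next"
print(first, second)  # 0 3
```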
The first three results from the search for hemochromatosis are shown below.
1q21, HFE2A to 1q21, RFH1
Location | Symbol | Title | MIM # | Disorder | Comments | Method | Mouse
1q21 | HFE2A | Hemochromatosis, type 2A | 602390 | Hemochromatosis, type 2A (2) | between D1S442 and D1S2347 | Fd | -
1q21 | IL6R | Interleukin-6 receptor | 147880 | - | IL6R-like gene on chr.9 | REa, A | -
1q21 | IRTA1 | Immunoglobulin superfamily receptor translocation-associated gene 1 | 605876 | - | fused with IGHA1 in multiple myeloma | REc | -
The first occurrence of the search term hemochromatosis in Gene Map's tabular file is for HFE2A, a
gene on chromosome 1 associated with a type of juvenile hemochromatosis. This is not the most
prevalent form of hemochromatosis.
Click the Find Next button beside the search box on Gene Map until you find the Hemochromatosis
gene with 6p21.3 as its location (see the results below). This is the gene for the most common type of
hemochromatosis.
6p21.3, HFE to 6p21.3, HSPA1A
Location | Symbol | Title | MIM # | Disorder | Comments | Method | Mouse
6p21.3 | HFE, HLA-H, HFE1 | Hemochromatosis gene | 235200 | Hemochromatosis (3); Porphyria variegata, 176200 (3) | - | LD, F | 13(Mr2, Hfe)
6p21.3 | MHC | MAJOR HISTOCOMPATIBILITY COMPLEX | - | - | class I distal to class II | F, S, A, RE, Ch, D, Fd | 17(H2)
6p21.3 | HLA-A | Major histocompatibility complex, class I, A | 142800 | - | - | F | -
Fields of each entry in Gene Map:
Location - The cytogenetic map location of each gene. For the location
6p21.3, 6 is the chromosome number, p indicates the short arm of the
chromosome, and 21.3 is the number assigned to a particular band on a
chromosome. When chromosomes are stained in the lab, light and dark
bands appear, and each band is numbered. The higher the number, the
farther away the band is from the centromere. The location field of each
entry in Gene Map links to NCBI's Map Viewer tool, where you can view
chromosome and gene maps.
Symbol - The official symbol for each gene and other symbols associated with the gene. In most cases, the
first symbol listed is the official, unique symbol for the gene that has been approved by the HUGO Gene
Nomenclature Committee.
Title - The complete name of a gene.
MIM# - The unique six-digit number assigned to each record in OMIM. The MIM# in each Gene Map entry
links to a gene's full record in OMIM.
Disorder - Names of disorders that have been linked to a particular gene. If a disorder has its own record in
OMIM, a link to that record is provided.
Comments - Additional gene information. Some comments may point out similarities or differences a gene
has with other genes.
Method - Symbols that represent the different methods used to map a particular gene. This field would be
most meaningful to scientists. For explanations of these symbols, click on the Method link at the top of this
column.
Mouse - The cytogenetic location of the mouse ortholog (a similar sequence that is present in the mouse
genome). The mouse map location links to the Mouse Genome Database.
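The Location field described above has a regular structure (chromosome, arm, band), so it can be split mechanically. The following sketch assumes simple single-band locations such as 6p21.3; the names CytogeneticLocation and parse_location are hypothetical, and ranges such as 15q21-q22 are deliberately not handled.

```python
import re
from typing import NamedTuple


class CytogeneticLocation(NamedTuple):
    chromosome: str  # "1".."22", "X", or "Y"
    arm: str         # "p" = short arm, "q" = long arm
    band: str        # band number assigned after staining, e.g. "21.3"


LOCATION_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")


def parse_location(text):
    """Split a location like '6p21.3' into chromosome, arm, and band."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    return CytogeneticLocation(*match.groups())


loc = parse_location("6p21.3")
print(loc)
# CytogeneticLocation(chromosome='6', arm='p', band='21.3')
```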
Using OMIM's Morbid Map
What is Morbid Map?
Morbid Map is a table of all the genetic disorders featured in OMIM.
Why would I use OMIM's Morbid Map?
Use Morbid Map to browse an alphabetical listing of human disorders. Find symbol(s) for the gene(s)
associated with each disorder, the cytogenetic location of each disorder gene, and links to OMIM
records.
How do I search Morbid Map?
At the OMIM home page select Search Morbid Map from the blue navigation menu on the left.
At the Morbid Map home page, type hemochromatosis into the search box (as shown below) and click
the Find button.
Some Morbid Map search tips:
It is best to search Morbid Map by gene symbol or disorder name. While Gene Map
does not support phrase searching, Morbid Map does. With Morbid Map you can
search for multiple word disorder names like cystic fibrosis or Duchenne
muscular dystrophy. Morbid Map does not recognize non-alphanumeric
characters such as dashes, commas, punctuation marks, or apostrophes.
When you search Morbid Map, you will be taken to the search term's first instance in the tabular file of
disorder names. Twenty entries are displayed at a time. Clicking on the Find Next button will take you
to the search term's next instance.
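Because Morbid Map does not recognize non-alphanumeric characters, it can help to clean a disorder name before typing it in. The sketch below replaces each run of such characters with a single space; that replacement rule is an assumption made for illustration (the tutorial does not say how Morbid Map treats these characters), and the function name normalize_morbid_map_query is hypothetical.

```python
import re


def normalize_morbid_map_query(query):
    """Replace every run of non-alphanumeric characters with one space."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", query)
    return cleaned.strip()


print(normalize_morbid_map_query("Hemodialysis-related amyloidosis"))
# Hemodialysis related amyloidosis
print(normalize_morbid_map_query("Duchenne muscular dystrophy"))
# Duchenne muscular dystrophy
```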
The first 10 results from the Morbid Map search for hemochromatosis are shown below.
Disorder | Symbol(s) | OMIM | Location
Hemochromatosis (3) | HFE, HLA-H, HFE1 | 235200 | 6p21.3
Hemochromatosis, juvenile, 602390 (3) | HAMP, LEAP1, HEPC, HFE2 | 606464 | 19q13
Hemochromatosis, type 2A (2) | HFE2A | 602390 | 1q21
Hemochromatosis, type 3, 604250 (3) | TFR2, HFE3 | 604720 | 7q22
Hemochromatosis, type 4, 606069 (3) | SLC11A3, FPN1, IREG1, HFE4 | 604653 | 2q32
Hemodialysis-related amyloidosis (1) | B2M | 109700 | 15q21-q22
Hemoglobin H disease (3) | HBA2 | 141850 | 16pter-p13.3
Hemolytic anemia due to ADA excess (1) | ADA | 102700 | 20q13.11
Hemolytic anemia due to G6PD deficiency (3) | G6PD, G6PD1 | 305900 | Xq28
Hemolytic anemia due to PGK deficiency (3) | PGK1, PGKA | 311800 | Xq13
Fields of each entry in Morbid Map:
Disorder - The complete name of each disorder in OMIM. If there are separate OMIM records for a disorder
and its associated gene(s), this field will contain the link to the disorder's OMIM record.
Symbol(s) - The official symbol for each gene and other symbols associated with the gene. In most cases, the
first symbol listed is the official, unique symbol for the gene that has been approved by the HUGO Gene
Nomenclature Committee.
OMIM - The unique six-digit number assigned to each record in OMIM. The MIM number in this field links to
the OMIM record for the gene associated with each disorder.
Location - The cytogenetic map location of the gene associated with a particular
disorder. For the location 6p21.3, 6 is the chromosome number, p indicates the short
arm of the chromosome, and 21.3 is the number assigned to a particular band on a
chromosome. When chromosomes are stained in the lab, light and dark bands
appear, and each band is numbered. The higher the number, the farther away the
band is from the centromere.
Notice that hemochromatosis has more than one entry because there is more than one type of
hemochromatosis, each associated with a different gene. For example, the most common form of
hereditary hemochromatosis is associated with the HFE gene on chromosome 6, while juvenile
hemochromatosis is associated with the HAMP gene on chromosome 19.
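The point that one disorder name maps to several genes can be checked directly against the Morbid Map rows quoted in this tutorial. The sketch below is illustrative only: parse_disorder and genes_for are hypothetical helpers, and the layout assumed for the Disorder field (a name, an optional six-digit MIM number for the disorder's own record, and a number in parentheses) is inferred from the excerpt. The tutorial does not explain the parenthesized number; OMIM's own documentation describes it as a phenotype mapping key.

```python
import re

# Rows copied from the Morbid Map excerpt in this tutorial:
# (Disorder field, Symbol(s), OMIM number, Location)
ROWS = [
    ("Hemochromatosis (3)", "HFE, HLA-H, HFE1", "235200", "6p21.3"),
    ("Hemochromatosis, juvenile, 602390 (3)", "HAMP, LEAP1, HEPC, HFE2", "606464", "19q13"),
    ("Hemochromatosis, type 2A (2)", "HFE2A", "602390", "1q21"),
    ("Hemochromatosis, type 3, 604250 (3)", "TFR2, HFE3", "604720", "7q22"),
    ("Hemochromatosis, type 4, 606069 (3)", "SLC11A3, FPN1, IREG1, HFE4", "604653", "2q32"),
    ("Hemolytic anemia due to ADA excess (1)", "ADA", "102700", "20q13.11"),
]

# Assumed layout: "<name>[, <six-digit MIM>] (<digit>)"
DISORDER_RE = re.compile(r"^(?P<name>.+?)(?:, (?P<mim>\d{6}))? \((?P<key>\d)\)$")


def parse_disorder(field):
    """Split a Disorder field into (name, disorder MIM or None, key)."""
    match = DISORDER_RE.match(field)
    if match is None:
        raise ValueError(f"unexpected disorder field: {field!r}")
    return match.group("name"), match.group("mim"), int(match.group("key"))


def genes_for(term, rows):
    """Map each matching disorder name to (first listed symbol, location)."""
    result = {}
    for field, symbols, _mim, location in rows:
        name, _disorder_mim, _key = parse_disorder(field)
        if term.lower() in name.lower():
            # In most cases the first symbol listed is the official one.
            result[name] = (symbols.split(",")[0].strip(), location)
    return result


hemochromatosis_genes = genes_for("hemochromatosis", ROWS)
for name, (symbol, location) in hemochromatosis_genes.items():
    print(f"{name}: {symbol} at {location}")
```

Run on these rows, the loop lists five hemochromatosis entries, including HFE at 6p21.3 and HAMP at 19q13.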
Although the most common type of hereditary hemochromatosis is caused by a defect in a single gene
(HFE), most hereditary diseases are multigene disorders (disorders caused by mutations in more than
one gene). Colon cancer, like other cancers, is a multigene disorder. When you search for colon
cancer in Morbid Map, there will be several different entries for colon cancer where each entry
corresponds to a different gene that has been linked to the development of colon cancer.
It is important to understand that just because a person has a variant form of a gene that has been
linked to the development of a genetic disorder does not necessarily mean that that person will develop
the disorder. Other genes, as well as a variety of genetic and environmental factors are involved in the
development of most genetic disorders.
Searching OMIM
Searching by keyword is the way most users find records in OMIM. OMIM provides three different
levels of searching: basic, advanced, and complex Boolean. Basic searching is done by simply typing
text into the search box at the top of the OMIM home page. Advanced searching involves the use of
Limits, Preview/Index, and History options available below the OMIM search box. With these options
users can specify which fields of an OMIM record to search, browse the index of a particular field, or
combine different searches.
The most powerful option for searching OMIM is the complex Boolean option. Rather than selecting
search fields and other criteria from the Limits page, complex Boolean searching involves the use of a
command language to limit searches to specific fields. By adding search field qualifiers in square
brackets to each search term and combining terms using Boolean operators (OR, AND, or NOT), a
user can execute a much more specific search in a single step. See NCBI's Entrez Help for more
information about Boolean operators.
This section of the tutorial will demonstrate how to use some of NCBI's search field qualifiers to design
more effective search strategies in OMIM.
While hemochromatosis from chromosome 6 of the Human Genome Landmarks poster has been
selected for use in this tutorial, the same steps can be followed for any disorder listed on the poster. If
you do not have a printed copy of the Human Genome Landmarks poster, use the online version to
select another disorder of interest.
Basic searching
A common assumption made by many Web users is that all they have to do to find the information they
need is type a few key words into the search box and click a button to submit the search. Unfortunately,
this does not always produce the best results. Let's see how the results from basic term searching
differ from the results of a targeted search using field qualifiers.
Type hemochromatosis into the search box at the top of the OMIM home page, and click Go to
submit your search.
This search returns 46 results. Which result is the one you want? By simply searching for
"hemochromatosis," OMIM returns all results that contain "hemochromatosis" anywhere within a
record. "Hemochromatosis" could be in the record title or just mentioned briefly in the text of a record.
Searching with field qualifiers
All of the genes, disorders and traits listed on the Human Genome Landmarks (HGL) poster were taken
from the title fields of OMIM records. The field qualifier for the title field is [TI] or [TITL]. Since we
selected our disorder from the HGL poster, we also know that hemochromatosis is found on
chromosome 6. The field qualifier for specifying a particular chromosome is [CH] or [CHR].
To use a field qualifier in your search, simply add the qualifier to the end of your search term. For
example, to search for hemochromatosis on chromosome 6 enter hemochromatosis[TI] AND 6[CHR]
as shown in the screenshot below. Be sure to capitalize any Boolean operator (AND, OR, and NOT)
you use in your search statements. Click Go to submit your search.
The search should return only one result.
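Search statements of this kind are plain strings, so they can be assembled programmatically before being pasted into the search box. A minimal sketch, using only the qualifiers named in this tutorial ([TI], [CHR], [GN]); the helper names qualified and combine are hypothetical and are not part of any NCBI tool.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def qualified(term, field):
    """Append a field qualifier in square brackets, e.g. hemochromatosis[TI]."""
    return f"{term}[{field}]"


def combine(operator, *terms):
    """Join terms with a Boolean operator, forcing it to upper case."""
    op = operator.upper()  # the tutorial asks for capitalized operators
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"not a Boolean operator: {operator!r}")
    return f" {op} ".join(terms)


query = combine("and", qualified("hemochromatosis", "TI"), qualified("6", "CHR"))
print(query)  # hemochromatosis[TI] AND 6[CHR]
```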
Clicking on the MIM number *235200 opens the full OMIM record for hemochromatosis, which is
examined in the next section of this tutorial.
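MIM numbers appear in this tutorial both bare (602390) and with a leading symbol (*235200). A small sketch for separating the optional symbol from the six-digit number follows; parse_mim is a hypothetical helper, and it makes no attempt to interpret the symbol, whose meaning is covered in the OMIM FAQs.

```python
import re

# Optional single non-digit prefix symbol, then exactly six digits.
MIM_RE = re.compile(r"^(?P<prefix>[^0-9\s]?)(?P<number>\d{6})$")


def parse_mim(text):
    """Return (prefix symbol or '', six-digit MIM number as a string)."""
    match = MIM_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a six-digit MIM number: {text!r}")
    return match.group("prefix"), match.group("number")


print(parse_mim("*235200"))  # ('*', '235200')
print(parse_mim("602390"))   # ('', '602390')
```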
For more information about searching with field qualifiers, see the search fields section of OMIM Help.
Examining an OMIM record
The OMIM record for hemochromatosis should look like the screenshot below.
Let's examine the record a little more closely.
• Each record features a blue navigation menu on the left with quick links to different sections
within the record.
• Each OMIM record is assigned a unique six-digit MIM number located at the top of each entry.
Clicking on the MIM number link will open the record up in a simpler, frame-free format that is
more suitable for printing. For a description of what the asterisk in the MIM number represents,
see OMIM FAQs.
• Below the MIM number, you will find the disorder or gene name and the official gene symbol.
Since hemochromatosis is a simple disorder caused by mutations in only one gene, the official
gene symbol is included with the disorder name at the top of the record. For complex genetic
disorders, such as breast cancer, the official symbols of genes linked to the disorder will be
identified in the text of the record.
For hemochromatosis, the gene is named for the disorder to which it is linked. The gene that
causes hemochromatosis is called the "hemochromatosis gene." This is misleading because it
implies that the function of this gene is to cause hemochromatosis. In fact, the disorder only
develops if an individual has two copies of a mutated version of this gene. The gene in its
normal, non-mutated form codes for a protein that is involved with cellular uptake of iron.
The official gene symbol, which is HFE for hemochromatosis, serves as a unique identifier for a
gene. To be "official" a gene symbol must have been approved by the HUGO Gene
Nomenclature Committee.
If you want to search OMIM by gene symbol use the Gene Name search field [GN] or [GENE].
For example, an alternative method for searching for the hemochromatosis gene by symbol
would be to enter HFE[GN] into the search box.
*The gene symbol is especially useful when searching other databases (such as
sequence, genome-mapping, and structure databases) for gene-specific information.
• The gene map locus describes where a gene can be found on a chromosome. For the gene
locus 6p21.3, 6 is the chromosome number, p indicates the short arm of the chromosome, and
21.3 is the number assigned to a particular band on a chromosome. The gene map locus links
to the OMIM's Gene Map.
• The amount of text within an OMIM record varies depending upon what is known about a
particular gene, disorder, or trait. Since hemochromatosis is well studied, there is a lot of
information about this disorder and its gene. Some of the different types of information that may
be included in an OMIM record are: disorder description, nomenclature, clinical features,
heterogeneity, mapping, biochemical features, genotype/phenotype correlations, animal
models, and several others.
• Although it is not a part of every OMIM record, another important part of many records is the
ALLELIC VARIANTS section. This section typically describes some of the most common
mutations associated with the development of disorders.
• Some other features of each OMIM record are the references (with links to article citations and
abstracts in MEDLINE), a list of contributors, creation date, and edit history (to see when the
entry was last updated).
The length of each OMIM record depends on how much information pertaining to a particular gene or
disorder has been published and how much has been reviewed by OMIM staff. For example, the OMIM
entry for the HFE gene is more than 50 printed pages long, while an OMIM entry for another condition
that researchers know little about may only be 1 or 2 pages long.
EXERCISE 2: "free" use of OMIM (Online Mendelian Inheritance in Man, a database of
genetic diseases and disorders).
Choose a genetic disease and use OMIM, following the suggestions from the previous
exercise.
Gather as much information as possible (e.g., genes involved, chromosomal location,
frequency in the population, any available treatment, etc.).