Genomics course 2010-2011
lectures 15-16
• Master's degree in Industrial
Biotechnology
Thursday, 2 December 2010
room 6
schedule: Tuesday 14:00 - 16:00
Thursday 13:00 - 15:00
D. Frezza
make-up for the lecture of 30 Nov. 2010
a gift for St. Nikolaos
Monday 13 December: seminar on new sequencing
methods at the Faculty of Medicine,
9:00, amphitheatre lecture hall (floor -1)
seminar, Monday 13 December
new-generation sequencing
MARCO ISLAND, Fla. — Ion Torrent Systems unveiled an electronic sequencer last week
that reads DNA on a semiconductor chip by measuring the release of hydrogen ions as
nucleotides get incorporated by DNA polymerase.
The instrument will cost less than $50,000 and generate "hundreds of millions of bases" and
"millions" of highly accurate reads per run, each several hundred bases in length, according
to Jonathan Rothberg, the company's co-founder and CEO. Each run will take about an
hour and cost less than $500.
Speaking in front of a packed audience at the end of the last session of the Advances in
Genome Biology and Technology conference here, Rothberg said that the company, which
is based in Guilford, Conn., and San Francisco and has been operating quietly since its
foundation in 2007, plans to sell tens of thousands of the instruments to laboratories around
the world.
Although the system uses polymerase-based sequencing-by-synthesis chemistry — like
most existing second-generation sequencers — it is the first to do away with lasers,
cameras, or labels, relying entirely on electronic detection. "The machine is now a chip,"
Rothberg said.
association studies
Association studies using common allelic variants are
cheaper and simpler than the complete resequencing of
candidate genes, and have been proposed as a powerful
means of identifying the common variants that underlie
complex traits. In their simplest form, association studies
compare the frequency of alleles or genotypes of a particular
variant between disease cases and controls.
Alternative approaches include using family-based controls to
avoid the potential problem of population stratification.
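The simplest form of the comparison described above can be sketched as a test on a 2x2 table of allele counts. This is a minimal illustration, not a method taken from the lecture: the function name `allele_association` and the counts are hypothetical, and the test used is a plain Pearson chi-square with 1 degree of freedom and no continuity correction.

```python
import math

def allele_association(case_a, case_b, ctrl_a, ctrl_b):
    """Compare counts of alleles A and B between cases and controls.

    The 2x2 table is
        cases:    case_a  case_b
        controls: ctrl_a  ctrl_b
    Returns (odds_ratio, chi_square, p_value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    # Pearson chi-square for a 2x2 table (1 degree of freedom).
    num = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    den = ((case_a + case_b) * (ctrl_a + ctrl_b)
           * (case_a + ctrl_a) * (case_b + ctrl_b))
    chi2 = num / den
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying allele A in cases relative to controls.
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return odds_ratio, chi2, p_value

# Hypothetical data: 1000 case chromosomes and 1000 control chromosomes.
odds_ratio, chi2, p_value = allele_association(300, 700, 200, 800)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

With these made-up counts the odds ratio is about 1.71. The sketch ignores population stratification, which is exactly the problem that family-based controls are meant to avoid.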
case-control studies
comparison of allele frequencies between affected subjects, or subjects
with a given phenotype, and the frequencies in the control population
stratification problems:
how should the 2 populations to be compared be chosen?
Overseas countries have mixed populations (melting pot), so the
components of the reference control population must be weighted.
Not all Africans or all Europeans are alike, so it is not enough
to take Black and white individuals as the reference.
There is a dedicated statistic for this:
case-control tests
a meta-analysis carried out on published literature data:
(go and look up what a meta-analysis is)
Ethnic difference in patients with type 2 diabetes mellitus in inter-East Asian populations: a
systematic review and meta-analysis focusing on gene polymorphism.
Takeuchi M, Okamoto K, Takagi T, Ishii H. J Diabetes. 2009 Dec;1(4):255-62.
METHODS: Data sources included MEDLINE and EMBASE between
January 2001 and October 2008. We conducted a search for articles
containing minor allele frequency (MAF) in the gene polymorphisms of
peroxisome proliferator-activated receptor-γ (PPARG), inward-rectifying
potassium channel Kir6.2 (KCNJ11), Calpain 10 (CAPN10), and
transcription factor 7-like 2 (TCF7L2). The pooled odds ratio was
calculated by using a fixed-effects model with the Mantel-Haenszel
method after confirming statistical evidence of homogeneity across the
ethnicities using the Breslow-Day test.
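The pooled odds ratio mentioned in the methods above can be sketched with the standard Mantel-Haenszel estimator. This is an illustrative sketch only: the function name `mantel_haenszel_or` and the two tables are hypothetical, they are not data from the cited paper, and the Breslow-Day homogeneity test is not implemented here.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio over several 2x2 tables.

    Each table is a tuple (a, b, c, d):
        a = cases with the allele,    b = cases without it,
        c = controls with the allele, d = controls without it.
    """
    # Each stratum contributes a*d/n to the numerator and b*c/n to the
    # denominator, where n is the total count in that stratum.
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two hypothetical studies (strata), e.g. two populations.
studies = [(30, 70, 20, 80), (60, 140, 45, 155)]
pooled_or = mantel_haenszel_or(studies)
print(f"pooled OR = {pooled_or:.2f}")
```

For these made-up tables the pooled odds ratio is about 1.55. Weighting each stratum by its own size is what lets studies of different sizes be combined under a fixed-effects assumption.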
candidate gene association limits
Candidate-gene association studies have identified many of the genes
that are known to contribute to susceptibility to common disease. Such
studies are greatly facilitated by using indirect LINKAGE DISEQUILIBRIUM (LD)-based methods.
However, candidate-gene studies rely on having predicted the identity of
the correct gene or genes, usually on the basis of biological hypotheses or
the location of the candidate within a previously determined region of
linkage. Even if these hypotheses are broad (for example, involving the
testing of all genes in the insulin-signalling pathway), they will, at best,
identify only a fraction of genetic risk factors, even for diseases in which
the pathophysiology is relatively well understood. When the fundamental
physiological defects of a disease are unknown, the candidate-gene
approach will clearly be inadequate to fully explain the genetic basis of the
disease.
genome wide association approach
definition: a study of the causal association of genetic
variants carried out by surveying the whole genome.
There are no preconceptions about the genomic region of the variants.
The method exploits the strength of the association without any
hypothesis about the identity of the causal gene.
It is an unbiased method (do you know what that means?),
that is, free of any prior preference, even when there is
convincing contrary evidence about the function and location
of the causal genes.
It must be a method able to find precisely those genes that
could escape a candidate-gene investigation,
in which a metabolic pathway is assumed to be associated with
its related genes as predisposing factors.
Here it is the opposite: a search for genes that could not be
linked to the trait on the basis of known evidence.
statistical basis for WGS
Estimating haplotype frequencies by combining data from large DNA
pools with database information.
We assume that allele frequency data have been extracted from several large DNA pools,
each containing genetic material of up to hundreds of sampled individuals. Our goal is to
estimate the haplotype frequencies among the sampled individuals by combining the
pooled allele frequency data with prior knowledge about the set of possible haplotypes.
Such prior information can be obtained, for example, from a database such as HapMap.
We present a Bayesian haplotyping method for pooled DNA based on a continuous
approximation of the multinomial distribution. The proposed method is applicable when the
sizes of the DNA pools and/or the number of considered loci exceed the limits of several
earlier methods. In the example analyses, the proposed model clearly outperforms a
deterministic greedy algorithm on real data from the HapMap database. With a small
number of loci, the performance of the proposed method is similar to that of an EM-algorithm,
which uses a multinormal approximation for the pooled allele frequencies, but
which does not utilize prior information about the haplotypes. The method has been
implemented using Matlab and the code is available upon request from the authors.
Gasbarra D, Kulathinal S, Pirinen M, Sillanpää MJ.
University of Helsinki, Helsinki.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):36-44.
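To see why pooled data make haplotype estimation an inverse problem, it helps to look at the forward direction first: given a set of possible haplotypes and their frequencies, the allele frequency expected in a pool at each locus is simply the summed frequency of the haplotypes carrying that allele. The sketch below shows only this forward relationship. It is not the Bayesian method of the abstract, and the function name, the haplotype strings, and the frequencies are all hypothetical.

```python
def pooled_allele_frequencies(haplotypes, frequencies):
    """Expected frequency of allele '1' at each locus in a DNA pool.

    haplotypes  : list of equal-length strings over '0'/'1'
    frequencies : haplotype frequencies, in the same order, summing to 1
    """
    n_loci = len(haplotypes[0])
    return [
        # Sum the frequencies of the haplotypes carrying '1' at locus j.
        sum(f for h, f in zip(haplotypes, frequencies) if h[j] == "1")
        for j in range(n_loci)
    ]

# Hypothetical set of possible haplotypes (e.g. taken from a database)
# and hypothetical frequencies among the pooled individuals.
haplotypes = ["000", "011", "101", "110"]
frequencies = [0.55, 0.30, 0.08, 0.07]
pool = pooled_allele_frequencies(haplotypes, frequencies)
print([round(x, 2) for x in pool])
```

Many different haplotype frequency vectors can give the same per-locus allele frequencies, which is why prior knowledge about the set of possible haplotypes is valuable when going in the reverse direction.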
why genome-wide association studies
“approaches to mapping the genes that underlie common disease and
quantitative traits fall into two categories: CANDIDATE-GENE studies,
which use either association or resequencing approaches,
and genome-wide studies, which include both LINKAGE
MAPPING and genome-wide association studies. The approaches
and their advantages and disadvantages are summarized in TABLE 1.
In this review, we discuss these approaches and present arguments as
to why genome wide association studies might be advantageous for
identifying the genetic variants associated with common
disease.
One fundamentally different approach, ADMIXTURE MAPPING**, is
not discussed here but has been described elsewhere [7–10].”
** studies on samples from admixed populations, e.g. the Americas (USA,
Brazil, etc.); the difficulty is choosing the reference control
HapMap project
Differences in individual bases are by far the most common type of genetic
variation. These genetic differences are known as single nucleotide
polymorphisms, or SNPs (pronounced "snips"). By identifying most of the
approximately 10 million SNPs estimated to occur commonly in the human
genome, the International HapMap Project is identifying the basis for a large
fraction of the genetic diversity in the human species.
However, testing all of the 10 million common SNPs in a person's chromosomes
would be extremely expensive. The development of the HapMap will enable
geneticists to take advantage of how SNPs and other genetic variants are
organized on chromosomes. Genetic variants that are near each other tend to be
inherited together. For example, all of the people who have an A rather than a G at
a particular location in a chromosome can have identical genetic variants at other
SNPs in the chromosomal region surrounding the A. These regions of linked
variants are known as haplotypes (Figure 2).
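The tendency of nearby variants to be inherited together is usually quantified as linkage disequilibrium. A minimal sketch, assuming phased haplotypes at two biallelic SNPs coded as 0/1: the function name `ld_stats` and the example data are hypothetical, while D and r² are the standard definitions.

```python
def ld_stats(haplotypes):
    """Linkage disequilibrium between two biallelic SNPs.

    haplotypes : list of (allele_at_snp1, allele_at_snp2) tuples, coded 0/1
    Returns (D, r_squared).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n            # freq of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n            # freq of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b                               # departure from independence
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical sample in which the two SNPs always travel together.
linked = [(1, 1)] * 4 + [(0, 0)] * 6
d_linked, r2_linked = ld_stats(linked)

# Hypothetical sample in which the two SNPs are independent.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
d_unlinked, r2_unlinked = ld_stats(unlinked)
print(r2_linked, r2_unlinked)
```

An r² close to 1 means that knowing the allele at one SNP predicts the allele at the other, which is the property that makes a haplotype identifiable from a few of its SNPs.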
figure 2: SNPs
the combination of individual polymorphisms forms a haplotype
caption of fig. 2: haplotypes
Figure 2: The construction of the HapMap occurs in three steps.
(a) Single nucleotide polymorphisms (SNPs) are identified in DNA
samples from multiple individuals. (b) Adjacent SNPs that are
inherited together are compiled into "haplotypes." (c) "Tag" SNPs
within haplotypes are identified that uniquely identify those
haplotypes. By genotyping the three tag SNPs shown in this
figure, researchers can identify which of the four haplotypes
shown here are present in each individual.
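Step (c) of the caption can be sketched as a small search: find the fewest SNP positions whose alleles distinguish every known haplotype, then use only those positions to recognize a haplotype. This is a brute-force illustration under stated assumptions, not the procedure used by the HapMap project: the function names and the four haplotypes are hypothetical, and the sketch treats one chromosome at a time.

```python
from itertools import combinations

def find_tag_snps(haplotypes):
    """Smallest set of SNP positions that tells all haplotypes apart."""
    n_sites = len(haplotypes[0])
    for k in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), k):
            # Alleles of every haplotype restricted to the candidate positions.
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    return tuple(range(n_sites))

def identify_haplotype(haplotypes, tag_positions, observed):
    """Return the haplotype matching the alleles observed at the tag SNPs."""
    for h in haplotypes:
        if tuple(h[i] for i in tag_positions) == tuple(observed):
            return h
    return None

# Four hypothetical haplotypes over six SNP positions.
known = ["AATCGC", "AGTCGC", "GATTGA", "GGTTAA"]
tags = find_tag_snps(known)
match = identify_haplotype(known, tags, ("G", "A"))
print(tags, match)
```

Here two positions are enough to separate the four haplotypes, so genotyping those two SNPs recovers the alleles at the other four positions as well.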
aims of the HapMap project
In many parts of our chromosomes, just a handful of
haplotypes are found in humans. [See The Origins of Haplotypes:
http://snp.cshl.org/originhaplotype.html] In a given population, 55 percent of
people may have one version of a haplotype, 30 percent may have
another, 8 percent may have a third, and the rest may have a variety of
less common haplotypes. The International HapMap Project is identifying
these common haplotypes in four populations from different parts of the
world. It also is identifying "tag" SNPs that uniquely identify these
haplotypes. By testing an individual's tag SNPs (a process known as
genotyping), researchers will be able to identify the collection of
haplotypes in a person's DNA. The number of tag SNPs that contain most
of the information about the patterns of genetic variation is estimated to
be about 300,000 to 600,000, which is far fewer than the 10 million
common SNPs.
formation of haplotypes
Over the course of many generations, segments of the
ancestral chromosomes in an interbreeding population are
shuffled through repeated recombination events.
Some of the segments of the ancestral chromosomes occur
as regions of DNA sequences that are shared by multiple
individuals (Figure 1). These segments are regions of
chromosomes that have not been broken up by
recombination, and they are separated by places where
recombination has occurred. These segments are the
haplotypes that enable geneticists to search for genes
involved in diseases and other medically important traits.
usefulness of haplotypes
The fossil record and genetic evidence indicate that all humans today are
descended from anatomically modern ancestors who lived in Africa about
150,000 years ago. Because we are a relatively young species, most of
the variation in any current human population comes from the variation
present in the ancestral human population. Also, as humans migrated out
of Africa, they carried with them part but not all of the genetic variation
that existed in the ancestral population. As a result, the haplotypes seen
outside Africa tend to be subsets of the haplotypes inside Africa. In
addition, haplotypes in non-African populations tend to be longer than in
African populations, because populations in Africa have been larger
through much of our history and recombination has had more time there
to break up haplotypes.
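The statement that more time means shorter haplotypes can be illustrated with the textbook decay of linkage disequilibrium, D_t = D_0 (1 - c)^t, where c is the recombination fraction between two loci and t the number of generations. This is a sketch under idealized assumptions (random mating, a large population, no selection), and the numbers below are hypothetical, not estimates for any human population.

```python
def ld_after_generations(d0, recomb_fraction, generations):
    """Expected linkage disequilibrium D after a number of generations.

    d0              : initial disequilibrium between two loci
    recomb_fraction : recombination fraction c between them, per generation
    generations     : number of generations elapsed
    """
    # Each generation, recombination removes a fraction c of the remaining D.
    return d0 * (1.0 - recomb_fraction) ** generations

# Hypothetical comparison: the same pair of loci after a shorter and a
# longer history of recombination.
d_young = ld_after_generations(0.25, 0.01, 100)
d_old = ld_after_generations(0.25, 0.01, 1000)
print(f"{d_young:.4f} {d_old:.6f}")
```

With these made-up values, D falls from 0.25 to about 0.09 after 100 generations and is close to zero after 1000, which is the sense in which an older population carries shorter haplotypes.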
figure: meiosis, crossing-over
Figure 1: This diagram shows two
ancestral chromosomes being
scrambled through recombination over
many generations to yield different
descendant chromosomes. If a
genetic variant marked by the A on the
ancestral chromosome increases the
risk of a particular disease, the two
individuals in the current generation
who inherit that part of the ancestral
chromosome will be at increased risk.
Adjacent to the variant marked by the
A are many SNPs that can be used to
identify the location of the variant.
dispersal of haplotypes
As modern humans spread throughout the world, the
frequency of haplotypes came to vary from region to
region through random chance, natural selection, and
other genetic mechanisms. As a result, a given
haplotype can occur at different frequencies in different
populations, especially when those populations are
widely separated and unlikely to exchange much DNA
through mating. Also, new changes in DNA sequences,
known as mutations, have created new haplotypes, and
most of the recently arising haplotypes have not had
enough time to spread widely beyond the population and
geographic region in which they originated.
applications of HapMap
Once the information on tag SNPs from the HapMap is available,
researchers will be able to use them to locate genes involved in
medically important traits. Consider the researcher trying to find genetic
variants associated with high blood pressure. Instead of determining the
identity of all SNPs in a person's DNA, the researcher would genotype a
much smaller number of tag SNPs to determine the collection of
haplotypes present in each subject. The researcher could focus on
specific candidate genes that may be associated with a disease, or even
look across the entire genome to find chromosomal regions that may be
associated with a disease. If people with high blood pressure tend to
share a particular haplotype, variants contributing to the disease might
be somewhere within or near that haplotype.