HUMAN GENOME PROJECT
BASIC STRATEGY
Goal: determine the sequence of the roughly 3
billion base pairs of the human genome. The project
formally started in 1990.
Various side projects: genetic diseases, variation
between individuals, ethnic variation, comparison
with other species.
Strategy:
• 1. Physical map relating specific DNA markers to their proper
chromosomal positions.
• 2. Overlapping sets of cloned DNAs (contigs).
• 3. Sequencing and assembly.
• 4. Finding the genes in the sequence.
• 5. Annotation of gene function.
GENETIC MAPPING
• Determines where genes are located on chromosomes.
• Simply put, we need to locate genes within the whole genome.
• A genetic map uses recombination (crossing over during
meiosis) to determine how frequently two genes (or markers) are
inherited together.
Gene maps are of two kinds:
• Linkage map: tells you whether 2 genes on a chromosome are close
together or far apart, but gives no physical location.
• Physical map: shows the presence (position) of genes in the
chromosome.
LINKAGE MAP
Genetic linkage is the tendency of genes that are located proximal
to each other on a chromosome to be inherited together during
meiosis.
Genes whose loci are nearer to each other are less likely to be
separated onto different chromatids during chromosomal
crossover, and are therefore said to be genetically linked.
In other words, the nearer two genes are on a chromosome, the
lower the chance of a crossover occurring between them, and the
more likely they are to be inherited together.
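The arithmetic behind a linkage map can be sketched in a few lines. This is a minimal sketch, assuming a two-point testcross (AaBb x aabb, with the dihybrid parent in the AB/ab arrangement) in which every offspring can be scored as parental or recombinant; the counts are hypothetical, and the 1% recombination = 1 centiMorgan (cM) conversion is the usual convention for short distances.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of scored offspring carrying a recombinant allele combination."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("need at least one scored offspring")
    return recombinant / total


def map_distance_cm(rf: float) -> float:
    """Convert recombination frequency to map distance (1% recombination = 1 cM).
    Only a good approximation for short distances, where double crossovers are rare."""
    return rf * 100.0


# Hypothetical testcross counts for two linked genes A and B.
parental = 920     # AB + ab offspring
recombinant = 80   # Ab + aB offspring
rf = recombination_frequency(parental, recombinant)
print(f"RF = {rf:.2f}, map distance ~ {map_distance_cm(rf):.1f} cM")
```

The smaller the fraction of recombinant offspring, the closer the two genes are placed on the map.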
CHROMOSOME THEORY OF LINKAGE
Morgan, along with Castle, formulated the chromosome theory of
linkage. It has the following postulates:
1. Genes are arranged in a linear manner on the
chromosomes.
2. Genes which exhibit linkage are located on the same
chromosome.
3. Linked genes generally tend to stay in their parental combinations,
except when crossing over occurs.
4. The distance between linked genes on a chromosome
determines the strength of linkage. Genes located close to each
other show stronger linkage than genes located far from each
other, since the former are less likely to be separated by crossing over.
However, crossing over does not occur between linked genes in
every meiotic event, especially when the genes lie very near
one another on the chromosome.
The frequency with which crossing over occurs between any two
linked genes is roughly proportional to the distance between their loci
along the chromosome.
1. At very small distances, crossover is very rare, and most
gametes are parental.
2. As the distance between two genes increases, crossover
frequency increases: more recombinant gametes, fewer parental
gametes.
3. When genetic loci are very far apart on the same chromosome,
crossing over nearly always occurs, and the
frequency of recombinant gametes approaches 50 percent.
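The approach to 50 percent can be illustrated with Haldane's mapping function, which assumes that crossovers occur independently along the chromosome (no interference). That model is an assumption added here; other mapping functions, such as Kosambi's, allow for interference. Distances are in Morgans (1 M = 100 cM).

```python
import math


def haldane_rf(d_morgans: float) -> float:
    """Expected recombination frequency for map distance d (in Morgans),
    assuming crossovers occur independently (Haldane: no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(rf: float) -> float:
    """Inverse mapping: distance in Morgans for an observed rf below 0.5."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("rf must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * rf)


# Recombination frequency climbs toward, but never reaches, 50 percent.
for cm in (1, 10, 50, 100, 200):
    print(f"{cm:>4} cM -> RF = {haldane_rf(cm / 100):.3f}")
```

Because RF levels off at 0.5, a recombination frequency near 50 percent cannot by itself distinguish two distant loci on the same chromosome from loci on different chromosomes.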
WHAT IS A MOLECULAR MARKER?
A DNA sequence used to mark a particular location on a
particular chromosome.
GENETIC MARKERS
Modern genetic markers: SNPs
A genetic marker is a gene or DNA sequence with
a known location on a chromosome that can be
used to identify individuals or species.
It can be described as a variation (which may
arise due to mutation or alteration in the genomic
loci) that can be observed.
A genetic marker may be a short DNA sequence,
such as a sequence surrounding a single base-pair change (single nucleotide polymorphism,
SNP), or a long one, such as a minisatellite.
What are they?
Variable sites in the genome.
What are their uses?
• Finding disease genes
• Testing/estimating relationships
• Studying population differences

Phenotype      Genotype
Brown eyes     BB or Bb
Blue eyes      bb
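Since markers are variable sites in the genome, one simple way to see them is to line up the same region from several individuals and report the columns that differ. The sketch below assumes the sequences are already aligned to equal length and are error-free; the reads are made up, and real SNP calling also has to weigh sequencing errors, read depth and base quality.

```python
from collections import Counter


def variable_sites(aligned: dict[str, str]) -> list[tuple[int, dict[str, int]]]:
    """Return (1-based position, allele counts) for every alignment column
    in which the individuals do not all carry the same base."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        if len(counts) > 1:  # more than one base seen in this column
            sites.append((pos + 1, dict(counts)))
    return sites


# Hypothetical aligned sequence of one short region from three individuals.
reads = {
    "ind1": "ACGTTAGC",
    "ind2": "ACGTCAGC",
    "ind3": "ACGTTAGA",
}
for pos, alleles in variable_sites(reads):
    print(pos, alleles)
```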
PHYSICAL MAPPING
• Cytogenetic mapping
A cytogenetic map describes the visual appearance of a
chromosome when stained and examined under
a microscope.
Particularly important are visually distinct
regions, called light and dark bands, which give
each of the chromosomes a unique appearance.
This feature allows a person's chromosomes to
be studied in a clinical test known as a
karyotype, which allows scientists to look for
chromosomal alterations.
PHYSICAL MAPS
A physical map determines where a
given DNA marker is located on the
DNA of the chromosome.
Genetic and physical maps are
(supposed to be) colinear—all the
genes appear in the same order in
both maps. But distances are quite
different: there is very little
recombination near the centromeres, so
large physical (DNA) distances there correspond to very short
recombination distances.
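Because the two maps are colinear but scaled differently, comparing them marker by marker gives a local recombination rate in cM per Mb. A minimal sketch with hypothetical marker positions, chosen so that one interval is assumed to span a centromere; the function also flags markers whose order disagrees between the two maps.

```python
def local_rates(markers: list[tuple[str, float, float]]) -> list[tuple[str, str, float]]:
    """markers: (name, genetic position in cM, physical position in Mb),
    listed in physical order. Returns cM per Mb for each adjacent pair."""
    rates = []
    for (n1, cm1, mb1), (n2, cm2, mb2) in zip(markers, markers[1:]):
        if mb2 <= mb1:
            raise ValueError("markers must be sorted by physical position")
        if cm2 < cm1:  # colinearity check: same order on both maps
            raise ValueError(f"{n1} and {n2} are in a different order on the two maps")
        rates.append((n1, n2, (cm2 - cm1) / (mb2 - mb1)))
    return rates


# Hypothetical markers; the m3-m4 interval is assumed to span the centromere.
markers = [
    ("m1", 0.0, 0.0),
    ("m2", 5.0, 4.0),
    ("m3", 9.0, 8.0),
    ("m4", 9.5, 18.0),
    ("m5", 14.0, 22.0),
]
for a, b, rate in local_rates(markers):
    print(f"{a}-{b}: {rate:.2f} cM/Mb")
```

Here the m3-m4 interval covers 10 Mb of DNA but only 0.5 cM, so its rate is far below that of its neighbours.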
Genetic maps using microsatellite
(SSR) markers were used to develop
physical maps: the appropriate SSR
sites were expected to be found on
the corresponding cloned DNA.
SEQUENCE TAGGED SITES
A common source is expressed sequence tags, produced by sequencing cDNA copied from RNA, which in turn is transcribed from genes.
The RNA present comes from the genes that are turned on in a given tissue.
They are called “tags” because they are not complete gene sequences:
each gene is only partially sequenced.
SEQUENCE TAGGED SITES
A sequence tagged site (STS) is a short sequence that is unique in the
genome.
You obtain the sequence information from cloned DNA, and then
locate it in the genome.
Using PCR it is then possible to determine whether your STS is
present in any other clone or cell line.
Obtaining STS: sequencing the ends of large cloned DNAs (BACs or
YACs, for example).
Uniqueness: use the cloned DNA from the STS as a probe on a
Southern blot of genomic DNA: if the STS is unique, only 1 band will
hybridize.
Repetitive DNA is very common in the human genome, and many DNA
sequences are not unique.
A good source of unique DNA is EST clones: cDNA made from
messenger RNA.
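The uniqueness test can also be run in silico once sequence is available: count how often a candidate STS occurs on either strand. This is a sketch of that idea against a made-up toy sequence, assuming exact matches only; real checks search the full assembly and tolerate mismatches.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_hits(genome: str, probe: str) -> int:
    """Count exact (possibly overlapping) matches of probe on both strands.
    A set is used so a palindromic probe is not counted twice."""
    hits = 0
    for target in {probe, reverse_complement(probe)}:
        pos = genome.find(target)
        while pos != -1:
            hits += 1
            pos = genome.find(target, pos + 1)
    return hits


def is_unique_sts(genome: str, sts: str) -> bool:
    """Analogue of the 'only 1 band on the Southern blot' test."""
    return count_hits(genome, sts) == 1


# Toy, made-up 'genome' containing a short tandem repeat (ATCG x3).
genome = "TTGACCGTAGGCATCGATCGATCGGTACCA"
print(is_unique_sts(genome, "CCGTAGGC"))  # candidate STS outside the repeat
print(count_hits(genome, "ATCGATCG"))     # repeat-derived probe matches several places
```

The repeat-derived probe illustrates why repetitive DNA makes poor STSs.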
SOMATIC CELL HYBRIDS
Human and mouse (or hamster) cultured cells can be
fused together using polyethylene glycol.
• The resulting fused cell is a heterokaryon: it has 2 nuclei from
different species.
• If the heterokaryon undergoes mitosis, the nuclei fuse.
• Human chromosomes are unstable in a mixed nucleus, and
most of them are randomly lost. The mouse chromosomes all
stay.
• Different cell lines can be established that contain different
combinations of human chromosomes
• You can identify which human chromosomes remain using
chromosome banding techniques.
Somatic cell hybrids are a good way to determine which chromosome a DNA
sequence is on, and sometimes also to assign gene products
or phenotypes to chromosomes.
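Assigning a sequence to a chromosome with a hybrid panel comes down to a concordance check: the sequence should be detected exactly in those cell lines that retain its chromosome. A minimal sketch, assuming the retained chromosomes of each line are known from banding and the marker has been scored present/absent (for example by PCR); the panel and scores are hypothetical.

```python
def assign_chromosome(panel: dict[str, set[str]], marker_hits: dict[str, bool]) -> list[str]:
    """panel: cell line -> human chromosomes it retains.
    marker_hits: cell line -> whether the human marker was detected.
    Returns the chromosomes whose presence matches the marker in every line."""
    all_chroms = set().union(*panel.values())
    return sorted(
        chrom for chrom in all_chroms
        if all((chrom in panel[line]) == hit for line, hit in marker_hits.items())
    )


# Hypothetical 4-line panel of human/mouse hybrids.
panel = {
    "H1": {"1", "7", "X"},
    "H2": {"7", "12"},
    "H3": {"1", "12", "X"},
    "H4": {"3", "7"},
}
marker = {"H1": True, "H2": True, "H3": False, "H4": True}
print(assign_chromosome(panel, marker))
```

With more lines in the panel, fewer chromosomes fit the pattern by chance, so the assignment becomes unambiguous.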
RADIATION HYBRIDS
Standard somatic cell fusions contain entire
human chromosomes. To locate a gene more
closely, you need to use chromosome fragments.
Start by irradiating human cells with a controlled
dose of X-rays: chromosomes break up. Then,
fuse the cells to mouse cells. The human
chromosome fragments get integrated into the
mouse chromosomes.
Create a panel of mouse/human hybrid cell lines.
• The current standard panels contain about 100 cell
lines.
• Each line contains about 32% of the human genome
• Average size of human genome fragment = 25 kbp
• More radiation = smaller fragments
Mapping: the hybrid cell lines contain random
human chromosome fragments, but closely linked
sites are usually in the same cell line (same basic
principle as recombination mapping).
• Until you have located some of the markers on the
chromosomes, radiation hybrid mapping only gives you
information about whether any two sequences are close
together on the chromosome.
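Radiation hybrid data are usually analysed statistically. As a hedged sketch, assume the simple equal-retention model: every human fragment is kept with the same probability r, and when a break separates two markers (probability theta) they are retained independently. Then the expected fraction of lines carrying one marker but not the other is 2 * theta * r * (1 - r), and distance in centirays (cR) is -100 * ln(1 - theta). The marker scores below are hypothetical; the retention of 0.32 follows the figure quoted above.

```python
import math


def rh_breakage(a: list[int], b: list[int], retention: float) -> tuple[float, float]:
    """a, b: 1 if the marker was detected in each hybrid line, else 0.
    Returns (theta, distance in cR) under the assumed equal-retention model."""
    if len(a) != len(b) or not a:
        raise ValueError("score both markers on the same, non-empty panel")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be between 0 and 1")
    # Fraction of lines in which exactly one of the two markers is present.
    discordant = sum(x != y for x, y in zip(a, b)) / len(a)
    # Cap theta below 1: the simple model breaks down for distant markers.
    theta = min(discordant / (2 * retention * (1 - retention)), 0.999)
    distance_cr = -100.0 * math.log(1.0 - theta)
    return theta, distance_cr


# Hypothetical scores for two markers on a 10-line panel.
marker_a = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
marker_b = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
theta, dist = rh_breakage(marker_a, marker_b, 0.32)
print(f"theta = {theta:.2f}, distance = {dist:.0f} cR")
```

Markers that are always retained together give theta = 0, matching the idea that closely linked sites usually end up in the same cell line.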
CONTIGS
A contig is a set of partially overlapping
clones that together form a contiguous set of clones,
with no gaps between them.
Contigs allow you to build up the
sequence of the chromosome over
much larger regions than any single
clone.
The first reasonably complete physical
map of the human genome involved
contigs generated by YACs (yeast
artificial chromosomes).
Initially, you have a collection of clones
with no information about how they are
ordered on the chromosome.
Contigs are built up by using PCR to
identify unique sequences (STS or EST)
on each clone, and then looking for
overlaps between the clones.
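Building contigs from STS content amounts to finding groups of clones linked by shared markers. A minimal sketch using a union-find structure; the clone and STS names are hypothetical. It only groups clones into contigs and does not order the clones within each contig, which in practice also uses the order of the STSs.

```python
def build_contigs(clone_sts: dict[str, set[str]]) -> list[list[str]]:
    """Two clones overlap if they share an STS; contigs are the connected
    components of that overlap graph."""
    parent = {clone: clone for clone in clone_sts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Index which clones contain each STS, then join clones that share one.
    by_sts: dict[str, list[str]] = {}
    for clone, markers in clone_sts.items():
        for sts in markers:
            by_sts.setdefault(sts, []).append(clone)
    for clones in by_sts.values():
        for other in clones[1:]:
            parent[find(other)] = find(clones[0])

    groups: dict[str, list[str]] = {}
    for clone in clone_sts:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical YAC clones with the STSs detected on each by PCR.
clones = {
    "YAC1": {"sts1", "sts2"},
    "YAC2": {"sts2", "sts3"},
    "YAC3": {"sts3", "sts4"},
    "YAC4": {"sts7", "sts8"},
    "YAC5": {"sts8"},
}
print(build_contigs(clones))
```

The two groups that come out are separate contigs; the gap between them stays unresolved until a clone carrying an STS from each side is found.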
SEQUENCING STRATEGY
Once a contig map of the genome was
obtained, it was necessary to sequence
each individual clone.
Most of the actual human genome
sequencing was done on BAC clones,
which are less prone to rearrangement
than YAC clones. BACs are about 100-200 kbp long.
Large clones are generally sequenced
by shotgun sequencing: The large
cloned DNA is randomly broken up into
a series of small fragments (less than
1 kb). These fragments are cloned and
sequenced. A computer program then
assembles them based on overlaps
between the sequences of each clone.
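The overlap-based assembly step can be illustrated with a greedy merger that repeatedly joins the two fragments with the longest suffix/prefix overlap. This is a toy sketch assuming error-free reads from a single strand; production assemblers use overlap or de Bruijn graphs plus base-quality information, and a greedy merge like this one is easily misled by repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return reads  # no overlaps left: the remaining pieces are contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]


# Hypothetical reads covering one short sequence.
print(greedy_assemble(["GATTACA", "TACAGGC", "GGCTTCA"]))
```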
To ensure that every bit has been
covered, you need to sequence random
clones until you have covered each
spot 5-10 times on average.
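The 5-10x rule of thumb follows from a Poisson (Lander-Waterman) approximation: with N reads of length L from a target of length G, average coverage is c = N * L / G, and the expected fraction of bases hit by no read is about e^-c. A sketch assuming randomly placed reads; the 150 kbp target (a BAC-sized clone) and 500 bp read length are illustrative assumptions.

```python
import math


def coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return n_reads * read_len / target_len


def expected_uncovered_fraction(c: float) -> float:
    """Poisson approximation: probability that a given base has zero reads."""
    return math.exp(-c)


bac_len = 150_000  # assumed target size, within the BAC range
read_len = 500     # assumed read length
for depth in (1, 2, 5, 8, 10):
    n = math.ceil(depth * bac_len / read_len)
    c = coverage(n, read_len, bac_len)
    missing = expected_uncovered_fraction(c) * bac_len
    print(f"{depth:>2}x: {n} reads, ~{missing:,.0f} bases expected uncovered")
```

At 5x roughly 0.7% of the target is still expected to be missed, while at 10x only a handful of bases remain, which is why random clones keep being sequenced well past 1x.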
WHOLE GENOME SHOTGUN SEQUENCING
Why bother with creating a large scale physical map: all that YAC and BAC
cloning, radiation hybrids, STS comparisons, etc? Why not just fragment
the whole genome into 1 kb pieces, sequence them all, and let the
computer assemble the whole genome?
In practice, the genome is cloned into large fragments first, and then each
large fragment is broken up for shotgun sequencing. However, the large
fragments are not ordered: no physical map or set of contigs is created.
Requires a lot of overlapping coverage
Also requires good software.
Very successful for prokaryotic genomes (10 Mbp or less).
• but the human genome is 300 times larger
Big problem: repeat sequence DNA, which is everywhere, and
especially near the centromere. To find overlaps between clones, you
need unique regions.
Whether whole genome shotgun sequencing could work without other
information to provide order was long debated for eukaryotes. Celera
Genomics used it for the Drosophila genome and for its human genome
draft, and it has since become a standard approach.
EST (EXPRESSED SEQUENCE TAG):
A unique stretch of DNA within a coding region of a gene
that is useful for identifying full-length genes and serves as
a landmark for mapping.
An EST is a sequence tagged site (STS) derived from cDNA.
An STS is a short segment of DNA which occurs only once in
the genome and whose location and base sequence are
known. STSs are detectable by the polymerase chain
reaction (PCR), are helpful in localizing and orienting
mapping and sequence data, and serve as landmarks in the
physical map of the genome.
EXPRESSED SEQUENCE TAGS (ESTs)
ESTs are cDNA sequences that have been sequenced from either
the 5’ or 3’ end.
They may contain all or part of a particular cDNA coding
sequence,
and are useful for identifying unknown genes, for mapping their
positions within a genome,
and as a potential source of genetic material when a full-length cDNA is not available for a specific gene of interest.
GENE DETECTION
The best evidence that a given DNA sequence
is expressed is to find an EST (a cDNA copy of
mRNA) that matches it.
Large numbers of EST libraries have been
constructed and sequenced.
• A major result was the discovery that many genes have
several different splicing patterns: some sequences are exons in
some tissues but introns in others.
GENE DETECTION
Homology searches, using BLAST, are a good way to find
genes. If a DNA sequence closely matches a sequence
from another organism, it has been evolutionarily
conserved, and that usually means that it is an expressed
gene.
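BLAST-style homology searches start by finding short words shared exactly between the query and a database sequence, and only then extend and score those seeds. The sketch below shows just that seeding idea, not BLAST itself; the word length and the sequences are assumptions for illustration.

```python
def shared_words(query: str, subject: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every length-k word the two
    sequences share exactly (0-based positions)."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


# Hypothetical query and database sequence, using a short word length.
print(shared_words("GATTACAGG", "CCGATTACATT", k=5))
```

Seeds lying on the same diagonal (constant subject_pos - query_pos) point to one conserved stretch, which a real search would then extend into a scored alignment.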
Exon prediction: exons need to be open reading frames
(no stop codons), and they display patterns of nucleotide
usage different from random DNA. Several different
programs exist, and they give somewhat varying results.
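The open-reading-frame requirement can be sketched as a scan for stretches of codons with no stop codon in each of the three forward reading frames. This is a simplified sketch (forward strand only, standard stop codons TAA, TAG, TGA, no codon-usage statistics); real exon predictors combine this with the nucleotide-usage patterns and splice signals mentioned above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq: str, min_codons: int = 10) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for runs of at least min_codons consecutive
    codons with no stop codon, forward strand only; end is exclusive."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        run_start = frame
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    hits.append((frame, run_start, i))
                run_start = i + 3  # a new run begins after the stop codon
            i += 3
        if (i - run_start) // 3 >= min_codons:  # run reaching the end of seq
            hits.append((frame, run_start, i))
    return hits


print(stop_free_stretches("ATGAAATAGCCCGGGTTTAAACCC", min_codons=3))
```

Even this 24-base made-up sequence has a short stop-free stretch in every frame, which is why open reading frames alone are weak evidence and prediction programs give somewhat varying results.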
“Hypothetical genes” are genes whose existence has
been predicted by computer but which lack any
experimental or cross-species data to confirm it.
• a “conserved hypothetical gene” is a sequence that
matches other species even though there is no EST or
other experimental evidence for its expression
GENOME ANNOTATION
The process of identifying the locations of genes and all of
the coding regions in a genome and determining what those
genes do.
Once a genome is sequenced, it needs to be annotated to
make sense of it.
GENE ANNOTATION
A big problem is the sheer amount of information,
which is not uniformly coded or maintained. The
scientific literature contains numerous
examples of the same gene or protein with
several different names, and agreeing on common
definitions of functions is even harder.
To counter this, the Gene Ontology Consortium
(GO) has created a controlled vocabulary of
about 11,000 terms.
Every gene product (protein) can be annotated
with terms from three general categories:
• molecular function: what the protein actually
does, such as “kinase activity”
• biological process: what cellular process the
protein participates in, such as “signal
transduction”
• cellular component: where the protein is found
in the cell, such as “integral to the plasma
membrane”
Each gene product can have multiple
descriptive terms.
The terms are hierarchical: more specific terms
are contained within less specific terms.
But, a given term can have more than one
parent and more than one child term.
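Because a term can have several parents, the terms form a directed acyclic graph rather than a simple tree, and annotating a gene product with a specific term implicitly annotates it with every ancestor term. A minimal sketch of that traversal; the term names and links below are a simplified, hypothetical fragment for illustration, not copied from the real ontology.

```python
# Hypothetical GO-like fragment: term -> list of parent terms.
PARENTS: dict[str, list[str]] = {
    "protein kinase activity": ["kinase activity", "catalytic activity, acting on a protein"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity, acting on a protein": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms above `term`, following every parent link; each is visited once
    even when it is reachable along more than one path."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


# A hypothetical gene product annotated with one specific term...
direct = "protein kinase activity"
# ...is implicitly annotated with all of its less specific ancestors too.
print(sorted({direct} | ancestors(direct, PARENTS)))
```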
GO EXAMPLE