Name:________________________________
GN415
Midterm Exam
March 4, 2005
This exam is worth 30% of your final grade. There are 10 multiple choice and 5 True
or False questions (1 point each) and you should answer 6 of the 8 short answer (6
points each) questions, three from each of the two lists (problems and essays).
You have 60 minutes for the exam.
1. A unit of measurement on physical maps is:
a) kilobases
b) centimorgans
c) cytological bands
d) centimeters

2. ENCODE stands for:
a) encyclopedia of database entries
b) encrypted oracle database encyclopedia
c) encyclopedia of DNA elements
d) eukaryotic normalized collection of DNA elements

3. Which of the following does not host a commonly used genome browser?
a) the National Center for Biotechnology Information (NCBI)
b) the European Bioinformatics Institute (EBI)
c) the University of California at Santa Cruz (UCSC)
d) the United States National Science Foundation (NSF)

4. An E-value is:
a) the expected number of codons in a sequence of nucleotides of length l
b) an expression of the likelihood that a QTL exists in a genome interval
c) the expected number of equally good sequence matches in a sequence database
d) an expression of the goodness of fit of a gene prediction to a cDNA sequence

5. The average number of PHRED 20 quality bases in a typical sequence read is close to:
a) 10
b) 100
c) 500
d) 2000

6. A haplotype is:
a) the set of polymorphic nucleotides found together on a single chromosome
b) a genotype that is unique to non-African populations
c) a genotype that is only found in a single individual in a population
d) a set of diploid genotypes at two or more loci in an individual

7. Alternative splicing refers to:
a) a difference in the number of exons in two or more species
b) the production of two or more mRNAs from a single gene
c) regulation of two different genes by a single regulatory element
d) post-translational modification of the cleavage site of receptor proteins

8. Which of the following may contribute to false positive case-control associations?
a) differences in allele frequency between populations
b) cultural differences between populations
c) nutritional differences between populations
d) all of the above

9. The F1 generation is:
a) the Ferrari Fanclub
b) the progeny of a cross between a father and his daughter
c) the first filial generation from a cross between two inbred lines
d) the J. Craig Venter Foundation

10. Transmission Disequilibrium refers to:
a) a 1:1 ratio of the two alleles in a set of unrelated affected individuals
b) an unexpectedly long haplotype block in a set of siblings
c) linkage disequilibrium between markers on two different chromosomes
d) an unequal ratio of two alleles in a set of unrelated affected individuals derived from heterozygous parents

True or False (circle the appropriate option):

11. Synteny refers to conservation of the order of genes along a chromosome in two different species.
T    F

12. 2X shotgun sequencing coverage is sufficient to assemble 99% of the genome of a multicellular eukaryote into fragments at least 1 Mb in length.
T    F

13. The closer together two genetic polymorphisms are on a chromosome, the more rapidly linkage disequilibrium decays over time.
T    F

14. Current estimates suggest that there are fewer than 30,000 genes in the human genome.
T    F

15. Tagging SNPs are designed to capture the majority of the genetic variation in haplotype blocks, reducing the number of sites that must be tested in a genome scan.
T    F

Short Answer Problem Questions (6 points each; answer 3 of the 4 questions)
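16. Compute the best possible alignment for the following two sequences, assuming a gap penalty of -2, a mismatch penalty of -1, and a match score of +1. If the last nucleotide in the right-hand sequence is a G instead of an A, does your answer change?
GCGCATA and GCGCTAA

With A as the last nucleotide, the best fit is the ungapped alignment:
GCGCATA
||||..|
GCGCTAA
Score: 5 matches - 2 mismatches = 3
The best gapped alignment scores lower:
GCGCATA-
||||-||-
GCGC-TAA
Score: 6 matches - (2 gaps x 2) = 6 - 4 = 2

With G as the last nucleotide, the ungapped alignment drops to 4 - 3 = 1:
GCGCATA
||||...
GCGCTAG
while the gapped alignment still scores 6 - 4 = 2 and becomes the best fit:
GCGCATA-
||||-||-
GCGC-TAG
So the answer changes: the optimal alignment switches from the ungapped one (score 3) to the gapped one (score 2).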
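The scores above can be checked with a standard global alignment (Needleman-Wunsch) dynamic program under the stated scoring. This is a minimal sketch that assumes a linear gap penalty (-2 for every gap position), which is our reading of the question; the function name `nw_score` is hypothetical, and it returns only the optimal score, not the alignment itself.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[n][m]


if __name__ == "__main__":
    print(nw_score("GCGCATA", "GCGCTAA"))  # 3 (ungapped alignment wins)
    print(nw_score("GCGCATA", "GCGCTAG"))  # 2 (gapped alignment wins)
```

Because both sequences have the same length, any gapped alignment needs at least one gap in each sequence (a cost of -4), which is why a single extra mismatch is enough to tip the balance.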
17. Generate a hypothetical Gene Ontology (GO) classification for any gene, real or
imaginary, that you may care to choose. Include two hierarchical levels in each of three GO
categories that you list.
Antennapedia (each category runs from the more general to the more specific term):
Cellular Component: Nucleus > Chromatin
Molecular Function: Transcription Factor > Homeodomain Class
Biological Process: Developmental Regulation > Axial pattern formation
18. A BLAST search identifies two genes with significant matches and the following alignments (the query sequence is the one in the middle; identity is indicated by a vertical line, mismatch by a dot, and a gap by a dash):
1. ACCCGTA------------TATAATGCATTACGATGGGGATCGACTAC--------
   |||||||------------|||||||||||||||||||||||||||||--------
Q. ACCCGTATCGATGCCTAGCTATAATGCATTACGATGGGGATCGACTACGGATCCATC
   ||||.|||--||||||.|||||.|||||||.|||||.|.||||..|||||.||||.|
2. ACCCATAT--ATGCCTTGCTATGATGCATTGCGATGAGCATCGCATACGGCTCCAGC
a) Which alignment would you consider more likely to identify a homologous gene, and why?
The first alignment shows identity over a long stretch, and only two molecular changes (the two gaps, each treated as a single insertion/deletion event) differentiate the sequences. The long gaps may suggest that the function of the gene has changed, but these sequences are still likely to be homologous (that is, derived from a common ancestor). At least 11 molecular changes (10 mismatches and one gap) separate the query and sequence 2, so they probably diverged in the more distant past.
b) How would you determine whether the sequences are orthologs or paralogs?
Paralogs are copies of a gene that arose by duplication within the same lineage, while orthologs are the same gene in different species (lineages), separated by speciation. You would first look to see whether there are multiple similar sequences in each of the species that have the gene and, if possible, pull out the homologous sequence from an outgroup. One possible explanation for sequence 1 is that it is a pseudogene or a recently retrotransposed copy of the original gene that lacks its small introns; either would make it a paralog.
19. In a recent survey, 500 Americans were asked the question “Do you feel that the income
tax rate in this country is too high”, and simultaneously genotyped for a G/T regulatory
polymorphism in the promoter of the GspnA1 locus. The following genotype frequencies were
observed:
Genotype     GG    GT    TT
Yes group    10   115   125
No group     25   105   120
a) What is the approximate allele frequency of G in the population?
There are 35 GG homozygotes and 220 heterozygotes in a total of 500 individuals. Therefore, the minor allele frequency (G) is (2 x 35 + 220)/(2 x 500) = 290/1000 = 0.29. It is slightly lower in the Yes group ((2 x 10 + 115)/500 = 135/500 = 0.27) than in the No group ((2 x 25 + 105)/500 = 155/500 = 0.31).
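The allele counting above can be written as a short calculation. The counts are those in the table; `g_allele_frequency` is a hypothetical helper name.

```python
# Genotype counts from the survey table: (GG, GT, TT).
COUNTS = {"Yes": (10, 115, 125), "No": (25, 105, 120)}


def g_allele_frequency(gg: int, gt: int, tt: int) -> float:
    """Frequency of allele G: each GG carries two copies, each GT carries one."""
    n_individuals = gg + gt + tt
    return (2 * gg + gt) / (2 * n_individuals)


for group, (gg, gt, tt) in COUNTS.items():
    print(group, round(g_allele_frequency(gg, gt, tt), 2))  # Yes 0.27, then No 0.31

# Pool the two groups column by column: (35, 220, 245).
pooled = tuple(sum(col) for col in zip(*COUNTS.values()))
print("All", round(g_allele_frequency(*pooled), 2))  # All 0.29
```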
b) Is there any suggestion of an association between GspnA1 and taxation policy?
Possibly: there appears to be a deficit of GG homozygotes in the Yes group (10 observed, whereas about 21 would be expected for an allele frequency of 0.29 in 250 individuals, since 0.29 x 0.29 x 250 ≈ 21). The genotype frequencies match Hardy-Weinberg proportions in the No group. A statistical test would have to be performed to assess whether the difference is significant.
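As a sketch of one such test, the code below computes the Hardy-Weinberg expectation behind the figure of 21 and a Pearson chi-square test on the 2 x 3 genotype table. The choice of test is ours, not necessarily the one intended, and the helper names are hypothetical. On these counts it gives a chi-square of about 6.99 with 2 degrees of freedom (p ≈ 0.03), a nominal result that remains subject to the caveats in part c).

```python
import math

# Genotype counts (GG, GT, TT) from the survey table.
YES = (10, 115, 125)
NO = (25, 105, 120)


def hwe_expected(p: float, n: int) -> tuple:
    """Expected (GG, GT, TT) counts for allele frequency p under Hardy-Weinberg."""
    q = 1 - p
    return (p * p * n, 2 * p * q * n, q * q * n)


def chi2_genotype_test(row1, row2):
    """Pearson chi-square test of a 2 x 3 genotype table (2 degrees of freedom)."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    total = sum(col_totals)
    stat = 0.0
    for row in (row1, row2):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


print([round(x) for x in hwe_expected(0.29, 250)])  # [21, 103, 126]
stat, p = chi2_genotype_test(YES, NO)
print(round(stat, 2), round(p, 3))  # 6.99 0.03
```

The closed-form tail probability avoids a SciPy dependency; for other degrees of freedom a library routine such as a chi-square survival function would be needed.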
c) What does this say about the genetic basis of an economic belief?
Taken at face value, this may suggest that GG homozygotes are less likely to believe that taxes are too high. However, even if the association is significant, many factors could cause a false positive result, including population stratification and sampling artifacts. The result would have to be replicated several times before it implied anything genetic, and a mechanistic hypothesis for the function of the GREENSPAN protein would need to be developed.
Short Answer Essay Questions (6 points each; answer 3 of the 4 questions)
20.
If you were CEO of a company that produces a drug that significantly reduces the
memory loss associated with Alzheimer’s disease, but increases the risk of aneurysm in 5% of
patients, how might you use genomics to increase the likelihood that your drug is approved by
the FDA? Be as specific as you can in describing the pharmacogenetic approach and the
potential drawbacks.
My objective would be to identify a genetic marker that predicts the adverse side-effect. In this
case, I would conduct a case-control genome scan with the 100,000 human tagging SNPs from
the HapMap project, where the cases are as large a sample as I can find (at least 200) of
patients who took the drug and developed aneurysms, and the controls are patients who took
the drug without developing aneurysms. I would use a statistical test of association to find
genotypes that are more prevalent in either the cases or the controls.
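One common way to carry out such a scan is an allelic chi-square test at each SNP, judged against a genome-wide threshold. This is an assumption-laden sketch: it uses a Bonferroni correction for the 100,000 tests (one of several possible corrections, not named in the answer), the function name `allelic_chi2` is hypothetical, and the allele counts in the usage line are invented for illustration, not data.

```python
import math


def allelic_chi2(case_a1: int, case_a2: int, ctrl_a1: int, ctrl_a2: int):
    """Pearson chi-square (1 df) on a 2 x 2 table of allele counts in cases vs. controls."""
    table = ((case_a1, case_a2), (ctrl_a1, ctrl_a2))
    row_totals = [sum(r) for r in table]
    col_totals = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


N_TESTS = 100_000            # tagging SNPs in the scan described above
ALPHA = 0.05
threshold = ALPHA / N_TESTS  # Bonferroni per-SNP threshold, 5e-07

# HYPOTHETICAL allele counts for one SNP: 200 cases and 200 controls (400 alleles each).
stat, p = allelic_chi2(120, 280, 60, 340)
print(f"chi2={stat:.1f} p={p:.2e} genome-wide significant: {p < threshold}")
```

The point of the threshold is that, with 100,000 tests, roughly 5,000 SNPs would pass an uncorrected p < 0.05 by chance alone.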
Having found an association, it would be essential to replicate the study. With Alzheimer’s disease, it is unlikely that most affected individuals still have living parents, so it will probably not be possible to perform a parent-offspring transmission disequilibrium test, but it may be possible to perform a sibling-based transmission disequilibrium study. Also, since the drug has already been found to increase aneurysms, it may have been withdrawn prior to completion of the clinical study, and it may not be possible to find enough individuals to perform a replication. I would also have to check whether the susceptibility allele is at a different frequency in different populations.
Even if an association is found, it is unlikely that one marker will predict who will have the adverse side-effects. It is possible that the identity of the gene might help the drug company develop another drug to counteract the aneurysms. Alternatively, the marker might be used to identify the at-risk population and exclude them from taking the drug, even though many individuals who might benefit would also be excluded.
21.
What are the three main methods used to identify genes in a genome sequence, and
what attributes of the gene annotation are currently most unreliable?
1. Experimental evidence for gene expression. For example, a match to an EST or cDNA
sequence in the database. Since genes are only transcribed in a subset of tissues, and many
have very low transcript abundance, failure to identify a cDNA does not mean that the
sequence is not part of a gene. Primers can be designed to specifically detect predicted
transcripts by RT-PCR.
2. Ab initio gene detection. Most gene annotation starts with Hidden Markov Models (HMMs) that search for ordered strings of promoters, start sites, exons, introns, stop codons, and 3’ polyA sites. Different parameters in the probabilistic models lead to different predictions, so multiple algorithms should be employed.
3. Search for a sequence match in the database of all genomes, generally using the Basic Local
Alignment Search Tool (BLAST). This looks for sequence conservation of at least 60
nucleotides (or 20 codons), and can be performed both with nucleotide and amino acid
sequences. It is based on the idea that genes are likely to be conserved over evolutionary time.
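To make the HMM idea in point 2 concrete, here is a toy two-state model decoded with the Viterbi algorithm. It is only a sketch: real gene finders use many states (promoters, start sites, exons, introns, stop codons, polyA sites) and trained parameters, whereas the two states and all probabilities here are hypothetical, with the "coding" state simply favoring G and C.

```python
import math

# Toy two-state HMM; ALL parameters are hypothetical and chosen only for illustration.
STATES = ("N", "C")  # N = non-coding (AT-rich here), C = "coding" (GC-rich here)
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(seq: str) -> str:
    """Return the most probable state path for seq (computed in log space)."""
    # best[s] = log-probability of the best path ending in state s at the current base
    best = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: best[r] + math.log(TRANS[r][s]))
            pointers[s] = prev
            new_best[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        backpointers.append(pointers)
        best = new_best
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATGCGCGCGCATATAT"))  # NNNNNNCCCCCCCCNNNNNN
```

Changing the transition or emission probabilities shifts where the boundaries fall, which is a small-scale version of why different parameterizations give different gene predictions.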
The most unreliable aspect of gene annotation is the gene structure. Gene prediction is pretty
good (perhaps 10% false prediction and failure to detect genes), but the identification of exon
boundaries and start sites is much more error prone. It is expensive and time-consuming to
determine mRNA structures completely, especially given alternative splicing. Another aspect
of poor annotation is the prediction of regulatory micro-RNAs and non-coding RNA genes.
22.
What is the difference between linkage mapping and linkage disequilibrium mapping?
Describe a general strategy for using both methods to identify a gene that predisposes human
children to autism.
Linkage mapping is performed in pedigrees, and is based on the idea that physically linked genes on a chromosome are likely to co-segregate. Consequently, markers within several centiMorgans tend to be linked and to give similar test statistics. Statistical methods are used to infer the most likely location of a gene based on the association between a set of adjacent markers and the phenotype.
Linkage disequilibrium mapping is performed in populations, and is based on the idea that after thousands of generations, recombination leaves the genome in small chunks of no more than a few hundred kilobases (on the order of 0.1 centiMorgans) that are in linkage disequilibrium. It has much higher resolution than linkage mapping, but requires two or three orders of magnitude more genetic markers.
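The decay that this contrast rests on can be illustrated with the standard recursion for two loci in a large random-mating population, D_t = (1 - c)^t D_0, where c is the recombination fraction per generation. The sketch below (hypothetical function name) compares markers about 0.1 cM apart (c ≈ 0.001) with markers about 5 cM apart (c ≈ 0.05), treating centiMorgans as percent recombination, an approximation that holds only at short distances.

```python
def ld_remaining(c: float, generations: int) -> float:
    """Fraction of the initial disequilibrium D0 left after t generations: (1 - c) ** t."""
    return (1.0 - c) ** generations


for label, c in (("~0.1 cM", 0.001), ("~5 cM", 0.05)):
    print(label, f"{ld_remaining(c, 1000):.3g}")
# ~0.1 cM keeps roughly 37% of its initial LD after 1,000 generations;
# ~5 cM keeps a vanishingly small fraction (on the order of 1e-23).
```

This is why LD mapping resolves much smaller intervals than pedigree-based linkage mapping, and why it needs far denser markers to detect anything at all.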
In general, linkage mapping could be used to narrow down an interval to 5 cM (several Mb) in a dozen or so pedigrees, each with several autistic children. Next, I would examine the predicted genes in the interval and develop a set of 100 or so tagging SNPs that are expected to capture most of the haplotype variation. I would then conduct a case-control association test on a large population of 500 autistic children compared with 500 age- and sex-matched control children from a single population. Subsequently, it would be necessary to replicate the study with an independent sample, or to perform a family-based transmission disequilibrium test.
23.
Write a short essay describing one ethical, one social, and one legal implication of the
Human Genome Project.
An example of an ethical implication is whether parents should be encouraged to use genotypic
information to make decisions about their family planning. In some cases, couples may be
advised not to get married because their children are likely to have a particular disease.
An example of a social implication is ensuring equal access to the benefits of genomic research
for all sectors of society: urban and rural, wealthy and poor, and all racial groups. There is
already a concern that particular races are less likely to participate in genomic studies on the
basis of past experience with eugenics and unfair treatment.
An example of a legal implication is the potential for stigmatization of people on the basis of
their genetic predisposition to disease, leading to failure to provide health insurance or
employment.