BIOSTAT516 Statistical Methods in Genetic Epidemiology
Handout1, prepared by Kathleen Kerr and Stephanie Monks
Autumn 2005
Rationale of Genetic Studies
Some goals of genetic studies include:
• identifying the genetic causes of phenotypic variation
• developing genetic tests
o benefits to individuals and to society are still uncertain
• developing drugs
o finding genes responsible for a disease, or even a sub-type of a disease,
provides valuable insight into how pathways could be targeted for drug
development
o identifying genetic profiles associated with adverse drug reactions
Data Explosion!
The amount of data available for use in genetic studies has exploded in the last decade.
In the past few years we have seen the release of the first drafts of the 3 billion base
pair human genome and the genomes of model organisms.
In a recent build of the human genome, annotation data are available for approximately
32,000 genes with around 18,000 “confirmed” genes. The typical confirmed human
gene has 12 exons of an average length of 236 base pairs each, separated by introns of
an average length of 5,478 base pairs.
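As a rough worked example of what these averages imply, the sketch below totals the exon and intron lengths of a hypothetical gene built from the figures quoted above. It assumes that 12 exons are separated by 11 introns and it ignores promoters and other flanking sequence; the variable names are illustrative only.

```python
# Averages quoted in the text for a typical confirmed human gene.
N_EXONS = 12
MEAN_EXON_BP = 236
MEAN_INTRON_BP = 5478

# Assumption (not stated in the handout): n exons are separated by n - 1 introns.
n_introns = N_EXONS - 1

exon_total_bp = N_EXONS * MEAN_EXON_BP
intron_total_bp = n_introns * MEAN_INTRON_BP
gene_span_bp = exon_total_bp + intron_total_bp
exon_fraction = exon_total_bp / gene_span_bp

print(exon_total_bp, intron_total_bp, gene_span_bp)
print(f"{exon_fraction:.1%} of the span is exonic")
```

Under these assumptions, exons account for less than 5% of the gene's span; most of the sequence is intronic.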
In addition, data are being generated daily on sequence variation between populations.
More and more data are becoming available that quantify the expression of these genes
at the mRNA and the protein level for a variety of tissues. As the genomes for more
and more organisms are sequenced, we have unprecedented homology information
between organisms.
The Need for Experimental Design and Statistics
With so much data and so many options, there is a pressing need for well-designed
studies that incorporate genetic variation along with the corresponding accurate and
efficient statistical methods. Our goals for the quarter will be
• to study potential designs that incorporate genetic data,
• to learn the corresponding methods for analyzing data from these designs
Our goals in these tasks will be to:
o understand the basic idea of each type of study
o know the assumptions each type of analysis depends on for validity
o understand the limitations of different types of studies
o learn how to correctly interpret study results
Some Basic Terminology
I recommend Chapter 1 of the Sham text for a quick introduction to these fundamental
concepts.
Biologists distinguish two types of cells, eukaryotic cells and prokaryotic cells.
Eukaryotic cells differ from prokaryotic cells in that eukaryotic cells contain many
organelles, small membrane-bound structures inside the cell that carry out
specialized functions. In particular, eukaryotic cells have a nucleus. Human
beings and probably any “animal” that you might think of are eukaryotes.
Bacteria are prokaryotes.
The nucleus in a eukaryotic cell contains most of the genetic material of the cell (and
therefore the organism); the genetic material is encoded in DNA, which is packaged into
chromosomes. The centromere is the attachment site for the spindle fiber that moves
the chromosome during cell division. The centromere defines two arms of the
chromosome, the short arm p and the long arm q. Chromosomes can be telocentric
(centromere at the end), acrocentric (centromere near one end), or metacentric
(centromere near the middle).
Chromosomes come in pairs. Chromosomes within a pair carry the same set of genes
and are called homologous. Chromosomes that carry different sets of genes are
called nonhomologous. In humans, the chromosomes in the pair that determines an
individual’s sex are called the sex chromosomes. All other chromosomes are referred to
as autosomes.
Every species has its own characteristic number of different chromosomes n. Humans
have 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes.
Therefore, there are 46 chromosomes in a human somatic cell. The autosomes are
numbered 1-22 from largest to smallest (except #22 is actually slightly larger than #21).
In humans, there are two sex chromosomes X and Y. Females have two X
chromosomes and males have one X and one Y. The mechanism of sex determination
is different in different species.
Mitosis is cell division that yields two identical diploid cells, which have two of each
chromosome. Meiosis is a special type of cell division that happens in reproductive
tissue yielding haploid cells (which have one of each chromosome) called gametes. In
females, the gametes are the egg cells and in males the gametes are the sperm cells.
Genetically, a chromosome is just a long string of DNA. DNA is a biochemical molecule,
but quantitative scientists think of it more as “information” in some sense. We think of
DNA as a long string of letters that come from a four-letter alphabet: A, T, G, C
(Adenine, Thymine, Guanine, Cytosine). DNA is a double-stranded molecule, with each
strand made up of A’s, T’s, G’s, and C’s. A very important property of DNA is
complementary base pairing between the two strands (see the figure on the next-to-last
page): A and T always pair and G and C always pair. Complementary base pairing means
that each single strand of DNA contains all the information for recreating the full
double-stranded molecule.
[Figure labels: Cell, Nucleus, Chromosome, DNA Molecule, Gene, Nucleotides]
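To make the "information" view concrete, here is a minimal sketch of how one strand determines the other under complementary base pairing. The function names and the example sequence are illustrative choices, not part of the handout.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement, aligned with the input strand."""
    return "".join(PAIRING[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

strand = "TGCATGCATGGTTGCA"    # arbitrary example, read 5' to 3'
partner = complement(strand)   # partner strand, read 3' to 5'

print(partner)
print(reverse_complement(strand))
```

Taking the complement twice returns the original strand, which is the sense in which either strand alone carries all the information.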
Some substrings of DNA encode a “recipe.” These substrings are genes. Specifically, a gene is a sequence of
DNA that is transcribed into mRNA (messenger RNA), which, in turn, is translated into
protein. Proteins are strings of amino acids. There are twenty different amino acids.
The genetic code is the codebook that gives the correspondence between DNA and
protein. Every triplet of DNA bases (a codon) corresponds to a specific amino acid, or
else signals START or STOP. The genetic code is almost universal across species.
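[Figure: the DNA of a gene (promoter, then exons I, II, and III separated by introns) is
transcribed; splicing removes the introns to give an mRNA containing exons I, II, and III;
the mRNA is translated into protein.]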
Double-stranded DNA:
5’...TGCATGCATGGTTGCA...3’ Coding or sense strand
3’...ACGTACGTACCAACGT...5’ Template or anti-sense strand
Transcription→reads template strand from 3’ to 5’ to produce mRNA
mRNA 5’...UGCAUGCAUGGUUGCA...3’
Translation→reads mRNA from 5’ to 3’ to produce polypeptides
N-terminal ...Cys Met His Gly Cys...C-terminal
1. Note that the coding strand is the one that is not used in transcribing the mRNA
molecule.
2. In transcription, the template strand is read from the 3’ to 5’ direction to produce
mRNA.
3. In translation, the mRNA is read from 5’ to 3’ to produce proteins.
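The example above can be reproduced with a short sketch. It makes two simplifying assumptions that match the example but not real biology: translation starts at the first base rather than at a START codon, and any trailing bases that do not fill a codon are ignored. The helper names are hypothetical; the codon table is the standard one, built compactly from a 64-letter string of one-letter amino acid codes.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in the order UUU, UUC, UUA, UUG, UCU, ...; '*' marks STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "STOP",
}

def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5' to 3') from a template strand written 3' to 5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3to5)

def translate(mrna: str) -> list[str]:
    """Translate codon by codon from the first base (a simplification)."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = THREE_LETTER[CODON_TABLE[mrna[i:i + 3]]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide

template = "ACGTACGTACCAACGT"  # template strand from the example, written 3' to 5'
mrna = transcribe(template)
peptide = translate(mrna)

print(mrna)
print(" ".join(peptide))
```

Because the mRNA is complementary to the template strand, it matches the coding strand with U in place of T, which is why the coding strand can be read directly when looking up codons.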
A specific location on a chromosome, for instance the location of a gene, a SNP (single-nucleotide polymorphism), or another genetic marker, is a locus (plural: loci). There
can be more than one form of a locus. These forms are called alleles. When there is
more than one allele at a locus, the locus is said to be polymorphic.
When two haploid gametes unite, the complete diploid number of chromosomes is
restored. Within each homologous pair, an individual therefore has one chromosome of
maternal origin and one chromosome of paternal origin. Thus for a given locus an
individual will have one allele of maternal origin and one allele of paternal origin.
Together these define an individual’s
genotype. If an individual has two copies of the same allele, then that individual is
homozygous at that locus. If an individual has two different alleles at a locus, then
s/he is heterozygous. Mendel’s First Law states that the two members of a gene pair
segregate (separate) from each other into the gametes, so that one-half of the gametes
carry one member of the pair and the other one-half of the gametes carry the other
member of the gene pair.
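Mendel’s First Law can be turned into a small calculation. The sketch below is one way to enumerate offspring genotype probabilities for a single locus; the allele labels "A" and "a" and the function names are hypothetical, and the only assumption is that each parent transmits either of its two alleles with probability 1/2, independently of the other parent.

```python
from fractions import Fraction
from itertools import product

def gametes(genotype: str) -> dict[str, Fraction]:
    """Each of the two alleles in a genotype is transmitted with probability 1/2."""
    probs: dict[str, Fraction] = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, Fraction(0)) + Fraction(1, 2)
    return probs

def cross(mother: str, father: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities for one locus (genotypes are unordered)."""
    offspring: dict[str, Fraction] = {}
    pairs = product(gametes(mother).items(), gametes(father).items())
    for (m_allele, m_prob), (f_allele, f_prob) in pairs:
        genotype = "".join(sorted(m_allele + f_allele))
        offspring[genotype] = offspring.get(genotype, Fraction(0)) + m_prob * f_prob
    return offspring

result = cross("Aa", "Aa")
for genotype, prob in result.items():
    print(genotype, prob)
```

A cross between two heterozygotes gives the familiar 1:2:1 ratio of genotypes, while `cross("AA", "aa")` yields only heterozygous offspring.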
Gregor Mendel conducted pioneering work in genetics, performing breeding
experiments in plants. It is useful to consider some experiments similar to Mendel’s to
become proficient in the basic concepts of genetics.
Here are some basic exercises that should help you master these background
concepts.
1. It is known that about 22 percent of the double-stranded DNA of an organism
consists of thymine. Can the other base percentages be determined? If so, what are
they?
If T is 22% then A must also be 22% due to complementary base pairing. This then
accounts for 44% of the composition. C and G must then account for 56%, and since
they must also be equal, each accounts for 28%.
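The same reasoning works for any thymine percentage in double-stranded DNA. A minimal sketch, with a hypothetical function name:

```python
def base_percentages_from_thymine(t_percent: float) -> dict[str, float]:
    """Base composition of double-stranded DNA given its thymine percentage."""
    if not 0 <= t_percent <= 50:
        raise ValueError("T cannot exceed 50% in double-stranded DNA")
    a_percent = t_percent                # A pairs with T, so A = T
    gc_each = (100 - 2 * t_percent) / 2  # G pairs with C, so they split the rest
    return {"A": a_percent, "T": t_percent, "G": gc_each, "C": gc_each}

composition = base_percentages_from_thymine(22)
print(composition)
```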
2. Double-stranded DNA with 300 nucleotide pairs has a base composition of A=0.32,
G=0.18, C=0.18, and T=0.32. Assume that a single strand of RNA is transcribed from
this gene. Can you determine, from the information given, the base composition of the
RNA? If so, what is it?
This cannot be determined from the information because the coding strand of DNA
could have, for example, all A and G and the template strand could be entirely C and T,
or vice versa. These are extreme cases, but show that the base composition of the
RNA could vary wildly.
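To see this concretely, the sketch below builds two hypothetical 300-base coding strands that give the same duplex composition as in the exercise but very different RNA compositions. The sequences and function names are made up for illustration, and the RNA is taken to match the coding strand with U in place of T.

```python
from collections import Counter

PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_composition(coding: str) -> dict[str, float]:
    """Base fractions over both strands of the double-stranded molecule."""
    template = "".join(PAIRING[base] for base in coding)
    counts = Counter(coding + template)
    total = 2 * len(coding)
    return {base: counts[base] / total for base in "AGCT"}

def rna_composition(coding: str) -> dict[str, float]:
    """Base fractions of the RNA, which matches the coding strand with U for T."""
    rna = coding.replace("T", "U")
    counts = Counter(rna)
    return {base: counts[base] / len(rna) for base in "AGCU"}

# Two hypothetical coding strands, each 300 bases long.
purine_only = "A" * 192 + "G" * 108
balanced = "A" * 96 + "T" * 96 + "G" * 54 + "C" * 54

print(duplex_composition(purine_only), rna_composition(purine_only))
print(duplex_composition(balanced), rna_composition(balanced))
```

Both duplexes have A=0.32, G=0.18, C=0.18, T=0.32, yet the first RNA is 64% A and 36% G while the second mirrors the duplex composition.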
3. A certain DNA virus has a base ratio of (A+G)/(C+T)=0.85. Is this single- or double-stranded DNA? Explain.
It must be single-stranded. In double-stranded DNA, A=T and G=C, so A+G=C+T and the ratio would be 1.
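A quick numerical check of this argument, using a hypothetical single strand with 17 purines and 20 pyrimidines (ratio 0.85) and the duplex formed by adding its complement:

```python
from fractions import Fraction

PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def purine_pyrimidine_ratio(bases: str) -> Fraction:
    """Return (A+G)/(C+T) as an exact fraction."""
    purines = sum(bases.count(base) for base in "AG")
    pyrimidines = sum(bases.count(base) for base in "CT")
    return Fraction(purines, pyrimidines)

# Hypothetical strand: 9 A + 8 G = 17 purines, 10 C + 10 T = 20 pyrimidines.
single = "A" * 9 + "G" * 8 + "C" * 10 + "T" * 10
duplex = single + "".join(PAIRING[base] for base in single)

print(float(purine_pyrimidine_ratio(single)))
print(purine_pyrimidine_ratio(duplex))
```

Every purine on one strand pairs with a pyrimidine on the other, so the duplex ratio is exactly 1 regardless of the single-strand ratio.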
4. Consider a DNA triplet pair:
3’GTC5’
5’CAG3’
where the top strand is the template strand from which the mRNA is transcribed. What
amino acid does the triplet code for?
We read the coding strand from 5’ to 3’ to see that the codon is CAG, which codes for
Glutamine.
5.
5’...TCGTTTAAGGGCTTGTGCGCCACGGAT...3’ coding strand
3’...AGCAAATTCCCGAACACGCGGTGCCTA...5’ template strand
[Position markers 1, 2, and 3 appear beneath the sequence.]
(a) What are the first three amino acids in the sequence?
Ser Phe Lys
(b) A base is added as the result of exposure to acridine dye (this is called a frameshift
mutation). At which position (2 or 3) would it likely have the most damaging effect on
the gene product? Explain.
Since translation happens in the 5’ to 3’ direction, an added base at position 2 is likely
more damaging since this would affect more codons.
(c) The base guanine is added at position 1. What effect would it have on the gene product?
The new sequence would be:
TCG TGT TAA
In mRNA form:
UCG UGU UAA
which would code for: Ser Cys STOP. Therefore, the second amino acid is Cys instead of Phe,
and translation stops prematurely.
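The effect of the insertion can be checked with a short sketch. It assumes that position 1 lies after the fourth base of the coding strand, which is what the answer above implies, and that translation begins at the first base shown. The helper names are hypothetical; the codon table is the standard one written in the DNA alphabet so the coding strand can be read directly.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes in the order TTT, TTC, TTA, TTG, TCT, ...; '*' marks STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(coding: str) -> str:
    """One-letter peptide read from the first base; a STOP ('*') ends translation."""
    peptide = ""
    for i in range(0, len(coding) - 2, 3):
        amino_acid = CODON_TABLE[coding[i:i + 3]]
        peptide += amino_acid
        if amino_acid == "*":
            break
    return peptide

def insert_base(coding: str, index: int, base: str) -> str:
    """Insert one base so that it becomes the base at 0-based position `index`."""
    return coding[:index] + base + coding[index:]

original = "TCGTTTAAGGGCTTGTGCGCCACGGAT"
# Assumption: position 1 is after the fourth base (TCGT | TTAAGG...).
mutant = insert_base(original, 4, "G")

print(translate(original))  # one-letter codes: S = Ser, F = Phe, K = Lys, ...
print(translate(mutant))
```

The original strand translates to nine amino acids beginning Ser Phe Lys, while the mutant gives Ser Cys and then a STOP, in agreement with the answer to part (c).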
The RNA Codons
(Rows are grouped by first nucleotide; columns give the second nucleotide; the last column gives the third nucleotide.)
1st    U                              C                     A                        G                     3rd
U      UUU Phenylalanine (Phe)        UCU Serine (Ser)      UAU Tyrosine (Tyr)       UGU Cysteine (Cys)    U
U      UUC Phe                        UCC Ser               UAC Tyr                  UGC Cys               C
U      UUA Leucine (Leu)              UCA Ser               UAA STOP                 UGA STOP              A
U      UUG Leu                        UCG Ser               UAG STOP                 UGG Tryptophan (Trp)  G
C      CUU Leucine (Leu)              CCU Proline (Pro)     CAU Histidine (His)      CGU Arginine (Arg)    U
C      CUC Leu                        CCC Pro               CAC His                  CGC Arg               C
C      CUA Leu                        CCA Pro               CAA Glutamine (Gln)      CGA Arg               A
C      CUG Leu                        CCG Pro               CAG Gln                  CGG Arg               G
A      AUU Isoleucine (Ile)           ACU Threonine (Thr)   AAU Asparagine (Asn)     AGU Serine (Ser)      U
A      AUC Ile                        ACC Thr               AAC Asn                  AGC Ser               C
A      AUA Ile                        ACA Thr               AAA Lysine (Lys)         AGA Arginine (Arg)    A
A      AUG Methionine (Met) or START  ACG Thr               AAG Lys                  AGG Arg               G
G      GUU Valine (Val)               GCU Alanine (Ala)     GAU Aspartic acid (Asp)  GGU Glycine (Gly)     U
G      GUC Val                        GCC Ala               GAC Asp                  GGC Gly               C
G      GUA Val                        GCA Ala               GAA Glutamic acid (Glu)  GGA Gly               A
G      GUG Val                        GCG Ala               GAG Glu                  GGG Gly               G