Computational Genetics, Lecture 1

Background readings: Chapters 2 and 3 of An Introduction to Genetics, Griffiths et al., 2000, Seventh Edition (CS/Fishbach/other libraries).

This class has been edited from several sources, primarily Terry Speed's homepage at Stanford and the Technion course "Introduction to Genetics". Changes made by Dan Geiger.

Course Information

Meetings:
- Lecture, by Dan Geiger: Thursdays 14:30–16:30, Taub.
- Tutorial, by Anna Tzemach: Thursdays 12:30–13:30, Taub 5.

Grade:
- 50% from five question sets. These question sets are obligatory. Each contains 4-6 theoretical problems. Submit in pairs within two weeks.
- 50% exam for undergraduates; a seminar for graduate students. A few undergraduate students may be allowed to replace the exam with a seminar lecture.

Information and handouts: http://www.cs.technion.ac.il/~anna_bi/cs236633/

Course Prerequisites

Computer science and probability background:
- Algorithms 1 (cs234247)
- Probability (any course)
- Algorithms in computational biology (recommended, or take in parallel)

Biology background:
- Formally: none, to allow CS students to take this course.
- Recommended: Introduction to Genetics (or take in parallel).

Course Goals

Learning about computational and mathematical methods for genetic analysis. We will focus on gene hunting: finding genes for simple human diseases.

Methods covered in depth: linkage analysis (using pedigree data) and association analysis (using random samples).

Another goal is to learn more about the use of Bayesian networks for genetic linkage analysis.

Human Genome

Most human cells contain 46 chromosomes:
- 2 sex chromosomes (X, Y): XY in males, XX in females.
- 22 pairs of chromosomes, named autosomes.

Genetic Information

- Gene: the basic unit of genetic information. Genes determine the inherited characters.
- Genome: the collection of genetic information.
- Chromosomes: storage units of genes.

Sexual Reproduction

[Figure labels: meiosis; gametes (egg, sperm); zygote.]

The Double Helix

[Figure. Source: Alberts et al.]

Central Dogma

Gene -> (transcription) -> mRNA -> (translation) -> Protein

Cells express different subsets of the genes in different tissues and under different conditions.

Chromosome Logical Structure

- Marker: genes, SNPs, tandem repeats.
- Locus: the location of a marker.
- Allele: one variant form of a marker.

Example: Locus 1, possible alleles A1, A2. Locus 2, possible alleles B1, B2, B3.

Alleles: the ABO locus example

Phenotype   Genotype
A           A/A, A/O
B           B/B, B/O
AB          A/B
O           O/O

O is recessive to A. A is dominant over O. A and B are codominant.
Multiple alleles: A, B, O.
Trait = Character = Phenotype

Concepts:
1. Recessive and dominant alleles. When a cell carries both the recessive and the dominant allele, the phenotype determined by the dominant allele prevails.
2. AA and aa are homozygotes for the dominant and the recessive allele, respectively. Aa is a heterozygote.
3. Multiple alleles (A, B, O).

X-linked (sex-linked) traits

[Figure: genotypes and their phenotypes.]

b: dominant allele. Namely, (b,b) and (b,w) are Black.
w: recessive allele. Namely, only (w,w) is White.

This is an example of an X-linked (sex-linked) trait/character. For males, b alone is Black and w alone is White. There is no homologous gene on the Y chromosome.

Mendel's Work

Modern genetics began with Mendel's experiments on garden peas (although the ramifications of his work were not realized during his lifetime). He studied seven contrasting pairs of characters, including:
- The form of ripe seeds: round, wrinkled
- The color of the seed albumen: yellow, green
- The length of the stem: long, short

Mendel, Gregor. 1866. Experiments on Plant Hybridization. Transactions of the Brünn Natural History Society.

Mendel's first law

Characters are controlled by pairs of genes which separate during the formation of the reproductive cells (meiosis): a parent Aa produces gametes A and a.

P: AA x aa
F1: Aa

F1 x F1: Aa x Aa
Gametes: A, a from each parent.

        A     a
A       AA    Aa
a       Aa    aa

F2: 1 AA : 2 Aa : 1 aa
Phenotype: 3 A : 1 a

Test cross: Aa x aa
Gametes: A, a from the F1 parent; a from the recessive parent.

        A     a
a       Aa    aa

Phenotype: 1 A : 1 a

Concepts:
1. Crossing F1 with itself: in the F2 generation, the ratio between offspring showing the dominant phenotype and those showing the recessive phenotype is 3:1.
2. Test cross: crossing F1 offspring with the parent that has the recessive phenotype. The ratio between offspring showing the dominant phenotype and those showing the recessive phenotype is 1:1.

Mendel's first law. Results of crosses in which parents differed for one character

Parental phenotype               F1         F2                            F2 ratio
1. Round x wrinkled seeds        round      5474 round; 1850 wrinkled     2.96:1
2. Yellow x green seeds          yellow     6022 yellow; 2001 green       3.01:1
3. Purple x white petals         purple     705 purple; 224 white         3.15:1
4. Inflated x pinched pods       inflated   882 inflated; 299 pinched     2.95:1
5. Green x yellow pods           green      428 green; 152 yellow         2.82:1
6. Axial x terminal flowers      axial      651 axial; 207 terminal       3.14:1
7. Long x short stems            long       787 long; 277 short           2.84:1

Conclusion, first law: The two members of a gene pair segregate from each other into the gametes.

[Figure: example of a pedigree with a recessive mutation (marriage between cousins).]

[Figure: Polydactyly, a dominant mutation.]

[Figure: Brachydactyly, a dominant mutation.]

Mendel's second law

When two or more pairs of genes segregate simultaneously, they do so independently. For a parent with genotype Aa; Bb, the gamete probabilities are:

AB: P_AB = P_A x P_B
Ab: P_Ab = P_A x P_b
aB: P_aB = P_a x P_B
ab: P_ab = P_a x P_b

Mendel's second law. A dihybrid cross for color and shape of pea seeds

P: wrinkled and yellow (rrYY) x round and green (RRyy)
F1: round yellow (Rr Yy)
F1 x F1: Rr Yy x Rr Yy

F2 phenotype        Count
round yellow        315
round green         108
wrinkled yellow     101
wrinkled green      32
Total               556

a. Check the segregation pattern for each allele in F2:
416 yellow : 140 green (2.97:1)
423 round : 133 wrinkled (3.18:1)

Conclusion: both traits behave as single genes, each carrying two different alleles.
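
The per-trait check above amounts to summing the F2 counts over the other trait and taking the ratio of the two totals. A minimal sketch of that arithmetic follows, using the counts from the dihybrid cross; the names `f2_counts` and `marginal` are illustrative and not part of the lecture.

```python
# F2 counts from the dihybrid cross, keyed by (seed shape, seed color).
f2_counts = {
    ("round", "yellow"): 315,
    ("round", "green"): 108,
    ("wrinkled", "yellow"): 101,
    ("wrinkled", "green"): 32,
}


def marginal(counts, position):
    """Sum counts over the other trait; position 0 is shape, 1 is color."""
    totals = {}
    for phenotype, n in counts.items():
        key = phenotype[position]
        totals[key] = totals.get(key, 0) + n
    return totals


shape = marginal(f2_counts, 0)   # round vs. wrinkled
color = marginal(f2_counts, 1)   # yellow vs. green

# Dominant-to-recessive ratio for each trait; Mendel's first law predicts about 3:1.
shape_ratio = shape["round"] / shape["wrinkled"]
color_ratio = color["yellow"] / color["green"]

print(shape, f"{shape_ratio:.2f}:1")
print(color, f"{color_ratio:.2f}:1")
```

Both ratios come out close to 3:1, which is what supports treating each trait as a single gene with two alleles before asking whether the two genes assort independently.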

Question: Is there independent assortment of alleles of the different genes?

- Probability to get yellow is 3/4; probability to get round is 3/4; probability to get yellow round is 3/4 x 3/4, namely 9/16.
- Probability to get yellow is 3/4; probability to get wrinkled is 1/4; probability to get yellow wrinkled is 3/4 x 1/4, namely 3/16.
- Probability to get green is 1/4; probability to get round is 3/4; probability to get green round is 1/4 x 3/4, namely 3/16.
- Probability to get green is 1/4; probability to get wrinkled is 1/4; probability to get green wrinkled is 1/4 x 1/4, namely 1/16.

A standard presentation in terms of counts

Phenotype          Expected ratio   Expected count   Observed count
yellow round       9                312.75           315
yellow wrinkled    3                104.25           101
green round        3                104.25           108
green wrinkled     1                34.75            32
Total              16               556              556

Conclusion, second law: Different gene pairs assort independently in gamete formation.

"Exceptions" to Mendel's Second Law

Morgan's fruit fly data (1909): 2,839 flies.

Eye color: A red, a purple.
Wing length: B normal, b vestigial.

Crosses: AABB x aabb gives F1 AaBb; then the test cross AaBb x aabb.

Offspring   Expected   Observed
AaBb        710        1,339
Aabb        710        151
aaBb        710        154
aabb        710        1,195

The alleles A and B stick together more often than expected from Mendel's law.

Morgan's explanation

[Figure: chromosome diagrams for the F1 and F2 generations; in the recombinant offspring a crossover has taken place between the two loci.]

Parental types: AaBb, aabb
Recombinants: Aabb, aaBb

The proportion of recombinants between the two genes (or characters) is called the recombination fraction between these two genes. It is usually denoted by r or θ.

For Morgan's traits: r = (151 + 154)/2839 = 0.107

If r < 1/2: the two genes are said to be linked.
If r = 1/2: independent segregation (Mendel's second law).

Recombination Phenomenon (happens during meiosis)

[Figure labels: male or female; recombination; haplotype; gametes: egg or sperm.]

[Figure: paired chromosomes showing chiasmata. The chiasma is the cytological manifestation of crossing over.]
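
The two calculations above, expected counts under independent assortment and the recombination fraction for Morgan's test cross, can be reproduced with a few lines of arithmetic. This is a sketch only; the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Single-trait F2 probabilities under Mendel's first law (3:1).
p_color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
p_shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}

# Observed F2 counts from the dihybrid cross, keyed by (color, shape).
observed = {
    ("yellow", "round"): 315,
    ("yellow", "wrinkled"): 101,
    ("green", "round"): 108,
    ("green", "wrinkled"): 32,
}
total = sum(observed.values())

# Under independent assortment the joint probability is the product
# of the single-trait probabilities (Mendel's second law).
expected = {
    (c, s): float(p_color[c] * p_shape[s] * total) for (c, s) in observed
}
for phenotype, n in observed.items():
    print(phenotype, "expected", expected[phenotype], "observed", n)

# Morgan's test-cross offspring counts.
morgan = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}
recombinant_types = ("Aabb", "aaBb")

# Recombination fraction: share of offspring of recombinant type.
r = sum(morgan[g] for g in recombinant_types) / sum(morgan.values())
print(f"r = {r:.3f}")
```

For the pea data the expected and observed counts are close, while for Morgan's flies r is well below 1/2, the signature of linkage.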

Example: ABO, AK1 on Chromosome 9

[Figure: pedigree of five individuals typed at the ABO locus (alleles A, O) and the AK1 locus (genotypes A1/A1, A1/A2, A2/A2), annotated "Phase inferred" and "Recombinant".]

Recombination fraction is 12/100 in males and 20/100 in females.
One centimorgan means one recombination per 100 meioses.
One centimorgan corresponds to approximately 1M nucleotides (with large variance), depending on location and sex.

[Figure: conventional symbols used in pedigrees.]

Maximum Likelihood Principle

What is the probability of the data for this pedigree, assuming a recessive mutation?
What is the probability of the data for this pedigree, assuming a dominant mutation?

Maximum likelihood principle: choose the model that maximizes the probability of the data.

One locus: founder probabilities

Founders are individuals whose parents are not in the pedigree. They may or may not be typed (namely, have their genotype measured). Either way, we need to assign probabilities to their actual or possible genotypes. This is usually done by assuming Hardy-Weinberg equilibrium (H-W). If the frequency of D is .01, then H-W says:

[Pedigree: founder 1, genotype Dd.]
pr(Dd) = 2 x .01 x .99

Genotypes of founder couples are (usually) treated as independent.

[Pedigree: founders 1 (Dd) and 2 (dd).]
pr(pop Dd, mom dd) = (2 x .01 x .99) x (.99)^2

One locus: transmission probabilities

Children get their genes from their parents' genes, independently, according to Mendel's laws; also independently for different children.

[Pedigree: parents 1 (Dd) and 2 (Dd); child 3 (dd).]
pr(kid 3 dd | pop 1 Dd & mom 2 Dd) = 1/2 x 1/2

One locus: transmission probabilities - II

[Pedigree: parents 1 (Dd) and 2 (Dd); children 3 (dd), 4 (Dd), 5 (DD).]
pr(3 dd & 4 Dd & 5 DD | 1 Dd & 2 Dd) = (1/2 x 1/2) x (2 x 1/2 x 1/2) x (1/2 x 1/2)

The factor 2 comes from summing over the two mutually exclusive and equiprobable ways 4 can get a D and a d.
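
The founder and transmission probabilities above can be written as two small functions. The sketch below assumes a single locus with two alleles, D and d, and genotypes written as two-letter strings; the function names are illustrative and do not come from any linkage package.

```python
from itertools import product


def hardy_weinberg(q):
    """Genotype frequencies when allele D has frequency q (H-W equilibrium)."""
    p = 1.0 - q
    return {"DD": q * q, "Dd": 2 * q * p, "dd": p * p}


def gamete_probs(genotype):
    """Each of a parent's two alleles is transmitted with probability 1/2."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs


def transmission(child, father, mother):
    """pr(child genotype | parental genotypes) under Mendel's first law."""
    total = 0.0
    pairs = product(gamete_probs(father).items(), gamete_probs(mother).items())
    for (a, pa), (b, pb) in pairs:
        # Genotypes are unordered, so "Dd" and "dD" are the same genotype.
        if sorted(a + b) == sorted(child):
            total += pa * pb
    return total


founders = hardy_weinberg(0.01)
couple = founders["Dd"] * founders["dd"]      # pr(pop Dd, mom dd)

# pr(3 dd & 4 Dd & 5 DD | 1 Dd & 2 Dd): children are independent given parents.
sibship = (
    transmission("dd", "Dd", "Dd")
    * transmission("Dd", "Dd", "Dd")
    * transmission("DD", "Dd", "Dd")
)
print(founders["Dd"], couple, sibship)
```

Summing over both orderings of the parental gametes inside `transmission` is what produces the factor 2 for the heterozygous child.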

One locus: penetrance probabilities

Pedigree analyses usually suppose that, given the genotype at all loci, and in some cases age and sex, the chance of having a particular phenotype depends only on the genotype at one locus, and is independent of all other factors: genotypes at other loci, environment, genotypes and phenotypes of relatives, etc.

Complete penetrance:
pr(affected | DD) = 1

Incomplete penetrance:
pr(affected | DD) = .8

One locus: penetrance - II

Age- and sex-dependent penetrance (liability classes):

[Pedigree: male with genotype DD, age 45.]
pr(affected | DD, male, 45 y.o.) = .6

[Figure: incomplete penetrance. An example of a dominant mutation in which the mutant phenotype is not always expressed. This healthy woman passes the dominant mutation to her daughter.]

One locus: putting it all together

[Pedigree: parents 1 (Dd) and 2 (Dd); children 3 (dd), 4 (Dd), 5 (DD).]

Assume penetrances pr(affected | dd) = .1, pr(affected | Dd) = .3, pr(affected | DD) = .8, and that allele D has frequency .01.

The probability of the data for this pedigree under these penetrances is the product:

(2 x .01 x .99 x .7) x (2 x .01 x .99 x .3) x (1/2 x 1/2 x .9) x (2 x 1/2 x 1/2 x .7) x (1/2 x 1/2 x .8)

This is a function of the penetrances. By the maximum likelihood principle, the penetrance values that maximize this probability are the ML estimates.

Fully penetrant Recessive Disease

[Pedigree: individuals 1 to 5.]

Let q be the probability of the disease allele. The probability of the data for this pedigree assuming full penetrance is the product:

L = (1-q) x q x (1-q) x q x (3/4)(3/4)(1/4)

Exercise: write the likelihood for a fully penetrant dominant disease.
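
The product on the "putting it all together" slide can be evaluated as a function of the penetrances. The sketch below is an assumption-laden reading of that slide: the affection statuses are inferred from the factors in the product (one Dd founder unaffected and one affected; child dd unaffected, child Dd unaffected, child DD affected), and all names are illustrative.

```python
Q = 0.01  # frequency of allele D, as stated on the slide

# Transmission probabilities for the children of a Dd x Dd couple.
TRANSMISSION = {"DD": 0.25, "Dd": 0.5, "dd": 0.25}

# (genotype, affected?) per individual; statuses are inferred from the
# slide's product, not read from the pedigree drawing itself.
FOUNDERS = [("Dd", False), ("Dd", True)]
CHILDREN = [("dd", False), ("Dd", False), ("DD", True)]


def founder_prob(genotype, q=Q):
    """Hardy-Weinberg genotype frequency for a founder."""
    p = 1.0 - q
    return {"DD": q * q, "Dd": 2 * q * p, "dd": p * p}[genotype]


def phenotype_prob(genotype, affected, penetrance):
    """pr(phenotype | genotype), where penetrance is pr(affected | genotype)."""
    f = penetrance[genotype]
    return f if affected else 1.0 - f


def likelihood(penetrance):
    """Probability of the pedigree data as a function of the penetrances."""
    value = 1.0
    for genotype, affected in FOUNDERS:
        value *= founder_prob(genotype) * phenotype_prob(genotype, affected, penetrance)
    for genotype, affected in CHILDREN:
        value *= TRANSMISSION[genotype] * phenotype_prob(genotype, affected, penetrance)
    return value


slide_penetrance = {"dd": 0.1, "Dd": 0.3, "DD": 0.8}
L = likelihood(slide_penetrance)
print(f"likelihood at the slide's penetrances: {L:.4e}")

# Evaluating another candidate shows how the maximum likelihood principle
# would rank parameter values against each other.
L_alt = likelihood({"dd": 0.1, "Dd": 0.5, "DD": 0.8})
print(f"likelihood with pr(affected | Dd) = 0.5: {L_alt:.4e}")
```

A real analysis would maximize over a grid or with an optimizer and would sum over the genotypes of untyped individuals; here every genotype is taken as observed, as on the slide.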