* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 2610//16 1 Allele-specific expression, ASE [1] Defini8on of allele
Neuronal ceroid lipofuscinosis wikipedia , lookup
Human genome wikipedia , lookup
Genetic engineering wikipedia , lookup
Hardy–Weinberg principle wikipedia , lookup
Genetic drift wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Gene therapy of the human retina wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Copy-number variation wikipedia , lookup
Gene therapy wikipedia , lookup
History of genetic engineering wikipedia , lookup
Genome (book) wikipedia , lookup
Gene desert wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Gene nomenclature wikipedia , lookup
Pharmacogenomics wikipedia , lookup
Genomic library wikipedia , lookup
Public health genomics wikipedia , lookup
Human Genome Project wikipedia , lookup
Pathogenomics wikipedia , lookup
Genomic imprinting wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Gene expression profiling wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Helitron (biology) wikipedia , lookup
Metagenomics wikipedia , lookup
Genome editing wikipedia , lookup
Gene expression programming wikipedia , lookup
Dominance (genetics) wikipedia , lookup
Designer baby wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Genome evolution wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
2610//16 SciLifeLabRNA-seqcourse Allele-specificexpression,ASE OlofEmanuelsson KTHRoyalInsBtuteofTechnology [email protected] 2016-10-27,11:00-12:00Navet(E10),BMC,Uppsala [1]Defini=onofallele-specific expression(ASE) Addinganotherlayertotranscriptomecomplexity... ♀ ♂ Adopted from Unneberg, 2010 ...andeachgeneispresentontwochromosomes. =>ithastwoalleles ASE:allele-specificexpression Outline 1. Defini=onofASE 2. Detec=ngASE(introductorycase) 3. Applica=onsandprevalenceofASE 4.ImportantASEconsidera=ons (a)Variantcalling (b)MappingbiasASEtools (c)Manyvariantsinagene 5.ASEtools 6. GeneiASE–atooltodetectgeneswithASEfromRNA- seqdata Addinganotherlayertotranscriptomecomplexity... Adopted from Unneberg, 2010 Onegenecanproducemanydifferenttranscripts... Allele,defini=on Analleleisthevariantformofagivengene(orlocus). SomeBmes,differentallelescanresultindifferent observablephenotypictraits,suchasdifferentpigmentaBon. /…/ Ifbothallelesatagene(orlocus)onthehomologous chromosomesarethesame,theyandtheorganismare homozygouswithrespecttothatgene(orlocus).Ifthe allelesaredifferent,theyandtheorganismare heterozygouswithrespecttothatgene(orlocus). hXps://en.wikipedia.org/wiki/Allele 1 2610//16 Allele-specificexpression,defini=on AnimbalanceintranscripBonbetweenthematernaland paternalallelesatalocus. • I.e.,adevia=onfromtheexpected50/50ra=oof transcripBonfromthetwoallelesofadiploidorganism. • Canbeassessedwithinasingleindividual (Presentalsowhenploidy>2,e.g.,plants) Othereventsmayalsobe“allele-specific”,e.g. • transcripBonfactorbinding • DNAbackbonemethylaBon • X-chromosomeinacBvaBoninfemalemammals Detec=ngallele-specificexpression Wetlabtechnologies: • microarrays(ifdesignedproperly) • qRT-PCR+TaqMan • pyrosequencing • RNA-seq N.B.:asthesearesequence-basedtheywillnotprovideany informaBoninthecaseofahomozygousallele,althoughit maysBllbeexpressedpredominantlyfromonlyoneofthe chromosomes. eQTL–expressionquan=ta=vetraitloci Anotherapproach! Requiresmanysubjects [2]Detec=ngASE Detec=ngallele-specificexpressionusingRNA-seqdata • RNA-seqreadsprovidethesequenceofatranscript • ...whichenablesthedeterminaBonoftheallelicorigin ofthereadsoverlappingwiththeSNV Diploidgenome SNV Allele-specificgene expression(mRNAs) U U Allele-specificexpression,defini=on genomicDNA->transcript(e.g.mRNA) Allele-specificgene Diploidgenome expression(mRNAs) U U SNV • SNV=singlenucleoBdevariant • ThegenomicSNVisreflectedinthetranscribedRNA(Tis UinRNA). RNA-seqreads T T C Detec=ngallele-specificexpressionusingRNA-seqdata Generaloutline: 1.MaptheRNA-seqreads 2.Countthereadsthatmaptoeitherallele 3.Calculateeffectsizeandp-value C C C 2 2610//16 Detec=ngallele-specificexpressionusingRNA-seqdata 1.MaptheRNA-seqreads Detec=ngallele-specificexpressionusingRNA-seqdata 1.MaptheRNA-seqreads Detec=ngallele-specificexpressionusingRNA-seqdata 2.Countthereads Detec=ngallele-specificexpressionusingRNA-seqdata 3.Calculateeffectsizeandp-value Paternalallele(a) Maternalallele(A) …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… Reads–10xcoverageofthelocus …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCCAATTAGC… Paternalallele(a) Maternalallele(A) …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… Mappedreads …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCCAATTAGC… Paternalallele(a) Maternalallele(A) 3x …AGTCTTCCAATTAGC… 7x …AGTCTTCTAATTAGC… 3readsmappedtopaternalallele 7readsmappedtomaternalallele Intotal10readsmappedtothelocus Effectsize:(otherdefiniBonspossible) ASEeffect=calt/(calt+cref)–0.5 i.e.,thefracBonofcountsmappedtoalternaBvealleleminus0.5=> ifnoASEthenASEeffect=0 rangeofASEeffectis[-0.5,0.5] P-value:Usebinomialwithp=0.5(assuming50/50transcripBon) Ourexamplefrompreviousslide: Effectsize=ASEeffect=calt/(calt+cref)–0.5=3/(3+7)-0.5=–0.2 P-value:binomialtestfordeviaBonfrom50/50distribuBonbetween alleles(inR): > pbinom(3, size=10, prob=0.5) [1] 0.171875 ⇒ NotsignificantinthisparBcularexample ⇒ Ifcoveragewas30x(9+21reads)insteadof10x(3+7),thenp-value<0.03 eQTLvs.ASE eQTL ASE • Sufficientpowerwithasingle • Inter-individualdifferencesinexpression individual • Modesteffects • IdenBcalcellularenvironmentfor • LargenumberofSNP-genecombinaBons thetwochromosomes • Manysamplesneeded • NoassociaBontoregulatoryregion • Mayusemicroarraysforgeneexpression • MustuseRNA-seqforgene • Genotypingrequired expression [3]Applica=onsandprevalenceofASE 10individualsgenotyped 3 2610//16 Applica=onofASE Applica=onofASE Findproteinvariants Findcis-regulatoryvariant Allele-specificgene expression(mRNAs) cis-regulatory variant U U SNV PossibletodetectifyoualsohaveinformaBonaboutnon-transcribed variants(e.g.,fromwhole-genomeDNAsequencingorSNP-array). Diploidgenome Allele-specificgene expression(mRNAs) Differentproteins U U SNV Toinferachangedprotein,theSNVmustbe • incodingregion • non-synonymous Applica=onofASE Normalvs.tumorexpression Possibletodetectifyouhaveexpressionmeasuredfrombothnormaland tumorBssue(inthesameindividual). PrevalenceofASE GeneswithsignificantASE(%ofgeneswithheterozygousvariant). ImportantASEconsidera=ons [4]ImportantASEconsidera=ons (a) Variantdetec=on (b)Mappingbias (c)Manyvariantsinagene 4 2610//16 Variantdetec=on [4]ImportantASEconsidera=ons: (a)Variantdetec=on Variant=aposiBoninthegenomethatisdifferentfromanothergenome. • Homozygousvariant:thetwoallelesareidenBcaltoeachother • Heterozygousvariant:thetwoallelesaredifferent • “Ref.”=thealleleisthesameasforthereferencegenome • “Alt.”=alternate=thealleleisdifferentfromthereferencegenome • SNVisonetypeofvariant,othersincludeinserBon,deleBon,... Variantdetec=on=detecBngwhatvariantsarepresentinasample: 1. Variantcalling–anyposiBonwithevidenceofanalternaBvebase 2. VariantprioriBzaBon–definereliablevariantswithhighconfidence TypicallyperformedbasedongenomicDNAdata,from • Microarrays(e.g.IlluminaOmni2.5M) • Sequencing(e.g.whole-genomere-sequencingorexomesequencing) Variantdetec=onfromsequencingdata Startbymapthereads. Paternalallele(a) Maternalallele(A) …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… Reads–10xcoverageofthelocus …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCCAATTAGC… Variantdetec=onfromsequencingdata Thisiswhatweactuallyhave: Paternalallele(a) Maternalallele(A) Referencesequence …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… Mappedreads …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCCAATTAGC… =>needtodetectthevariantposiBonsinthereferencesequence Variantdetec=onfromsequencingdata OK,pieceofcake? Paternalallele(a) Maternalallele(A) …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… Mappedreads …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCTAATTAGC… …AGTCTTCCAATTAGC… …AGTCTTCCAATTAGC… Variantdetec=onfromsequencingdata Standard:GATK(DePristoetal.,2011)orSamtools–worksonanymapped sequencingdata. GATKscorestheSNVsbytakingintoaccountanumberofcharacterisBcs, including: • Sequencingdepth(coverage) • Mappingquality • PosiBonbias(basequality) SpecificRNA-seqbasedtools: • Colib’read–LeBrasetal.,2016 • RVboost–Wangetal.,2014 • ACCUSA2–PiechoXaetal.,2013 GATKthemostwidelyused,evenforRNA-seq. 5 2610//16 Variantdetec=on–whichvariantstouse(priori=za=on)? VCFisatextfileformat(“flattext”).ExampleVCFoutputfromGATK: VariantsfromRNA-seq • Whatsequencingdepth? influencesthepower,seeè • Heterozygous • Othercriteria Filteringofknownvariants • KeeponlyvariantsindbSNP? ##fileformat=VCFv4.1 ... #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1_NA12878 [SAMPLE1_BLALBA] ... 1 873762 . T C 5231.78 PASS [ANNOTATIONS] GT:AD:DP:GQ:PL 0/1:173,141:282:99:255,0,255 ... 1 877664 rs3828047 A G 3931.66 PASS [ANNOTATIONS] GT:AD:DP:GQ:PL 1/1:0,105:94:99:255,255,0... 1 899282 rs28548431 C T 71.77 PASS [ANNOTATIONS] GT:AD:DP:GQ:PL 0/1:1,3:4:26:103,0,26 ... 1 974165 rs9442391 T C 29.84 LowQual [ANNOTATIONS] GT:AD:DP:GQ:PL 0/1:14,4:14:61:61,0,255... ... GT:thegenotypeofthissampleatthissite(0/0,0/1,1/1,1/2,...).0=ref.,1=alt. AD:alleledepths,i.e.,thenumberofreadsthatsupporteachofthereported alleles GQ:qualityofassignedgenotype(max=99) FullspecificaBonofVCFfileformat:hXp://samtools.github.io/hts-specs/ AllelicraBo Variantdetec=on–VCF,VariantCallFormat :40 :67 :30 :20 50 Sequencingdepth Mappingbias [4]ImportantASEconsidera=ons: (b)Mappingbias Mappingbias–exampleinrealdata Heterozygousvariants(alt/ref)mapped toreferencegenome. X-axis,alternateallelefracBon[0,1] Y-axis,density (Datafrom16RNA-seqexperiments; VariantsdetectedwithRNA-seqdata orSNParray). Referencegenomevariants(“ref.”)haveanadvantageinthemapping. Maternalallele:…ATCGAATGAAGCTCATTGGATCAGAT… (ref.) Paternalallele: …ATCGAATGAAGCTTATTGGATCAGAT… (alt.) Reference: …ATCGAATGAAGCTCATTGGATCAGAT… Mappingofreads Readfrommaternalallele: AGCTCATT :::::::: Reference: ATCGAATGAAGCTCATTGGATCAGAT :::: ::: Readfrompaternalallele: AGCTTATT Thepaternalallelereadwillmapwithalowermappingquality. IncaseofsequencingerrororpoorbasequalityatanotherposiBon,this mightpushthemappingqualityofthepaternalallelereadbelowthe threshold,andthereadwillbediscarded. Mappingbias–waystogetarounditinASEdetec=on Maskingvariants({A,C,G,T}=>N) YoulooseinformaBon. ConstructallpossibleversionsofthegenomefromexisBngvariants CansoongenerateaprohibiBveamountofgenomeversions. Mapreadstodiploidgenome(ortranscriptome) Requiresthatyoueitherhaveorconstructthediploidgenome(or transcriptome)oftheindividual. Modfiythebinomialprobabilityptoreflectthemappingbias. RequiressimulaBontoproperlymodifyp. 6 2610//16 Mappingbias–waystogetarounditinASEdetec=on Maskingvariants({A,C,G,T}=>N) YoulooseinformaBon. ConstructallpossibleversionsofthegenomefromexisBngvariants CansoongenerateaprohibiBveamountofgenomeversions. Mapreadstodiploidgenome(ortranscriptome) Requiresthatyoueitherhaveorconstructthediploidgenome(or transcriptome)oftheindividual.E.g.,Turroetal.(2011)(transcriptome). Modfiythebinomialprobabilityptoreflectthemappingbias. RequiressimulaBontoproperlymodifyp.E.g.,Montgomeryetal.(2010). [4]ImportantASEconsidera=ons: (c)Manyvariantsinagene Manyvariantsinagene Manyvariantsinagene Morethanonevariantwithinageneiscommon: Allele-specificgene Diploidgenome expression(mRNAs) A U A U A 2different“haploisoforms” G C G G C SNV_1SNV_2 G C RNA-seqcontainsinformaBonthattherearetwoheterozygousSNVs. Allele-specificgene Diploidgenome RNA-seqreads expression(mRNAs) A T A U T A U A A G C G G G C C G SNV_1SNV_2 G C C G C G G G C C G C VariantsdetectedfromRNA-seqreads: • A/Gatonelocus • T/Catonelocus Manyvariantsinagene Manyvariantsinagene–phasing But:RNA-seqdoesnotnecessarilycapturetherelaBonbetweenthetwo SNVs. Allele-specificgene Diploidgenome RNA-seqreads expression(mRNAs) G C A U T A U A C G C G C G G G C C SNV_1SNV_2 G C A G T G C G A G C PossiblecombinaBons(haplotypes)fromRNA-seqreads: • A+TandG+C • A+CandG+T Phasing=decidingwhichallelesthatareonthesamechromosomal homologue Diploidgenome RNA-seqreads A U UC G A A T G C G C SNV_1SNV_2 G C G A G A T G A PossiblecombinaBons(haplotypes)fromRNA-seqreads: AT GC • A+TandG+C:and (2+2different AC GT • A+CandG+T:and “haploisoforms”) ButcanourRNA-seqreadsprovidethephase? G G C C 7 2610//16 PhasingisusefulbutnotnecessarytodetectASE PhasinginformaBontypicallyachievedbysequencingthegenomesofthe parentsofthesubject.Directhaplotypesequencingalsopossible. Ifyoudon’tknowthephase(andformostRNA-seqdatasets,youdon’t): • Trytoinferitfrom (a)yourRNA-seqdata–possible,buttypicallyonlyparBalphasing (b)exisBngpopulaBondata(LD)–notapplicableonnewvariants • DisregardfromitandcalculateASEanyway Phasing • reducesmappingbias • enablesthedetecBonofhaploisoformexpression(isoformsrepresenBng thetwohomologouschromosomes) • butisnotnecessarytodetectASEingeneswith>1SNV ASEtools [5]ASEtools ASEtools–whereonlyRNA-seqdatafromasingle individualisrequired. AlistoftoolsthatcandetectASE,givenspecifiedinputdata: • cisASE–pairedgenomic+transcriptomicdata,Liuetal.,2016 • MutRSeq–nonsynonomousSNVsfromRNA-seqdata,Fuetal.,2016 • GeneiASE–unphasedRNA-seqdata,Edsgärdetal.,2016 • ASE-TIGAR–parentaldatarequired,bayesian,Nariaietal.,2016 • ASEQ–pairedgenomic+transcriptomicdata,Romaneletal.,2015 • MBASED–phasedorunphasedRNA-seqdata,Maybaetal.,2015 • Allim–parentaldatarequired,Pandeyetal.,2013 • MMSEQ–aXemptshaploisoformidenBficaBon,Turroetal.,2011 • (Skelly)–requiresphaseddata,Skellyetal.,2011 • AlleleSeq–requiresgenomicsequence,Rozowskyetal.,2011 • (AlleleDB–databaseforASEetc.of1000genomes,Chenetal.,2016) • • • • • • • • • • • GeneiASE [6]GeneiASE cisASE–pairedgenomic+transcriptomicdata,Liuetal.,2016 MutRSeq–nonsynonomousSNVsfromRNA-seqdata,Fuetal.,2016 GeneiASE–unphasedRNA-seqdata,Edsgärdetal.,2016 ASE-TIGAR–parentaldatarequired,bayesian,Nariaietal.,2016 ASEQ–pairedgenomic+transcriptomicdata,Romaneletal.,2015 MBASED–phasedorunphasedRNA-seqdata,Maybaetal.,2015 Allim–parentaldatarequired,Pandeyetal.,2013 MMSEQ–aXemptshaploisoformidenBficaBon,Turroetal.,2011 (Skelly)–requiresphaseddata,Skellyetal.,2011 AlleleSeq–requiresgenomicsequence,Rozowskyetal.,2011 (AlleleDB–databaseforASEetc.of1000genomes,Chenetal.,2016) GeneiASEdetectsgeneswithsignificantASE,insingleindividualsandbased onlyonRNA-seqdata.HaplotypeinformaBon(phasing)isnotneeded. Datarequired: • RNA-seqdata Pre-processingrequired: • Mappingandqualitycontrolofreads • VariantdetecBon(e.g.,GATK) • Filtervariantsifdesired • Allelecountsforvariantsextractedintocustominputtextfile Availability: • Edsgärdetal.,Scien9ficReports6:21134,2016 • hXps://github.com/edsgard/geneiase(GNUGPL3license) 8 2610//16 GeneiASE Thesitua9on: Gene i SNV_1 SNV_2 SNV_3 SNV... • unphaseddata Ref 60 20 70 ... • non-uniformeffectwithingene Alt 40 80 30 ... • technicalvariability TheGeneiASEsolu9on: 1. Foreachgene,loopoverallitsSNVsandtheir2x1matrixofreadcounts 2. CalculateateststaBsBc(sij)foreachSNV,basedonreadcounts 3. CombinetheteststaBsBcsfortheSNVswithinagene=>teststaBsBcfor enBregene(gi)asdf 4. ResamplefromparametricnullSNVmodel(esBmatedfromDNAdata)105 Bmes,calculatetheresulBngdistribuBonofgeneteststaBsBc(g0i). 5. Comparegitog0iandcalculateap-valueforgenei. GeneiASE–nullmodel,andgene-wisep-valuecalcula=on 0.EsBmateSNVnullmodelparameters DNAbasedesBmateofthetechnicalvariability GeneiASE–calculateSNP-basedgene-wiseteststa=s=c 1.CountreadsforeachSNVinagene;add pseudocountsifrequired 2.CalculateSNVteststaBsBcsijbasedon absolutevalueofeffectsize,eff. Absolutevalueofeffectsize=>Undirectedeffect 3.CalculategeneteststaBsBcgiusing Stouffer-Liptakmethod;kisnumberof SNVsingenei RunningGeneiASE–input Sta=cASE • geneiase-cvtl-icvtl.test.input.tab Foreachgene(genei): 1.SampleallelecountsfromnullSNVmodel (Randomeffectmodel) 2.CalculatekSNVteststaBsBc k=numberofSNVsingenei 3.CalculategeneteststaBsBc (Stouffer-Liptak) Condi=on-dependentASE • geneiase-icd-iicd.test.input.tab 4.Reiterate1-3NBmes(default:105) 5.Calculatep-valueforgenei Onelinepergene. Outputcolumns: • feat:FeatureIDasspecifiedintheinputfile(typicallyageneidenBfier) • n.vars:Numberofvariantswithinthegene • mean.s:Meanofsacrossthevariantswithinthegene • median.s:Medianofsacrossthevariantswithinthegene • sd.s:StandarddeviaBonofsacrossthevariantswithinthegene • cv.s:CoefficientofvariaBonofsacrossthevariantswithinthegene • liptak.s:Stouffer-LiptakcombinaBonofs(calledgonpreviousslides) • p.nom:Nominalp-value • fdr:Benjamini-Hochbergcorrectedp-value (Reminder:sistheeffectsize-basedteststaBsBcforeachSNVinagene). RunningGeneiASE–results Thenumberofgeneswithsignificant(fdr<0.05)ASEasdetectedby GeneiASEfrom16RNA-seqsamples(primarywhitebloodcells). Numberofgenes (with≥2SNVs) RunningGeneiASE–output Millionmappedhigh-qualityreads 9 2610//16 Thankyouforyourafen=on contact:[email protected] 10