Bioinformatics Tools for Genotyping

Frances Tong
Dr. Garry Larson, Ph.D.
City of Hope Department of Molecular Medicine
Southern California Bioinformatics Institute, Summer 2003
Funded by the National Science Foundation and the National Institutes of Health

Overview of Summer Program
* Learn ASP and VBScript
* Learn the biology
* Programming Project I: writing code for mining of online genetic data
* Programming Project II: writing a program to graph linkage disequilibrium data

Intro to ASP & VBScript
ASP: Microsoft Active Server Pages
* server-generated web pages
* similar to CGI but easier
* works well with databases
VBScript: Microsoft Visual Basic Scripting
* scripting language to enhance HTML web pages
* default language of ASP

Hello World!
Sample ASP file (one line only!)
<% response.write("Hello, World!") %>

Genetic Mapping of ASPs
ASPs: affected sibling pairs
* Identification of genes associated with cancer in patients and siblings who both have cancer (breast, prostate, lung or colon)
* Determine allele sharing statistics of susceptibility genes
* Look at gene-gene interactions
=> Provide information on a person’s genetic risk of developing cancer

DNA Marker Genotyping
Genetic marker: a polymorphic gene or section of DNA that has an identifiable physical location on a chromosome and is used to trace inheritance
Ex.
Microsatellite and SNP markers

CDC2d Microsatellite Amplicon ~ 230 bp
AGTCCCAAAGAAGATGAGAC agactaaaccatcaactggaagtgaaaaaaatatagtcattg
>>>>>>>>>>>>>>>>>>>>>>>>
Microsatellite repeat (ca)15
aaattaaaa cacacacacacacacacacacacacacacaca cacacacacacacacacacacacacacaca ctaggtgaaacaactttatagatggaacaactc ca
tacagaaaaagaattcatgaattggaaaattatagtggggaattcacatagaatgcatcacaaagagcaaaatgaatt TCTAAGGGGCAAAGCAAAGCA
<<<<<<<<<<<<<<<<<<<<<<<<<<

Programming Project I: Tag Selection for Markers
* Need a unique way to identify markers (like social security numbers for people)
* Chromosome locations are relative and change frequently (UCSC)
* Use ASP to automate data mining to ease the generation of these unique 50 base-pair tags for each marker in the database
* Tags will be used to locate markers in the genome

UCSC Genome Browser

Marker Tag Selection
* Submit accession number for microsatellite
* Submit accession number for SNP
* Submit sequence surrounding simple repeat

Output
* Chromosome
* Sequence start position
* Sequence end position
* Link to UCSC browser
* Inputted sequence with repeats highlighted in blue

Choosing a 50 bp tag
* Copy and paste here
* Send sequence to UCSC

UCSC Blat Results
Blat is similar to BLAST: it searches for alignments in the genome

List of markers and their tags

Convert to FASTA format
FASTA format:
>name
sequence
The program converts the marker tag file into FASTA format automatically

Check tag selection
The program sends the FASTA file to UCSC Blat

Linkage Disequilibrium
A condition where two polymorphisms are found together on the same chromosome at a greater frequency than that predicted from the product of their individual frequencies.
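The definition above can be checked numerically. What follows is a minimal Python sketch, not the code written for this project: it multiplies the allele frequencies of two SNPs to get the expected haplotype frequencies, then lists any haplotype whose observed frequency exceeds that product. The function names are assumptions made for illustration; the frequencies are the ones used on the slides that follow.

```python
from itertools import product


def expected_haplotype_freqs(snp1, snp2):
    """Expected frequency of each haplotype if the two SNPs were independent."""
    return {(a, b): snp1[a] * snp2[b] for a, b in product(snp1, snp2)}


def haplotypes_in_excess(observed, expected):
    """Haplotypes observed more often than the product of allele frequencies."""
    return [hap for hap, freq in observed.items() if freq > expected[hap]]


# Allele frequencies and observed haplotype frequencies taken from the slides.
snp1 = {"G": 0.88, "A": 0.12}
snp2 = {"T": 0.75, "C": 0.25}
observed = {
    ("G", "T"): 0.54,
    ("G", "C"): 0.20,
    ("A", "T"): 0.24,
    ("A", "C"): 0.02,
}

expected = expected_haplotype_freqs(snp1, snp2)
for hap, freq in expected.items():
    # Print haplotype, expected frequency (2 decimals), observed frequency.
    print(hap, round(freq, 2), observed[hap])

print(haplotypes_in_excess(observed, expected))  # [('A', 'T')]
```

Only the A and T haplotype is observed more often than its expected frequency, which matches the conclusion drawn on the slides.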
Two SNPs and their base frequencies

5’ ---- G/A ---- T/C ---- 3’
G: 0.88    T: 0.75
A: 0.12    C: 0.25

5’ G T 3’    (0.88)(0.75) = 0.66
5’ G C 3’    (0.88)(0.25) = 0.22
5’ A T 3’    (0.12)(0.75) = 0.09
5’ A C 3’    (0.12)(0.25) = 0.03

Expected frequencies
IF observed frequencies of 2 variants together > expected frequencies => LINKAGE DISEQUILIBRIUM

Haplotype    Expected Frequencies    Observed Frequencies
G&T          0.66                    0.54
G&C          0.22                    0.20
A&T          0.09                    0.24
A&C          0.03                    0.02

A and T together are in linkage disequilibrium

A Quantitative Measure of LD
* One of the most common measures of linkage disequilibrium is r^2
* It is a squared correlation coefficient => the correlation of alleles at two sites
* Special case: r^2 = 1 (“perfect LD”)
  ~ Exactly two out of the four possible haplotypes are observed
  ~ Markers NOT separated by recombination

Programming Project II
* Program that helps visualize linkage disequilibrium by graphing scores such as r^2
* Each pair of markers has such a score => pairwise comparisons
[Figure: 3 x 3 matrix of pairwise scores for Marker 1, Marker 2 and Marker 3; the values 0.7, 1 and 0.2 each appear twice, mirrored across the diagonal. Symmetric!]

Sample data for graphing
* Read data by row
* Pairwise comparison of marker 1 and marker 7 results in two different kinds of measurements

GOLD – Graphical Overview of Linkage Disequilibrium
* Existing program from the Univ. of Michigan to graph linkage disequilibrium
* http://www.sph.umich.edu/csg/abecasis/GOLD/
* Graphs based on a chromosomal position scale
* Works very well for long-range pattern analysis, but it is hard to distinguish each specific measurement
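As a hedged illustration of the squared correlation coefficient introduced earlier, here is a minimal Python sketch that computes r^2 from the haplotype frequencies of two biallelic markers. It uses the standard definition D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The function name and the example frequencies are hypothetical; they are not taken from the slides or from the LD Color program.

```python
def r_squared(hap_freqs):
    """r^2 for two biallelic markers.

    hap_freqs maps (allele_at_marker_1, allele_at_marker_2) to a haplotype
    frequency; the frequencies are expected to sum to 1. Haplotypes that are
    not observed may simply be left out.
    """
    # Pick one reference allele per marker (any choice gives the same r^2).
    a = sorted({hap[0] for hap in hap_freqs})[0]
    b = sorted({hap[1] for hap in hap_freqs})[0]

    p_a = sum(f for hap, f in hap_freqs.items() if hap[0] == a)
    p_b = sum(f for hap, f in hap_freqs.items() if hap[1] == b)
    p_ab = hap_freqs.get((a, b), 0.0)

    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Hypothetical "perfect LD" case: only two of the four haplotypes occur.
perfect = {("G", "T"): 0.6, ("A", "C"): 0.4}
print(round(r_squared(perfect), 6))  # 1.0

# Hypothetical equilibrium case: every haplotype equals the product of
# its allele frequencies, so r^2 is 0.
independent = {("G", "T"): 0.25, ("G", "C"): 0.25,
               ("A", "T"): 0.25, ("A", "C"): 0.25}
print(r_squared(independent))  # 0.0
```

Because r^2 does not depend on which marker is listed first, the table of pairwise scores is symmetric across its diagonal.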
Comparison of Program Output
Same input file
* Output from GOLD: difficult to see individual points on graph
* Output from LD Color (my program): easier to distinguish individual points

LD Color Program
* Program written in ASP to graphically depict linkage disequilibrium in human genetic data
* Color coded for specific numerical ranges of different measures of each pairwise comparison of markers
* Complete program: 4 files; >1,000 lines of code

Program Features
* Data input: file uploading or text pasting
* Allows for variable file formats for input
* User-defined colors and ranges
* Switch between different measures of LD
* View actual data on graph or just the colors
* Change size of graph
* Option to select specific rows of data

[Screenshot] Upload your file; paste data

[Screenshot] Specify marker columns

[Screenshot] Choose label for numerical data inputted

[Screenshot] Choose measure of linkage disequilibrium; specify which column the data is located in

[Screenshot] Same as before => used to specify data for other side of diagonal

[Screenshot] Choose to display data on graph

[Screenshot] Choose different sizes for the graph

[Screenshot] Select only the markers you want graphed by choosing rows. Default: all are graphed

[Screenshot] Specify the ranges for the colors you want graphed

Manual

Color Legend

Sample: Symmetric

Sample: Big Size!

Sample: Data On, Asymmetric

Sample: Row Select

Future Directions: LD Color
* Mouseover tag on each cell of the graph to show marker ID (JavaScript)
* Ability to accept more kinds of file formats
* Better form validation and error checking
* More functionality and linking to outside sources

Acknowledgements
* Dr. Garry Larson, Ph.D.
* Dave Ko, City of Hope Senior Programmer Analyst
* Louis Geller, City of Hope Senior Research Associate
* Dr. Ted Krontiris, M.D., Ph.D., Principal Investigator
* The rest of the Krontiris Lab
* Southern California Bioinformatics Institute: Dr. Jamil Momand, Dr. Nancy Warter-Perez, Dr. Sandra Sharp & Dr. Wendie Johnston, Jackie Leung & rest of SoCalBSI staff
* Fellow interns
* NSF & NIH
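Backup: the color coding that LD Color applies, where each pairwise score is drawn in a color chosen by user-defined numerical ranges, can be sketched as follows. This is a minimal Python illustration and not the ASP code of the program itself. The thresholds, color names, function names, and the example matrix are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical user-defined ranges: each bound starts a new color range.
# white: score < 0.2, yellow: 0.2 <= score < 0.5,
# orange: 0.5 <= score < 0.8, red: score >= 0.8
BOUNDS = [0.2, 0.5, 0.8]
COLORS = ["white", "yellow", "orange", "red"]


def color_for(score, bounds=BOUNDS, colors=COLORS):
    """Return the color of the range that contains score."""
    if len(colors) != len(bounds) + 1:
        raise ValueError("need exactly one more color than bounds")
    return colors[bisect_right(bounds, score)]


def color_grid(scores):
    """Map a matrix of pairwise scores to a matrix of color names."""
    return [[color_for(s) for s in row] for row in scores]


# Hypothetical symmetric matrix of pairwise scores for three markers.
scores = [
    [1.0, 0.7, 0.2],
    [0.7, 1.0, 1.0],
    [0.2, 1.0, 1.0],
]
for row in color_grid(scores):
    print(row)
```

Keeping the bounds and colors in two parallel lists makes it straightforward to let the user change either one, which is the behavior the feature list describes.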