Applying AI to Human Genome
Part 1 : Collecting data
Prof. M. Embrechts
Robert Bress
Bram Heyns
Overview
Basics of DNA
Collecting the data
Collection : my application
Perl
Goal
Basics of DNA
DNA = polymer of 4 molecules : bases or nucleotides
A = Adenine , C = Cytosine , G = Guanine , T = Thymine
Replication ( copying ) and translation ( reading )
=> double helix : base pairs A-T , G-C ( copying )
3-letter combination = codon
RNA : U = Uracil in place of T => transcription
Protein = polymer composed of 20 amino acids ( reading )
=> more complex structure than DNA
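The pairing, transcription and codon rules on this slide can be sketched in a few lines. This is a minimal illustration in Python, not part of the original application; the function names and the example sequence are made up.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(dna: str) -> str:
    """Return the base-by-base complement of a DNA strand (the copying rule)."""
    return "".join(COMPLEMENT[base] for base in dna)

def transcribe(dna: str) -> str:
    """Write a DNA strand in the RNA alphabet: U takes the place of T."""
    return dna.replace("T", "U")

def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive 3-letter codons, dropping an incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

dna = "ATGGCCTTA"  # made-up example sequence
print(complement_strand(dna))    # TACCGGAAT
print(transcribe(dna))           # AUGGCCUUA
print(codons(transcribe(dna)))   # ['AUG', 'GCC', 'UUA']
```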
Transition DNA -> RNA -> Protein
Intron – Exon – Splice junction
• exon : about 200 characters , intron : thousands
• 30,000 genes identified out of possible 100,000
• Identification of a gene -> patent
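To make the intron / exon / splice junction vocabulary concrete, here is a small sketch in Python. It assumes the exon coordinates are already known (finding them is the hard part); the sequence and coordinates are invented for illustration.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments and drop everything between them (the introns).

    `exons` holds (start, end) pairs, 0-based with `end` exclusive. Each
    boundary between an exon and an intron is a splice junction.
    """
    return "".join(gene[start:end] for start, end in exons)

# Made-up toy gene: exons in upper case, one intron in lower case.
gene = "ATGGCC" + "gtaagtttcag" + "TTATAA"
exon_coords = [(0, 6), (17, 23)]  # hypothetical coordinates for this toy gene

print(splice(gene, exon_coords))  # ATGGCCTTATAA
```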
Summary
Human : 23 pairs of chromosomes
Chromosome -> thousands of genes
Gene -> info : exons , comments : introns
Exons and introns -> codons
Codon -> 3 bases
Data collection
Human Genome Project
NCBI website : http://www.ncbi.nlm.nih.gov
Entrez-Nucleotide.htm
NCBI Sequence Viewer.htm
Data collection : my application
BioBrowser
Download HTML
ExtractLinks()
Download HTML - data
ExtractData()
TranslateData()
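The steps on this slide could be chained roughly as follows. This is only a sketch in Python under stated assumptions, not the BioBrowser code: the slide gives nothing beyond the step names, so the page layout (links in anchor tags, sequence inside a `<pre>` block) and the meaning of the translate step (mapping bases to integers) are guesses made for illustration. The downloads are replaced by hard-coded strings so the example runs offline.

```python
import re

def extract_links(html: str) -> list[str]:
    """Collect the href targets of all anchor tags in an index page."""
    return re.findall(r'<a\s+[^>]*href="([^"]+)"', html, flags=re.IGNORECASE)

def extract_data(html: str) -> str:
    """Pull the sequence out of a record page.

    Assumption: the sequence sits inside a <pre> block, mixed with
    position numbers and whitespace.
    """
    match = re.search(r"<pre>(.*?)</pre>", html, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""
    return re.sub(r"[^acgtACGT]", "", match.group(1)).upper()

BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical numeric encoding

def translate_data(sequence: str) -> list[int]:
    """Turn a base string into integers (one guess at what a translate step does)."""
    return [BASE_CODES[base] for base in sequence]

# Stand-ins for the two "Download HTML" steps (made-up pages).
index_page = '<html><a href="record1.htm">rec 1</a> <A HREF="record2.htm">rec 2</A></html>'
record_page = "<html><pre>\n 1 atggcc ttataa\n</pre></html>"

links = extract_links(index_page)
sequence = extract_data(record_page)
encoded = translate_data(sequence)
print(links)     # ['record1.htm', 'record2.htm']
print(sequence)  # ATGGCCTTATAA
print(encoded)   # [0, 3, 2, 2, 1, 1, 3, 3, 0, 3, 0, 0]
```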
Perl
Practical Extraction and Report Language
POD – files -> web
Portability
Free – CPAN modules
String manipulation
Extremely powerful regex engine
Glue language designed for short and simple tasks, which does not mean a lack of power or “serious” features
Tutorial : http://www.netcat.co.uk/rob/perl/win32perltut.html
Regular Expression – Pattern Matching
Practical Extraction and Report Language
Scan through data and extract useful information
m/PATTERN/ , s/PATTERN/REPLACEMENT/
1 line Perl = 100 lines C or Java
Complex, but easy
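The two operators, match and substitute, look like this in practice. The sketch uses Python's `re` module as a stand-in, with the corresponding Perl one-liner in the comments; the input line is made up.

```python
import re

line = "LOCUS HUMGENE 1200 bp DNA"  # made-up header line

# Match, Perl: if ($line =~ m/(\d+) bp/) { print $1 }
match = re.search(r"(\d+) bp", line)
length = match.group(1) if match else None
print(length)  # 1200

# Substitute, Perl: $line =~ s/DNA/RNA/;
# Without the g modifier Perl replaces only the first hit, hence count=1.
replaced = re.sub(r"DNA", "RNA", line, count=1)
print(replaced)  # LOCUS HUMGENE 1200 bp RNA
```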
Regex examples
/[KCZ]arl^sa/
/<I>(.*?)<\/I>/i
$1,$2,…
i,g,c,…
.,*,+,?
/([0-9a-zA-Z])+/ or /([\w])+/ ( \w also matches _ )
s/us[^a-z]/them/g or s/us\W/them/g
/((acc|act)(ttt|ttc|att))/
TIMTOWTDI
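A few of these patterns can be tried out directly. The sketch below uses Python's `re` module, whose syntax for these constructs matches Perl's; the test strings are invented. It also shows why alternatives such as acc|act need parentheses rather than square brackets: inside [ ] every character, including |, is just a member of a character set.

```python
import re

# Non-greedy capture between tags, case-insensitive (Perl: /<I>(.*?)<\/I>/i).
text = "<i>Homo sapiens</i> and <I>Mus musculus</I>"
italics = re.findall(r"<I>(.*?)</I>", text, flags=re.IGNORECASE)
print(italics)  # ['Homo sapiens', 'Mus musculus']

# [0-9a-zA-Z] versus \w: \w also accepts the underscore.
print(re.findall(r"[0-9a-zA-Z]+", "gene_id 42"))  # ['gene', 'id', '42']
print(re.findall(r"\w+", "gene_id 42"))           # ['gene_id', '42']

# Alternation of whole codons needs groups, not character classes.
dna = "gg" + "accttc" + "aa" + "actatt" + "gg"
codon_pairs = re.findall(r"(?:acc|act)(?:ttt|ttc|att)", dna)
print(codon_pairs)  # ['accttc', 'actatt']

# The bracket version matches any two characters drawn from a, c, t and |.
loose = re.findall(r"[acc|act][ttt|ttc|att]", "ca")
print(loose)  # ['ca']
```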
Part 2 : Applying AI
Our choice : evolutionary computing
First part : identify exon parts
Second part : identify splice junctions
Third part : combine previous parts
Hope to reach over 90% accuracy
Questions
?