Neural Network Applications
• Problems
• Input transformation
• Network Architectures
• Assessing Performance
Lecture 10, CS567
Problems
• Deducing the genetic code
• Predicting genes
• Predicting signal peptide cleavage sites
Deducing the genetic code
• Problem: Given a codon, predict corresponding amino acid
• Of didactic value
– Trivial mapping table, after-the-fact
• Perfect classification problem, rather than prediction
– With minimal network
• Learning issues
– ‘Similar’ codons code for ‘similar’ amino acids
– Abundance of amino acids proportional to code redundancy (this and the previous point dampen the effect of mutations)
– Third base ‘wobble’
– N:1 mapping between codon and amino acid
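The N:1 mapping and the third-base wobble can be checked directly against the standard code table. The following is a minimal sketch, not part of the lecture; the names `CODON_TABLE`, `degeneracy`, and `wobble_boxes` are illustrative, and the table is written as the standard code enumerated in T, C, A, G order.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# N:1 mapping: number of codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Third-base wobble: two-base prefixes whose four codons all give the same amino acid.
wobble_boxes = [
    b1 + b2
    for b1, b2 in product(BASES, repeat=2)
    if len({CODON_TABLE[b1 + b2 + b3] for b3 in BASES}) == 1
]

if __name__ == "__main__":
    print(sorted(degeneracy.items(), key=lambda kv: -kv[1]))
    print(len(wobble_boxes), "of 16 prefixes ignore the third base:", wobble_boxes)
```

Leu, Ser, and Arg each have six codons, while Met and Trp have one, which is the imbalance a training scheme has to cope with.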
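The genetic code (rows: first base; columns: second base; Ter = stop)
      T             C             A             G
T     TTT Phe (F)   TCT Ser (S)   TAT Tyr (Y)   TGT Cys (C)
      TTC Phe (F)   TCC Ser (S)   TAC Tyr (Y)   TGC Cys (C)
      TTA Leu (L)   TCA Ser (S)   TAA Ter       TGA Ter
      TTG Leu (L)   TCG Ser (S)   TAG Ter       TGG Trp (W)
C     CTT Leu (L)   CCT Pro (P)   CAT His (H)   CGT Arg (R)
      CTC Leu (L)   CCC Pro (P)   CAC His (H)   CGC Arg (R)
      CTA Leu (L)   CCA Pro (P)   CAA Gln (Q)   CGA Arg (R)
      CTG Leu (L)   CCG Pro (P)   CAG Gln (Q)   CGG Arg (R)
A     ATT Ile (I)   ACT Thr (T)   AAT Asn (N)   AGT Ser (S)
      ATC Ile (I)   ACC Thr (T)   AAC Asn (N)   AGC Ser (S)
      ATA Ile (I)   ACA Thr (T)   AAA Lys (K)   AGA Arg (R)
      ATG Met (M)   ACG Thr (T)   AAG Lys (K)   AGG Arg (R)
G     GTT Val (V)   GCT Ala (A)   GAT Asp (D)   GGT Gly (G)
      GTC Val (V)   GCC Ala (A)   GAC Asp (D)   GGC Gly (G)
      GTA Val (V)   GCA Ala (A)   GAA Glu (E)   GGA Gly (G)
      GTG Val (V)   GCG Ala (A)   GAG Glu (E)   GGG Gly (G)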
http://molbio.info.nih.gov/molbio/gcode.html
Network Architecture
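• Orthogonal (one-hot) coding of the codon (4 bases × 3 positions = 12 inputs)
• 2 hidden neurons (Is this a linear or non-linear problem?)
• 20 output neurons
– Winner takes all
• Total of 86 parameters (How?)
• FFBP (feed-forward network trained with backpropagation)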
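A minimal sketch of the input coding and the parameter count for this architecture. It assumes a fully connected 12-2-20 network with one bias per hidden and output neuron, which is the reading that reproduces the 86 parameters on the slide; the function names are illustrative.

```python
BASES = "TCAG"

def encode_codon(codon):
    """Orthogonal (one-hot) coding: 3 positions x 4 bases = 12 inputs."""
    vec = []
    for base in codon:
        one_hot = [0] * len(BASES)
        one_hot[BASES.index(base)] = 1
        vec.extend(one_hot)
    return vec

def count_parameters(layer_sizes):
    """Weights plus one bias per non-input neuron in a fully connected net."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def winner_takes_all(outputs):
    """Index of the output neuron with the largest activation."""
    return max(range(len(outputs)), key=outputs.__getitem__)

if __name__ == "__main__":
    print(encode_codon("ATG"))
    # (12*2 + 2) input-to-hidden + (2*20 + 20) hidden-to-output
    print(count_parameters([12, 2, 20]))
```

Under that assumption the count splits into 26 parameters for the hidden layer and 60 for the output layer.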
Deducing the genetic code (Fig 6.7)
Deducing the genetic code (Fig 6.8)
Improving classification error
• Training (learning) rate high for misclassified codons, low otherwise (in addition to its dependence on the iteration)
• Balanced cycles (balanced in terms of amino acids, not codons)
• Adaptive training
– Present misclassified examples more often
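Balanced cycles and adaptive presentation can be sketched as two sampling routines. This is an illustration under stated assumptions, not the scheme used in the lecture's source: the function names, the `boost` weight, and the one-codon-per-amino-acid cycle are all hypothetical choices.

```python
import random
from collections import defaultdict

def balanced_cycle(examples, rng):
    """One training cycle with one codon drawn per amino acid, so every
    class appears equally often however many codons encode it."""
    by_class = defaultdict(list)
    for codon, aa in examples:
        by_class[aa].append(codon)
    cycle = [(rng.choice(codons), aa) for aa, codons in sorted(by_class.items())]
    rng.shuffle(cycle)
    return cycle

def adaptive_sample(examples, misclassified, rng, boost=3, k=None):
    """Draw k examples with replacement, weighting currently
    misclassified examples `boost` times higher than the rest."""
    weights = [boost if ex in misclassified else 1 for ex in examples]
    size = k if k is not None else len(examples)
    return rng.choices(examples, weights=weights, k=size)

if __name__ == "__main__":
    data = [("TTT", "F"), ("TTC", "F"), ("TGG", "W"), ("ATG", "M")]
    rng = random.Random(0)
    print(balanced_cycle(data, rng))
    print(adaptive_sample(data, {("TGG", "W")}, rng, boost=5, k=8))
```

In a training loop, the `misclassified` set would be recomputed after each cycle from the current network outputs.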
Is it a gene or not a gene?
• Approaches depend on
– Bias at junctions of coding and non-coding regions
• Donor sites (5’ end of intron) and acceptor sites (3’ end of intron) have biases in composition (GT [junk]+ [C/U]+ AG)
– Bias in composition of coding regions (but not of non-coding regions, e.g., introns)
• Exons are “regular guys”, introns are “freshman dorm rooms”
• Seen as GC bias, codon usage frequency and codon bias
– Inverse relationship between the two (splice site strength and regularity within exons)
• “Food exit sign on highway doesn’t need prominent restaurant signs”
• “Stretch of prominent restaurant signs doesn’t need a sign indicating food”
Regularity within coding regions (Fig 6.11)
Panels: Bacteria, C. elegans, Mammals, A. thaliana
Predicting Exons: The holy GRAIL
• Neural networks for gene prediction
– Input representation/transformation key
– NN per se trivial: MLP with single hidden layer and single output neuron
– Input = Coding region candidate, transformed to
• 6mer (di-codon) score of candidate region
• 6mer (di-codon) score of flanking regions
• GC composition of candidate region
• GC composition of flanking region
• Markov model score
• Length of candidate
• Splice site score
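Some of the input transformations listed above can be sketched as follows. This is not GRAIL's implementation: the function names are hypothetical, the 6mer score is written as a mean log-odds over in-frame hexamers using caller-supplied frequency tables, and the Markov model and splice site scores are left out.

```python
import math

def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def hexamer_score(seq, coding_freq, noncoding_freq, floor=1e-6):
    """Mean log-odds of in-frame 6mers (di-codons) under coding versus
    non-coding frequency tables; unseen 6mers get the `floor` frequency."""
    seq = seq.upper()
    scores = []
    for i in range(0, len(seq) - 5, 3):  # step by one codon
        hexamer = seq[i:i + 6]
        p = coding_freq.get(hexamer, floor)
        q = noncoding_freq.get(hexamer, floor)
        scores.append(math.log(p / q))
    return sum(scores) / len(scores) if scores else 0.0

def candidate_features(candidate, left_flank, right_flank,
                       coding_freq, noncoding_freq):
    """Feature vector for a coding-region candidate (subset of the slide's list)."""
    return [
        hexamer_score(candidate, coding_freq, noncoding_freq),
        gc_content(candidate),
        gc_content(left_flank + right_flank),
        float(len(candidate)),
    ]
```

The resulting vector would be the input to the single-hidden-layer MLP; the frequency tables would have to be estimated from known coding and non-coding sequence.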
Signal peptide (SignalP) prediction
• Signal peptides are N-terminal subsequences in proteins that act as “export tags”, including a “dotted line” (cleavage site) indicating the point of detachment
– Coding is species specific
• Problem analogous to exon/intron delineation
– Distinguish between signal peptide and rest of protein
– Find junction between signal peptide and rest of protein
Signal peptide (SignalP) prediction
• Two kinds of network that output, for each position,
– S-score: Probability of classification as signal peptide
– C-score: Probability of being the junction
• Key is post-processing – using S and C scores to come up with final prediction
• C-score prediction: Based on asymmetric windows (why?)
• S-score prediction: Based on symmetric windows (why?)
• Y-score: $Y_i = \sqrt{C_i \, \Delta_d S_i}$, where $\Delta_d S_i$ = average difference in $S_i$ between the windows of size $d$ flanking position $i$
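The Y-score combination can be sketched in a few lines. The window convention is an assumption here: the difference is taken as the mean S-score over the d positions before i minus the mean over the d positions starting at i, and negative differences are clamped to zero so the square root is defined. The function name `y_scores` is illustrative.

```python
import math

def y_scores(s, c, d):
    """Y_i = sqrt(C_i * delta_d S_i) for each position i that has a full
    window of d positions on both sides; other positions stay at 0."""
    n = len(s)
    y = [0.0] * n
    for i in range(d, n - d + 1):
        before = sum(s[i - d:i]) / d   # mean S over the d positions preceding i
        after = sum(s[i:i + d]) / d    # mean S over the d positions starting at i
        delta = max(before - after, 0.0)  # assumption: clamp negative slopes
        y[i] = math.sqrt(c[i] * delta)
    return y

if __name__ == "__main__":
    s = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # toy S-scores: signal peptide, then mature protein
    c = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # toy C-scores: junction at index 3
    y = y_scores(s, c, d=2)
    print(y, "predicted junction index:", y.index(max(y)))
```

A high Y-score therefore requires both a high C-score and a steep drop in the S-score at the same position, which is why it is more reliable than either score alone.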
Signal peptide (SignalP) prediction (Fig 6.5)