“Genetic algorithm-based optimization of hydrophobicity tables”
by Moti Zviling, Hadas Leonov and Isaiah T. Arkin
presented by: Nam Tonthat

Background: Membrane Proteins
● constitute 20 to 35% of a genome
● part of the signal transduction pathway
● major target for pharmaceutical agents
● hard to crystallize due to their hydrophobic segments
● their importance motivates efforts to predict their presence from sequence information

Background: Hydropathy Analysis
● Kyte & Doolittle used hydropathy analysis to predict the trans-membrane regions of bacteriorhodopsin (1982)
● developed the hydrophobicity scale
● scale was improved upon by using
– the water-vapor free energy transfer
– interior-exterior distribution of amino acids
– free energy of amino acids when transferred from water to oil

Background: Predicting with HMM & NN
● Hidden Markov models and neural networks are popular because of their probabilistic nature
● generally have a higher level of accuracy in comparison to hydropathy analysis
● HMM, NN, and GA are old concepts that have recently moved from theory into practice in biological research

Background: Genetic Algorithm (GA)
● Origin: John Holland and his colleagues at the University of Michigan
● a search technique used to find approximate solutions
● uses techniques inspired by evolutionary biology
– inheritance, mutation, natural selection, and recombination
● General algorithm:
– Choose initial population
– Evaluate the fitness of the population
– Select the “best” individuals to reproduce
– Apply crossover and mutation operators
– Stop if the algorithm converges, else repeat

Goal
● To show that by applying a genetic algorithm to the existing hydrophobicity tables, they can improve the success rate of hydropathy analysis in predicting alpha helical membrane proteins.
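The general algorithm listed on the GA background slide can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `evolve`, the `toy_fitness` function, the target vector, and all parameter values are hypothetical and chosen only for illustration. Two simplifications are assumed: the loop runs for a fixed number of generations instead of testing for convergence, and the two current parents are always kept in the population so the best fitness never decreases.

```python
import random


def evolve(fitness, seed_individuals, pop_size=20, generations=200,
           mutation_step=0.05, rng=None):
    """Toy genetic algorithm over fixed-length lists of floats (illustrative only)."""
    rng = rng or random.Random(0)
    n = len(seed_individuals[0])

    def crossover(a, b):
        # Single-point crossover at a randomly chosen position.
        cut = rng.randrange(1, n)
        return a[:cut] + b[cut:]

    def mutate(individual):
        # Nudge one randomly chosen position up or down by mutation_step.
        child = list(individual)
        i = rng.randrange(n)
        child[i] += rng.choice((-mutation_step, mutation_step))
        return child

    # Choose the initial parents: the two fittest seed individuals.
    parents = sorted(seed_individuals, key=fitness, reverse=True)[:2]
    for _ in range(generations):
        # Breed a new population from the two parents (crossover, then mutation).
        population = [mutate(crossover(*rng.sample(parents, 2)))
                      for _ in range(pop_size)]
        # Keep the parents as well, so the best individual is never lost.
        population += parents
        # Evaluate fitness and select the two "best" individuals to reproduce.
        population.sort(key=fitness, reverse=True)
        parents = population[:2]
    return parents[0]


# Hypothetical usage: evolve a 4-value vector toward an arbitrary target.
target = [1.0, -2.0, 0.5, 3.0]


def toy_fitness(individual):
    # Higher is better: negative squared distance to the target.
    return -sum((x - t) ** 2 for x, t in zip(individual, target))


seeds = [[0.0] * 4, [1.0] * 4]
best = evolve(toy_fitness, seeds)
```

In the setting of the talk, an individual would correspond to a hydrophobicity table and the fitness to a measure of prediction quality; the toy fitness here only stands in for that.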
Methods: Constructing the Datasets
● consisted of alpha helical membrane and water soluble proteins
● selected proteins with unambiguous topology assignment, so that the training set would not be biased by an abundance of a certain topology
● ratio of 1:3
● Training Set => 90%
– learning set: 90%
– validation set: 10%
● Testing Set => 10%

Methods: Matthews Correlation Coefficient
● used as a measure of predictive power
● ranges over -1 ≤ C ≤ 1
● worst => -1, best => 1, random => 0

Methods: Genetic Algorithm Scheme
● input: 2 hydrophobicity tables
– Kyte-Doolittle scale (Kyte and Doolittle, 1982)
– Goldman-Engelman-Steitz scale (Engelman et al., 1986)
● The 2 tables are then bred to create 20 random tables.
● Read in the dataset and create a Final Testing Set (10%) and a Learning Set (90%)
● Learning Set is partitioned into a Training Set (90%) and a Validation Set (10%)

Methods: Genetic Algorithm Scheme
● Each table is then evaluated against the Training Set
● Best 2 tables are chosen
● Cross validation with the Validation Set
● Success: if the calculated C value is greater than the C value of the previous round
– current 2 tables are used
● Failure:
– previous 2 tables will be chosen

Methods: Genetic Algorithm Scheme
● test for convergence:
– no:
  ● the process will be repeated for one more generation
– yes:
  ● the algorithm will stop
  ● select the best 200 tables
  ● evaluate against the Final Test Set

Methods: Population Generation Process
● crossing over event:
– the number of crossing over events and their positions are picked randomly
● mutation event:
– the number of mutation events is picked randomly
– ±.05
● who to replace:
– C < .5
● rate of replacement:
– 20%-80%, replace with randomized tables

Results
● are statistically based methods better?
– depends on the person testing
– depends on the training and datasets
– HMM, NN, and GA are only as good as the person who wrote them

Sources
● “Genetic Algorithm”. Wikipedia, the free encyclopedia. July 1, 2005. <http://en.wikipedia.org/wiki/Genetic_algorithm>
● “Introduction to Genetic Algorithms”, Matthew Wall. July 1, 2005. <http://lancet.mit.edu/~mbwall/presentations/IntroToGAs/>
● Moti Zviling, Hadas Leonov and Isaiah T. Arkin. “Genetic algorithm-based optimization of hydrophobicity tables.” Bioinformatics Vol 21 no. 11 (2005): 2651-2656.
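
Supplementary sketch: the correlation coefficient C used in the Methods as a measure of predictive power can be computed from a confusion matrix as shown below. This uses the standard textbook definition of the coefficient; the function name `matthews_cc`, its argument names, and the choice to return 0 when the denominator vanishes are assumptions made for this sketch and are not taken from the paper.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Correlation coefficient C from true/false positive/negative counts.

    C = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 (no better than random) when any
    # marginal total is zero and the ratio is undefined.
    return numerator / denominator if denominator else 0.0


perfect = matthews_cc(tp=10, tn=10, fp=0, fn=0)    # every prediction correct
inverted = matthews_cc(tp=0, tn=0, fp=10, fn=10)   # every prediction wrong
chance = matthews_cc(tp=5, tn=5, fp=5, fn=5)       # no correlation
```

The three example calls correspond to the best, worst, and random cases listed on the slide.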