Download Phylogeny slides

Phylogenetic Tree Construction and Related Problems Bioinformatics Problems    Multiple Sequence Alignment Construction of Phylogenetic Trees Determining Genomic Distance through Rearrangements Alignment of Multiple Sequences   We can extend the notion of alignment to multiple strings: -att-ct-ttactat-a-t An alignment of strings S1…Sn is described by strings S1’…Sn’   Each Si’ contains the characters of Si in order interspersed with spaces (-) No position exists that contain spaces for all Si’ Scoring a Multiple Alignment  Sum of pairs   Consider each pair in the set of sequences, determine the similarity score (using gap, match, and mismatch weights) for each pair, and add all pair-wise scores Distance from consensus   Consider all columns and count the total number of differences from the consensus character Variations: just count characters that differ from consensus or have a difference score for each differing character Multiple Alignment Problem    Formulation: Given sequences S1…Sn, obtain an optimal multiple alignment of the sequences “Optimal” depends on multiple alignment scoring method No known (correct) efficient algorithms for this problem Strategies   Brute force algorithm: consider all possible alignments, then determine the one that results in a best score (time complexity?) Common Heuristic: Use regular (two-string) alignment and then repeatedly add a string to a growing alignment   O(nm2) – where m is the the max string length Does not always produce an optimal alignment Phylogenetic Tree Construction    Multiple alignments are often performed on similar species Next step: Construct an evolutionary tree of these species Input to problem: evolutionary distances between each pair   Not the same as similarity score Example: edit distance--count number of indels needed to transform one string to the other Problem Formulation   Given a set of species and evolutionary distance between each pair (2d matrix of numbers), construct a phylogenetic tree consistent with the distances Details needed:   Characteristics/constraints of a tree Notion of “consistent” Phylogenetic Tree     Rooted Leaves are species Internal nodes are speculated ancestors Distances associated with each edge Example ancestor 1 4 ancestor 2 dog bat 3 cat Minimizing Distance Deviation    Phylogenetic tree implies pairwise distances (sum all edge distances in path) Compare with input distances to assess consistency Sum of Differences  Alternative computations for deviation (least squares) Example Suppose Input: D(dog,cat): 4 D(dog,bat): 7 D(cat,bat): 10 ancestor 1 4 ancestor 2 dog bat 3 cat Implied by Tree: D(dog,cat): 5 D(dog,bat): 7 D(cat,bat): 8 Difference = (5- 4) + (7-7) + (10-8) = 3 Character-based Tree Construction  Input:     a set C of m characters possible values for each character a set S of n species where each element of S is associated with a value for each character in C Output: a phylogenetic tree T such that    Elements in S are the leaves of T Each internal node of T has an assigned value per character Each character value induces a connected subtree of T Mutations on Genomes   Mutations can occur at the gene level (indels of nucleotides) or at the genome level (operations on genes) At the genome level, for similar species, the operations are rearrangements    Reversals Transpositions Translocations Genomes, Chromosomes, and Permutations     Genome: set of chromosomes Chromosome: sequence of genes Genes: unique across entire genome Can simplify the representation of a genome by arbitrarily concatenating the chromosomes  Result: a permutation of a set of unique genes Determining Genomic Distance   Recall: Phylogenetic tree construction need pair-wise distances For similar species with the same set of genes:   Can use number of rearrangement operations for distance Common distance model: obtain the fewest (optimal) rearrangements necessary to transform genome X to genome Y Some Assumptions Made  Can limit allowed rearrangements to some reasonable subset of possible arrangements   e.g., reversals apparently most common for plant species Sometimes, gene-blocks instead of individual genes are the elements that are permuted Problem Formulations  Given permutations (species) p and q of the set {1…n}, find the shortest sequence of rearrangements (mutations) that transform p to q   p.r1. r2 … rs = q Given a permutation p, find the shortest sequence of rearrangements that sort p  p.r1. r2 … rs = 123…n

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Phylogeny slides