Sorting by Cuts, Joins and Whole Chromosome Duplications
Ron Zeira and Ron Shamir
Combinatorial Pattern Matching 2015, 30.6.15

Genome rearrangements

Motivation I: evolution
(Figure: Human genome project)

Motivation II: cancer
(Figure: MCF-7 breast cancer cell-line and normal karyotype; NCI, 2001)

Definitions: gene
• A gene – oriented segment.
• A gene has two extremities: head and tail.
• Positive: tail → head; Negative: head → tail.

Definitions: chromosome
• Chromosome is a series of consecutive genes.
• 2 consecutive extremities form an adjacency.
• A telomere is an extremity that is not part of an adjacency.
• Circular chrom. has no telomeres. Linear chrom. has 2 telomeres.

Definitions: genome
• A genome is a set of chromosomes.
• Equivalently, a genome is a set of adjacencies, e.g. Π = {{a_h, b_h}, {b_t, c_h}, {d_h, f_h}, {f_t, e_t}}
• Ordinary genome has one copy of each gene. Otherwise duplicated.

GR distance problem
• Distance d_op(Π, Σ) – minimal number of operations between genomes Π and Σ.
• Operations:
  – Reversals
  – Translocations
  – Transpositions
  – Others…

The SCJ model
• SCJ – Single Cut or Join (Feijão, Meidanis 11):
  – Cut an adjacency to 2 telomeres.
  – Join 2 telomeres to an adjacency.
(Figure: cut and join)
• Simple and practical model.
• Reflects evolutionary distance (Biller et al. 13)

Models with multiple gene copies
• Most models with multiple gene copies are NP-hard.
• Not many models allow duplications or deletions.
• Many normal and cancer genomes have multiple gene copies.

The SCJD model
• A duplication takes a linear chromosome and produces an additional copy of it: abc → abc, abc
• An SCJD operation is either a cut, or a join or a duplication.

The SCJD distance
• The minimal number of SCJD operations that transform an ordinary genome into a duplicated genome.

Results outline
• Characterize optimal solution structure.
• Give a distance optimization function.
• Solve the optimization problem.
• Study the number of duplications in optimal scenario.
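The definitions above (a genome as a set of adjacencies, with cut and join as the only operations) can be sketched in Python. This is a minimal illustration, not code from the talk: the names `adj`, `cut`, `join` and `scj_distance` are hypothetical, the genome Σ is made up for the example, and the closed form |Π − Σ| + |Σ − Π| for the SCJ distance between two ordinary genomes over the same genes is the known result from the cited Feijão and Meidanis paper rather than something stated on these slides.

```python
from typing import FrozenSet, Set, Tuple

Extremity = Tuple[str, str]        # (gene name, "h" for head or "t" for tail)
Adjacency = FrozenSet[Extremity]   # unordered pair of extremities
Genome = Set[Adjacency]            # ordinary genome as a set of adjacencies


def adj(x: str, y: str) -> Adjacency:
    """Build an adjacency from shorthand such as adj("ah", "bh")."""
    pair = frozenset({(x[:-1], x[-1]), (y[:-1], y[-1])})
    if len(pair) != 2:
        raise ValueError("an adjacency needs two distinct extremities")
    return pair


def cut(genome: Genome, a: Adjacency) -> Genome:
    """Cut an existing adjacency, leaving two telomeres."""
    if a not in genome:
        raise ValueError("adjacency not present")
    return genome - {a}


def join(genome: Genome, a: Adjacency) -> Genome:
    """Join two telomeres (extremities not used by any adjacency)."""
    used = {e for b in genome for e in b}
    if used & a:
        raise ValueError("extremity is not a telomere")
    return genome | {a}


def scj_distance(pi: Genome, sigma: Genome) -> int:
    """Cut every adjacency only in pi, then join every adjacency only in sigma."""
    return len(pi - sigma) + len(sigma - pi)


# Pi is the example genome from the definitions; Sigma is a made-up target.
pi = {adj("ah", "bh"), adj("bt", "ch"), adj("dh", "fh"), adj("ft", "et")}
sigma = {adj("ah", "bh"), adj("bt", "ch"), adj("ct", "dt")}
print(scj_distance(pi, sigma))  # 2 cuts + 1 join = 3
```

Storing adjacencies as frozensets makes them unordered and hashable, so the distance reduces to two set differences.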
SCJD optimal scenario structure
• Theorem: There exists an optimal SCJD sorting scenario, consisting, in this order, of
  – SCJ operations on single-copy genes.
  – Duplications.
  – SCJ operations acting on duplicated genes.
Γ --SCJs--> Γ′ --duplications--> 2Γ′ --SCJs--> Δ

Proof outline
• An SCJ operation acts on extremities of 2 duplicated genes or 2 unduplicated genes.
• Preempting SCJ on unduplicated genes keeps a valid sorting scenario.
• Preempt duplications while scenario is valid.

Corollary: SCJD distance
• Write the distance as a function of Γ′.
• Find Γ′ that minimizes the distance.
• η – higher score for adj. in Γ and Δ.

Distance optimization solution
• The following genome maximizes H: Γ′ = {α | η(α) > 0}
• If Γ′ is not linear, remove an adjacency with η = 1 from each circular chromosome in Γ′ to obtain Γ′′.
• Theorem: SCJD distance is computable in linear time.

Controlling the number of duplications
• Duplications are more “radical” events than cut or join.
• Lemma: Our algorithm gives an optimal sorting scenario with a maximum number of duplications.

Optimal solutions can have different numbers of duplications

Minimizing duplications is hard
• Theorem: Finding an optimal SCJD sorting scenario with a minimum number of duplications is NP-hard.
• Reduction from Hamiltonian path problem on a directed graph with in/out degree 2.

Proof outline
• For a 2-digraph G and two vertices x, y, there is an Eulerian path P: x → y.
• Create a duplicated genome Σ from P and an empty genome Π.
• Add auxiliary genes and k copies of Σ, Π.
• There is a Hamiltonian path x → y in G iff there is an optimal sorting scenario with k duplications.

Summary
• Genome rearrangements are important.
• Problems with multiple gene copies are hard.
• SCJD – allows SCJ and duplications:
  – Linear algorithm for the SCJD distance.
  – Study the number of duplications in optimal solution.
• We hope to generalize the model and apply it to cancer data.

Thank You!
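The three-phase structure from the optimal-scenario theorem (SCJs on single-copy genes, then duplications, then SCJs on duplicated genes) suggests a simple way to score a candidate intermediate genome Γ′. The sketch below is not the authors' linear-time algorithm; it only evaluates the length of one scenario passing through a given Γ′. It rests on assumptions not stated on the slides: gene copies are unlabeled, so the duplicated genome Δ is treated as a multiset of adjacencies; every gene has exactly two copies in Δ; and Γ′ has no circular chromosomes, so each of its chromosomes can be duplicated once. All function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set, Tuple

Extremity = Tuple[str, str]        # (gene name, "h" or "t")
Adjacency = FrozenSet[Extremity]   # unordered pair of extremities


def adj(x: str, y: str) -> Adjacency:
    """Build an adjacency from shorthand such as adj("ah", "bt")."""
    return frozenset({(x[:-1], x[-1]), (y[:-1], y[-1])})


def multiset_scj(a: Counter, b: Counter) -> int:
    """Cuts plus joins between two multisets of adjacencies (assumed model)."""
    # Counter subtraction keeps only positive counts, i.e. a multiset difference.
    return sum((a - b).values()) + sum((b - a).values())


def scenario_length(gamma: Set[Adjacency],
                    gamma_prime: Set[Adjacency],
                    delta: Iterable[Adjacency],
                    n_genes: int) -> int:
    """Length of the scenario gamma -> gamma_prime -> 2*gamma_prime -> delta."""
    # Phase 1: SCJ operations on single-copy genes.
    first = multiset_scj(Counter(gamma), Counter(gamma_prime))
    # Phase 2: one duplication per linear chromosome of gamma_prime.
    # With no circular chromosomes, 2*n_genes extremities minus 2 per
    # adjacency are telomeres, and each linear chromosome has 2 telomeres.
    duplications = n_genes - len(gamma_prime)
    # Phase 3: SCJ operations on the doubled genome.
    doubled = Counter({a: 2 for a in gamma_prime})
    last = multiset_scj(doubled, Counter(delta))
    return first + duplications + last


# Two single-gene chromosomes a and b; the target is two copies of chromosome "ab".
gamma = set()
delta = [adj("ah", "bt"), adj("ah", "bt")]
print(scenario_length(gamma, {adj("ah", "bt")}, delta, n_genes=2))  # join, then duplicate
print(scenario_length(gamma, set(), delta, n_genes=2))              # duplicate twice, then join twice
```

In this toy case, joining first and duplicating once takes 2 operations, while duplicating both chromosomes first takes 4, which illustrates why the choice of Γ′ is the quantity being optimized.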