Sorting by Cuts, Joins and Whole Chromosome Duplications
Ron Zeira and Ron Shamir
Combinatorial Pattern Matching 2015
30.6.15
Genome rearrangements
Motivation I: evolution
[Figure: Human Genome Project]
Motivation II: cancer
[Figure: karyotype of the MCF-7 breast cancer cell line compared with a normal karyotype (NCI, 2001)]
Definitions: gene
• A gene is an oriented segment.
• A gene has two extremities: head and tail.
• Positive orientation: the tail precedes the head; negative orientation: the head precedes the tail.
Definitions: chromosome
• A chromosome is a sequence of consecutive genes.
• Two consecutive extremities of neighboring genes form an adjacency.
• A telomere is an extremity that is not part of an adjacency.
• A circular chromosome has no telomeres; a linear chromosome has two.
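The definitions above can be made concrete with a small Python sketch. It assumes a chromosome is given as a list of signed gene names (`-b` is gene b in negative orientation) and writes extremities as strings such as `a^h` and `a^t`; these conventions and the function names `extremities` and `chromosome_adjacencies` are illustrative choices, not taken from the slides.

```python
def extremities(gene):
    """Return the extremities of a signed gene in left-to-right reading order.

    Illustrative convention: "a" is positive (tail then head),
    "-a" is negative (head then tail).
    """
    name = gene.lstrip("-")
    tail, head = f"{name}^t", f"{name}^h"
    return (head, tail) if gene.startswith("-") else (tail, head)


def chromosome_adjacencies(genes, circular=False):
    """Return (adjacencies, telomeres) for one chromosome.

    Each adjacency is a frozenset holding the right extremity of one gene
    and the left extremity of the next gene.
    """
    ends = [extremities(g) for g in genes]
    adjacencies = {frozenset((left[1], right[0]))
                   for left, right in zip(ends, ends[1:])}
    if circular:
        # Closing the circle joins the last gene to the first: no telomeres.
        adjacencies.add(frozenset((ends[-1][1], ends[0][0])))
        return adjacencies, set()
    # A linear chromosome keeps its two outermost extremities as telomeres.
    return adjacencies, {ends[0][0], ends[-1][1]}


adj, tel = chromosome_adjacencies(["a", "-b", "-c"])
print(sorted(sorted(x) for x in adj))  # [['a^h', 'b^h'], ['b^t', 'c^h']]
print(sorted(tel))                     # ['a^t', 'c^t']
```

The linear chromosome a, -b, -c yields the adjacencies {a^h, b^h} and {b^t, c^h}, which are the first two adjacencies of the example genome Π in the genome definitions.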
Definitions: genome
• A genome is a set of chromosomes.
• Equivalently, a genome is a set of adjacencies.
$\Pi = \{\{a^h, b^h\}, \{b^t, c^h\}, \{d^h, f^h\}, \{f^t, e^t\}\}$
• An ordinary genome has one copy of each gene; a duplicated genome has two copies of each gene.
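When a genome is viewed as a set of adjacencies, its telomeres are the extremities left over. The sketch below assumes the gene set is passed separately (a set of adjacencies does not mention a gene that sits alone on a chromosome); the helper name `telomeres` and the string notation for extremities are hypothetical. It checks the example genome Π.

```python
from collections import Counter


def telomeres(genes, adjacencies):
    """Return the extremities of `genes` that lie in no adjacency.

    Raises ValueError if an extremity appears in two adjacencies,
    which cannot happen in a valid ordinary genome.
    """
    uses = Counter(x for adj in adjacencies for x in adj)
    if any(count > 1 for count in uses.values()):
        raise ValueError("an extremity is used by more than one adjacency")
    all_extremities = {f"{g}^{end}" for g in genes for end in ("t", "h")}
    return all_extremities - set(uses)


# The example genome Pi on genes a..f.
PI = {frozenset(pair) for pair in [("a^h", "b^h"), ("b^t", "c^h"),
                                   ("d^h", "f^h"), ("f^t", "e^t")]}
tel = telomeres("abcdef", PI)
print(sorted(tel))    # ['a^t', 'c^t', 'd^t', 'e^h']
# Every linear chromosome contributes exactly two telomeres.
print(len(tel) // 2)  # 2
```

Reading the adjacencies in order, Π consists of the two linear chromosomes a, -b, -c and d, -f, e.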
Genome rearrangement (GR) distance problem
• Distance $d_{op}(\Pi,\Sigma)$: the minimal number of operations that transform genome $\Pi$ into genome $\Sigma$.
• Operations:
– Reversals
– Translocations
– Transpositions
– Others…
The SCJ model
• SCJ – Single Cut or Join (Feijão and Meidanis, 2011):
– Cut an adjacency to 2 telomeres.
– Join 2 telomeres to an adjacency.
• Simple and practical model.
• Reflects evolutionary distance (Biller et al., 2013).
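A minimal sketch of the two SCJ operations on a genome stored as a set of adjacencies. The distance function uses the closed form $|\Pi \setminus \Sigma| + |\Sigma \setminus \Pi|$ for two ordinary genomes on the same genes, a known result for the SCJ model (Feijão and Meidanis) that these slides do not state; the sketch also ignores any requirement that intermediate genomes stay linear. All function names are illustrative.

```python
def cut(genome, adjacency):
    """SCJ cut: remove an adjacency; its two extremities become telomeres."""
    if adjacency not in genome:
        raise ValueError("can only cut an existing adjacency")
    return genome - {adjacency}


def join(genome, x, y):
    """SCJ join: connect two telomeres x and y into a new adjacency."""
    used = {e for adj in genome for e in adj}
    if x == y or x in used or y in used:
        raise ValueError("join needs two distinct telomeres")
    return genome | {frozenset((x, y))}


def scj_distance(pi, sigma):
    """SCJ distance between ordinary genomes on the same genes: one cut per
    adjacency of pi missing from sigma, one join per adjacency of sigma
    missing from pi."""
    return len(pi - sigma) + len(sigma - pi)


# Reversing gene b inside the chromosome a b c.
pi = {frozenset(("a^h", "b^t")), frozenset(("b^h", "c^t"))}     # a  b c
sigma = {frozenset(("a^h", "b^h")), frozenset(("b^t", "c^t"))}  # a -b c

genome = pi
for adj in pi - sigma:
    genome = cut(genome, adj)
for adj in sigma - pi:
    x, y = adj
    genome = join(genome, x, y)
print(genome == sigma, scj_distance(pi, sigma))  # True 4
```

Performing all cuts before all joins is always valid here: once the adjacencies outside Σ are cut, every extremity needed by a missing adjacency of Σ is free. So a single-gene reversal costs four SCJ operations.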
Models with multiple gene copies
• Most rearrangement problems with multiple gene copies are NP-hard.
• Few models allow duplications or deletions.
• Many normal and cancer genomes have
multiple gene copies.
The SCJD model
• A duplication takes a linear chromosome and
produces an additional copy of it.
abc → abc, abc
• An SCJD operation is a cut, a join or a duplication.
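A duplication can be sketched on a genome stored as a list of chromosomes. The representation (a list of `(genes, circular)` pairs) and the helpers `duplicate` and `gene_copies` are assumptions for illustration only; the example reproduces abc → abc, abc.

```python
from collections import Counter


def duplicate(genome, index):
    """SCJD duplication: add a copy of the linear chromosome genome[index].

    Hypothetical representation: a genome is a list of (genes, circular)
    pairs, where genes is a tuple of signed gene names.
    """
    genes, circular = genome[index]
    if circular:
        raise ValueError("only linear chromosomes can be duplicated")
    return genome + [(genes, False)]


def gene_copies(genome):
    """Count the copies of each gene, ignoring orientation."""
    return Counter(g.lstrip("-") for genes, _ in genome for g in genes)


genome = [(("a", "b", "c"), False)]
doubled = duplicate(genome, 0)
print(doubled)               # [(('a', 'b', 'c'), False), (('a', 'b', 'c'), False)]
print(gene_copies(doubled))  # Counter({'a': 2, 'b': 2, 'c': 2})
```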
The SCJD distance
• The minimal number of SCJD operations that
transform an ordinary genome into a
duplicated genome.
Results outline
• Characterize optimal solution structure.
• Give a distance optimization function.
• Solve the optimization problem.
• Study the number of duplications in optimal scenarios.
SCJD optimal scenario structure
• Theorem: There exists an optimal SCJD sorting
scenario, consisting, in this order, of
– SCJ operations on single-copy genes.
– Duplications.
– SCJ operations acting on duplicated genes.
$\Gamma \xrightarrow{\text{SCJs}} \Gamma' \xrightarrow{\text{duplications}} 2\Gamma' \xrightarrow{\text{SCJs}} \Delta$
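In the middle phase each chromosome of Γ' is duplicated exactly once, since the target has two copies of every gene and SCJ operations never change copy numbers. A linear chromosome with m genes has m − 1 adjacencies, so a linear Γ' with n genes and k adjacencies has n − k chromosomes, and that many duplications. The sketch below (hypothetical helper names, adjacencies as frozensets of extremity strings) computes this count and the adjacency multiset of 2Γ'.

```python
from collections import Counter


def double_genome(adjacencies):
    """Adjacency multiset of 2*Gamma': duplicating every chromosome of
    Gamma' gives each of its adjacencies multiplicity 2."""
    return Counter({adj: 2 for adj in adjacencies})


def duplications_needed(n_genes, adjacencies):
    """Duplications in the middle phase for a linear Gamma': one per
    chromosome, and a linear genome with n genes and k adjacencies has
    n - k chromosomes."""
    return n_genes - len(adjacencies)


# Hypothetical linear Gamma' on genes a..e with chromosomes  a b c  and  d e.
gamma1 = {frozenset(("a^h", "b^t")), frozenset(("b^h", "c^t")),
          frozenset(("d^h", "e^t"))}
print(duplications_needed(5, gamma1))       # 2
print(sum(double_genome(gamma1).values()))  # 6
```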
Proof outline
• Each SCJ operation acts on extremities of either two duplicated genes or two unduplicated genes.
• Moving SCJ operations on unduplicated genes earlier keeps the sorting scenario valid.
• Then move duplications earlier as long as the scenario remains valid.
Corollary: SCJD distance
• Write the distance as a function of Γ’.
• Find Γ’ that minimizes the distance.
• η: a score on adjacencies, higher for adjacencies that appear in Γ and Δ.
Distance optimization solution
• The following genome maximizes H:
$\Gamma' = \{\alpha \mid \eta(\alpha) > 0\}$
• If Γ' is not linear, remove an adjacency with η = 1 from each circular chromosome in Γ' to obtain Γ''.
• Theorem: SCJD distance is computable in
linear time.
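The two steps on this slide can be sketched as follows, treating η as a given function because its definition is not reproduced here. The parameter `candidates` (the adjacencies to score, presumably those of Γ and Δ) and the scores in the example are hypothetical. Circular chromosomes are found with a union-find over genes, which is near-linear rather than strictly linear, so this is not the linear-time algorithm of the theorem.

```python
from collections import defaultdict


def gene_of(extremity):
    """Gene name of an extremity string such as 'a^h'."""
    return extremity.split("^")[0]


def circular_chromosomes(adjacencies):
    """Group an ordinary genome's adjacencies by chromosome and return the
    groups that form circular chromosomes."""
    parent = {}

    def find(g):
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # An adjacency links the genes of its two extremities.
    for adj in adjacencies:
        x, y = tuple(adj)
        parent[find(gene_of(x))] = find(gene_of(y))

    groups = defaultdict(list)
    for adj in adjacencies:
        groups[find(gene_of(next(iter(adj))))].append(adj)

    circular = []
    for adjs in groups.values():
        genes = {gene_of(e) for adj in adjs for e in adj}
        # m genes: a linear chromosome has m - 1 adjacencies, a circular one m.
        if len(adjs) == len(genes):
            circular.append(adjs)
    return circular


def optimal_intermediate(candidates, eta):
    """Gamma' = {alpha | eta(alpha) > 0}; then drop one adjacency with
    eta = 1 from each circular chromosome to obtain Gamma''."""
    gamma1 = {adj for adj in candidates if eta(adj) > 0}
    gamma2 = set(gamma1)
    for adjs in circular_chromosomes(gamma1):
        drop = next((adj for adj in adjs if eta(adj) == 1), None)
        if drop is None:
            raise ValueError("no adjacency with eta = 1 on a circular chromosome")
        gamma2.discard(drop)
    return gamma1, gamma2


# Hypothetical scores: a, b, c form a circle; d e is linear; x y scores 0.
ab, bc = frozenset(("a^h", "b^t")), frozenset(("b^h", "c^t"))
ca, de = frozenset(("c^h", "a^t")), frozenset(("d^h", "e^t"))
xy = frozenset(("x^h", "y^t"))
scores = {ab: 2, bc: 1, ca: 2, de: 1, xy: 0}
gamma1, gamma2 = optimal_intermediate(scores, scores.get)
```

Here Γ' contains the circular chromosome a b c, and removing its only η = 1 adjacency {b^h, c^t} leaves the linear chromosome c a b.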
Controlling the number of duplications
• Duplications are more “radical” events than cuts or joins.
• Lemma: Our algorithm gives an optimal
sorting scenario with a maximum number of
duplications.
Optimal solutions can have different numbers of duplications
Minimizing duplications is hard
• Theorem: Finding an optimal SCJD sorting
scenario with a minimum number of
duplications is NP-hard.
• Reduction from the Hamiltonian path problem on directed graphs in which every vertex has in-degree and out-degree 2.
Proof outline
• For a 2-digraph G and two vertices x, y, there is an Eulerian path P: x → y.
• Create a duplicated genome Σ from P and an
empty genome Π.
• Add auxiliary genes and k copies of Σ, Π.
• There is a Hamiltonian path x → y in G iff there is an optimal sorting scenario with k duplications.
Summary
• Genome rearrangements are important.
• Problems with multiple gene copies are hard.
• SCJD – allows SCJ and duplications:
– Linear algorithm for the SCJD distance.
– Study the number of duplications in optimal solutions.
• We hope to generalize the model and apply it to cancer data.
Thank You!