Sorting by Cuts, Joins and Whole Chromosome Duplications
Ron Zeira and Ron Shamir
Combinatorial Pattern Matching 2015
30.6.15
Genome rearrangements
Motivation I: evolution
Human genome project
Motivation II: cancer
MCF-7 breast cancer cell-line
Normal karyotype
NCI, 2001
Definitions: gene
• A gene is an oriented segment.
• A gene has two extremities: head and tail.
• Positive: tail → head; Negative: head → tail.
Definitions: chromosome
• A chromosome is a sequence of consecutive genes.
• 2 consecutive extremities form an adjacency.
• A telomere is an extremity that is not part of
an adjacency.
• A circular chromosome has no telomeres. A linear
chromosome has 2 telomeres.
Definitions: genome
• A genome is a set of chromosomes.
• Equivalently, a genome is a set of adjacencies.
Π = {{a_h, b_h}, {b_t, c_h}, {d_h, f_h}, {f_t, e_t}}
• An ordinary genome has one copy of each gene.
Otherwise the genome is duplicated.
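The definitions above can be made concrete with a small sketch. It assumes a hypothetical encoding that is not in the slides: a linear chromosome is a list of signed gene names (for example "a" or "-b"), and an extremity is written as "a_t" or "a_h". Under that encoding, the chromosomes (a, -b, -c) and (d, -f, e) give exactly the adjacency set Π shown above.

```python
def extremities(gene):
    """Return the two extremities of a signed gene, read left to right."""
    name = gene.lstrip("+-")
    if gene.startswith("-"):
        return (name + "_h", name + "_t")  # negative gene: head -> tail
    return (name + "_t", name + "_h")      # positive gene: tail -> head


def adjacencies(chromosome):
    """Adjacencies of one linear chromosome, each as an unordered pair."""
    ends = [extremities(g) for g in chromosome]
    return [frozenset((ends[i][1], ends[i + 1][0]))
            for i in range(len(ends) - 1)]


def genome(chromosomes):
    """An ordinary genome viewed as a set of adjacencies."""
    return {adj for chrom in chromosomes for adj in adjacencies(chrom)}


def telomeres(chromosomes):
    """Extremities that do not take part in any adjacency."""
    all_ends = {e for chrom in chromosomes for g in chrom
                for e in extremities(g)}
    used = {e for adj in genome(chromosomes) for e in adj}
    return all_ends - used


chromosomes = [["a", "-b", "-c"], ["d", "-f", "e"]]
pi = genome(chromosomes)
pi_telomeres = telomeres(chromosomes)
```

Each of the two linear chromosomes contributes two telomeres, so `pi_telomeres` holds four extremities.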
GR distance problem
• Distance d_op(Π, Σ) – minimal number of
operations between genomes Π and Σ.
• Operations:
– Reversals
– Translocations
– Transpositions
– Others…
The SCJ model
• SCJ – Single Cut or Join (Feijão, Meidanis 11):
– Cut an adjacency to 2 telomeres.
– Join 2 telomeres to an adjacency.
[Figure: a cut splits an adjacency into 2 telomeres; a join merges 2 telomeres into an adjacency.]
• Simple and practical model.
• Reflects evolutionary distance (Biller et al. 13)
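One reason the SCJ model is simple is that, for two ordinary genomes over the same genes, the distance has a closed form: every adjacency present only in Π must be cut and every adjacency present only in Σ must be joined. The sketch below illustrates this, assuming genomes are encoded as Python sets of frozenset adjacencies (an encoding chosen here for illustration, not taken from the slides).

```python
def scj_distance(pi, sigma):
    """SCJ distance between two ordinary genomes given as adjacency sets."""
    cuts = pi - sigma    # adjacencies that must be cut
    joins = sigma - pi   # adjacencies that must be joined
    return len(cuts) + len(joins)


def scj_scenario(pi, sigma):
    """One shortest scenario: perform all cuts first, then all joins."""
    steps = [("cut", adj) for adj in pi - sigma]
    steps += [("join", adj) for adj in sigma - pi]
    return steps


# Chromosome (a, b, c) versus (a, -b, c): gene b is reversed.
pi = {frozenset({"a_h", "b_t"}), frozenset({"b_h", "c_t"})}
sigma = {frozenset({"a_h", "b_h"}), frozenset({"b_t", "c_t"})}
distance = scj_distance(pi, sigma)
scenario = scj_scenario(pi, sigma)
```

Here reversing a single gene costs two cuts and two joins, since both adjacencies around b change.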
Models with multiple gene copies
• Most models with multiple gene copies are
NP-hard.
• Not many models allow duplications or
deletions.
• Many normal and cancer genomes have
multiple gene copies.
The SCJD model
• A duplication takes a linear chromosome and
produces an additional copy of it.
abc → abc, abc
• An SCJD operation is a cut, a join, or a
duplication.
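The three SCJD operations can be sketched on a toy representation. The encoding is an assumption made for illustration: a genome is a list of linear chromosomes, each a tuple of gene names. The `join` here is deliberately simplified and only attaches the right end of one chromosome to the left end of another; the model itself allows joining any two telomeres.

```python
def duplicate(genome, i):
    """Whole chromosome duplication: add a second copy of chromosome i."""
    return genome + [genome[i]]


def cut(genome, i, pos):
    """Cut chromosome i between positions pos-1 and pos (0 < pos < length)."""
    chrom = genome[i]
    if not 0 < pos < len(chrom):
        raise ValueError("cut position must fall on an adjacency")
    return genome[:i] + [chrom[:pos], chrom[pos:]] + genome[i + 1:]


def join(genome, i, j):
    """Simplified join: right end of chromosome i to left end of chromosome j."""
    if i == j:
        raise ValueError("this sketch only joins two distinct chromosomes")
    rest = [c for k, c in enumerate(genome) if k not in (i, j)]
    return rest + [genome[i] + genome[j]]


g = [("a", "b", "c")]
duplicated = duplicate(g, 0)      # abc -> abc, abc
pieces = cut(g, 0, 1)             # abc -> a, bc
rejoined = join(pieces, 0, 1)     # a, bc -> abc
```

Each call counts as one SCJD operation, so the SCJD distance asks for the shortest such sequence from an ordinary genome to a duplicated one.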
The SCJD distance
• The minimal number of SCJD operations that
transform an ordinary genome into a
duplicated genome.
Results outline
• Characterize optimal solution structure.
• Give a distance optimization function.
• Solve the optimization problem.
• Study the number of duplications in optimal
scenario.
SCJD optimal scenario structure
• Theorem: There exists an optimal SCJD sorting
scenario, consisting, in this order, of
– SCJ operations on single-copy genes.
– Duplications.
– SCJ operations acting on duplicated genes.
Γ → (SCJs) → Γ’ → (duplications) → 2Γ’ → (SCJs) → Δ
Proof outline
• An SCJ operation acts on extremities of either 2
duplicated genes or 2 unduplicated genes.
• Moving SCJ operations on unduplicated genes earlier
keeps the sorting scenario valid.
• Move duplications earlier as long as the scenario stays valid.
Corollary: SCJD distance
• Write the distance as a function of Γ’.
• Find Γ’ that minimizes the distance.
η – a score that is higher for adjacencies in Γ and Δ
Distance optimization solution
• The following genome maximizes H:
Γ’ = {α | η(α) > 0}
• If Γ’ is not linear, remove an adjacency with η=1
from each circular chromosome in Γ’ to obtain
Γ’’.
• Theorem: SCJD distance is computable in
linear time.
Controlling the number of duplications
• Duplications are more “radical” events than
cuts or joins.
• Lemma: Our algorithm gives an optimal
sorting scenario with a maximum number of
duplications.
Optimal solutions can have different numbers of duplications
Minimizing duplications is hard
• Theorem: Finding an optimal SCJD sorting
scenario with a minimum number of
duplications is NP-hard.
• Reduction from Hamiltonian path problem on
a directed graph with in/out degree 2.
Proof outline
• For a 2-digraph G and two vertices x, y, there
is an Eulerian path P: x → y.
• Create a duplicated genome Σ from P and an
empty genome Π.
• Add auxiliary genes and k copies of Σ, Π.
• There is a Hamiltonian path x → y in G iff there
is an optimal sorting scenario with k
duplications.
Summary
• Genome rearrangements are important.
• Problems with multiple gene copies are hard.
• SCJD – allows SCJ and duplications:
– Linear algorithm for the SCJD distance.
– Study the number of duplications in optimal
solution.
• We hope to generalize the model and apply it
to cancer data.
Thank You!