manual of aliquotG
December 5, 2011
1 Installation and Usage
The program is designed to solve the Genome Aliquoting Problem (see the article listed under Citation),
that is, to reconstruct the genome (Gdup) as it was just after a WGD (whole genome duplication) from an
extant, rearranged duplicated genome. It was developed on a Linux platform (we used Ubuntu 10.04). To
install it, open a terminal, extract all files into a folder, and change directory to that folder by typing:
cd the folder
then type the following command to install it:
make
Now you will find the executable file aliquotG in "the folder/bin/" and you can run it from that directory.
Usage:
aliquotG -i [infile] -o [outfile] <option>
Options:
–nd N      set the duplicate size to N
–d Depth   set the search depth; larger values increase the run time
           (recommended values: 1 to 5)
Infile Format:
The file contains FASTA-like sequences. Each sequence name begins with '>' and occupies exactly one
line. The name is split into two parts by '|': the first is the species name, the second is the chromosome
name (or scaffold name). The lines following each '>' line give the sequence of the corresponding
chromosome, represented as a sequence of signed natural integers. Examples are provided with the program files.
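The input format described above can be read with a few lines of Python. This is only a sketch for checking your own input files, not code from aliquotG: the function name `parse_genome`, the sample species and chromosome names, and the assumption that integers are separated by whitespace are all illustrative and not stated in this manual.

```python
from typing import Dict, List, Tuple


def parse_genome(text: str) -> Dict[Tuple[str, str], List[int]]:
    """Parse FASTA-like text into {(species, chromosome): [signed gene IDs]}.

    Assumption: gene IDs on sequence lines are separated by whitespace.
    """
    genome: Dict[Tuple[str, str], List[int]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: ">species|chromosome" on a single line.
            species, _, chrom = line[1:].partition("|")
            current = (species.strip(), chrom.strip())
            genome[current] = []
        else:
            if current is None:
                raise ValueError("gene sequence found before any '>' header")
            # Sequence lines may span several lines; append to the current one.
            genome[current].extend(int(tok) for tok in line.split())
    return genome


# Hypothetical input: two chromosomes of one species.
example = """>speciesA|chr1
1 -2 3 1
>speciesA|chr2
2 -3
"""
genome = parse_genome(example)
```

A negative integer here stands for a gene on the reverse strand, which is the usual meaning of a signed gene order.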
2 Method Summary
We implement the program using a heuristic algorithm. The process consists of three steps: (1) infer
the strong adjacencies of the labeled perfectly duplicated genome Gdup; (2) infer the weak adjacencies;
(3) remove circular chromosomes and calculate the DCJ distance.
We denote the partial graph of a genome G as PG(G). In step 1, we set the weight of each edge to
its multiplicity in the partial graph and ignore all edges whose weight is only 1.
We then use maximum weighted matching to infer a maximum matching and, for each pair of matched
vertices, add an edge connecting them to a graph PG(H) (i.e. the partial graph of genome H; H is empty
initially and becomes the result Gdup at the end), assigning a weight r to the edge (where r is the
duplicate size, i.e. the number of genes in each gene family). Finally, we label and contract all matched
pairs (see the article under Citation).
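Step 1 can be illustrated with a small Python sketch. It rests on assumptions that this manual does not confirm: the partial graph is taken to have one vertex per gene-family extremity (head "h" or tail "t") and one edge per adjacency observed between consecutive genes, and telomeres are ignored. The exhaustive search below is only a stand-in for a proper maximum weighted matching algorithm and is practical for tiny inputs only. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Extremity = Tuple[int, str]           # (gene family ID, "h" or "t")
Edge = Tuple[Extremity, Extremity]


def extremities(gene: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene read left to right."""
    family = abs(gene)
    if gene > 0:
        return (family, "t"), (family, "h")
    return (family, "h"), (family, "t")


def edge_weights(chromosomes: List[List[int]]) -> Counter:
    """Count how often each family-level adjacency occurs (edge multiplicity)."""
    weights: Counter = Counter()
    for chrom in chromosomes:
        for a, b in zip(chrom, chrom[1:]):
            u = extremities(a)[1]     # right end of the left gene
            v = extremities(b)[0]     # left end of the right gene
            weights[tuple(sorted((u, v)))] += 1
    return weights


def strong_edges(weights: Dict[Edge, int]) -> Dict[Edge, int]:
    """Keep only edges seen more than once, as described for step 1."""
    return {edge: w for edge, w in weights.items() if w > 1}


def max_weight_matching(weights: Dict[Edge, int]) -> Tuple[int, List[Edge]]:
    """Exhaustive maximum weight matching; exponential, for illustration only."""
    edges = sorted(weights.items())
    best: Tuple[int, List[Edge]] = (0, [])

    def search(i, used, total, chosen):
        nonlocal best
        if i == len(edges):
            if total > best[0]:
                best = (total, list(chosen))
            return
        (u, v), w = edges[i]
        if u != v and u not in used and v not in used:
            chosen.append((u, v))
            search(i + 1, used | {u, v}, total + w, chosen)
            chosen.pop()
        search(i + 1, used, total, chosen)

    search(0, frozenset(), 0, [])
    return best


# Hypothetical duplicated genome with duplicate size 2.
chromosomes = [[1, 2, 3, 4], [1, 2, -4, -3]]
weights = edge_weights(chromosomes)
strong = strong_edges(weights)
total, matching = max_weight_matching(strong)
```

In this toy genome the adjacencies 1-2 and 3-4 each occur twice, so they are the only edges that survive the weight filter, and both end up in the matching.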
In step 2, we assign a new weight pair (Np, Lp) to each pair of unmatched vertices (with the '–d' option,
only to pairs whose shortest-path distance is at most Depth). We then apply the same labeling
and contracting process as in step 1.
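The depth constraint in step 2 can be pictured as a bounded breadth-first search. The sketch below lists pairs of unmatched vertices whose shortest path has at most `depth` edges. It is an illustration under assumptions: the manual does not say which graph the path is measured in, and the weights Np and Lp are not computed because they are defined only in the cited article. The adjacency-list input and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def distances_from(adj: Dict[str, List[str]], source: str, depth: int) -> Dict[str, int]:
    """Breadth-first search that stops expanding at the given depth."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue                  # do not look beyond the depth limit
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def candidate_pairs(adj: Dict[str, List[str]], unmatched: Set[str], depth: int) -> List[Tuple[str, str]]:
    """Pairs of unmatched vertices at shortest-path distance <= depth."""
    pairs = []
    for u in sorted(unmatched):
        near = distances_from(adj, u, depth)
        for v in sorted(unmatched):
            if u < v and v in near:
                pairs.append((u, v))
    return pairs


# Hypothetical path graph a - b - c - d, with b already matched.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
pairs_depth1 = candidate_pairs(adj, {"a", "c", "d"}, 1)
pairs_depth2 = candidate_pairs(adj, {"a", "c", "d"}, 2)
```

A larger depth admits more candidate pairs, which is consistent with the note in the options that a large value increases the run time.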
In step 3, we transform all circular chromosomes in H into linear ones and calculate the DCJ distance.
An example is shown in Figure 1.
Figure 1. An example of the algorithm. Black edge: edge in Gobs or G. Gray dashed edge: edge
in Gdup or H. Top. Inferring strong adjacencies: each normal-size natural integer (gene family ID) represents a gene family, while the subscript (copy ID) distinguishes the genes within the same gene family.
Gray shaded ellipses with the same gray level indicate adjacencies and the corresponding edges in the partial
graph (top right). Cyan shading highlights a strong adjacency (or edge). Red vertices are matched. The thick
black edge is the contracted edge. The black number on each edge indicates the corresponding copy ID in genome
Gobs. Middle. Inferring weak adjacencies: blue or light blue numbers are the weights Np, Lp for the pair of
vertices linked by a gray dotted edge. Here Depth is set to 1. Other symbols are the same as in Top. Bottom.
Reconstructing genome Gdup: the resulting genome is on the right.
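Step 3 ends by calculating the DCJ distance, but this manual does not give the formula. For two genomes A and B over the same N genes, each gene occurring exactly once in both, the DCJ distance is commonly computed from their adjacency graph as shown below, where C is the number of cycles and I the number of odd-length paths. Whether aliquotG evaluates exactly this expression is an assumption; here A and B would correspond to the copy-labeled Gobs and the reconstructed Gdup.

```latex
d_{\mathrm{DCJ}}(A, B) = N - \left( C + \frac{I}{2} \right)
```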
3 Citation
This paper describes the program:
Zelin Chen, Shengfeng Huang, Yuxin Li and Anlong Xu. 2011. An improved heuristic algorithm for
genome aliquoting.