The SNP gff file is tab
SNP gff3:
Col 1: chromosome ID
Col 2: source the result is derived from (for SNP gff3, always "SoapSNP")
Col 3: type of item (for SNP gff3, always "SNP")
Col 4: start (SNP position)
Col 5: end (for a SNP, always the same as start)
Col 6: quality score in Phred units
Col 7: strand (always "+" because of the method)
Col 8: phase of the SNP, only available if the SNP is in a coding region
Col 9: this field contains several sub-fields separated by spaces
  ID: the unique ID of the SNP. "rs***" is for SNPs in dbSNP, "NOM1_" for novel SNPs.
  status: whether the SNP is known or novel. "dbSNP" is for SNPs in the NCBI dbSNP dataset and "novel" for those not found in dbSNP.
  ref: NCBI reference base at the site
  allele: diploid alleles at this position
  support1: number of reads supporting the first allele
  support2: number of reads supporting the second allele
  location: annotated region where the SNP is located

Indel gff3:
Col 1: chromosome ID
Col 2: source the result is derived from (soap)
Col 3: type of item (indel)
Col 4: start position of the indel
Col 5: end position of the indel. For deletions, the start position is the position of the first base lost and the end position is the first base AFTER the deletion, so end minus start gives the length of the deletion. For insertions, start and end are the same: the first base on the reference after the insertion event.
Col 6: number of reads supporting the indel
Col 7: strand information ("+")
Col 8: phase of the indel, only available if the indel is in a coding region
Col 9: this field contains several sub-fields separated by spaces
  ID: indel ID. "rs*" for indels found in dbSNP and "NOM1_*" for novel ones.
  Status: if the status is a number, the indel is found in dbSNP, and the number gives the offset of its coordinate relative to the dbSNP entry. For example, status = 0 means the indel is found at the exact coordinate reported in dbSNP; status = -2 means the indel is found 2 bp upstream of a dbSNP entry. The coordinate may deviate from dbSNP because the method used to determine the position is ambiguous.
  Type: negative numbers for deletions and positive numbers for insertions. The absolute value is the length of the indel.
  location: annotated region where the indel is located
  base: nucleotides of the insertion/deletion

CNV gff3:
Col 1: chromosome ID
Col 2: source the result is derived from
Col 3: type of item
Col 4: start position of the CNV
Col 5: end position of the CNV
Col 6: quality; for CNV this is marked as "*"
Col 7: strand (always "+")
Col 8: phase (not available for CNV)
Col 9: tags
  ID: the ID of the CNV
  Type: type of CNV. "DupCNV" for duplications (extra copy number) and "DelCNV" for deletions (reduced copy number).
  DGV-variation: if the CNV exists in the DGV variation database, the overlapping DGV entry is reported with its DGV info; otherwise the CNV is reported as novel.
  DGV-indel: if the CNV exists in the DGV indel database, the overlapping DGV entry is reported with its DGV info; otherwise the CNV is reported as novel.
  mRNA: genes that fall entirely or partially within the CNV region. All genes overlapping the CNV are quoted in a pair of "". Each gene is reported as "contained", meaning the whole gene lies in the CNV element, or "broken", meaning only part of the gene lies in the region. Genes are generally reported as "[Contained/Broken]: [refGene accession ID]: [full name of the gene];"
  ncRNA: non-coding RNA. The quoted part is itself a gff annotation of the ncRNA, in the form "[Contained/Broken]: [Annotation of ncRNA in gff format]".
  Transposons: a sub-annotation in gff3 format is quoted. The fields Julie mentioned in a previous mail are defined by RepeatMasker: "div" is the divergence % compared to the repeat consensus used in RepeatMasker, "ins" is the inserted % and "del" is the deleted %.
  Tandem: a sub-annotation in gff3 format is quoted. The annotation is similar to that of transposons.
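
The column layout above can be read with a short script. The following Python sketch is only an illustration under stated assumptions: columns are taken to be tab-separated, Col 9 sub-fields are taken to have the form key=value (possibly with a trailing ";"), and tag names are matched case-insensitively. None of these details is confirmed by the description above. The names GffRecord, parse_line, describe_indel and dbsnp_offset are hypothetical helpers, and the example line is made up for illustration, not taken from a real file.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class GffRecord:
    chrom: str            # Col 1
    source: str           # Col 2
    item_type: str        # Col 3
    start: int            # Col 4
    end: int              # Col 5
    score: str            # Col 6: Phred quality (SNP), read support (indel), "*" (CNV)
    strand: str           # Col 7
    phase: str            # Col 8
    tags: Dict[str, str]  # Col 9 sub-fields, keys lower-cased


def parse_line(line: str) -> GffRecord:
    """Split one record into its nine columns (assumed tab-separated)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    tags: Dict[str, str] = {}
    # Assumption: sub-fields look like key=value and are separated by spaces.
    for field in cols[8].split():
        key, sep, value = field.partition("=")
        if sep:
            tags[key.lower()] = value.rstrip(";")
    return GffRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], cols[7], tags)


def describe_indel(rec: GffRecord) -> Tuple[str, int]:
    """Type tag: negative = deletion, positive = insertion, |value| = length."""
    signed = int(rec.tags["type"])
    return ("deletion" if signed < 0 else "insertion", abs(signed))


def dbsnp_offset(rec: GffRecord) -> Optional[int]:
    """Numeric status = offset relative to dbSNP; None if the status is not a number."""
    try:
        return int(rec.tags.get("status", ""))
    except ValueError:
        return None


# Hypothetical indel record, invented for this example only.
EXAMPLE_LINE = ("chr1\tsoap\tindel\t1000\t1003\t7\t+\t.\t"
                "ID=NOM1_1 status=novel Type=-3 location=intron base=ACG")

rec = parse_line(EXAMPLE_LINE)
print(describe_indel(rec), rec.end - rec.start, dbsnp_offset(rec))
```

For a deletion, end minus start should match the length derived from the Type tag, which gives a simple consistency check. The whitespace split is too naive for CNV records, where quoted values such as mRNA or Transposons contain spaces, so those tags would need a quote-aware parser.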