Supplemental Text 1. Sequence annotation of 10 BAC clones from wheat chromosome 3B: gene prediction, description and synteny with rice.

Charles et al.

We conducted gene prediction analysis on the remaining 18.5% of the sequence (non-TE and non-repeated DNA), using different search programs (see Supplemental Method 1 for the detailed annotation method). Genes of known or unknown function and putative genes were defined based on predictions and the existence of rice or other Triticeae homologs. Hypothetical genes were identified based on prediction programs only. Pseudogenes were not well predicted, and frameshifts needed to be introduced within the CDS structure to better fit a putative function based on BLASTX (mainly with rice). Truncated pseudogenes (genes disrupted by a large insertion or deletion) and highly degenerated CDS sequences were considered gene-relics.

Combined, all these types of gene sequence information (GSI) account for only 1.0% of the sequence and are present in seven BAC clones (one or two genes per clone), while the remaining three BAC clones (TA3B95C9, TA3B95G2, TA3B63N2) contain no genes (indicated in Figure 1A and detailed in Supplemental Text 1, Supplemental Table 3 and Supplemental Table 4).

Six genes (of known or unknown function) and two putative genes were detected on five of the BAC clones (indicated in Figure 1A and detailed in Supplemental Table 3): TA3B63B13 contains two genes of known function, one of which was incompletely sequenced (it is located at the end of the BAC clone); TA3B81B7, one putative gene; TA3B95F5, one putative gene and two genes of unknown function; TA3B63C11, one gene of known function; and TA3B63E4, one incompletely sequenced gene of unknown function.
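The per-clone counts listed above can be cross-checked against the stated totals with a short tally. This is only an illustrative sketch: the dictionary is transcribed by hand from the paragraph above, and the names `GENES_PER_CLONE` and `tally` are hypothetical, not part of the annotation pipeline described in Supplemental Method 1.

```python
from collections import Counter

# Counts transcribed from the text (genes and putative genes only;
# pseudogenes, gene-relics and hypothetical genes are not included here).
GENES_PER_CLONE = {
    "TA3B63B13": {"known": 2},
    "TA3B81B7": {"putative": 1},
    "TA3B95F5": {"putative": 1, "unknown": 2},
    "TA3B63C11": {"known": 1},
    "TA3B63E4": {"unknown": 1},
}


def tally(per_clone):
    """Sum the per-category counts over all BAC clones."""
    totals = Counter()
    for counts in per_clone.values():
        totals.update(counts)
    return totals


totals = tally(GENES_PER_CLONE)
genes = totals["known"] + totals["unknown"]

print(f"BAC clones carrying genes or putative genes: {len(GENES_PER_CLONE)}")
print(f"genes of known or unknown function: {genes}")
print(f"putative genes: {totals['putative']}")
```

The tally gives six genes and two putative genes on five clones, in agreement with the totals stated in the text.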
In addition to genes (of known or unknown function) and putative genes, a search for sequence homology between the whole 18.5% of non-TE, non-repeated DNA and the rice genome sequence (http://www.tigr.org/tdb/e2k1/osa1/) allowed us to detect several sequences conserved between wheat and rice. One pseudogene and four gene-relics, detected in BAC clones TA3B54F7 (one pseudogene), TA3B63B7 (two gene-relics), TA3B81B7 (one gene-relic) and TA3B63C11 (one gene-relic) (Supplemental Table 3), could not be predicted with the CDS prediction program (FGENESH) because they show frameshifts, stop mutations, TE insertions and/or large indels; they are probably no longer functional (Supplemental Table 2). Three of these five truncated genes (pseudogene and gene-relics) resulted from TE insertions (Supplemental Table 3).

Wheat chromosome 3B is homologous to rice chromosome 1. For orthology and synteny analysis, we considered rice chromosome 1 and its duplicated segments found on other chromosomes (GUYOT and KELLER 2004; TIGR site http://www.tigr.org/tdb/e2k1/osa1/segmental_dup/). Three BAC clones (TA3B63B13, TA3B81B7, TA3B95F5) have one or two orthologous rice genes that map to rice chromosome 1, and their synteny was considered confirmed (Table 1). It is interesting to note that the two genes of known function, separated by 88,114 bp on BAC clone TA3B63B13 (Figure 1A), have their respective orthologs separated by 22,816 bp on rice chromosome 1. Thus, this intergenic region is approximately four-fold larger in wheat than in rice (88,114 / 22,816 ≈ 3.9), a difference that has arisen since the two species diverged from a common ancestor.

Three other BAC clones (TA3B54F7, TA3B63C11 and TA3B63E4) also have homologs on rice chromosome 1, but the best match was observed with genes mapped on other rice chromosomes (Supplemental Table 3).
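The intergenic size comparison reported for BAC clone TA3B63B13 is a simple ratio of the two distances. The sketch below reproduces that arithmetic using only the two values given in the text; the function name `fold_difference` is hypothetical and used for illustration only.

```python
# Distances taken from the text.
WHEAT_INTERGENIC_BP = 88_114  # between the two genes of known function on TA3B63B13
RICE_INTERGENIC_BP = 22_816   # between their orthologs on rice chromosome 1


def fold_difference(larger_bp, smaller_bp):
    """Return how many times larger one interval is than the other."""
    if smaller_bp <= 0:
        raise ValueError("interval length must be positive")
    return larger_bp / smaller_bp


ratio = fold_difference(WHEAT_INTERGENIC_BP, RICE_INTERGENIC_BP)
print(f"wheat/rice intergenic size ratio: {ratio:.2f}")
```

The ratio comes out slightly below four, which is why the difference is best described as approximately four-fold.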
BAC clone TA3B63B7 shows, for its putative gene and pseudogene, homology with rice genes located on rice chromosomes other than chromosome 1 (Supplemental Table 3). No GSI or orthologous rice regions could be assigned to the three remaining BAC clones (TA3B95C9, TA3B95G2, TA3B63N2).

Finally, 10 hypothetical genes were identified based on gene prediction only, in BAC clones TA3B54F7 (one), TA3B63B13 (two), TA3B81B7 (one), TA3B95F5 (four) and TA3B63C11 (two) (Supplemental Table 4).

Sources:

GUYOT, R., and B. KELLER, 2004 Ancestral genome duplication in rice. Genome 47: 610–614.
JURKA, J., 2000 Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16: 418–420.
JURKA, J., P. KLONOWSKI, V. DAGMAN and P. PELTON, 1996 CENSOR: a program for identification and elimination of repetitive elements from DNA sequences. Comput. Chem. 20: 119–121.
MCCARTHY, E. M., and J. F. MCDONALD, 2003 LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19: 362–367.
SONNHAMMER, E. L., and R. DURBIN, 1995 A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167: GC1–GC10.