Appendix A: Gene Annotation

In some cases, sequences were found to be incorrectly annotated by gene prediction programs. Identification of the SVC family has allowed us to re-evaluate some of these annotations. The changes described below have been communicated to FlyBase and will be reflected in a future release (T. Sheldon, personal communication to FlyBase).

1. The protein product for Dm gene CG12677 was replaced by a new sequence in 2006, yet the original fitted the family in terms of its canonical sequence identity, intron phasing and signal peptide. The unusual response pattern of its gene is also consistent with the expression analysis for the other Dm genes (Table 1). FlyBase will subsequently reinstate the original translation under a new symbol and ID number.

2. The C-terminus of D. melanogaster protein CG6301 is identified as a family member. It conforms to the canonical sequence, although the existing record shows a single protein of 463 amino acids. Rather than an isolated case of a family member embedded as a domain in a larger protein, closer inspection of the deposited cDNA fragments for the record shows that the N-terminal section has a poly-A tail, indicating that the predicted protein should in fact be two separate proteins. In support of this claim is the Pfam entry for DUF745, to which a portion of the N-terminus of CG6301 (residues 94-302) is correctly predicted to belong, but where the C-terminus sequence fragment (containing the first of the cysteine residues) is incorrect. This 192-residue C-terminal fragment forms the true SVC and, consistent with the rest of the family, it has a short signal peptide at its N-terminus, indicating that it is secreted.

3. Dm gene CG14132 is currently annotated without a signal peptide sequence. A 5’ upstream coding exon has now been found and the sequence will be updated, bringing it into line with the predicted sequence RE17110p in GenBank.

4. Dm protein BK003543 is predicted in GenBank, but the only corresponding cDNA record in FlyBase may be artefactual given its lack of introns. In support of its existence, however, it receives a Blast e-value of 3e-5 against protein CG6301 and displays the canonical pattern common to all SVC proteins; for this paper it is therefore predicted to be a genuine family member and has been kept in the alignment. BK003543 corresponds to gene identifier HDC03087.

5. Ixodes protein AAT92199 appears to be missing one of the conserved cysteines. The sequence was confidently identified by PsiBlast as a family member because it is very similar to AAT92107, so it has been included in the alignment. Similarly, Buthus protein AAK61816 appears to be truncated at its N-terminus. Both AAT92199 and AAK61816 are conceptual translations, so their amino acid sequences cannot be verified.

Appendix B

In situ analysis of SVC genes

In situ hybridisation was carried out in whole, mixed-stage embryos and dissected L1, L2 and L3 larval tissue to ensure that all developmental stages and tissues were represented. A corresponding sense probe was used as a control for every SVC antisense probe tested. To rule out technical problems as the reason for our inability to detect a signal in most cases, we misexpressed several SVC transcripts in a tissue-specific pattern in both embryos and larvae using our transgenic UAS lines, and we detected strong expression using the same in situ procedure and the same probes used to determine their wild-type expression (Figure 4J-L, see Appendix E for details). A Paired-Gal4 driver [1] was used to drive expression in stripes in the embryos, and a pumpless-Gal4 driver [2] was used to express SVCs in larval fat body, gut and several other larval tissues.

Generation and phenotypic analysis of transgenic SVC lines

Transgenic lines for four SVC genes were generated.
CG2081 and CG2444 were chosen because they are two of the infection-induced SVCs not expressed during normal development; CG15203 because of its transient expression in the fat body (a key tissue in immunity and inter-tissue communication); and CG15201 because of its lack of expression and its apparent absence of involvement in any biological function. The coding sequences of these four SVC genes were amplified from cDNA clones and subcloned into the pUAST vector using its XhoI/XbaI or EcoRI/XbaI sites. The cDNAs used as templates were as follows: GH22911 (CG2081), LP01642 (CG2444), RE35169 (CG15203) and RH06304 (CG15201). Primer sequences were:

CG2081: CTCGAGCAAAACATGGAGTCAATTAGCAGCATGATT and TCTAGATCACACCTGATACTTCCTCGCC
CG2444: GAATTCAAAACATGTCGCAGTTTAGCACCGTTGC and TCTAGATTAGATGTGCTTGTCGCAATTGTATTTC
CG15203: CTCGAGCAAAACATGCATCCGGAACGCTACGCC and TCTAGACTAGACCTTCTCCGCCGTATTC
CG15201: GAATTCAAAACATGCACAACCGTTGCGGATCC and TCTAGATTAGGGACACTTGAGGCAGCAG

Tubulin-Gal4 P{tubP-GAL4} [3], a ubiquitous driver, was used to force constitutive expression of all four genes in all tissues during development. Transgene expression was verified by crossing transgenic lines to paired-Gal4 and pumpless-Gal4 drivers and detecting expression of the relevant transcript in embryonic stripes or in the pumpless larval pattern, respectively, by means of in situ hybridisation (Figure 4J-L).

The developmental rate of misexpressing larvae was compared with that of control larvae carrying the driver or the UAS line alone, and was found to be the same. The number of eclosed flies from the tub>>SVC crosses was compared with that of crosses in which the driver line or the UAS line alone was outcrossed to wild-type flies. No differences in developmental rate or viability were observed. Same-age, same-sex flies were weighed to assess body size and no differences were observed (Supplemental Figure 1). Adult flies misexpressing SVC genes were intercrossed and were found to be fully fertile.
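The cloning strategy in Appendix B relies on each primer carrying a restriction site at its 5’ end. As a minimal sketch (not part of the original methods), the listed primers can be checked for those sites in Python. The recognition sequences are the standard ones for XhoI, EcoRI and XbaI; the function names `leading_site`, `site_pairs` and `reverse_encodes_stop` are hypothetical helpers introduced here for illustration only.

```python
# Standard recognition sequences for the enzymes named in Appendix B.
SITES = {"XhoI": "CTCGAG", "EcoRI": "GAATTC", "XbaI": "TCTAGA"}

# Primer pairs (forward, reverse) copied from Appendix B.
PRIMERS = {
    "CG2081": ("CTCGAGCAAAACATGGAGTCAATTAGCAGCATGATT",
               "TCTAGATCACACCTGATACTTCCTCGCC"),
    "CG2444": ("GAATTCAAAACATGTCGCAGTTTAGCACCGTTGC",
               "TCTAGATTAGATGTGCTTGTCGCAATTGTATTTC"),
    "CG15203": ("CTCGAGCAAAACATGCATCCGGAACGCTACGCC",
                "TCTAGACTAGACCTTCTCCGCCGTATTC"),
    "CG15201": ("GAATTCAAAACATGCACAACCGTTGCGGATCC",
                "TCTAGATTAGGGACACTTGAGGCAGCAG"),
}

# Reverse complements of the stop codons TGA, TAA and TAG.
STOP_REVCOMP = {"TCA", "TTA", "CTA"}


def leading_site(primer):
    """Return the name of the restriction site at the 5' end, or None."""
    for name, seq in SITES.items():
        if primer.startswith(seq):
            return name
    return None


def site_pairs(primers):
    """Map each gene to the (forward, reverse) sites found on its primers."""
    return {gene: (leading_site(fwd), leading_site(rev))
            for gene, (fwd, rev) in primers.items()}


def reverse_encodes_stop(rev):
    """True if the three bases after the 6-bp site complement a stop codon."""
    return rev[6:9] in STOP_REVCOMP


if __name__ == "__main__":
    for gene, pair in site_pairs(PRIMERS).items():
        print(gene, pair, reverse_encodes_stop(PRIMERS[gene][1]))
```

Under these assumptions, each forward primer begins with an XhoI or EcoRI site followed by the start codon, and each reverse primer begins with an XbaI site followed by the complement of a stop codon, which matches the XhoI/XbaI and EcoRI/XbaI cloning described above.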
Appendix C: Analysis of Cysteine Spacing

An examination of the spacing between cysteine residues in the SVCs and the 679 10-cysteine VWC domains listed in the Pfam database [4] was performed. Although there is considerable variation in the spacing within both groups (Supplemental Table 1), the spacings observed in the SVC proteins fall within the ranges observed in the Pfam group in 5 of the 7 regions. In the case of C1-C2, the mean spacing observed in the Pfam group is 18.6 with a standard deviation of 2.9, compared to a mean spacing of 20.3 in the SVC proteins. In the case of C7-C8 there are only two sequences with a spacing higher than that observed in the Pfam group (CG6301 and BK003543, Figure 1A). Given that these are the only two sequences to have an extra C-terminus cysteine well within the Pfam range, the spacing may be artificially high due to misalignment.

Four of the 717 Pfam VWC domains have only eight cysteine residues, consistent with the loss of the same pair as in the SVCs. These are in addition to the two 8-cysteine VWC domains in Drosophila Hemolectin, indicating that the pattern found in the SVC proteins is already observed in a small subset of known VWC domains.

Appendix D: Analysis of signal peptides

40 of the 43 SVC proteins are predicted by SignalP to have signal peptides, and thus may be secreted. The three exceptions are from D. pseudoobscura, despite each having a clear orthologue (70-90% identical) in D. melanogaster which has a signal peptide (see Table 2). In the case of D. pseudoobscura protein GA13562, there appears to be a 57 bp 5’ exon immediately upstream of the existing sequence which SignalP predicts to code for a signal peptide, exactly as in its D. melanogaster orthologue CG15199. In the other two cases, GA12782 and GA13565, no signal peptide is found immediately upstream, though conservation of such leader sequences may be too low to find by simple Blast search.
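The spacing measurement described in Appendix C can be illustrated with a short Python sketch. It is an illustration only, not the procedure used for the analysis: the function names and the toy sequence are hypothetical, every cysteine is treated as a landmark (whereas the published analysis numbers residues on the 8-cysteine pattern), and the z-score is simply one convenient way of relating the reported SVC mean for C1-C2 (20.3) to the reported Pfam mean and standard deviation (18.6 and 2.9).

```python
def cysteine_spacings(seq):
    """Number of residues between each pair of consecutive cysteines."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def z_score(value, mean, sd):
    """Distance of a value from a reference mean, in standard deviations."""
    return (value - mean) / sd


if __name__ == "__main__":
    # Toy sequence, invented for illustration; not an SVC protein.
    toy = "MCAAACGGCC"
    print(cysteine_spacings(toy))

    # C1-C2 figures quoted in Appendix C.
    print(round(z_score(20.3, mean=18.6, sd=2.9), 2))
```

On the figures quoted in Appendix C, the SVC mean for C1-C2 lies roughly 0.6 standard deviations above the Pfam mean.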
Appendix E: Further Materials and Methods

Fly strains

Strains were obtained from the Bloomington stock center, except for P{tubP-GAL4} and P{GAL4-prd.F}, which were a gift from J.-P. Vincent, and P{ppl-GAL4.P}, which was a gift from P. Leopold.

In situ hybridisation

Standard in situ protocols were used to examine expression of eight Drosophila SVC genes, with riboprobes made from the following ESTs: CA807769 for CG15199, CA803791 for CG15202, RH67091 and RH06304 for CG15201, RE43475 for CG31997, LP01643 for CG2444, AT09846 for CG12677, RE35169 for CG15203 and GH22911 for CG2081. See Appendix B for additional information.

Generation of transgenic constructs

The predicted coding sequences of four Drosophila SVCs (CG2081, CG15201, CG15203 and CG2444) were amplified using primers containing XhoI/XbaI or EcoRI/XbaI sites. The resulting fragments were subcloned into the pUAST transformation vector [5]. For each construct, at least two independent single-insertion lines were established and used for further experiments. See Appendix B for additional information.

Supplemental Figure Captions

Supplemental Table 1. Observed spacings between cysteine residues. All of the VWC proteins from Pfam family PF00093 displaying the typical 10-cysteine pattern were analysed. SVC proteins AAK61816 (truncated) and AAT92199 (missing cysteine) were excluded from the analysis (see Appendix C). The number of residues between each pair of cysteines is displayed as a range. The numbers given to the cysteine residues are based on the 8-cysteine pattern. The two extra cysteines from the 10-cysteine pattern (3 and 5) were counted as any other residue.

Supplemental Figure 1. Overexpression of SVC genes does not affect developmental growth. Transgenic flies overexpressing four Drosophila SVC genes were viable and had the same weight as wild-type flies grown in the same vial. Genotypes are as follows:

Driver: yw;; tubulin-Gal4/+.
Balancer: yw;; TM3, P{ActGFP}MR2,Ser/+.
CG15203: p{UAS-CG15203}/+; tubulin-Gal4/+ or p{UAS-CG15203}/TM3, P{ActGFP}MR2,Ser.
CG15201: p{UAS-CG15201}/+; tubulin-Gal4/+ or p{UAS-CG15201}/TM3, P{ActGFP}MR2,Ser.
CG2081: p{UAS-CG2081}/tubulin-Gal4.
CG2444: p{UAS-CG2444}/+; tubulin-Gal4/+.

Corresponding controls were p{UAS} lines/+; TM3, P{ActGFP}MR2,Ser or UAS lines/TM3, P{ActGFP}MR2,Ser flies grown in the same vial. Two different UAS insertions were tested for each SVC gene. There was no significant difference between progeny overexpressing SVC genes and control progeny grown in the same vial. There was no weight difference between heterozygotes with a balancer chromosome and heterozygote driver lines, thus allowing the use of balancer heterozygotes as internal controls.

Supplemental References

1. Alexandre C, Vincent JP: Requirements for transcriptional repression and activation by Engrailed in Drosophila embryos. Development 2003, 130:729-739.
2. Zinke I, Schutz CS, Katzenberger JD, Bauer M, Pankratz MJ: Nutrient control of gene expression in Drosophila: microarray analysis of starvation and sugar-dependent response. EMBO J 2002, 21:6162-6173.
3. Lee T, Luo L: Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron 1999, 22:451-461.
4. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, et al: The Pfam protein families database. Nucleic Acids Res 2004, 32:D138-141.
5. Brand AH, Perrimon N: Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 1993, 118:401-415.