Appendix A: Gene Annotation
In some cases, sequences were found to be incorrectly annotated by gene prediction
programs. Identification of the SVC family has allowed us to re-evaluate some of
these annotations. The changes described below have been communicated to FlyBase
and will be reflected in a future release (T. Sheldon, personal communication to
FlyBase).
1. The protein product for Dm gene CG12677 was replaced by a new sequence
in 2006, yet the original fitted the family in terms of its canonical sequence
identity, intron phasing and signal peptide. The unusual response pattern of its
gene is also consistent with the expression analysis for the other Dm genes
(Table 1). FlyBase will subsequently reinstate the original translation under a
new symbol and ID number.
2. The C-terminus of D. melanogaster protein CG6301 is identified as a family
member. It conforms to the canonical sequence, even though the existing record
shows a single protein of 463 amino acids. This is not an isolated case of a
family member embedded as a domain in a larger protein: closer inspection of
the deposited cDNA fragments for the record shows that the N-terminal section
has a poly-A tail, indicating that the predicted protein should in fact be two.
This is supported by the Pfam entry for DUF745, to which a portion of
the N-terminus of CG6301 (residues 94-302) is correctly predicted to belong,
but in which the C-terminal sequence fragment (containing the first of the
cysteine residues) is incorrectly included. This 192-residue C-terminal
fragment forms the true SVC and, consistent with the rest of the family, it
has a short signal peptide at its N-terminus, indicating that it is secreted.
3. Dm gene CG14132 is currently annotated without a signal peptide sequence.
A 5’ upstream coding exon has now been found and the sequence will be
updated, bringing it into line with the predicted sequence RE17110p in
GenBank.
4. Dm protein BK003543 is predicted in GenBank but the only corresponding
cDNA record in FlyBase may be artefactual given its lack of introns. In
support of its existence, however, it receives a Blast e-value of 3e-5 against
protein CG6301, and displays the canonical pattern common to all SVC
proteins; so for this paper it is predicted to be a genuine family member and
has been kept in the alignment. BK003543 corresponds to gene identifier
HDC03087.
5. Ixodes protein AAT92199 appears to be missing one of the conserved
cysteines. The sequence was confidently identified by PsiBlast as a family
member as it is very similar to AAT92107, so it has been included in the
alignment. Similarly, Buthus protein AAK61816 appears to be truncated at its
N-terminus. Both AAT92199 and AAK61816 are conceptual translations so
their amino acid sequences cannot be verified.
Appendix B
In situ analysis of SVC genes
In situ hybridization was carried out in whole, mixed-stage embryos and dissected L1,
L2 and L3 larval tissue to ensure all developmental stages and tissues were
represented. Corresponding sense probes were used for every tested SVC gene
antisense probe. To rule out technical problems as the reason for our inability to
detect a signal in most cases, we misexpressed several SVC transcripts in a tissue-specific pattern in both embryos and larvae using our transgenic UAS lines, and we
detected strong expression using the same in situ procedure and the same probes used
to determine their wild-type expression (Figure 4J-L, see Appendix E for details). A
Paired-Gal4 driver [1] was used to drive expression in stripes in the embryos, and a
pumpless-Gal4 driver [2] was used to express SVCs in larval fat body, gut and several
other larval tissues.
Generation and phenotypic analysis of transgenic SVC lines
Transgenic lines for four SVC genes were generated. CG2081 and CG2444 were
chosen because they are two of the infection-induced SVCs not expressed during
normal development; CG15203 because of its transient expression in the fat body (a
key tissue in immunity and inter-tissue communication); and CG15201 because of its
lack of expression and its apparent absence of involvement in any biological function.
The coding sequence of these four SVC genes was amplified from cDNA clones and
subcloned into the pUAST vector using its XhoI/XbaI or EcoRI/XbaI sites. The
cDNAs used as templates were as follows: GH22911 (CG2081), LP01642 (CG2444),
RE35169 (CG15203) and RH06304 (CG15201). Primer sequences were:
CTCGAGCAAAACATGGAGTCAATTAGCAGCATGATT and
TCTAGATCACACCTGATACTTCCTCGCC (CG2081),
GAATTCAAAACATGTCGCAGTTTAGCACCGTTGC and
TCTAGATTAGATGTGCTTGTCGCAATTGTATTTC (CG2444),
CTCGAGCAAAACATGCATCCGGAACGCTACGCC and
TCTAGACTAGACCTTCTCCGCCGTATTC (CG15203) and
GAATTCAAAACATGCACAACCGTTGCGGATCC and
TCTAGATTAGGGACACTTGAGGCAGCAG (CG15201).
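The primer design described above can be checked with a short script. This is a minimal sketch, not part of the original methods: it confirms that each forward primer begins with an XhoI (CTCGAG) or EcoRI (GAATTC) site and each reverse primer with an XbaI (TCTAGA) site. It also reads the three bases after the XbaI site as the reverse complement of a stop codon, which is an assumption about how the primers were laid out. The names SITES, PRIMERS, check_primer_pair and REPORT are illustrative.

```python
# Recognition sequences of the restriction enzymes named in the text.
SITES = {"XhoI": "CTCGAG", "EcoRI": "GAATTC", "XbaI": "TCTAGA"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Forward and reverse primers as listed in the text, written 5' to 3'.
PRIMERS = {
    "CG2081": ("CTCGAGCAAAACATGGAGTCAATTAGCAGCATGATT",
               "TCTAGATCACACCTGATACTTCCTCGCC"),
    "CG2444": ("GAATTCAAAACATGTCGCAGTTTAGCACCGTTGC",
               "TCTAGATTAGATGTGCTTGTCGCAATTGTATTTC"),
    "CG15203": ("CTCGAGCAAAACATGCATCCGGAACGCTACGCC",
                "TCTAGACTAGACCTTCTCCGCCGTATTC"),
    "CG15201": ("GAATTCAAAACATGCACAACCGTTGCGGATCC",
                "TCTAGATTAGGGACACTTGAGGCAGCAG"),
}


def five_prime_site(primer):
    """Return the name of the restriction site at the 5' end, or None."""
    for name, site in SITES.items():
        if primer.startswith(site):
            return name
    return None


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_primer_pair(forward, reverse):
    """Summarise the cloning features of one primer pair."""
    reverse_site = five_prime_site(reverse)
    offset = len(SITES[reverse_site]) if reverse_site else 0
    # Assumption: the codon directly after the reverse-primer site is the
    # reverse complement of the stop codon of the coding sequence.
    codon = reverse_complement(reverse[offset:offset + 3])
    return {
        "forward_site": five_prime_site(forward),
        "reverse_site": reverse_site,
        # Crude check: an ATG occurs somewhere in the forward primer.
        "has_start_codon": "ATG" in forward,
        "stop_codon": codon if codon in STOP_CODONS else None,
    }


REPORT = {gene: check_primer_pair(fwd, rev)
          for gene, (fwd, rev) in PRIMERS.items()}

if __name__ == "__main__":
    for gene, result in REPORT.items():
        print(gene, result)
```

Under these assumptions, CG2081 and CG15203 carry XhoI sites and CG2444 and CG15201 carry EcoRI sites at the 5' end of the forward primer, matching the XhoI/XbaI and EcoRI/XbaI cloning strategies described in the text.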
Tubulin-Gal4 P{tubP-GAL4} [3], a ubiquitous driver, was used to force constitutive
expression of all four genes in all tissues during development. Transgene expression
was verified by crossing transgenic lines to paired-Gal4 and pumpless-Gal4 drivers
and detecting expression of the relevant transcript in embryonic stripes or in the
pumpless larval pattern, respectively, by means of in situ hybridisation (Figure 4J-L).
Developmental rate of misexpression larvae was compared to wild-type control with
the driver or the UAS line alone, and was found to be the same. The number of
eclosed flies from the tub>>SVC crosses was compared to that of crosses where the
driver line or the UAS line alone was outcrossed to wild-type flies. No differences in
developmental rate or viability were observed. Same-age, same-sex flies were
weighed to assess body size and no differences were observed (Supplemental Figure
1). Adult flies mis-expressing SVC genes were intercrossed and they were found to be
fully fertile.
Appendix C: Analysis of Cysteine Spacing
An examination of the spacing between cysteine residues in the SVCs and the 679 10-cysteine VWC domains listed in the Pfam database [4] was performed. Although
there is considerable variation in the spacing within both groups (Supplemental Table
1), the spacings observed in the SVC proteins fall within the ranges observed in the
Pfam group in 5 of the 7 regions. In the case of C1-C2, the mean spacing observed in
the Pfam group is 18.6 with a standard deviation of 2.9, compared to a mean spacing
of 20.3 in the SVC proteins. In the case of C7-C8 there are only two sequences with a
spacing higher than that observed in the Pfam group (CG6301 and BK003543, Figure
1A). Given that these are the only two sequences to have an extra C-terminus
cysteine well within the Pfam range, the spacing may be artificially high due to
misalignment. Four of the 717 Pfam VWC domains have only eight cysteine
residues, consistent with the loss of the same pair as in the SVCs. These are in
addition to the two 8-cysteine VWC domains in Drosophila Hemolectin, indicating
that the pattern found in the SVC proteins is already observed in a small subset of
known VWC domains.
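The spacing measurement can be illustrated with a short script. This is a minimal sketch under stated assumptions, not the procedure actually used: it counts the residues between consecutive cysteines in a single sequence, and expresses the C1-C2 comparison quoted above (SVC mean 20.3, Pfam mean 18.6, standard deviation 2.9) in units of the Pfam standard deviation. The sequence TOY_SEQUENCE is hypothetical, and the sketch counts every cysteine rather than applying the 8-cysteine numbering used in Supplemental Table 1.

```python
def cysteine_spacings(sequence):
    """Number of residues lying between each pair of consecutive cysteines."""
    positions = [i for i, residue in enumerate(sequence) if residue == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def within_range(spacing, low, high):
    """True if a spacing falls inside an observed (inclusive) range."""
    return low <= spacing <= high


def deviations_from_mean(value, mean, sd):
    """Distance of a value from a reference mean, in standard deviations."""
    return (value - mean) / sd


# Hypothetical toy sequence, for illustration only (not an SVC protein).
TOY_SEQUENCE = "MKCAAAACGGCTTTTTTC"
TOY_SPACINGS = cysteine_spacings(TOY_SEQUENCE)

# C1-C2 figures quoted in the text: SVC mean against Pfam mean and SD.
C1_C2_DEVIATION = deviations_from_mean(20.3, 18.6, 2.9)

if __name__ == "__main__":
    print(TOY_SPACINGS)
    print(round(C1_C2_DEVIATION, 2))
```

With the figures quoted in the text, the mean C1-C2 spacing of the SVC proteins lies about 0.59 standard deviations above the Pfam mean.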
Appendix D: Analysis of signal peptides
Of the 43 SVC proteins, 40 are predicted by SignalP to have signal peptides, and thus
may be secreted. The three exceptions are from D. pseudoobscura, despite each
having a clear orthologue (70-90% identical) in D. melanogaster which has a signal
peptide (see Table 2). In the case of D. pseudoobscura protein GA13562, there
appears to be a 57 bp 5’ exon immediately upstream of the existing sequence which
SignalP predicts to code for a signal peptide, exactly as in its D. melanogaster
orthologue CG15199. In the other two cases, GA12782 and GA13565, no signal
peptide is found immediately upstream, though conservation of such leader sequences
may be too low to find by simple Blast search.
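As a rough illustration of the first step in this kind of check, a candidate upstream exon can be translated in frame before the peptide is submitted to a predictor such as SignalP. The sketch below is an assumption-laden example: EXAMPLE_EXON is a hypothetical 57 bp sequence, not the GA13562 exon, and the script uses the standard genetic code without calling SignalP or any other prediction tool.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame, uppercase DNA sequence into one-letter codes."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical 57 bp exon, written codon by codon for readability.
EXAMPLE_EXON = (
    "ATG AAG CTG CTG GTG CTG CTG GCC CTG CTG "
    "GCC GTG GCC AGC GCC CAG GAG TGC AAG"
).replace(" ", "")
EXAMPLE_PEPTIDE = translate(EXAMPLE_EXON)

if __name__ == "__main__":
    print(len(EXAMPLE_EXON), "bp ->", len(EXAMPLE_PEPTIDE), "residues")
    print(EXAMPLE_PEPTIDE)
```

A 57 bp exon read in frame encodes 19 residues, and it is this translated peptide, joined to the existing sequence, that would be passed to the signal peptide predictor.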
Appendix E: Further Materials and Methods
Fly strains
Strains were obtained from the Bloomington stock center, except for P{tubP-GAL4}
and P{GAL4-prd.F}, which were a gift from J.-P. Vincent, and P{ppl-GAL4.P}, which
was a gift from P. Leopold.
In situ hybridisation
Standard in situ protocols were used to examine expression of eight Drosophila SVC
genes, using riboprobes made using the following ESTs: CA807769 for CG15199,
CA803791 for CG15202, RH67091 and RH06304 for CG15201, RE43475 for
CG31997, LP01643 for CG2444, AT09846 for CG12677, RE35169 for CG15203
and GH22911 for CG2081. See Appendix B for additional information.
Generation of transgenic constructs
The predicted coding sequences of four Drosophila SVCs (CG2081, CG15201,
CG15203 and CG2444) were amplified using primers containing XhoI/XbaI or
EcoRI/XbaI sites. The resulting fragments were subcloned into the pUAST
transformation vector [5]. For each construct, at least two independent single-insertion lines were established and used for further experiments. See Appendix B for
additional information.
Supplemental Figure Captions
Supplemental Table 1. Observed spacings between cysteine residues.
All of the 10-cysteine VWC proteins from Pfam family PF00093 were analysed.
SVC proteins AAK61816 (truncated) and AAT92199 (missing cysteine) were
excluded from the analysis (see Appendix C). The number of residues between each
pair of cysteines is displayed as a range. The numbers given to the cysteine residues are based
on the 8-cysteine pattern. The two extra cysteines from the 10-cysteine pattern (3 and
5) were counted as any other residue.
Supplemental Figure 1. Overexpression of SVC genes does not affect
developmental growth. Transgenic flies overexpressing four Drosophila SVC genes
were viable and had the same weight as wild-type flies grown in the same vial.
Genotypes are as follows:
Driver: yw;; tubulin-Gal4/+. Balancer: yw;; TM3, P{ActGFP}MR2,Ser/+. CG15203:
p{UAS-CG15203}/+; tubulin-Gal4/+ or p{UAS-CG15203}/TM3,
P{ActGFP}MR2,Ser. CG15201: p{UAS-CG15201}/+; tubulin-Gal4/+ or p{UAS-CG15201}/TM3, P{ActGFP}MR2,Ser. CG2081: p{UAS-CG2081}/tubulin-Gal4.
CG2444: p{UAS-CG2444}/+; tubulin-Gal4/+. Corresponding controls were p{UAS}
lines/+; TM3, P{ActGFP}MR2,Ser or UAS lines/TM3, P{ActGFP}MR2,Ser flies
grown in the same vial. Two different UAS insertions were tested for each SVC gene.
There was no significant difference between progeny overexpressing SVC genes and
control progeny grown in the same vial. There was no weight difference between
heterozygotes with a balancer chromosome and heterozygote driver lines, thus
allowing the use of balancer heterozygotes as internal controls.
Supplemental References
1. Alexandre C, Vincent JP: Requirements for transcriptional repression and
activation by Engrailed in Drosophila embryos. Development 2003,
130:729-739.
2. Zinke I, Schutz CS, Katzenberger JD, Bauer M, Pankratz MJ: Nutrient
control of gene expression in Drosophila: microarray analysis of
starvation and sugar-dependent response. Embo J 2002, 21:6162-6173.
3. Lee T, Luo L: Mosaic analysis with a repressible cell marker for studies of
gene function in neuronal morphogenesis. Neuron 1999, 22:451-461.
4. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna
A, Marshall M, Moxon S, Sonnhammer EL, et al: The Pfam protein families
database. Nucleic Acids Res 2004, 32:D138-141.
5. Brand AH, Perrimon N: Targeted gene expression as a means of altering
cell fates and generating dominant phenotypes. Development 1993,
118:401-415.