Promoter Analysis for Intestinally-Expressed C. elegans genes
1. Objectives
a. Find conserved sites in the upstream regions of 74 intestinally-expressed
genes
b. Also analyze the orthologues of the genes in C. briggsae and C. remanei
c. Provide evidence, if possible, for the ELT-2 theory of intestinal gene
regulation
2. Summary:
a. Motif Discovery is complete for all 74 C. elegans genes, 57 C. briggsae
orthologues and 38 C. remanei orthologues
b. Hit sequences have been extracted and aligned
c. We need to discuss how to generate our final set of hits from this data
Completed to date
1. Motif Discovery in 74 C. elegans genes
a. Two motif discovery algorithms were used
i. MotifSampler
1. Settings: 100 iterations, up to 5 motifs reported per sequence
2. Background: 150 kb of randomly chosen concatenated
upstream sequences
3. Motif lengths: 6, 8, 10, 12
4. Filtering step:
a. Only motifs found in exactly the same position at least 7 times were kept
b. Overlapping hits that met this criterion were merged into one longer hit
5. Motifs meeting this criterion were found in 58 of the 74 sequences
ii. RSAT
1. Word counter
2. Background: word counts over *all* C. elegans upstream sequences
3. Default settings used
4. Motif lengths: 6, 7, 8 (the only lengths available)
5. Significant hits were found in all 74 sequences
b. Results:
i. Image of all MotifSampler results: Cele_all_motifsampler.GIF
ii. Image of filtered MotifSampler results:
Cele_filtered_motifsampler.GIF
iii. Image of RSAT results: Cele_RSAT.GIF
iv. Image of filtered MotifSampler results plus RSAT results:
Cele_RSAT_filt_motifsampler.GIF
c. Observations
i. Considerable overlap between MotifSampler and RSAT predictions
ii. RSAT finds all occurrences of a given sequence, whereas MotifSampler finds only some of them and ignores the others
iii. However, RSAT generally returns too many results to be useful on its own, especially for 6-bp words
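The MotifSampler filtering step above (items 4a and 4b) can be sketched in Python. This is a minimal, hypothetical illustration, not the script actually used: it assumes the hits from every run for one upstream sequence and one motif length have already been parsed into 0-based, half-open (start, end) pairs. Parsing MotifSampler output is not shown, and the function names are illustrative. Hits reported in exactly the same position in at least 7 runs are kept, and kept hits that overlap are merged into one longer hit.

```python
from collections import Counter

# Hypothetical representation: one (start, end) pair per hit, 0-based and
# half-open, for a single upstream sequence and a single motif length.
# This is not MotifSampler's own output format; parsing is not shown.
Hit = tuple[int, int]


def recurrent_hits(runs: list[list[Hit]], min_count: int = 7) -> list[Hit]:
    """Keep hits reported in exactly the same position in >= min_count runs."""
    # set(run) ensures a hit is counted at most once per run.
    counts = Counter(hit for run in runs for hit in set(run))
    return sorted(hit for hit, n in counts.items() if n >= min_count)


def merge_overlapping(hits: list[Hit]) -> list[Hit]:
    """Merge overlapping intervals into single, longer hits."""
    merged: list[Hit] = []
    for start, end in sorted(hits):
        if merged and start < merged[-1][1]:
            # Overlaps the previous hit: extend that hit instead of adding one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_motifsampler(runs: list[list[Hit]], min_count: int = 7) -> list[Hit]:
    """Recurrence filter (step 4a) followed by merging (step 4b)."""
    return merge_overlapping(recurrent_hits(runs, min_count))


# Toy data: 17 runs; (10, 18), (14, 22) and (40, 48) each recur 7 times,
# while (100, 108) recurs only 3 times.
runs = [[(10, 18), (40, 48)]] * 7 + [[(14, 22)]] * 7 + [[(100, 108)]] * 3
print(filter_motifsampler(runs))  # prints: [(10, 22), (40, 48)]
```

Because merging joins any chain of overlapping kept hits, several recurrent sites can collapse into one long hit. This is one way the wide spread of hit lengths noted under Motif Sequences can arise.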
2. Analysis of Orthologous sequences
a. Origin of orthologues:
i. C. briggsae:
1. Wormbase cb25 release
2. 57 orthologues found
ii. C. remanei:
1. C. elegans Wormpep sequences were aligned against C. remanei supercontigs using WABA
2. Only unambiguous results that match from the Wormpep ATG onward were used
3. 38 sequences found
b. Analysis method same as for C. elegans
c. Motif Discovery Results:
i. C. briggsae: image of filtered MotifSampler results and RSAT:
Cbri_RSAT_filt_motifsampler.GIF
ii. C. remanei: image of filtered MotifSampler results and RSAT:
Crem_RSAT_filt_motifsampler.GIF
3. Motif Sequences
a. The sequences of all hits were extracted and flipped to whichever strand maximized the number of As and Gs.
b. The sequences were then aligned with ClustalW. The alignments are in the following files:
i. C. elegans: Cele_all_hits_aligned.txt
ii. C. briggsae: Cbri_all_hits_aligned.txt
iii. C. remanei: Crem_all_hits_aligned.txt
c. Observations:
i. In all 3 species, most hits are TGATAA sites or variations on them, but a few are unrelated to TGATAA
ii. Hit lengths vary widely, because overlapping MotifSampler hits of the same motif length were merged
iii. Because each result set was extracted independently, hits from different sets overlap with each other in the original sequence, so the same site can appear a varying number of times
iv. It is not at all clear which of these hits should be used to build a Position Frequency Matrix (PFM) and which should be discarded
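The strand-flipping step in item 3a amounts to comparing each hit with its reverse complement and keeping the purine-richer one. The sketch below is a minimal illustration assuming hits are plain DNA strings. Ties keep the original strand, which is an assumption, since the report does not say how ties were handled. Characters other than A, C, G and T (such as N) pass through the complement table unchanged.

```python
# Complement table for DNA; characters other than A, C, G, T (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, normalized to uppercase."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def purine_count(seq: str) -> int:
    """Number of A and G bases in the string."""
    s = seq.upper()
    return s.count("A") + s.count("G")


def orient_purine_rich(seq: str) -> str:
    """Return whichever strand has more As and Gs; ties keep the given strand."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    return reverse if purine_count(reverse) > purine_count(forward) else forward


print(orient_purine_rich("ttatca"))  # prints: TGATAA
```

On a GATA-type site this rule places the hit in the TGATAA orientation rather than TTATCA. That consistent orientation is what allows the hits to be aligned with one another.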
Future Work
1. Position Frequency Matrices
a. Determine the likely true motif length and consensus sequence
b. The consensus sequence can then be used to scan the original upstream
regions to generate full-length motifs
c. A final set of PFMs and logos can be generated
2. Negative Controls (to be done last, once the procedure for the test sets is finalized)
a. Mirror image genes
b. Set of 74 randomly chosen C. elegans genes
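The PFM work planned in item 1 of Future Work could start from something like the sketch below. It assumes a hypothetical input of equal-length, gap-free site strings, for example hits trimmed to a common core after alignment. The sketch counts bases per column, takes the most frequent base in each column as the consensus, and scans an upstream region on both strands for windows within a chosen number of mismatches of that consensus. The function names, the tie-breaking rule and the mismatch threshold are illustrative choices, not decisions already made in this project.

```python
from collections import Counter

BASES = "ACGT"
# Translation table for complementing uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def position_frequency_matrix(sites: list[str]) -> list[dict[str, int]]:
    """Count each base at every column of equal-length, gap-free sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        column = Counter(site[i].upper() for site in sites)
        pfm.append({base: column[base] for base in BASES})
    return pfm


def consensus(pfm: list[dict[str, int]]) -> str:
    """Most frequent base per column; ties go to the earlier base in ACGT."""
    return "".join(max(BASES, key=lambda base: column[base]) for column in pfm)


def scan(upstream: str, motif: str, max_mismatches: int = 0) -> list[tuple[int, str]]:
    """Return (start, strand) for every window within max_mismatches of motif.

    Each window is compared with the motif ("+") and with its reverse
    complement ("-"); starts are 0-based offsets into the input sequence.
    """
    seq = upstream.upper()
    forward = motif.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    k = len(forward)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, target in (("+", forward), ("-", reverse)):
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((start, strand))
    return hits


sites = ["TGATAA", "TGATAA", "TGATAG", "AGATAA"]
core = consensus(position_frequency_matrix(sites))
print(core, scan("CCTGATAACCTTATCAGG", core))  # prints: TGATAA [(2, '+'), (10, '-')]
```

Exact-match scanning with the consensus is the simplest option. Allowing one or two mismatches, or scoring windows against the PFM itself, would recover more variant sites at the cost of more false positives.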