Finding needles in a haystack - predicting gene regulatory pathways from microarray and yeast sequence data; and a selection of human genome anecdotes

David Landsman
Computational Biology Branch, NCBI, NLM, NIH

The gathering of sequence information has accelerated to the point where it is reasonable to expect more than 10 bacterial and archaeal, and 1-2 eukaryotic, complete genome sequences to be deposited in the public databases in a given year. In addition, the identification of the open reading frames in a genome is a challenge that is being met both computationally and experimentally, and considerable efforts are underway to expedite the determination of many of the folds and structures of the proteins these open reading frames encode. However, the regulatory networks that underpin the normal functioning of cells, and that represent the interactions among the genome and its protein and RNA products, are less well understood. For example, the yeast Saccharomyces cerevisiae is predicted to have about 300 DNA-binding proteins with a wide variety of specific and non-specific DNA-binding activities. Many of the sites to which these proteins bind are as yet undiscovered, and several methods for predicting them have been developed. Many of these methods use consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences, but they have historically been subject to large numbers of false positives. We sought to decrease the rate of false positive detection by incorporating expression profile data into a consensus pattern-based search methodology. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data.

For millions of years, L1 retrotransposons have been duplicating in mammalian genomes by an efficient “copy and paste” mechanism; consequently, L1s now make up 15% of the human genome.
These autonomous elements are thought to have played an important role in the expansion and evolution of our genome. For example, a recent, and still active, L1 element was found to have inserted into genes, thereby causing disease. We will show examples of 3' transduction events for this particular L1 element, another mechanism by which L1s have probably shaped the human genome.

Processed pseudogenes are created by retrotransposition, a process by which an mRNA is reverse transcribed into DNA and inserted at a new location in a genome. In the human genome, the total number of processed pseudogenes is estimated at approximately 20,000, and while most are inactive, some may acquire a new promoter and remain functional. To understand the nature of this process, we set out to conduct a detailed genomic survey of processed pseudogenes, concentrating on three families of HMG (high mobility group) genes (HMGN, HMGA, and HMGB), which are known to have numerous processed pseudogenes. We will present the general characteristics of the insertions found and describe some unique retrotransposition events.
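
As an illustration of the consensus pattern-based searching of gene clusters mentioned in the abstract, the following is a minimal sketch of the general idea only. It is not the PROSPECT implementation, which the abstract does not describe. It assumes that upstream sequences are available as plain strings keyed by gene name, and that a cluster is simply a list of gene names taken from a microarray analysis. All function names, gene names, and sequences here are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Reverse complement of a consensus pattern written in IUPAC codes."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(pattern):
    """Compile a consensus pattern; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in pattern.upper())
    return re.compile("(?=" + body + ")")


def find_sites(pattern, sequence):
    """Return sorted (position, strand) hits, searching both strands."""
    seq = sequence.upper()
    hits = {(m.start(), "+") for m in consensus_to_regex(pattern).finditer(seq)}
    rc = reverse_complement(pattern)
    if rc != pattern.upper():  # palindromic patterns are counted once
        hits |= {(m.start(), "-") for m in consensus_to_regex(rc).finditer(seq)}
    return sorted(hits)


def hit_fraction(pattern, upstream, genes):
    """Fraction of the given genes whose upstream region has at least one hit."""
    genes = [g for g in genes if g in upstream]
    if not genes:
        return 0.0
    with_hit = sum(1 for g in genes if find_sites(pattern, upstream[g]))
    return with_hit / len(genes)


# Hypothetical toy data: upstream regions and one co-expressed cluster.
UPSTREAM = {
    "geneA": "AAATGACTCATTT",
    "geneB": "CCCCCCCCCC",
    "geneC": "GGTGAGTCAGG",
}
CLUSTER = ["geneA", "geneC"]

if __name__ == "__main__":
    pattern = "TGASTCA"  # illustrative pattern; S stands for C or G
    for gene in CLUSTER:
        print(gene, find_sites(pattern, UPSTREAM[gene]))
    print("cluster fraction:", hit_fraction(pattern, UPSTREAM, CLUSTER))
    print("genome fraction:", hit_fraction(pattern, UPSTREAM, UPSTREAM))
```

Comparing the fraction of genes with a hit inside the cluster against the fraction across all genes is one simple way to judge whether a pattern is associated with a co-expressed group rather than matching by chance. A real analysis would use a proper statistical test and genome-scale sequence data.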