Project - MSCBIO 2025
RNA-seq analysis is a valuable tool for investigating gene expression levels. After the analysis is done, you need to filter the results for genes that differ significantly from a control (wild-type). Here you will take a .csv file containing a list of genes and their statistics from the analysis, filter it for the statistically significant genes, and get an idea of whether the fold changes trend positively or negatively.

60% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Here are three files of differentially expressed genes between mutant and wild-type zebrafish, provided by Michael Tsang’s developmental/regenerative lab. Your code needs to take a single command-line argument, the file name, and read that file in using the Pandas package. Print the total number of genes in the file and only the column names of the file (not a list of the names). Hint: familiarize yourself with the column names; it will help with the rest of the assignment.

70% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Next, drop the columns named logCPM and LR, as they will not be used in your filtering. Now focus on genes that have a p-value less than 0.05 and produce a table that contains only those genes. Print the number of genes in the original table, the number of genes after the p-value filter, and the column names. Notice that the filtered table has fewer genes than the original and fewer columns.

80% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Next, look at genes with high fold changes. Produce a table that contains all genes with a fold change (logFC) greater than 2. Print the number of genes in this table, all of which should have p-values < 0.05 and logFC > 2.

90% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Now filter by an FDR (false discovery rate) < 0.05 and, this time, a logFC > 1.5, starting from the original table without the unwanted columns. Print the total number of genes in this table and the top ten genes, sorted so that the most significant is on top.
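The filtering steps up to the 90% level could be sketched roughly as follows. This is one possible approach, not a reference solution. The column names logFC, logCPM, LR and FDR come from the assignment text, but the p-value column name ("PValue") is an assumption, so check it against the names printed from your own file. The helper names are hypothetical, and "most significant" is taken here to mean the smallest FDR.

```python
import sys

import pandas as pd

# Assumed name of the p-value column; replace it with the name in your file.
PVALUE_COL = "PValue"


def drop_unused(df):
    """Return a copy of the table without the logCPM and LR columns."""
    return df.drop(columns=["logCPM", "LR"])


def filter_pvalue(df, cutoff=0.05):
    """Keep only the genes whose p-value is below the cutoff."""
    return df[df[PVALUE_COL] < cutoff]


def filter_logfc(df, min_logfc):
    """Keep only the genes whose logFC is strictly above min_logfc."""
    return df[df["logFC"] > min_logfc]


def filter_fdr(df, fdr_cutoff=0.05, min_logfc=1.5):
    """Keep genes passing both cutoffs, smallest FDR first."""
    kept = df[(df["FDR"] < fdr_cutoff) & (df["logFC"] > min_logfc)]
    return kept.sort_values("FDR")


def report(path):
    """Print the counts and tables requested at each grade level."""
    genes = pd.read_csv(path)
    print("Total genes:", len(genes))
    print(*genes.columns)  # the names themselves, not a Python list

    trimmed = drop_unused(genes)
    significant = filter_pvalue(trimmed)
    print("Genes before / after p-value filter:", len(genes), len(significant))
    print(*significant.columns)

    high_fc = filter_logfc(significant, 2)
    print("Genes with p-value < 0.05 and logFC > 2:", len(high_fc))

    by_fdr = filter_fdr(trimmed)
    print("Genes with FDR < 0.05 and logFC > 1.5:", len(by_fdr))
    print(by_fdr.head(10))


if __name__ == "__main__" and len(sys.argv) == 2:
    report(sys.argv[1])
```

Keeping each filter in its own small function means the 90% step can start again from the trimmed table rather than from the p-value-filtered one.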
Hint for the 90% step: you no longer need the p-value limit; the FDR is a better statistic to use in research reporting.

100% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Finally, use matplotlib to produce a histogram of the fold changes in the experiment. This will give you an idea of how the fold changes are distributed between positive and negative values. Set the number of bins to 30 to get a better picture of the fold change distribution.
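A minimal sketch of the histogram step is shown below. The assignment does not say which table to plot, so the hypothetical function here accepts any table that has a logFC column. The axis labels, title and dashed line at zero are illustrative choices, not requirements.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_logfc_histogram(df, bins=30):
    """Draw a histogram of the logFC column and return the figure and bin counts."""
    fig, ax = plt.subplots()
    counts, _edges, _patches = ax.hist(df["logFC"], bins=bins)
    # A vertical line at zero separates negative from positive fold changes.
    ax.axvline(0, color="black", linestyle="--", linewidth=1)
    ax.set_xlabel("logFC")
    ax.set_ylabel("Number of genes")
    ax.set_title("Distribution of fold changes")
    return fig, counts
```

After calling the function, `plt.show()` displays the plot in a window, while `fig.savefig("logfc_hist.png")` writes it to a file (the file name is only an example).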