Project - MSCBIO 2025

RNA-seq analysis is a valuable tool for investigating gene expression levels. After the
analysis is done, you need to filter the results for genes whose expression differs
significantly from a control (wild type). Here you will take a .csv file containing a
list of genes and their statistics from the analysis, filter it for the statistically
significant genes, and get an idea of whether the fold changes trend positively or
negatively.
60% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Here are three files of differentially expressed genes between mutant and wild-type
zebrafish, provided by Michael Tsang’s developmental/regenerative lab. Your code
needs to take a single command-line argument, the file name, and read that file in
using the Pandas package. Print the total number of genes in the file and the column
names only (the names themselves, not a list of the names). Hint: familiarize yourself
with the column names; it will help with the rest of the assignment.
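
As a rough illustration of this step, the sketch below reads the file named on the command line and prints the row count and the bare column names. The helper names (`load_table`, `summarize`, `main`) are made up for this example, and it assumes the CSV has one row per gene and a header row; neither is confirmed by the assignment text.

```python
import sys

import pandas as pd


def load_table(path):
    """Read the differential-expression CSV into a DataFrame."""
    return pd.read_csv(path)


def summarize(df):
    """Print the gene count and the bare column names; return both."""
    n_genes = len(df)  # assumption: one row per gene
    names = [str(c) for c in df.columns]
    print("Total genes:", n_genes)
    # Join the names into plain text so they are not printed as a list/Index
    print("Columns:", ", ".join(names))
    return n_genes, names


def main(argv):
    """argv[0] is the script name; argv[1] is the single CSV argument."""
    if len(argv) != 2:
        print("usage: assign1.py <csv file>")
        return 1
    summarize(load_table(argv[1]))
    return 0


# At the bottom of assign1.py one would typically add:
#     if __name__ == "__main__":
#         main(sys.argv)
```

Keeping the reading and printing in separate functions makes it easier to reuse the loaded table in the later steps.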
70% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Next, drop the columns named logCPM and LR, as they will not be used in your
filtering. Now focus on genes that have a p-value less than 0.05 and produce a table
that contains only those genes. Print the number of genes in the original table, the
number remaining after the p-value filter, and the column names. Notice that there
are fewer genes than in the original table and fewer columns.
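
One possible shape for this step is sketched below. The column names logCPM and LR come from the assignment; the p-value column name `PValue` is an assumption and should be replaced with whatever header the file actually uses. The function names are hypothetical.

```python
import pandas as pd

PVALUE_COL = "PValue"  # assumption: change to the p-value header in your CSV


def drop_unused(df: pd.DataFrame, unused=("logCPM", "LR")) -> pd.DataFrame:
    """Return a copy of df without the columns not needed for filtering."""
    return df.drop(columns=list(unused))


def filter_pvalue(df: pd.DataFrame, cutoff=0.05, col=PVALUE_COL) -> pd.DataFrame:
    """Keep only rows whose p-value is strictly below the cutoff."""
    return df[df[col] < cutoff]


def report(original: pd.DataFrame, filtered: pd.DataFrame) -> None:
    """Print gene counts before and after filtering, plus remaining columns."""
    print("Genes in original table:", len(original))
    print("Genes after p-value filter:", len(filtered))
    print("Columns:", ", ".join(map(str, filtered.columns)))
```

Because `drop` returns a new DataFrame by default, the original table is left untouched and can be reused for the FDR step later.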
80% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Next, look at genes with high fold changes. Build a table that contains all genes
with a fold change (logFC) greater than 2. Print the number of genes in this table,
all of which should have a p-value < 0.05 and logFC > 2.
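
The combined condition can be written as a single boolean mask, as in the sketch below. The function name `high_fold_change` is hypothetical, and `PValue` is again an assumed header for the p-value column; `logFC` is the name given in the assignment.

```python
import pandas as pd


def high_fold_change(
    df: pd.DataFrame,
    p_cutoff: float = 0.05,
    fc_cutoff: float = 2.0,
    p_col: str = "PValue",  # assumption: adjust to the header in your CSV
    fc_col: str = "logFC",
) -> pd.DataFrame:
    """Keep rows with p-value < p_cutoff AND logFC > fc_cutoff (both strict)."""
    # Each comparison yields a boolean Series; & combines them element-wise,
    # and the parentheses are required because & binds tighter than < and >.
    mask = (df[p_col] < p_cutoff) & (df[fc_col] > fc_cutoff)
    return df[mask]
```

Applying the logFC condition to the table that was already filtered by p-value gives the same rows as applying this mask to the unfiltered table.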
90% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Now filter by an FDR (false discovery rate) < 0.05 and a logFC > 1.5, starting from
the original table with the unwanted columns removed. Print the total number of
genes in this table and the top ten genes, sorted so that the most significant is on
top. Hint: you no longer need the p-value limit; the FDR is a better statistic to use
in research reporting.
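
A minimal sketch of this step follows, assuming that "most significant" means the smallest FDR, so the table is sorted by FDR in ascending order. The function name `top_by_fdr` is made up for illustration; `FDR` and `logFC` are the column names mentioned in the assignment.

```python
import pandas as pd


def top_by_fdr(
    df: pd.DataFrame,
    fdr_cutoff: float = 0.05,
    fc_cutoff: float = 1.5,
    n: int = 10,
):
    """Filter by FDR and logFC, sort by FDR ascending, return (table, top n)."""
    kept = df[(df["FDR"] < fdr_cutoff) & (df["logFC"] > fc_cutoff)]
    # Smallest FDR first, i.e. the most significant gene on top
    kept = kept.sort_values("FDR", ascending=True)
    return kept, kept.head(n)
```

In the script, `print(len(kept))` followed by `print(top)` would then report the gene count and the ten most significant rows.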
100% >> assign1.py Diff_gene_genes3.csv (any of the three files)
Finally, use matplotlib to produce a histogram of the fold changes in the
experiment. This will give you an idea of how the fold changes are distributed,
positively or negatively. Set the number of bins to 30 to get a better view of the
distribution.
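
The histogram could be produced along the lines of the sketch below. It assumes the fold changes are in the `logFC` column and writes the figure to a file; the function name, the output file name, the axis labels, and the vertical line at zero are illustrative choices, not requirements from the assignment.

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd


def plot_logfc_histogram(
    df: pd.DataFrame,
    out_path: str = "logFC_hist.png",  # hypothetical output file name
    bins: int = 30,
    col: str = "logFC",
):
    """Save a histogram of the fold-change column; return counts and bin edges."""
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(df[col].dropna(), bins=bins)
    ax.axvline(0, color="black", linewidth=1)  # logFC = 0 means no change
    ax.set_xlabel("logFC")
    ax.set_ylabel("Number of genes")
    ax.set_title("Distribution of fold changes")
    fig.savefig(out_path)
    plt.close(fig)
    return counts, edges
```

Bars to the right of zero correspond to genes with positive logFC and bars to the left to genes with negative logFC, which is what makes the direction of the overall trend visible.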