GCBA815: Final Project
Due on 12/14/15, Monday by 5pm.

Instructions:
1. Questions 1-3 have individual assignments as shown below. Questions 4-6 are common to all.
2. Your final project should consist of a single Word file with your responses to all questions. The Word file name MUST start with your name. Please email it to [email protected]

Question 1: (6 points)
A. (3 points) Submit the translation products for your nucleotide sequence (EST-GID provided) in all 6 frames (use the EBI tool, ‘Transeq’). Do you think your DNA sequence makes any functionally sensible protein product? If so, in which frame?
B. (3 points) Run a ‘blastx’ search for your DNA sequence at an E-value cutoff of 0.01 and explain the function of the protein found from the most meaningful frame. Is this frame the same as the frame you found in part A that makes a functionally sensible protein product? Explain your result.

Question 2: (5 points)
Find 5 orthologous protein sequences for your favorite gene, making sure that each sequence comes from a different species. Extract FASTA sequences for these 5 proteins using the NCBI Entrez system, and generate a multiple sequence alignment using the CLUSTALW program. Copy and paste the colored output of the CLUSTALW alignment into your report.

Question 3: (5 points)
Using the UCSC Genome Browser, list the unique RefSeq genes (not LINCs or MIRs) that are present on the chromosome band assigned to you. Briefly comment on the genes you found on this chromosomal segment, for example whether they belong to a specific gene family, are unrelated, or whether the band has no genes. If more than 10 genes exist in your chromosomal band, just show any 10 genes.

Question 4: (7 points)
(2 points) Download the HIV-AD8 genome (accession number AF004394.1 from NCBI) to a text file and load it into VectorNTI (you will need to install it on your PC).
A. (2 points) Find ORFs in the genome using a minimum size setting of at least 300bp (default is 50bp). How many ORFs did you find on each strand?
B. (3 points) Design primers for the gene from positions 2358-5093bp using default parameters, but with a requirement of getting a PCR product of at least 2000bp.
a. How many primer pairs did you find, and what is the length of the product for each pair?
b. Translate this gene into a protein and run it against the PFAM database. List the protein domains you found in the PFAM search.

Question 5: (5 points)
Consider the following non-synonymous mutations observed in a tumor genome. State how each mutation can potentially impact the structure/function of a protein (without using any extraneous information). Give correct reasoning for each mutation based on the physicochemical properties of the amino acids.
a) V → L
b) C → W
c) K → D
d) G → W
e) P → S

Question 6: (12 points)
There are three main bioinformatic analyses (data pre-processing, variant discovery, and variant call refinement) performed when identifying genetic variants from raw next generation sequencing (NGS) data. List the main steps performed in each of the following three stages of analysis and describe the purpose of each of those steps.
1. Data pre-processing (4 points)
2. Variant discovery (2 points)
3. Variant call refinement (2 points)

Run the data analysis: (4 points)
Run the entire NGS analysis pipeline exactly as we did in class. You will be using the same script that you used last time; however, new raw reads (fastq files) have been added, so your final files will be different from the ones generated in class. Remember, the final files will be located in this folder on cbsb:
/storage/gcba815/USER_NAME/variant_calling_work/vcf_files/
where “USER_NAME” should be replaced by your user name. Once the pipeline has finished, you will find the final variant file there (a VCF file, with the ‘.vcf’ extension). Identify any interesting variants in the final annotated.snv.vcf file and explain why those variants might be interesting.

Submit your VCF file:
Change the name of the snv.vcf file (your_filename.annotated.snv.vcf) to “yourname.vcf” and copy that file to the following path: “/storage/share/project/submit/”
Note: If you don’t rename the file to include your name, it can be overwritten by someone else’s submission. So make sure you change the file name before you copy it to the above folder.
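
Supplementary note for Question 1A: the six-frame translation that a tool such as Transeq performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a replacement for the EBI tool. It assumes the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frames`) and the frame labels are hypothetical choices made here. In particular, the labels for the reverse frames follow a simple convention (offset into the reverse complement) and may not match Transeq's own reverse-frame numbering.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations for three forward and three reverse frames.

    Assumption: frame '-k' is offset k-1 into the reverse complement,
    which is a labeling convention chosen for this sketch.
    """
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence for illustration only, not an assigned EST.
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

A frame with a long stretch uninterrupted by stop codons (`*`) is the usual first candidate for a functionally sensible product, which is what the blastx search in Question 1B then checks against known proteins.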