GCBA815: Final Project: Due on 12/14/15, Monday by 5pm.
Instructions:
1. Questions 1-3 have individual assignments as shown below. Questions 4-6 are common
to all.
2. Your final project should consist of a single Word file with your responses to all questions.
The Word file name MUST start with your name. Please email it to
[email protected]
Question 1: (6 points)
A. (3 points) Submit the translation products for your nucleotide sequence (EST-GID provided)
in all 6 frames (use the EBI tool, ‘Transeq’). Do you think your DNA sequence makes any
functionally sensible protein product? If so, in which frame?
B. (3 points) Run a ‘blastx’ search for your DNA sequence with an E-value cutoff of 0.01
and explain the function of the protein found from the most meaningful frame. Is this frame the
same as the frame you identified in part A as making a functionally sensible protein product?
Explain your result.
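To see what a six-frame translation involves before using Transeq, here is a minimal sketch in plain Python. It assumes the standard genetic code and an unambiguous A/C/G/T sequence. The function names and the frame labels (+1 to +3, -1 to -3) are illustrative assumptions, and Transeq may number its reverse frames differently, so compare protein strings rather than labels.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate three forward and three reverse-complement frames.

    Frame labels here are an assumption made for illustration.
    """
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_open_stretch(protein):
    """Length of the longest run of residues not interrupted by a stop (*)."""
    return max(len(part) for part in protein.split("*"))
```

A frame whose translation contains one long stretch without stop codons is a reasonable first candidate for the "functionally sensible" frame, which the blastx result can then confirm or contradict.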
Question 2: (5 points)
Find 5 orthologous protein sequences for your favorite gene, making sure that each sequence
comes from a different species. Extract FASTA sequences for these 5 proteins using the NCBI
Entrez system, and generate a multiple sequence alignment using the CLUSTALW program. Copy and paste the
colored CLUSTALW alignment output into your report.
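Before running the alignment, it can help to confirm that the FASTA file really holds five sequences from five different species. The sketch below assumes NCBI-style protein headers that end with the organism name in square brackets, for example "[Homo sapiens]". The function names are hypothetical, and headers in another format would need a different pattern.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def species_of(header):
    """Extract the trailing '[Genus species]' tag, or None if it is absent."""
    match = re.search(r"\[([^\[\]]+)\]\s*$", header)
    return match.group(1) if match else None


def distinct_species(records):
    """Set of species names found across all record headers."""
    return {species_of(header) for header, _ in records} - {None}
```

If `len(parse_fasta(text))` is 5 and `len(distinct_species(...))` is also 5, every sequence comes from a different species, as the question requires.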
Question 3: (5 points)
List the unique RefSeq genes (not LINCs or MIRs) that are present on the chromosome band
assigned to you, using the UCSC Genome Browser. Briefly comment on the genes you found on
this chromosomal segment, for example whether they belong to a specific gene family or are unrelated, or whether the band contains no
genes. If more than 10 genes exist in your chromosomal band, just show any 10 genes.
Question 4: (7 points)
(2 points) Download the HIV-AD8 genome (NCBI accession number AF004394.1) to a text
file and load it into VectorNTI (you will need to install it on your PC).
A. (2 points) Find ORFs in the genome with the minimum ORF size set to at least 300 bp (the default is 50 bp). How
many ORFs did you find on each strand?
B. (3 points) Design primers for the gene at positions 2358-5093 bp using default
parameters, but require a PCR product of at least 2000 bp.
a. How many primer pairs did you find, and what is the length of the product for
each pair?
b. Translate this gene into a protein and run it against the PFAM database. List the
protein domains you found in the PFAM search.
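As a cross-check on the ORF counts in part A, the sketch below scans both strands for open reading frames of at least 300 bp. It assumes an ORF runs from the first ATG in a frame to the next in-frame stop codon, with the stop included in the length. VectorNTI's own ORF definition and settings may differ, so the counts need not match exactly. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_len=300):
    """Return (frame, start, end) for ATG-to-stop ORFs of at least min_len nt.

    Coordinates are 0-based and end-exclusive on the strand that was passed in.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the ORF length
                if end - start >= min_len:
                    orfs.append((frame, start, end))
                start = None
    return orfs


def count_orfs_per_strand(seq, min_len=300):
    """Count ORFs on the direct strand and on the complementary strand."""
    return {
        "direct": len(find_orfs(seq, min_len)),
        "complementary": len(find_orfs(reverse_complement(seq), min_len)),
    }
```

Raising `min_len` from 50 to 300 removes most short, chance ORFs, which is why the question asks for the larger setting.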
Question 5: (5 points)
Consider the following non-synonymous mutations observed in a tumor genome. State how
each mutation can potentially impact the structure/function of a protein (without using any
extraneous information). Give correct reasoning for each mutation based on the
physicochemical properties of the amino acids.
a) V → L
b) C → W
c) K → D
d) G → W
e) P → S
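One way to organize the reasoning for each substitution is to tabulate a few physicochemical properties and compare them. The sketch below is illustrative only: the class labels follow one common textbook grouping (other groupings exist), charges are approximate values near neutral pH, and the function name is hypothetical. It reports what changes, not what the change means for a particular protein.

```python
# (class, approximate side-chain charge near neutral pH)
AMINO_ACIDS = {
    "A": ("nonpolar", 0), "V": ("nonpolar", 0), "L": ("nonpolar", 0),
    "I": ("nonpolar", 0), "M": ("nonpolar", 0),
    "F": ("aromatic", 0), "W": ("aromatic", 0), "Y": ("aromatic", 0),
    "S": ("polar", 0), "T": ("polar", 0), "N": ("polar", 0), "Q": ("polar", 0),
    "K": ("positive", 1), "R": ("positive", 1),
    "H": ("positive", 1),  # histidine is only partly protonated near pH 7
    "D": ("negative", -1), "E": ("negative", -1),
    "C": ("special", 0), "G": ("special", 0), "P": ("special", 0),
}

# Residues with structural roles that a simple class label does not capture.
NOTES = {
    "C": "thiol side chain; can form disulfide bonds",
    "G": "side chain is a single hydrogen; very flexible backbone",
    "P": "cyclic side chain bonded to the backbone nitrogen; restricts conformation",
}


def describe_substitution(ref, alt):
    """Summarize class and charge differences between two residues."""
    ref_class, ref_charge = AMINO_ACIDS[ref]
    alt_class, alt_charge = AMINO_ACIDS[alt]
    return {
        "ref_class": ref_class,
        "alt_class": alt_class,
        "class_changed": ref_class != alt_class,
        "charge_change": alt_charge - ref_charge,
        "notes": [f"{aa}: {NOTES[aa]}" for aa in (ref, alt) if aa in NOTES],
    }
```

For example, `describe_substitution("K", "D")` reports a charge change of -2, which is the kind of property difference the written answer should then interpret.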
Question 6: (12 points)
There are three main bioinformatic analyses (data pre-processing, variant discovery, and variant
call refinement) performed when identifying genetic variants from raw next generation
sequencing (NGS) data. List the main steps performed in each of the following three stages of
analysis and describe the purpose of each of those steps.
1. Data pre-processing (4 points)
2. Variant discovery (2 points)
3. Variant call refinement (2 points)
Run the data analysis: (4 points)
Run the entire NGS analysis pipeline exactly as we did in class. You will be using the same
script that you used last time; however, new raw reads (FASTQ files) have been added, so your
final files will differ from the ones generated in class. Remember, the final files will be
located in this folder on cbsb: /storage/gcba815/USER_NAME/variant_calling_work/vcf_files/
where “USER_NAME” should be replaced by your user name. Once the pipeline has finished,
you will find the final variant file there (a VCF file with a ‘.vcf’ extension). Identify any interesting
variants in the final annotated.snv.vcf file and explain why those variants might be
interesting.
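A simple starting point for finding interesting variants is to read the VCF records and keep the high-confidence ones. The sketch below relies only on the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The quality threshold of 50 and the function names are assumptions for illustration. The annotation keys written by the class pipeline are not known here, so the INFO field is parsed generically.

```python
def parse_vcf(lines):
    """Yield one dict per variant record, skipping header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
        info_dict = {}
        for item in info.split(";"):
            if item in ("", "."):
                continue
            key, _, value = item.partition("=")
            info_dict[key] = value if value else True  # flag keys have no value
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),
            "qual": None if qual == "." else float(qual),
            "filter": flt,
            "info": info_dict,
        }


def high_confidence(records, min_qual=50.0):
    """Keep records that passed all filters and meet the QUAL threshold."""
    return [
        r for r in records
        if r["filter"] == "PASS" and r["qual"] is not None and r["qual"] >= min_qual
    ]


# Usage (the file name is a placeholder):
# with open("your_filename.annotated.snv.vcf") as handle:
#     for record in high_confidence(parse_vcf(handle)):
#         print(record["chrom"], record["pos"], record["ref"], record["alt"], record["info"])
```

From the filtered list, the annotation fields in INFO (for example gene names or predicted effects, if the pipeline adds them) can guide which variants are worth discussing.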
Submit your VCF file: change the name of the annotated SNV file (your_filename.annotated.snv.vcf) to
“yourname.vcf” and copy that file to the following path: “/storage/share/project/submit/”
Note: If you don’t rename the file to include your name, it can be overwritten. So make sure you
change the file name before you copy it to the above folder.
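The rename-and-copy step can also be scripted so that an existing submission is never overwritten by accident. This is a minimal sketch using only the Python standard library. The function name `submit_vcf` and the example name `jane_doe` are hypothetical, and it assumes Python 3 is available on the server.

```python
import shutil
from pathlib import Path


def submit_vcf(source, submit_dir, student_name):
    """Copy the annotated VCF into submit_dir as '<student_name>.vcf'.

    Raises FileExistsError instead of overwriting an existing file.
    """
    source = Path(source)
    target = Path(submit_dir) / f"{student_name}.vcf"
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(source, target)
    return target


# Usage (replace the placeholders with your own file and name):
# submit_vcf("your_filename.annotated.snv.vcf",
#            "/storage/share/project/submit/",
#            "jane_doe")
```

Copying under the new name in one step leaves the original pipeline output untouched in the vcf_files folder.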