Assignment IV
Blast
1. The following “thing” has just been sequenced in your lab. Run a blastn
against the nr/nt database to identify similar sequences.
CCTTGCGCAACGTGGTGCCGAAGCCGGTGCAGAAGCCGCATCCGCCGCTGTGGATGGCCTGCTCGCAGCT
TCCGACCATCGAGCGCGCCGGCCGTCATGGTTTTGGCGCGCTCGGCTTCCAGTTCGTCAGCGCCGACGCC
GCCCATGCCTGGGTGCACGCCTATTACAACGCCATGACCAAGCGGCTGCACAAGCTTGCGGACTACGAGA
TCAACCCGAACATGGCGCTGGTGTCGTTCTTCATGTGCGCGAAGACCGACGAGGAGGCGCGCGCCCGCGC
CGACGGCGCCACCTTCTTCCAGTTCGCGCTGCGCTTCTATGGCGCCGCGCAGAACCGCCAGCGGCCCGCG
CCCTACACCGTCAATATGTGGGACGAGTACAACAAGTGGAAGCGCGACAATCCAGAGGCCCAGGAGGCGG
CGCTGCGCGGCGGCCTGATCGGCTCGCCGGAGACGATCCGCAAGAAGCTCAGGCGCTTCCAGAGCTCGCA
TATCGACCAGGTCATCCTGCTCAACCAGGCCGGCAAGAACAGCCACGAGCACATCTGCGAATCGCTCGAA
TTGTTCGGCCGGGAGGTGATGCCGGAGTTCCAGAACGATCCCGCCCAGGCCGCCTGGAAGAGGAGCGTCA
a. The genome of which organism contains this sequence? Describe the
organism. List three papers with this organism in the title that were
published in 2011. Has the genome of this organism been sequenced? What
is the predicted function of the gene? (10 points)
a) Bradyrhizobium japonicum. Bradyrhizobium japonicum is a Gram-negative,
rod-shaped, nitrogen-fixing bacterium. It forms a symbiotic
relationship with Glycine max (soybean). It colonizes the roots of the
soybean plant, where it forms nitrogen-fixing nodules.
Papers:
1) A role for Bradyrhizobium japonicum ECF16 sigma factor, EcfS, in the
formation of a functional symbiosis with soybean
2) Whole-Genome Expression Profiling of Bradyrhizobium japonicum in
Response to Hydrogen Peroxide.
3) A dual-targeted soybean protein is involved in Bradyrhizobium
japonicum infection of soybean root hair and cortical cells.
The genome has been sequenced. Predicted function: putative cyclic
NTP-binding protein gene.
b. Perform a blastx search against SwissProt. Why are the results different?
(10 points)
On the one hand, searching SwissProt reduces the number of redundant hits;
on the other hand, SwissProt is not updated every day and the criteria for
inclusion in SwissProt are quite stringent, so not everything is there.
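A further reason the blastx results differ from the blastn results is that blastx translates the nucleotide query in all six reading frames and compares the translations against protein sequences. The sketch below illustrates only that translation step, using the standard genetic code; it is not BLAST's implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Return the three forward and three reverse reading-frame translations."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# First 30 nucleotides of the query sequence from question 1.
for frame, protein in six_frame_translations("CCTTGCGCAACGTGGTGCCGAAGCCGGTGC").items():
    print(frame, protein)
```

Each of the six translations is a candidate protein query, which is why a blastx search can return protein hits that have no counterpart in a nucleotide-level search.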
2. Search the sequences “BarakObama” and “MicheleBachmann” with
blastp in the protein nr database. Which candidate has more hits with E-values less than 100?
(10 points)
MicheleBachmann has 79 hits; BarakObama has 9.
3. The Caenorhabditis elegans gene SMA-4 is a member of the dwarfin
gene family, also called the MAD family, which plays a role in transforming
growth factor beta-mediated signal transduction. Find the first hit of the
SMA-4 protein (Accession P45897) in a different species using protein-protein BLAST (blastp) and the nr database.
a. What is the first hit? (10 points)
Accession: XP_003110943.1 CRE-SMA-4 protein [Caenorhabditis remanei]
Exclude Caenorhabditis elegans or use the query:
animalia[orgn] NOT Caenorhabditis elegans[orgn]
b. What Entrez query will you use to search for this gene in all animals
excluding nematodes? (10 points)
animalia[orgn] NOT nematoda[orgn]
4. The following amplified DNA sequence is associated with a human
disease gene polymorphism:
GTTCACACTCTCTGCACTACCTCTTCATGGGTGCCTCAGAGCAGGACCTTGGTCTTTCCTTGTTTGAAGCTTTGGGCTAC
GTGGATGACCAGCTGTTCGTGTTCTATGATGATGAGAGTCGCCGTGTGGAGCCCCGAACTCCATGGGTTTCCAGTAGAA
TTTCAAGCCAGATGTGGCTGCAGCTGAGTCAGAGTCTGAAAGGGTGGGATCACATGTTCACTGTTGACTTCTGGACTAT
TATGGAAAATCACAACCACAGCAAGG
a. Use this sequence to Blast the human genome to identify the gene. What
gene is it? (5 points)
HFE gene
b. On what chromosome is this gene located? (5 points)
Chromosome 6
c. Use the link to “SNP:GeneView” and the BLAST alignment of the first
hit to identify the position and nature of the polymorphism. Hint: it is in
exon 2.
Describe the polymorphism. What are the alleles? (10 points)
Using the BLAST alignment, it is easy to identify the mismatch (G/C at
position 347 of the reference sequence). The alleles are G and C. Since the
record for the amino acid change cannot be identified directly, I will not
deduct any marks.
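Reading a mismatch off an alignment can also be done programmatically. The following is a minimal sketch that assumes a gapless alignment of equal-length strings; the function name and the toy sequences are hypothetical and are not taken from the actual BLAST output.

```python
def find_mismatches(query, subject, subject_start=1):
    """List (subject_position, subject_base, query_base) for each mismatch.

    Assumes the two aligned strings have equal length and contain no gaps.
    subject_start is the 1-based coordinate of the first aligned subject base.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return [
        (subject_start + i, s, q)
        for i, (q, s) in enumerate(zip(query, subject))
        if q != s
    ]


# Toy example: a single substitution in an otherwise identical alignment.
toy_subject = "GATGATGAGAGTCGC"
toy_query = "GATGATCAGAGTCGC"
for position, ref_base, query_base in find_mismatches(toy_query, toy_subject, subject_start=101):
    print(f"position {position}: reference {ref_base}, query {query_base}")
```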
5. John completed his homework assignment for the “Bioinformatics for
Biologists” class. He used Blast to search a sequence in the nr database. He
found a hit with an E-value of 10e-4. Amazed by the speed and efficiency of
the program he fell into a coma for 10 years. When he woke up, he searched
the same sequence again and found the same hit. Would you expect the E-value of this hit to be the same? Explain. (15 points)
Answer:
The E-value will increase because it depends on the size of the database,
which is expected to increase tremendously. The bigger the database, the
higher the probability of finding a match with the same score by chance.
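The dependence on database size can be made concrete with the Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n is the database length and S' is the bit score. The sketch below is a simplification (BLAST uses effective lengths with edge corrections), and the query length, bit score and database sizes are hypothetical values chosen only for illustration.

```python
def expected_hits(bit_score, query_len, db_len):
    """E-value from the simplified relation E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers: the same hit (same bit score) searched against
# a database that has grown tenfold.
query_len = 300
bit_score = 60.0
db_len_then = 1e10
db_len_now = 1e11

e_then = expected_hits(bit_score, query_len, db_len_then)
e_now = expected_hits(bit_score, query_len, db_len_now)

print(f"E-value then: {e_then:.3g}")
print(f"E-value now:  {e_now:.3g}")
print(f"fold change:  {e_now / e_then:.1f}")
```

Because the bit score of the hit itself does not change, the E-value scales linearly with the database length under this relation.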
6. Your lab has acquired a DNA sequence of a protein-coding gene from an
unidentified virus. Which of the Blast programs Blastn (nucleotide-Blast) or
tBlastx, is more likely to find significant hits of related sequences with low
sequence similarity? Explain. (15 points)
Answer: tBlastx, which compares translated sequences. There are 20 amino
acids but only 4 nucleotides. Two unrelated DNA sequences will have about
25% sequence identity on average, whereas two unrelated amino-acid
sequences will have about 5% sequence identity on average. Therefore, a
search at the amino-acid level is more sensitive and will be able to detect
more distant homologs.
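The 25% and 5% figures follow from 1/4 and 1/20 and can be checked with a quick simulation. This sketch assumes uniform residue frequencies and an ungapped, position-by-position comparison, which real sequences and real alignments do not strictly satisfy; the function name is hypothetical.

```python
import random


def simulated_identity(alphabet, length, rng):
    """Fraction of identical positions between two random sequences."""
    a = [rng.choice(alphabet) for _ in range(length)]
    b = [rng.choice(alphabet) for _ in range(length)]
    matches = sum(x == y for x, y in zip(a, b))
    return matches / length


rng = random.Random(0)  # fixed seed so the run is repeatable
dna_identity = simulated_identity("ACGT", 100_000, rng)
protein_identity = simulated_identity("ACDEFGHIKLMNPQRSTVWY", 100_000, rng)

print(f"DNA:     simulated {dna_identity:.3f}, expected {1 / 4:.3f}")
print(f"protein: simulated {protein_identity:.3f}, expected {1 / 20:.3f}")
```

The lower chance-level identity for proteins means that a given level of observed similarity stands out more clearly from background at the amino-acid level.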