DNA ANALYSIS: Public vs private access to the human genome

Spring 2007
Biology 212 General Genetics
Bioinformatics Workshop
THE GOALS OF THIS TUTORIAL ARE:
• to demonstrate the powerful tools for analyzing gene and protein sequences, many of
which are free to the public
• to illustrate programs for analyzing the zebrafish cDNA clones you are working with
• to see how genome analysis of the zebrafish can be extended to help understand human
genes
KEYWORDS:
BCM Search Launcher: A set of programs available at the Baylor College of Medicine
web site for DNA sequence analysis.
bioinformatics: The use of computing to analyze and store gene and protein
sequences.
BLAST: Programs which compare nucleotide or protein sequences and look for
similarities. For example, BLAST can be used to find a human gene like that of a known
mouse or fruit fly gene. In drug discovery, regions of newly identified proteins found to
be similar to existing proteins can help suggest new drug targets.
database: Stored DNA or protein sequence files or protein structure files.
GenBank: A publicly accessible database consisting of DNA sequences. Currently
administered by the National Center for Biotechnology Information.
genomics: Study of the genomes (DNA sequences) of organisms.
proteomics: Study of the structure and function of proteins, the products of genes.
NCBI: National Center for Biotechnology Information. A US government sponsored site
which allows public access to research articles (PubMed), to databases such as
GenBank (Entrez) and to nucleotide and protein sequence analysis programs and
databases (BLAST).
NEBcutter: Utility that identifies restriction sites in a DNA sequence.
Use this tutorial to learn more about analyzing DNA sequences, using tools
available on the internet. Then apply what you learn to the particular cDNA sequence
you have been given for your lab project.
1. SEQUENCE ANALYSIS TO IDENTIFY GENES AND LOCATE NEW DRUG
TARGETS
Many genes that are discovered are similar in some regions to previously studied
genes or proteins. Discovering these similarities using computer analyses saves time
and money and can give companies a competitive edge on identifying new products and
possible drug targets.
A. TUTORIAL: HOW DO I FIND THE SEQUENCE FOR A PARTICULAR GENE?
For the first part of the tutorial, go to the following web site:
http://www.ncbi.nlm.nih.gov/
This is the site for the National Center for Biotechnology Information. This site
contains access to software and databases for DNA sequence analysis, human genetic
diseases and mapped human genes, and allows you to search for scientific articles in
PubMed.
Under the top panel where it says Search, select nucleotide from the pull-down
menu, ENTER THE ACCESSION NUMBER FOR THE GENE YOU WERE ASSIGNED
(highlighted in yellow or see your lab instructor) and click on GO. NOTE: THERE IS A
LIST OF THE ACCESSION NUMBERS FOR THE ZEBRAFISH cDNAS ON THE LAST
PAGE OF THE TUTORIAL; do not use the Z number. Click on the accession number
(underlined and in blue) to open the database file.
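The same record can also be retrieved without the browser. The sketch below is a minimal illustration and not part of the workshop itself: it assumes NCBI's E-utilities `efetch` service, and the helper names `efetch_url` and `fetch_fasta` are hypothetical. It builds the request for one accession number and, if called, downloads the sequence in FASTA format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the NCBI E-utilities efetch endpoint (not mentioned in this handout).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build an efetch URL asking for a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # GenBank accession number, e.g. from the last page
        "rettype": "fasta",   # return the sequence in FASTA format
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the FASTA record (needs an internet connection)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_fasta("AW466657") to download the record.
    print(efetch_url("AW466657"))
```

Pasting the printed URL into a browser should show the same sequence that the web search returns, provided the service still accepts these parameters.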
Assignment Part 1. a. Print out a copy of your assigned zebrafish cDNA sequence. To
obtain a hard copy of your data, your computer must be connected to a printer. Click on
the print button located in the top center of your web browser. If you do not have this
shortcut button, you may click on file and then click print and ok.
Part 1. b. Locate the following information about your sequence and circle or list each item:
i) What does the cDNA encode? What is the function of the protein product? (Note: to
fully answer this, you may need to do further searches)
ii) Genus and species name of the organism the cDNA is from.
iii) The vector used for cloning the cDNA.
B. TUTORIAL: HOW DO I LOCATE SIMILAR GENES?
To locate similar genes, programs such as blastn are used. Blastn tries to match
up your sequence with all the available DNA sequences stored in the databases. For
example, you could use blastn to identify a human gene or cDNA related to the
zebrafish cDNA you are characterizing. These results then might help to solve the
structure or suggest the function of a gene involved in a genetic disease.
From the NCBI home page double-click on BLAST in the top panel.
You will be taken to the page: http://www.ncbi.nlm.nih.gov/blast/
Scroll down in the Nucleotide box to Nucleotide-nucleotide BLAST (blastn) (the third
item under nucleotide) and click on it. In the large open box, type in the accession
number of your cDNA. This is the sequence you are searching with. You now can
choose a database to search against. Each database stores a different subset of DNA
sequences. Select “Others” (nr etc.). You will be more likely to be successful in your
search if you choose one of the databases from the table below. Scroll down the list to
select the desired database. Click on BLAST.
database      sequence subset
nr            All unique sequences (not always a good choice; searches can take a long time)
est           Expressed sequence tags (cDNAs) from all species
est_human     Human expressed sequence tags
est_others    Expressed sequence tags from species other than human or mouse (would have
              zebrafish sequences)
htgs          High throughput human genomic sequences (human genome draft sequences)
When the page comes up, click on Format! to view your results. Please be
patient, as it will take several minutes at least to complete the search.
The results consist of the gene or cDNA records in the database you searched
that match your sequence. The colored bars indicate how much of your sequence
matched each existing record. The sequences most similar to your sample will appear
first in the results. Red bars indicate a strong match and a high probability of
homology. Weak homologies may also be found; these are indicated by short blue and
black bars and are unlikely to be significant.
To obtain a hard copy of your data, your computer must be connected to a printer.
Click on file on the upper left of your browser window, then click print. In the dialog
box, select pages and enter the specific pages you want to print (see below), then click
print. DO NOT PRINT OUT THE ENTIRE FILE, AS THE OUTPUTS CAN BE 20 PAGES LONG.
Assignment Part 2 a. Print out the first 3-5 pages of any search that produced a
significant homology (red bars). You will not receive full credit for printouts that are not
meaningful. Make sure the region of homology consists of at least 50 nucleotides and
that the reported probability value is very low (10^-1 or smaller).
b. Give the sequence database you searched. What kinds of sequences are found
in that database?
c. What did you find? Annotate your printout to identify sequences that pertain to
one or two of the following questions.
• Did you find additional zebrafish cDNAs related to your cDNA?
• Did you find any zebrafish genes or genomic regions related to your
cDNA?
• Did you find any human cDNAs related to your zebrafish cDNA? What
does that mean?
• Did you find any human genes related to your zebrafish cDNA?
• Did you find any cDNAs from other species related to your zebrafish
cDNA?
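The two criteria in Part 2 a (a region of homology of at least 50 nucleotides and a reported probability value of 10^-1 or smaller) can be written as a simple filter. This is only a sketch: the `Hit` record and the three example hits are invented for illustration and do not come from a real BLAST search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str        # name of the matching database sequence
    align_length: int   # length of the aligned region, in nucleotides
    evalue: float       # reported probability (expect) value


def is_significant(hit, min_length=50, max_evalue=1e-1):
    """Apply the handout's thresholds: long enough and a low enough value."""
    return hit.align_length >= min_length and hit.evalue <= max_evalue


# Invented example hits, for illustration only.
hits = [
    Hit("hit_A", 420, 1e-50),  # long match, very low value: keep
    Hit("hit_B", 32, 1e-5),    # low value but shorter than 50 nucleotides
    Hit("hit_C", 120, 3.0),    # long enough but the value is too high
]

significant = [h.subject for h in hits if is_significant(h)]
print(significant)
```

Only the first invented hit passes both tests, which is the kind of result worth printing and annotating.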
Assignment Part 3. Find some additional information about your gene. For full credit,
follow up on at least two different leads or carry out two different analyses from
part a or b, and provide annotated printouts.
a. Use various links at the main NCBI site to find out more. For example,
• Search this site for the location of your gene on a particular chromosome in
humans, zebrafish or other organisms using MapViewer. MapViewer can be
found in the right hand menu. Be sure to use the protein name, not the
accession number, in your search.
• Locate other information on the structure, function, map location, and/or
association of the gene with a human disease using the link for OMIM (Online
Mendelian Inheritance in Man) in the top panel.
• Find a reference to more information on the structure or function of the gene
using the PubMed link in the top panel. Use a keyword to search, such as the name
of the protein product, not the accession number.
b. Use some additional software to analyze your sequence further, following the
instructions below. Some other types of analyses include:
• Identifying restriction sites
• Locating open reading frames
• Designing PCR primers
TO CARRY OUT ANY OF THESE ANALYSES, YOU WILL OFTEN NEED TO
CUT AND PASTE YOUR SEQUENCE INTO OTHER PROGRAMS. Document your
analysis by providing printouts of the results and include additional notes on
what analysis was done and what you could learn from it.
To cut and paste your sequence: open your sequence file using the NCBI site.
Use the mouse to highlight the beginning of your sequence and scroll down to the end
of your sequence. Use control C to copy the file or use copy from the pull down menu.
Use control V to paste the file in the large box that appears.
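Sequence copied from a database record usually carries position numbers, spaces and line breaks along with the bases. Most programs tolerate this, but when one does not, the text can be cleaned first. The sketch below assumes that only the sequence lines were copied (no title or header line); the function name `clean_sequence` and the example fragment are made up for illustration.

```python
import re


def clean_sequence(pasted):
    """Drop digits, spaces and line breaks; return the bases in upper case."""
    return re.sub(r"[^A-Za-z]", "", pasted).upper()


# Made-up fragment laid out the way a database record displays sequence.
example = """
        1 atggcgtacg ttagcctagg
       21 ccatgg
"""

print(clean_sequence(example))  # ATGGCGTACGTTAGCCTAGGCCATGG
```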
2. HOW DO I IDENTIFY RESTRICTION SITES IN MY DNA SEQUENCE?
In order to modify or further characterize a gene, you would probably want to
identify useful restriction sites to serve as landmarks. There are a number of free
programs available on the internet that will enable you to do restriction analysis. Go to
the NEBcutter site at the following URL:
http://tools.neb.com/NEBcutter2/index.php
To analyze your DNA sequence, either type in the GenBank accession number in the
appropriate place or cut and paste your sequence into the large box provided. Click on
SUBMIT. A linear map of the sequence and the restriction sites will be displayed.
a. Print out a copy of the output from this program using the main options
on the lower left. Print the GIF version unless your computer has an Adobe reader for
viewing/printing the PDF file.
b. Unique 6 bp restriction sites within a gene or just flanking a gene are often
among the most useful. Use the lower menu list to identify 1 or more unique sites in the
sequence (1 cutters). This page can also be printed. Circle on your printed map or
list an enzyme that has a unique site on your DNA.
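To see what "1 cutter" means in practice, the following sketch scans a sequence for the recognition sites of three common 6 bp enzymes and reports those that occur exactly once. It is a toy illustration, not how NEBcutter works internally: the enzyme list is deliberately tiny, the demo sequence is invented, and the reported position is the start of the recognition site rather than the exact cut position.

```python
# Recognition sequences of three common 6 bp cutters. All three are
# palindromic, so scanning one strand finds the sites on both strands.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, site):
    """Return the 1-based start position of every match of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def unique_cutters(seq, enzymes=SITES):
    """Return {enzyme: position} for enzymes whose site occurs exactly once."""
    result = {}
    for name, site in enzymes.items():
        positions = find_sites(seq, site)
        if len(positions) == 1:
            result[name] = positions[0]
    return result


# Invented demo sequence: one EcoRI site, two BamHI sites, no HindIII site.
demo = "ACGTGAATTCTTGGATCCAAAGGATCCTTT"
print(unique_cutters(demo))  # {'EcoRI': 5}
```

BamHI is excluded because it cuts twice, and HindIII because it does not cut at all; only EcoRI would serve as a unique landmark in this invented sequence.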
3. HOW DO I DETERMINE WHERE THE PROTEIN CODING REGION IS ON MY
cDNA?
Sequence utilities enable you to do simple functions on the DNA, such as
translate the sequence, identify repeated regions, select PCR primers, or determine the
complementary sequence. We will use a web interface at the Baylor College of
Medicine with these utilities as well as access to a variety of other programs, such as
BLAST. We will use this site to theoretically translate a DNA sequence in all three
reading frames in both directions. This identifies “open reading frames” (ORFs), which
are the possible regions that encode proteins. Since we are translating cDNA
sequences, the genetic code for the protein should not have any introns interrupting the
coding region.
To open the BCM Search launcher, go to http://searchlauncher.bcm.tmc.edu
Where it says choose a type of search from the pull-down menu, scroll down, and then
click on--sequence utilities.
From the BCM Search Launcher, choose the 6 frame translation to locate
possible open reading frames in the sequence. Click on the [O] box on the right. Copy
and paste your sequence into the box and scroll down and click on perform conversion.
The correct frame for the protein is usually the largest segment of continuous amino
acids without a stop (*) codon, usually beginning with M (methionine). Print out this
analysis and circle or highlight the largest open reading frame among the six
translations.
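The six-frame translation and the rule for choosing the largest open reading frame can be sketched in a few lines. This is a simplified illustration using the standard genetic code, not the BCM program itself; the function names are hypothetical, and an ORF here is taken to run from the first M in a stretch to the next stop codon (or the end of the sequence).

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; codons with unknown bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translate three frames on each strand; keys are +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for label, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


def longest_orf(seq):
    """Return (frame, peptide) for the longest M-initiated stretch without *."""
    best = ("", "")
    for frame, protein in six_frames(seq).items():
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (frame, segment[start:])
    return best


print(longest_orf("ATGAAATTTGGGTAA"))  # ('+1', 'MKFG')
```

On a real cDNA the winning frame should match the stretch you circle on the printout, although a partial cDNA may lack the starting methionine.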
4. HOW COULD I DESIGN PCR PRIMERS UNIQUE TO MY cDNA?
Often a smaller region of a cDNA sequence, for example just the protein coding
region, is needed. There are many software tools for designing primers, once a DNA
sequence is known. The program we will use is called PrimerQuest, and is available
from a commercial supplier of primers, Integrated DNA Technologies. Go to
http://www.idtdna.com/SciTools/SciTools.aspx
Select the program, PrimerQuest, from the menu on the left (fourth item down). Type in
a name for your sequence, select PCR detection from the application list, and either
type in your accession number where it says NCBI ID# and click on “GET
SEQUENCE” or cut and paste your sequence into the box. Make sure the option
“Design for PCR primers” is selected, keep the default setting for USE PARAMETER
SET as “PCR primers”, and click on the button below labeled “CALCULATE”.
a. Print the first page of the output, with the first set of designed primers. The
first set of primers are those the software judges most suitable for reproducing
a portion of your cDNA from sites within the zebrafish sequence.
b. Circle on the printout the size of the PCR product that would be produced in
PCR using this primer set on your zebrafish cDNA template.
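The product size reported by the software follows directly from where the two primers sit on the template. The sketch below shows the arithmetic under simplifying assumptions: both primers match the template exactly, the template and the short 9-base primers are invented (real primers are typically longer), and the melting temperature uses the rough 2(A+T) + 4(G+C) rule of thumb rather than whatever method PrimerQuest applies.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def product_size(template, forward, reverse):
    """Length of the PCR product, or None if a primer has no exact match."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement downstream of the forward primer.
    rc_start = template.find(reverse_complement(reverse), start + 1)
    if rc_start == -1:
        return None
    return rc_start + len(reverse) - start


def rough_tm(primer):
    """Rule-of-thumb melting temperature in degrees C for a short primer."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


# Invented template and primers, for illustration only.
template = "GGATGCCTAAGTCCATTGACGGTTCAGCATAACC"
forward = "ATGCCTAAG"
reverse = "ATGCTGAAC"

print(product_size(template, forward, reverse))  # 28
print(rough_tm(forward))                         # 26
```

The product runs from the first base of the forward primer to the last base covered by the reverse primer, so both primer sequences are included in the product length.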
SUMMARY OF ZEBRAFISH cDNAs
Clone ID    Clone ID    GenBank Accession #
Z1          Z27         AW466657
Z2          Z28         AW466660
Z3          Z29         AW423262
Z4          Z30         AW423173
Z5          Z31         AW423225
Z6          Z32         AW423235
Z7          Z33         AW423239
Z8          Z34         AW466513
Z9          Z35         AW422974
Z10         Z36         AW466671
Z11         Z37         AW423023
Z12         Z38         AW423266
Z13         Z39         AW422897
Z14         Z40         AW466555
Z15         Z41         AW466529
Z16         Z42         AW466686
Z17         Z43         AW466677
Z18         Z44         AW423006
Z19         Z45         AW466503
Z20         Z46         AW466541
Z21         Z47         AW466689
Z22         Z48         AW422876
Z23         Z49         AW422883
Z24         Z50         AW423264
Z25         Z51         AW422931
Z26         Z52         AW422881
PLEASE NOTE: Keep in mind that the sequence file does not necessarily represent the
cDNA you worked with in the PCR lab and that the sequence is not of the full length
cDNA insert.
Assignment:
Annotated computer results (10 pts). Hand in printouts with labels indicating what
they represent to receive credit for this lab. Due by last day of classes.