Lab Practical 4
UECM 1703 Introduction to Scientific Computing
(May 2011)
Learning Outcomes
After completing this lab practical, students are expected to be able to:
1. use for loops and logical comparison operators
2. load and save text data files
3. manipulate strings
4. use MATLAB to plot a bar chart
Instruction
As you follow this laboratory you will notice a series of questions. These are to be answered in a
Microsoft Word document, which should be saved as report 4.doc. The instructions
for writing the report and a report template can be found in WBLE.
You are required to email your report 4.doc to [email protected] with the email subject
“UECM1703 Report 4”.
Note: An email filter will be created that searches the subject of every incoming email for the phrase
“UECM1703 Report 4”. This filter will be used to
group all lab reports into a designated folder. It is therefore very important that the subject of your
email, “UECM1703 Report 4”, is accurate; otherwise your report may not be grouped into
the designated folder and will not be marked!
Deadline
Please hand in the hard copy and email the soft copy of your report before Thursday of the following
week.
Marks will be deducted for inaccurate email subject and late submissions.
GC Content and Translation of DNA
1 Introduction
DNA (deoxyribonucleic acid) is a nucleic acid that contains hereditary information, the genetic instructions used in the development of living organisms. DNA is often called the “recipe” for constructing
components of cells, such as enzymes, proteins, and RNA.
In chemistry, DNA is known as a double-helix polymer, i.e. two long polymer chains twisted around each other.
Polymers are macromolecules made up of repeating simple units called monomers. For DNA
the monomers are called nucleotides, and there are 4 different types: Adenine (A),
Thymine (T), Cytosine (C), and Guanine (G). DNA is then a pair of twisted polymers made from
the repeating A, T, C, G monomers. The two polymers are held together by hydrogen
bonds between their monomers, where ‘A’ forms two hydrogen bonds with ‘T’ and
‘C’ forms three hydrogen bonds with ‘G’ (see Fig. 1). Thus, it is sufficient to specify the sequence
of the nucleotides of one strand of the DNA, as the opposite strand can always be deduced from
the first strand.
Figure 1: A schematic diagram showing double helix structure of DNA and the hydrogen bonds
between the monomers.
Not all of the nucleotides in the DNA of an organism contain hereditary information. The segments
of the DNA that contain hereditary information are called genes. For this lab practical session, the
genes on one of the strands of the DNA of an organism called “Borrelia burgdorferi” were extracted.
Each group of students in UECM1703 is required to analyse one of these genes.
2 GC Content of a gene
The GC content of the DNA of an organism is a simple quantity that can be used to infer the thermal
stability of a gene or DNA sequence. The GC content of a DNA sequence can be calculated
from the following simple formula:
$$Q_{GC} = \frac{\text{Total number of C and G nucleotides in the sequence}}{\text{Total number of all nucleotides in the sequence}} \times 100\% \tag{1}$$
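As a quick check of Eq. (1), take a hypothetical 10-nucleotide sequence ATGCGCTTAA (made up here for illustration, not one of the lab genes). It contains 2 C and 2 G nucleotides, so

$$Q_{GC} = \frac{2 + 2}{10} \times 100\% = 40\%$$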
Each group will be given a different gene to analyse. The gene sequences are
stored in the files “BBUxx.txt”, where xx is a two-digit number. You are required to do the following:
• Create a new m-file called “lab04_1.m” and use it to record all your MATLAB commands.
• Read in the gene sequence from the data file “BBUxx.txt” and call the sequence data gene.
RECORD which “BBUxx.txt” file you are using for your lab session. Note that
the MATLAB command load will not work here, as it is used for loading numeric data. Use
the MATLAB help browser to find a suitable command that allows you to read string data
from a text file.
• Calculate the total numbers of ‘A’, ‘C’, ‘T’ and ‘G’ in your gene.
• Calculate the total length of your gene.
• Calculate the GC content of your gene.
• Plot a bar chart that shows the numbers of ‘A’, ‘C’, ‘T’ and ‘G’ in your gene.
Hints: you could either use a for loop to check character by character, or you could use the
mask technique you learned in the last practical session.
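The steps above can be sketched roughly as follows, using the mask technique from the hints. This is only a minimal illustration under stated assumptions: the sequence is a short made-up stand-in typed directly into the script rather than read from a “BBUxx.txt” file, and the variable names bases, counts and gc are hypothetical choices, not names required by the handout.

```matlab
% Stand-in sequence (assumption): in the lab, gene would be read from your
% BBUxx.txt file with a text-reading command found via the help browser
% (fileread is one possibility).
gene = 'ATGATAAAAACACCAATAATTAGTGAA';

bases  = 'ACTG';                       % order used for counting and plotting
counts = zeros(1, length(bases));
for k = 1:length(bases)
    % gene == bases(k) is a logical mask; summing it counts the matches
    counts(k) = sum(gene == bases(k));
end

% Eq. (1): C is entry 2 and G is entry 4 of counts
gc = 100 * (counts(2) + counts(4)) / length(gene);
fprintf('Length = %d, GC content = %.2f%%\n', length(gene), gc);

% Bar chart of the four nucleotide counts
bar(counts);
set(gca, 'XTickLabel', {'A', 'C', 'T', 'G'});
ylabel('Count');
```

A for loop over every character of gene with if/elseif tests would give the same counts; the mask version is simply shorter.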
3 Gene translation
A classic simplification in molecular biology, the “one gene, one enzyme” hypothesis, is that one gene in the DNA produces one enzyme,
and the collection of all genes in the DNA produces all the necessary enzymes that “run” the
metabolism of the organism.
Enzymes are proteins, a kind of polymer like DNA but made of monomers called amino
acids. There are 20 standard amino acids in total. Different combinations of these amino acids make different kinds
of proteins and enzymes.
You can think of a gene as a recipe for making a specific protein. Every 3 nucleotides correspond
to one amino acid. For example, the triplet ‘AAA’ encodes the amino acid Lysine (see
Fig. 2). A triplet of 3 nucleotides is called a codon. Note that most amino acids can be coded
by more than one codon; for example, Lysine can be coded by either ‘AAA’ or ‘AAG’.
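The many-to-one relationship between codons and amino acids can be illustrated with a small membership test. This is a sketch only: the variable names are hypothetical, and ismember is used for brevity rather than because the handout prescribes it.

```matlab
% The two Lysine codons named in the text
lysineCodons = {'AAA', 'AAG'};

isLysine  = ismember('AAG', lysineCodons);   % AAG is in the set
notLysine = ismember('AAC', lysineCodons);   % AAC is not a Lysine codon

% With 4 possible letters in each of 3 positions there are 4^3 codons
nCodons = 4^3;
fprintf('AAG: %d, AAC: %d, possible codons: %d\n', isLysine, notLysine, nCodons);
```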
Figure 2: A translation table showing the 20 amino acids found in proteins and the corresponding
triplets of nucleotides (codons). The table also lists the single-letter codes (SLC) used to represent the 20
amino acids in protein databases. All 64 possible 3-letter combinations of the
DNA coding units ‘A’, ‘C’, ‘T’ and ‘G’ are used either to encode one of these amino acids or as one
of the three stop codons that signal the end of a sequence.
Your task in this section of the lab practical is to translate your gene sequence into an amino acid
sequence. For example, if your gene sequence is
ATG ATA AAA ACA CCA ATA ATT AGT GAA
then you will need to write MATLAB code to translate it into the single-letter-code amino acid
sequence:
MIKTPIISE.
• Create a new m-file called “lab04_2.m” and use it to record all your MATLAB commands.
• Read in the gene sequence from the data file “BBUxx.txt” and call the sequence data gene.
• Reshape your string into an array of strings. Each entry in your array of strings consists of 3
nucleotide letters. Name your array codon.
• Write a for loop to iterate through every triplet in codon and use a logical comparison
operator to translate each triplet into the single-letter code of the corresponding amino acid.
Store your resulting sequence of amino acids in a variable called protein.
• Save your variable protein into a file “BBUxx_pro.txt”, specifying ascii as the file type, then
attach and submit this file together with your report.
• The following code fragment is an example of how to translate Lysine:
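protein = '';
for i = 1:size(codon, 1)            % one row of codon per triplet
    code = codon(i, :);
    % code == 'AAA' compares letter by letter, so wrap it in all(...)
    if all(code == 'AAA') || all(code == 'AAG')
        protein = [protein, 'K'];   % Lysine
    % elseif ...                    % remaining codons follow the same pattern
    %     ...
    end
end
protein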
You could either follow the pattern of the above code fragment to write your translation
program, or you might want to investigate the MATLAB commands switch and case. Good luck.
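As one possible illustration of the reshape step and the switch/case alternative, the sketch below translates the example sequence from this section. It is deliberately partial: only the codon groups needed for that example are listed (taken from the standard genetic code), anything else falls through to a ‘?’ placeholder, and the variable name aa is a hypothetical choice. It also assumes the sequence length is a multiple of 3.

```matlab
% Example sequence from this section; in the lab, gene comes from BBUxx.txt
gene = 'ATGATAAAAACACCAATAATTAGTGAA';

% reshape fills column by column, so build a 3-by-N array and transpose it
% to get one codon per row
codon = reshape(gene, 3, [])';

protein = '';
for i = 1:size(codon, 1)
    switch codon(i, :)
        case 'ATG'
            aa = 'M';   % Methionine
        case {'ATT', 'ATC', 'ATA'}
            aa = 'I';   % Isoleucine
        case {'AAA', 'AAG'}
            aa = 'K';   % Lysine
        case {'ACT', 'ACC', 'ACA', 'ACG'}
            aa = 'T';   % Threonine
        case {'CCT', 'CCC', 'CCA', 'CCG'}
            aa = 'P';   % Proline
        case {'TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'}
            aa = 'S';   % Serine
        case {'GAA', 'GAG'}
            aa = 'E';   % Glutamic acid
        otherwise
            aa = '?';   % placeholder: codon not covered by this partial sketch
    end
    protein = [protein, aa]; %#ok<AGROW>
end
disp(protein)           % MIKTPIISE
```

A switch with a cell array of strings in each case compares whole codons at once, which avoids the letter-by-letter behaviour of the == operator on strings.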
------ooOO The End & Enjoy Your Lab Practical Session OOoo------