Lab Practical 4
UECM 1703 Introduction to Scientific Computing (May 2011)

Learning Outcomes

After completing this lab practical, students are expected to be able to:
1. use for loops and logical comparison operators
2. load and save text data files
3. manipulate strings
4. use MATLAB to plot a bar chart

Instruction

As you follow this laboratory you will notice a series of questions. Answer them in a Microsoft Word document saved as report 4.doc. The instructions for writing the report and a report template can be found in WBLE. You are required to email your report 4.doc to [email protected] with the subject “UECM1703 Report 4”.

Note: An email filter will be created that searches the subject lines of all incoming emails for the phrase “UECM1703 Report 4” and groups the lab reports into a designated folder. The subject of your email must therefore be exactly “UECM1703 Report 4”; otherwise your report may not be placed in the designated folder and will not be marked!

Deadline

Please hand in the hard copy and email the soft copy of your report before Thursday of the following week. Marks will be deducted for an inaccurate email subject and for late submission.

GC Content and Translation of DNA

1 Introduction

DNA (deoxyribonucleic acid) is a nucleic acid that contains hereditary information, the genetic instructions used in the development of living organisms. DNA is often called the “recipe” for constructing the components of cells, such as proteins (including enzymes) and RNA. In chemistry, DNA is known as a double-helix polymer, i.e. two long polymer chains twisted around each other. Polymers are macromolecules made up of repeating simple units called monomers. For DNA the monomers are called nucleotides, and there are 4 different types, named after the bases they carry: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G).
DNA is thus a pair of twisted polymers built from repeating A, T, C, G monomers. The two polymers are held together by hydrogen bonds between their monomers: ‘A’ forms two hydrogen bonds with ‘T’, and ‘C’ forms three hydrogen bonds with ‘G’ (see Fig. 1). It is therefore sufficient to specify the nucleotide sequence of one strand of the DNA, as the opposite strand can always be deduced from it.

Figure 1: A schematic diagram showing the double-helix structure of DNA and the hydrogen bonds between the monomers.

Not all of the nucleotides in the DNA of an organism contain hereditary information. The segments of the DNA that do contain hereditary information are called genes. For this lab practical session, the genes on one strand of the DNA of an organism called “Borrelia burgdorferi” were extracted. Each group of UECM1703 students is required to analyse one of these genes.

2 GC Content of a gene

The GC content of a gene or DNA sequence is a simple quantity that can be used to infer its thermal stability: because each G-C pair is held together by three hydrogen bonds rather than two, a sequence with a higher GC content is generally harder to separate by heating. The GC content of a DNA sequence can be calculated from the following simple formula:

$$Q_{GC} = \frac{\text{total number of C and G nucleotides in the sequence}}{\text{total number of all nucleotides in the sequence}} \times 100\% \tag{1}$$

Each group will be given a different gene to analyse. The gene sequences are stored in files named “BBUxx.txt”, where xx is two digits. You are required to do the following:

• Create a new m-file called “lab04_1.m” and use this m-file to record all your MATLAB commands.
• Read in the gene sequence from the data file “BBUxx.txt” and call the sequence data gene. RECORD which “BBUxx.txt” file you are using for your lab session. Please note that the MATLAB command load will not work here, because it is used for loading numeric data. Use the MATLAB help browser to find a suitable command that allows you to read string data from a text file.
• Calculate the total numbers of ‘A’, ‘C’, ‘T’ and ‘G’ in your gene.
• Calculate the total length of your gene.
• Calculate the GC content of your gene.
• Plot a bar chart that shows the numbers of ‘A’, ‘C’, ‘T’ and ‘G’ in your gene.

Hints: you could either use a for loop to check the sequence character by character, or you could use the mask technique you learned in the last practical session.

3 Gene translation

The central dogma of molecular biology describes how information flows from DNA via RNA to protein. In the simplified picture used in this practical, one gene in the DNA produces one enzyme, and the collection of all genes in the DNA produces all the necessary enzymes that “run” the metabolism of the organism. Enzymes are proteins, a kind of polymer like DNA but made from monomers called amino acids. There are 20 standard amino acids, and different combinations of these amino acids make different kinds of proteins and enzymes.

You can think of a gene as a recipe for making a specific protein. Every 3 nucleotides correspond to one amino acid. For example, the triplet ‘AAA’ encodes the amino acid Lysine (see Fig. 2). A triplet of 3 nucleotides is called a codon. Note that most amino acids can be coded by more than one codon; for example, Lysine can be coded by either ‘AAA’ or ‘AAG’.

Figure 2: A translation table showing the 20 amino acids found in proteins and the corresponding triplets of nucleotides (codons). The table also lists the single-letter code (SLC) used to represent the 20 amino acids in protein databases. All 64 possible 3-letter combinations of the DNA coding units ‘A’, ‘C’, ‘T’ and ‘G’ are used either to encode one of these amino acids or as one of the three stop codons that signal the end of a sequence.

Your task in this section of the lab practical is to translate your gene sequence into an amino acid sequence. For example, if your gene sequence is

ATG ATA AAA ACA CCA ATA ATT AGT GAA

then you will need to write MATLAB code that translates it into the amino acid sequence in single-letter code: MIKTPIISE.
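The worked example above can be reproduced with a short sketch. This is only one possible approach, not the required solution: the variable names seq, codons and peptide are illustrative, and the switch block covers only the codons that occur in this example, not the full table in Fig. 2. Any codon outside this partial table is marked with ‘?’.

```matlab
% Worked example from the text (spaces removed); names are illustrative.
seq = 'ATGATAAAAACACCAATAATTAGTGAA';

% reshape fills column by column, so build a 3-by-N array first and
% transpose it to obtain one codon per row (N-by-3).
codons = reshape(seq, 3, [])';

peptide = '';
for k = 1:size(codons, 1)
    switch codons(k, :)
        case 'ATG'
            aa = 'M';   % Methionine
        case {'ATA', 'ATT'}
            aa = 'I';   % Isoleucine
        case {'AAA', 'AAG'}
            aa = 'K';   % Lysine
        case 'ACA'
            aa = 'T';   % Threonine
        case 'CCA'
            aa = 'P';   % Proline
        case 'AGT'
            aa = 'S';   % Serine
        case 'GAA'
            aa = 'E';   % Glutamic acid
        otherwise
            aa = '?';   % codon not covered by this partial table
    end
    peptide = [peptide, aa];
end
disp(peptide)
```

Running this script should display MIKTPIISE. Translating a real gene would require one case for every codon in Fig. 2, and the sequence length would need to be a multiple of 3 for reshape to succeed.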
• Create a new m-file called “lab04_2.m” and use this m-file to record all your MATLAB commands.
• Read in the gene sequence from the data file “BBUxx.txt” and call the sequence data gene.
• Reshape your string into an array of strings in which each entry (row) consists of 3 nucleotide letters. Name your array codon.
• Write a for loop to iterate through every triplet in codon and use logical comparison operators to translate each triplet into the single-letter code of the corresponding amino acid. Store the resulting sequence of amino acids in a variable called protein.
• Save your variable protein to a file “BBUxx_pro.txt”, specifying ascii as the file type, then attach and submit this file together with your report.
• The following fragment of code is an example of how to translate for Lysine:

    protein = '';
    for i = 1:size(codon, 1)        % one row of codon per triplet
        code = codon(i, :);
        % strcmp compares whole strings; == would compare letter by letter
        if strcmp(code, 'AAA') || strcmp(code, 'AAG')
            protein = [protein, 'K'];   % Lysine
        elseif (code == ...)
            ...
            ...
        end
    end
    protein

You could either follow the pattern of the above code fragment to write your translation program, or you might want to investigate the MATLAB commands switch and case. Good luck.

------ooOO The End & Enjoy Your Lab Practical Session OOoo------