Council for Innovative Research, www.cirworld.com
International Journal of Computers & Technology, Volume 3 No. 3, Nov-Dec, 2012

Conceptual Translation as a part of Gene Expression

Sukhjit Singh Sehra, Assistant Professor, Guru Nanak Dev Engineering College, Ludhiana, Punjab
Sumeet Kaur Sehra, Assistant Professor, Guru Nanak Dev Engineering College, Ludhiana, Punjab

ABSTRACT
The major problem faced by biologists and researchers is that they hold huge amounts of raw data but lack the means to use this data effectively. DNA is the main building block of a living organism. The information stored in DNA is used to make a more transient, single-stranded polynucleotide called RNA (ribonucleic acid). The process of making an RNA copy of a gene is called transcription and is accomplished through the enzymatic activity of an RNA polymerase. There is a one-to-one correspondence between the nucleotides used to make RNA (G, A, U, C) and the nucleotide sequence in DNA (G, A, T, C respectively). The next process converts the RNA transcript into mRNA: the intron parts are skipped, so the resulting mRNA contains exons only. The process of converting the nucleotide sequence in mRNA into the amino acid sequence that makes up a protein is called translation. In the present work, the conceptual translation of gene sequences into the corresponding amino acid sequences is implemented. The mRNA-to-protein sequence conversion is done by dividing the mRNA sequence into groups of three nucleotides called codons. These codons are replaced by the corresponding amino acids, which gives the resultant protein. All the possible amino acid combinations are displayed.

Keywords
DNA, RNA, amino acid, codons.

1. THE GENETIC MATERIAL
DNA is the main constituent of the genetic material within a body. DNA is transcribed into RNA, which is then translated into protein.

1.1 DNA (Deoxyribonucleic acid)
DNA (deoxyribonucleic acid) is the genetic material. This is a profoundly powerful statement to molecular biologists. It is the information stored in DNA that allows the organization of inanimate molecules into functioning living cells and organisms that are able to regulate their internal chemical composition, growth, and reproduction. As a direct result, it is also what allows us to inherit our mother’s curly hair, our father’s blue eyes, and even our uncle’s too-large nose. The various units that govern those characteristics at the genetic level, be it chemical composition or nose size, are called genes [4][6].

1.2 RNA (Ribonucleic acid)
RNA is a nucleic acid polymer consisting of nucleotide monomers that plays several important roles in the processes that translate genetic information from deoxyribonucleic acid (DNA) into protein products: RNA acts as a messenger between DNA and the protein synthesis complexes known as ribosomes, forms vital portions of ribosomes, and acts as an essential carrier molecule for amino acids to be used in protein synthesis. RNA is very similar to DNA but differs in a few important structural details: RNA is usually single stranded, while DNA is usually double stranded. Also, RNA nucleotides contain ribose sugars while DNA contains deoxyribose, and RNA predominantly uses uracil instead of the thymine present in DNA. RNA is transcribed from DNA by enzymes called RNA polymerases and further processed by other enzymes [7]. RNA serves as the template for the translation of genes into proteins, transfers amino acids to the ribosome to form proteins, and, as part of the ribosome, translates the transcript into proteins [4][6].

1.2.1 Messenger RNA (mRNA)
Messenger RNA is RNA that carries information from DNA to the ribosomes, the sites of protein synthesis in the cell. In eukaryotic cells, once mRNA has been transcribed from DNA, it is "processed" before being exported from the nucleus into the cytoplasm, where it is bound to ribosomes and translated into its corresponding protein with the help of tRNA. In prokaryotic cells, which are not partitioned into nucleus and cytoplasm compartments, mRNA can bind to ribosomes while it is being transcribed from DNA [2]. After a certain amount of time the message degrades into its component nucleotides, usually with the assistance of ribonucleases [4][6].

1.2.2 Transfer RNA (tRNA)
Transfer RNA is a small RNA chain of about 74-95 nucleotides that transfers a specific amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis during translation. It has a site for amino acid attachment and an anticodon region for codon recognition that binds to a specific sequence on the messenger RNA chain through hydrogen bonding. It is a type of non-coding RNA [4][6].

2. THE CENTRAL DOGMA
The sequence of nucleotides in a DNA molecule can have important information content for a cell. It is actually proteins that do the work of altering the cell’s chemistry by acting as biological catalysts called enzymes. The process by which information is extracted from the nucleotide sequence of a gene and then used to make a protein is essentially the same for all living things on earth and is described by the grandly named central dogma of molecular biology. The Central Dogma of molecular biology relates DNA, RNA, and proteins. Briefly put, the Central Dogma makes the following claims. The amino acid sequence of a protein provides an adequate “blueprint” for the protein’s production. Protein blueprints are encoded in DNA in the chromosomes. The encoded blueprint for a single protein is called a gene. A dividing cell passes on the blueprints to its daughter cells by making copies of its DNA in a process called replication.
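The abstract's one-to-one correspondence between the DNA letters (G, A, T, C) and the RNA letters (G, A, U, C), followed by the skipping of introns, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `transcribe` and `splice` are hypothetical, the input is assumed to be the coding strand of the gene, and the exon positions are assumed to be known in advance.

```python
from typing import List, Tuple

# T is replaced by U; the other three letters are unchanged.
DNA_TO_RNA = str.maketrans("GATC", "GAUC")


def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand (hypothetical helper)."""
    dna = coding_strand.upper()
    unknown = set(dna) - set("GATC")
    if unknown:
        raise ValueError(f"unexpected symbols in DNA sequence: {sorted(unknown)}")
    return dna.translate(DNA_TO_RNA)


def splice(rna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon regions of a transcript, skipping the introns between them.

    `exons` holds 0-based, end-exclusive (start, end) positions. These
    coordinates are an assumed input; locating exons is outside the scope
    of this sketch.
    """
    return "".join(rna[start:end] for start, end in exons)


rna = transcribe("ATGGCCTTTAAATGA")
print(rna)                             # AUGGCCUUUAAAUGA
print(splice(rna, [(0, 6), (9, 15)]))  # AUGGCCAAAUGA
```

Keeping transcription and intron removal as separate steps mirrors the order in which the abstract describes them: DNA to RNA first, then RNA to exon-only mRNA.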
The blueprints are transmitted from the chromosomes to the protein factories in the cell in the form of RNA. The process of copying the DNA into RNA is called transcription. The RNA blueprints are read and used to assemble proteins from amino acids in a process known as translation. Amino acids are the units that are strung together to make proteins. The function of a protein is intimately dependent on the order in which its amino acids are linked by ribosomes during translation. Twenty different amino acids are used in protein synthesis.

Figure 1: Central Dogma of Molecular Biology

3. CHOICE OF SEQUENCE FORMAT
There are four basic types of molecules involved in life: (1) small molecules, (2) proteins, (3) DNA and (4) RNA. Proteins, DNA and RNA are known collectively as biological macromolecules. DNA is the main information carrier molecule in a cell. DNA may be single or double stranded. A single-stranded DNA molecule, also called a polynucleotide, is a chain of small molecules called nucleotides. There are four different nucleotides grouped into two types: purines (adenine and guanine) and pyrimidines (cytosine and thymine). They are usually referred to as bases (in fact, the bases are the only distinguishing element between different nucleotides) and are denoted by their initial letters: A, C, G and T. This sequence is stored in text files, each holding a very long chain of different combinations of A, T, C and G. The main interest of a biotechnologist is to search the sequence for similarity with other sequences. At the sequence level, the only difference between DNA and RNA is that in RNA thymine (T) is replaced with uracil (U). There are different formats that are used to organize this sequence data. The types of data formats [5] are given below:

Plain Text
FASTA Format
GenBank
Genetic Computer Group Format (GCG)

In the present problem, the Plain Text format is chosen, which looks like the following:

CTATGACTTGATTGCGACTGATATTGACAAGAATTCA
TAAATTAAGTGAAACTAAACGAACCTCTTATAATTTC
GTTTAAATTTAAAATTGTGAAAAATTAATCTAAAAT

4. SOLUTION METHODOLOGY
The mRNA-to-protein sequence is computed according to the flow chart shown in Fig. 2. This process is also called conceptual translation. The mRNA sequence is accepted in plain text format.

Figure 2: Conceptual Translation

A standard three-letter abbreviation is used for each of the 20 most commonly used amino acids. Each amino acid can be assigned to one of essentially four different categories: nonpolar, polar, positively charged and negatively charged. A single change within a triplet codon is usually not sufficient to cause the codon to code for an amino acid in a different group. The genetic code is therefore remarkably robust and minimizes the extent to which mistakes in the nucleotide sequence of genes can change the function of the protein [1][6].

Only four nucleotides are available to specify 20 amino acids: single nucleotides give 4 possibilities and pairs give 16, whereas triplets give 64. As a result, it is necessary for ribosomes to use a triplet code to translate the information in DNA and RNA into the amino acid sequence of proteins. Each group of three nucleotides (a codon) in an mRNA copy of the coding portion of a gene corresponds to a specific amino acid. Translation by the ribosome starts at a translation initiation site on the mRNA copy of a gene and proceeds until a stop codon is encountered. Three codons of the genetic code are reserved as stop codons, and one triplet codon is always used as a start codon. This start codon, AUG, codes for the amino acid methionine in the existing database. Accurate translation can occur only when ribosomes examine codons in the phase, or reading frame, that is established by a gene’s start codon.
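The codon grouping, start and stop handling, and reading frames described above can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the function names (`codons`, `translate_frame`, `translate_from_start`) and the example sequence are hypothetical, the lookup table is the standard genetic code rather than the paper's own database, and translation is assumed to begin at the first AUG found in the sequence.

```python
from itertools import product
from typing import List

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
# Map each of the 64 codons to a three-letter amino acid abbreviation.
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}


def clean(sequence: str) -> str:
    """Drop whitespace, upper-case, and write T as U (assumed plain-text input)."""
    return "".join(sequence.split()).upper().replace("T", "U")


def codons(mrna: str, frame: int = 0) -> List[str]:
    """Split a sequence into complete triplets, starting at offset `frame`."""
    seq = clean(mrna)
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def translate_frame(mrna: str, frame: int = 0) -> List[str]:
    """Translate every codon of one reading frame, stop codons included."""
    return [CODON_TABLE[c] for c in codons(mrna, frame)]


def translate_from_start(mrna: str) -> List[str]:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    seq = clean(mrna)
    start = seq.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    protein = []
    for codon in codons(seq, start):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


example = "CCAUGGCUUUCUAAGG"  # hypothetical input, not taken from the paper
print("-".join(translate_from_start(example)))  # Met-Ala-Phe
for frame in range(3):
    print(frame, "-".join(translate_frame(example, frame)))
# 0 Pro-Trp-Leu-Ser-Lys
# 1 His-Gly-Phe-Leu-Arg
# 2 Met-Ala-Phe-Stop
```

Translating all three frames of the same input shows how shifting the starting position by a single nucleotide changes every amino acid in the output.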
Any alteration of a gene’s reading frame changes every amino acid coded downstream of the alteration. The results are produced according to the flow chart: the codons of the mRNA sequence are looked up in the existing amino acid database to produce the protein sequence, and the output gives the amino acids together with the start and end points in the mRNA sequence.

5. RESULTS AND DISCUSSION
In the process of translation, mRNA is grouped into combinations of three nucleotides called codons. The amino acid chain is extracted by matching the input sequence against the amino acid database given in Table 1. For the input mRNA sequence, three amino acid sequences are possible, as shown in Table 2. The different outputs arise from the different triplet groupings, that is, from the three possible reading frames.

Table 1: Codons for amino acid
Table 2: Output amino acid sequence

6. REFERENCES
[1] Berman, H.M., Westbrook, J., Feng, Z., et al. (2000), “The protein data bank”, Nucleic Acids Res., vol. 28, pp. 235–242.
[2] Brunak, S., Engelbrecht, J. and Knudsen, S. (1991), “Prediction of human mRNA donor and acceptor sites from the DNA sequence”, Journal of Molecular Biology, vol. 220, pp. 49–65.
[3] Huang, D.S. and Zhu, Y.P. (2005), “Improving protein secondary structure prediction by using the residue conformational classes”, Elsevier, vol. 26, pp. 2346–2352.
[4] Krane, D. and Raymer, M. (2003), “Fundamental Concepts of Bioinformatics”, Pearson Education, New Delhi, pp. 1–314.
[5] Malhi, M.S. (2003), “Development of Data Mining model for bioinformatics system”, M.Tech Thesis, PAU, Ludhiana, pp. 1–76.
[6] Rastogi, S.C. and Mendiratta, N. (2004), “Bioinformatics Methods and Applications”, Prentice Hall of India, New Delhi, pp. 1–194.
[7] Segal, E. and Yelensky, R. (2003), “Genome-Wide Discovery of Transcriptional Modules from DNA Sequence and Gene Expression”, Bioinformatics, vol. 19, pp. 273–282.

www.ijctonline.com