DNA, RNA, Protein Sequences as Strings of Characters.
Much bioinformatics analysis is concerned with searching and comparing the sequences of DNA
or proteins.
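Because sequences are plain strings, ordinary string operations already cover the simplest searches and comparisons. The following is a minimal Python sketch. `find_all` and `hamming` are hypothetical helper names, not part of any library. `find_all` reports every start position of a motif, including overlapping matches. `hamming` counts the mismatched positions between two sequences of equal length. Real analyses usually rely on alignment tools rather than exact matching, so this only illustrates the "sequence as string" view.

```python
def find_all(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also found.
        start = sequence.find(motif, start + 1)
    return positions


def hamming(a: str, b: str) -> int:
    """Count the positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(find_all("ATGATGCATG", "ATG"))  # [0, 3, 7]
print(hamming("GATTACA", "GACTATA"))  # 2
```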
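DNA sequences are strings over a 4-letter alphabet: A, T, G, C. Each letter stands for one of the four nucleotide bases: adenine, thymine, guanine, and cytosine.
RNA sequences are strings over the 4 letters A, U, G, C. In RNA, uracil (U) takes the place of thymine (T).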
Protein sequences are strings over a 20-letter alphabet, ACDEFGHIKLMNPQRSTVWY. Each letter stands for one of the 20 standard amino acids.
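The alphabets above translate directly into code. The following is a minimal Python sketch that checks whether a string uses only a given alphabet. It also rewrites a DNA sequence in the RNA alphabet by substituting U for T, which gives the RNA that corresponds to a coding-strand DNA sequence. The names `is_valid` and `transcribe` are hypothetical. The check is deliberately strict: real files may also contain ambiguity codes such as N, and this sketch rejects them.

```python
# One-letter alphabets as listed in the text.
DNA = set("ATGC")
RNA = set("AUGC")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def is_valid(sequence: str, alphabet: set[str]) -> bool:
    """True if every character (case-insensitive) belongs to the alphabet."""
    return set(sequence.upper()) <= alphabet


def transcribe(dna: str) -> str:
    """Rewrite a coding-strand DNA sequence as RNA: U replaces T."""
    return dna.upper().replace("T", "U")


print(is_valid("GATTACA", DNA))         # True
print(is_valid("GAUUACA", DNA))         # False: U is not a DNA letter
print(transcribe("GATTACA"))            # GAUUACA
print(is_valid("MTDKMQSLAL", PROTEIN))  # True
```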
GenBank and SWISS-PROT records contain information about the organism, the gene, and the protein sequence, together with literature references. The sequence section of these records also contains numbers, not just sequence letters, which may cause problems with some analysis programs.
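The following is a minimal Python sketch of how the numbers can be stripped before analysis. It assumes the sequence section follows a GenBank-style layout, where each line starts with a position number and the letters are grouped in blocks of ten separated by spaces. The sample lines are an illustration only, not an excerpt from a real record. Their letters are the first 50 bases of the FASTA example in this document, rearranged into that layout. `clean_sequence` is a hypothetical helper name.

```python
import re


def clean_sequence(lines: list[str]) -> str:
    """Keep only the letters from flat-file sequence lines, upper-cased.

    Position numbers, spaces, and a '//' end-of-record line are all dropped,
    because every non-letter character is removed.
    """
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).upper()


# Illustrative GenBank-like layout (assumed, not copied from a real record).
sample = [
    "        1 ttacgcttca atggcagcac gcaatttttt",
    "       31 catcgcgttc ttttccagct",
    "//",
]
print(clean_sequence(sample))
# TTACGCTTCAATGGCAGCACGCAATTTTTTCATCGCGTTCTTTTCCAGCT
```

Removing every non-letter is the simplest rule that works for this layout. It would also silently drop characters such as `*` or `-`, so formats that use those symbols need a narrower filter.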
FASTA is a standard format commonly used by programs that analyze sequence data. Each record consists of a single header line, starting with “>”, that labels the sequence. The header is followed by the sequence data as one or more lines of letters only. A file may contain several such records one after another.
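A FASTA reader only needs to notice header lines and join the lines that follow them. The following is a minimal Python sketch, not a full implementation. `read_fasta` is a hypothetical name, and the sketch silently ignores any letters that appear before the first header.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is the '>' line without the '>'; the sequence lines that
    follow it are joined into one string.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # lines before the first header are skipped
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


text = """>seq1 example
ATGC
GGTA
>seq2
MTDK
"""
for name, seq in read_fasta(io.StringIO(text)):
    print(name, seq)
# seq1 example ATGCGGTA
# seq2 MTDK
```

For real work, an established parser such as Biopython's `SeqIO.parse(handle, "fasta")` is the safer choice.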
DNA sequence of a gene in FASTA format:
>gi|16127994:3597560-3598414
TTACGCTTCAATGGCAGCACGCAATTTTTTCATCGCGTTCTTTTCCAGCTGGCGTACACGCTCAGCGGAA
ACGCCGTAACGGTCAGCCAGTTCCTGCAACGTGGACTTGTTGTCTTCGTCCAGCCAGCGCGCACGGATGA
TGTCCTGGCTGCGTTCGTCCAGACCCTGCATCGCGTCGGTCAGACGGTTTGCCGCCTGCTCTTCCCAGTT
ATCATCTTCAATGCCGTCGGCAAAGTTAGATGATTTATCCTGCAGATAGAGCACCGGAGCCATCGGCTGG
CTGTCGGAATCGTCGTCGGAAGACAGGTCAAAGGTCATGTCCTGTGCCGCCATACGTGATTCCATCTCAC
GTACGTCTTTGCTGGTTACGCCCAGTTCACGGGCCACCATTTCGACTTCATCCTGGTTAAACCAGCCCAG
ACGCTGCTTGGTTTTACGCAGGTTGAAGAACAGTTTGCGCTGCGCTTTGGTGGTCGCAACTTTGACGATA
CGCCAGTTACGCAGAACGTATTCGTGGATCTCTGCTTTGATCCAGTGAACGGCGAAGGAGACCAGGCGCA
CACCCACTTCCGGGTTGAAACGGCGCACTGCTTTCATCAGGCCGATGTTACCTTCCTGAATCAAATCCGC
CTGTGGCAGGCCATAGCCCGCATAATTACGAGCAATATGAACAACAAACCGCAGGTGAGACAGGATCAGC
GTTTTAGCTGCTTCCAGATCGCCATGGTAATGCAGCTTTTCAGCCAGCGCCCGCTCCTCGTCAGCCGACA
ACATCGGCCACGCGTTAGCTGCCCGGATGTAGGAATCCAGGTTGCCAACTGGGGCTAAAGCTAAACTTTG
CATTTTGTCAGTCAT
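The header above appears to consist of a GenBank gi identifier followed by a start-end coordinate range. The text does not spell this out, so treat that reading as an assumption. Under it, the range covers 3598414 - 3597560 + 1 = 855 bases. That matches the record itself: 12 lines of 70 letters plus a final line of 15, or 855 letters in total. The following is a minimal Python sketch of this consistency check. `parse_gi_range` and `length_matches` are hypothetical helpers written for this header style only.

```python
import re


def parse_gi_range(header: str) -> tuple[str, int, int]:
    """Split a header such as '>gi|16127994:3597560-3598414' into (gi, start, end).

    Assumes the 'gi|<number>:<start>-<end>' layout; other headers raise ValueError.
    """
    match = re.fullmatch(r">?gi\|(\d+):(\d+)-(\d+)", header.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    gi, start, end = match.groups()
    return gi, int(start), int(end)


def length_matches(header: str, sequence: str) -> bool:
    """True if the sequence length equals the inclusive coordinate span."""
    _, start, end = parse_gi_range(header)
    return len(sequence) == end - start + 1


gi, start, end = parse_gi_range(">gi|16127994:3597560-3598414")
print(gi, end - start + 1)  # 16127994 855
```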
Protein sequence in FASTA format:
>gi|16131333|ref|NP_417918.1| RNA polymerase, sigma(32) factor
MTDKMQSLALAPVGNLDSYIRAANAWPMLSADEERALAEKLHYHGDLEAAKTLILSHLRFVVHIARNYAG
YGLPQADLIQEGNIGLMKAVRRFNPEVGVRLVSFAVHWIKAEIHEYVLRNWRIVKVATTKAQRKLFFNLR
KTKQRLGWFNQDEVEMVARELGVTSKDVREMESRMAAQDMTFDLSSDDDSDSQPMAPVLYLQDKSSNFAD
GIEDDNWEEQAANRLTDAMQGLDERSQDIIRARWLDEDNKSTLQELADRYGVSAERVRQLEKNAMKKLRA
AIEA
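The two records above appear to belong together. Reverse-complementing the last line of the DNA record gives ATGACTGACAAAATG, which translates to MTDKM, the first five residues of the protein. The reverse complement of the first 15 DNA letters is GCCATTGAAGCGTAA, which translates to AIEA followed by a stop codon, matching the protein's last residues. The lengths agree as well: 855 bases make 285 codons, that is, 284 amino acids plus a stop. The DNA record therefore seems to be written on the strand complementary to the coding sequence.

The following is a minimal Python sketch of the two operations. It assumes the standard genetic code, although some organisms and organelles use variant codes. `reverse_complement` and `translate` are hypothetical names. The codon table is built from the usual TCAG ordering of bases.

```python
# Standard genetic code, with codons enumerated in TCAG order
# (first, second, third base); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from the first base.

    A trailing partial codon is ignored; letters other than A, C, G, T
    raise KeyError because they are not in the table.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


last_line = "CATTTTGTCAGTCAT"  # last line of the DNA record
first_15 = "TTACGCTTCAATGGC"   # first 15 letters of the DNA record
print(reverse_complement(last_line))             # ATGACTGACAAAATG
print(translate(reverse_complement(last_line)))  # MTDKM
print(translate(reverse_complement(first_15)))   # AIEA*
```

Running `translate(reverse_complement(...))` over the whole DNA record is the natural way to confirm that it encodes the complete protein. Biopython's `Seq.reverse_complement()` and `Seq.translate()` provide the same operations with support for alternative genetic codes.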