How Can I Do The Following Comparison Without Having To Write 20 If-statements Or Making 20 Lists/dictionaries?

May 26, 2024

This problem is related to biology, so for those who know what amino acids and codons are, that's great! For those who don't, I have tried my best to phrase it so that you can unde

Solution 1:

You can set a dictionary up like this:

codon_lookup = {
    'ATT':'Isoleucine',
    'ATC':'Isoleucine', 
    'ATA':'Isoleucine',
    'CTT':'Leucine',
    'CTC':'Leucine', 
    'CTA':'Leucine',
     # ... etc
}

Then you can query it like this:

codon_lookup['ATT']

which will give you

'Isoleucine'

EDIT:

To map codons to single-letter codes (SLCs) instead, set the dictionary up like this:

codon_lookup = {
    'ATT':'I',
    'ATC':'I', 
    'ATA':'I',
    'CTT':'L',
    'CTC':'L', 
    'CTA':'L',
     # ... etc
}

Then you can query it like this:

codon_lookup['ATT']

which will give you

'I'

If you want to check your list of mutated_codons against this dictionary, you can loop through it. If your mutated_codons list looks like ['ACA','GTT',...] then:

for mutated_codon in mutated_codons:
    print(codon_lookup[mutated_codon])

Solution 2:

In light of the other two answers, here's another way of structuring it that I think might be best.

This will give you lookup dictionaries in both directions: SLC to Codon(s) and Codon to SLC.

slc_codon = {
    'I': ['ATT', 'ATC', 'ATA'],
    'L': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'],
    'V': ['GTT', 'GTC', 'GTA', 'GTG'],
    'F': ['TTT', 'TTC'],
    'M': ['ATG'],
    'C': ['TGT', 'TGC'],
    'A': ['GCT', 'GCC', 'GCA', 'GCG'],
    'G': ['GGT', 'GGC', 'GGA', 'GGG'],
    'P': ['CCT', 'CCC', 'CCA', 'CCG'],
    'T': ['ACT', 'ACC', 'ACA', 'ACG'],
    'S': ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Y': ['TAT', 'TAC'],
    'W': ['TGG'],
    'Q': ['CAA', 'CAG'],
    'N': ['AAT', 'AAC'],
    'H': ['CAT', 'CAC'],
    'E': ['GAA', 'GAG'],
    'D': ['GAT', 'GAC'],
    'K': ['AAA', 'AAG'],
    'R': ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    '*': ['TAA', 'TAG', 'TGA'],
}

codon_slc = dict((x, k) for k, v in slc_codon.items() for x in v)

>>> print codon_slc
{'CTT': 'L', 'ATG': 'M', 'AAG': 'K', 'AAA': 'K', 'ATC': 'I', 'AAC': 'N', 'ATA': 'I', 'AGG': 'R', 'CCT': 'P', 'ACT': 'T', 'AGC': 'S', 'ACA': 'T', 'AGA': 'R', 'CAT': 'H', 'AAT': 'N', 'ATT': 'I', 'CTG': 'L', 'CTA': 'L', 'CTC': 'L', 'CAC': 'H', 'ACG': 'T', 'CAA': 'Q', 'AGT': 'S', 'CAG': 'Q', 'CCG': 'P', 'CCC': 'P', 'TAT': 'Y', 'GGT': 'G', 'TGT': 'C', 'CGA': 'R', 'CCA': 'P', 'TCT': 'S', 'GAT': 'D', 'CGG': 'R', 'TTT': 'F', 'TGC': 'C', 'GGG': 'G', 'TAG': '*', 'GGA': 'G', 'TAA': '*', 'GGC': 'G', 'TAC': 'Y', 'GAG': 'E', 'TCG': 'S', 'TTA': 'L', 'GAC': 'D', 'TCC': 'S', 'GAA': 'E', 'TCA': 'S', 'GCA': 'A', 'GTA': 'V', 'GCC': 'A', 'GTC': 'V', 'GCG': 'A', 'GTG': 'V', 'TTC': 'F', 'GTT': 'V', 'GCT': 'A', 'ACC': 'T', 'TGA': '*', 'TTG': 'L', 'CGT': 'R', 'TGG': 'W', 'CGC': 'R'}
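
Since a table like this is typed by hand, it may be worth checking that the reverse dictionary really covers all 64 possible codons and that no codon was entered twice. The following is a minimal Python 3 sketch of such a check; the names all_codons, missing, counts and duplicates are my own and not part of the answer above.

```python
from collections import Counter
from itertools import product

slc_codon = {
    'I': ['ATT', 'ATC', 'ATA'],
    'L': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'],
    'V': ['GTT', 'GTC', 'GTA', 'GTG'],
    'F': ['TTT', 'TTC'],
    'M': ['ATG'],
    'C': ['TGT', 'TGC'],
    'A': ['GCT', 'GCC', 'GCA', 'GCG'],
    'G': ['GGT', 'GGC', 'GGA', 'GGG'],
    'P': ['CCT', 'CCC', 'CCA', 'CCG'],
    'T': ['ACT', 'ACC', 'ACA', 'ACG'],
    'S': ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Y': ['TAT', 'TAC'],
    'W': ['TGG'],
    'Q': ['CAA', 'CAG'],
    'N': ['AAT', 'AAC'],
    'H': ['CAT', 'CAC'],
    'E': ['GAA', 'GAG'],
    'D': ['GAT', 'GAC'],
    'K': ['AAA', 'AAG'],
    'R': ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    '*': ['TAA', 'TAG', 'TGA'],
}

# Same reverse mapping as above, written as a dict comprehension.
codon_slc = {codon: slc for slc, codons in slc_codon.items() for codon in codons}

# Every possible triple over the four DNA bases (4 ** 3 == 64).
all_codons = {''.join(triple) for triple in product('ACGT', repeat=3)}

# Codons that were never entered, and codons that were entered more than once.
missing = all_codons - set(codon_slc)
counts = Counter(codon for codons in slc_codon.values() for codon in codons)
duplicates = [codon for codon, n in counts.items() if n > 1]

print(len(codon_slc), sorted(missing), duplicates)  # 64 [] []
```

If a codon were mistyped, it would show up in missing (the correct spelling is absent) or in duplicates (the same triple listed under two amino acids).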

Solution 3:

Answering just to address the question on how to look up the protein from the dictionary. Correctly creating the dictionary is the core of your problem, and the answers given so far address that well. I personally like FogleBird's reverse construction best, but any of the ways to define a dictionary mapping nucleotide symbol triples to amino acids will work fine.

Given the codon_lookup dictionary as you define it in your edit to the question and a list of three-letter strings named mutated_codon, the simplest way to print the amino acid symbols from a list of mutated codons would be this:

for codon in mutated_codon:
    print codon_lookup[codon]

Or, in Python 3.X:

for codon in mutated_codon:
    print(codon_lookup[codon])

It would be more in keeping with the usual Python style to name the list mutated_codons, since it will almost certainly be plural, but that's not a big deal. I'll proceed with mutated_codon below.

If I were writing this code, I would probably derive a list of the amino acids for the mutated codons, and possibly print or do other things on that list. The simplest way to do that is using a list comprehension to define a new list:

acids = [codon_lookup[codon] for codon in mutated_codon]

This is more or less syntactic sugar for a for loop that builds the new list:

acids = []
for codon in mutated_codon:
    acids.append(codon_lookup[codon])

But it's more concise. I hesitate to describe performance without actually testing the two different versions, but I believe the list comprehension form is also faster.
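
If you want to test the performance claim rather than take it on faith, the standard-library timeit module can compare the two forms. This is only a rough sketch: the small dictionary, the repeated input list and the function names with_comprehension and with_loop are made up for illustration, and the timings will vary by machine and Python version.

```python
import timeit

# Small stand-in table and an artificially long input list.
codon_lookup = {'ATT': 'I', 'ATC': 'I', 'ATA': 'I',
                'CTT': 'L', 'CTC': 'L', 'CTA': 'L'}
mutated_codon = ['ATT', 'CTA', 'ATC', 'CTT'] * 1000


def with_comprehension():
    return [codon_lookup[codon] for codon in mutated_codon]


def with_loop():
    acids = []
    for codon in mutated_codon:
        acids.append(codon_lookup[codon])
    return acids


# Both forms must produce the same list before timing means anything.
assert with_comprehension() == with_loop()

for func in (with_comprehension, with_loop):
    seconds = timeit.timeit(func, number=200)
    print(func.__name__, round(seconds, 4))
```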

Either way, I could then iterate over that list to print them:

for acid in acids:
    print(acid)

As well as doing any further necessary processing.

One other option worth pointing out is the get method on the dictionary. The above code all uses direct key lookup, which raises a KeyError if the codon is not found in the codon_lookup dictionary. This is probably the best choice: you're working with a limited range of input, and if a string that isn't a valid codon ends up in your mutated_codon list, you probably want to see that exception rather than hide it.

But if in some future situation you're dealing with input that's less controlled, you might find the get method useful. It accepts a key and an optional default value. If the key is in the dictionary, it returns the dictionary's value for that key. If the key is not in the dictionary, it returns the default value (if one was provided) or None (if one was not). If, for example, you wanted your code to treat any unknown codon as a stop, you could write something like this:

for codon in mutated_codon:
    print(codon_lookup.get(codon, '*'))
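
To see the two lookup styles side by side, here is a small Python 3 sketch. The three-entry dictionary and the invalid codon 'NNN' are made up for the demonstration; the names strict and lenient are mine.

```python
codon_lookup = {'ATT': 'I', 'CTT': 'L', 'TAA': '*'}
mutated_codon = ['ATT', 'NNN', 'CTT']  # 'NNN' is deliberately not a key

# Direct lookup: the first unknown codon raises KeyError and stops the loop.
strict = []
try:
    for codon in mutated_codon:
        strict.append(codon_lookup[codon])
except KeyError as error:
    print('unknown codon:', error)  # unknown codon: 'NNN'

# get with a default: unknown codons are silently treated as a stop.
lenient = [codon_lookup.get(codon, '*') for codon in mutated_codon]
print(lenient)  # ['I', '*', 'L']

# get without a default returns None for a missing key.
print(codon_lookup.get('NNN'))  # None
```

Note that the lenient version keeps going past the bad entry, which is exactly why it can hide a data problem that the strict version would surface.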

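And since I earlier mentioned biopython, here's an example taken from their docs of using translate to convert DNA nucleotides to amino acids. (The example predates Biopython 1.78, which removed the Bio.Alphabet module; on current releases, drop the generic_dna import and argument.)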

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", generic_dna)
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*', HasStopCodon(ExtendedIUPACProtein(), '*'))

The docs I've linked to have tons more detail. It may be more than you need for this particular task, but it might save you some work if you need to take it further than a simple translation table.
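
If you would rather not depend on Biopython for a simple translation, the same result can be sketched with the plain codon dictionary from Solution 2. The translate function below is a hypothetical helper of my own, not Biopython's API: it reads from position 0, cuts the sequence into consecutive triples, and ignores trailing bases that do not fill a codon.

```python
slc_codon = {
    'I': ['ATT', 'ATC', 'ATA'],
    'L': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'],
    'V': ['GTT', 'GTC', 'GTA', 'GTG'],
    'F': ['TTT', 'TTC'],
    'M': ['ATG'],
    'C': ['TGT', 'TGC'],
    'A': ['GCT', 'GCC', 'GCA', 'GCG'],
    'G': ['GGT', 'GGC', 'GGA', 'GGG'],
    'P': ['CCT', 'CCC', 'CCA', 'CCG'],
    'T': ['ACT', 'ACC', 'ACA', 'ACG'],
    'S': ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Y': ['TAT', 'TAC'],
    'W': ['TGG'],
    'Q': ['CAA', 'CAG'],
    'N': ['AAT', 'AAC'],
    'H': ['CAT', 'CAC'],
    'E': ['GAA', 'GAG'],
    'D': ['GAT', 'GAC'],
    'K': ['AAA', 'AAG'],
    'R': ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    '*': ['TAA', 'TAG', 'TGA'],
}
codon_slc = {codon: slc for slc, codons in slc_codon.items() for codon in codons}


def translate(dna):
    """Translate a DNA string codon by codon, starting at position 0."""
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return ''.join(codon_slc[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR*KGAR*
```

This sketch does not stop at the first stop codon, and it raises a KeyError on anything that is not an uppercase A, C, G or T triple.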

Solution 4:

You can write down the table (from the link you gave) exactly as it is:

table = [
    {
        'amino_acid': 'Isoleucine',
        'codons': ['ATT', 'ATC', 'ATA']
    },
    {
        'amino_acid': 'Leucine',
        'codons': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']
    },
    # ... etc
]

Then use the following to transform to a dictionary that maps from codon to amino acid:

import itertools

lookup = dict(itertools.chain.from_iterable(
    [[(codon, row['amino_acid']) for codon in row['codons']] for row in table]))

For example, lookup['TTA'] gives 'Leucine'.
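
In Python 2.7 and later, the same flattening can also be written as a dict comprehension, which avoids itertools. This is an alternative sketch rather than the answer's own code, shown here with only the two rows given above.

```python
table = [
    {'amino_acid': 'Isoleucine', 'codons': ['ATT', 'ATC', 'ATA']},
    {'amino_acid': 'Leucine', 'codons': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']},
]

# One (codon, amino acid) entry per codon listed in each row.
lookup = {codon: row['amino_acid'] for row in table for codon in row['codons']}

print(lookup['TTA'])  # Leucine
print(len(lookup))    # 9
```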

I feel this is slightly more elegant than repeating the amino acid names, as in farmerjoe's answer, but that may be a matter of opinion. It is better to do the data entry in the most concise way possible and then transform it programmatically than to transform it by hand.

Solution 5:

Picking up Timothy Shields' comment on his own answer, it would be nice if you could just grab the table from http://www.cbs.dtu.dk/courses/27619/codon.html and generate your mapping out of that.

If you try to copy the table from the web page and paste it as a literal string in your source code, you'll see a few problems. First, different browsers have different ideas for how to copy tables. More seriously, the table is written very irregularly; some columns have no newlines, some have one, some have two, and some even have space after the newline. This is going to be very hard to parse.

But, if you take the HTML source, it's a lot easier. Not as easy as it should be (the page is a mess, with no IDs or names anywhere, and old-fashioned HTML styling instead of CSS, and so on), but it's immediately clear that the values you want are exactly the table columns, except without the box table at the top, the box header, and the footer. So, here's an example of how you could do that with BeautifulSoup. (It would be a little more verbose if you need to stick to the standard library, but not that much harder.)

import urllib2  # Python 2; on Python 3 use urllib.request.urlopen instead
import bs4

url = 'http://www.cbs.dtu.dk/courses/27619/codon.html'
page = urllib2.urlopen(url)
soup = bs4.BeautifulSoup(page)
codon_lookup = {}
# Skip the first two rows (box table and header) and the last row (footer).
for row in soup.find_all('tr')[2:-1]:
    amino, slc, codons = (col.text.strip() for col in row.find_all('td'))
    if slc == 'Stop':
        slc = 'Z'
    for codon in codons.split(','):
        codon_lookup[codon.strip()] = slc

Of course in real life, you probably don't want your program to have to go online every time it needs to run. But you can easily have this program save the results to a pickle file (or a JSON file, or whatever else you prefer). Just add these lines:

import pickle
with open('codons.pickle', 'wb') as f:
    pickle.dump(codon_lookup, f)

Now, in your main program, you just start off with:

import pickle
with open('codons.pickle', 'rb') as f:
    codon_lookup = pickle.load(f)
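
If you prefer the JSON route mentioned above, the save and load steps look almost the same, and the resulting file is human-readable. A minimal Python 3 sketch follows; the three-entry dictionary stands in for the scraped table, and the file name codons.json in the system temp directory is an arbitrary choice of mine.

```python
import json
import os
import tempfile

# Stand-in for the scraped table; the real one would have 64 entries.
codon_lookup = {'ATT': 'I', 'CTT': 'L', 'TAA': 'Z'}

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.gettempdir(), 'codons.json')

# Save once, for example at the end of the scraping script.
with open(path, 'w') as f:
    json.dump(codon_lookup, f, indent=2, sort_keys=True)

# Load in the main program.
with open(path) as f:
    restored = json.load(f)

print(restored == codon_lookup)  # True
```

JSON works here because both keys and values are plain strings; a dictionary with non-string keys would not round-trip unchanged.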
