Seed Media Group

Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. These postings represent only my opinions; they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Profile

Sandra Porter

I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (Geospiza Education).

e-mail digitalbio at gmail.com

    Digital Biology Friday: Free to evolve?

    Category: Chemistry & Biochemistry, Digital Biology Fridays, Evolution, Science education, classroom activities, molecular structures
    Posted on: December 7, 2007 9:30 AM, by Sandra Porter

    This is a fun puzzle. The pink molecule is a protein and the other molecule is a nucleic acid.

    If I gave you the amino acid sequence of this protein, or the nucleotide sequence of this nucleic acid, what is the probability of finding a similar sequence in a different species (picked at random)?

    A. High
    B. Medium
    C. Low
    D. It depends on the database that you're searching.

    You can have more than one answer.

    Now, here's the hard part. Explain why you think your answer is correct.

    Comments

    #1

    If you're searching the protein database, you won't find the NA. And the reverse is also true.
    It depends on what you mean by "similar;" it also depends on what the protein is, which nucleic acid that is, and what the NA codes for. Conservation of sequence identity is much higher for certain proteins and nucleic acid sequences than others. It all depends.

    Posted by: DAG | December 7, 2007 11:54 AM

    #2

    Okay - I'll add a few more criteria.

    1. You are searching the database that contains the appropriate molecule - that is, you're looking for nucleotides in a nucleotide database and proteins in a protein database.

    2. What do I mean by similar? This can be kind of fuzzy and I haven't looked at a large enough sample so my guesses may be off. For now, let's say for the protein that at least 30% of the amino acids are either identical or conserved. For the nucleic acid, how about 60% identity.

    Posted by: Sandra Porter | December 7, 2007 12:04 PM
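The thresholds in comment #2 can be made concrete with a small sketch. It assumes the two sequences have already been aligned to the same length, and it counts identical positions only; scoring "conserved" amino acid substitutions would need a substitution matrix, which is left out here. The function names and the example sequences are made up for illustration.

```python
# Thresholds taken from comment #2; everything else here is illustrative.
PROTEIN_THRESHOLD = 30.0      # percent identical (or conserved) amino acids
NUCLEOTIDE_THRESHOLD = 60.0   # percent identical nucleotides


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, gap-free columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def is_similar(aligned_a, aligned_b, threshold):
    """True when percent identity meets or exceeds the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up nucleotide sequences, 10 positions, 2 mismatches.
dna_query = "ATGGCCAAGT"
dna_hit = "ATGGCTAAGA"
print(percent_identity(dna_query, dna_hit))                  # 80.0
print(is_similar(dna_query, dna_hit, NUCLEOTIDE_THRESHOLD))  # True
```

A real search tool reports these numbers from its own alignments; the point of the sketch is only to show what "60% identity" means once two sequences are lined up.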

    #3

    medium

    Posted by: Trey | December 7, 2007 1:32 PM

    #4

    That's a guess.

    You can't get it right unless you can explain why you think the probability would be medium.

    Posted by: Sandra Porter | December 7, 2007 1:39 PM

    #5

    Naively I'd say:
    A. for the nucleotide sequence
    C. for the protein sequence
    and D. for both (since the size of the database will have a huge effect on the above probabilities).

    That's based on the assumptions below:

    1. It looks like you're searching for ~8 bases in comparison to err... ~100 amino acids?
    2. There are 130 billion bases in GenBank/RefSeq (from ~20 million sequences)
    3. There are a significantly smaller number of amino acids in UniProtKB/TrEMBL (~4 million proteins - not sure on the aa count).
    4. There are only 4 bases in the nucleotide "alphabet".
    5. There are 20 amino acids in the protein "alphabet".

    So, if all 20 million entries in GenBank were 8-mers, then we'd expect to find er... 1/65536 * 20,000,000 ≈ 305 hits? (The naive probability of any given 8-mer is (1/4)^8 = 1/65536.)

    How does this look for the protein sequence?

    Naive probability of any given 100 aa sequence = 1/20^100 = 7.9E-131;
    7.9E-131 * 4,000,000 = 3.2E-124 hits. Not that many then...

    Posted by: Paul | December 7, 2007 2:27 PM
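Paul's back-of-the-envelope estimate in comment #5 can be reproduced with a few lines of Python. This is only a sketch of his naive model: it assumes every database entry is a single random sequence of the same length as the query, with all letters equally likely. The function name is made up for illustration, and the database sizes are the rough figures quoted in the comment.

```python
from fractions import Fraction


def expected_random_hits(alphabet_size, query_length, n_entries):
    """Expected number of exact matches under the naive random model."""
    # Chance that one random sequence of this length equals the query.
    p_match = Fraction(1, alphabet_size ** query_length)
    return p_match * n_entries


# 8-base query against ~20 million nucleotide entries.
nucleotide_hits = expected_random_hits(4, 8, 20_000_000)
# 100-residue query against ~4 million protein entries.
protein_hits = expected_random_hits(20, 100, 4_000_000)

print(f"8-mer: {float(nucleotide_hits):.0f} expected hits")            # 305
print(f"100-aa protein: {float(protein_hits):.1e} expected hits")      # 3.2e-124
```

Exact fractions are used so that the very small protein probability is not lost to rounding before the final conversion to a float.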

    #6

    Paul: that's an interesting way to approach the problem, but the answer isn't strictly related to probability.

    Look for the answer in the biology.

    Posted by: Sandra Porter | December 7, 2007 2:35 PM

    #7

    Is that the "bottom" section of a tRNA molecule? I just assumed that it was a transcription factor bound to DNA. If that protein is a ribosome subunit as well, then I'd go for:

    A: protein
    B: RNA (I don't know much about tRNA conservation across species)

    Posted by: Paul | December 7, 2007 3:38 PM

    #8

    Here's a clue: the identity of the molecules doesn't matter as much as the relationship between them.

    Posted by: Sandra Porter | December 7, 2007 4:00 PM

    #9

    Actually, the identity of the molecules does matter as much as the relationship between them (from my perspective).

    If this is something like a histone/DNA interaction that's essential to the viability of the organism then the likelihood of finding sequence conservation (for either nucleotide or protein) is high for any randomly picked organism. But if this is a transcription factor interacting with a promoter/inhibitor element and isn't essential for the viability of the organism then the likelihood of finding any hits is lower. Especially if this example came from a species which has experienced genome duplication in the past.

    Posted by: Paul | December 7, 2007 5:03 PM

    #10

    You can't assume that transcription factors are unessential. Lots of transcription factors, e.g., the HOX genes, are essential for viability.

    Anyway, the molecules that you see above are both essential.

    Posted by: Sandra Porter | December 7, 2007 5:10 PM

    #11

    lol! I wasn't implying that all TFs are unessential, I was saying that *some* are unessential and more "Free to Evolve".

    Posted by: Paul | December 7, 2007 5:21 PM

    #12

    You're getting close, I think. These two molecules are essential, but many essential molecules can still evolve - at least in some areas.

    Certainly, some of that freedom is related to the copy number. In this case, though, copy number isn't the critical factor that controls whether these can change or not.

    Here's another hint: the answer is in the picture.

    Posted by: Sandra Porter | December 7, 2007 5:28 PM

    #14

    From your given definition of "similar" as 30% identity/conserved for amino acids, 60% identity nucleotides -- then for amino acids, medium to high, depending on the actual protein, and for nucleotides, very, very low.

    First off, the amino acid sequence can be 100% identical, but the nucleotide sequence 60% identical or below at the same time, solely through 3rd-base wobble. Synonymous mutations aren't selected against (at least they usually aren't in most circumstances, as far as we can tell?), so they can accrue freely. Also, though, it depends on whether or not you are looking at a eukaryotic gene, and whether you are actually looking at the raw genomic sequence as opposed to cDNA/mRNA, and if you include introns and 5' and 3' UTRs in your definition of the gene. If it is eukaryotic, you are looking at the raw genomic sequence, and you are including untranslated functional elements and introns, then there could be a very high degree of change indeed without ever affecting the aa sequence.

    On the amino acid side, however, the main aspect of conservation is going to be the functional sites -- and it is going to depend on whether this is a "core function" protein, or something where high variation is actually adaptive, like a recognition protein in the innate immune system, where the evolution of multiple/variant binding sites is a way good thing. Otherwise, the amino acids in the functional sites will constrain variation, but as long as the functional faces are presented to the environment appropriately (e.g. not blocked by a novel fold or turn in the secondary structure) the sequence surrounding the functional sites can vary. Nevertheless, because there are non-synonymous mutations which result in an incompatible amino acid changing the folding (or making necessary folding impossible, like a hydrophilic to hydrophobic change in a residue), this too can be constrained by necessity.

    And after that, it just depends on the evolutionary distance you pick.

    Is that ok for a 30-second summary?

    Posted by: Luna_the_cat | December 9, 2007 4:10 PM
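The synonymous-change point in comment #14 can be illustrated with a short sketch. It uses a made-up three-codon sequence and only the handful of standard-genetic-code codons it needs. Changing the third base of every codon leaves two of every three bases untouched, so on its own it cannot take nucleotide identity below about 67%; getting lower while keeping the same amino acids requires amino acids with six codons (Leu, Ser, Arg), whose synonymous codons can also differ at the first or second position.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "CTT": "L", "CTC": "L", "TTG": "L",   # leucine
    "TCT": "S", "TCC": "S", "AGC": "S",   # serine
    "CGT": "R", "CGC": "R", "AGA": "R",   # arginine
}


def translate(dna):
    """Translate a DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percent of positions that match between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


reference = "CTTTCTCGT"     # made-up sequence: Leu-Ser-Arg
third_base = "CTCTCCCGC"    # third base of every codon changed
six_fold = "TTGAGCAGA"      # synonymous codons from the six-codon families

for name, variant in [("third base only", third_base), ("six-fold", six_fold)]:
    same_protein = translate(variant) == translate(reference)
    identity = percent_identity(reference, variant)
    print(f"{name}: same protein = {same_protein}, identity = {identity:.1f}%")
# third base only: same protein = True, identity = 66.7%
# six-fold: same protein = True, identity = 22.2%
```

Both variants encode the same three amino acids as the reference, which is the sense in which a protein sequence can stay fully conserved while the underlying nucleotide sequence drifts.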

    #15

    Sorry, I just realised -- the discussion I included about nucleotide sequence doesn't really apply here, we're not talking so much about any given gene. Blame beer?

    Posted by: Luna_the_cat | December 9, 2007 4:13 PM

    #16

    I'll post my answer Friday.

    Posted by: Sandra Porter | December 10, 2007 9:46 AM

    #17

    It appears that the protein is interacting with the backbone of the nucleic acid. It would seem to me that the nucleic acid sequence is unimportant to the interaction because it all has the same phosphate backbone. This would mean that there is a low probability of finding the same nucleic acid sequence. If this is true then the protein sequence is probably well conserved. If a protein's job is to bind nucleic acid backbone, it would reach its maximum potential in early evolutionary time and undergo purifying selection. As long as the NA backbone does not change, which it hasn't, the protein would not have an evolutionary advantage if it changed. I would say you would have a high probability of finding the AA sequence in different organisms.

    Posted by: skerr | December 13, 2007 2:41 AM
