genomics

Both Carl Zimmer and Larry Moran have posts on the gene content in the human genome. Carl points out that the estimate of the total number of genes in the human genome is decreasing, but we still don't know what a whole bunch of those genes do (according to the one database he searched). Larry's post deals with what he considers a misconception regarding the historical estimates of gene content in humans. He argues that, while the estimates of the number of protein coding genes have decreased over the years, they haven't really decreased as much as some people seem to think (from about 40,000…
In the comments of my dinosaur genome size post, Shelley asked: So do ALL birds have equally small genomes or is there variation among species? I don't think she was looking for a trite response along the lines of: "Of course there's variation among species." What she was asking, I presume, is how much variation in genome size do we see in birds? As you can see in this phylogeny, all birds (and nearly all theropods) have small genomes. But that tree only presents data from a few species. To get a better idea of genome size variation within birds, I downloaded C-values (amount of DNA in a…
Genome size can be measured in a variety of ways. Classically, the haploid content of a genome was measured in picograms and represented as the C-value. People began to realize that the C-value was not correlated with any measures of organismal complexity and seemed to vary unpredictably between taxa. This was known as the C-value paradox, and it confused geneticists for quite a while. With an increased understanding of genome structure, however, came the resolution of the paradox: this measure of genome size does not correlate with gene content. The majority of many eukaryotic genomes…
If you happen to be a yeast population geneticist, then you probably already know about the Saccharomyces Genome Resequencing Project. They have resequenced 32 strains of S. cerevisiae and 27 strains of S. paradoxus at between 1x and 3x coverage. The nice thing about resequencing is that SNP discovery and genotyping occur simultaneously. That's in contrast to the human HapMap project which identified polymorphism in a few individuals, then searched for those polymorphisms in a larger panel. That means you can do some good population genetic analyses on the yeast data, which are not possible…
It seems like only yesterday (okay, less than two years ago) that I learned about 454 sequencing. It's the new technology that many folks think will replace dye termination Sanger sequencing using capillary arrays (the method used to sequence the human genome and many other genomes). A new technology is coming on the scene which may make 454 obsolete before it ever gets a foothold in the market (making it the LaserDisc of DNA sequencing). 454 sequencing works by copying small stretches of DNA sequence that have been attached to tiny beads. As each nucleotide is added to the growing sequence…
This post is part of a series documenting Professor Steve Steve's recent visit to Philadelphia for the Drosophila Research Conference (aka, the Fly Meeting). In the previous two installments of Steve Steve in Philly, we finally managed to meet up despite the best efforts of the staff at the Marriott to prevent our rendezvous, and we got Steve Steve up to date on the newest developments in fly pushing and Drosophila genetics. It had been quite a tiring day, so we ventured down to the hotel bar for a few drinks. Some of us were ready to hit the sack, but Steve Steve would have none of it; he…
Some bio-bloggers are atwitter over an article by Wojciech Makalowski on Scientific American's website about Junk DNA. I'm a little late to the game because, well, I've been really busy looking at sequences to determine if they are junk DNA. Is it irony? Is it coincidence? Who cares? It's an opportunity to discuss semantics, and I love semantics. Those of you who have hung around here for a while know this topic often comes up at evolgen (remember this, this, and this . . . hell, here's what a search for Junk DNA turns up). Long story short, I can't stand the term junk DNA, but I do agree…
Duplicated genes can arise via various mechanisms -- polyploidization, chromosomal duplication, segmental duplication, and retroposition -- and we can usually distinguish the various mechanisms as each has distinct signatures. For example, retroposed duplicates arise when an RNA transcript is reverse transcribed back into DNA and re-inserted into the genome. This is how many transposable elements (TEs) and viruses propagate throughout genomes, but the reverse transcriptase encoded by TE and viral genomes can be used on endogenous transcripts as well. Because they arise via the reverse…
Shotgun sequencing refers to the process whereby a genome is sequenced and assembled with no prior information regarding the genomic location of any of the DNA we sequence. There are quite a few steps that you have to go through before you have an assembled genome sequence. We're going to cover isolating DNA, putting the DNA in bacteria, sequencing the DNA, and assembling those sequences into a complete genome. Sandy has been running a series on sequencing genomes (parts 1, 2, 3, 4, 5, 6, 7). You should go check it out even if you read this post; while I'm going to deal with some of the…
I've been chatting up Wilkins about the role of natural selection in speciation (and when I say "speciation" I mean "reproductive isolation"). Wilkins listed a few cases where speciation would occur independently of natural selection. Amongst the mechanisms in Wilkins's list was speciation via karyotypic changes (polyploidy, inversions, fusions or fissions). I cried shenanigans, and this is why. The karyotype refers to the organization of an organism's genome -- chromosome number, fusions/fissions of chromosomes, and gene order within chromosomes. One way to change the karyotype is to…
Shotgun sequencing. Sounds like fun. Speculations on the origin of the phrase: I think that this term came from shotgun cloning. In the early days of gene cloning, before cDNA, PCR, or electroporation, molecular biologists would break genomic DNA up into lots of smaller pieces, package DNA in lambda phage, transduce E. coli, and hope for the best. Consistent with the shotgun metaphor, we even used to store our microfuge tubes in plastic bullet boxes that my boss found at the sporting goods store. (Apparently this practice was unique to Minnesota, though. When I moved out west for graduate…
Considering how many genomes have been sequenced in the past decade, it seems amazing, in retrospect, that the first complete bacterial genome sequence was only published 12 years ago (1). Now, the Genome database at the NCBI lists 450 complete microbial genomes (procaryotes and archaea), 1476 genomes from eucaryotes, 2145 viruses, and genome sequences from 407 phage. Much of the methodology used for sequencing DNA is designed to confront one big technical hurdle. That is, we can only determine the sequence of small pieces of DNA at a time. This means that you must break a larger…
About a week ago, I offered to answer questions about subjects that I've either worked with, studied or taught. I haven't had many questions yet, but I can certainly answer the ones I've had so far. Today, I'll answer the first question: How do you sequence a genome? Before we get into the technical details, there are some other genomic questions that you might like answered. How much does it cost to sequence a genome? I remember in 2002, when we were at the O'Reilly bioinformatics conference and we heard Lee Hood challenge the DNA sequencing community to lower the costs of genomic…
Nature Genetics is asking: What would you do if it became possible to sequence the equivalent of a full human genome for only $1,000? George Church would repeat the Applera dataset for everyone on earth, sequencing every exon from every human being. Francis Collins would sequence people with diseases and old people. Stephen O'Brien would sequence the genomes of all 38 extant species of cats (big surprise) to study the evolution of that taxon and generate SNP markers. O'Brien would also sequence the genomes of the 100 most endangered mammals and every species of primate. Evan Eichler would…
Gregg Easterbrook -- good sportswriter, crappy at pretty much everything else he does -- likes to take pot-shots at scientific research in his ESPN column "Tuesday Morning Quarterback" (TMQ). In this week's edition he tells us how he doesn't think scientific papers should have multiple authors and how he doesn't like the advertisements in the journal Science. TMQ dislikes the modern convention of listing multiple people as "authors" of a work written by a single person; this is part of the overall cheapening of the written word. Several previous items have concerned the absurd number of…
To the uninitiated, chromosome number may appear to reflect genome size -- more chromosomes would mean a larger genome. This is not necessarily the case if we measure genome size by the number of base pairs in a genome. There are two primary ways to change chromosome number: chromosomal duplications and chromosomal fusions/fissions. Chromosomal duplications (either through polyploidization or aneuploidy) do change the size of the genome, either by creating a second copy of a single chromosome or duplicating an entire genome. Fusions and fissions, on the other hand, merely rearrange the…
New Scientist reports on research to identify DNA sequences that cannot be found in any nucleotide database. These sequences are short -- so as to decrease the probability that they are missing due to chance alone -- and the researchers from Boise State University have identified over 60,000 15-nucleotide stretches of DNA that are not present in any known sequenced region from any species. They also found 746 sequences of 5 amino acids that are not present in any known polypeptide. The article does not indicate whether the scientists utilized any hook and ladder or Statue of Liberty…
Pim van Meurs has a blog post at The Panda's Thumb about the recent paper on translational selection on a synonymous polymorphic site in a eukaryotic gene (DOI link). He points out that this was predicted in a paper from 1987. In short, the rate of translation depends on the tRNA pool -- amino acids encoded by more abundant tRNA anti-codons will be incorporated more quickly than amino acids with rare tRNAs. Because protein folding begins during translation, codon usage can influence protein secondary structure. That's because rare codons could stall translation, allowing for protein…
The rift in the biological sciences may lie between computational biologists and wet labs, but when we look at individual fields, we see other divisions. In an essay in PLoS Computational Biology Carl Zimmer describes the divide amongst evolutionary biologists. On one side are researchers who like to get their hands dirty -- ecologists, paleontologists, and others that fall under the label 'naturalist'. And on the other side we have the people that prefer to work with molecular tools; Zimmer calls these guys computational biologists, but they also generate their own data, so that label isn't…
Bacteria can cause other epidemics, why not obesity? Is there a relationship between our body weight and our bacterial inhabitants? Two reports in Nature (1, 2) suggest that bacterial populations differ between people who are obese and people who are not, and that the bacterial inhabitants of their guts may be partly to blame. In one study, the authors studied the bacterial populations of their volunteers' intestines by compiling a data set of 18,348 DNA sequences for bacterial 16S ribosomal RNA by sampling feces. Wow! That's a lot of ... well, I won't say it, but you know what I mean. In…