Development and Role of the Human Reference Sequence in Personal Genomics

By finchtalk on July 3, 2014.

A few weeks back, we published a review about the development and role of the human reference genome. A key point is that the reference genome is not a single sequence. Instead, it is an assembly of consensus sequences designed to deal with variation in the human population and uncertainty in the data. The reference is a map and, like a geographical map, it evolves through increased understanding over time.

From the Wiley Online Library site:

Abstract

Genome maps, like geographical maps, need to be interpreted carefully. Although maps are essential to exploration and navigation, they cannot be completely accurate. Humans have been mapping the world for several millennia, but genomes have been mapped and explored for just a single century, with the greatest advancements in making a sequence reference map of the human genome possible in the past 30 years. After the deoxyribonucleic acid (DNA) sequence of the human genome was completed in 2003, the reference sequence underwent several improvements and today provides the underlying comparative resource for a multitude of genetic assays and biochemical measurements. However, the ability to simplify genetic analysis through a single comprehensive map remains an elusive goal.

Key Concepts:

Maps are incomplete and contain errors.
DNA sequence data are interpreted through biochemical experiments or comparisons to other DNA sequences.
A reference genome sequence is a map that provides the essential coordinate system for annotating the functional regions of the genome and comparing differences between individuals' genomes.
The reference genome sequence is always a product of understanding at a set point in time and continues to evolve.
DNA sequences evolve through duplication and mutation and, as a result, contain many repeated sequences of different sizes, which complicates data analysis.
DNA sequence variation happens on large and small scales with respect to the lengths of the DNA differences, and includes single base changes, insertions, deletions, duplications and rearrangements.
DNA sequences within the human population undergo continual change and vary highly between individuals.
The current reference genome sequence is a collection of sequences, an assembly, that includes sequences assembled into chromosomes, sequences that are part of structurally complex regions that cannot be assembled, patches (fixes) that cannot be included in the primary sequence, and high variability sequences that are organised into alternate loci.
Genetic analysis is error prone and the data require validation because the methods for collecting DNA sequences create artifacts and the reference sequence used for comparative analyses is incomplete.
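
To make the "collection of sequences" idea concrete, here is a minimal Python sketch that sorts sequence names from an assembly into the categories listed above. It assumes UCSC-style naming suffixes (_alt, _fix, _random, and a chrUn_ prefix), which the review abstract itself does not specify. The function name classify_sequence and the example names are hypothetical and only illustrate the convention.

```python
from collections import Counter


def classify_sequence(name: str) -> str:
    """Map a sequence name to an assembly category.

    Assumption: names follow UCSC-style conventions, where a suffix or
    prefix marks anything that is not a primary chromosome sequence.
    """
    if name.endswith("_alt"):
        return "alternate locus"   # high variability region
    if name.endswith("_fix"):
        return "fix patch"         # correction not yet in the primary sequence
    if name.endswith("_random"):
        return "unlocalized"       # chromosome known, position unknown
    if name.startswith("chrUn_"):
        return "unplaced"          # chromosome unknown
    # Everything else is treated as a primary sequence (this includes chrM).
    return "chromosome"


# Hypothetical example names that follow the assumed convention.
names = [
    "chr1",
    "chrX",
    "chr1_KI270706v1_random",
    "chrUn_KI270302v1",
    "chr6_GL000250v2_alt",
    "chr1_KN196472v1_fix",
]

counts = Counter(classify_sequence(n) for n in names)
for category, count in sorted(counts.items()):
    print(f"{category}: {count}")
```

Tallying categories this way is a quick check on which parts of the assembly an alignment or variant file actually refers to, since tools differ in whether they include alternate loci and patches.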
