The NASA Earth Observing System and dealing with all that data

The NASA Earth Observing System (EOS) is an incredible resource for both science and education. One of the amazing things about it is how many different kinds and quantities of data are assembled into pictures that even grade school kids can immediately comprehend.

How do they do it?

Each of the EOS satellites delivers a terabyte or more of data per day from many different instruments.

How do they take satellite imagery, rainfall statistics, temperature information, and other kinds of data and assemble these data into meaningful pictures?

The answer is HDF (Hierarchical Data Format).

HDF is designed to handle large amounts of data. It was developed at the National Center for Supercomputing Applications (NCSA). (NCSA also brought us Mosaic, the first widely popular graphical web browser.)

To quote EOS:

NCSA developed HDF to assist users in the transfer and manipulation of scientific data across diverse operating systems and computer platforms, using FORTRAN and C calling interfaces and utilities. HDF supports a variety of data types: n-Dimensional scientific data arrays, tables, text annotations, several types of raster images and their associated color palettes, and metadata. The HDF library contains interfaces for storing and retrieving these data types in either compressed or uncompressed formats.

For each data object in an HDF file, predefined tags identify the type, amount, and dimensions of the data; and the file location of various objects. The self-describing capability of HDF files helps users to fully understand the file's structure and contents from the information stored in the file itself. A program interprets and identifies tag types in an HDF file and processes the corresponding data. A single HDF file can also accommodate different data types, such as symbolic, numerical, and graphical data; however, raster images and multidimensional arrays are often not geolocated. Because many earth science data structures need to be geolocated, NASA developed the HDF-EOS format with additional conventions and data types for HDF files.
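To make the "self-describing" idea a little more concrete, here is a toy sketch in Python. It is not the real HDF file layout and it does not use the HDF library. The tag numbers, the directory layout, and the function names (`pack_array`, `write_container`, `read_container`) are all made up for illustration. The only point is to show how a directory of tags, offsets, and lengths lets a program find each object in a file and decide how to process it.

```python
import struct

# Hypothetical tag numbers for this toy format (NOT real HDF tags).
TAG_TEXT = 1   # payload is a UTF-8 text annotation
TAG_ARRAY = 2  # payload is: n_rows, n_cols, then row-major float64 values


def pack_array(rows):
    """Serialize a 2-D list of floats together with its dimensions."""
    n_rows, n_cols = len(rows), len(rows[0])
    flat = [value for row in rows for value in row]
    return struct.pack(f"<II{len(flat)}d", n_rows, n_cols, *flat)


def write_container(objects):
    """Build a byte string: object count, directory, then the payloads.

    Each directory entry records (tag, offset, length), so a reader can
    locate and identify every object without outside information.
    """
    header_size = 4 + 12 * len(objects)  # 4-byte count + 12 bytes per entry
    directory = b""
    data = b""
    for tag, payload in objects:
        offset = header_size + len(data)
        directory += struct.pack("<III", tag, offset, len(payload))
        data += payload
    return struct.pack("<I", len(objects)) + directory + data


def read_container(blob):
    """Walk the directory and decode each object according to its tag."""
    (count,) = struct.unpack_from("<I", blob, 0)
    results = []
    for i in range(count):
        tag, offset, length = struct.unpack_from("<III", blob, 4 + 12 * i)
        payload = blob[offset:offset + length]
        if tag == TAG_TEXT:
            results.append(("text", payload.decode("utf-8")))
        elif tag == TAG_ARRAY:
            n_rows, n_cols = struct.unpack_from("<II", payload, 0)
            flat = struct.unpack_from(f"<{n_rows * n_cols}d", payload, 8)
            rows = [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
            results.append(("array", rows))
        else:
            # Unknown tags are passed through as raw bytes instead of failing.
            results.append(("unknown", payload))
    return results


# One "file" holding two different kinds of data side by side.
blob = write_container([
    (TAG_TEXT, "sea surface temperature, degrees C".encode("utf-8")),
    (TAG_ARRAY, pack_array([[14.5, 15.0], [16.25, 17.0]])),
])

for kind, value in read_container(blob):
    print(kind, value)
```

The reader never has to be told what the file contains. It reads the directory, looks at each tag, and picks the matching decoder, which is the general behavior the quote describes. Real HDF files are far richer than this, and in practice you would work with them through the HDF library's own interfaces instead of parsing bytes by hand.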

What does HDF have to do with bioinformatics?

That's in the next post.
