The NASA Earth Observing System (EOS) is an incredible resource for both science and education. One of the amazing things about it is that many different kinds and huge quantities of data are assembled into pictures that even grade-school kids can immediately comprehend.
How do they do it?
Each of the EOS satellites delivers a terabyte or more of data per day from many different instruments.
How do they take satellite imagery, rainfall statistics, temperature information, and other kinds of data and assemble these data into meaningful pictures?
The answer is HDF (Hierarchical Data Format).
HDF is designed to handle large amounts of data. It was developed at the National Center for Supercomputing Applications (NCSA), the same center that brought us Mosaic, the first widely popular graphical web browser.
To quote EOS:
NCSA developed HDF to assist users in the transfer and manipulation of scientific data across diverse operating systems and computer platforms, using FORTRAN and C calling interfaces and utilities. HDF supports a variety of data types: n-Dimensional scientific data arrays, tables, text annotations, several types of raster images and their associated color palettes, and metadata. The HDF library contains interfaces for storing and retrieving these data types in either compressed or uncompressed formats.
For each data object in an HDF file, predefined tags identify the type, amount, and dimensions of the data; and the file location of various objects. The self-describing capability of HDF files helps users to fully understand the file’s structure and contents from the information stored in the file itself. A program interprets and identifies tag types in an HDF file and processes the corresponding data. A single HDF file can also accommodate different data types, such as symbolic, numerical, and graphical data; however, raster images and multidimensional arrays are often not geolocated. Because many earth science data structures need to be geolocated, NASA developed the HDF-EOS format with additional conventions and data types for HDF files.
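The self-describing idea in that quote can be sketched with a toy binary format: every record starts with a tag that tells the reader what kind of object follows, then its element type and dimensions, then the raw values. A program that has never seen the file can walk it record by record and recover each object. This is a minimal illustration under my own assumptions: the tag numbers, record layout, and function names (`write_array`, `write_text`, `read_all`) are hypothetical and are NOT the real HDF on-disk format or the HDF library API.

```python
import io
import struct
from typing import List, Tuple

# Hypothetical tag numbers for this toy format only (not real HDF tag values).
TAG_ARRAY = 1  # n-dimensional numeric array
TAG_TEXT = 2   # text annotation

# Element type codes reuse struct format characters; sizes are for "<" (standard) mode.
ITEM_SIZE = {"f": 4, "i": 4}


def _count(dims: Tuple[int, ...]) -> int:
    """Number of elements implied by the dimensions (1 for a scalar)."""
    n = 1
    for d in dims:
        n *= d
    return n


def write_array(buf: io.BytesIO, dtype: str, dims: Tuple[int, ...], values: List) -> None:
    """Write one array record: tag, type code, rank, dims, then the values."""
    count = _count(dims)
    if count != len(values):
        raise ValueError("dims do not match the number of values")
    # Header: tag (uint16), type code (1 char), rank (uint8), little-endian, no padding.
    buf.write(struct.pack("<HcB", TAG_ARRAY, dtype.encode("ascii"), len(dims)))
    buf.write(struct.pack(f"<{len(dims)}I", *dims))
    buf.write(struct.pack(f"<{count}{dtype}", *values))


def write_text(buf: io.BytesIO, text: str) -> None:
    """Write one text record: tag, byte length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    buf.write(struct.pack("<HI", TAG_TEXT, len(data)))
    buf.write(data)


def read_all(buf: io.BytesIO) -> List[Tuple[str, object]]:
    """Read every record, using each tag to decide how to parse what follows."""
    objects: List[Tuple[str, object]] = []
    while True:
        head = buf.read(2)
        if not head:
            break  # end of file
        (tag,) = struct.unpack("<H", head)
        if tag == TAG_ARRAY:
            dtype_b, rank = struct.unpack("<cB", buf.read(2))
            dtype = dtype_b.decode("ascii")
            dims = struct.unpack(f"<{rank}I", buf.read(4 * rank))
            count = _count(dims)
            values = struct.unpack(f"<{count}{dtype}", buf.read(count * ITEM_SIZE[dtype]))
            objects.append(("array", (dims, list(values))))
        elif tag == TAG_TEXT:
            (length,) = struct.unpack("<I", buf.read(4))
            objects.append(("text", buf.read(length).decode("utf-8")))
        else:
            raise ValueError(f"unknown tag {tag}")
    return objects


if __name__ == "__main__":
    # Hypothetical example data: an annotation plus a small 2 x 3 grid of readings.
    buf = io.BytesIO()
    write_text(buf, "surface temperature, deg C")
    write_array(buf, "f", (2, 3), [21.5, 22.0, 22.5, 23.0, 23.5, 24.0])
    buf.seek(0)
    for kind, obj in read_all(buf):
        print(kind, obj)
```

The reader never needs to be told in advance what the file contains; the tags and dimension fields carry that information, which is the property the quote calls "self-describing." A real format adds much more (object references, offsets, compression, metadata), and HDF-EOS layers geolocation conventions on top of that.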
What does HDF have to do with bioinformatics?
That’s in the next post.