At NASA's Jet Propulsion Laboratory in Pasadena, Calif., mission planners and software engineers are coming up with new strategies for managing the ever-increasing flow of large and complex data streams, referred to in the information technology community as "big data."
How big is big data? For NASA missions, hundreds of terabytes are gathered every hour. Just one terabyte is equivalent to the information printed on 50,000 trees' worth of paper.
"Scientists use big data for everything from predicting weather on Earth to monitoring ice caps on Mars to searching for distant galaxies," said Eric De Jong of JPL, principal investigator for NASA's Solar System Visualization project, which converts NASA mission science into visualization products that researchers can use. "We are the keepers of the data, and the users are the astronomers and scientists who need images, mosaics, maps and movies to find patterns and verify theories."
Building Castles of Data
De Jong explains that there are three aspects to wrangling data from space missions: storage, processing and access. The first task, to store or archive the data, is naturally more challenging for larger volumes of data. The Square Kilometer Array (SKA), a planned array of thousands of telescopes in South Africa and Australia, illustrates this problem. Led by the SKA Organization based in England and scheduled to begin construction in 2016, the array will scan the skies for radio waves coming from the earliest galaxies known.
JPL is involved with archiving the array's torrents of images: 700 terabytes of data are expected to rush in every day. That's equivalent to all the data flowing on the Internet every two days. Rather than build more hardware, engineers are busy developing creative software tools to better store the information, such as "cloud computing" techniques and automated programs for extracting data.
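To put the 700-terabytes-a-day figure in perspective, it can be converted into a sustained rate. The short Python sketch below is only a back-of-the-envelope illustration, not anything from JPL or the SKA project: it assumes decimal units (1 terabyte = 10^12 bytes) and a perfectly even flow around the clock, and the function name is hypothetical.

```python
# Back-of-the-envelope conversion of a daily data volume into a sustained rate.
# Assumptions (not from the article): decimal units, 1 terabyte = 10**12 bytes,
# and data arriving at a perfectly even pace over 24 hours.

BYTES_PER_TERABYTE = 10**12
BYTES_PER_GIGABYTE = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(terabytes_per_day: float) -> float:
    """Return the average rate in gigabytes per second for a daily volume."""
    bytes_per_second = terabytes_per_day * BYTES_PER_TERABYTE / SECONDS_PER_DAY
    return bytes_per_second / BYTES_PER_GIGABYTE


if __name__ == "__main__":
    rate = sustained_rate_gb_per_s(700)
    print(f"700 TB/day is about {rate:.1f} GB/s")
```

Under those assumptions, the array's expected output works out to roughly 8 gigabytes every second, day and night.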
"We don't need to reinvent the wheel," said Chris Mattmann, a principal investigator for JPL's big-data initiative. "We can modify open-source computer codes to create faster, cheaper solutions." Software that is shared and free for all to build upon is called open source or open code. JPL has been increasingly bringing open-source software into its fold, creating improved data processing tools for space missions. The JPL tools then go back out into the world for others to use for different applications.
"It's a win-win solution for everybody," said Mattmann.
In Living Color
Archiving isn't the only challenge in working with big data. De Jong and his team develop new ways to visualize the information. Each image from one of the cameras on NASA's Mars Reconnaissance Orbiter, for example, contains 120 megapixels. His team creates movies from data sets like these, in addition to computer graphics and animations that enable scientists and the public to get up close with the Red Planet.
"Data are not just getting bigger but more complex," said De Jong. "We are constantly working on ways to automate the process of creating visualization products, so that scientists and engineers can easily use the data."
Data Served Up to Go
Another big job in the field of big data is making it easy for users to grab what they need from the data archives.
"If you have a giant bookcase of books, you still have to know how to find the book you're looking for," said Steve Groom, manager of NASA's Infrared Processing and Analysis Center at the California Institute of Technology, Pasadena. The center archives data for public use from a number of NASA astronomy missions, including the Spitzer Space Telescope, the Wide-field Infrared Survey Explorer (WISE) and the U.S. portion of the European Space Agency's Planck mission.
Sometimes users want to access all the data at once to look for global patterns, a benefit of big data archives. "Astronomers can also browse all the 'books' in our library simultaneously, something that can't be done on their own computers," said Groom.
"No human can sort through that much data," said Andrea Donnellan of JPL, who is charged with a similarly mountainous task for the NASA-funded QuakeSim project, which brings together massive data sets -- space- and Earth-based -- to study earthquake processes.
QuakeSim's images and plots allow researchers to understand how earthquakes occur and develop long-term preventative strategies. The data sets include GPS data for hundreds of locations in California, where thousands of measurements are taken, resulting in millions of data points. Donnellan and her team develop software tools to help users sift through the flood of data.
Ultimately, the tide of big data will continue to swell, and NASA will develop new strategies to manage the flow. As new tools evolve, so will our ability to make sense of our universe and the world.