
Scientists at NASA's Jet Propulsion Laboratory and the California Institute of Technology announced today that they have developed a computer software system to catalog and analyze the estimated half billion sky objects in the second Palomar Observatory sky survey.

The survey of the northern sky includes more than 3,000 digitized photographic plates produced by Palomar Observatory, located in San Diego County, California.

Drs. Usama Fayyad and Richard Doyle of JPL said the system, called Sky Image Cataloging and Analysis Tool (SKICAT), will be delivered to Caltech this month. SKICAT is based on state-of-the-art machine learning, high performance database and image processing techniques.

Caltech astronomer Professor S. Djorgovski said each photographic plate is being digitized into 23,040 by 23,040-pixel images at the Space Telescope Science Institute, Baltimore. The resulting data set will not be surpassed in quality or scope for the next decade, he said.

"The sky object classification task is manually forbidding. The plates contain hundreds of millions of sky objects. Humans are unable to visually process the fainter objects in the survey," Djorgovski said.

Fayyad said the core of the new system consists of two integrated machine learning algorithms, the mathematical procedures the computer follows. These algorithms automatically produce decision trees from training data, or examples, supplied by astronomers. In this way, a machine learning program learns to classify new data based on training data provided by human experts.
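As a rough illustration of how a decision tree can be produced from expert-labeled examples, here is a minimal sketch of a generic entropy-based tree builder. It is not SKICAT's own algorithm, which this release does not describe in detail. The two features (ellipticity and light concentration), their values, and the star and galaxy labels are hypothetical stand-ins invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature_index, threshold) for the best split, or None."""
    base = entropy(labels)
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            weighted = (len(left) * entropy(left)
                        + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(rows, labels):
    """Build a tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        # No useful split remains, so fall back to the majority class.
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left]),
            build_tree([r for r, _ in right], [l for _, l in right]))

def classify(tree, row):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training examples: (ellipticity, light concentration).
TRAIN_ROWS = [(0.05, 0.90), (0.10, 0.85), (0.08, 0.95),
              (0.40, 0.30), (0.55, 0.45), (0.35, 0.25)]
TRAIN_LABELS = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]

tree = build_tree(TRAIN_ROWS, TRAIN_LABELS)
new_object_class = classify(tree, (0.07, 0.88))
```

In this toy setup the astronomer's role is limited to supplying the labeled rows. The tree itself, including which feature to test and at what threshold, is derived automatically from those examples.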

Caltech astronomer Nick Weir and Fayyad said SKICAT has a correct sky object classification rate of about 94 percent, which exceeds the performance requirement of 90 percent needed for accurate scientific analysis of the data.

By contrast, Fayyad said, the best performance of a commercially available learning algorithm was about 75 percent. Once trained to predict classes for faint astronomical objects on the survey plates, the algorithms can classify objects that are too faint for humans to recognize.

The training data for faint objects was obtained from a limited set of charge coupled device images taken at a much higher resolution than the survey images, Weir said.

The SKICAT system will produce a comprehensive survey catalog database containing about one-half billion entries by automatically processing about three terabytes (24 trillion bits, at 8 bits to a byte) of image data.
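The data volume can be checked with a short calculation. The sketch below uses the plate count and image dimensions quoted in this release. The figure of 2 bytes per pixel is an assumption that the release does not state, and a terabyte is taken to be 10^12 bytes.

```python
# Figures quoted in the release.
PLATES = 3000              # "more than 3,000" plates, so this is a lower bound
PIXELS_PER_SIDE = 23040    # each plate is digitized to 23,040 by 23,040 pixels
BITS_PER_BYTE = 8

# Assumption (not stated in the release): each pixel is stored in 2 bytes.
BYTES_PER_PIXEL = 2

pixels_per_plate = PIXELS_PER_SIDE ** 2
total_bytes = PLATES * pixels_per_plate * BYTES_PER_PIXEL
total_terabytes = total_bytes / 1e12   # decimal terabytes

# The release's conversion: three terabytes expressed in bits.
bits_in_three_terabytes = 3 * 10**12 * BITS_PER_BYTE

print(f"{pixels_per_plate:,} pixels per plate")
print(f"{total_terabytes:.2f} TB of image data under the 2-byte assumption")
print(f"{bits_in_three_terabytes:,} bits in three terabytes")
```

Under that assumption the total comes to a little over three terabytes, which is consistent with the "about three terabytes" cited above, and three terabytes does equal 24 trillion bits.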

Since SKICAT can classify sky objects that are too faint for humans to recognize, the SKICAT catalog will contain a wealth of new information not obtainable using traditional cataloging methods, Weir said. Because sky objects up to one visual magnitude fainter now can be processed, the number of classified catalog entries will be approximately three times larger than has been possible so far with other techniques.

"Some historical sky object classification tasks performed over a period of years could now be achieved in a few hours," Weir said.

One major benefit of this program is that it frees astronomers from the tedium of an intensely visual and manual task so they may pursue more challenging analysis and interpretation problems, according to Djorgovski.

"This is an excellent example of the use of machine learning technology to automate an otherwise infeasible task of dealing with an amount of data that is simply overwhelming to humans," Fayyad said. "SKICAT represents a new generation of intelligent trainable tools for dealing with the huge volumes of scientific image data that today's instruments collect."

"We view SKICAT as a step towards the development of the next generation of tools for the astronomer of the turn of the century and beyond," Djorgovski said.
