For Milton Halem, assistant director for information science and CIO of NASA's Goddard Space Flight Center (GSFC) in Greenbelt, Md., all that data raining down creates two problems.
First, there's the issue of storing and moving huge amounts of information, a common problem, but on an uncommon scale and with some restrictions imposed by the nature of GSFC's work. Second, there are archiving issues. Modeling the weather, for example, requires many observations, and for widely spaced phenomena like El Niño, just 20 observations take roughly a century. How do you preserve digital data reliably for 100 years when the storage media themselves haven't been proven to last that long?

Halem could arguably be called an IT pioneer. He earned a Ph.D. in applied mathematics at New York University in 1968, and cut his teeth on the third Univac built. In the late sixties and early seventies, he worked at the Goddard Institute for Space Studies at Columbia.
In 1977, GSFC decided to install a new supercomputer and, nervous about student unrest, moved it to Goddard's campus outside Washington, D.C.
GSFC already has the largest active storage capacity in the world, but even that will soon fall short of the agency's needs as additional EOS satellites, carrying increasingly accurate instruments that deliver correspondingly greater amounts of data, are launched. Conventional ways of dealing with the explosion of data generally aren't an option. Lossy compression techniques might destroy valuable information, and administrative measures such as periodic purging would defeat long-term projects. Even efforts to clean up user files aren't likely to help much, according to Halem, since they're just a drop in the bucket compared to the amount of new data.
Framed in the abstract, GSFC's ability to collect data roughly follows Moore's Law, doubling every 18 months. Increases in storage density, Halem says, follow more or less the same timetable. The speed at which tapes can be read, however, has only tripled in the last decade. A collateral problem, incidentally, is that newer, faster controllers aren't always backward compatible with older data. Halem reckons that unless something happens to increase data transfer rates, within 10 years the amount of time it takes to back up GSFC's data will exceed the life of the media onto which it's being transferred.

Halem
is looking at storage area networks (SANs) to help solve the data volume issue. Goddard's campus covers several square miles and includes five or six buildings. The buildings are already linked by Fibre Channel, and GSFC's IT organization is taking advantage of that to install a pilot SAN. The test is using off-the-shelf equipment from the major storage vendors. "We will have all of our systems, our disk storage and some of our tape storage systems, capable of interfacing to devices that can share storage through fiber channels," he says.
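The backup-time squeeze Halem reckons with can be sketched with a toy model. The doubling and tripling rates come from the article; the starting archive size, tape throughput, and media lifetime are illustrative assumptions, not GSFC figures:

```python
# Sketch of the trend Halem describes: data volume doubles every 18
# months while tape read speed has only tripled in a decade. The
# starting figures below are assumptions for illustration only.

archive_tb = 300.0        # assumed archive size today, in terabytes
tape_mb_s = 10.0          # assumed aggregate tape throughput, MB/s
media_life_days = 3650.0  # assumed 10-year usable media life

for year in range(0, 11):
    size_tb = archive_tb * 2 ** (year / 1.5)    # doubles every 18 months
    rate_mb_s = tape_mb_s * 3 ** (year / 10.0)  # triples every 10 years
    backup_days = size_tb * 1e6 / rate_mb_s / 86400
    flag = "  <-- exceeds media life" if backup_days > media_life_days else ""
    print(f"year {year:2d}: {size_tb:9.0f} TB, backup ~{backup_days:6.0f} days{flag}")
```

With these assumed starting points, the full-backup window overtakes a 10-year media life around year seven, which is the flavor of Halem's estimate: the archive grows faster than it can be copied.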
The U.S. Geological Survey (USGS) is also involved in the SAN project.
"Once we get our part of it done, we're going to work with USGS
to begin to see if we can do a backup storage system between their site and our site here, as well as a third site in West Virginia," Halem says.
Apart from helping to solve its storage problems, the SAN could help speed the preparatory work Halem's group must do before a new satellite is launched. Generally, GSFC prepares its systems to handle the new flow of data about six months before the scheduled launch date. If the launch is delayed, the IT organization could be saddled with lower-capacity storage systems than if it had waited, a consequence of Moore's Law.
"If we could share this storage in the development of our systems, it would enable us not to commit so early," Halem says. "SANs give us that capability. We could do the development work, share some of the extra capacity of the system and not make the commitment up front to acquire the necessary storage."
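The arithmetic behind that reluctance to commit early is simple. A sketch, assuming capacity per dollar doubles on the same 18-month schedule the article cites:

```python
# How much capacity is forgone by buying storage six months before
# launch rather than at launch, if density (and capacity per dollar)
# doubles every 18 months. Pure arithmetic; no GSFC figures used.

months_early = 6
doubling_months = 18

factor = 2 ** (months_early / doubling_months)
print(f"Buying {months_early} months early gets "
      f"{1 / factor:.0%} of launch-day capacity per dollar")

# A 12-month launch slip stretches the gap further:
slip_factor = 2 ** ((months_early + 12) / doubling_months)
print(f"After a 12-month slip: {1 / slip_factor:.0%}")
```

Committing six months early buys into roughly 79 percent of launch-day capacity per dollar; if the launch then slips a year, the early buyer ends up with half the capacity the same money would have bought, which is exactly the commitment a shared SAN pool lets Halem defer.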
To solve the archiving problem, Halem
considered optical storage, but rejected it, at least for the time being. "Optical storage has considerably longer shelf life, although there are still a lot of questions about that, but the access time is even slower, and it requires much more advanced technology, lasers and things like that. Over decades, those kinds of technology don't migrate as well," he says. Instead, Halem is looking at a three-stage storage model, with low-cost disk storage, something like RAID arrays, sitting between his high-speed disks and his tape systems. The disks will be backed up onto tape cartridges at a remote location over fiber lines. Currently, Halem notes, GSFC uses 50MB cartridges deployed in a dozen 5,000-tape silos.
New controllers will let him roughly quadruple storage density on the same cartridges, and bring total capacity up to around 5 petabytes (a petabyte is 1,024 terabytes). "I saw an estimate recently that the total storage capacity worldwide-everything-is just a little over 500 petabytes," Halem says. "If we go to 5 petabytes, we'll have 1 percent of the world's storage in just our building."
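Those totals can be sanity-checked with arithmetic. Only the silo and tape counts come from the article; the per-cartridge capacities below are derived, not sourced:

```python
# Back out the per-cartridge capacity implied by the article's totals:
# 12 silos of 5,000 tapes each, roughly quadrupled density, ~5 PB overall.

silos = 12
tapes_per_silo = 5_000
density_gain = 4       # "roughly quadruple"
target_pb = 5          # stated post-upgrade total

tapes = silos * tapes_per_silo        # 60,000 cartridges
target_gb = target_pb * 1024 * 1024   # 1 PB = 1,024 TB = 1,048,576 GB
per_tape_after = target_gb / tapes    # GB per cartridge, post-upgrade
per_tape_before = per_tape_after / density_gain

print(f"{tapes:,} cartridges at ~{per_tape_after:.0f} GB each = ~{target_pb} PB")
print(f"implying ~{per_tape_before:.0f} GB per cartridge before the upgrade")
```

Against the worldwide estimate Halem quotes, 5 PB out of 500 PB is indeed 1 percent.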