BISSEX: Sucking Up the Web

Brewster Kahle's Internet Archive could make the Web live forever

As Brewster Kahle recounts his 1996 visit to the digital storage facility at the National Archive in Washington, DC, his eyes become even wider than usual. "Rows, and rows, and rows," he says, gesturing, "of empty shelves!" The staff of the National Archive may still be catching up to the digital revolution, but when they make it, Kahle will have a present for them. He wants to give them the whole Internet. On tape.

Kahle officially launched the Internet Archive, the non-profit corporation that will be the collector and caretaker of all this data, in June of 1996. At the time, his goal seemed quixotic. To the skeptics who thought sucking up the entire Web impossible, Kahle gave the best answer: he simply went and did it. "We're the big suck," he says cheerfully of the Archive, whose small staff works out of offices in San Francisco's historic Presidio. While the data collection accomplished so far is impressive, Kahle thinks that the most interesting parts of his project are yet to come.

If the doubters had looked at Kahle's resume, they might have had more confidence. In the 1980s Kahle worked for supercomputing innovator Thinking Machines. Then he created a searching technology company later sold to America Online for $15 million. The profit from that sale funded the startup of the Archive and its for-profit sibling, Alexa.

The name Alexa is an oblique reference to the Library at Alexandria, the most famous library of the ancient world and a conscious touchstone for Kahle. "The Library of Alexandria, as far as I know, was the last time a global effort was made" to collect data, Kahle says.

The current size of the Web is estimated to be over two terabytes (that's two million million characters, or about a three-mile-high stack of regular computer disks).
As of this writing the Archive contains about ten terabytes of data, or the equivalent of five complete copies of the Web.

The collection of all this data presents numerous problems relating to accuracy, copyright, and privacy.

Phil Agre, a professor of Communications at the University of California at San Diego and a noted privacy expert, admits that storing the entire Web in one place is, from a technical perspective, "truly wondrous." But he believes that it raises troubling privacy questions.

People come to new media like the Net with varying expectations about privacy. Some are eager to have the collective ear of the wired planet. Others may have no idea that their post to will be archived for the ages.

"In a situation where people don't know what the rules are, the ethical thing to do is to assume the strongest rules," Agre says. That is, don't archive until you have a mechanism in place whereby everyone is informed of their privacy options before they post, not at some unspecified point afterward.

It's a tall ethical, and technical, order. Agre believes that it is possible to build such an infrastructure, an object-based Net in which every parcel of information carries with it explicit guidelines for use, but acknowledges that it is a long way off.

Kahle hears and understands the objections, but they won't slow him down. Part of the excitement of the project seems to be that he doesn't know what it will take to manage such huge quantities of data in the long term.

This is the second, and perhaps higher, purpose of the Archive. It's an incentive to develop better tools. With the whole world's data in his seaside warehouse, Kahle and his crew of engineers will be forced to invent ways for us to deal with archives of this scope in the next millennium.

Meanwhile, the Archive is filling up. Perhaps some time in the next millennium an Internet anthropologist will follow your trail of bits to try to learn more about life on the electronic frontier.

Who knows?
By his own admission, not Brewster Kahle. He hopes that the Archive will be of use in the future. And for now? "We're having a blast."

Sites in my Sights

The Internet Archive's public web site is a bit frozen itself, but it holds some interesting articles and background material. The Alexa service, as I mentioned last week, is now available for free download. Phil Agre's Red Rock Eater news service has an online archive that includes subscription information.

