I've always considered myself to be computer-savvy. After all, my Dad works for a major semiconductor manufacturer, I hung out deeply with MS-DOS when I was six, taught myself HTML in high school, and -- I promise you -- I have been on Myspace.com for much longer than you have. I've always scoffed at the kind of people who type with just their pointer fingers, have trouble installing software, and refer to that endless database of ours as the "internets." And, yet, when a friend of mine earnestly asked me, just the other night, "so, where is the internet, exactly?" I found myself stammering into my glass of wine, clueless.
How quickly we fall.
Of course, I did some research. The internet is huge, apparently!
It is physically made up of millions of computers around the world, sending information back and forth in packets. It was born recently, at the end of the 1960s, as a U.S. Defense Department network called ARPAnet, which was used primarily for military research -- whatever that means, right? This was an interconnected group of computers. What we call the World Wide Web -- an interconnected and hyperlinked group of documents, a different thing entirely from the Internet -- came later, taking off around 1993 with the first popular graphical "web browsers." These hit extraordinary popularity almost instantly, birthed the stupid expression "surfing the web," and introduced a generation of geeks to the endless possibilities offered by arguing about RPGs in chat rooms.
Perhaps this isn't news to you. Nor, perhaps, does it come as a surprise to you that approximately 1.5 million pages are added to the Web every day. Maybe you're so computer-savvy, even, that you're aware that the most comprehensive of search engines barely give you access to even half of the Web's indexable material. But did you know that there is a distinction between the "Surface Web," which is the Internet we all know and love -- the Internet of irreverent Google Image searches, blogging, animated GIF icons, message boards, and eBay -- and the "Deep Web," the vast databases of information upon which the whole system delicately sits? The Deep Web is at least 450 times larger than the Surface Web, and most people never access it.
Countless studies have attempted to wrangle with the sheer size of the Internet and tabulate once and for all just how much information is or could be stored on this contraption. In 1998, the storage capacity of the Internet exceeded all of the world's known information for the first time, and it has only exponentially chugged along ever since. A 2003 study by the School of Information Management and Systems at UC Berkeley estimated that the "World Wide Web" contains about 170 terabytes of information on its surface alone; this is to say, it is about seventeen times the size of the Library of Congress' print collections. As for the Deep Web...91,850 terabytes. Trip on that!
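For the curious, the arithmetic behind those figures can be checked in a few lines. This is only a rough sketch that takes the numbers quoted above at face value; the constant and function names are made up for illustration.

```python
# Figures as quoted in the paragraph above (terabytes).
SURFACE_WEB_TB = 170            # surface Web, per the 2003 Berkeley estimate
DEEP_WEB_TB = 91_850            # Deep Web, same study
SURFACE_TO_LIBRARY_RATIO = 17   # surface Web vs. Library of Congress print


def library_of_congress_print_tb() -> float:
    """Implied size of the Library of Congress print collections."""
    return SURFACE_WEB_TB / SURFACE_TO_LIBRARY_RATIO


def deep_to_surface_ratio() -> float:
    """How many times larger the Deep Web is than the surface Web."""
    return DEEP_WEB_TB / SURFACE_WEB_TB


if __name__ == "__main__":
    print(f"Library of Congress (print): {library_of_congress_print_tb():.0f} TB")
    print(f"Deep Web / Surface Web: about {deep_to_surface_ratio():.0f}x")
```

On these numbers the Deep Web comes out at roughly 540 times the surface Web, which squares with "at least 450 times larger."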
Why am I throwing these numbers around? Because it is resolutely staggering to think that humankind has haphazardly created, in the last 30-odd years, an entity as amorphous, lawless, and endlessly vast as this Internet of ours. This isn't the work of a few people; it is the product of the overzealous millions of the world and represents, conceptually, the radical democratization of both technology and information. More importantly, however, it is a sprawling and essentially physical thing which no man will ever be able to measure, much less control.
The Internet is both exceedingly chaotic -- on the surface -- and reassuringly stable, in its sheer mass and presence. It grows massive amounts daily. No single computer lies at its center, nor does any part of the network have privilege over any other part. It is vital to the functioning of our daily world. It is fundamentally immeasurable. It could contain all of the information in the world, although it could just as equally contain nothing but nonsense. It has infinite unseen parts lying below the surface. There is only one other entity, to my knowledge, which shares all of these properties: the whole damn Universe itself.
Setting aside how much I am about to sound like an evangelical cyber-punk, isn't it strange that after all of our developments in the sciences and in technology -- which ostensibly have as a final goal the cataloguing and understanding of the Universe -- humans have only managed to recreate the mess they started with, in the form of an equally impenetrable web of energy and information? Assuming that this analogy stands, if the world and its workings as we know them are like the "Surface Web" we interact with on a daily basis, then what vastness -- what uncharted knowledge -- does our Universe's "Deep Web" contain?
Part Two of "Computers Are Interesting" may or may not address the following questions: Are hackers, thus, the true explorers of our modern times? If the internet IS the Universe, then where are its black holes? Have you read Neuromancer?
The Internet. You are right, there is a lot of complexity, so visual models are useful. One is the Internet as a collection of the information stored on each node (each computer). Another is as a collection of nodes. Maybe the Internet is a collection of links. The link view, or at least one very simplified version, may be seen at http://www.caida.org/home/. But wait. In the Internet each computer can theoretically reach any other computer by traversing a combination of links. So maybe the better model is a cloud - each computer is on the edge of the cloud and can reach any other computer on the other edge, but what happens inside is a mystery.
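The "collection of links" model above, where any computer can reach any other by traversing links, can be sketched as a small graph search. This is a toy illustration only: the node names and the links between them are invented, and real routing works very differently from a breadth-first search over a hand-written list.

```python
from collections import deque

# Hypothetical links; each pair means the two nodes are directly connected.
LINKS = [
    ("laptop", "home_router"),
    ("home_router", "isp"),
    ("isp", "backbone"),
    ("backbone", "hosting_provider"),
    ("hosting_provider", "web_server"),
]


def build_graph(links):
    """Turn a list of two-way links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: a shortest chain of links, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    graph = build_graph(LINKS)
    print(" -> ".join(find_path(graph, "laptop", "web_server")))
```

In the cloud model, the laptop and the web server sit on the edges and everything in between is the part you don't get to see.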
As for hackers, there are two uses for the word: good hackers, or white hats, and bad hackers, who are of course black hats. The Internet is improving in robustness even as it increases in size and complexity. But the fact is, and will continue to be for some time, that a bright person can seriously disrupt it. Luckily bright people are generally not motivated to do so. The point is that glorifying black hats may be great for drama, but no good comes of it.
As for black holes, there are plenty, but in different flavors. Mail black holes are set up to keep spam from escaping all the time: mail can go in, but not out. Black holes for all services can occur by accident, or by design to wall off black hat attacks. Large portions of the military networks are black holes. So-called "dark nets" exist for trading copyrighted material, which sounds positively pirate to me. Even your firewall enforces an intentional black hole. The people who do this for a job hang out on the mailing lists at nanog.org.
I'm not sure of the distinction you are trying to make about the "deep internet". Google et al index publicly accessible web pages on publicly accessible computers that are not marked "do not index". As for whether the data on non publicly accessible computers, like those holding my health records, or the vast computing resources of the NSA, storing every telephone conversation which has ever occurred constitute "deep" knowledge, I have my doubts.
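The "do not index" marking mentioned above is commonly done with a robots.txt file that well-behaved crawlers check before fetching a page. Here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the example.com URLs, and the bot name are all hypothetical, and this says nothing about how any particular search engine actually works.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_index(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/blog/post.html",
        "http://example.com/private/records.html",
    ):
        verdict = "index" if may_index(ROBOTS_TXT, url) else "skip"
        print(f"{verdict}: {url}")
```

Note that this is only a polite request to crawlers. It does nothing to stop anyone from reaching the page, which is a separate matter from the non-public machines discussed above.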
The Internet messy? Well that's democracy...
The final interview question for my current job was, "What is the internet?" Once people got done stumbling through some explanation that attempted to communicate their view of it, the interviewer (one of four, I believe) admitted that their explanation was pretty good, but said that another answer that would have been accepted was, "A cloud. A soft, fluffy cloud."
more like "coolputer"
I guess it's true that the deep web might really not be as interesting as it sounds, but it is still a fascinating relic/utility of the current era--the information available on the 'searchable' internet signifies something different to me than what the 'deep web' means--myspace and porno and harry potter fan fiction and other interesting social constructions. But the sheer volume of the deep web is also pretty profound.
Calling it the 'deep web' makes you think of mysteries of the deep sea or something--or the sci-fi fantasy that somewhere within the dark recesses of the internet there are mystical things happening--but you don't really need to imagine it that way in order to be blown away by it.
I'm thinking mainly of things like LexisNexis and other document databases, which exist for basically every kind of media imaginable, and the comparison to the Library of Congress in this sense is completely apt. There's just a LOT of information out there. The website of the company that coined the term deep web is worth looking at (www.brightplanet.com). They have a number of 'products' for retrieving information from the deep web in a way that search engines can't. One of them is called "Deep Federation Portal" and if that doesn't excite you then I don't know what will.
There are no homicidal but godlike AIs lurking in the deep web unbeknownst to all, and that's a bummer for would-be netmystics. But it IS a collection of information so large that there is no central index system, and methods of retrieving specific data from it are experimental and largely unheard of. What does this say about the culture that created it? In comparison to, say, Carolus Linnaeus (read my email claire!) or uh, mr. dewey decimal? This is the 'information age,' not necessarily the age of ordered information. It seems like for a long time people tried to use science, mathematics, technology, art (our creations) to carve out a tiny part of the world that was ordered, refined, and turned into the "middle" as a refuge from the apparent barbarism of the periphery. Now we've just got clouds--the man-made world is quantifiably as lawless and uncontrollable as the 'rest.' But--we HAVE documented so many facts and kept such extensive records! It's both humbling and amazing.
I also feel uneasy about the white hat/black hat issue. Where are the black hats coming from? They're not just snarling malformed idiots with bushy mustaches and cigars, trying to kidnap all the pretty ladies and tie them to railroad tracks. The analogy doesn't make any sense to me.
I just typed all of that bullshit and then realized I was basically just paraphrasing what you wrote in the first place. But my point is this: It's called the 'deep web' not because the specific information it contains is profound or mysterious, but because it is, well, deep. As in it's a long way to the bottom. Nobody knows what's in there because there's a fucking lot of it, not because it's alien and mysterious. It's an interesting footnote to the age-old question of, "Could God make a hotdog so big that even He could not eat the whole thing?" I don't know if God can, but now we know that WE can, and that's pretty awesome.
Whatever, I wouldn't call that a paraphrase in any capacity. What really gets me these days is that despite the fact that the internet contains all the information in the world, it is essentially dumb. Information isn't knowledge, you know? Which makes me feel better about the whole thing. We'll talk about this later definitely.
"Hyperlinks subvert hierarchy."
-Peter Morville
Does the universe, too? The ways we think about the universe might depend on the cataloguing mediums and methods we have at our disposal. Or maybe it's the other way around.
claire,
first things first, your blog is blowing my mind.
second, a couple of weeks ago i read this LRB article on google and google's specific internet is explained as a bunch of clunky, trashed computers hacked together to be searchable.
as a result, this is how my mind has been visualizing the internet recently.