In the comments on my DMCA post, a reader asked me to comment on this piece of silliness. I try not to disappoint my readers, so here's my take. It's a pile of silliness with the distinct aroma of astroturf - silliness mixed with a bit of deliberate stupidity in order to obscure things.
The basic idea of it is: how dare we complain about the idea of copyrighting numbers! After all, everything you can do on a computer is ultimately stored in a form that can be interpreted as a great big number! So we're always copyrighting numbers: every book, every article, every poem, every story that's ever been copyrighted is really just a number. So why should we start complaining now unless we're just a bunch of dirty anticorporate hippies who are complaining because we want to stick it to the movie companies?
According to this author, what makes the difference between reasonable copyrighted numbers and unreasonable copyrighted numbers is just the size of the number. Really big numbers, numbers that we couldn't hope to encounter in real life because they're too long, those should be copyrightable. Basically, if you represent a number using an integer, and that integer is so big that you couldn't possibly count to it, then it should be reasonable to copyright it.
The thing is, that's just bullshit. Bullshit that positively reeks of the same kind of crap that people like Dembski like to pull with their "Universal Probability Bound" and similar garbage. It doesn't matter how probable or improbable something is. A two-word phrase can be copyrightable - and deserves the full protection granted by copyright, even though it's not a particularly improbable value when encoded as a number. A two-hour burst of random noise recorded by an instrument observing the solar wind is vastly more improbable than this article, but it's afforded less protection under copyright law. Copyright has nothing to do with probability: under copyright law, incredibly improbable coincidences can be permissible; and likely coincidences can be punishable. Copyright is based on a matter of intent.
It's just deliberate foolishness to pretend that copyrighting this article is really
copyrighting a number. Yes, any article, any creative work that can be viewed on a computer
is stored as a stream of bits, and can be encoded in numeric form. That doesn't mean that
by asserting my copyright on this article, I'm claiming to own a number. I claim to own
these words, and their meaning as an article. I'm not claiming the rights
to certain numbers - this article rendered as ASCII text with macintosh line endings, this
article rendered as UTF-8 with unix line endings, this article rendered as UTF-8 and then
gzip compressed. I'm claiming the rights to the article that I produced. If you
happen to create a bitmap which has a segment that's identical to some segment of this
document after being encoded in EBCDIC and then gzipped, I have no rights to that. It
doesn't matter how unlikely it is for that to happen by chance. I don't have the
rights to something that you produced independently.
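To make the "one article, many numbers" point concrete, here is a minimal sketch. The sample text and the particular encodings are my own illustrative assumptions (Python's `cp037` codec stands in for EBCDIC); the only point is that a single piece of writing maps to several unrelated integers.

```python
import gzip

# Stand-in text; any article would do.
text = "I claim to own these words.\nNot the numbers.\n"

def as_integer(data: bytes) -> int:
    """Interpret a byte string as one big unsigned integer."""
    return int.from_bytes(data, "big")

# The same text under four different renderings.
encodings = {
    "ascii_mac_line_endings": text.replace("\n", "\r").encode("ascii"),
    "utf8_unix_line_endings": text.encode("utf-8"),
    "utf8_gzip": gzip.compress(text.encode("utf-8"), mtime=0),
    "ebcdic_gzip": gzip.compress(text.encode("cp037"), mtime=0),  # cp037: one EBCDIC variant
}

numbers = {name: as_integer(data) for name, data in encodings.items()}
for name, n in numbers.items():
    print(f"{name:24} -> a {n.bit_length()}-bit integer")
```

Every one of those integers decodes back to the same words, and none of them is "the article".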
The fundamental issue about copyrights has nothing to do with how probable or improbable a given text is in some arbitrary numeric encoding. A 30 character haiku has no less value in terms of copyright than a 30 megabyte sound file or a 3 gigabyte video file - even though the Haiku is infinitely more probable as a result of a random process. That doesn't make the Haiku less valuable, or less copyrightable. The point of copyright is to allow people to protect their work - and accidental collisions have never been criminal, no matter how improbable.
Look at the following Haiku, which I just wrote:
Musty air, smog, crowds
Subway, rushing traffic chaos, noise
But I do love my New York
That's 60 characters, and 10 or 11 words, depending on how you count it. With a suitable dictionary, a bunch of computers could generate every possible combination of 11 words that fit the Haiku structure in a not entirely unreasonable amount of time. (Back-of-the-envelope sketch: assuming a dictionary of 200,000 words, categorized by part of speech, and a simple grammar to try to make sure that you only generate potentially syntactically valid phrases, it comes out to around a quadrillion possible haiku (10^15); fling a good-sized bunch of computers at it, and you can generate every possible one in less than a year.)
On the other hand, the noise recorded by dropping a digital tape recorder in a subway stop for an hour, you could never reproduce, not by running every computer in the world for the entire lifetime of the universe.
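A rough sketch of that contrast in numbers. The 10^15 figure is the estimate from the text; the generation rate per machine and the audio format (CD-quality stereo) are assumptions picked purely for illustration.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Haiku side: candidate count taken from the estimate in the text.
haiku_candidates = 10**15
# Assumption: each machine emits one million candidate haiku per second.
haiku_per_machine_per_second = 1_000_000
machines_for_one_year = haiku_candidates / (
    haiku_per_machine_per_second * SECONDS_PER_YEAR
)

# Noise side, assuming CD-quality stereo audio (44.1 kHz, 16 bits, 2 channels).
audio_bits = 44_100 * 16 * 2 * 60 * 60
# Number of decimal digits in 2**audio_bits, computed via logarithms
# rather than by building the number itself.
audio_digits = math.floor(audio_bits * math.log10(2)) + 1

print(f"machines needed to enumerate the haiku in a year: {machines_for_one_year:.1f}")
print(f"bits in one hour of audio: {audio_bits:,}")
print(f"possible recordings: a number with about {audio_digits:,} decimal digits")
```

Under those assumptions the haiku space needs a few dozen machines, while the count of possible hour-long recordings is a number whose digit count alone runs past a billion.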
Does copyright law say that my Haiku or the tape of subway noise is more valuable?
Does it even differentiate between them? (Answer: unclear. Under some circumstances, the
subway noise would be copyrightable, in which case it would be treated as equal in
value to the Haiku under the law; under other circumstances, the subway noise would be
considered non-creative public domain, in which case the Haiku is more valuable under the
law. In no circumstance is subway noise more valuable than a copyrighted
poem. And in no circumstance does a calculation of the relative probability of generating
something randomly have any impact on copyright law. On the other hand, from a purely aesthetic viewpoint, the subway noise is better than my poetry.)
What's even worse than the shoddy probability argument is that the argument is
deliberately obfuscating the real issue. Even if you accept "Oh yeah, you're copyrighting
numbers" as a legitimate argument, it's irrelevant to the issues around the HD-DVD key
nonsense. No one is asserting copyright over the HD-DVD encryption key. The DMCA
does not grant them any new right to copyright an encryption key - they always had the
right to copyright it as an encryption key: but as such, it would be subject to
constraints like fair use. Instead, what the DMCA has done is create something new. They
don't need to assert that they have a copyright on the number, and thus have rights over
its copying and distribution. In fact, they are not asserting that they have a
copyright on that number. What the DMCA does is say that, by virtue of the fact that they
used that number to encrypt some copyrighted work for the purpose of copy
protection, they have a greater right to control the use of that number than
they would if they merely had a copyright.
Let me repeat that, because it's a critical point. They are not asserting that they have a copyright on the HD-DVD key. They are asserting far greater rights than what is granted by copyright. Under the DMCA, by virtue of its status as a copyright
protection circumvention device, they have far more right to sue over its copying and distribution than you or I have to sue over copyright infringement of our creative works. They've created a new category of intellectual property - not copyright, not patent, not trademark. And this new category gives them an obscene degree of control over the use of that property - which is just numbers.
And they can do this with any number. They can choose to use 128 bits from the binary expansion of π - and then threaten to sue Kate Bush for singing the digits of π in a song. They couldn't do that with simple copyright - but they can with the DMCA.
To respond to a couple of the objections that have been brought up:
- It's true that most of these abuses of DMCA would probably not wind up surviving a jury trial. But that's irrelevant: until it gets thrown out by a court, it remains the law, and people remain potentially liable for lawsuits and punishment. For the moment, the law in the US says that if they use a number to encrypt a copyrighted work, they have ownership rights far in excess of copyright. And even if most of the abuses would not survive a trial, there's a good chance that at least some would survive trial. So those of us with limited resources have to be very careful to protect ourselves, which means behaving as if every arbitrary piece of insanity is enforceable until it's proven that it isn't.
- Reading laws is a remarkably tricky thing. The best advice I've ever heard about it came from a lawyer who told me: "You're a geek. Don't ever read a law. It looks like it's English, but it's not. It's in legal, which is a different language." At my previous job, I had a lawyer explain parts of the DMCA to me, and what I've said is my understanding of it. The legal meaning of a device is tricky, and it's not what any sane person would use. But my understanding, on the basis of discussions with people whose job is to interpret these things, is that this is what the courts currently recognize. I'd love to be wrong about this - so if you know of anything published by an IP lawyer that would contradict what I've written, please point me at it. But a layman's interpretation of the law is not just worthless - it's potentially dangerous: by reading and interpreting the law yourself, you can open yourself to increased punishment for willful violation. It sucks, but it's the way the law works.
I wonder how this would fare in a court trial...
Excellent post! This whole thing is silly, and a waste of time. Let's assume that by some super lawyer power they were able to make the display of that number illegal on the web. They're still basically screwed: once they make that number illegal, they will have to make the display of information on how to derive that number illegal (I'm not anywhere near as good a writer as you, so that may not make sense).
Here's an example,
09f911029d74e35bd84156c5635688BB + 5 = Illegal Number
Obviously I could have used a much more complex formula, but I could have made the outcome the same. How would they deal with that?
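The commenter's point can be sketched in a few lines. The hex string is the one quoted in the comment; the alternative recipes are made-up illustrations, not anything anyone has actually published.

```python
# The hexadecimal string from the comment above, read as an integer.
base = int("09f911029d74e35bd84156c5635688BB", 16)
target = base + 5

# A few other recipes that arrive at exactly the same integer.
recipes = {
    "base + 5": base + 5,
    "(base - 5) + 10": (base - 5) + 10,
    "2 * base - (base - 5)": 2 * base - (base - 5),
    "base XOR a mask": base ^ (base ^ target),
}

for name, value in recipes.items():
    print(f"{name:24} matches target: {value == target}")
```

There is no end to such recipes, which is why a rule against displaying one particular string of digits would not go very far on its own.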
Cody:
Probably not well. The articles on things like BoingBoing aren't just passively using the number. They're identifying it, and telling you what it is. Under the DMCA, that's almost certainly enough to give the HD-DVD people a solid case. The only thing protecting them is the amount of bad press that would be generated by doing it.
The BoingBoing article mentions one other important thing I neglected. The HD-DVD people don't just have one number. They've got a shitload of them. And they won't tell you what they are.
That's very different from copyright. With copyright, I can always know whether I'm infringing. I *know* that I copied something. I may not know its original source, but I know I copied it. But with the HD-DVD encryption codes, I can be violating their DMCA protection *without knowing it*. It's theoretically possible that the site we've all been getting our numbers from has been feeding us the actual HD-DVD codes in order to get us all in trouble. If they were, we'd be screwed - because we're publishing those numbers, and under the DMCA, ignorance is no excuse. It doesn't matter that we don't *know* what the protected numbers are.
Thanks for the link.
Perhaps I didn't make clear the point of my article, which is not about whether or not it is *reasonable* to copyright an integer but whether it is *practical* to pass laws about one. Beyond a certain length - and, I should add, level of complexity (obviously a string of a billion zeros or digits of pi should not be fair game) - it is entirely practical to treat occurrences of certain integers in reality as intentional copyright violations (or intent to crack something open with those integers, or whatever). This is because there would simply be no such thing as an accidental collision.
Two people independently writing and copyrighting two identical haiku is a matter for the copyright lawyers, not a mathematician like myself. Likewise, the real current issue of the leaked HD-DVD key is, though tangentially related and the inspiration for the article, actually outside its scope.
As for separating an integer which completely represents your article in every detail from the article itself... that's more of a philosophical problem.
The most interesting thing to me about the super-copyright afforded by the DMCA is that it has no time limit. If the DMCA still stands then, the 32-digit numbers the DMCA protects will be just as illegal in a thousand years as they are today. This stands in stark contrast to the clause in the constitution which gives Congress authority to create intellectual property laws in the first place:
To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;
Normal copyright laws weasel around this by giving copyright for a "limited" term which is significantly longer than a human life and which Congress retroactively increases by about 30 years about every 30 years. The DMCA however doesn't even attempt any such technicality to get around the "limited times" clause and this, I hope, can eventually aid in its downfall...
I followed that link yesterday, and I took a different message from it than you, Mark.
(Btw, I love your site, Sam!)
Basically, I believe he's asserting something about the possibility of a correspondence between materials and copyright. Essentially, if you have a document that exceeds a certain threshold (determined by current hardware capabilities plus a healthy margin), it is possible to encode it with a number. Multiple numbers, in fact, but as long as we restrict ourselves to actual, realistic encodings the number of possible bit strings that can be assigned to a document is manageable.
Now, say that the document is copyrighted. What's more, you have, for some reason, a binary string displayed on your website, which just happens to be the exact binary representation of the document in a particular encoding. Assuming that Fair Use is moot for this particular instance, can the publishing of this binary string be considered copyright infringement?
His argument is that, over a certain boundary, it *is* reasonable to say that publishing the number counts as infringement in the same way that publishing the actual document would be.
There are, of course, an enormous (infinite?) number of ways you could possibly encode a document, which means that a staggering amount of numbers can be considered to represent it. But as I said, we can make decisions based on standard, extant encodings, which sharply limits the number of possible numbers that can represent a document. Thus, by adjusting the threshold at which we allow a number to be considered 'copyrighted', we can make it arbitrarily unlikely that a given random number actually represents a copyrighted document, and thus arbitrarily likely that, given a number which *does* correspond to a copyrighted document, the number was intentionally chosen rather than randomly generated.
Of course, someone will always raise the question of, "What about copyrightednumber+1? Is that 'copyrighted' as well?" The number of simple mungings of a number is fairly limited, and complex mungings are limited simply by their rarity (that is, if they are being used, then they must be distributed/known to people, and can thus be taken into account). Taking mungings into account simply means we have to raise the threshold a bit more to return to our desired improbability.
Yes, this can raise the same issues that Dembski and friends bring up, but we wouldn't be defining the threshold arbitrarily. We'd be looking at actual, well-known data to determine where the threshold should be. And we can always adjust it as our datamass increases.
To sum this all up, you wouldn't be copyrighting integers - that's still impossible. You'd simply be stating that, above a certain length, integers corresponding to copyrighted material are so unlikely to be generated by random processes that if you found them 'in the wild' there is a very high probability that they were intentionally created. That is, they *are* meant to represent the copyrighted material, and can be deemed as infringing.
I do, however, disagree with Sam's conclusion that 128 bits is an acceptable threshold. I'd likely put it at several thousand bits, as that corresponds pretty well to the smallest files actually likely to be encountered. Your haiku, for example, clocks in at 704 bits on my Windows machine when I saved it as a .txt document. I'd still probably put the threshold above that, just to be safe.
I'd like to point out an additional reason that copyright on things shouldn't be limited to some measure of "improbability"---there can be multiple, separate copyrights on the same thing. To wit: "A copyright owner has no recourse against another person who, working independently, creates an exact duplicate of the copyrighted work." (http://tinyurl.com/23fxau). There have been cases where two musicians accidentally stumbled onto the same refrain for a song (after all, there are only so many short sequences of notes that sound good), and each hold a copyright to their respective songs.
Yes, I agree that the improbability of stumbling onto the same long sequence is evidence that it was copied, not independently created, but it is the copying, not the improbability that makes it illegal.
Sorry, marked up that URL incorrectly, so the link is broken. This should link to the same article http://library.findlaw.com/1999/Jan/1/241476.html
As far as I understand, the DMCA is not about the number itself, but about the fact that this number was posted with a notice that this is the key. If the number was used just as a number, no problems would arise; but being posted as the key, it made the DMCA unhappy.
In any case, all this number privatization stuff sounds weird to me.
I see a business idea here. Like people buy interesting domain names and 'squat' on them, waiting for the highest bidder, I want to set up a corporation to buy the copyright to large significant numbers (eg primes longer than 128 bits; significant future dates, such as millennia, in the Unix time format, Bill Gates's bank balance). I shall then take legal action against anyone using these numbers, or sell the rights to the highest bidder. Venture capitalists, please get in touch.
Hi Mark. Great blog. Being a math grad student I love innumeracy, I think of it as future job security.
Every time I see another DMCA approach to security it just gets sillier and sillier. I always admired the RSA approach to security: basically realizing that there are clever people in the world, and running the RSA challenge. It seems that they keep taking the ol' "let's now make that illegal" approach. I often wonder if the following mini-play isn't taking place in the minds of their security people, although in a much more over-the-top way:
Doug: Ted, I just found the HD-DVD encryption codes. We can now copy with impunity.
Ted: But Doug, those are protected by the DMCA and it's illegal to even see them.
Doug: My god, you're right, let us never speak of this again!
Ted: Agreed!
(Doug and Ted shudder)
It's just my 2 cents, but it seems that they should be interested in making it actually difficult to crack rather than relying on a silly law to protect copyrighted works. I find it difficult to believe that simply barring access to the magic number will help; like I said, people tend to be rather clever when properly motivated.
I cannot help but to notice that this sort of rationalization of what should be copyrightable based on (im)probabilities flies in the face of the original purpose of copyrights, which is to promote Arts and Sciences. Any consideration of copyrights must therefore depend not on some cut-in-stone principles derived from false analogy from right of ownership, but on what can be considered reasonable and proper in balancing the freedom of speech with the protection of the livelihoods of artists in the interest of progression of culture.
As a funny side note, what if I were to craft a device for devising random numbers? Any number thusly produced would with equal improbability correspond to some future work. As long as my device could in theory produce any work of art, I should, by this reasoning, have the copyrights to all future works, since the probability of someone, by accident, of creating a work producible by my machine would be infinitesimal.
Will C.: The reason why copy protection must be protected by law is that the concept of a digital copy protection is fundamentally flawed. It can only fully work in an Orwellian society, where someone is constantly looking over your shoulder to see that you're not copying anything.
A very interesting take on copyright is the idea of OFF (Owner Free File system, http://offsystem.info/CopyNumbCJ.pdf). The explanation of it is very crappy, but the idea is quite cool. I would like to hear what you think about it.
"But as I said, we can make decisions based on standard, extant encodings, which sharply limits the number of possible numbers that can represent a document."
I dispute this. Take a standard, extant encoding like ASCII. There may or may not be a unique way to represent Mark's haiku, depending on whether you count representations that include irregularities in whitespace, capitalization, and spacing.
But then take a standard, extant encoding like PDF. Every choice of font or color, every choice of angle or position for scanning, every choice of bit resolution results in a quite different number. If limiting ourselves to standard, extant encodings limits the number of possible representations, it sure doesn't limit it much.
And in terms of copyright enforcement, we haven't even started yet. Copyright includes translation rights, so in order to detect copyright violation, we'd also have to identify every possible number representing a translation of the document into French, German, Spanish, Russian, Chinese, Pashto, and Wolof.
And let's not forget rights to derivative works. For some works -- probably not the haiku -- we'd have to find numbers for all possible dramatizations, children's versions and comic book adaptations (I'm old enough that "comic book" isn't a dirty word and means graphic novels and manga too). Oh, and also representations for possible sequels and fan fiction.
No, copyrighting numbers as a method of copyright enforcement is neither reasonable nor practical.
On another subject, this brings up something that has pestered me for awhile. Years ago, I read a science fiction story in which a group of researchers on a SETI-like project intercepted an interstellar transmission. After months of work, they came to the conclusion that the transmission was in fact a graphics file, and complications ensued.
Now, granted that any text or image can be encoded into one or more numbers. But suppose we happen across a number and have reason to believe that it represents _something_. Are there any mathematical or CS principles that would help us determine the nature of the referent? Could the researchers in the story actually decide that the transmission is a graphics file without prior knowledge of the encoding standard?
Billy:
I think you're probably talking about Carl Sagan's novel, "Contact". (Which wasn't a great novel, but was infinitely better than the crappy movie that they made out of it.)
The answer to your question is: If the *senders* of the message containing the image actually wanted people to be able to decode it, then yes, they could reasonably easily choose an encoding that would be easy for a receiver to decode.
We often end up thinking of images in terms of relatively complicated formats, like jpegs. Most image formats are optimized for space - that is, they include various compression mechanisms to try to make them smaller. If you're sending to an unknown receiver, you wouldn't want to do that - you'd choose to make it larger in order to make it clearer.
So suppose you want to send a black-and-white bitmapped image. The most naive approach would be to encode it in binary, with a "1" for a white pixel, and a "0" for black; and you'd pick a rectangular shape for the image, where the width and height are both primes - that way, there's only one possible rectangle that fits the bitmap (up to swapping the width and the height).
If you think of that in numeric terms, you could have a carrier wave which has different amplitudes for different numbers. So we could label the amplitudes as, say, the numbers from 1 to 20. So you'd reserve the codes "1" and "20", and not use them outside of images. So then in your transmission, you'd have some stuff encoded with the codes 2-19, then a long sequence of 1/20 codes whose length is N*M, where N and M are prime numbers.
If you saw that kind of a pattern in a message - particularly if you saw that kind of pattern repeated, where there are ranges of unique codes whose length is the product of two primes - you'd definitely try looking at it as a rectangular image.
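Here's a minimal sketch of what the receiving end might do with such a run of codes. The function names and the little 5x7 test image are invented for illustration, and I've collapsed the two reserved codes down to 0 and 1; a real receiver would also have to try both orientations of the rectangle.

```python
def is_prime(k: int) -> bool:
    """Trial-division primality test; fine for small run lengths."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def prime_dimensions(length: int):
    """Return (p, q) with p <= q if length is a product of two primes, else None."""
    p = 2
    while p * p <= length:
        if length % p == 0:
            q = length // p  # p is the smallest divisor, hence prime
            return (p, q) if is_prime(q) else None
        p += 1
    return None  # length is prime (or < 4): no two-prime rectangle

def render(bits, width: int) -> str:
    """Lay the bit run out in rows of the given width ('#' for 1, '.' for 0)."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "\n".join("".join("#" if b else "." for b in row) for row in rows)

# A hypothetical 5-wide, 7-tall picture of an arrow, flattened into one run.
image_rows = ["00100", "01110", "11111", "00100", "00100", "00100", "00100"]
bits = [int(c) for row in image_rows for c in row]

dims = prime_dimensions(len(bits))
print("run length", len(bits), "factors as", dims)
for width in dims:  # try both orientations and eyeball the result
    print(f"\nwidth {width}:\n{render(bits, width)}")
```

With a run of 35 codes the only candidates are 5 wide or 7 wide, and only one of the two renderings looks like anything.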
Of course, a copyrighted material may actually be represented by numerous numbers (depending on file format, resolution, etc.). So in the case of traditionally copyrighted materials (like books, but not computer programs) it isn't even an individual number that is in question. Furthermore, you could encrypt a copyrighted material and it would take on any range of numbers depending on the key, but it would still be an infringement of copyright to distribute such duplicates.
I feel that today's copyright laws are too long. The first copyright law in England only gave rights for 28 years. Most works created today are protected for the life of the holder plus seventy years. Ridiculous, especially when you consider that the overwhelming majority of profits to be made from copyrighted materials are probably earned within the first few years and I'm sure that by about 30 or 40 years, well over 90% of the profits that could be had by all copyrighted materials have been made. I think copyrights should last maybe 40 years max and maybe renewals on copyrights could be had for an additional 20 years max if a reasonable fee is paid, which would ensure that any extended copyrights applied primarily to works that still were capable of earning their holders a profit. After all, why should a book which is out of print and likely never to return to print continue to enjoy protection even a full century later?
Perhaps Mark has experienced the frustration of not being able to find an obscure, out-of-print mathematical treatise in the university library and also being unable to obtain it via interlibrary loan with a neighboring institution or even being unable to find a used copy on Amazon.com for an exorbitant price. As a chemistry student, I know I've encountered those kind of problems before. Even a large university library isn't the Library of Congress.
Basically, the copyright issue has become a war, and while media corporations choose to use the courts, they do so not because they are 'right' but because it plays to their strengths (deep pockets). The only sane way for those of us who oppose them to respond is to play to our strengths (mass uncontrollable publication via the internet). While this may hurt reasonable users of copyright, leaving groups like the RIAA unchecked is worse. I suspect that the only way to halt the RIAA and co. is to disarm them by emptying those pockets.
tommy
The copyright granted by America's founders was a mere 14 years, with an option for a 14-year extension. That's 28 years total. After that, it was public domain, no questions asked.
Copyright is an unholy abomination in its current form.
Matthew L.
You're comparing two different things here. The odds of two people stumbling on the same melody are humongous compared to the numbers we'd be talking about here. The odds of someone randomly generating an 8192-bit number (1 KB) that just happens to match the encoding of a 1 KB file are vanishingly small.
I mean, there are 2^8192 possible binary strings of that length. That's an absolutely staggeringly large number. Using base ten, that's approx. 10^2466. To put it another way, it's googol^24.7.
That's just a simple, tiny 1 KB file. Now, think about how large ordinary non-trivial files are. An mp3 of a song is 3-5 MB, which corresponds to something around 2^(2^25), or 2^33554432, possible bit strings.
Choose a random bit string from those 2^33554432. Construct a set consisting of that bit string and all simple mungings. This will be a pretty big set, of course, but still infinitesimal compared to the totality of the set. Now, what are the chances that any of these bit strings actually encode a file in any extant data encoding? I wouldn't be able to calculate the number, but I know that the odds would have to be written in scientific notation. Now, choose, I don't know, a trillion such bit strings, and repeat this procedure. The odds of even one of them producing a file are still infinitesimal.
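For anyone who wants to check magnitudes like these, a quick sketch using logarithms. The helper name is invented, and the "trillion draws" line simply applies a union bound to one specific 1 KB target; it says nothing about how many valid files exist.

```python
import math

def decimal_exponent(bits: int) -> float:
    """Return x such that 2**bits is approximately 10**x."""
    return bits * math.log10(2)

one_kb = decimal_exponent(8192)        # all bit strings in a 1 KB file
four_mb = decimal_exponent(2**25)      # all bit strings in a 4 MB file

print(f"2^8192     ~ 10^{one_kb:.0f} ~ googol^{one_kb / 100:.1f}")
print(f"2^(2^25)   ~ 10^{four_mb:,.0f}")

# Upper bound (as a power of ten) on the chance that any of a trillion
# random 8192-bit draws equals one particular 1 KB value.
trillion_draws = 12 - one_kb
print(f"trillion random draws hitting one target: below 10^{trillion_draws:.0f}")
```

The exponent for the 4 MB case is itself a number in the tens of millions, which is about as far as intuition goes.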
We're not talking about the odds of abiogenesis being true. That's not generally something that can have odds applied to it. We're just looking at a given number, and trying to see if it is simply a random number and thus innocent, or an encoded file and thus subject to copyright restrictions. With an appropriate boundary line, this test can be more reliable than anything else in our legal system. ^_^
Again, you're not copyrighting a number. You're simply stating that the odds of a particular number being randomly generated are so infinitesimally small as to be very incriminating if it is found in the correct circumstances. There is always the chance that you actually did randomly generate the number, and you just happen to like sending random bit strings over a p2p network or something. But it is enormously more likely (and by enormous, we're talking many, many orders of magnitude here) that the bit string you are sending is, in fact, an encoding of a file, which is then potentially subject to copyright law.
Numbers aren't being copyrighted. They're simply being recognized for the markers that they are.
Billy
No, it's really not that difficult. We're not identifying numbers, and specifying that these particular numbers aren't allowed to be transmitted. We're looking at a given number, and determining whether or not it represents a file in an extant file encoding (or possibly a simple or known munging of such a number).
We don't have to identify the numbers that correspond to a file having more spaces, or in a different language, or anything like that. We just have to look at the file and try decoding it. If you get a valid Notepad document or something, and the text is copyrighted material, then it is very nearly certain (assuming the number is above a defined threshold size) that the number was intentionally created (i.e., it's an actual file, not a random number). What matters are the functions that can be applied to the number (either file encodings [integer -> file] or mungings [integer -> integer]). Beyond that, it's simply ordinary copyright law.
As well, note that as you introduce more whitespace or formatting weirdness into the file (presumably to hide its origins) you increase its file size, making it ever less likely that the bit string was randomly generated.
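To make the "try decoding it" test concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the helper names (`identify`, `recognise`, `xor_mask`), the choice of a single-byte XOR as the "known munging", and the use of file signatures as a stand-in for a full decoder. It only shows the shape of the idea: undo each known munging (integer -> integer), then check whether the result looks like a file in a known encoding (integer -> file).

```python
from typing import Optional, Tuple

# Leading byte signatures of a few real formats. Matching a signature is
# only a stand-in for "decodes as a valid file"; a real test would decode fully.
MAGIC = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}

def int_to_bytes(n: int) -> bytes:
    """View a non-negative integer as a big-endian byte string."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")

def identify(n: int) -> Optional[str]:
    """Return the name of a known format whose signature n starts with."""
    data = int_to_bytes(n)
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None

def xor_mask(n: int, key: int = 0x5A) -> int:
    """Hypothetical munging: XOR every byte with a fixed key.

    It is its own inverse as long as the first byte does not become zero.
    """
    return int.from_bytes(bytes(b ^ key for b in int_to_bytes(n)), "big")

# Known mungings, each paired with the function that undoes it.
UNMUNGERS = {"identity": lambda n: n, "xor_0x5a": xor_mask}

def recognise(n: int) -> Optional[Tuple[str, str]]:
    """Try every known un-munging, then every known encoding."""
    for munging, undo in UNMUNGERS.items():
        fmt = identify(undo(n))
        if fmt is not None:
            return munging, fmt
    return None

original = int.from_bytes(b"%PDF-1.4 some document body", "big")
print(recognise(original))            # ('identity', 'pdf')
print(recognise(xor_mask(original)))  # ('xor_0x5a', 'pdf')
print(recognise(24))                  # None
```

Note that nothing here enumerates forbidden numbers. The check is a property of the number under a fixed, known set of functions, which is the point being argued above.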
tommy
The copyright granted by America's founders was a mere 14 years, with an option for one 14-year extension. That's 28 years total. After that, it was public domain, no questions asked.
Copyright is an unholy abomination in its current form.
Matthew L.
You're comparing two different things here. The odds of two people stumbling on the same melody are humongous compared to the numbers we'd be talking about here. The chance of someone randomly generating an 8192-bit number (1 KB) that just happens to match the encoding of a 1 KB file is essentially zero.
I mean, there are 2^8192 possible binary strings of that length. That's an absolutely staggeringly large number. Using base ten, that's approx. 10^2466. To put it another way, it's googol^24.66.
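For anyone who wants to check the size of that space, here is a short Python sketch. It counts the bit strings of length 8192 exactly and converts the count to a power of ten; the variable names are my own.

```python
import math

BITS = 8192  # 1 KB = 1024 bytes * 8 bits

space = 2 ** BITS                # exact number of 8192-bit strings
digits = len(str(space))         # how many decimal digits that number has
exponent = BITS * math.log10(2)  # log10 of the number of strings

print(digits)                    # 2467
print(round(exponent, 2))        # 2466.04
print(round(exponent / 100, 2))  # 24.66, since a googol is 10^100
```

So the count of 8192-bit strings is a 2467-digit number, roughly 10^2466.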
That's just a simple, tiny 1 KB file. Now, think about how large ordinary non-trivial files are. An mp3 of a song is 3-5 MB, which corresponds to something around 2^(2^25), or 2^33554432, possible bit strings.
Choose a random bit string from those 2^33554432. Construct a set consisting of that bit string and all simple mungings. This will be a pretty big set, of course, but still infinitesimal compared to the totality of the set. Now, what are the chances that any of these bit strings actually encode a file in any extant data encoding? I wouldn't be able to calculate the number, but I know that the odds would have to be written in scientific notation. Now, choose, I don't know, a trillion such bit strings, and repeat this procedure. The odds of even one of them producing a file are still infinitesimal.
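The "trillion tries" argument can be sketched as a union bound worked in logarithms, since the numbers are far too big to handle directly. The chance of at least one hit is at most trials * targets / 2^bits. The number of target strings (bit strings of this length that decode to some file, mungings included) is unknown, so the value below is a made-up and deliberately generous assumption, and the function name is hypothetical.

```python
import math

def log10_hit_bound(bits: int, trials: float, targets: float) -> float:
    """log10 of the union bound trials * targets / 2**bits."""
    return math.log10(trials) + math.log10(targets) - bits * math.log10(2)

MP3_BITS = 2 ** 25       # 33,554,432 bits, i.e. a 4 MiB file
TRIALS = 1e12            # a trillion randomly chosen bit strings
ASSUMED_TARGETS = 1e100  # hypothetical: a googol of "valid" strings

bound = log10_hit_bound(MP3_BITS, TRIALS, ASSUMED_TARGETS)
print(f"chance of any hit is at most about 10^{bound:.0f}")
```

Even with a googol of acceptable targets and a trillion tries, the exponent stays below minus ten million. The assumed target count would have to be off by millions of orders of magnitude to change the conclusion.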
The article linked to was reasonable and mathematically sound. Mark's rebuttal ended up as just a rant about the DMCA, something the article never even mentioned. Besides, the article's conclusion was only that it was practical to allow copyright of certain numbers; it did not discuss philosophical or legal issues like how something must have "meaning" to be copyrighted.
Could we theoretically encrypt something with a much simpler number like '24'? Sure, it's not much security, but I'll use it to encrypt some original writing such as this post. Then the number '24' is mine. Then I sue Jeff Gordon of NASCAR fame (along with his sponsors Hendrick Motorsports and DuPont) for using my number on his car. I also sue Fox Networks and Kiefer Sutherland for the TV show of the same name. I might sue God for making 24 hours in a day. These may not be successful lawsuits, but it sure would be entertaining.
And that would, of course, be completely impossible by the argument given in the linked article in *this* post. If you're talking about encryption, that's the other post. This is more about recognition. At what point can we look at a number that just happens to code for copyrighted content and say, "That was almost certainly not randomly generated"?
That's all this is about. The number 24, being composed of 5 binary digits, is clearly not large enough to do so.
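A quick Python check of that last claim, as a sketch with a helper name of my own: 24 takes 5 binary digits, so a uniformly random 5-bit string matches it one time in 32, which is nowhere near improbable enough to incriminate anyone.

```python
from fractions import Fraction

def random_match_chance(n: int) -> Fraction:
    """Chance that a uniformly random string of n's bit length equals n."""
    return Fraction(1, 2 ** n.bit_length())

print((24).bit_length())        # 5
print(random_match_chance(24))  # 1/32
```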