Another chaos theory post is in progress. But while I was working on it, a couple of

comments arrived on some old posts. In general, I’d reply on those posts if I thought

it was worth it. But the two comments are interesting not because they actually lend

anything to the discussion to which they are attached, but because they are perfect

demonstrations of two of the most common forms of crackpottery – what I call the

“Education? I don’t need no stinkin’ education” school, and the “I’m so smart that I don’t

even need to read your arguments” school.

Let’s start with the willful ignorance. This is the kind of crackpottery

that frankly bugs me most. As an American, I’m used to the fact that our

culture distrusts intelligence and education. Politicians in America use

“intellectual” as an insult. Mentioning that someone attended one of the best

schools in America is commonly used as a criticism. That’s where this one, by

a guy who calls himself “Vorlath”, comes from. Enough introduction,
<a href="http://scienceblogs.com/goodmath/2009/10/sorry_denise_-_but_god_didnt_m.php#comment-2022628">here
it is</a>:

I can’t believe there are still people who believe in Cantor’s

theory. It’s complete silliness and makes the people who believe in

superstition look like the smart ones by comparison. Cantor was well known to

treat infinity as a finite number and that’s all he’s doing with his theory.

It doesn’t mean anything.

That argument reduces, roughly, to “I have no idea what Cantor’s argument was,

but I don’t like it, and therefore it’s wrong”.

Cantor didn’t treat infinity like a finite number. What he did was study the

structure of numbers, and realize that “infinity” isn’t a simple thing. You can show

that there are “infinities” that are larger than other “infinities”. In fact, it’s

pretty inescapable.

The rough sketch is: take any set, S. You can create another set, called the *power set* of S

(usually written 2^{S}), which consists of the set of all subsets of S. So if S is

the empty set, then 2^{S} has exactly one element: the set containing the empty set. If S

is the set { a, b }, then 2^{S} = { {}, {a}, {b}, {a,b} }. The power set of any set S is

*always* larger than S. So – take the set of all natural numbers. How big is it? Well,

it’s infinite. How big is its powerset? It’s also infinite. But every powerset *must* be

bigger than the original set – so the powerset of the natural numbers must be larger

than the natural numbers. How can that be? It can be, because there are different kinds

of infinities.
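The finite version of that construction is easy to check mechanically. Here's a small Python sketch (the `power_set` helper is mine, just for illustration) showing that a set with n elements always has a power set with 2^n elements:

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s (as frozensets)."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

S = {"a", "b"}
P = power_set(S)
print(len(S), len(P))  # 2 4

# The power set is always strictly larger: 2^n > n for every n.
for n in range(10):
    assert 2 ** n > n
```

For finite sets this is just counting; Cantor's insight was that the strict inequality between a set and its power set survives even when the set is infinite.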

Some of that actually has an effect in reality. I can create a perfect one

to one mapping between the natural numbers, and the set of all rational

numbers. They’re the same size. But I *can’t* do that for real numbers. My

mapping for the rationals won’t include π, unless I cheat and specifically add it

to the list. It won’t include square roots. No matter what I do, I can’t devise any

method for creating a one-to-one correspondence between the natural numbers and the real

numbers. *That* is what Cantor’s work really showed.
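The natural-to-rational correspondence can actually be written down. Here's one standard diagonal enumeration in Python (a sketch; this isn't Cantor's exact pairing, but any enumeration like it makes the same point) that walks the positive rationals diagonal by diagonal, hitting each one exactly once:

```python
from fractions import Fraction
from math import gcd

def rationals():
    """Enumerate every positive rational exactly once,
    diagonal by diagonal: p + q = 2, then 3, then 4, ..."""
    d = 2
    while True:
        for p in range(1, d):
            q = d - p
            if gcd(p, q) == 1:       # skip duplicates like 2/4 == 1/2
                yield Fraction(p, q)
        d += 1

gen = rationals()
first = [next(gen) for _ in range(8)]
print(first)  # 1, 1/2, 2, 1/3, 3, 1/4, 2/3, 3/2
```

Every positive rational p/q (in lowest terms) shows up at diagonal p+q, so the list misses nothing – which is exactly what a one-to-one correspondence with the naturals requires.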

Intuitively, it seems *wrong* that there are degrees of infinity, that there

are bigger infinities and smaller infinities. The argument from Vorlath is, basically,

that the fact that it’s not intuitive means that it *must* be wrong. Vorlath has

no need to know what Cantor really said, or what it really means. Without knowing that,

he just *knows* it’s wrong, because it’s *obviously* silly.

Moving on, we come to the genius who doesn’t need to know an argument in

order to disprove it. This is an amazingly common form of argument. I see it

mostly in the form of Cantor disproofs, where it takes the form “Here’s a

one-to-one mapping between the natural numbers and the reals”. The mappings

are always wrong. But what makes the argument particularly annoying is that

the mappings are *trivially* wrong: Cantor’s diagonalization shows that

given *any* one-to-one mapping between the naturals and the reals, the

mapping will miss some real numbers. In *every* case where I’ve seen

one of these arguments, their enumerated mapping fails because Cantor’s

diagonalization shows how to produce a counterexample for it. You can’t disprove

Cantor’s diagonalization without showing that something is wrong with the

diagonalization proof – but these anti-Cantor geniuses constantly think that

they can disprove Cantor without actually addressing the proof.
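To see just how mechanical the counterexample is, here's a toy version of the diagonal construction in Python. It models reals in (0,1) as digit sequences, takes any claimed enumeration, and produces a number the enumeration missed. (The `claimed` enumeration below is a made-up stand-in for whatever mapping an anti-Cantor argument proposes.)

```python
def diagonal_counterexample(enumeration, digits=10):
    """Given a claimed enumeration of reals in (0,1), each real
    represented as a list of decimal digits, build a real that
    differs from the n-th entry in its n-th digit."""
    out = []
    for n in range(digits):
        d = enumeration(n)[n]           # n-th digit of the n-th real
        out.append(5 if d != 5 else 6)  # any digit different from d
    return "0." + "".join(map(str, out))

# A toy "enumeration": the n-th real is 0.nnnn... (digit n mod 10 repeated).
claimed = lambda n: [n % 10] * 20

missing = diagonal_counterexample(claimed)
print(missing)  # 0.5555565555 - differs from entry n at digit n
```

No matter what `enumeration` you plug in, the result disagrees with every entry somewhere – which is why "here's my mapping" can never be a disproof.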

This comment, though, has nothing to do with Cantor. It’s on

the subject of iterative compression. Here

it is:

Your talking linear maths not non-linear with non-linear representation you

can compress repeatedly because the input a is different to the output b which

you can then consider B in linear form ready to become non-linear C I know it

can be done as I have the non-linear twin bifurcation maths needed and have

tested it already on random digit data and yes you can unpack this information

too. not to worry I’m am just getting ready to produce the compressor myself

and no I’m not a Christian. There of course for a given data set will be

limits as to how compressed you can make the data but this will not be

relative to linear laws of data compression.

This is, basically, the same thing as the Cantor disproofs. The argument

against iterative compression is pretty simple: compression is intrinsically

limited – because *most* strings are non-compressible. It doesn’t

matter how clever you are.

The argument is incredibly simple. Imagine that you want to compress all

strings of 128 bits. You’re not very ambitious: you only want to compress them

by *one bit* – to reduce all of those strings of 128 bits to 127

bits.

You *can’t* do it.

There are 2^{128} strings of 128 bits, and 2^{127} strings

of 127 bits. That means that either: (1) at least half of the strings can’t be
compressed, or (2) on average, every 127-bit string is the compressed form of *two*
128-bit strings. In case 1, you’ve admitted defeat: half of your strings can’t be
compressed *at all*. In case 2, you’re screwed, because compression is

only valuable if it’s reversible – that is, when you compress a string, you

expect to be able to get the original uncompressed string back; but if

there are multiple input strings that can be mapped to the same compressed

string, then your decompressor can only, at best, return a *set* of

strings saying “The uncompressed string is one of these”.
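The pigeonhole argument is easy to demonstrate at a small scale. Here's a Python sketch using 4-bit strings and a hypothetical "compressor" that just drops the last bit – any other function from 4 bits down to 3 bits would collide just as badly, because there simply aren't enough outputs to go around:

```python
from itertools import product

def all_bitstrings(n):
    """All 2^n bit strings of length n."""
    return ["".join(bits) for bits in product("01", repeat=n)]

# A hypothetical compressor: drop the last bit. Any function from
# 4-bit strings to 3-bit strings has the same problem.
compress = lambda s: s[:-1]

inputs = all_bitstrings(4)
outputs = [compress(s) for s in inputs]

# 16 inputs, only 8 possible outputs: collisions are unavoidable,
# so decompression can't recover a unique original.
print(len(inputs), len(set(outputs)))  # 16 8
```

The pigeonhole principle guarantees that some compressed string has at least two preimages, so no decompressor can pick the right one.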

The more you compress, the worse it gets. Want to compress things by half?

Then you’re talking about compressing 2^{N} strings into
2^{N/2} values – giving you 2^{N/2} possible uncompressed
values for each compressed string.

The way that this relates to iterative compression is that an iterative

compression system is no different than any other. Sure, you can devise

iterative compression systems – they’re commonly known as “lousy compressors”.

What they do is take some specific set of input strings, and compress them a

little bit. Then you run them again, and they compress a little bit more. Then

you run them again, and they compress a little bit more. Until eventually,

they stop working. And, at best, you’ve compressed your input as much as you

could have in a single pass with a non-iterative compression system.
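You can watch this happen with a real compressor. The sketch below uses Python's `zlib` as a stand-in for any compression system: the first pass shrinks the redundant input dramatically, and further passes accomplish nothing, because the output of the first pass is already close to random.

```python
import zlib

# Highly redundant input: the same sentence repeated 100 times.
data = b"the quick brown fox jumps over the lazy dog. " * 100

sizes = [len(data)]
blob = data
for _ in range(5):
    blob = zlib.compress(blob, 9)  # maximum compression effort
    sizes.append(len(blob))

# First pass shrinks the data enormously; each later pass only adds
# header overhead, because compressed data has no redundancy left.
print(sizes)
```

The exact numbers depend on the zlib version, but the shape is always the same: one big drop, then a slow creep upward.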

The unavoidable fact is that the set of inputs is larger than the set of

outputs, and that means that compression is inevitably limited. The reason

that compression works *in practice* is because our input strings are

highly structured – and in this case, structure is another word for

“redundancy”. We can compress text because text is far from random – it

has redundant structure that we can remove. (For example, in the format

that I’m using to write this post, each letter takes 8 bits. But I’m using

a character set of about 64 characters. So really, even ignoring all of

the redundancy of English, I’m using 8 bits per character when I could

be using only 6. Fully 1/4 of the length of this document is completely

redundant. And that doesn’t even cover the fact that I use words like “that” all

the time – 41 times so far in this document, and each one takes 32 bits.)
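The 6-bit repacking is simple enough to do by hand. Here's a Python sketch – the 64-character alphabet and the message are my own illustrative choices, not the actual character set of this post:

```python
# A 64-symbol alphabet, so each character fits in 6 bits instead of 8.
ALPHABET = ("abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "0123456789 .")
assert len(ALPHABET) == 64
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def pack6(text):
    """Concatenate 6-bit codes into a byte string."""
    bits = "".join(format(CODE[ch], "06b") for ch in text)
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

msg = "compression works because text is redundant"
packed = pack6(msg)
print(len(msg), len(packed))  # 43 characters pack into 33 bytes
```

That recovers almost exactly the 25% the paragraph above describes – and it exploits none of the redundancy of English itself, just the oversized character encoding.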

The problem with compression has *nothing* to do with linearity

versus non-linearity. It doesn’t matter how clever you are. If you can’t

explain just how you’re going to compress 2^{N} strings into

2^{N-1} strings without losing any information, then *you
lose*: it can’t work. You haven’t addressed the problem.