
 While reading Mandelbrot's text on fractals, I found something that surprised me: a relationship
between Shannon's information theory and fractals. Thinking about it a bit, it's not really that suprising;
in fact, it's more surprising that I've managed to read so much about information theory without
encountering the fractal nature of noise in a more than cursory way. But noise in a communication channel
is fractal - and relates to one of the earliest pathological fractal sets: Cantor's set, which Mandelbrot
elegantly terms "Cantor's dust". Since I find that a wonderfully description, almost poetic way of describing it, I'll adopt Mandelbrot's terminology.
 Cantor's dust is a pathological set. It's caused no small amount of consternation among mathematicians and physicists who find it too strange, too bizarre to accept as anything more than an extreme artifact
of logic in the realm of pure math. But it's a very simple thing. To make it, you start a line segment.
Cut it into three identical parts. Erase the middle one. Then repeat the cutting into thirds and erasing the middle on each of the two remaining segments - and then the segments remaining from that, and so on. The diagram below shows a few steps in the construction of the cantor dust.

Why should it be called a dust? Because geometrically, in the limit, it's got to be a set of completely disconnected points - a scattering of dust across the original line-segment.
 It's so simple - what is pathological about it? Well, it's clearly 0-dimensional in the pure sense - as I said above, it's just a collection of points. Topologically, it's a set of points with empty neighborhoods. And yet - look at it. It's clearly not zero-dimensional. It's got a 1-dimensional geometric structure. But naive topology insists that it doesn't. But there's worse to it. It's a similar
problem to what we saw in the shape-filling curves. Clearly, the dust disappears into nothingness. Every part of it has zero length - it's seems like it must converge to something very close to the empty set. And yet it doesn't: the set of points in the Cantor dust has the same cardinality as the cardinality of the set of points in the original line.
 For those of us who came of age as math geeks in the late 20th century, this doesn't really seem that strange at all. But you've got to remember: the 20th century was a time of great turmoil in math. At the beginning of the century, the great work was solving mathematics: turning all of math into a glorious, perfect, clean, rational, elegant edifice. The common belief of the time was that math was beautiful and perfect - that while the real world might be ugly, might have all sorts of imperfections and irrationalities, that those real-world flaws could never touch the realm of pure math: math was, in the words of one famous mathematician "the perfect mind of God". And then came the crash: the ramifications of Cantor's set theory, Gödel's incompleteness, Church and Turing's uncomputability, fractals, Chaitin's
strange numbers... The edifice collapsed; math was flawed, imperfect, incomplete as anything else in the world. It was hugely traumatic, and there was (and in some circles still is) a great deal of resistance
to the idea that so much irregularity or ever irrationality was a part of the world
of math - that the part of math that we can really grasp and use is just an infinitessimal part of the
monstrous world of what really exists in our abstractions.
But getting back to the point at hand: what does Cantor's dust have to do with information and noise?
 Imagine that you're listening to sound through a telephone wire with incredibly precise recording equipment. You're sending a perfectly clear sine-wave over the line, in order to see how much noise
there is.
 You start pretty high - you only want to record noises that exceed, say, 20% of the amplitude of the
basic sine-wave. You wind up with a pattern of bursts of noise. Those noises are scattered around, temporally. Now, mark off every time period of greater that 5 minutes where there is no noise.  Those are gaps in the noise - the largest gaps that you're going to look at. Now look in the bursts of noise - that is, the periods of time where there was no gap in the noise longer than 5 minutes. Look for periods of 1 minute where there wasn't any noise. In between the 5 minute gaps, you'll get a collection of smaller 1 minute gaps, separated by smaller bursts of noise. Then look into those 1 minute gaps, for 10 second periods with no noise - and you'll break the bursts of noise up further, into bursts longer than 10 seconds, but shorter than a minute. Keep doing that, and eventually, you'll run out of noise. But turn down your noise threshold so that you can hear noise of a smaller amplitude, and you can find more noise, and more gaps, breaking up the bursts.
If you look at the distribution of noise, one thing you'll notice is that the levels are independent: the length of the longest gaps has no relation to the frequency of smaller gaps between them. And the other thing you'll notice is that the frequency of gaps is self-similar: the distribution of long gaps relative to sections of the recording of long length are the same as the distribution of short gaps relative to shorter sections of the recording. The noise distribution is fractal! In fact, it's pretty much a slightly randomized version of Cantor's dust.
 Understanding the structure of noise isn't just interesting in the abstract: it provides a necessary
piece of knowledge, which is used regularly by communication engineers to determine the necessary
properties of a communication channel in order to ensure proper transmission and storage of information. Recognizing the fractal nature of noise makes it possible to better predict the properties of that channel, and determine how much information we can safely pump through it, and how much redundancy we need to add to the information to prevent data loss.
 
Maybe I've seen too many of these examples, but I find I'm no longer surprised by these "surprising" qualities. For example, I would not expect the dust to be like the empty set. Let's consider that any line segment during the construction contains at least one point (fairly obvious). At the nth stage of the construction there are 2^n line segments, so at least 2^n points. Therefore the final dust, occurring after an integer-infinity of iterations (aleph_0) has at least 2^(aleph_0) points, the cardinality of the line (can I take the continuum hypothesis as true?).
Interestingly, the classic Cantor set also has Lebesgue measure zero -- there's "nothing there". But there are similar Cantor dusts (with similar constructions) which have any positive measure between 0 and 1. That is, you can have "nothing there" in the topological sense, but have that "nothing" contain almost all of the interval you started with.
Stephen, that's a very different approach to showing that the dust isn't an empty set than I thought about. I was thinking of it recursively: you can generate a map from the every point on the nth level to every point on the (n+1)th level by compressing half of each line segment to leftmost- or rightmost- third. The base case is that the first, unbroken line segment maps to itself.
The direct connection to (at least) 2^(\aleph_0) non-empty subsets and the continuum hypothesis is actually more interesting to me, though, I think.
(PS. I think this is the first time I've commented here: cool stuff! I've been reading for several months. Thanks for taking the time to write all of it up so nicely.)
On the contrary, the Cantor set (I have never heard it called the Cantor dust before, though it is a good name) has no isolated points, which seems to be what you're claiming here. But every point has arbitrarily small neighbourhoods with an empty boundary, which I guess means that the small inductive dimension is zero.
Factoid: the Cantor dust is exactly those numbers between zero and one which, when expressed in ternary notation, do not contain the numeral 1.
The first erasure removes numbers of the form 0.1xxx... (base 3)
The second erasure removes numers of the forms 0.01xxx... (base 3) and 0.21xxx... (base 3)
etc.
The Cantor set is a strange thing indeed; it is nowhere dense, totally disconnected and yet closed (and compact).
But every compact metric space is the continuous image of a cantor set!
I have read in the July number of the magazine "Le Scienze" (Italian version of "Scientific American") that also the solar wind seems to have a fractal distribution.
@Evan Murphy: the argument I give in comment 1 struck me because the number of points you know for certain are _in_ the set increases at every iteration. That makes it feel very natural that the final set has a continuum number of points, we get there by pushing the lower bound up to the continuum. Your mapping approach, meanwhile, shows that every point in the final set maps back to a unique point in the initial line and vice versa- hence there's a continuum of points at every stage.
MarkCC: I suspect that you and many of your readers will enjoy the story: "The Infinite Assassin" by Greg Egan, [Interzone #48, June 1991].
Arturo Magidin summarizes as follows:
"There are multiple realities. As the narrator puts it, 'the number of parallel worlds is uncountably infinite - infinite like the real numbers, not merely like the integers.' The narrator has to kill a man across the realities to prevent him from tearing the universe apart. He is 'stable' meaning there is little to no variation between all copies of him across the realities. However, he is eventually put out of action by being blown into 'Cantor dust' (his selves are put, so to speak, in the Cantor set instead of the entire interval) making him ineffective, since his copies are nowhere dense in the realities, so cannot really affect them enough."
I should not have to remind you that Greg Egan, down under in Australia scribbling (and graphically programming) Mathematical Physics with coauthors such as John Baez, and science fiction that is bizarre and brilliant.
He is one of the great Mathematician / Science Fiction authors, along with Eric Temple Bell, Vernor Vinge, Rudy Rucker, and folks like me running frantically to follow in their footsteps.
And then there's the Cantor function: it is a continuous function on [0,1] -> [0,1] with the property of being a.e. differentiable with derivative 0. It essentially "increases only on the Cantor set," i.e. is constant on the complement of the Cantor set.
Minor point re comment #1: the continuum hypothesis has nothing to do with your argument. The real line and the Cantor dust both have cardinality 2^{aleph_0}, whatever the status of CH.
Stephen, your argument is unconvincing. If you literally had 2^n points it wouldn't work. You'd just end up with something which had the cardinality of the natural numbers.
Canuckistani, that "factoid" is actually the basis of the proof of the cardinality of the Cantor set (or dust) that I learned in real analysis. See, given that every number in the Cantor dust can be written in the form .x1x2x3.... where xn is an element of the set {0,2} in base 3, you can set the elements in the Cantor dust in 1-1 correspondence with the elements of the interval [0,1) as written in binary.
With the greatest respect, not all noise is fractal. You're talking about 1/f noise, which has fractal characteristics. However, not all noise in communication lines is 1/f noise. There exist many different kinds of noise with many different kinds of spectral and temporal characteristics. One good example of non-fractal noise in communications lines ithe interference caused by too many 2.4 Ghz 802.11b wireless acess nodes trying to access the internet at once.
Cantor dusts show up in the visual arts too. Fractal Fish (2001) by Robert Fathause and Seven Birds by Hollister David derive from the Cantor dust:
http://www.mi.sanu.ac.yu/vismath/fath/index.html
And there's a even a rock group called Cantor Dust:
http://www.themusicappraisal.net/reviews.swallowedbythenight.shtml
Now you are confusing me. I can understand from Mark's description why a channel with a real life threshold would exhibit fractal characteristics, or why an ideal 1/f noise is self-similar on all scales. But many realized 1/f noise processes, such as flicker noise, have frequency cut-offs so would not be fractal (but have fractal characteristics).
On the other hand, wouldn't shot noise, being a Poisson process, have fractal characteristics under your conditions (i.e. up to the frequency cutoff)?
Um, okay, I guess I'm really asking if we shouldn't distinguish between fractals in the math of idealized models and limited fractal characteristics ("fractal nature") of measurable physics. The communication channel noise is sort of in between since it would measure as fractal over the whole communication process, as I understand the description.
What would an ideal mathematician say? :-P
The ideal mathematician would say:
(a) "It is obvious that we should. . . [scribbles on chalkboard, erases, scribbles again] Yes, it is obvious."
(b) "That's left as an exercise for the interested reader."
(c) "I'll pass that question over to the ideal computer scientist, thereby reducing the problem to the previous joke."
@Anonymous: are you claiming that 2^(aleph_0) is equal to aleph_0? If so, I have some bad news for you.
No, Anonymous is claiming that as n grows bigger (through the integers), a nested family of sets of size 2^n approaches a set of size aleph_0, not a set of size 2^{aleph_0}.
I see. Is that the case?
Stephen Wells:
Stephen Wells:
No I'm not. However, if you think that you have demonstrated through your argument that you have shown that there are 2^aleph_0 subsets, than I have some bad news for you. Simply doubling the number of sets at each iteration is insufficient to ensure that as N->infinity that you have 2^aleph_0 subsets.
0:{{1}}
1:{{1}{2}}
2:{{1}{2}{3}{4}}
3:{{1}{2}{3}{4}{5}{6}{7}{8}}
...
The above process clearly yields the set of natural numbers, and at each step, there are 2^n points.
archgoon:
And yet, strangely enough, you *can* construct the real interval [0,1] in binary.
Step 1: {.0 .1}
Step 2: {.00 .01 .10 .11}
Step 3: {.000 .001 .010 .011 .100 .101 .110 .111}
...
The above process clearly yields the set of reals between 0 and 1, and at each step, there are 2^n numbers.
Thusly, sirrah, your argument is refuted!
(Your problem is that you'll run out of numbers at the aleph0th step. It's certainly not obvious that such a thing would occur, however.)
Xanthir, your process has an aleph_0 step, and the increased cardinality occurs entirely there. The doubling at the finite steps is irrelevant.
Archgoon is correct that Stephen's argument is flawed.
... with a truly remarkable proof, which this board is too small to contain.
... after a pullback to a Hopf bundle.
... assuming the Riemann hypothesis.
Or, you could just view the stage n Cantor Set as the points between 0 and 1 (that is, in [0,1) - open at 1) that have no "1" digit in the first n places after the point when their coordinate is expressed in trinary in the shortest possible sequence. (i.e. no infinitely repeating "2" digit) "The Cantor Set" is then the intersection of the stage n Cantor Sets for nââ.
Then, we have a very easy 1-to-1 correspondence to the entire real line: just take the string representations, swap the "2" digits for "1"s, and read the string as binary.
Xanthir,
You can not construct the interval of [0,1] that way, your construction misses all the irrational numbers between [0,1].
2^{aleph 0}
think of it as the set of all infinite sequences (indexed by the positive integers) of 0's and 1's
e.g.: things like:
1, 0, 1, 0, 0, 1....
1, 1, 1, 0, 1, 0, 1....
the "2" in the "base" comes from 2 objects, the "aleph 0" comes from having an infinite sequences of these things.
And the set of all such gadgets are indeed uncountable.
Proof (for those who haven't seen it) below.
Brain is right about Xanthir's construction leaving off all of the non-terminating diatic decimals (hence the irrationals, at least).
Of course, 2^{aleph 0} is uncountable, and the nested interval property shows that one gets the cardinality of the unit interval (and hence the real line).
Proof that 2^(aleph 0) is uncountable: suppose there is a one to one f map between the positive integers and these things.
Then construct a squence that isn't in this image as follows:
Let a(n) denote the n'th entry of the constructed sequence and f(n,j) to denote the n'th entry of the image of f(j) (remember, the domain is the set of natural numbers 1, 2,....and n is the n'th entry of the image which is a sequence.
Then let a(n) = 1-f(n,n) mod 2.
(that is, if the n'th entry of f(n) is 1, then let a(n)=0, else, let a(n) = 0.
Then a(n) cannot be in the range of f as it differs from f(n) at the n'th entry for each n.
I'm quite aware that there is an aleph-0th step. I indicated that obliquely through my last sentence. My construction of the reals is virtually identical to the construction of the surreal numbers. At any given step you have only dyadic rationals, but in the limit - at the aleph-0th step - you finally generate everything.
I'm also quite aware that the naturals (and integers and rationals and algebraics) are of a different cardinality than the reals. Thus why you need the aleph-0th step.
Unless I'm missing something very important, my construction (and thus, Stephen Wells') is equivalent to Conway's construction of the reals in the field of surreals. Can anyone point out to me either where I do not have a good match with Conway, or where Conway is wrong?
For reference, Conway's surreals can be represented by their sign-expansions, which are merely a string of + and - signs. At the ωth birthday, where each sign-expansion is infinitely long, the entirety of the real numbers is generated. Constructing strings of + and - signs seems, to me at least, to be exactly equivalent to constructing binary decimals in the manner I indicated. My method only constructs the numbers in [0,1] while Conway's finds *all* the reals, but that's a trivial difference.
Another way to show the Cantor set is uncountable is to use the idea of a perfect set. A set X is perfect if it is its own set of limit points (i.e., if X' is the set of limit points of X then X=X').
Theorem: Every non-empty perfect set is uncountable.
Theorem: The Cantor set is perfect.
This is getting really interesting. We know from the counting argument that at every iteration n of the formation of the set we have 2^n line segments (hence at least 2^n points). And we know from the line-mapping/ternary arguments that the final set has the cardinality of the line. So we have here a process that has at least 2^n points at each n and goes to aleph_1 = 2^aleph_0 points as n-> aleph_0. However archgoons' argument seems to show that we can also have a set with 2^n points at every iteration which goes to aleph_0 in the limit, and I can't see anything wrong with his construction. So apparently some 2^n sets go to aleph_0 and some to aleph_1, which is... confusing :) What am I missing?
Stephen, that "at least" is very important. You actually have far more than 2^n points.
Xanthir, your construction is fundamentally different from what Stephen was doing. His argument basically went:
At stage n, we have 2^n intervals. Therefore, we have at least 2^n points. Since this is true for all n < aleph_0, we have at least 2^(aleph_0) points.
That last step is wrong. The correct conclusion is that we have at least aleph_0 points (cardinal exponentiation isn't continuous in the exponent). The important point is that unlike you, he doesn't have an aleph_0 step. There is no stage at which we have aleph_0 many intervals.
Okay, part of my third paragraph was eaten:
Since this is true for all n < aleph_0, we have at least 2^(aleph_0) points.
I'm confused, though, by your assertion that his construction doesn't have an aleph-0 step. Stephen explicitly referenced the aleph-0 step in his first post.
Is it just that going from n=0...aleph-0 isn't continuous, as you implied? Is that an important difference? I know that leaving out the aleph-0th step leaves you with only countably many points, but why precisely can't we just tack that on at the end to fill in the rest?
Terribly sorry to be the pedant, but every "aleph_0-th" that anyone has said should be "w-th," where w (omega) is the first infinite ordinal.
@JBL: yes, now you mention it, we would all get less RSI by typing w. I'll do that from now on.
@Antedren: firstly, I know I have at many more than 2^n points, which is why I stated the "at least" explicitly. I'm trying to see if we can use the fact, that each line segment must contain at least one point, to get the number of points in the dust, where each line segment contains _only_ one point. And I'm confused by your assertion that there is no w'th step, I thought I'd made it clear that there should be- bearing in mind that the construction of the set by deletion of segments only gives you the set after w steps!
What he called the aleph_0 step is simply taking the intersection of all previous stages. This doesn't actually introduce anything new, it simply collates what happened earlier. But yeah, primarily his mistake is the continuity issue.
What program do you use for making those beautiful charts?
I'm really not sure at all what you mean by "simply taking the intersection of all previous stages". That's not indicated at all in what he said. Can you show how what he said is equivalent to that?
As well, if the "continue until step w" construction is wrong, what makes it correct when Conway uses it?
Random sidenote, Antendren - are you the same one that posts on OYT?
Sam:
I use OmniGraffle for the mac for all of my drawings.
It's not indicated in what he said, but it is how one constructs the Cantor set. You have collections of intervals at each finite step, then you take the intersection. There's no infinite step at which you have a collection of intervals, unless you consider the final set itself to be such a step and the points to be length 0 intervals. That would just be begging the question, however.
Continuing to or past \omega is fine, you just need to be clear on exactly what's going on.
And yes, that's me.
Hmm. I was thinking that you'd reach the appropriate step in the limit, similar to how at no finite step is the peano curve space-filling, but in the limit it is.
Continuing the sidenote, this is CJM. Good to see you. ^_^
But why should the limit be a "step"? It is an "infinite" construction, and may have other properties than the preceding steps. Antendren's observation on what happens with the intervals/points argues against the limit being among the defined steps.
Similarly, the peano limit has different properties than the construction in the steps. (An infimum or supremum limit is the closure more than a result of a continous process. For example, when the function is an oscillation.
As I remember it, when you work with distributions the limit may well end up outside the support of every test function, which seems to be a neat property to allow if you look for weak solutions. As long as the difference between the original set and the limit points is a set of zero measure, what would be the problem?
"As long as the difference between the original set and the limit points is a set" - As long as the difference between the original set and the limit set is a set
I'd just like to say that all binary numbers of the form:
0.a_1a_2...a_aleph0
where each a_i is in {0, 1}
in fact cover the closed interval [0, 1] and not just [0, 1).
This is because 0.1` = 1 ...where the ` indicates a recurring digit.