We mathematician types like solving polynomial equations. The simplest such equations are the linear ones, meaning that the variable appears to the exponent one. They have the general form:

ax+b=0

If you remember anything at all from your basic algebra classes, then you know that this is readily solved by bringing the b to the other side and dividing by a. We obtain

x=-\frac{b}{a}

Of course, we are assuming here that a is not zero, but let’s not be overly pedantic. We can think of this as “the linear formula,” since it can be used to solve any linear equation we might confront. That seems a bit grandiose for something so simple, but what the heck.
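The linear formula translates directly into code. Below is a minimal Python sketch. The name solve_linear is hypothetical, and Fraction from the standard library is used only so that answers such as 1/3 come out exactly rather than as rounded decimals. Unlike the text, the sketch does get pedantic about a = 0.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a*x + b = 0 using the "linear formula" x = -b/a (hypothetical helper)."""
    if a == 0:
        # With a == 0 the equation reads b = 0: no solution if b != 0,
        # every x is a solution if b == 0. The formula does not apply.
        raise ValueError("a must be nonzero")
    return Fraction(-b) / Fraction(a)  # exact for integer or rational inputs


print(solve_linear(2, 6))   # -3
print(solve_linear(3, -1))  # 1/3
```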

As Stephen King once wrote, once you’ve done Frankenstein, what’s left to do but Bride of Frankenstein? And once you’ve done linear equations, you might as well move on to the quadratic case. That is, to equations where the variable appears to the exponent two. They have the general form:

ax^2+bx+c=0

Now, in your elementary algebra classes you learn that factoring is the preferred way of solving such things. But at some point we all must grow up and realize that factoring, so sorry, hardly ever works. So we move on to more sophisticated methods like “completing the square.” It goes like this. First we divide through by a, which ensures that the coefficient of the squared term will be one. That’s very convenient. Then we move the constant term to the other side to obtain:

x^2+\frac{b}{a}x=-\frac{c}{a}

Now for the seemingly arbitrary step that has confused generations of middle school students. By adding the same thing to both sides we obtain this:

x^2+\frac{b}{a}x+\frac{b^2}{4a^2}=-\frac{c}{a}+\frac{b^2}{4a^2}

This was a clever thing to do, since that expression on the left-hand side is now a perfect square. Specifically, we have:

\left( x+\frac{b}{2a} \right)^2 = -\frac{c}{a}+\frac{b^2}{4a^2}

Taking the square root of both sides, we get:

x+\frac{b}{2a}=\pm \sqrt{ -\frac{c}{a}+\frac{b^2}{4a^2} }

With a bit of elementary symbol manipulation this becomes:

x=\frac{-b \pm \sqrt{b^2-4ac}}{2a},

which is the famous quadratic formula. It’s a very useful little gadget, since it shows you how to express the solutions of a quadratic equation as a simple function of its coefficients.
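As a concrete illustration, here is a minimal Python sketch of the formula (the name solve_quadratic is hypothetical). It uses the standard library's cmath.sqrt, so a negative discriminant b^2-4ac produces a pair of complex roots instead of an error.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x^2 + b*x + c = 0 via the quadratic formula.

    cmath.sqrt returns an imaginary square root when b^2 - 4ac < 0,
    so the roots may come back as complex numbers.
    """
    if a == 0:
        raise ValueError("a must be nonzero (otherwise the equation is linear)")
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


print(solve_quadratic(1, -3, 2))  # ((2+0j), (1+0j))
print(solve_quadratic(1, 0, 1))   # (1j, -1j)
```

This is a textbook transcription of the formula. Numerical libraries usually rearrange it to avoid loss of precision when b^2 is much larger than 4ac, but that refinement is beside the point here.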

That was considerably more complicated than the linear case, but it really is nothing extraordinary. If you have some facility for mathematics the idea of completing the square is pretty natural. It’s the kind of thing a mathematically talented high school student might come up with. But when we kick things up to the cubic case, which looks like this:

x^3+ax^2+bx+c=0

things get considerably more complicated. There is no obvious way forward. There are, however, nonobvious ways forward, and, as we shall see, there is a cubic formula to go along with our linear and quadratic formulas. Since the full derivation has quite a few steps, we shall only do the first one this week. Notice, incidentally, that I have simply assumed that the coefficient of the cubic term is one. There is no loss of generality in making this assumption. If the leading coefficient is not one, then you can simply divide through by that coefficient to obtain a polynomial whose leading coefficient is, indeed, one.

The first step is to make the following change of variables:

x=y-\frac{a}{3}

That transforms our equation into this:

\left(y-\frac{a}{3} \right)^3+a \left(y-\frac{a}{3} \right)^2+ b \left( y-\frac{a}{3} \right) + c =0

It’s a bit tedious, but if you would care to multiply that out and group together the like terms, you will notice that a miracle happens. The squared term disappears! In other words, you get an equation that looks like this:

y^3+py+q=0

where p and q are elaborate functions of the coefficients a, b and c. This sort of cubic equation, lacking a square term, is said to be a “reduced cubic.” If we could devise some general procedure for solving it, then we would be able to work our way backwards to solutions to the general cubic.
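For readers who want to watch the miracle happen, here is the routine expansion, one piece at a time:

\left(y-\frac{a}{3}\right)^3 = y^3-ay^2+\frac{a^2}{3}y-\frac{a^3}{27}

a\left(y-\frac{a}{3}\right)^2 = ay^2-\frac{2a^2}{3}y+\frac{a^3}{9}

b\left(y-\frac{a}{3}\right) = by-\frac{ab}{3}

Adding these three pieces and c, the ay^2 terms cancel, and collecting what remains gives

p=b-\frac{a^2}{3}, \qquad q=\frac{2a^3}{27}-\frac{ab}{3}+c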

But does eliminating the square term really help us all that much? Indeed it does, as we shall explain in a future post.
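The reduction itself can also be checked numerically, using p = b - a^2/3 and q = 2a^3/27 - ab/3 + c. Below is a minimal Python sketch. The function names reduce_cubic and check_reduction are made up for illustration, and the cubic is assumed to have already been divided through so that its leading coefficient is one. It uses exact rational arithmetic from the fractions module, so no rounding is involved. Since two polynomials of degree at most three that agree at more than three points are identical, agreement at the eleven sample values of y confirms the reduction for the given coefficients.

```python
from fractions import Fraction


def reduce_cubic(a, b, c):
    """Return (p, q) so that x = y - a/3 turns x^3 + a x^2 + b x + c into y^3 + p y + q."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    p = b - a**2 / 3
    q = 2 * a**3 / 27 - a * b / 3 + c
    return p, q


def check_reduction(a, b, c, samples=range(-5, 6)):
    """Evaluate both sides exactly at each sample y; True if they always agree."""
    p, q = reduce_cubic(a, b, c)
    shift = Fraction(a) / 3
    for y in samples:
        x = y - shift
        original = x**3 + a * x**2 + b * x + c
        reduced = y**3 + p * y + q
        if original != reduced:
            return False
    return True


# x^3 + 6x^2 + 11x + 6 factors as (x + 1)(x + 2)(x + 3).
p, q = reduce_cubic(6, 11, 6)
print(p, q)                       # -1 0
print(check_reduction(6, 11, 6))  # True
```

In this example the reduced cubic is y^3 - y = 0, whose roots y = 0, 1, -1 correspond, via x = y - 2, to x = -2, -1, -3.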


  1. #1 Stephen Lucas
    June 20, 2011

    Interestingly enough, the classic quadratic formula that Jason has rederived for us is not as old as one might expect. It appears that the first publication of the formula was as late as 1896: Henry Heaton, A Method of Solving Quadratic Equations, Amer. Math. Monthly 3(10) (1896) 236–237. Descartes’s The Geometry of 1637 had the equivalent of solving x^2-ax+b^2=0. Mathematicians since ancient times were able to solve various cases of the quadratic, but were hampered by a disbelief in negative numbers, let alone complex ones. Even relatively recently (textbooks in the 1800s) equations like x^2+ax=b and x^2=ax+b were considered different, with different solution techniques, to ensure that a and b were always positive.

  2. #2 Dave Luckett
    June 20, 2011

    Jason, I mean no offence, but this is exactly what used to happen to me at school. The maths teacher would say something in English, and then he’d say something like “the variable appears to the exponent one”.

    That is, he would stop talking English, switch to gibberish, and you’d be expected to nod and understand. Well, I didn’t understand. I still don’t understand.

    The variable appears to the exponent one freaking what f’chrissakes? Give me a hint, here. One guinea pig? One sarsaparilla? One Mohave Desert? One what?

  3. #3 Jason Rosenhouse
    June 20, 2011

    Dave –

    Sorry! Think of it this way. Suppose I say something like, “He is more than six feet tall.” If I asked you whether that sentence is true or false you would have to reply that you need to know who “He” is. Some people are more than six feet tall and some are not. We can think of the word “He” as being a variable in that sentence. As soon as you know what it represents you can decide if the sentence is true or false.

    An equation is very much like that. It contains a variable, represented by x in this post (or y at the very end). If I write an equation like ax+b=0, then you can say that sentence is true for some choices for x and not true for others. “Solving the equation” then means that you find all of the values of x that make the sentence true. (And proving that you really have found them all, of course.)

    Now, the little superscript above and to the right of the variable is called the exponent. It indicates that the number represented by the variable is to be multiplied by itself some number of times. If the exponent is 2 then we square the number, that is, multiply it by itself. If the exponent is 3 then we would cube the number, that is, we would multiply it by itself three times (as in x times x times x). If there is no superscript then we say simply the exponent is 1.

    Finally, when we talk about the exponent to which the variable appears, we are usually referring to the largest exponent that appears. In the equation x^2-2x+1=0, the variable x appears to the exponent 2 in the first term and to the exponent 1 in the second term. Since the highest exponent is 2 we would classify this as a quadratic equation. Roughly speaking, the larger the highest exponent, the harder the equation is to solve.

    Does that help?

  4. #4 Dave Luckett
    June 21, 2011

    Sorry. In my agitation, I have not made it clear what I do not understand. I understand what the words “variable” and “exponent” mean. I still do not understand what the words “the variable appears to the exponent one” mean, when assembled in that order. Do they mean “the exponent on this variable is one”? That, I can understand. However I have the horrible feeling that it is not what those words mean.

    I’m sorry. Reading that sentence (was it a sentence?) damn near sent me into fugue about some of the worst times of my life. I spent my days in maths class desperately trying to figure out what the guy meant when he reassembled words with whose meanings I was perfectly familiar into sentences that made no sense to me at all, because the grammar – the rules by which those sentences were constructed – was completely alien.

    I realised later that part of my problem was that mathematicians treated totally insane ideas as if they were real – adjectives, for example, became nouns, and perfectly sensible concepts like the fixed ratios between the sides of right triangles were applied to figures that weren’t right triangles, weren’t triangles at all, and were in fact impossible. For that matter, purely imaginary things like negative numbers had even more purely imaginary square roots, and you were expected to manipulate these with a straight face, knowing for sure that this couldn’t be right.

    I gave up and read fiction. At least both you and the author knew that it was all make-believe. Mathematics teachers would look you in the eye and tell you that this was all completely factual, and you’d have to think they actually believed it. The only thing to do was nod, smile, and back away slowly, while watching out in case they made sudden moves with their blackboard compasses.

  5. #5 Pseudonym
    June 21, 2011

    I don’t want to preempt Jason’s derivation of the cubic formula, but the method that I use (on the rare occasions when I need it) is much easier to remember than Cardano’s method: take the Fourier transform of the roots.

    Let w be a primitive cube root of unity, that is, a number such that:

    w^3 = 1
    w^2 + w + 1 = 0

    Let x1, x2 and x3 be the roots of a cubic equation. Then let r1, r2 and r3 be numbers such that:

    x1 = r1 + r2 + r3
    x2 = r1 + w r2 + w^2 r3
    x3 = r1 + w^2 r2 + w r3

    Then:

    (x – x1)(x – x2)(x – x3)
    = (x – r1 – r2 – r3)(x – r1 – w r2 – w^2 r3)(x – r1 – w^2 r2 – w r3)

    Expanding this is left as an exercise.

    Exercise: Why won’t this method work for quartics?

  6. #6 Wonderist
    June 21, 2011

    Jason, Dave:
    Having done a bit of tutoring recently, I think Dave’s question is a bit more about ‘not skipping steps’ than it is about understanding what variables are, and basics like that. (It may be ‘all basic’ to you, Jason, but there are different degrees of basic ;-) )
    The confusion appears to be when you say that “ax + b = 0”, and then say that ‘the variable appears to the exponent one’, but there’s no actual ‘1’ for Dave to connect this to.
    Even your explanation that “If there is no superscript then we say simply the exponent is 1,” would leave many people confused as to, “Why?!” After all, if there’s no superscript, why don’t you just say there’s no exponent either?
    If I were covering this in a tutoring session, I’d explain it like this:
    When we say that ‘the variable appears to the exponent one’ in “ax + b = 0”, we’re actually just leaving out a tiny step, which is to write:
    ax^1 + b = 0
    Here, we’ve explicitly written the exponent on ‘x’ as ‘1’. We can get away with leaving off the exponent in “ax + b = 0” because we can show that any number raised to the exponent of 1 is just the number itself:
    x^1 = x
    For example 5 to the 1 is just 5, 11 to the 1 is just 11, and forty billion to the 1 is just forty billion. So, instead of always writing x to the 1, we can just write x. So, while there’s no actual visible ‘1’ next to the ‘x’ in “ax + b = 0”, we can imagine an invisible ‘1’ as a superscript on ‘x’, and say that the variable x ‘appears’ to the exponent 1, even though, technically, it doesn’t actually appear that way. It’s just a short-cut to leave out that step of writing the exponent as ‘1’.

    As for why x^1 = x in the first place, there’s an easy way to remember this and make sure you get the right answer every time.
    First let’s look at the pattern of multiplication. Take 7 times 6. We can think of this as ‘adding 7, six times’:
    (there are six 7s)
    7 + 7 + 7 + 7 + 7 + 7 = 42
    7 x 6 = 42
    So, 7 times 1 is:
    (only one 7)
    7 = 7
    7 x 1 = 7
    Makes sense, right? But a problem arises when we ask, what is 7 times 0? We ‘know’ that the answer is supposed to be zero, but how does it fit into the pattern of ‘adding 7, zero times’?
    ??? = 0
    We can’t just write
    (emptiness) = 0
    How do you ‘write’ zero 7s?
    The solution to this conundrum is to realize that we’re not really ‘adding 7, six times’ when we write 7 x 6 as
    7 + 7 + 7 + 7 + 7 + 7 = 42
    Don’t count the 7s, count the +s:
    7 + 7 + 7 + 7 + 7 + 7 = 42
    There are only five of them!
    And when we write 7 x 1 as
    (Note, no + symbols)
    7 = 7
    We’re not even adding anything at all, we’re just writing 7 = 7. No addition is involved in that equation.
    So, how can multiplication be properly described as ‘repeated addition’?
    The answer is to realize that ‘to add’ something means to add it to something else. And, for addition, the starting point that we add stuff to is the number ‘0’.
    So, 7 times 6 really means ‘from zero, add 7, six times’.
    (six +s)
    0 + 7 + 7 + 7 + 7 + 7 + 7 = 42
    7 x 6 = 42
    And, 7 times 1 means ‘from zero, add 7, one time’.
    (one +)
    0 + 7 = 7
    7 x 1 = 7
    And, 7 times 0 means ‘from zero, add 7, zero times’, or, in other words, ‘from 0, don’t add 7 at all!’
    (zero +s)
    0 = 0
    Okay, now the pattern actually works, even for the most basic (but tricky) cases. Now let’s look at exponents.
    Most people recognize that 2^2 = 4 and 3^2 = 9 and 5^2 = 25
    Why is that? It’s because ‘squaring’, or ‘raising to the exponent 2’ means to ‘multiply something by itself’, just like a square which has equal sides, and you multiply the side times the side to get the square’s area.
    So 5^2 is just 5 x 5, which is just 25.
    Increasing the exponent by 1 just means to multiply by the number one more time, so 5^3 = 5 x 5 x 5, which happens to equal 125.
    The analogy to multiplication becomes more obvious now. At first glance, raising numbers to exponents means ‘multiply the number by itself, n times’ where n is the exponent. So 5^6 would mean ‘multiply 5 by itself, 6 times’. We would write it like this:
    5 x 5 x 5 x 5 x 5 x 5 = 15625
    But again, what is 5^1, and 5^0?
    5^1 seems to be just 5 = 5
    Okay, seems fine. But still, what’s 5^0?
    Whereas with multiplication, it seemed obvious that 7 x 0 equals 0, with exponents, it’s not even obvious what it means to ‘multiply a number by itself zero times!’ WTF?! Is that just the number itself? Is it 0? or what?
    The solution is very similar to the one for the multiplication example from earlier.
    With repeated multiplication, you also need a starting point. When we did repeated addition, the starting point was 0, which made sense. It’s a little less intuitive with repeated multiplication, but just accept for now that the starting point of multiplication is actually the number ‘1’.
    So, in analogy with multiplication as repeated addition starting from 0, exponents are repeated multiplication, starting from 1.
    So, 5^6 means ‘From 1, multiply by 5, six times’.
    1 x 5 x 5 x 5 x 5 x 5 x 5 = 15625
    Now 5^2 means ‘From 1, multiply by 5, two times’.
    1 x 5 x 5 = 25
    And 5^1 means ‘From 1, multiply by 5, one time’
    1 x 5 = 5
    And 5^0 means ‘From 1, multiply by 5, zero times’
    1 = 1
    Or, in other words, ‘From 1, don’t even multiply by 5 at all!’
    Now the pattern works for all natural exponents*, and even better, it makes sense of both multiplication and exponentiation as simply repeated applications of simpler math operations. Very nice. Once you get this pattern, it will be easy to see why “ax + b” has the variable ‘x’ to the exponent ‘1’, because:
    x^1 = 1x = x
    And so:
    ax^1 + b = ax + b

    (* The special case of 0^0 is technically undefined, but there are often good reasons for pretending that it actually also equals 1 just like other numbers.)
    (PS: Sorry if my usage of the symbol ‘x’ as both the variable ‘x’ and the multiplication sign ‘7 x 6’ is confusing. I use it because the alternative symbol ‘*’ is a bit ugly and foreign to those used to an x-like symbol.)

  7. #7 James Sweet
    June 21, 2011

    FWIW, I have never heard the expression “the variable appears to the exponent [whatever]”; maybe that is a somewhat older way of saying things, or a more number theorist way of saying things? I got fairly far into calculus in college, and I just had never heard it expressed that way. I’m more used to something like “polynomial of order [whatever]”.

  8. #8 Wonderist
    June 21, 2011

    To be honest, I think that’s just how the words passed through Jason’s fingertips; I don’t think it’s a particular math-speak idiom. My explanation of it was explaining those words as if I had said them to a student myself, and then the student had responded with Dave’s question.
    I intentionally used different ways of saying ‘raised to the exponent’ in different English words to try to convey that they all really mean the same thing. The math symbols are identical, of course. It’s just that English isn’t as strict as math.

  9. #9 Michael Kremer
    June 21, 2011

    “appears to the exponent” plugged into google producing a grand total of 15 results, or 26 with omitted results included. And all but two of those results appear to be quoting this blog post or referring back to it in one way or another (typically in the google search page you see “The simplest such equations are the linear ones, meaning that the variable appears to the exponent one. They have the general form”.) So I conclude this is a very uncommon phrase, at least in print.

    And I think it is just this phrase that was tripping up Dave, though I read right past it without noticing. I think the mathematically semi-literate (e.g. me, BA in Math and 7 graduate courses in math 25 years ago) would read past it. But it is a pretty strange bit of English, I think. Rather like: “these words (that I am writing) appear in English.” Or: “the dog appeared hungry.”

    Suppose that Jason had said instead: “meaning that the variable appears with the exponent one” or “meaning that the variable is raised to the exponent one” — from what Dave says above I think this would not have caused the same incomprehension.

  10. #10 Michael Kremer
    June 21, 2011

    Ack — I meant to say “”appears to the exponent” plugged into google produces a grand total of 15 results, or 26 with omitted results included.” Which shows how words slip through fingertips.

  11. #11 Michael Kremer
    June 21, 2011

    Hmmm — (sorry, Jason, I know you hate that) — on reflection my examples don’t seem so strange to me. I think the switch is this perhaps: “the dog appears hungry” is best read as “the dog appears to be (seems to be) hungry.” “These words appear in English” is best read as “these words show up on the screen in English.” (The sense of “appears” that is opposed to “disappears”.) Jason’s phrase “appears to the exponent one” looks like it means “shows up (on the screen) raised to the exponent one” or maybe “occurs raised to the exponent one.” But there may be some dissonance introduced by the other meaning of “appears” as meaning “seems.”

  12. #12 Jason Rosenhouse
    June 21, 2011

    Well, this just goes to show that you never know what people will respond to in one of your posts! Perhaps it would have been more normal to say, “the variable appears to the power one.” Typically, the largest exponent that appears is referred to as “the degree” of the polynomial. So linear polynomials would be degree one, quadratic polynomials would be degree two, and so on. I didn’t use the word “degree” because (I kid you not!) I was trying to avoid excessive jargon.

  13. #13 Lenoxus
    June 21, 2011

    Exactly why can’t we say “The variable’s exponent is 1.”? Or “The variable is raised to the first power.”?

    Or does that just happen to be the language I grew up with, and it’s no more inherently clear than Jason’s?

  14. #14 Lenoxus
    June 21, 2011

    Not to beat this to death, but I’m surprised no one said the precise reason I think the phrasing is confusing: in non-math English, the sentence “It appears to the exponent one” seems to be telling us about something “the exponent one” is looking at or interpreting.

    Consider this complete sentence: “The moon appears to Sally.” All it seems to be telling us is that Sally didn’t see the moon and now she does. Perhaps some cloud cover moved away, or a wizard made the image of the moon “appear” in front of Sally’s eyes. It doesn’t seem to be telling us anything about the moon itself.

    Substitute the appropriate math terms, and the sentence doesn’t make immediate sense.

  15. #15 0db
    June 21, 2011

    Exactly why can’t we say “The variable’s exponent is 1.”? Or “The variable is raised to the first power.”?

    Because, as Jason has written:
    “Finally, when we talk about the exponent[/power] to which the variable appears, we are usually referring to the largest exponent that appears [within the whole equation].”

  16. #16 Lenoxus
    June 21, 2011

    Ah, thanks much, 0db. I get it now. Forget my last posts (except for that mischievous wizard, he’s important).

  17. #17 xander
    June 21, 2011

    Dave Luckett, #4:

    I realised later that part of my problem was that mathematicians treated totally insane ideas as if they were real…

    I’m not sure that I would use exactly that phrasing. It depends upon what we mean by “real.” In a sense, nothing in mathematics can really be considered to be “real”—everything is an abstraction, and removed from the “real” world. Through history, mathematicians have tried to work with abstract objects that seem to reflect the “real” world pretty well, but these are still abstractions and approximations.

    To give my favorite example, consider parallel lines. In the Elements, Euclid declares that if you have a line and a point that is not on the line, then you can construct exactly one parallel line through that point. That is, given a line and a point, there is a unique parallel.

    This postulate or axiom (fancy mathematical jargon for “assumption”) is the longest and most awkwardly phrased of all of those given by Euclid, and for a couple of thousand years, lots of geometers tried to show that it did not need to be declared as an axiom, but could be derived from the others.

    Then, in the 19th century, several mathematicians showed that you could make slightly different assumptions and come up with perfectly reasonable systems of geometry. For instance, what if parallels are not unique? That is, given a line and a point not on the line, suppose that there are an infinite number of parallel lines. Such systems of geometry exist, and are called hyperbolic geometries.

    On the other hand, we could also make the assumption that there are no parallel lines. As an example, suppose that we are on the surface of a sphere (like, roughly speaking, the Earth). Lines are great circles. Given a line (great circle) and a point not on the line, there is no parallel line—any great circle drawn through the given point will intersect the original line (and not just once, but twice!).

    There are at least three entirely self-consistent versions of geometry, all of which can provide useful results with “real” world applications. Which set of assumptions should we treat as real, and which as insane?

    The trick to mathematics is to get over the idea that any of it is “real,” and accept the fact that it is all made up. It is all abstractions and fantasy. The number 1 is no more real or imaginary than -1 or i.

  18. #18 killinchy
    June 22, 2011

    When I taught Chemistry, I found a way to solve higher order equations that I shall now describe:

    I would phone Nick in the Math Dept, and ask him.

  19. #19 Lenoxus
    June 22, 2011


    Given a line (great circle) and a point not on the line, there is no parallel line—any great circle drawn through the given point will intersect the original line (and not just once, but twice!).

    Really? Aren’t, say, lines of latitude on a globe examples of parallel great circles?

  20. #20 Jason Rosenhouse
    June 22, 2011

    Lenoxus –

    A great circle on a sphere is a circle whose center is also the center of the sphere. That means that the only latitude line that is also a great circle is the equator. Any latitude line north or south of the equator would not be a great circle.

  21. #21 Lenoxus
    June 23, 2011

    Ah, figured it would be something like that, thanks.

  22. #22 Oneiric
    June 25, 2011

    I know I’m coming into this discussion much later, but for those confused, I’d like to try something to help you translate “the variable appears to the exponent 2” more intuitively:

    Supposing you have an equation:
    3x^2 + 5x + 10

    Now, if you look at it as an infinite series of x raised to different powers, you could also write it as:
    …0x^999 + 0x^998 + … + 0x^3 + 3x^2 + 5x^1 + 10

    In that case, when you look at the first representation of the equation, you can see that in this imaginary series, the variable x only appears until the power 2 (because all the rest are 0s, and we ignore them).

    Does that help? I have no clue whether that’s the reason the phrasing is used, I just find it easier to impute its meaning that way.

  23. #23 Tom
    June 28, 2011

    Here it is next Tuesday, and no part 2 to be seen! Maybe next week?
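Comment #5 describes the roots of a cubic as a discrete Fourier transform of three auxiliary numbers r1, r2, r3. Below is a minimal numerical sketch of that transform pair, under stated assumptions. The value of w is taken to be e^(2πi/3), the function names are hypothetical, and the first root is x1 = r1 + r2 + r3. The sketch checks that the inverse transform (dividing by 3, with w and w^2 swapped) recovers the roots. It also checks that r1 and the product r2·r3 can be written using the coefficients alone, which hints at why the method can work. It does not carry out the full solution.

```python
import cmath

# Assumption: w is the primitive cube root of unity e^(2*pi*i/3),
# so w^3 = 1 and w^2 + w + 1 = 0 (up to floating-point rounding).
W = cmath.exp(2j * cmath.pi / 3)


def r_to_roots(r1, r2, r3):
    """Forward transform from comment #5: the r's determine the three roots."""
    return (r1 + r2 + r3,
            r1 + W * r2 + W**2 * r3,
            r1 + W**2 * r2 + W * r3)


def roots_to_r(x1, x2, x3):
    """Inverse transform: recover r1, r2, r3 from the three roots."""
    return ((x1 + x2 + x3) / 3,
            (x1 + W**2 * x2 + W * x3) / 3,
            (x1 + W * x2 + W**2 * x3) / 3)


def coefficients(x1, x2, x3):
    """(a, b, c) with (x - x1)(x - x2)(x - x3) = x^3 + a x^2 + b x + c."""
    return (-(x1 + x2 + x3), x1 * x2 + x1 * x3 + x2 * x3, -(x1 * x2 * x3))


roots = (1, 2, 3)
r1, r2, r3 = roots_to_r(*roots)
print([round(z.real, 9) for z in r_to_roots(r1, r2, r3)])  # [1.0, 2.0, 3.0]

a, b, c = coefficients(*roots)
print(a, b, c)  # -6 11 -6
# r1 = -a/3 and r2*r3 = (a^2 - 3b)/9 depend only on the coefficients.
print(abs(r1 - (-a / 3)) < 1e-12, abs(r2 * r3 - (a * a - 3 * b) / 9) < 1e-12)  # True True
```

Because this uses floating-point complex arithmetic, results are only approximately exact, hence the rounding and the tolerances in the comparisons.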