ext3 and ext4 dispute

If you don’t get the title, skip this post. Discussion below the fold.

There is a discussion going on regarding ext3 and ext4, the Linux file systems, in relation to the nature of the journaling process.

The ext3 file system was discovered to have a significant delay when writing from the cache to the disk. I believe this is specifically the writing of the journaling information. This means that the journaling data, which is supposed to be there to fix everything when data is lost during a crash of some kind, would not be accurate. Normally, journaling is timed like a transaction. So, to simplify, this would happen (trolls, feel free to correct or modify this description):

Journal: I’m about to do X.

In the data part of the disk: X is carried out

Journal: I just checked, and it appears that I did X.

Thus, using the journaling information, you can tell not only what the disk-using parts of the system did, but also what was intended, and what did or did not work. This means that when a system comes back up from a crash, it is a trivial matter to make things right.
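To make the three steps above concrete, here is a toy sketch in Python. It is only an illustration of the "intent, do it, confirm" idea as I described it, not how ext3 or ext4 actually lay out their journals. The names `begin`, `commit`, and `recover` are made up for this example, and the "disk" is just a dictionary.

```python
# Toy journal: a list of records. The "disk" is a plain dict.
# Hypothetical names throughout; real file systems are far more involved.

def begin(journal, key, value):
    """Journal: I'm about to do X. Returns a transaction id."""
    txid = len(journal)  # unique because the journal only grows
    journal.append(("intent", txid, key, value))
    return txid

def commit(journal, txid):
    """Journal: I just checked, and it appears that I did X."""
    journal.append(("commit", txid, None, None))

def recover(journal, data):
    """After a crash, redo every intent that has no matching commit.

    Redoing is harmless if the data write had already happened,
    because writing the same value twice changes nothing.
    """
    committed = {txid for kind, txid, _, _ in journal if kind == "commit"}
    redone = []
    for kind, txid, key, value in list(journal):
        if kind == "intent" and txid not in committed:
            data[key] = value
            journal.append(("commit", txid, None, None))
            redone.append(txid)
    return redone

journal, data = [], {}

# A normal, complete transaction.
t0 = begin(journal, "a.txt", "hello")
data["a.txt"] = "hello"
commit(journal, t0)

# A transaction interrupted by a "crash" right after the intent was logged.
t1 = begin(journal, "b.txt", "world")

# On the way back up, the journal tells us what was intended but never confirmed.
redone = recover(journal, data)
```

The point of the sketch is that recovery only works if the journal records and the data writes reach the disk in the order the journal claims.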

However, if the disk data writing/reading is not synced with the journaling, then the journaling system is not only useless but maybe worse than useless. And there were indications that this was happening with the ext3 system.

And by the way, it does not matter if the delay is with data or journaling. Neither is good.

It turns out that ext4 is worse. The cache writing delay with ext3 is seconds, but with ext4 it may be minutes.
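If the delay between "written to the cache" and "written to the disk" can be seconds or minutes, an application that really cares about a file can ask for the flush itself rather than wait. Below is a minimal sketch in Python of one common way to replace a file, assuming a POSIX-like system: write the new contents to a temporary file, force them to disk with `fsync`, and only then rename over the old file. The function name `replace_file_durably` is made up for this example; `os.fsync` and `os.replace` are standard library calls.

```python
import os
import tempfile

def replace_file_durably(path, contents):
    """Replace `path` with `contents` (bytes) without leaving an empty file.

    Sketch only: it does not fsync the containing directory, so the
    rename itself may still be lost in a crash, leaving the old file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(contents)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its cache to disk
        # Swap the new file into place only after its data is on disk.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The trade-off is the one you would expect: every call forces a disk write, so doing this often costs speed and keeps the disk busy.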

But of course, it is all a bit more complex even than that. An update of the discussion and some interesting commentary can be found here on the Linux Mag web site. Enjoy.

Comments

  1. #1 Colin
    March 31, 2009

    Last I heard there are also write barrier issues with using LVM (LVM flat out ignores them) which means any RAID with LVM is less safe than no RAID. Ironic.

  2. #2 Charlie
    March 31, 2009

    Greg, talk to Ted Ts’o, and you’ll get your misunderstandings about the issue straightened out.

    For me, the big thing is that Ted implemented in ext4 what mainstream computer systems vendors and academics have been saying (for many years) is the “right thing”, and Linus Torvalds in response declared the mainstream viewpoint “moronic”. Yay Linus! He’s right.

    Ext3 has some emergent behaviour that many people have decided is desirable and necessary. Some of these people foolishly believe that this behaviour is a normal part of filesystem design. They are wrong. Ted explains thoroughly in his blog. Yay Ted! He’s right too.

    http://tytso.livejournal.com/#tytso61989

  3. #3 Nathan Myers
    March 31, 2009

    There was an extraordinary amount of dishonesty and “quick! look over there”-ism in this case. File system people have traditionally made a point of not caring one whit what happens to users’ recent file changes in the case of a crash, as long as the disk’s internal data structures remained consistent, and its old files accessible.

    In this case, as usual, they pointed to a POSIX standard to suggest that running any program that doesn’t perform an “fsync” frequently makes it the user’s fault that their data disappeared. Doing frequent fsyncs, however, means a laptop can hardly ever turn off its disk drive, running down the battery for no good reason, and it slows file operations to a crawl. Anyway, shell scripts don’t get the opportunity to do fsyncs.

    They usually justify this by saying that doing what is necessary to ensure that user data isn’t discarded unnecessarily slows things down when there is no system crash. They’re correct, but (1) the speed difference is usually negligible (unlike frequent fsyncs), and (2) we ask to have things on disk because we want to keep them. Discarding everything immediately would be faster yet, but then file system implementers wouldn’t have any work to do.

    In this case, the file system was writing out changes indicating an old file had been replaced before the new contents were written out, so people were finding empty files in place of their old (but better than nothing) files. The problem is, apparently, fixed now. Ted, last I heard, insists he was right to do the wrong thing, but has since been persuaded to do the right thing anyway.

  4. #4 dreikin
    March 31, 2009

    Difficult to have an ACID-compliant db when you don’t have an ACID-compliant fs.

  5. #5 bNewEnglandBob
    March 31, 2009

    I read Ted Ts’o’s response pointed to by Charlie, above. This bit of reasoning has me laughing:

    This sounds like a good thing, right? It is, except for badly written applications that don’t use fsync() or fdatasync(). Application writers had gotten lazy, because ext3 by default has a commit interval of 5 seconds, and uses a …

    Those damn application writers! The nerve of them to assume that the file system might actually store their data in reasonable time! What will they expect next? A database that stores data? An operating system that runs the computer?
    /sarcasm

  6. #6 Greg Laden
    March 31, 2009

    This opens up a whole nuther can of worms.

  7. #7 Mohammed Berdai
    March 31, 2009

    Yeah, this is an annoying problem. This is why some still prefer Ext2.

    Hopefully, ext4 will mimic ext3 behavior in the next kernel release; or maybe Btrfs will be a good alternative in the future.

    But still, it’s annoying :S

  8. #8 Dan J
    March 31, 2009

    Well, I’ll be sticking with ext3 on my systems here at home. I don’t yet have a need for the file or volume size that would require ext4, and the machines I have are single-user desktops. I haven’t had any irreparable catastrophes yet, and when one happens, I may change my mind on the issue (but may opt for XFS or Reiser).

  9. #9 Greg Laden
    March 31, 2009

    I’m not sure what to do and I’m about to do it. (Probably this weekend.) My main HD is indicating failure imminent, so I’ve got to scrape all the data off this one and spread it on a new one.

  10. #10 Stacy
    March 31, 2009

    I am a rebel. Haha! I read it anyway!!

  11. #11 Lassi Hippeläinen
    April 1, 2009

    Ext2 rulez. All this journaling stuff is for luzers. My mighty Acer Aspire One uses a flash disk that has no need for it.

  12. #12 Nathan Myers
    April 1, 2009

    Dan: XFS and Reiser have always had the problem people complained about when it showed up in ext4. Ext3 performs well as long as you don’t run programs that call fsync as often as Ted says programs are supposed to do.

  13. #13 Dan J
    April 1, 2009

    Thanks for the tip, Nathan. I think I’ll be in decent shape with ext3 for the time being.

  14. #14 Ray Ingles
    April 1, 2009

    bNewEnglandBob – I actually do write applications, and I try to make them as portable as possible… and I’ve got to side with Ted. All kinds of application developers don’t actually use APIs correctly. They get something that kinda works, most of the time, and assume that it’s all good.

    Microsoft is famous for maintaining backward compatibility, even when applications do patently wrong things. (See here, where Windows actually checked to see if SimCity was running, and, if so, allowed it to access memory after freeing it!) I do not want Linux to go that route.

    App developers have been relying on behavior that’s not guaranteed. As Linux Weekly News puts it, “POSIX forms a sort of contract between user space and the kernel. When the kernel fails to provide POSIX-specified behavior, application developers are the first to complain. So perhaps they should not object when the kernel insists that they, too, live up to their end of the bargain.”

