Databases

  • mount -o async - is it safe?

    7 answers - 1718 bytes

    Hi,
    We've recently set up our database (7.4.9) with our new hosting provider.
    We have two database servers running RHEL 4 in a cluster; one active and
    one hot-spare. They share a [fibre-channel connected] SAN partition; the
    active server has it mounted.
    Now my question is this; the provider has, by default, mounted it with -o
    sync; so all reads/writes are synchronous. This doesn't result in the
    greatest of performance, and indeed remounting -o async is significantly
    faster.
    They tell me this is so MySQL databases don't get corrupted in the event of
    a crash, which is fine.
    But as Postgres uses fsync() to force committed transactions to disk, then
    this shouldn't be necessary, right?
    (I know this is based on the assumption the SAN doesn't lie about its syncs,
    but then surely it would lie to the kernel with -o sync anyway?)
    If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
    journalling keeps the filesystem clean [assuming other mount options left at
    defaults], and then everything should be ok with either a server crash, power
    failure, storage failure, whatever. right?
    I've googled and come up with some info; the most relevant of
    which is here:
    If anyone can confirm either way that'd be great - or even just point me in
    the direction of enough firm info to work it out myself ;)
    Thanks,
    Shane
    (end of broadcast)
    TIP 1: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
    message can get through to the mailing list cleanly
  • No.1 | | 1074 bytes | |

    On Thu, Jan 19, 2006 at 09:42:59AM +0000, Shane Wright wrote:
    Now my question is this; the provider has, by default, mounted it with -o
    sync; so all reads/writes are synchronous. This doesn't result in the
    greatest of performance, and indeed remounting -o async is significantly
    faster.

    They tell me this is so MySQL databases don't get corrupted in the event of
    a crash, which is fine.

    But as Postgres uses fsync() to force committed transactions to disk, then
    this shouldn't be necessary, right?

    That depends. As long as the data is appropriately sync()ed when
    PostgreSQL asks, it should be fine. However, from reading the manpage
    it's not clear if fsync() still works when mounted -o async.

    If -o async means "all I/O is asynchronous except stuff explicitly
    fsync()ed" you're fine.

    The usual advice is to stick the WAL on a properly synced partition and
    stick the rest somewhere else. Note, I have no experience with this,
    it's just what I've heard.

    Have a nice day,
  • No.2 | | 1131 bytes | |

    Martijn van <kleptog (AT) svana (DOT) org> writes:

    That depends. As long as the data is appropriately sync()ed when
    PostgreSQL asks, it should be fine. However, from reading the manpage
    it's not clear if fsync() still works when mounted -o async.

    If -o async means "all I/O is asynchronous except stuff explicitly
    fsync()ed" you're fine.

    That's the way it works. Async is the default setting for most
    filesystems, but fsync() is always honored, at least as far as
    non-lying hardware will allow. :)

    The usual advice is to stick the WAL on a properly synced partition and
    stick the rest somewhere else. Note, I have no experience with this,
    it's just what I've heard.

    This might not be optimal, as having every write synchronous actually
    results in more synced writes than are strictly necessary.
    -Doug
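
To make the "async mount, explicit fsync()" pattern concrete, here is a minimal Python sketch. It is not taken from PostgreSQL; the helper name `durable_write` is made up for illustration. It assumes the OS and the storage underneath report fsync() completion honestly, which is exactly the caveat raised above.

```python
import os


def durable_write(path: str, payload: bytes) -> int:
    """Write payload to path and return only after fsync() has completed.

    On an async mount, write() normally returns once the data is in the
    page cache; os.fsync() is what asks the kernel to push it to storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(payload):
            # os.write() may write fewer bytes than requested, so loop.
            written += os.write(fd, payload[written:])
        os.fsync(fd)  # blocks until the kernel reports the data as written
    finally:
        os.close(fd)
    return written
```

Only the fsync() call waits for the storage; every other write on the filesystem stays asynchronous, which is why the async mount is faster without weakening what the application explicitly syncs.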

  • No.3 | | 1314 bytes | |

    Hi,

    thanks :)

    If -o async means "all I/O is asynchronous except stuff explicitly
    fsync()ed" you're fine.

    That's the way it works. Async is the default setting for most
    filesystems, but fsync() is always honored, at least as far as
    non-lying hardware will allow. :)

    That sounds good :)

    ext3's journalling should take care of the rest I guess - does that sound ok?
    I have read in various places I think that pgSQL doesn't need any
    directory-level operations in keeping WAL up to date so provided the ext3
    partition remains mountable then the database should be fine.

    The usual advice is to stick the WAL on a properly synced partition and
    stick the rest somewhere else. Note, I have no experience with this,
    it's just what I've heard.

    This might not be optimal, as having every write synchronous actually
    results in more synced writes than are strictly necessary.

    Actually I thought that *all* the database had to have fsync() work correctly;
    not for integrity on failed transactions, but to maintain integrity during
    checkpointing as well. But I could well be wrong!

    thanks,

    Shane

    (end of broadcast)
    TIP 5: don't forget to increase your free space map settings
  • No.4 | | 855 bytes | |

    Shane Wright <shane.wright (AT) edigitalresearch (DOT) com> writes:

    Actually I thought that *all* the database had to have fsync() work correctly;
    not for integrity on failed transactions, but to maintain integrity during
    checkpointing as well. But I could well be wrong!

    I think you're right, but what I was thinking of is the scenario where
    WAL writes are done in small increments, then committed with fsync()
    once a full page has been written. With a sync mount this would
    result in the equivalent of fsync() for every small write, which would
    hurt a lot.

    I dimly recall this sort of thing being discussed in the past, but I
    don't know offhand whether PG does its WAL writes in small chunks or
    page-at-a-time.
    -Doug
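
The cost difference described here can be sketched in Python. This is an approximation under stated assumptions: `O_SYNC` on a single file descriptor stands in for a `-o sync` mount (the two are not identical), the 512-byte chunk size is arbitrary, the 8192-byte page is used only as an illustrative size, and both function names are hypothetical.

```python
import os

CHUNK = 512   # arbitrary small write size, for illustration only
PAGE = 8192   # illustrative page size


def _write_all(fd: int, data: bytes) -> None:
    """Loop until every byte has been handed to the kernel."""
    done = 0
    while done < len(data):
        done += os.write(fd, data[done:])


def write_sync_per_chunk(path: str, data: bytes, chunk: int = CHUNK) -> int:
    """Every write waits for storage, roughly like a sync mount.

    Returns the number of synchronous waits performed.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    waits = 0
    try:
        for off in range(0, len(data), chunk):
            _write_all(fd, data[off:off + chunk])
            waits += 1  # with O_SYNC each write returns only once stored
    finally:
        os.close(fd)
    return waits


def write_then_fsync(path: str, data: bytes, chunk: int = CHUNK) -> int:
    """Buffered small writes followed by one fsync(), as on an async mount.

    Returns the number of synchronous waits performed (always 1).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for off in range(0, len(data), chunk):
            _write_all(fd, data[off:off + chunk])
        os.fsync(fd)  # the single point where we wait for storage
    finally:
        os.close(fd)
    return 1
```

For one `PAGE` of data written in `CHUNK`-sized pieces, the first function waits on storage 16 times and the second once, while both leave the same bytes in the file.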

    (end of broadcast)
    TIP 6: explain analyze is your friend
  • No.5 | | 1050 bytes | |

    On Thu, Jan 19, 2006 at 09:34:00AM -0500, Doug McNaught wrote:
    Shane Wright <shane.wright (AT) edigitalresearch (DOT) com> writes:

    Actually I thought that *all* the database had to have fsync() work correctly;
    not for integrity on failed transactions, but to maintain integrity during
    checkpointing as well. But I could well be wrong!

    You're correct; if the OS or drives lie about fsync'ing the base tables
    during a checkpoint you can end up with a corrupted database. The only
    'upside' here is that checkpoints don't happen as often, so the risk is
    slightly less, but it's still there.

    And all the debate about filesystem options is pointless unless they
    have also turned off any unsafe write caching by the drives.

    I dimly recall this sort of thing being discussed in the past, but I
    don't know offhand whether PG does its WAL writes in small chunks or
    page-at-a-time.

    It's done in pages, but remember that every commit requires an fsync of
    WAL.
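
The two sync points mentioned in this thread (an fsync of the log at every commit, and an fsync of the data files at checkpoint) can be illustrated with a toy write-ahead log in Python. This is a teaching sketch only: it is not how PostgreSQL lays out or manages its files, `ToyStore` and its file names are hypothetical, and it skips details such as fsyncing the directory after a rename.

```python
import json
import os


def _write_and_sync(path: str, data: bytes, flags: int) -> None:
    """Open path with the given flags, write data, and fsync before closing."""
    fd = os.open(path, flags, 0o600)
    try:
        done = 0
        while done < len(data):
            done += os.write(fd, data[done:])
        os.fsync(fd)
    finally:
        os.close(fd)


class ToyStore:
    """Key/value store with a log that is synced on every commit."""

    def __init__(self, directory: str) -> None:
        self.wal_path = os.path.join(directory, "toy.wal")
        self.data_path = os.path.join(directory, "toy.data")
        self.state = self._recover()

    def _recover(self) -> dict:
        """Load the last checkpoint, then replay whatever is in the log."""
        state = {}
        if os.path.exists(self.data_path):
            with open(self.data_path, "r", encoding="utf-8") as f:
                state = json.load(f)
        if os.path.exists(self.wal_path):
            with open(self.wal_path, "r", encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        key, value = json.loads(line)
                        state[key] = value
        return state

    def commit(self, key: str, value) -> None:
        """A commit is durable once its log record has been fsync'ed."""
        record = (json.dumps([key, value]) + "\n").encode("utf-8")
        _write_and_sync(
            self.wal_path, record, os.O_WRONLY | os.O_CREAT | os.O_APPEND
        )
        self.state[key] = value

    def checkpoint(self) -> None:
        """Sync the data file first; only then is it safe to drop the log."""
        tmp = self.data_path + ".tmp"
        snapshot = json.dumps(self.state).encode("utf-8")
        _write_and_sync(tmp, snapshot, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        os.replace(tmp, self.data_path)
        # If the fsync above "lied", truncating the log here would lose data.
        _write_and_sync(
            self.wal_path, b"", os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        )
```

The comment in `checkpoint` marks the spot the post is warning about: once the log is discarded, the data file is the only copy, so a dishonest fsync at that step can corrupt the store even though commits themselves were logged correctly.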
  • No.6 | | 1253 bytes | |

    Shane Wright <shane.wright (AT) edigitalresearch (DOT) com> writes:
    If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
    journalling keeps the filesystem clean [assuming other mount options left at
    defaults], and then everything should be ok with either a server crash, power
    failure, storage failure, whatever. right?

    I checked around with some of Red Hat's kernel folk, and the bottom line
    seems to be that it's OK as long as you trust the hardware:

    :Question is, can fsync(2) be trusted to behave properly, ie, not return
    :until all writes are down to disk, if the SAN is mounted -o async ?
    :
    : async is the default, which is the whole point of having things like
    : fsync, fdatasync, O_DIRECT, etc. You can trust fsync as far as you can
    : trust the hardware. The call will not return until the SAN says the
    : data has been written.
    :
    : In reality, the SAN is probably buffering these writes (possibly into
    : SRAM or battery-backed RAM), and the disks are probably buffering them
    : again, but you've got redundant power supplies and UPSs, right?

    regards, tom lane
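
To check how a partition is actually mounted, one option on Linux is to read `/proc/mounts`, where a synchronous mount lists `sync` among its options and an async mount (the default) simply omits it. The Python sketch below parses text in that format; the function names and the sample device and mount point in the usage note are hypothetical, and mount points containing spaces (escaped as `\040` in that file) are not handled.

```python
from typing import Optional, Set


def mount_options(mounts_text: str, mount_point: str) -> Optional[Set[str]]:
    """Return the option set for mount_point, or None if it is not listed.

    Expects lines of the form: device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows earlier ones.
            found = set(fields[3].split(","))
    return found


def is_sync_mount(mounts_text: str, mount_point: str) -> bool:
    """True only if the mount point is listed and carries the sync option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "sync" in opts
```

For example, `is_sync_mount(open("/proc/mounts").read(), "/var/lib/pgsql")` would report whether that (hypothetical) data directory is still mounted with `-o sync`.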

    (end of broadcast)
    TIP 3: Have you checked our extensive FAQ?

  • No.7 | | 1571 bytes | |

    Hi Tom,

    If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
    journalling keeps the filesystem clean [assuming other mount options
    left at defaults], and then everything should be ok with either a server
    crash, power failure, storage failure, whatever. right?

    I checked around with some of Red Hat's kernel folk, and the bottom line
    seems to be that it's OK as long as you trust the hardware:

    fabulous, thanks :)

    :Question is, can fsync(2) be trusted to behave properly, ie, not return
    :until all writes are down to disk, if the SAN is mounted -o async ?
    :
    : async is the default, which is the whole point of having things like
    : fsync, fdatasync, O_DIRECT, etc. You can trust fsync as far as you can
    : trust the hardware. The call will not return until the SAN says the
    : data has been written.
    :
    : In reality, the SAN is probably buffering these writes (possibly into
    : SRAM or battery-backed RAM), and the disks are probably buffering them
    : again, but you've got redundant power supplies and UPSs, right?

    that sounds true (and it has) - but presumably this is the case whether we
    mount -o sync or not? I.e. if it's going to buffer, then it's going to do so
    whether it's postgres or the kernel sync'ing the writes?

    (specifically that the SAN likely buffers anyway - IMO having to trust the
    hardware to some degree is a given ;)

    Cheers

    Shane


EMSDN.COM