Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Major Performance decrease after some hours

    20 answers - 2137 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hi all,
    i have a Tomcat application with PostgreSQL 8.1.4 running which
    performs about 10000 inserts/deletes every 2-4 minutes and updates on
    a database and after some hours of loadtesting the top output says
    0.0% idle, 6-7% system load, load average 32, 31, 28 and there are
    many exceptions at statement execution like:
    An I/ error occured while sending to the backend.
    There are 10-15 postmaster processes running which use all the CPU power.
    A restart of tomcat and then postgresql results in the same situation.
    Some postgres processes are in DELETE waiting or SELECT waiting.
    VACUUM runs through in just about 1-2 seconds and is run via cron
    every minute and reports that the fsm setting are high enough.
    The situation just gets better if tomcat is stopped and postgresql is
    restarted -90%idle
    The tomcat application talks to another application on the server
    which uses much CPU cycles at startup of tomcat and my observation is
    that after this startup with 0.0% idle, PostgreSQL can't recover from
    this situation.
    Dual Xeon Server
    machines (Dell PowerEdge 2850) using Heartbeat1. Each server has only
    one harddisk, no RAID configuration is used.
    - Each Cluster has 4 drbd Devices, one for the PostgreSQL data
    - Two of these clusters are using the same PostgreSQL installation to
    share the data, the database can be moved from one cluster to the
    other in case of failure
    - S: Debian Sarge with postgresql 8.1.4
    - Two cronjobs are configured to perform a "vacuumdb "
    every 1 minute and a "vacuumdb " every 23
    minutes
    - There are 3 applications using the PostgreSQL installation, each
    with their own database.
    - The main application is based on Tomcat 4.1-30 partly a
    Web-Application, partly a terminal login protocol based on http and
    XML, so the database access is of course done using JDBC
    - A cronjob is configured to perform a pg_dump of the main database
    every 4 hours
    Any ideas are welcome.
    thx,
    Peter
    (end of broadcast)
    TIP 6: explain analyze is your friend
  • No.1 | | 2563 bytes | |

    2006/10/1, Peter Bauer <peter.m.bauer (AT) gmail (DOT) com>:
    Hi all,

    i have a Tomcat application with PostgreSQL 8.1.4 running which
    performs about 10000 inserts/deletes every 2-4 minutes and updates on
    a database and after some hours of loadtesting the top output says
    0.0% idle, 6-7% system load, load average 32, 31, 28 and there are
    many exceptions at statement execution like:
    An I/ error occured while sending to the backend.

    There are 10-15 postmaster processes running which use all the CPU power.
    A restart of tomcat and then postgresql results in the same situation.
    Some postgres processes are in DELETE waiting or SELECT waiting.
    VACUUM runs through in just about 1-2 seconds and is run via cron
    every minute and reports that the fsm setting are high enough.

    The situation just gets better if tomcat is stopped and postgresql is
    restarted -90%idle

    The tomcat application talks to another application on the server
    which uses much CPU cycles at startup of tomcat and my observation is
    that after this startup with 0.0% idle, PostgreSQL can't recover from
    this situation.

    Dual Xeon Server
    machines (Dell PowerEdge 2850) using Heartbeat1. Each server has only
    one harddisk, no RAID configuration is used.
    - Each Cluster has 4 drbd Devices, one for the PostgreSQL data
    - Two of these clusters are using the same PostgreSQL installation to
    share the data, the database can be moved from one cluster to the
    other in case of failure
    - S: Debian Sarge with postgresql 8.1.4
    - Two cronjobs are configured to perform a "vacuumdb "
    every 1 minute and a "vacuumdb " every 23
    minutes
    - There are 3 applications using the PostgreSQL installation, each
    with their own database.
    - The main application is based on Tomcat 4.1-30 partly a
    Web-Application, partly a terminal login protocol based on http and
    XML, so the database access is of course done using JDBC
    - A cronjob is configured to perform a pg_dump of the main database
    every 4 hours
    --
    Any ideas are welcome.

    thx,
    Peter

    Hi again,

    i decreased the load of the test and the system is stable for some
    hours now, no Exceptions so far, top says 80-90% idle.
    The question is what are the reasons for this behaviour? Is a hardware
    upgrade required? How can i check how bad the condition of the
    database is?

    thx,
    Peter

    (end of broadcast)
    TIP 5: don't forget to increase your free space map settings
  • No.2 | | 1100 bytes | |

    Hi,

    a few random question

    i have a Tomcat application with PostgreSQL 8.1.4 running which
    performs about 10000 inserts/deletes every 2-4 minutes and updates on
    a database and after some hours of loadtesting the top output says
    0.0% idle, 6-7% system load, load average 32, 31, 28

    Who is responsible for this high load values?
    Do you see many postmaster processes at the top?

    and there are
    many exceptions at statement execution like:
    An I/ error occured while sending to the backend.

    Is this from the application?
    How many connections does the application open in parallel?
    Using JDBC, I guess?

    - Each Cluster has 4 drbd Devices, one for the PostgreSQL data
    - Two of these clusters are using the same PostgreSQL installation to
    share the data, the database can be moved from one cluster to the
    other in case of failure

    Just to be 100% sure: just one server at a time runs PostgreSQL on
    that shared data disk, right?

    Bye,
    Chris.

    (end of broadcast)
    TIP 6: explain analyze is your friend
  • No.3 | | 1037 bytes | |

    Peter Bauer wrote:
    []
    There are 10-15 postmaster processes running which use all the CPU power.
    A restart of tomcat and then postgresql results in the same situation.
    Some postgres processes are in DELETE waiting or SELECT waiting.
    VACUUM runs through in just about 1-2 seconds and is run via cron
    every minute and reports that the fsm setting are high enough.

    Pure speculation: are you sure you aren't vacuuming too agressively?
    The DELETE waiting and SELECT waiting sound to me like they are waiting
    for a lock that another vacuum is holding. You've said that you spawn a
    vacumm process by cron every minute, that may be fine when the server
    isn't too loaded. But when vacuums begins to take longer cron will spawn
    more and more vacuums killing your machine.

    I'll add a little check in the cron script to see if there isn't already
    a vacuum process running or use an idependant script,
    while [ 1 ]
    do
    vacuum
    sleep 60
    done

    []

    HTH,
  • No.4 | | 1682 bytes | |

    2006/10/1, Chris Mair <chrisnospam (AT) 1006 (DOT) org>:
    Hi,

    a few random question

    i have a Tomcat application with PostgreSQL 8.1.4 running which
    performs about 10000 inserts/deletes every 2-4 minutes and updates on
    a database and after some hours of loadtesting the top output says
    0.0% idle, 6-7% system load, load average 32, 31, 28

    Who is responsible for this high load values?
    Do you see many postmaster processes at the top?

    yes, there are about 10 postmaster processes in top which eat up all
    of the CPU cycles at equal parts.

    and there are
    many exceptions at statement execution like:
    An I/ error occured while sending to the backend.

    Is this from the application?

    Yes, this happens when the sql statements are executed.

    How many connections does the application open in parallel?

    30-40, one tomcat runs on the same machine as PostgreSQL, another runs
    on the other cluster and uses the database via network.

    Using JDBC, I guess?

    yes, postgresql-8.1-407.jdbc3 is used
    - Each Cluster has 4 drbd Devices, one for the PostgreSQL data
    - Two of these clusters are using the same PostgreSQL installation to
    share the data, the database can be moved from one cluster to the
    other in case of failure

    Just to be 100% sure: just one server at a time runs PostgreSQL on
    that shared data disk, right?

    server of a cluster runs the database and the tomcat, the other
    one just runs tomcat, so they share this common database.

    Bye,
    Chris.

    thx,
    Peter

    (end of broadcast)
    TIP 6: explain analyze is your friend
  • No.5 | | 817 bytes | |

    MaXX wrote:
    >There are 10-15 postmaster processes running which use all the CPU
    >power.
    >A restart of tomcat and then postgresql results in the same situation.
    >Some postgres processes are in DELETE waiting or SELECT waiting.
    >VACUUM runs through in just about 1-2 seconds and is run via cron
    >every minute and reports that the fsm setting are high enough.
    >

    Pure speculation: are you sure you aren't vacuuming too agressively?

    Have you tried autovacuum? You can manually tweak the settings so that
    your hot spot tables are vacuumed aggressivly. of the main upsides
    to autovacuum is tha tit saves cycles, that is, it only vacuums when you
    need it.

    (end of broadcast)
    TIP 6: explain analyze is your friend
  • No.6 | | 681 bytes | |

    "Peter Bauer" <peter.m.bauer (AT) gmail (DOT) comwrites:
    yes, there are about 10 postmaster processes in top which eat up all
    of the CPU cycles at equal parts.

    What are these processes doing exactly can you show us the queries
    they're executing? It might be worth attaching to a few of them with
    gdb to get stack traces, to see if there's any commonality about the
    traces:
    $ gdb /path/to/postgres PID-of-backend
    gdbbt
    gdbquit
    are you sure? y
    $ <repeat with other backends, and/or same backend after a few sec>

    regards, tom lane

    (end of broadcast)
    TIP 3: Have you checked our extensive FAQ?

  • No.7 | | 3261 bytes | |

    2006/10/1, MaXX <bs139412 (AT) skynet (DOT) be>:
    Peter Bauer wrote:
    2006/10/1, MaXX <bs139412 (AT) skynet (DOT) be>:
    >Peter Bauer wrote:
    >[]
    >There are 10-15 postmaster processes running which use all the CPU
    >power.
    >A restart of tomcat and then postgresql results in the same situation.
    >Some postgres processes are in DELETE waiting or SELECT waiting.
    >VACUUM runs through in just about 1-2 seconds and is run via cron
    >every minute and reports that the fsm setting are high enough.
    >>

    >Pure speculation: are you sure you aren't vacuuming too agressively?
    >

    The previous configuration was to vacuum all 10 minutes. This resulted
    in 60-70% system load (harddisk i suppose) so i thought more vacuums
    would be a good idea.
    It won't hurt to ensure that you will never have 2 vacuum processes
    running at the same time
    >The DELETE waiting and SELECT waiting sound to me like they are waiting
    >for a lock that another vacuum is holding. You've said that you spawn a
    >vacumm process by cron every minute, that may be fine when the server
    >isn't too loaded. But when vacuums begins to take longer cron will spawn
    >more and more vacuums killing your machine.

    When the postmaster processes eat up all CPU cycles the vacuum still
    does only take some seconds and there are no postgres VACUUM processes
    hanging around, so i don't think they hamper each other.
    I have the impression that the database "condition" is getting worse
    over time so the queries take longer and longer and at some point the
    I/ Exceptions start because the data cannot be written to the disk in
    time. I just don't know how to pinpoint this bad "condition" or
    whatever gets worse over time.

    you can connect to your server with pgAdmin, go to "tools -server
    status" to identify wich process is causing troubles on a sunday ;-)
    (you may have to turn on some logging option on postresql.conf) I find
    it more friendly than psql but it's a matter of taste.
    You are running pg on a Linux platform, try systat and iostat. (I'm
    running FreeBSD, I suppose those 2 are availlable on Linux)

    Long running transactions are not you friends too I once made a typo
    in a script which prevented commits from appenning not good

    This was the first thing we checked when these problems appeared. The
    only locking is done in functions which should not be a problem.
    course it is possible that there are bugs which prevent some
    transactions from being commited, but its a pretty big application so
    i would need some advice for finding and debugging such problems.
    Is it possible to check which transactions are currently running,
    which thread or java process runs them and for how long?
    I will attach a logfile to the reply to Toms mail which contains all
    sql statements executed from the start of the loadtest to the point
    the problems began, so maybe you can find something in it.

    thx,
    Peter

    (end of broadcast)
    TIP 6: explain analyze is your friend
  • No.8 | | 1324 bytes | |

    2006/10/1, Matthew T. 'Connor <matthew (AT) zeut (DOT) net>:
    MaXX wrote:
    >There are 10-15 postmaster processes running which use all the CPU
    >power.
    >A restart of tomcat and then postgresql results in the same situation.
    >Some postgres processes are in DELETE waiting or SELECT waiting.
    >VACUUM runs through in just about 1-2 seconds and is run via cron
    >every minute and reports that the fsm setting are high enough.
    >

    Pure speculation: are you sure you aren't vacuuming too agressively?

    Have you tried autovacuum? You can manually tweak the settings so that
    your hot spot tables are vacuumed aggressivly. of the main upsides
    to autovacuum is tha tit saves cycles, that is, it only vacuums when you
    need it.

    (end of broadcast)
    TIP 6: explain analyze is your friend

    Autovacuum is enabled and runs in addition to the configured cronjob
    vacuums. I will attach logfiles to the reply to Toms mail, so please
    have a look at it if autovacuum can be reconfigured to replace the
    cronjobs.

    thx,
    Peter

    (end of broadcast)
    TIP 9: In versions below 8.0, the planner will ignore your desire to
    choose an index scan if your joining column's datatypes do not
    match
  • No.9 | | 1170 bytes | |

    "Peter Bauer" <peter.m.bauer (AT) gmail (DOT) comwrites:
    Attached you can find the postgresql logfiles and a logfile which
    contains alls SQL statements executed in the relevant time together
    with the excpetions thrown. I also attached a file with all used
    Pl/pgSQL functions. Since we were not able to find a problem for so
    long, i think it makes no sense to filter the information because we
    are probably not looking on the right spots.

    I hope the pgsql-general moderators are not going to approve an 8MB
    message :-(. I'll hang onto my copy though, just in case it proves
    useful.

    >It might be worth attaching to a few of them with
    >gdb to get stack traces,


    Should this be done when the problems are appearing or during normal
    operation or both to compare the results?

    When the problems are appearing.

    regards, tom lane

    (end of broadcast)
    TIP 1: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
    message can get through to the mailing list cleanly
  • No.10 | | 1211 bytes | |

    2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    "Peter Bauer" <peter.m.bauer (AT) gmail (DOT) comwrites:
    Attached you can find the postgresql logfiles and a logfile which
    contains alls SQL statements executed in the relevant time together
    with the excpetions thrown. I also attached a file with all used
    Pl/pgSQL functions. Since we were not able to find a problem for so
    long, i think it makes no sense to filter the information because we
    are probably not looking on the right spots.

    I hope the pgsql-general moderators are not going to approve an 8MB
    message :-(. I'll hang onto my copy though, just in case it proves
    useful.

    Sorry, i put the files on so
    please use them.

    >It might be worth attaching to a few of them with
    >gdb to get stack traces,
    >

    Should this be done when the problems are appearing or during normal
    operation or both to compare the results?

    When the problems are appearing.

    ok, i will try to reproduce it and post the stack traces

    thx,
    Peter

    (end of broadcast)
    TIP 5: don't forget to increase your free space map settings
  • No.11 | | 474 bytes | |

    Sun, 01, 2006 at 12:55:51PM +0200, MaXX wrote:

    Pure speculation: are you sure you aren't vacuuming too agressively?
    The DELETE waiting and SELECT waiting sound to me like they are waiting
    for a lock that another vacuum is holding.

    How would one determine the lock situation definitively? Is there
    an internal mechanism that can be queried?

    (end of broadcast)
    TIP 5: don't forget to increase your free space map settings
  • No.12 | | 304 bytes | |

    Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    How would one determine the lock situation definitively? Is there
    an internal mechanism that can be queried?

    pg_locks view.

    regards, tom lane

    (end of broadcast)
    TIP 2: Don't 'kill -9' the postmaster
  • No.13 | | 969 bytes | |

    Hi all,

    inspired by the last posting "Weird disk write load caused by
    PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
    loadtest with vacuum every 10 minutes. The system load (harddisk) went
    down and everything was very stable at 80% idle for nearly 24 hours!
    I am currently performing some pgbench runs to evaluate the hardware
    and configuration for the system but i think the biggest problems are
    solved so far.

    thx everybody,
    Peter

    2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    How would one determine the lock situation definitively? Is there
    an internal mechanism that can be queried?

    pg_locks view.

    regards, tom lane

    (end of broadcast)
    TIP 2: Don't 'kill -9' the postmaster

    (end of broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
  • No.14 | | 1935 bytes | |

    It appears to me that work_mem is a more significant configuration
    option than previously assumed by many PostgreSQL users, myself
    included. As with many database optimizations, it's an obscure
    problem to diagnose because you generally only observe it through I/
    activity.

    possibility would be to log a warning whenever work_mem is
    exceeded (or exceeded by a certain ratio). I would also love a couple
    of new statistics counters tracking the amount of work memory used
    and the amount of work memory that has spilled over into pgsql_tmp.

    Alexander.

    5, 2006, at 10:48 , Peter Bauer wrote:

    Hi all,

    inspired by the last posting "Weird disk write load caused by
    PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
    loadtest with vacuum every 10 minutes. The system load (harddisk) went
    down and everything was very stable at 80% idle for nearly 24 hours!
    I am currently performing some pgbench runs to evaluate the hardware
    and configuration for the system but i think the biggest problems are
    solved so far.

    thx everybody,
    Peter

    2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    >Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    >How would one determine the lock situation definitively? Is there
    >an internal mechanism that can be queried?
    >>

    >pg_locks view.
    >>

    >regards, tom lane
    >>

    >(end of
    >broadcast)
    >TIP 2: Don't 'kill -9' the postmaster
    >>

    >

    (end of
    broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org

    (end of broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
  • No.15 | | 2970 bytes | |

    I finished the little benchmarking on our server and the results are
    quite curios.
    With the numbers from
    in mind i did
    /pgbench -i pgbench
    and then performed some pgbench tests, for example
    /pgbench -c 1 -t 1000 -s 1 pgbench
    starting vacuumend.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    number of clients: 1
    number of transactions per client: 1000
    number of transactions actually processed: 1000/1000
    tps = 50.703609 (including connections establishing)
    tps = 50.709265 (excluding connections establishing)

    So our server with two 3.00 GHz Xeon CPUs and 2GB has about 5% of the
    performance of the server described in the article!

    I did some tests on a Xen machine running on my workstation and the
    results are about 400-500tps which seems to be quite reasonable.

    I also tried to disable drbd and put the data directory elsewhere, but
    the performance was the same.

    any ideas?

    thx,
    Peter

    2006/10/5, Alexander Staubo <alex (AT) purefiction (DOT) net>:
    It appears to me that work_mem is a more significant configuration
    option than previously assumed by many PostgreSQL users, myself
    included. As with many database optimizations, it's an obscure
    problem to diagnose because you generally only observe it through I/
    activity.

    possibility would be to log a warning whenever work_mem is
    exceeded (or exceeded by a certain ratio). I would also love a couple
    of new statistics counters tracking the amount of work memory used
    and the amount of work memory that has spilled over into pgsql_tmp.

    Alexander.

    5, 2006, at 10:48 , Peter Bauer wrote:

    Hi all,

    inspired by the last posting "Weird disk write load caused by
    PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
    loadtest with vacuum every 10 minutes. The system load (harddisk) went
    down and everything was very stable at 80% idle for nearly 24 hours!
    I am currently performing some pgbench runs to evaluate the hardware
    and configuration for the system but i think the biggest problems are
    solved so far.

    thx everybody,
    Peter

    2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    >Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    >How would one determine the lock situation definitively? Is there
    >an internal mechanism that can be queried?
    >>

    >pg_locks view.
    >>

    >regards, tom lane
    >>

    >(end of
    >broadcast)
    >TIP 2: Don't 'kill -9' the postmaster
    >>

    >

    (end of
    broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
    --

    (end of broadcast)
    TIP 3: Have you checked our extensive FAQ?

  • No.16 | | 3415 bytes | |

    I forgot to mention that top does not show a noticeable increase of
    CPU or system load during the pgbench runs (postmaster has 4-8% CPU).
    Shouldn't the machine be busy during such a test?

    thx,
    Peter

    2006/10/5, Peter Bauer <peter.m.bauer (AT) gmail (DOT) com>:
    I finished the little benchmarking on our server and the results are
    quite curios.
    With the numbers from
    in mind i did
    ./pgbench -i pgbench
    and then performed some pgbench tests, for example
    ./pgbench -c 1 -t 1000 -s 1 pgbench
    starting vacuumend.
    transaction type: TPC-B (sort of)
    scaling factor: 1
    number of clients: 1
    number of transactions per client: 1000
    number of transactions actually processed: 1000/1000
    tps = 50.703609 (including connections establishing)
    tps = 50.709265 (excluding connections establishing)

    So our server with two 3.00 GHz Xeon CPUs and 2GB has about 5% of the
    performance of the server described in the article!

    I did some tests on a Xen machine running on my workstation and the
    results are about 400-500tps which seems to be quite reasonable.

    I also tried to disable drbd and put the data directory elsewhere, but
    the performance was the same.

    any ideas?

    thx,
    Peter
    --
    2006/10/5, Alexander Staubo <alex (AT) purefiction (DOT) net>:
    It appears to me that work_mem is a more significant configuration
    option than previously assumed by many PostgreSQL users, myself
    included. As with many database optimizations, it's an obscure
    problem to diagnose because you generally only observe it through I/
    activity.

    possibility would be to log a warning whenever work_mem is
    exceeded (or exceeded by a certain ratio). I would also love a couple
    of new statistics counters tracking the amount of work memory used
    and the amount of work memory that has spilled over into pgsql_tmp.

    Alexander.

    5, 2006, at 10:48 , Peter Bauer wrote:

    Hi all,

    inspired by the last posting "Weird disk write load caused by
    PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
    loadtest with vacuum every 10 minutes. The system load (harddisk) went
    down and everything was very stable at 80% idle for nearly 24 hours!
    I am currently performing some pgbench runs to evaluate the hardware
    and configuration for the system but i think the biggest problems are
    solved so far.

    thx everybody,
    Peter

    2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    >Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    >How would one determine the lock situation definitively? Is there
    >an internal mechanism that can be queried?
    >>

    >pg_locks view.
    >>

    >regards, tom lane
    >>

    >(end of
    >broadcast)
    >TIP 2: Don't 'kill -9' the postmaster
    >>

    >

    (end of
    broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
    >
    >
    >


    (end of broadcast)
    TIP 9: In versions below 8.0, the planner will ignore your desire to
    choose an index scan if your joining column's datatypes do not
    match
  • No.17 | | 4867 bytes | |

    If you are on Linux, I recommend iostat(1) and vmstat(8) over top.

    Iostat will report I/ transfer statistics; it's how I discovered
    that work_mem buffers were spilling over to disk files. For Vmstat,
    look in particular at the load (ie., how many processes are competing
    for the scheduler) in the first field ("r") and how many processes
    are blocked by I/ waits ("b").

    Alexander.

    5, 2006, at 14:35 , Peter Bauer wrote:

    I forgot to mention that top does not show a noticeable increase of
    CPU or system load during the pgbench runs (postmaster has 4-8% CPU).
    Shouldn't the machine be busy during such a test?

    thx,
    Peter

    2006/10/5, Peter Bauer <peter.m.bauer (AT) gmail (DOT) com>:
    >I finished the little benchmarking on our server and the results are
    >quite curios.
    >With the numbers from
    >in mind i did
    >./pgbench -i pgbench
    >and then performed some pgbench tests, for example
    >./pgbench -c 1 -t 1000 -s 1 pgbench
    >starting vacuumend.
    >transaction type: TPC-B (sort of)
    >scaling factor: 1
    >number of clients: 1
    >number of transactions per client: 1000
    >number of transactions actually processed: 1000/1000
    >tps = 50.703609 (including connections establishing)
    >tps = 50.709265 (excluding connections establishing)
    >>

    >So our server with two 3.00 GHz Xeon CPUs and 2GB has about 5% of the
    >performance of the server described in the article!
    >>

    >I did some tests on a Xen machine running on my workstation and the
    >results are about 400-500tps which seems to be quite reasonable.
    >>

    >I also tried to disable drbd and put the data directory elsewhere,
    >but
    >the performance was the same.
    >>

    >any ideas?
    >>

    >thx,
    >Peter
    >>
    >>

    >2006/10/5, Alexander Staubo <alex (AT) purefiction (DOT) net>:
    >It appears to me that work_mem is a more significant configuration
    >option than previously assumed by many PostgreSQL users, myself
    >included. As with many database optimizations, it's an obscure
    >problem to diagnose because you generally only observe it
    >through I/
    >activity.
    >>

    >possibility would be to log a warning whenever work_mem is
    >exceeded (or exceeded by a certain ratio). I would also love a
    >couple
    >of new statistics counters tracking the amount of work memory used
    >and the amount of work memory that has spilled over into pgsql_tmp.
    >>

    >Alexander.
    >>

    >5, 2006, at 10:48 , Peter Bauer wrote:
    >>

    >Hi all,
    >>

    >inspired by the last posting "Weird disk write load caused by
    >PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
    >loadtest with vacuum every 10 minutes. The system load
    >(harddisk) went
    >down and everything was very stable at 80% idle for nearly 24
    >hours!
    >I am currently performing some pgbench runs to evaluate the
    >hardware
    >and configuration for the system but i think the biggest
    >problems are
    >solved so far.
    >>

    >thx everybody,
    >Peter
    >>

    >2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    >>Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    >>How would one determine the lock situation definitively?

    >Is there
    >>an internal mechanism that can be queried?
    >>>

    >>pg_locks view.
    >>>

    >>regards, tom lane
    >>>

    >>(end of
    >>broadcast)
    >>TIP 2: Don't 'kill -9' the postmaster
    >>>

    >>

    >(end of
    >broadcast)
    >TIP 4: Have you searched our list archives?
    >>

    >http://archives.postgresql.org
    >>
    >>
    >>

    >

    (end of
    broadcast)
    TIP 9: In versions below 8.0, the planner will ignore your desire to
    choose an index scan if your joining column's datatypes do not
    match

    (end of broadcast)
    TIP 5: don't forget to increase your free space map settings
  • No.18 | | 11652 bytes | |

    it seems that the machine doesn't really care about the pgbench run. I did a
    pgbench -c 10 -t 10000 -s 10 pgbench
    and here is the output of vmstat 1 100 which has been started some
    seconds before pgbench:
    vmstat 1 100
    procs
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 10236 1240952 111324 553908 0 0 3 2 1 3 5 3 92 0
    0 0 10236 1240944 111324 553908 0 0 0 0 167 48 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 20 169 52 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 44 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 164 44 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 47 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 47 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 164 49 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 40 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 41 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 45 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 48 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 164 42 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 45 0 0 100 0
    0 0 10236 1240936 111332 553908 0 0 0 0 165 41 0 0 100 0
    0 2 10236 1237688 111332 550256 0 0 0 10976 449 1793 13 1 86 0
    0 2 10236 1237688 111332 550256 0 0 0 1928 345 3206 2 1 97 0
    0 2 10236 1237680 111340 550256 0 0 0 1969 352 3012 4 1 95 0
    0 2 10236 1237600 111340 550336 0 0 0 2096 353 2986 2 0 98 0
    0 2 10236 1237608 111340 550336 0 0 0 1932 351 2985 1 2 97 0
    0 2 10236 1237600 111340 550336 0 0 0 1836 350 3097 4 1 95 0
    0 2 10236 1237600 111340 550336 0 0 0 1852 353 2971 3 1 95 0
    4 1 10236 1237600 111340 550336 0 0 0 1975 372 2682 3 0 97 0
    0 2 10236 1237608 111340 550336 0 0 0 2056 372 2458 2 1 96 0
    0 2 10236 1237600 111340 550336 0 0 0 2028 377 2360 1 1 98 0
    0 2 10236 1237588 111340 550336 0 0 0 2000 372 2630 3 1 95 0
    0 2 10236 1237588 111340 550336 0 0 0 2044 372 2326 3 0 97 0
    0 2 10236 1237588 111340 550336 0 0 0 1976 372 2171 2 0 98 0
    0 2 10236 1237588 111340 550336 0 0 0 1972 383 2275 4 1 95 0
    1 2 10236 1237588 111340 550336 0 0 0 1924 382 2500 3 1 95 0
    0 2 10236 1237588 111340 550336 0 0 0 1804 372 2798 3 1 96 0
    0 2 10236 1237588 111340 550336 0 0 0 1900 374 2974 6 2 92 0
    0 2 10236 1237588 111340 550336 0 0 0 1820 361 2871 2 2 95 0
    0 2 10236 1237576 111340 550336 0 0 0 1876 366 2762 4 1 95 0
    0 2 10236 1237576 111340 550336 0 0 0 1904 370 2724 3 0 96 0
    0 2 10236 1237576 111340 550336 0 0 0 1972 378 2585 6 0 93 0
    6 1 10236 1237576 111340 550336 0 0 0 1800 371 2838 1 1 98 0
    0 2 10236 1237576 111340 550336 0 0 0 1792 362 2826 3 1 96 0
    0 2 10236 1237576 111340 550344 0 0 0 1804 362 3068 3 0 96 0
    1 1 10236 1237560 111340 550360 0 0 0 1936 373 2718 2 1 96 0
    6 1 10236 1237552 111340 550360 0 0 0 1788 365 2447 4 1 95 0
    0 1 10236 1237552 111340 550360 0 0 0 1804 368 2362 5 2 93 0
    0 2 10236 1237544 111340 550368 0 0 0 1908 370 2434 3 4 93 0
    0 2 10236 1237544 111340 550368 0 0 0 1848 369 2360 4 1 94 0
    0 2 10236 1237504 111340 550408 0 0 0 1796 358 2655 3 1 96 0
    0 2 10236 1237496 111340 550416 0 0 0 1988 374 2491 4 1 95 0
    0 2 10236 1237488 111340 550424 0 0 0 1960 372 2111 2 1 97 0
    0 2 10236 1237488 111340 550424 0 0 0 1760 360 2433 4 1 95 0
    0 2 10236 1237488 111340 550424 0 0 0 1944 374 2064 2 1 97 0
    0 2 10236 1237496 111340 550424 0 0 0 1868 373 2169 3 0 97 0
    4 1 10236 1237476 111340 550432 0 0 0 1868 372 2170 3 1 96 0

    i did the same on the Xen machine:
    vmstat 1 100
    procs
    r b swpd free buff cache si so bi bo in cs us sy id wa
    2 0 4 3964 5916 99288 0 0 1 21 34 53 0 0 99 0
    0 0 4 3964 5916 99288 0 0 0 0 50 20 0 0 100 0
    0 0 4 3964 5916 99288 0 0 0 0 38 20 0 0 100 0
    0 0 4 3964 5916 99288 0 0 0 0 33 18 0 0 100 0
    0 0 4 3964 5916 99288 0 0 0 0 28 20 0 1 99 0
    0 0 4 3964 5924 99288 0 0 0 28 50 25 0 0 100 0
    0 0 4 3964 5924 99288 0 0 0 0 24 19 0 0 100 0
    0 0 4 3968 5924 99288 0 0 0 0 37 20 0 0 100 0
    0 0 4 3968 5924 99288 0 0 0 0 50 24 0 0 100 0
    0 0 4 3772 5924 99292 0 0 0 0 33 29 0 0 100 0
    0 0 4 3748 5924 99292 0 0 0 0 31 19 0 0 100 0
    0 0 4 3744 5924 99292 0 0 0 0 43 25 0 0 100 0
    1 0 12 2560 5496 100048 0 0 44 17004 1896 812 26 10 56 8
    2 0 16 2388 5080 95152 0 0 180 25788 2505 11372 54 19 2 25
    1 1 16 2576 5084 94956 0 0 0 6824 1215 23437 50 19 1 31
    2 0 16 2668 5084 94872 0 0 0 6588 1188 22923 51 12 2 35
    1 1 16 2600 5088 94840 0 0 0 8664 1701 22326 51 10 2 37
    0 1 16 2432 5100 94968 0 0 0 5492 1183 9985 26 6 2 66
    0 1 16 2464 5112 95048 0 0 0 2404 495 5670 18 5 1 76
    1 0 16 2596 5112 94980 0 0 32 6036 1082 21986 42 16 2 41
    1 0 16 2244 5124 95268 0 0 0 7740 1526 20645 37 14 2 48
    2 1 16 2540 5108 95064 0 0 0 7016 1343 18769 46 12 2 41
    0 1 16 2752 5108 94668 0 0 0 5244 1165 8660 16 6 1 77
    0 2 16 2780 5116 94668 0 0 8 648 80 95 0 1 0 99
    4 0 16 2736 5140 94716 0 0 0 1160 363 2556 9 2 1 88
    1 0 16 2268 5148 95036 0 0 8 5112 853 21498 67 13 2 18
    2 1 16 2788 5048 94676 0 0 8 7876 1535 21278 65 12 2 21
    0 1 16 2764 5060 94788 0 0 0 5372 1203 7024 21 6 1 72
    1 0 16 2648 5076 94932 0 0 12 3112 596 10241 24 9 2 65
    1 0 16 2728 5056 94772 0 0 24 6152 1142 19822 47 16 4 34
    1 0 16 2504 5068 95068 0 0 0 7196 1387 20474 50 17 2 32
    1 1 16 2652 5064 94844 0 0 8 7108 1018 17572 56 8 3 33
    0 1 16 2568 5076 94916 0 0 0 4460 1003 5825 14 1 1 84
    0 1 16 2572 5076 94924 0 0 0 704 52 90 0 0 2 98
    1 0 16 2300 5096 95116 0 0 0 3688 762 14174 52 9 2 37
    1 1 16 2436 5080 95080 0 0 0 7256 1407 19964 66 21 1 12
    1 0 16 2640 5092 94904 0 0 0 6504 1223 19809 69 15 1 15
    0 1 16 2528 5104 94960 0 0 0 4964 1165 4573 18 3 1 78
    1 1 16 2332 5116 95064 0 0 0 2492 521 7197 15 5 1 79
    1 1 16 2580 5080 94900 0 0 0 5076 863 19775 60 14 2 24
    2 0 16 2728 5020 94732 0 0 0 7636 1533 19246 51 15 1 33
    0 0 16 2484 5032 94980 0 0 0 6068 1200 16340 50 16 2 32
    1 0 16 2316 5044 95036 0 0 0 3940 934 3570 10 5 3 82
    0 2 16 2788 5024 94628 0 0 0 640 63 46 0 0 0 100
    0 1 16 2876 5060 94636 0 0 0 1212 402 602 2 0 3 95
    1 1 16 2580 5072 94876 0 0 0 6440 1255 17866 68 11 2 19
    1 0 16 2636 5084 94788 0 0 0 7188 1405 18753 66 16 1 17
    0 1 16 2580 5084 94828 0 0 0 544 116 2877 12 1 0 87
    2 1 16 2536 5104 94908 0 0 0 2968 656 5413 26 5 2 67

    The only difference i see is that there is always at least one process
    waiting for I/

    thx,
    Peter

    2006/10/5, Alexander Staubo <alex (AT) purefiction (DOT) net>:
    If you are on Linux, I recommend iostat(1) and vmstat(8) over top.

    Iostat will report I/ transfer statistics; it's how I discovered
    that work_mem buffers were spilling over to disk files. For Vmstat,
    look in particular at the load (ie., how many processes are competing
    for the scheduler) in the first field ("r") and how many processes
    are blocked by I/ waits ("b").

    Alexander.

    5, 2006, at 14:35 , Peter Bauer wrote:

    I forgot to mention that top does not show a noticeable increase of
    CPU or system load during the pgbench runs (postmaster has 4-8% CPU).
    Shouldn't the machine be busy during such a test?

    thx,
    Peter

    2006/10/5, Peter Bauer <peter.m.bauer (AT) gmail (DOT) com>:
    >I finished the little benchmarking on our server and the results are
    >quite curios.
    >With the numbers from
    >in mind i did
    >./pgbench -i pgbench
    >and then performed some pgbench tests, for example
    >./pgbench -c 1 -t 1000 -s 1 pgbench
    >starting vacuumend.
    >transaction type: TPC-B (sort of)
    >scaling factor: 1
    >number of clients: 1
    >number of transactions per client: 1000
    >number of transactions actually processed: 1000/1000
    >tps = 50.703609 (including connections establishing)
    >tps = 50.709265 (excluding connections establishing)
    >>

    >So our server with two 3.00 GHz Xeon CPUs and 2GB has about 5% of the
    >performance of the server described in the article!
    >>

    >I did some tests on a Xen machine running on my workstation and the
    >results are about 400-500tps which seems to be quite reasonable.
    >>

    >I also tried to disable drbd and put the data directory elsewhere,
    >but
    >the performance was the same.
    >>

    >any ideas?
    >>

    >thx,
    >Peter
    >>
    >>

    >2006/10/5, Alexander Staubo <alex (AT) purefiction (DOT) net>:
    >It appears to me that work_mem is a more significant configuration
    >option than previously assumed by many PostgreSQL users, myself
    >included. As with many database optimizations, it's an obscure
    >problem to diagnose because you generally only observe it
    >through I/
    >activity.
    >>

    >possibility would be to log a warning whenever work_mem is
    >exceeded (or exceeded by a certain ratio). I would also love a
    >couple
    >of new statistics counters tracking the amount of work memory used
    >and the amount of work memory that has spilled over into pgsql_tmp.
    >>

    >Alexander.
    >>

    >5, 2006, at 10:48 , Peter Bauer wrote:
    >>

    >Hi all,
    >>

    >inspired by the last posting "Weird disk write load caused by
    >PostgreSQL?" i increased the work_mem from 1 to 7MB and did some
    >loadtest with vacuum every 10 minutes. The system load
    >(harddisk) went
    >down and everything was very stable at 80% idle for nearly 24
    >hours!
    >I am currently performing some pgbench runs to evaluate the
    >hardware
    >and configuration for the system but i think the biggest
    >problems are
    >solved so far.
    >>

    >thx everybody,
    >Peter
    >>

    >2006/10/2, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    >>Ray Stell <stellr (AT) cns (DOT) vt.eduwrites:
    >>How would one determine the lock situation definitively?

    >Is there
    >>an internal mechanism that can be queried?
    >>>

    >>pg_locks view.
    >>>

    >>regards, tom lane
    >>>

    >>(end of
    >>broadcast)
    >>TIP 2: Don't 'kill -9' the postmaster
    >>>

    >>

    >(end of
    >broadcast)
    >TIP 4: Have you searched our list archives?
    >>

    >http://archives.postgresql.org
    >>
    >>
    >>

    >

    (end of
    broadcast)
    TIP 9: In versions below 8.0, the planner will ignore your desire to
    choose an index scan if your joining column's datatypes do not
    match
    --

    (end of broadcast)
    TIP 1: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that your
    message can get through to the mailing list cleanly
  • No.19 | | 1036 bytes | |

    "Peter Bauer" <peter.m.bauer (AT) gmail (DOT) comwrites:
    tps = 50.703609 (including connections establishing)
    tps = 50.709265 (excluding connections establishing)

    That's about what you ought to expect for a single transaction stream
    running on honest disk hardware (ie, disks that don't lie about write
    complete). You can't commit a transaction more often than once per disk
    revolution, because you have to wait for the current WAL file endpoint
    to pass under the heads again. If there are multiple clients then
    "ganging" concurrent commits is possible, but you tested only one.

    The benchmark you reference might have been done on disks with battery
    backed write cache. it might have been just plain unsafe (ie, the
    equivalent of fsync off, but in hardware :-()

    regards, tom lane

    (end of broadcast)
    TIP 9: In versions below 8.0, the planner will ignore your desire to
    choose an index scan if your joining column's datatypes do not
    match
  • No.20 | | 1340 bytes | |

    Hi all,

    2006/10/5, Tom Lane <tgl (AT) sss (DOT) pgh.pa.us>:
    "Peter Bauer" <peter.m.bauer (AT) gmail (DOT) comwrites:
    tps = 50.703609 (including connections establishing)
    tps = 50.709265 (excluding connections establishing)

    That's about what you ought to expect for a single transaction stream
    running on honest disk hardware (ie, disks that don't lie about write
    complete). You can't commit a transaction more often than once per disk
    revolution, because you have to wait for the current WAL file endpoint
    to pass under the heads again. If there are multiple clients then
    "ganging" concurrent commits is possible, but you tested only one.

    The benchmark you reference might have been done on disks with battery
    backed write cache. it might have been just plain unsafe (ie, the
    equivalent of fsync off, but in hardware :-()

    You are right, i performed the pgbench tests on another machine with
    the same hardware but a kernel which supports the onboard Dell Raid
    Controller with battery backed write cache and the result is about 400
    tps. We will see how much difference this makes in practice but at
    least i know where the "problem" was.

    thx,
    Peter

    (end of broadcast)
    TIP 3: Have you checked our extensive FAQ?

Re: Major Performance decrease after some hours


max 4000 letters.
Your nickname that display:
In order to stop the spam: 9 + 8 =
QUESTION ON "Databases"

EMSDN.COM