Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • sluggish performance in MacOS 10.4

    12 answers - 274 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Eric Scouten <scouten (AT) adobe (DOT) comwrote:
    Doesn't that bring with it an increased risk of file corruption?
    The added risk in going from synchronous=FULL to synchronous=NRMAL
    is minimal. You only run into trouble with you select
    synchronous=FF.
  • No.1 | | 331 bytes | |

    12/2/05, drh (AT) hwaci (DOT) com <drh (AT) hwaci (DOT) comwrote:
    The added risk in going from synchronous=FULL to synchronous=NRMAL
    is minimal. You only run into trouble [when] you select
    synchronous=FF.

    you've piqued my interest Does sqlite implement MVCC (multiversion
    concurrency)?
    -cc
  • No.2 | | 1224 bytes | |

    Fri, 2 Dec 2005, C C wrote:

    12/2/05, drh (AT) hwaci (DOT) com <drh (AT) hwaci (DOT) comwrote:
    >The added risk in going from synchronous=FULL to synchronous=NRMAL
    >is minimal. You only run into trouble [when] you select
    >synchronous=FF.
    >
    >you've piqued my interest Does sqlite implement MVCC (multiversion
    >concurrency)?


    No.

    Might be an interesting addition, but locking of the btree structure would
    be difficult to make performant, as communication of locking is via file
    locks only. MVCC works very well on centralised servers because locks are
    short lived and can be quickly communicated using regular IPC.

    It might be feasable to have intra-process MVCC on a single SQLite
    database with multiple handles, but inter-process MVCC would be
    problematic, as is scheduling vacuuming of stale data. It could, however,
    remove the limit that a writer cannot commit until all readers are
    finished, so improving concurrency somewhat.

    Basically, it'd be too complex (IMH) to implement in a 'lite' database
    engine.

    >
    >-cc
    >


    Christian
  • No.3 | | 1894 bytes | |

    Christian Smith wrote:
    Fri, 2 Dec 2005, C C wrote:


    >12/2/05, drh (AT) hwaci (DOT) com <drh (AT) hwaci (DOT) comwrote:
    >>

    The added risk in going from synchronous=FULL to synchronous=NRMAL
    is minimal. You only run into trouble [when] you select
    synchronous=FF.
    >>
    >>you've piqued my interest Does sqlite implement MVCC (multiversion
    >>concurrency)?


    No.

    Might be an interesting addition, but locking of the btree structure would
    be difficult to make performant, as communication of locking is via file
    locks only. MVCC works very well on centralised servers because locks are
    short lived and can be quickly communicated using regular IPC.

    It might be feasable to have intra-process MVCC on a single SQLite
    database with multiple handles, but inter-process MVCC would be
    problematic, as is scheduling vacuuming of stale data. It could, however,
    remove the limit that a writer cannot commit until all readers are
    finished, so improving concurrency somewhat.

    Basically, it'd be too complex (IMH) to implement in a 'lite' database
    engine.

    cc
    >>


    Christian

    The virtue of MVCC is that it does not use locking, instead it maintains
    serialized versions of the data. The whole point of the "lite" in
    Sqlite is that it does not use those heavyweight DBMS methods which make
    a DBMS capable of handling thousands of concurrent transactions.

    The massive overhead of MVCC is pointless when the DBMS is in a small
    scale application with at most a handful of concurrent transactions.
    Sqlite's locking method is quite appropriate at that level.
    JS
  • No.4 | | 2012 bytes | |

    Tue, 6 Dec 2005, John Stanton wrote:

    >Christian Smith wrote:
    >Fri, 2 Dec 2005, C C wrote:
    >>

    >Basically, it'd be too complex (IMH) to implement in a 'lite' database
    >engine.
    >>
    >>

    >The virtue of MVCC is that it does not use locking, instead it maintains
    >serialized versions of the data.


    The btree must still be accessed and updated in a serialized fashion to
    prevent corruption. MVCC removes the need for locks on whole databases,
    tables or rows. But the metadata still requires locks for updating.

    It is these metadata updates that would slow down throughput that a
    centralized db doesn't suffer from. The centralized db can use regular
    locking primitives to coordinate metadata updates (such as mutexes and
    condition variables.) SQLite does not have that option except within a
    single process.

    The whole point of the "lite" in
    >Sqlite is that it does not use those heavyweight DBMS methods which make
    >a DBMS capable of handling thousands of concurrent transactions.


    Agreed.


    >
    >The massive overhead of MVCC is pointless when the DBMS is in a small
    >scale application with at most a handful of concurrent transactions.
    >Sqlite's locking method is quite appropriate at that level.


    Having never implemented a MVCC database (nor any database, for that
    matter:) I couldn't authoritively comment, but MVCC doesn't strike me as
    being particularly high overhead. A couple of extra hidden columns per
    row, tracking transaction id's and the odd vacuum every now and then. What
    could be simpler:) Certainly not massive overhead, and probably easier
    than table or row level locking (think deadlocks.)


    >JS
    >


    Christian
  • No.5 | | 2839 bytes | |

    Christian Smith wrote:
    Tue, 6 Dec 2005, John Stanton wrote:


    >>Christian Smith wrote:
    >>

    Fri, 2 Dec 2005, C C wrote:

    Basically, it'd be too complex (IMH) to implement in a 'lite' database
    engine.


    >>
    >>The virtue of MVCC is that it does not use locking, instead it maintains
    >>serialized versions of the data.


    The btree must still be accessed and updated in a serialized fashion to
    prevent corruption. MVCC removes the need for locks on whole databases,
    tables or rows. But the metadata still requires locks for updating.

    It is these metadata updates that would slow down throughput that a
    centralized db doesn't suffer from. The centralized db can use regular
    locking primitives to coordinate metadata updates (such as mutexes and
    condition variables.) SQLite does not have that option except within a
    single process.


    >The whole point of the "lite" in
    >>Sqlite is that it does not use those heavyweight DBMS methods which make
    >>a DBMS capable of handling thousands of concurrent transactions.


    Agreed.


    >>The massive overhead of MVCC is pointless when the DBMS is in a small
    >>scale application with at most a handful of concurrent transactions.
    >>Sqlite's locking method is quite appropriate at that level.


    Having never implemented a MVCC database (nor any database, for that
    matter:) I couldn't authoritively comment, but MVCC doesn't strike me as
    being particularly high overhead. A couple of extra hidden columns per
    row, tracking transaction id's and the odd vacuum every now and then. What
    could be simpler:) Certainly not massive overhead, and probably easier
    than table or row level locking (think deadlocks.)


    >>JS
    >>


    Christian

    What could be simpler? Implementing transactional integrity by means of
    locks rather than multiple versions of data and complex housekeeping,
    particularly if the intended application doesn't depend upon MVCC for
    performance with a large number of concurrent users. Dr Hipp is very
    clear in advising not to use Sqlite for enterprise applications.

    If you follow this mail list you will see that Sqlite is in some ways
    too "heavy" for some of the embedded applications where it is used.
    to create an "Sqfeatherlite" would be more useful than
    attempting to create yet another general purpose, full function RDBMS.
  • No.6 | | 2887 bytes | |

    Wed, 7 Dec 2005, John Stanton wrote:

    >Christian Smith wrote:

    The massive overhead of MVCC is pointless when the DBMS is in a small
    scale application with at most a handful of concurrent transactions.
    Sqlite's locking method is quite appropriate at that level.
    >>
    >>
    >>

    >Having never implemented a MVCC database (nor any database, for that
    >matter:) I couldn't authoritively comment, but MVCC doesn't strike me as
    >being particularly high overhead. A couple of extra hidden columns per
    >row, tracking transaction id's and the odd vacuum every now and then. What
    >could be simpler:) Certainly not massive overhead, and probably easier
    >than table or row level locking (think deadlocks.)
    >>

    >
    >What could be simpler? Implementing transactional integrity by means of
    >locks rather than multiple versions of data and complex housekeeping,
    >particularly if the intended application doesn't depend upon MVCC for
    >performance with a large number of concurrent users. Dr Hipp is very
    >clear in advising not to use Sqlite for enterprise applications.


    I'd guess MVCC would be simpler to implement than row locks. With row
    locks, you have to track every transaction, every row accessed by every
    transaction, and detect and handle any deadlock conditions, as well as
    correctly lock the metadata for updates.

    With MVCC, you simply have to lock the metadata briefly for inserts
    (updates) or deletes, track every live transaction, have no problems with
    read/write lock deadlocks, and easy to detect stale row updates. Vacuuming
    adds some performance overhead, but can be externally scheduled and simply
    implemented.

    What would you class as enterprise? I think SQLite is very suitable for
    certain classes of enterprise applications, just not as the primary data
    store for a large data warehouse or some such application. As an example,
    consider something like a directory service. A very 'enterprise' type of
    application, and one I think SQLite would excel at.


    >
    >If you follow this mail list you will see that Sqlite is in some ways
    >too "heavy" for some of the embedded applications where it is used.

    to create an "Sqfeatherlite" would be more useful than
    >attempting to create yet another general purpose, full function RDBMS.
    >


    Err, I wasn't proposing that. I explicitly stated I thought MVCC was too
    heavy weight for SQLite. I sense this thread getting hostile, which was
    not the intention. I think we're simply talking at cross purposes.

    Christian
  • No.7 | | 3815 bytes | |

    Christian Smith wrote:
    Wed, 7 Dec 2005, John Stanton wrote:


    >>Christian Smith wrote:
    >>

    The massive overhead of MVCC is pointless when the DBMS is in a small
    scale application with at most a handful of concurrent transactions.
    Sqlite's locking method is quite appropriate at that level.

    Having never implemented a MVCC database (nor any database, for that
    matter:) I couldn't authoritively comment, but MVCC doesn't strike me as
    being particularly high overhead. A couple of extra hidden columns per
    row, tracking transaction id's and the odd vacuum every now and then. What
    could be simpler:) Certainly not massive overhead, and probably easier
    than table or row level locking (think deadlocks.)

    >>
    >>What could be simpler? Implementing transactional integrity by means of
    >>locks rather than multiple versions of data and complex housekeeping,
    >>particularly if the intended application doesn't depend upon MVCC for
    >>performance with a large number of concurrent users. Dr Hipp is very
    >>clear in advising not to use Sqlite for enterprise applications.


    I'd guess MVCC would be simpler to implement than row locks. With row
    locks, you have to track every transaction, every row accessed by every
    transaction, and detect and handle any deadlock conditions, as well as
    correctly lock the metadata for updates.

    With MVCC, you simply have to lock the metadata briefly for inserts
    (updates) or deletes, track every live transaction, have no problems with
    read/write lock deadlocks, and easy to detect stale row updates. Vacuuming
    adds some performance overhead, but can be externally scheduled and simply
    implemented.

    What would you class as enterprise? I think SQLite is very suitable for
    certain classes of enterprise applications, just not as the primary data
    store for a large data warehouse or some such application. As an example,
    consider something like a directory service. A very 'enterprise' type of
    application, and one I think SQLite would excel at.


    >>If you follow this mail list you will see that Sqlite is in some ways
    >>too "heavy" for some of the embedded applications where it is used.

    >to create an "Sqfeatherlite" would be more useful than
    >>attempting to create yet another general purpose, full function RDBMS.
    >>


    Err, I wasn't proposing that. I explicitly stated I thought MVCC was too
    heavy weight for SQLite. I sense this thread getting hostile, which was
    not the intention. I think we're simply talking at cross purposes.

    Christian

    My understanding of Sqlite is that transactions are locked by locking
    the entire database. Deadlocks can occur and the programmer must be
    aware of the possibility of a transaction temporarily failing due to a
    lock and of the need to possibly resolve deadlocks if the application
    makes them possible. This behaviour is a feature of the absence of a
    central DBMS server process. Sqlite runs in each user process, a major
    reason for its simplicity and wide appeal.

    By "enterprise" I meant the usual IT definition, a step up from
    "department".

    If you are interested in the complexity of MVCC take a look at the
    source of PostgreSQL.

    other point, and not a hostile one, is the conception of a lock.
    You seem a little confused between internal synchronisation and locks
    set by the DBMS for the benefit of applications.
  • No.8 | | 3074 bytes | |

    Thu, 8 Dec 2005, John Stanton wrote:

    >My understanding of Sqlite is that transactions are locked by locking
    >the entire database. Deadlocks can occur and the programmer must be
    >aware of the possibility of a transaction temporarily failing due to a
    >lock and of the need to possibly resolve deadlocks if the application
    >makes them possible. This behaviour is a feature of the absence of a
    >central DBMS server process. Sqlite runs in each user process, a major
    >reason for its simplicity and wide appeal.


    Centrally managed DBMS can also have deadlocks. Consider this sequence of
    events:

    1. App 1 reads row X
    2. App 2 reads row Y.
    3. App 1 updates row Y.
    4. App 2 updates row X.

    With row locks, we'll get a deadlock at 4 as App 1 already has a read
    lock on row X, but App 1 is blocked waiting for a write lock on row Y, and
    so cannot proceed to commit or rollback its transaction. This scenario is
    causes the SQLITE_BUSY in SQLite, but will also deadlock a row locking
    DBMS. It is the detection of this that complicates the issue for row
    locking DBMS.

    >
    >By "enterprise" I meant the usual IT definition, a step up from
    >"department".
    >
    >If you are interested in the complexity of MVCC take a look at the
    >source of PostgreSQL.
    >

    other point, and not a hostile one, is the conception of a lock.
    >You seem a little confused between internal synchronisation and locks
    >set by the DBMS for the benefit of applications.
    >


    No. Just overloading the use of the term lock. In a central server based
    DBMS, much of what would require file locking in SQLite can be achieved
    using lighter, quicker IPC primitives. PostgreSQL uses semaphores for
    coordination between servers. Such locks are much quicker and fine grained
    than file locks. But it is still a lock, just as a file lock is a lock as
    well. All locks are used for synchronisation. Internal or external is just
    a matter of boundaries.

    In summary:
    - SQLite uses file locks to coodinate concurrent access at the database
    level.
    - Row locking central DBMS uses row read/write locks to coordinate access
    at the row level. Metadata updates must also be coordinated with locks.
    - MVCC central DBMS uses multiple versions of rows to remove row level
    locking, and only requires any locking for metadata updates, such as
    updating a btree or writing an update to the WAL (write ahead log).
    - A lock is a lock from a conceptual point of view. How a lock is
    implemented is a detail of the lock requirements.

    I think we're mostly in agreement on all these points, other than the
    relative complexity of MVCC against row level locking. I contest that MVCC
    is conceptually simpler than row level locking, from the point of view of
    deadlock detection and recovery, as well as increasing concurrency.

    Christian
  • No.9 | | 3248 bytes | |

    Sigh

    Christian Smith wrote:
    Thu, 8 Dec 2005, John Stanton wrote:


    >>My understanding of Sqlite is that transactions are locked by locking
    >>the entire database. Deadlocks can occur and the programmer must be
    >>aware of the possibility of a transaction temporarily failing due to a
    >>lock and of the need to possibly resolve deadlocks if the application
    >>makes them possible. This behaviour is a feature of the absence of a
    >>central DBMS server process. Sqlite runs in each user process, a major
    >>reason for its simplicity and wide appeal.


    Centrally managed DBMS can also have deadlocks. Consider this sequence of
    events:

    1. App 1 reads row X
    2. App 2 reads row Y.
    3. App 1 updates row Y.
    4. App 2 updates row X.

    With row locks, we'll get a deadlock at 4 as App 1 already has a read
    lock on row X, but App 1 is blocked waiting for a write lock on row Y, and
    so cannot proceed to commit or rollback its transaction. This scenario is
    causes the SQLITE_BUSY in SQLite, but will also deadlock a row locking
    DBMS. It is the detection of this that complicates the issue for row
    locking DBMS.


    >>By "enterprise" I meant the usual IT definition, a step up from
    >>"department".
    >>
    >>If you are interested in the complexity of MVCC take a look at the
    >>source of PostgreSQL.
    >>

    >other point, and not a hostile one, is the conception of a lock.
    >>You seem a little confused between internal synchronisation and locks
    >>set by the DBMS for the benefit of applications.
    >>


    No. Just overloading the use of the term lock. In a central server based
    DBMS, much of what would require file locking in SQLite can be achieved
    using lighter, quicker IPC primitives. PostgreSQL uses semaphores for
    coordination between servers. Such locks are much quicker and fine grained
    than file locks. But it is still a lock, just as a file lock is a lock as
    well. All locks are used for synchronisation. Internal or external is just
    a matter of boundaries.

    In summary:
    - SQLite uses file locks to coodinate concurrent access at the database
    level.
    - Row locking central DBMS uses row read/write locks to coordinate access
    at the row level. Metadata updates must also be coordinated with locks.
    - MVCC central DBMS uses multiple versions of rows to remove row level
    locking, and only requires any locking for metadata updates, such as
    updating a btree or writing an update to the WAL (write ahead log).
    - A lock is a lock from a conceptual point of view. How a lock is
    implemented is a detail of the lock requirements.

    I think we're mostly in agreement on all these points, other than the
    relative complexity of MVCC against row level locking. I contest that MVCC
    is conceptually simpler than row level locking, from the point of view of
    deadlock detection and recovery, as well as increasing concurrency.

    Christian

  • No.10 | | 1037 bytes | |

    Using 3.2.7

    I have a 6.5 gb database with 29,587 records in it. It takes about
    30-40 seconds for a count to return the first time I specify one in
    "Sqlite3" (seeing this in my program as well). Subsequent "count"
    operations are nearly instantaneous even when I exit Sqlite3 and
    restart. I'm pretty much seeing that across the board with other DB's
    as well, that count take a fairly long time the first time it's called
    and then it's fast.

    Is there any way I can speed this up? I use "count" to set my progress
    bars to proper ranges so, the delay causes people to wait.

    schema
    CREATE TABLE Files_V1
    (
    F INTEGER ,
    Filename TEXT,
    Length INTEGER,
    Signature BLB,
    Date INTEGER,
    Topic TEXT,
    Title TEXT,
    Chapter TEXT,
    User1 TEXT,
    User2 TEXT,
    File BLB, PRIMARY KEY(Signature)
    );
    CREATE INDEX FileIndex_V1 N Files_V1 (Filename);
    CREATE INDEX TitleIndex_V1 N Files_V1 (Title);

    "select count(F) from Files_V1;"
  • No.11 | | 1767 bytes | |

    Thu, Dec 08, 2005 at 12:26:44PM -0500, Teg wrote:
    I have a 6.5 gb database with 29,587 records in it. It takes about
    30-40 seconds for a count to return the first time I specify one in
    "Sqlite3" (seeing this in my program as well). Subsequent "count"
    operations are nearly instantaneous even when I exit Sqlite3 and
    restart. I'm pretty much seeing that across the board with other DB's
    as well, that count take a fairly long time the first time it's called
    and then it's fast.

    Is there any way I can speed this up? I use "count" to set my progress
    bars to proper ranges so, the delay causes people to wait.

    Most likely the time is being spent reading parts of the database from
    disk into memory buffers. After it is read once, the S caches it if
    you have enough spare RAM. If you want it to be faster, your goal
    should be to reduce the amount of data that has to be read.

    If you give count() an argument, it counts the number of times that
    argument is non-null. Because you are using count(F) and
    there is no index on this column, the entire database (minus the
    overflow pages from your blobs) is read.

    So your goal will be to come up with a "select count()" statement that
    doesn't need to read as much data. Try using ".explain" then "explain
    select count(F) from Files_V1;" to see what is happening.
    Determine if it is using an index, or reading the whole table.

    Then try "select count(*) from Files_V1" and see if that is much
    faster. If not, try doing the count on Signature to see if you can
    force it to use the index. Failing that, you could always make a
    small table of just F and do the count on that.

    Good luck!

  • No.12 | | 2198 bytes | |

    Hello Nathan,

    Thanks. Counting on the primary key made a HUGE difference in
    performance. I didn't really expect that but, am pleased with the
    results.

    I did the .explain and explain <querybut, wasn't sure what it was
    telling me.

    C

    Thursday, December 8, 2005, 1:14:10 PM, you wrote:

    NKThu, Dec 08, 2005 at 12:26:44PM -0500, Teg wrote:
    >I have a 6.5 gb database with 29,587 records in it. It takes about
    >30-40 seconds for a count to return the first time I specify one in
    >"Sqlite3" (seeing this in my program as well). Subsequent "count"
    >operations are nearly instantaneous even when I exit Sqlite3 and
    >restart. I'm pretty much seeing that across the board with other DB's
    >as well, that count take a fairly long time the first time it's called
    >and then it's fast.
    >
    >Is there any way I can speed this up? I use "count" to set my progress
    >bars to proper ranges so, the delay causes people to wait.


    NKMost likely the time is being spent reading parts of the database from
    NKdisk into memory buffers. After it is read once, the S caches it if
    NKyou have enough spare RAM. If you want it to be faster, your goal
    NKshould be to reduce the amount of data that has to be read.

    NKIf you give count() an argument, it counts the number of times that
    NKargument is non-null. Because you are using count(F) and
    NKthere is no index on this column, the entire database (minus the
    NKoverflow pages from your blobs) is read.

    NKSo your goal will be to come up with a "select count()" statement that
    NKdoesn't need to read as much data. Try using ".explain" then "explain
    NKselect count(F) from Files_V1;" to see what is happening.
    NKDetermine if it is using an index, or reading the whole table.

    NKThen try "select count(*) from Files_V1" and see if that is much
    NKfaster. If not, try doing the count on Signature to see if you can
    NKforce it to use the index. Failing that, you could always make a
    NKsmall table of just F and do the count on that.

    NKGood luck!

    NK

Re: sluggish performance in MacOS 10.4


max 4000 letters.
Your nickname that display:
In order to stop the spam: 3 + 2 =
QUESTION ON "Databases"

EMSDN.COM