Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • serious performance problems with indexes

    0 answers - 7227 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    If you pre-sort the data prior to inserting into sqlite you will
    see much better timings for both the default cache size and
    the larger cache size
    When I run your script (unmodified, except for the removal of
    MySQL) up to 400000:
    tablesize sqlite sqlite-largecache
    0 1.956 1.956
    20000 2.064 2.079
    40000 2.095 2.111
    60000 2.173 2.173
    80000 2.189 2.283
    100000 2.345 2.298
    120000 2.345 2.360
    140000 2.407 2.361
    160000 2.470 2.423
    180000 2.548 2501
    200000 2.564 2.439
    220000 2.830 2.439
    240000 2.876 2.907
    260000 2.829 2.564
    280000 3.423 3.533
    300000 4.939 3.564
    320000 7.236 3.736
    340000 7.283 3.751
    360000 10.611 3.767
    380000 11.142 3.845
    400000 13.736 3.798
    When I make the following change to your script to simulate an ordered data set
    (okay, it's cheating slightly - no pre-sort need be performed here):
    # set r [expr {int(rand()*500000)}]
    set r [expr {($num+$i)*5}]
    tablesize sqlite sqlite-largecache
    0 1.878 1.894
    20000 1.925 1.923
    40000 1.923 1.923
    60000 1.923 1.954
    80000 1.970 1.923
    100000 1.923 1.908
    120000 1.923 1.970
    140000 1.940 2.383
    160000 2.048 1.908
    180000 1.923 2.002
    200000 1923 1.923
    220000 1.939 1.923
    240000 1.923 1.923
    260000 1.939 1.923
    280000 1.939 1.939
    300000 1.954 1.939
    320000 1.939 1.923
    340000 1.970 1.954
    360000 1.939 1.954
    380000 1.923 1.939
    400000 1.970 1.939
    I wonder if other databases pre-sort their batch inserts by
    index order to achieve more consistant insert speed with
    datasets that exceed cache size.
    Message
    From: Peter De Rijk <Peter.DeRijk (AT) ua (DOT) ac.be>
    To: sqlite-users (AT) sqlite (DOT) org
    Sent: Saturday, 28, 2006 10:40:03 AM
    Subject: Re: [sqlite] serious performance problems with indexes
    I have checked more fine grained (results and code at the end of the
    mail), and it seems indeed that the extreme growth of time needed
    for inserts only starts at around 160000 rows (on my system). Before
    that, it also rises, but much much more slowly. As this indeed
    looked like a cache effect, I tried setting the default_cache_size
    to something big (20000). This was enough to remove the bend (at
    least till 2000000 rows), so that the increase in time kept going
    at the same slower pace as for the smaller data sizes.
    Curiously, mysql which I included as a reference, manages to
    keep nearly the same insert time even for the largest data sets.
    Using the large cache size, sqlite is faster than mysql until about
    700000 rows. At 2000000 rows mysql is about 3 times faster than sqlite
    with large cache (vs more than 20 times faster than sqlite with
    default cache size).
    tablesize sqlite sqlite-largecache mysql
    0 1.12 1.04 2.86
    20000 1.09 1.17 2.89
    40000 1.14 1.15 3.27
    60000 1.23 1.2 3.18
    80000 1.47 1.25 3.01
    100000 1.3 1.32 2.94
    120000 1.4 1.34 2.94
    140000 2.06 1.38 3
    160000 2.41 1.46 2.99
    180000 3.09 1.54 3.03
    200000 3.31 1.62 3.04
    220000 3.84 1.72 3.06
    240000 4.22 17 3.03
    260000 4.23 1.76 3.06
    280000 4.83 1.94 2.98
    300000 6.27 2.08 3.07
    320000 9.02 2.06 3.11
    340000 9.48 1.94 2.88
    360000 10 2.13 3.04
    380000 10.6 2.21 3.13
    400000 10.72 2.35 3.05
    420000 11.35 2.24 3.11
    440000 11.55 2.49 3.05
    460000 11.75 2.5 3.07
    480000 16.89 2.39 3.1
    500000 17.81 2.59 3.09
    520000 19.22 2.4 3.03
    540000 19.35 2.77 3.05
    560000 19.44 2.59 3.1
    580000 19.87 2.67 3.08
    600000 21.47 2.88 3.09
    620000 20.82 2.74 3.08
    640000 21.55 3 3.11
    660000 25.17 3.09 3.12
    680000 29.89 2.84 3.16
    700000 28.48 3.25 3.09
    720000 28.62 3.31 3.2
    740000 2869 3.23 3.07
    760000 30.34 3.59 3.1
    780000 30.27 364 3.13
    800000 30.57 3.37 3.14
    820000 31.41 3.57 3.06
    840000 31.89 3.89 3.09
    860000 33.77 3.57 3.05
    880000 36.01 3.87 3.1
    900000 37.58 4.04 3.07
    920000 40.59 3.68 3.04
    940000 40.9 3.72 3.09
    960000 41.45 4.24 3.1
    980000 42.05 4.28 3.13
    1000000 41.11 4.09 3.03
    1020000 42.29 4.63 3.11
    1040000 42.09 4.81 317
    1060000 42.8 4.34 3.14
    1080000 43.82 4.43 3.02
    1100000 43.01 4.5 3.12
    1120000 44.08 4.84 3.1
    1140000 45.4 5.08 3.16
    1160000 47.51 4.66 3.13
    1180000 4746 5.02 3.17
    1200000 48.21 5.42 3.17
    1220000 49.09 5.33 3.21
    1240000 49.65 5.12 3.1
    1260000 50.55 5.3 3.11
    1280000 50.56 5.8 3.17
    1300000 52.35 5.7 3.17
    1320000 53.67 5.43 3.14
    1340000 52.38 5.79 3.26
    1360000 51.44 6.03 3.18
    1380000 52.69 6.16 3.08
    1400000 53.88 5.67 3.04
    1420000 52.61 5.86 3.2
    1440000 5361 5.92 3.01
    1460000 52.98 5.98 3.08
    1480000 54.74 6.27 3.18
    1500000 55.4 6.81 3.18
    1520000 58.45 6.54 3.12
    1540000 57.14 6.65 3.12
    1560000 58.92 6.97 317
    1580000 58.17 6.86 3.22
    1600000 58.29 6.83 3.15
    1620000 58.19 6.6 3.18
    1640000 58.77 7.3 3.16
    1660000 60.59 7.44 3.22
    1680000 61.12 7.14 3.18
    1700000 61.25 7.41 3.33
    1720000 62.2 7.18 3.21
    1740000 61.48 7.95 3.02
    1760000 62.63 7.91 3.17
    1780000 62.58 7.52 3.19
    1800000 62.28 7.56 3.14
    1820000 64.2 7.62 3.33
    1840000 64.18 8.6 3.17
    1860000 64.34 8.71 3.33
    1880000 63.93 8.44 3.22
    1900000 62.98 8.13 3.3
    1920000 63.9 8.45 3.29
    1940000 64.85 8.36 3.21
    1960000 68.26 10.75 4.81
    1980000 67.81 9.46 3.17
    code
    proc runtest {sqlfile} {
    set result {}
    set delay 1000
    exec sync; after $delay;
    set t [time "exec sqlite3 def.db <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    exec sync; after $delay;
    set t [time "exec sqlite3 lc.db <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    exec sync; after $delay;
    set t [time "exec mysql -f drh <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    return $result
    }
    # Initialize the environment
    #
    expr srand(1)
    catch {exec /bin/sh -c {rm -f s*.db}}
    set fd [open clear.sql w]
    puts $fd {
    drop table t1;
    drop table t2;
    }
    close $fd
    catch {exec mysql drh <clear.sql}
    set fd [open 2kinit.sql w]
    puts $fd {
    PRAGMA default_cache_size=2000;
    }
    close $fd
    exec sqlite3 def.db <2kinit.sql
    set fd [open nosync-init.sql w]
    puts $fd {
    PRAGMA default_cache_size=20000;
    }
    close $fd
    exec sqlite3 lc.db <nosync-init.sql
    # initialize
    set fd [open test.sql w]
    puts $fd "BEGIN;"
    puts $fd "CREATE TABLE t1(a INTEGER, b INTEGER);"
    puts $fd "CREATE INDEX tq_b on t1(b);"
    puts $fd "CMMIT;"
    close $fd
    runtest test.sql
    set o [open results.csv w]
    puts $o "tablesize\tsqlite\tsqlite-largecache\t mysql"
    set step 20000
    for {set num 0} {$num < 2000000} {incr num $step} {
    puts $num
    set fd [open test.sql w]
    puts $fd "BEGIN;"
    for {set i 1} {$i<=$step} {incr i} {
    set r [expr {int(rand()*500000)}]
    puts $fd "INSERT INT t1 VALUES([expr {$num+$i}],$r);"
    }
    puts $fd "CMMIT;"
    close $fd
    puts $o $num\t[join [runtest test.sql] \t]
    flush $o
    }
    close $o

Re: serious performance problems with indexes


max 4000 letters.
Your nickname that display:
In order to stop the spam: 9 + 8 =
QUESTION ON "Databases"

EMSDN.COM