Databases

  • indexing for left join

    5 answers - 1220 bytes

    I have two tables:
    CREATE TABLE ITEM
    (
    ITEM_PK serial,
    RETAIL_PRICE numeric (7,2) NOT NULL,
    PRIMARY KEY (ITEM_PK)
    );
    CREATE TABLE SERIAL_NO
    (
    SERIAL_NO_PK serial,
    NO varchar (20) NOT NULL,
    NAME varchar (20),
    ITEM_FK integer NOT NULL,
    PRIMARY KEY (SERIAL_NO_PK)
    );
    Common query:
    SELECT ITEM.ITEM_PK FROM ITEM
    LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
    WHERE SERIAL_NO.NO ='WX1234'
    GROUP BY ITEM.ITEM_PK
    Table ITEM will eventually grow very big and SERIAL_NO will grow with
    it. There will normally be zero or one SERIAL_NO per ITEM; few ITEMs
    will have more than one SERIAL_NO.
    I have created an index for SERIAL_NO.NO and one for SERIAL_NO.ITEM_FK
    for the above query.
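    To reproduce this setup without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. Table and column names follow the post (lower-cased); the index names `serial_no_no_idx` and `serial_no_item_fk_idx` are hypothetical, `INTEGER PRIMARY KEY` substitutes for PostgreSQL's `serial`, and the sample rows are made up. The two `CREATE INDEX` statements are also valid PostgreSQL.

```python
import sqlite3

def build_schema(conn: sqlite3.Connection) -> None:
    """Create the two tables from the post plus the two indexes (SQLite stand-in)."""
    conn.executescript("""
        CREATE TABLE item (
            item_pk      INTEGER PRIMARY KEY,   -- PostgreSQL: serial
            retail_price NUMERIC(7,2) NOT NULL
        );
        CREATE TABLE serial_no (
            serial_no_pk INTEGER PRIMARY KEY,   -- PostgreSQL: serial
            "no"         VARCHAR(20) NOT NULL,  -- quoted because NO is an SQL keyword
            name         VARCHAR(20),
            item_fk      INTEGER NOT NULL
        );
        -- One index for the WHERE predicate, one for the join column.
        CREATE INDEX serial_no_no_idx      ON serial_no ("no");
        CREATE INDEX serial_no_item_fk_idx ON serial_no (item_fk);
    """)

# The "common query" from the post, with the serial number as a parameter.
COMMON_QUERY = """
    SELECT item.item_pk FROM item
    LEFT JOIN serial_no ON serial_no.item_fk = item.item_pk
    WHERE serial_no."no" = ?
    GROUP BY item.item_pk
"""

conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.executemany("INSERT INTO item (item_pk, retail_price) VALUES (?, ?)",
                 [(1, 9.99), (2, 19.99), (3, 4.50)])
conn.executemany('INSERT INTO serial_no (serial_no_pk, "no", item_fk) VALUES (?, ?, ?)',
                 [(1, "WX1234", 2), (2, "AB0001", 3)])
print(conn.execute(COMMON_QUERY, ("WX1234",)).fetchall())  # [(2,)]
```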
    I ran an EXPLAIN:
    HashAggregate  (cost=1.06..1.06 rows=1 width=4)
      ->  Nested Loop  (cost=0.00..1.06 rows=1 width=4)
            Join Filter: ("inner".item_fk = "outer".item_pk)
            ->  Seq Scan on item  (cost=0.00..0.00 rows=1 width=4)
            ->  Seq Scan on serial_no  (cost=0.00..1.05 rows=1 width=4)
                  Filter: (("no")::text = 'WX1234'::text)
    Sequential scans despite the indices? Is this because the tables of my
    test DB are virtually empty?
    Many thanks in advance.
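    Each plan node line in EXPLAIN output has the form `Node  (cost=startup..total rows=N width=W)`: the estimated start-up cost, the estimated total cost, the estimated number of output rows and the estimated average row width in bytes. The helper below is a hypothetical sketch (not part of any PostgreSQL tool) that pulls those four estimates out of a text-format plan line with a regular expression.

```python
import re
from typing import NamedTuple, Optional

class NodeEstimate(NamedTuple):
    node: str
    startup_cost: float
    total_cost: float
    rows: int
    width: int

# Matches e.g. "->  Seq Scan on item  (cost=0.00..1.03 rows=3 width=4)"
_NODE_RE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>.+?)\s+"
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
    r" rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)

def parse_plan_line(line: str) -> Optional[NodeEstimate]:
    """Return the planner's estimates for one EXPLAIN line, or None for
    detail lines such as 'Filter:' or 'Join Filter:' that carry no cost."""
    m = _NODE_RE.match(line)
    if m is None:
        return None
    return NodeEstimate(m["node"], float(m["startup"]), float(m["total"]),
                        int(m["rows"]), int(m["width"]))

est = parse_plan_line("      ->  Seq Scan on serial_no  (cost=0.00..1.05 rows=1 width=4)")
print(est.node, est.total_cost)  # Seq Scan on serial_no 1.05
```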
  • No.1 | 261 bytes

    > Sequential despite the indices? Is this because the tables of my test
    > DB are virtually empty?

    This is it. PostgreSQL changes strategies with data load. Performance
    testing must be done on an approximation of the real data (both values
    and size).
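    Following that advice means loading the test database with rows that mirror production in both volume and distribution. A sketch, assuming the distribution stated in the question (most items have zero or one serial number, a few have more); the ratios, row count and serial-number format are made-up placeholders, not figures from the thread.

```python
from typing import List, Tuple

ItemRow = Tuple[int, float]          # (item_pk, retail_price)
SerialRow = Tuple[int, str, int]     # (serial_no_pk, no, item_fk)

def generate_rows(n_items: int) -> Tuple[List[ItemRow], List[SerialRow]]:
    """Build item and serial_no rows with a skewed, deterministic distribution:
    every 100th item gets three serials, other even items get none, odd items
    get one. (Hypothetical ratios for illustration only.)"""
    items: List[ItemRow] = []
    serials: List[SerialRow] = []
    serial_pk = 0
    for item_pk in range(1, n_items + 1):
        items.append((item_pk, 10.00))
        if item_pk % 100 == 0:
            count = 3
        elif item_pk % 2 == 0:
            count = 0
        else:
            count = 1
        for _ in range(count):
            serial_pk += 1
            serials.append((serial_pk, f"WX{serial_pk:06d}", item_pk))
    return items, serials

items, serials = generate_rows(1000)
print(len(items), len(serials))  # 1000 530
```

    In PostgreSQL such rows would typically be bulk-loaded (for example with `COPY`) and the tables analysed before reading any EXPLAIN output.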
  • No.2 | 243 bytes

    T E Schmitz wrote:
    > Sequential despite the indices? Is this because the tables of my test
    > DB are virtually empty?

    Yes - read up on ANALYSE and column statistics for details. You've
    probably also missed reading about vacuuming.
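    The planner chooses between a sequential scan and an index scan from stored statistics, and those exist only after the table has been analysed. In PostgreSQL the commands are `ANALYZE` (or `VACUUM ANALYZE`), and the gathered column statistics can be inspected in the `pg_stats` view. The sketch below shows the same idea with SQLite, whose `ANALYZE` writes into the `sqlite_stat1` table; it is an analogy only, and the index name `serial_no_no_idx` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE serial_no (
        serial_no_pk INTEGER PRIMARY KEY,
        "no"         VARCHAR(20) NOT NULL,
        item_fk      INTEGER NOT NULL
    );
    CREATE INDEX serial_no_no_idx ON serial_no ("no");
""")
conn.executemany('INSERT INTO serial_no ("no", item_fk) VALUES (?, ?)',
                 [(f"WX{i:06d}", i) for i in range(1, 1001)])
conn.commit()

def has_stats(c: sqlite3.Connection) -> bool:
    """True once ANALYZE has created SQLite's statistics table."""
    row = c.execute("SELECT 1 FROM sqlite_master WHERE name = 'sqlite_stat1'").fetchone()
    return row is not None

print(has_stats(conn))  # False: no statistics collected yet
conn.execute("ANALYZE")
print(has_stats(conn))  # True
# The first number in 'stat' is the row count the planner will assume for the index.
print(conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'serial_no'").fetchall())
```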
  • No.3 | 930 bytes

    Milorad Poluga wrote:
    > Try to execute this modification of your query:
    >
    > SELECT ITEM.ITEM_PK FROM ITEM
    > LEFT JOIN SERIAL_NO
    > ON ( SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
    > AND SERIAL_NO.NO ='WX1234' )
    > GROUP BY ITEM.ITEM_PK
    >
    >>SELECT ITEM.ITEM_PK FROM ITEM
    >>LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
    >>WHERE SERIAL_NO.NO ='WX1234'
    >>GROUP BY ITEM.ITEM_PK

    For my small test DB, both queries result in the same strategy.
    The query will be generated by an object-relational interface depending
    on the user's search criteria. It will definitely be of the form I
    specified.

    I wanted to make sure that I have chosen the indices correctly. I am
    presuming that, once the tables are big, the index on SERIAL_NO.NO will
    be used for the WHERE clause and the one on SERIAL_NO.ITEM_FK for the join.
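    How indexes serve this query is easiest to see in a hand-rolled index nested-loop join. In this sketch Python dicts stand in for B-tree indexes (a hypothetical simplification, with made-up rows): it drives from `serial_no` through a lookup on `no`, then fetches each matching item by primary key. In this plausible plan shape the probe into `item` goes through its primary key rather than `item_fk`; an index on `item_fk` helps when the join is driven from the `item` side instead. Which plan PostgreSQL actually picks depends on its statistics.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows: item(item_pk, retail_price), serial_no(serial_no_pk, no, item_fk)
items: List[Tuple[int, float]] = [(1, 9.99), (2, 19.99), (3, 4.50)]
serials: List[Tuple[int, str, int]] = [(1, "WX1234", 2), (2, "AB0001", 3), (3, "WX1234", 3)]

# Dicts standing in for the indexes: key value -> matching rows.
idx_serial_no: Dict[str, List[Tuple[int, str, int]]] = defaultdict(list)
for row in serials:
    idx_serial_no[row[1]].append(row)
pk_item: Dict[int, Tuple[int, float]] = {row[0]: row for row in items}  # primary key

def find_items_by_serial(no: str) -> List[int]:
    """Index nested-loop join: probe serial_no via its 'no' index, then fetch
    each item by primary key; a set plays the role of GROUP BY item_pk."""
    found = set()
    for _serial_pk, _no, item_fk in idx_serial_no.get(no, []):  # outer: index lookup on no
        item = pk_item.get(item_fk)                              # inner: primary-key lookup
        if item is not None:
            found.add(item[0])
    return sorted(found)

print(find_items_by_serial("WX1234"))  # [2, 3]
```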
  • No.4 | 385 bytes

    Rod Taylor wrote:
    >>Sequential despite the indices? Is this because the tables of my test
    >>DB are virtually empty?

    > This is it. PostgreSQL changes strategies with data load. Performance
    > testing must be done on an approximation of the real data (both values
    > and size).

    Thanks for your responses.
  • No.5 | 2047 bytes

    I am new to PostgreSQL, but isn't this query the same as doing an INNER
    JOIN?

    For a true LEFT JOIN, should it not be as follows?

    SELECT ITEM.ITEM_PK
    FROM ITEM
    LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
    AND SERIAL_NO.NO ='WX1234'
    GROUP BY ITEM.ITEM_PK
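    The difference is semantic, not just a planning detail: a WHERE predicate on a column of the right-hand table discards the NULL-extended rows that LEFT JOIN produces for unmatched items, so the result equals an inner join, while the same predicate inside ON only limits which serial_no rows match and every item is kept. A runnable check, using SQLite as a stand-in and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_pk INTEGER PRIMARY KEY);
    CREATE TABLE serial_no (serial_no_pk INTEGER PRIMARY KEY,
                            "no" TEXT NOT NULL, item_fk INTEGER NOT NULL);
    INSERT INTO item VALUES (1), (2), (3);
    INSERT INTO serial_no VALUES (1, 'WX1234', 2), (2, 'AB0001', 3);
""")

def item_pks(sql: str) -> list:
    """Run a query and return the first column of every row."""
    return [r[0] for r in conn.execute(sql)]

# Predicate in WHERE: unmatched (NULL-extended) items are filtered out.
left_where = item_pks("""SELECT item.item_pk FROM item
    LEFT JOIN serial_no ON serial_no.item_fk = item.item_pk
    WHERE serial_no."no" = 'WX1234'
    GROUP BY item.item_pk ORDER BY item.item_pk""")
# Plain inner join with the same predicate.
inner = item_pks("""SELECT item.item_pk FROM item
    JOIN serial_no ON serial_no.item_fk = item.item_pk
    WHERE serial_no."no" = 'WX1234'
    GROUP BY item.item_pk ORDER BY item.item_pk""")
# Predicate in ON: every item survives, matched or not.
left_on = item_pks("""SELECT item.item_pk FROM item
    LEFT JOIN serial_no ON serial_no.item_fk = item.item_pk
                       AND serial_no."no" = 'WX1234'
    GROUP BY item.item_pk ORDER BY item.item_pk""")

print(left_where, inner, left_on)  # [2] [2] [1, 2, 3]
```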

    Using AND instead of WHERE for the predicate on SERIAL_NO.NO results
    in very different plans despite the immature statistics. The following
    plan is for the true LEFT JOIN.

    QUERY PLAN

    HashAggregate  (cost=2.10..2.13 rows=3 width=4)
      ->  Hash Left Join  (cost=1.04..2.10 rows=3 width=4)
            Hash Cond: ("outer".item_pk = "inner".item_fk)
            ->  Seq Scan on item  (cost=0.00..1.03 rows=3 width=4)
            ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                  ->  Seq Scan on serial_no  (cost=0.00..1.04 rows=1 width=4)
                        Filter: (("no")::text = 'WX1234'::text)
    (7 rows)
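    Read bottom-up, that plan filters serial_no, loads the surviving rows into a hash table keyed on item_fk, then streams every item row through it, emitting the item even when no key matches; HashAggregate then performs the GROUP BY. A simplified in-memory sketch of those steps in plain Python (not PostgreSQL's executor; rows are made up):

```python
from typing import Dict, List, Optional, Tuple

SerialRow = Tuple[int, str, int]  # (serial_no_pk, no, item_fk)

def hash_left_join_group(items: List[Tuple[int]], serials: List[SerialRow], no: str
                         ) -> Tuple[List[Tuple[int, Optional[SerialRow]]], List[int]]:
    # Hash node: build side = serial_no rows passing the Filter, keyed by item_fk.
    table: Dict[int, List[SerialRow]] = {}
    for row in serials:
        if row[1] == no:
            table.setdefault(row[2], []).append(row)
    # Hash Left Join: probe with every item; unmatched items get None (NULL).
    joined: List[Tuple[int, Optional[SerialRow]]] = []
    for (item_pk,) in items:
        matches = table.get(item_pk)
        if matches:
            joined.extend((item_pk, s) for s in matches)
        else:
            joined.append((item_pk, None))  # NULL-extended row
    # HashAggregate: one output row per item_pk, in first-seen order.
    grouped = list(dict.fromkeys(pk for pk, _ in joined))
    return joined, grouped

items = [(1,), (2,), (3,)]
serials = [(1, "WX1234", 2), (2, "AB0001", 3), (3, "WX1234", 2)]
joined, grouped = hash_left_join_group(items, serials, "WX1234")
print(grouped)  # [1, 2, 3]
```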

    The next plan, which is very similar to your original plan, is for the
    INNER JOIN you described.

    QUERY PLAN

    HashAggregate  (cost=2.11..2.12 rows=1 width=4)
      ->  Nested Loop  (cost=0.00..2.11 rows=1 width=4)
            Join Filter: ("outer".item_fk = "inner".item_pk)
            ->  Seq Scan on serial_no  (cost=0.00..1.04 rows=1 width=4)
                  Filter: (("no")::text = 'WX1234'::text)
            ->  Seq Scan on item  (cost=0.00..1.03 rows=3 width=4)
    (6 rows)
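    This plan builds no hash table: for each serial_no row that survives the Filter, the Nested Loop rescans item and evaluates the Join Filter on every pair, which is cheap here only because both tables are tiny (the work grows with the product of the two row counts). A plain-Python sketch of that strategy, with made-up rows:

```python
from typing import List, Tuple

def nested_loop_join_group(items: List[Tuple[int]],
                           serials: List[Tuple[int, str, int]], no: str) -> List[int]:
    result: List[int] = []
    for _pk, serial_value, item_fk in serials:  # outer: Seq Scan on serial_no
        if serial_value != no:                   # Filter
            continue
        for (item_pk,) in items:                 # inner: Seq Scan on item, rescanned per outer row
            if item_fk == item_pk:               # Join Filter
                result.append(item_pk)
    return list(dict.fromkeys(result))           # HashAggregate: one row per item_pk

items = [(1,), (2,), (3,)]
serials = [(1, "WX1234", 2), (2, "AB0001", 3), (3, "WX1234", 2)]
print(nested_loop_join_group(items, serials, "WX1234"))  # [2]
```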

    I won't speculate on how these plans would converge or diverge as the
    tables grew and the statistics matured.
    - Zulq Alam


EMSDN.COM