  • Incremental updates / slow searches.

    The biggest thing would be to limit how often you open a new
    IndexSearcher, and when you do, warm up the new searcher in the
    background while you continue serving searches with the existing
    searcher. This is the strategy that Solr uses.
    There is also the question of whether you are analyzing/merging docs on the
    same servers that you are executing searches on. You can use a
    separate box to build the index and distribute changes to boxes used
    for searching.
    -Yonik
    Solr, the open-source Lucene search server
    On 10/9/06, Rickard B <backman.rickard (AT) gmail (DOT) com> wrote:
    Hi,
    we are using a search system based on Lucene and have recently tried to add
    incremental updating of the index instead of building a new index every now
    and then. However, we now run into problems, as our searches start to take
    a very long time to complete.
    Our index is about 8-9 GB, and we are sending lots of updates per second
    (we are probably merging in 200-300 in a few seconds). Today we buffer a
    bunch of updates and then merge them into the existing index like a batch,
    first doing deletes and then inserts.
    We are currently not using any special tuning of Lucene.
    Does anyone have any similar experience with Lucene, or advice on how to
    reduce the time it takes to perform a search? In particular, what
    would be an optimal combination of update size, merge factor, and max
    buffered docs?
    /Rickard
    --
    To unsubscribe, e-mail: java-user-unsubscribe (AT) lucene (DOT) apache.org
    For additional commands, e-mail: java-user-help (AT) lucene (DOT) apache.org
  • No.1

    Don't forget to optimize your index every now and then as well. Deleting
    a document just marks it as "deleted"; it still gets inspected by every
    query during scoring at least once to see that it can be skipped.
    Optimizing is the only thing that truly removes the "deleted" documents.

    : Date: Mon, 9 Oct 2006 13:49:34 -0400
    : From: Yonik Seeley <yonik (AT) apache (DOT) org>
    : Reply-To: java-user (AT) lucene (DOT) apache.org
    : To: java-user (AT) lucene (DOT) apache.org
    : Subject: Re: Incremental updates / slow searches.

    -Hoss

  • No.2

    On 10/9/06, Chris Hostetter <hossman_lucene (AT) fucit (DOT) org> wrote:
    Don't forget to optimize your index every now and then as well. Deleting
    a document just marks it as "deleted"; it still gets inspected by every
    query during scoring at least once to see that it can be skipped.
    Optimizing is the only thing that truly removes the "deleted" documents.

    I'd refine that statement to "optimizing is the easiest way to remove
    any deleted documents that still exist in the index".

    Deleted documents are removed from segments that are merged, so it
    depends on things like the mergeFactor, maxBufferedDocs, and where the
    deleted docs are in the index (in the smallest or largest segments).
    Some deleted docs will be removed quickly, but some won't.

    Optimizing an index also has a beneficial effect on search speed even
    beyond removing all of the deleted docs. Each index segment is
    actually a complete index on its own, so if search is generally
    O(log(N)), searching across M segments of size N will take
    O(M * log(N)). If those segments are "optimized" into a single segment,
    the search will be O(log(M*N)), i.e. O(log(M) + log(N)).
    -Yonik
    Solr, the open-source Lucene search server

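The warm-then-swap strategy from the first answer can be sketched in plain Java. This is a minimal, hypothetical illustration, not Lucene or Solr code. The `Searcher` interface, `SnapshotSearcher`, and `SearcherSwap` names are made up, and "warming" here only runs a few queries against the new searcher before it is published. Requests keep reading the old searcher from an `AtomicReference` until the background thread has finished opening and warming the new one.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class SearcherSwap {

    /** Hypothetical stand-in for a searcher opened on one point-in-time index snapshot. */
    interface Searcher {
        String search(String query);
        void warm(List<String> warmupQueries);
    }

    /** Toy searcher that tags each result with its snapshot version and warm state. */
    static final class SnapshotSearcher implements Searcher {
        private final String version;
        private volatile boolean warmed;

        SnapshotSearcher(String version) {
            this.version = version;
        }

        public String search(String query) {
            return version + (warmed ? "/warm:" : "/cold:") + query;
        }

        public void warm(List<String> warmupQueries) {
            for (String q : warmupQueries) {
                search(q); // here this only runs the query; a real engine would fill its caches this way
            }
            warmed = true;
        }
    }

    private final AtomicReference<Searcher> live;
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    SearcherSwap(Searcher initial) {
        live = new AtomicReference<>(initial);
    }

    /** Every request reads the currently published searcher; no lock on the hot path. */
    String search(String query) {
        return live.get().search(query);
    }

    /** Opens and warms a new searcher off the request path, then publishes it atomically. */
    Future<?> reopenInBackground(Supplier<Searcher> opener, List<String> warmupQueries) {
        return background.submit(() -> {
            Searcher fresh = opener.get();
            fresh.warm(warmupQueries);
            live.set(fresh); // until this line, the old searcher keeps serving traffic
        });
    }

    void shutdown() {
        background.shutdown();
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SearcherSwap swap = new SearcherSwap(new SnapshotSearcher("v1"));
        System.out.println(swap.search("lucene")); // v1/cold:lucene
        Future<?> pending = swap.reopenInBackground(
                () -> new SnapshotSearcher("v2"), List.of("solr", "index"));
        pending.get(); // waiting only to make the demo deterministic
        System.out.println(swap.search("lucene")); // v2/warm:lucene
        swap.shutdown();
    }
}
```

A real implementation would also have to close the old searcher once in-flight searches are done with it. Later Lucene releases provide `SearcherManager` for this, with `acquire()`/`release()` reference counting and a `SearcherFactory` hook that can run warming queries on each new searcher.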

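The O(M * log(N)) versus O(log(M*N)) argument in the last answer can be checked with a toy model. The sketch below assumes each segment's term lookup behaves like a binary search over a sorted array (Lucene's actual term dictionary is more involved), and counts probes for a term that is absent from the index, so every segment has to be searched to the end. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SegmentLookupCost {

    /** Binary search for key in a sorted array, returning the number of probes made. */
    static int probes(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        int count = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            count++;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else if (sorted[mid] > key) {
                hi = mid - 1;
            } else {
                return count; // found
            }
        }
        return count; // not found
    }

    /** Builds m sorted segments of n "terms" each; segment s holds s, s + m, s + 2m, ... */
    static int[][] segments(int m, int n) {
        int[][] segs = new int[m][n];
        for (int s = 0; s < m; s++) {
            for (int i = 0; i < n; i++) {
                segs[s][i] = s + i * m;
            }
        }
        return segs;
    }

    /** Simulates an optimize: concatenates all segments into one sorted array. */
    static int[] merge(int[][] segs) {
        int[] merged = Arrays.stream(segs).flatMapToInt(Arrays::stream).toArray();
        Arrays.sort(merged);
        return merged;
    }

    /** Each segment is an independent index, so a lookup must probe every one of them. */
    static int probesAcrossSegments(int[][] segs, int key) {
        int total = 0;
        for (int[] seg : segs) {
            total += probes(seg, key);
        }
        return total;
    }

    public static void main(String[] args) {
        int m = 10;
        int n = 1 << 16;
        int[][] segs = segments(m, n);
        int[] merged = merge(segs);
        int missing = -1; // smaller than every stored term, so no segment contains it
        System.out.println("M segments:     " + probesAcrossSegments(segs, missing) + " probes"); // 160
        System.out.println("single segment: " + probes(merged, missing) + " probes"); // 19
    }
}
```

With M = 10 segments of N = 65,536 terms each, the split index costs 10 × 16 = 160 probes, while the merged 655,360-term array needs 19, in line with O(M * log(N)) versus O(log(M) + log(N)).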