  • Inserting a document into an index at a specified position

    All,
    For performance reasons we keep our index of over a million documents ordered
    alphabetically. This way, for an alpha sort, we can just use the index order.
    This works very well, but I'm now looking for a way to insert a single
    document into the index at the correct position.
    Is there any standard way to do this?
    To unsubscribe, e-mail: java-user-unsubscribe (AT) lucene (DOT) apache.org
    For additional commands, e-mail: java-user-help (AT) lucene (DOT) apache.org
  • No.1 | | 1469 bytes | |

    When you say you keep your documents ordered alphabetically, it's confusing
    to me. Are you saying that you pre-sort all your documents then insert them
    one after another so that automatically-generated internal Lucene ID maps
    exactly to the alphabetical ordering? That is, for any document IDs D1 and
    D2 and any documents C1 and C2 (where C1 and C2 are the alphabetical
    representations of the documents, whatever that means) if D1 < D2 then C1 <
    C2?

    The short answer is that you can't insert a document into a Lucene index and
    have any control whatsoever about the assigned document ID. The assigned
    document ID is always greater than the maximum document ID already in your
    index.

    But it doesn't make sense to try. You have documents A, B, D that you index.
    They get IDs 1, 2, 3. Now you want to index document C. What sort of
    document ID would you expect? 2.5? Or do I completely misunderstand your
    problem?
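
    The point above can be illustrated with a toy model. This is a minimal
    sketch and not Lucene code: the class AppendOnlyIndex and its methods are
    hypothetical names invented for the illustration, and it uses zero-based
    IDs. It only assumes what the paragraph states, namely that a new document
    always receives an ID greater than every existing one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an append-only index. NOT Lucene code, only an illustration.
class AppendOnlyIndex {
    private final List<String> docs = new ArrayList<>();

    // Adds a document and returns its ID, which is always the next free slot.
    int add(String name) {
        docs.add(name);
        return docs.size() - 1;
    }

    // Returns the document names in ID order ("index order").
    List<String> inIndexOrder() {
        return new ArrayList<>(docs);
    }

    public static void main(String[] args) {
        AppendOnlyIndex index = new AppendOnlyIndex();
        index.add("A");
        index.add("B");
        index.add("D");
        int idOfC = index.add("C"); // lands after D, not between B and D
        System.out.println(idOfC + " " + index.inIndexOrder()); // prints 3 [A, B, D, C]
    }
}
```

    Once C is appended, index order is no longer alphabetical order, which is
    why a pre-sorted index cannot absorb a single late document.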

    Would it work to just index a field for each document that contains the
    alphabetical representation and use that for retrieval ordering? I *think*
    you can use a FilteredTermEnum with a new Term("field", "") to enumerate all
    the terms in an index (they are guaranteed to be in lexical order).
    Then you let Lucene do your sorting. I'm a little fuzzy on how to go from
    there to a document, but I suspect there's a way.
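
    As a rough illustration of the idea (terms enumerated in lexical order,
    each leading back to its documents), here is a minimal sketch. It is not
    the Lucene API: SortedTermIndex is a hypothetical stand-in built on
    java.util.TreeMap, and how Lucene itself maps a term to documents is left
    open, as in the paragraph above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a term dictionary with postings. NOT the Lucene API.
class SortedTermIndex {
    // TreeMap keeps its keys (the "terms") in lexical order.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Records that the document with this ID carries the given sort key.
    void add(int docId, String sortKey) {
        postings.computeIfAbsent(sortKey, k -> new ArrayList<>()).add(docId);
    }

    // Walks the terms in lexical order and collects the doc IDs behind each one.
    List<Integer> docIdsInTermOrder() {
        List<Integer> ordered = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            ordered.addAll(entry.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        SortedTermIndex index = new SortedTermIndex();
        index.add(0, "A");
        index.add(1, "B");
        index.add(2, "D");
        index.add(3, "C"); // added late, out of alphabetical order
        System.out.println(index.docIdsInTermOrder()); // prints [0, 1, 3, 2]
    }
}
```

    In this model the late document C still comes back in the right place,
    because ordering comes from the terms rather than from the document IDs.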

    Hope this helps
    Erick
  • No.2 | | 830 bytes | |

    All,

    I sent this the other day, but didn't get any responses. I'm hoping that it
    was just missed, so I'm trying again.

    There has to be a better way to insert a document into an index than
    reindexing everything.

    On Wednesday 05 July 2006 5:06 pm, Jason Calabrese wrote:
    All,

    For performance reasons we keep our index of over a million documents
    ordered alphabetically. This way, for an alpha sort, we can just use the
    index order. This works very well, but I'm now looking for a way to insert
    a single document into the index at the correct position.

    Is there any standard way to do this?

  • No.3 | | 1598 bytes | |

    When you say you keep your documents ordered alphabetically, it's confusing
    to me. Are you saying that you pre-sort all your documents then insert them
    one after another so that automatically-generated internal Lucene ID maps
    exactly to the alphabetical ordering? That is, for any document IDs D1 and
    D2 and any documents C1 and C2 (where C1 and C2 are the alphabetical
    representations of the documents, whatever that means) if D1 < D2 then C1 <
    C2?

    Yes, this is a pre-sort. For our application we have some fairly large result
    sets and using the standard sort on a name field was too slow. By
    pre-sorting before we index we can make sure that all the docs are inserted
    in alpha order, and then sort them by index order just as fast or faster than
    the standard relevance sort.

    This:
    Hits hits = searcher.search(query, Sort.INDEXORDER);

    is much faster than:
    Hits hits = searcher.search(query, new Sort("fullname"));

    The short answer is that you can't insert a document into a Lucene index
    and have any control whatsoever about the assigned document ID. The
    assigned document ID is always greater than the maximum document ID already
    in your index.

    I know that there is no direct way to insert a doc at a specified position
    with a single IndexWriter method, but it seems that there should be a better
    way than reindexing everything.

  • No.4 | | 730 bytes | |

    Did you use a Hits object to assemble your results? And is that what you're
    measuring when you say it's slow? In other words, were you measuring the
    time it took to execute the statement

    Hits hits = searcher.search(query, new Sort("fullname"));

    or the time it took to iterate over the Hits object and do something? If the
    latter, your problem may really be the fact that the Hits object re-issues
    the search every 100 retrievals or so (this has been discussed in the mail
    archive) and you'd get satisfactory performance by using a lower-level
    interface such as HitCollector(?) or TopDocs(?).
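
    To show the general idea of collecting only the results you intend to
    display, here is a minimal sketch of bounded top-N selection. It is not
    the Hits, HitCollector, or TopDocs API: TopNames and firstN are
    hypothetical names, and the sketch assumes the goal is the first N names
    in alphabetical order from one pass over the matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy top-N collector. NOT Lucene code, only an illustration of the idea.
class TopNames {
    // Returns the n lexically smallest names in ascending order,
    // holding at most n candidates in memory at any time.
    static List<String> firstN(Iterable<String> names, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the head is the largest of the current candidates.
        PriorityQueue<String> heap =
                new PriorityQueue<String>(n, Collections.<String>reverseOrder());
        for (String name : names) {
            if (heap.size() < n) {
                heap.add(name);
            } else if (name.compareTo(heap.peek()) < 0) {
                heap.poll(); // drop the current largest candidate
                heap.add(name);
            }
        }
        List<String> page = new ArrayList<>(heap);
        Collections.sort(page);
        return page;
    }

    public static void main(String[] args) {
        List<String> matches = Arrays.asList("D", "A", "C", "B");
        System.out.println(firstN(matches, 2)); // prints [A, B]
    }
}
```

    The work per match stays small and the full result set is never sorted,
    which is the property a page-at-a-time display benefits from.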

    , I haven't a clue, but you probably already realized that

    Best
    Erick
  • No.5 | | 1202 bytes | |

    We only display 10 hits at a time, so we don't need to iterate through all
    the hits.

    It feels like there should be a way to pull a document out of one index and
    stick it into another and bring all the unstored fields along with it.

    On Friday 07 July 2006 12:52, Erick Erickson wrote:
    Did you use a Hits object to assemble your results? And is that what
    you're measuring when you say it's slow? In other words, were you measuring
    the time it took to execute the statement

    Hits hits = searcher.search(query, new Sort("fullname"));

    or the time it took to iterate over the Hits object and do something? If
    the latter, your problem may really be the fact that the Hits object
    re-issues the search every 100 retrievals or so (this has been discussed in
    the mail archive) and you'd get satisfactory performance by using a
    lower-level interface such as HitCollector(?) or TopDocs(?).

    , I haven't a clue, but you probably already realized that

    Best
    Erick

