Apache

  • bad queryparser bug

    15 answers - 472 bytes

    I have discovered a serious bug in QueryParser. The following query:
    contents:sales && contents:marketing || contents:industrial &&
    contents:sales
    is parsed as:
    +contents:sales +contents:marketing +contents:industrial +contents:sales
    The same parsed query occurs even with parentheses:
    (contents:sales && contents:marketing) || (contents:industrial &&
    contents:sales)
    Is there any way around this bug?
    Thanks,
    Peter
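
    The parsed output reported above is what one would expect from a parser
    that walks the clauses left to right with no operator precedence. The
    sketch below is a simplified model of that behavior, not the QueryParser
    source. It assumes the default operator is OR, that each && marks both of
    its neighbours as required, and that || leaves flags unchanged. The names
    FlatBooleanSketch, Clause and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (not Lucene code) of a parser without operator precedence.
 * Assumption: default operator is OR; "&&" marks the clause before it and the
 * clause after it as required; "||" changes nothing.
 */
public class FlatBooleanSketch {

    /** One parsed clause: a term plus a "required" flag (printed as '+'). */
    static final class Clause {
        final String term;
        boolean required;
        Clause(String term) { this.term = term; }
    }

    /** Parses whitespace-separated input of the form: term (op term)*. */
    static String parse(String query) {
        String[] tokens = query.trim().split("\\s+");
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause(tokens[0]));
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            String op = tokens[i];
            Clause next = new Clause(tokens[i + 1]);
            if (op.equals("&&")) {
                // AND: require the previous clause and the new clause.
                clauses.get(clauses.size() - 1).required = true;
                next.required = true;
            }
            // OR: the new clause stays optional; earlier flags are not reset.
            clauses.add(next);
        }
        StringBuilder out = new StringBuilder();
        for (Clause c : clauses) {
            if (out.length() > 0) out.append(' ');
            if (c.required) out.append('+');
            out.append(c.term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: +contents:sales +contents:marketing +contents:industrial +contents:sales
        System.out.println(parse(
            "contents:sales && contents:marketing || contents:industrial && contents:sales"));
    }
}
```

    Under these assumptions the || never gets a chance to group anything: the
    && on either side of it has already marked, or will mark, both of its
    neighbours as required, so all four clauses end up with a '+'.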
  • No.1 | | 790 bytes | |

    Correction:

    The query parser produces the correct query with the parentheses.
    But I'm still looking for a fix for this. I could use some advice on where
    to look in QueryParser to fix this.

    Thanks,
    Peter

  • No.3 | | 1202 bytes | |

    i seem to be having a problem analogous to this one (no answer that i
    see):

    search_string=cannot%20overwrite;#32268

    trouble is, i just put lucene on my new macbook pro and am having the
    problem that if i build a large index, i get an I/O error due to
    something like

    java.io.IOException: Cannot overwrite: /data/reuters/indexes/reuters/deleteable.new

    same code worked fine on my previous machine (still running on a G4
    powerbook and a linux machine). sometimes it has trouble writing the
    segments file instead

    has anyone seen and solved this problem? thoughts on what might be
    behind it?

    thanks,
    -Miles

  • No.4 | | 1151 bytes | |

    Miles Efron wrote:
    >i seem to be having a problem analogous to this one (no answer that i see):
    >
    >%20overwrite;#32268
    >
    >trouble is, i just put lucene on my new macbook pro and am having the
    >problem that if i build a large index, i get an I/O error due to
    >something like
    >
    >java.io.IOException: Cannot overwrite:
    >/
    >
    >same code worked fine on my previous machine (still running on a G4
    >powerbook and a linux machine). sometimes it has trouble writing the
    >segments file instead
    >
    >has anyone seen and solved this problem? thoughts on what might be
    >behind it?

    Are you running Windows on your macbook pro?

    There are known issues like this, but only on Windows, eg:

    We believe such cases are now fixed by lockless commits, on the trunk
    of Lucene (which is not yet released). If you could try the trunk
    (but beware that API, file formats, can change) and see if this still
    happens that'd be great!

    Mike

    To unsubscribe, e-mail: java-user-unsubscribe (AT) lucene (DOT) apache.org
    For additional commands, e-mail: java-user-help (AT) lucene (DOT) apache.org
  • No.5 | | 1828 bytes | |

    Mike,

    You rule. Swapping out the nightly build seems to have fixed the
    problem: tried it on two problematic cases and both worked.

    For the record, I'm running mac os 10.4.8.

    Do you know if the lockless commits will be included in the next
    stable release?

    Thanks so much!
    -Miles

  • No.6 | | 1091 bytes | |

    OK, I see that I'm not the first to discover this behavior of QueryParser.
    Can anyone vouch for the integrity of the PrecedenceQueryParser here:

    Thanks,
    Peter

  • No.8 | | 1105 bytes | |

    Miles Efron wrote:

    >You rule. Swapping out the nightly build seems to have fixed the
    >problem: tried it on two problematic cases and both worked.

    Phew!

    >For the record, I'm running mac os 10.4.8.

    Uh-oh, I can't explain why you would hit these errors on OS X 10.4.8;
    we have only seen these on Windows.

    Are you sure switching to trunk has fixed it? Lockless commits make
    Lucene "write once", so this works around a number of file system
    "quirks". Still, it'd be good to get to your root cause.

    Is the index stored on a remote (Windows CIFS) mount? Or is it stored
    on a local (Mac OS HFS+) drive?

    >Do you know if the lockless commits will be included in the next stable
    >release?

    Yes this will be included in 2.1 -- I think 2.1 will be released soon
    (there have been discussions on the dev list to get the release process
    started soon).

    Mike

  • No.9 | | 2007 bytes | |

    There is a ton of discussion on this if you search the lucene user list
    (QueryParser and precedence and the 'binary' operators). I have seen
    many mentions of the precedence parser still having open issues but no
    mention of what those issues are.

  • No.10 | | 1970 bytes | |

    I really don't know why os x could have induced those kinds of
    filesystem issues. i assumed that since i had switched over to the
    intel architecture that perhaps something was going on with the
    JVM. everything involved in the process was mac; local filesystem, etc.

    but i'm fairly sure that the trunk code has fixed the problem. i ran
    two 'offending' bits of code and checked their results. not only did
    they finish (quite a feat today), but they did so correctly.
    -Miles

  • No.11 | | 801 bytes | |

    Miles Efron wrote:
    >I really don't know why os x could have induced those kinds of
    >filesystem issues. i assumed that since i had switched over to the
    >intel architecture that perhaps something was going on with the
    >JVM. everything involved in the process was mac; local filesystem, etc.
    >
    >but i'm fairly sure that the trunk code has fixed the problem. i ran
    >two 'offending' bits of code and checked their results. not only did
    >they finish (quite a feat today), but they did so correctly.

    OK I will keep my fingers crossed that there isn't another issue
    lurking :)

    Mike

  • No.12 | | 406 bytes | |

    please do not cross post questions about using the Lucene API to both the
    user and dev mailing lists -- the user list is the correct place to ask
    questions about behavior you are seeing that you think may be a bug.

    -Hoss

  • No.13 | | 730 bytes | |

    : The query parser produces the correct query with the parenthesis.
    : But, I'm still looking for a fix for this. I could use some advice on where
    : to look in QueryParser to fix this.

    the best advice i can give you: don't use the binary operators.

    * Lucene is not a boolean logic system
    * BooleanQuery does not implement boolean logic
    * QueryParser is not a boolean language parser

    (If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
    being added to the QueryParser -- i would)

    -Hoss
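
    One way to read the advice above: think in terms of required (+) and
    optional clauses rather than AND/OR. The sketch below is a toy model of
    that matching rule, not the Lucene API. It assumes a group matches when
    every required clause matches and, if the group has no required clause,
    when at least one optional clause matches. Prohibited (-) clauses are left
    out. OccurSketch, Node, Clause, must, should and group are hypothetical
    names. It compares the flat query the parser produced with a grouped form,
    (+sales +marketing) (+industrial +sales), written directly with '+' and
    parentheses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (not the Lucene API) of required (+) and optional clauses.
 * A group matches when every required clause matches and, if the group has
 * no required clause at all, at least one optional clause matches.
 */
public class OccurSketch {

    interface Node { boolean matches(Set<String> docTerms); }

    /** A clause wraps a node and records whether it carries the '+' prefix. */
    static final class Clause {
        final boolean required;
        final Node node;
        Clause(boolean required, Node node) { this.required = required; this.node = node; }
    }

    static Node term(String t) { return doc -> doc.contains(t); }
    static Clause must(Node n) { return new Clause(true, n); }
    static Clause should(Node n) { return new Clause(false, n); }

    static Node group(Clause... clauses) {
        List<Clause> list = Arrays.asList(clauses);
        return doc -> {
            boolean hasRequired = false;
            boolean anyOptional = false;
            for (Clause c : list) {
                boolean hit = c.node.matches(doc);
                if (c.required) {
                    hasRequired = true;
                    if (!hit) return false;   // a missing required clause rejects the document
                } else if (hit) {
                    anyOptional = true;
                }
            }
            return hasRequired || anyOptional;
        };
    }

    // +sales +marketing +industrial +sales  (what the && / || query was parsed as)
    static final Node FLAT = group(
        must(term("sales")), must(term("marketing")),
        must(term("industrial")), must(term("sales")));

    // (+sales +marketing) (+industrial +sales)  (two optional groups)
    static final Node GROUPED = group(
        should(group(must(term("sales")), must(term("marketing")))),
        should(group(must(term("industrial")), must(term("sales")))));

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("sales", "marketing"));
        System.out.println("flat:    " + FLAT.matches(doc));     // false: "industrial" is required
        System.out.println("grouped: " + GROUPED.matches(doc));  // true: the first group matches
    }
}
```

    A document containing only "sales" and "marketing" is rejected by the flat
    form but accepted by the grouped form, which is the result the original
    && / || query was meant to express.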

  • No.14 | | 1296 bytes | |

    >(If i could go back in time and stop the AND/OR/NOT/&&/|| "aliases" from
    >being added to the QueryParser -- i would)

    Yes, this is the cause of the confusion. Users are accustomed to the
    boolean logic syntax from a legacy search engine (also common to many other
    engines). We'll have to convert them into native QueryParser syntax where
    possible.

    Sorry for the cross post.

    Thanks,
    Peter

  • No.15 | | 996 bytes | |

    On Feb 1, 2007, at 5:03 PM, Peter Keegan wrote:
    >OK, I see that I'm not the first to discover this behavior of
    >QueryParser.
    >Can anyone vouch for the integrity of the PrecedenceQueryParser here:

    PrecedenceQueryParser was my tinkering attempt to make it more
    logically handle precedence. I don't recall the exact issues that
    occur, though a JIRA issue was just filed with one:

    "NOT foo AND baz" is parsed as "-(+foo +baz)" instead of "-foo
    +bar".
    (I'm setting parser.setD
    () but the issue applies otherwise
    too.)

    I believe the test case points out some potential issues. In other
    words, PrecedenceQueryParser is a work-in-progress that I no longer
    am working on myself. Improvements to it welcome. Query parsing is
    tricky business!

    Erik

