Java

  • Case-sensitive search

    11 answers - 259 bytes

    Is there any way to do a case-sensitive search?
    Thanks
    Tareque
    ControlDCS
    To unsubscribe, e-mail: java-user-unsubscribe (AT) lucene (DOT) apache.org
    For additional commands, e-mail: java-user-help (AT) lucene (DOT) apache.org
  • No.1 | | 832 bytes | |

    Aug 18, 2005, at 3:50 PM, tareque (AT) controldocs (DOT) com wrote:
    Is there any way to do a case-sensitive search?

    All Lucene searches are case-sensitive, actually.

    But most often a lowercasing analyzer is used. So the trick is to
    change the analysis process to not lowercase. It gets more fun when
    you need case sensitive or insensitive searching both in the same
    situation, where the trick is to either build two different indexes
    or to use different fields that use different analysis on the same
    text (though this gets tricky with QueryParser generated queries and
    the potential for user field selection).

    Erik
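
    The point that matching is always exact, and that any case-insensitivity
    comes from analysis, can be sketched in plain Java. This is a toy model
    and not the Lucene API: ToyIndex, keepCase and lowerCase are hypothetical
    names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy inverted index for illustration only; this is not the Lucene API.
class ToyIndex {
    private final Function<String, List<String>> analyzer;
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    ToyIndex(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    // Analysis runs at index time; the resulting terms are stored as-is.
    void add(int docId, String text) {
        for (String term : analyzer.apply(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Lookup is an exact string match, so it is always case-sensitive.
    Set<Integer> searchTerm(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    // Hypothetical analyzer: split on whitespace and keep case.
    static List<String> keepCase(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) terms.add(token);
        }
        return terms;
    }

    // Hypothetical analyzer: split on whitespace and lowercase every term.
    static List<String> lowerCase(String text) {
        return keepCase(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ToyIndex sensitive = new ToyIndex(ToyIndex::keepCase);
        sensitive.add(1, "Apache Lucene");
        System.out.println(sensitive.searchTerm("Lucene")); // doc 1 matches
        System.out.println(sensitive.searchTerm("lucene")); // no match

        ToyIndex folded = new ToyIndex(ToyIndex::lowerCase);
        folded.add(1, "Apache Lucene");
        System.out.println(folded.searchTerm("lucene")); // doc 1 matches
        System.out.println(folded.searchTerm("Lucene")); // no match, stored term is "lucene"
    }
}
```

    In the second index the lookup for "Lucene" misses because only the
    lowercased term was stored; a lowercasing index looks case-insensitive
    only when the query text goes through the same lowercasing step.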

  • No.2 | | 1361 bytes | |

    Thanks! I have used StopAnalyzer to index. Does it lower-case before
    indexing? I don't touch the query string before sending for searching, so
    the query string is not lower-cased.

    The search really is case sensitive, it's just that all input is
    usually lower-cased, so it feels like it's case insensitive. In other
    words, don't lower-case your input before indexing, and don't
    lower-case your queries (i.e. pick an Analyzer that doesn't
    lower-case).

  • No.3 | | 1847 bytes | |

    Seems like it does use a LowerCaseFilter. Is there any analyzer that does
    the same thing as StopAnalyzer, except for lowercasing? StopAnalyzer
    best fits my purpose.

  • No.4 | | 2198 bytes | |

    Aug 18, 2005, at 4:16 PM, tareque (AT) controldocs (DOT) com wrote:
    Thanks! I have used StopAnalyzer to index. Does it lower-case before
    indexing? I don't touch the query string before sending for searching, so
    the query string is not lower-cased.

    Pretty much all built-in Lucene analyzers lower-case:

    (scroll to the bottom to see the tokenStream method - the heart of an
    analyzer)

    The exception is the WhitespaceAnalyzer, which is probably not what
    you want to use. You can write your own Analyzer (copy/paste one and
    remove the lowercasing filter - though some analyzers use a
    lowercasing tokenizer, not a filter).

    Erik


  • No.5 | | 2671 bytes | |

    Thanks again! The analyzer is working now. But seems like actually the
    QueryParser I am using is probably converting the queries to lowercase
    first. Is there any way to stop that? Here is the line of code where I am
    parsing:

    Query query = QueryParser.parse(line, "contents", analyzer);

    As for the analyzer, I have tried both StandardAnalyzer and StopAnalyzer.

  • No.6 | | 736 bytes | |

    Thu, 2005-08-18 at 17:16, tareque (AT) controldocs (DOT) com wrote:
    Thanks again! The analyzer is working now. But seems like actually the
    QueryParser I am using is probably converting the queries to lowercase
    first. Is there any way to stop that? Here is the line of code where I am
    parsing:

    Query query = QueryParser.parse(line, "contents", analyzer);

    As for the analyzer, I have tried both StandardAnalyzer and StopAnalyzer.

    You need to use the same analyzer for parsing queries as you do for
    indexing content.

    Luke Francl
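
    Why the analyzers must agree can be shown with a small plain-Java sketch.
    It is not Lucene code: the two normalizers stand in for a case-preserving
    and a lowercasing analyzer, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java illustration; the normalizers stand in for Lucene analyzers.
class AnalyzerMismatchDemo {
    static final UnaryOperator<String> KEEP_CASE = UnaryOperator.identity();
    static final UnaryOperator<String> LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Index-time analysis: split on whitespace and normalize each token.
    static Set<String> indexTerms(String text, UnaryOperator<String> analyzer) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) terms.add(analyzer.apply(token));
        }
        return terms;
    }

    // Query-time analysis: normalize the query word, then match exactly.
    static boolean matches(Set<String> indexedTerms, String queryWord,
                           UnaryOperator<String> analyzer) {
        return indexedTerms.contains(analyzer.apply(queryWord));
    }

    public static void main(String[] args) {
        Set<String> index = indexTerms("Lucene Search", KEEP_CASE);
        // Same analyzer on both sides: the query term equals the stored term.
        System.out.println(matches(index, "Lucene", KEEP_CASE));
        // Lowercasing only at query time turns "Lucene" into "lucene",
        // which was never stored, so the lookup misses.
        System.out.println(matches(index, "Lucene", LOWER_CASE));
    }
}
```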

  • No.7 | | 1050 bytes | |


    You need to use the same analyzer for parsing queries as you do for
    indexing content.

    Luke Francl

    Actually, I used StopAnalyzer for indexing, so I used the same one for
    parsing queries, but it's still converting the queries to lowercase before
    running the actual search.

  • No.8 | | 1814 bytes | |

    Aug 18, 2005, at 6:22 PM, tareque (AT) controldocs (DOT) com wrote:


    Actually, I used StopAnalyzer for indexing, so I used the same one for
    parsing queries, but it's still converting the queries to lowercase before
    running the actual search.

    Both of those analyzers lowercase. When you said it was working,
    what did you mean? To prevent lowercasing and get stop words
    removed you *will* have to write a custom analyzer. Also keep in
    mind that StopFilter is case-sensitive and that the stop word list is
    all lowercase - so you will need to account for this with a custom
    stop filter probably too.

    It is highly recommended to "analyze the analyzer" - a topic covered
    in depth in the Analysis chapter in Lucene in Action, and one of my
    java.net articles.

    Erik
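
    The stop-word caveat above can be sketched in plain Java. This is not the
    Lucene StopFilter: the class name, both methods and the short stop list
    are hypothetical, chosen only to show why a lowercase stop list misses
    capitalized stop words once lowercasing is removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch of case-preserving analysis with stop-word removal.
class CasePreservingStopAnalysis {
    // Hypothetical all-lowercase stop list, standing in for a built-in one.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Naive version: case-sensitive stop check against the lowercase list.
    static List<String> analyzeNaive(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            terms.add(token);
        }
        return terms;
    }

    // Adjusted version: lowercase only for the stop check, emit the
    // original token so the indexed term keeps its case.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue;
            if (STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) continue;
            terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyzeNaive("The Art of Lucene")); // "The" survives
        System.out.println(analyze("The Art of Lucene"));      // "The" is dropped
    }
}
```

    Whichever variant is chosen, the same analysis has to be applied to the
    query text as well, otherwise the query terms will not line up with the
    indexed ones.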

  • No.9 | | 1965 bytes | |


    It's all working now. I wrote a custom analyzer based on StopAnalyzer,
    and it indexed correctly. The problem was that when parsing the query
    I forgot to use my new analyzer and was still using the old
    StopAnalyzer. Thanks for all the help!

    Tareque

  • No.10 | | 2333 bytes | |


    Is there any way to index as case-sensitive and then, while searching,
    make the search case-sensitive or case-insensitive as needed, using the
    same index?

  • No.11 | | 837 bytes | |

    Aug 22, 2005, at 10:40 AM, tareque (AT) controldocs (DOT) com wrote:
    Is there any way to index as case-sensitive and then, while searching,
    make the search case-sensitive or case-insensitive as needed, using the
    same index?

    Not really. Terms in the index are ordered lexicographically,
    including case. It certainly would be possible to write customized
    Query subclasses to do this sort of thing at the expense of performance.

    The only techniques I'm aware of are to either build separate indexes
    or index the same information into separate fields of the same
    documents using different analyzers per field.

    Erik
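
    The separate-fields technique can be sketched as follows. This is a
    plain-Java toy and not Lucene code: TwoFieldIndex and the two maps are
    hypothetical stand-ins for one document field analyzed with case kept
    and a second field analyzed with lowercasing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model: the same text indexed twice, once per analysis style.
class TwoFieldIndex {
    // Stand-in for a field whose analyzer preserves case.
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    // Stand-in for a field whose analyzer lowercases.
    private final Map<String, Set<Integer>> folded = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            exact.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            folded.computeIfAbsent(token.toLowerCase(Locale.ROOT),
                                   t -> new TreeSet<>()).add(docId);
        }
    }

    // Pick the field at query time, and normalize the query word the
    // same way that field's terms were normalized.
    Set<Integer> search(String word, boolean caseSensitive) {
        if (caseSensitive) {
            return exact.getOrDefault(word, Collections.<Integer>emptySet());
        }
        return folded.getOrDefault(word.toLowerCase(Locale.ROOT),
                                   Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TwoFieldIndex index = new TwoFieldIndex();
        index.add(1, "Lucene in Action");
        index.add(2, "lucene basics");
        System.out.println(index.search("Lucene", true));  // only doc 1
        System.out.println(index.search("Lucene", false)); // docs 1 and 2
    }
}
```

    The cost is a larger index, since every term is stored in two forms; in
    exchange, both kinds of lookup stay simple exact-term matches.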

