MYSQL

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Fulltext relevance and weighting....

    3 answers - 1655 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Does anyone have any suggestions on how to solve the result weighting
    problem? I have a client whose search results are becoming more and more
    important, and the relevance demands on the results are not entirely
    satisfactory
    The fields that are searched are code, name, small description and large
    description, and are ranked in relevance in that order.
    For example, a product with the name: "Bone-In Serrano Ham" should ALWAYS
    outweigh the product with the name of "Boneless Jamon Iberico", even if the
    Jamon Iberico has the word "ham" in the description 20 times more than the
    Serrano product
    The query that is being run is:
    select *,match(code,name,small_desc,large_desc) against ('ham*') as score
    from products where active='y' and site like '%,1,%' and
    match(code,name,small_desc,large_desc) against ('ham*' IN BLEAN MDE)
    order by score desc
    It returns some good relevant matches, but then in the middle of products
    names with "ham" in them, it returns one without
    Does this require a complete logic switch, or is there a way to build a
    query to do this?
    the actual build of the query is more complex, and there are other
    rules that need to be applied to the user submitted query, but this is the
    basics
    If there is a Fulltext search relevance expert out there in list land, I am
    at my wits end trying to make the results the most relevant that they can be
    - I am willing to work closely with (pay) someone with the knowledge and
    expertise to assist in this. (using PHP)
    TIA - hope someone can help!
  • No.1 | | 2599 bytes | |

    Mike,
    It's likely because MySQL fulltext index defaults to a word length
    of 4 characters or more. So "Ham" probably is not being indexed. You will
    need to change the minimum word length. See

    Mike
    (P.S. Send me a free ham while you're at it.<g>)

    At 02:02 PM 1/31/2007, Mike Morton wrote:
    >Does anyone have any suggestions on how to solve the result weighting
    >problem? I have a client whose search results are becoming more and more
    >important, and the relevance demands on the results are not entirely
    >satisfactory
    >
    >The fields that are searched are code, name, small description and large
    >description, and are ranked in relevance in that order.
    >
    >For example, a product with the name: "Bone-In Serrano Ham" should ALWAYS
    >outweigh the product with the name of "Boneless Jamon Iberico", even if the
    >Jamon Iberico has the word "ham" in the description 20 times more than the
    >Serrano product
    >
    >The query that is being run is:
    >select *,match(code,name,small_desc,large_desc) against ('ham*') as score
    >from products where active='y' and site like '%,1,%' and
    >match(code,name,small_desc,large_desc) against ('ham*' IN BLEAN MDE)
    >order by score desc
    >
    >It returns some good relevant matches, but then in the middle of products
    >names with "ham" in them, it returns one without
    >
    >Does this require a complete logic switch, or is there a way to build a
    >query to do this?
    >

    the actual build of the query is more complex, and there are other
    >rules that need to be applied to the user submitted query, but this is the
    >basics
    >
    >If there is a Fulltext search relevance expert out there in list land, I am
    >at my wits end trying to make the results the most relevant that they can be
    >- I am willing to work closely with (pay) someone with the knowledge and
    >expertise to assist in this. (using PHP)
    >
    >TIA - hope someone can help!
    >


    >Cheers
    >
    >Mike Morton
    >


    >*
    >* Tel: 905-465-1263
    >* Email: mike (AT) webtraxx (DOT) com
    >*


    >
    >
    >
    >


    >MySQL General Mailing List
    >For list archives: http://lists.mysql.com/mysql
    >To unsubscribe: @fastmail.fm
  • No.2 | | 3057 bytes | |

    In the last episode (Jan 31), Mike Morton said:
    Mike:

    :) I wish! Free ham for everyone!

    I had already changed the min length to 2 actually - so that is not the
    affecting factor

    It is more of an issue to prioritizing fields for relevance, and whether it
    is possible to do this within a fulltext query, or whether it needs to be
    done through multiple queries, and then "outside" php processing of those
    query results

    You should be able to do what you need by making your 'score'
    expression something like this:

    select *,
    match(code) against ('ham*' in boolean mode) * 8 +
    match(name) against ('ham*' in boolean mode) * 4 +
    match(small_desc) against ('ham*' in boolean mode) * 2 +
    match(large_desc) against ('ham*' in boolean mode)
    as score from products where active='y' and site like '%,1,%' and
    match(code,name,small_desc,large_desc) against ('ham*' IN BLEAN MDE)
    order by score desc

    This takes advantage of the fact that boolean mode matches always
    return 1 or 0, so a record matching in the "code" field will sort
    higher than a record with "ham" in all 3 of the others but not in
    "code".

    >Does anyone have any suggestions on how to solve the result
    >weighting problem? I have a client whose search results are
    >becoming more and more important, and the relevance demands on the
    >results are not entirely satisfactory
    >
    >The fields that are searched are code, name, small description and
    >large description, and are ranked in relevance in that order.
    >
    >For example, a product with the name: "Bone-In Serrano Ham" should
    >ALWAYS outweigh the product with the name of "Boneless Jamon
    >Iberico", even if the Jamon Iberico has the word "ham" in the
    >description 20 times more than the Serrano product
    >
    >The query that is being run is: select
    >*,match(code,name,small_desc,large_desc) against ('ham*') as score
    >from products where active='y' and site like '%,1,%' and
    >match(code,name,small_desc,large_desc) against ('ham*' IN BLEAN
    >MDE) order by score desc
    >
    >It returns some good relevant matches, but then in the middle of
    >products names with "ham" in them, it returns one without
    >
    >Does this require a complete logic switch, or is there a way to
    >build a query to do this?
    >
    >the actual build of the query is more complex, and there
    >are other rules that need to be applied to the user submitted
    >query, but this is the basics
    >
    >If there is a Fulltext search relevance expert out there in list
    >land, I am at my wits end trying to make the results the most
    >relevant that they can be - I am willing to work closely with
    >(pay) someone with the knowledge and expertise to assist in this.
    >(using PHP)
  • No.3 | | 5836 bytes | |

    Guys,
    I doubt I'd qualify as an expert but here's my two pence worth ( ;^)
    )I wrote a search engine a while back that relies heavily on full-text
    searching and the three things I found that improved results were

    1) Precisely what Dan explains, doing extra biasing per field in the SQL
    with whatever extra info you've got is best. I had fields like
    "seriesFK" that were null if it was a one off TV episode so I used the
    result of (isnull(seriesFK) + 1) * {some multiplier}) and similar to
    shove a whole TV Series above it's episodes. Perhaps you'd want to bias
    the small_desc field more than the large_desc etc

    2) Adding/removing stop words to the |ft_stopword_file| and changing the
    word length as you've already done.

    3) Fiddling with the search requests before they get put into the SQL, I
    was dealing with various people's names as well so and I knew they were
    so I've got a bit of PHP that doubles two word queries up but with one
    swapped around, so if the user enters "George Harrison" they'd actually
    get a search for "George Harrison" and "Harrison, George".

    Some of those you've done or are domain specific (i.e. the names),
    sorry. ||Some other ideas I didn't get to try out were

    1) You can use the myisam_ftdump utility to dump the actual weightings
    MySQL generates, load those stats back in and use it to generate a new
    "meta" table then use that in combination with feedback about requested
    search terms and followed links to make an engine that could to some
    extent "learn".

    2) the soundex() function would allow you to handle spelling mistakes
    and might be of some use in your problem to bias those results that
    sound most like the term? See
    and search
    for sondex.

    The problem I found is that the clients (the people who are paying for
    the search engine) knew, as expert in their subject what exact order
    they'd like the results to come back in but you hit the problem that you
    can't readily program that knowledge into a computer no matter how hard
    you try or how fancy your algorithms get. I can't tell you the pain
    "Charlie Chaplin" caused me, his real name is actually Charles but
    without lots of extra contextual information to hand you can't program
    that sort of knowledge and exception into a computer without spending
    lots of time or money.

    I wrote the search engine bit inside
    and they were happy with it once
    we'd finished but I think this sort of area is somewhere you could spend
    a lot of effort making little difference on small or medium sized
    projects, there is a certain amount of "tweaking" that can be done but
    eventually you'd need to move to a solution that is significantly more
    technologically complex i.e. expensive & time consuming before achieve
    noticeably better results.

    If you want I'd be happy to outline and expand on some ideas.

    Regards,
    Phil

    >It is more of an issue to prioritizing fields for relevance, and whether it
    >is possible to do this within a fulltext query, or whether it needs to be
    >done through multiple queries, and then "outside" php processing of those
    >query results
    >
    >

    You should be able to do what you need by making your 'score'
    expression something like this:

    select *,
    match(code) against ('ham*' in boolean mode) * 8 +
    match(name) against ('ham*' in boolean mode) * 4 +
    match(small_desc) against ('ham*' in boolean mode) * 2 +
    match(large_desc) against ('ham*' in boolean mode)
    as score from products where active='y' and site like '%,1,%' and
    match(code,name,small_desc,large_desc) against ('ham*' IN BLEAN MDE)
    order by score desc

    This takes advantage of the fact that boolean mode matches always
    return 1 or 0, so a record matching in the "code" field will sort
    higher than a record with "ham" in all 3 of the others but not in
    "code".

    Does anyone have any suggestions on how to solve the result
    weighting problem? I have a client whose search results are
    becoming more and more important, and the relevance demands on the
    results are not entirely satisfactory

    The fields that are searched are code, name, small description and
    large description, and are ranked in relevance in that order.

    For example, a product with the name: "Bone-In Serrano Ham" should
    ALWAYS outweigh the product with the name of "Boneless Jamon
    Iberico", even if the Jamon Iberico has the word "ham" in the
    description 20 times more than the Serrano product

    The query that is being run is: select
    *,match(code,name,small_desc,large_desc) against ('ham*') as score
    from products where active='y' and site like '%,1,%' and
    match(code,name,small_desc,large_desc) against ('ham*' IN BLEAN
    MDE) order by score desc

    It returns some good relevant matches, but then in the middle of
    products names with "ham" in them, it returns one without

    Does this require a complete logic switch, or is there a way to
    build a query to do this?

    the actual build of the query is more complex, and there
    are other rules that need to be applied to the user submitted
    query, but this is the basics

    If there is a Fulltext search relevance expert out there in list
    land, I am at my wits end trying to make the results the most
    relevant that they can be - I am willing to work closely with
    (pay) someone with the knowledge and expertise to assist in this.
    (using PHP

Re: Fulltext relevance and weighting....


max 4000 letters.
Your nickname that display:
In order to stop the spam: 6 + 6 =
QUESTION ON "MYSQL"

EMSDN.COM