DSM

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Finding a match in a large dataset - btrees?

    7 answers - 994 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    I have a large set of data (that will be stored in MySQL) that I wish to
    match to and am wondering what the best method is.
    Assume the following data in table LCATIN_MATCH:
    LCATIN_ID LCATIN_PATTERN PARENT_ID
    10 6
    11 4 10
    13 2 11
    12 9 11
    14 1 13
    15 2 13
    The string 6438 should return 11, 6421 14, 6422 15 and 6499 12.
    I've read a bit about btrees on the zope wiki and wonder if that's the best
    way. However I am struggling with the basics:
    1. How do I get the data from MySQL into a btree in Zope? Something like:
    from BTrees.IIBTree import *
    t = IIBTree()
    t.update() # errr, no
    2. How do I find the matching node i.e. when I want to know that 6422
    relates to location_id 15?
    Any help or pointers to further documentation would be appreciated.
    Regards
    Cameron
    Zope maillist - Zope (AT) zope (DOT) org
    ** No cross posts or HTML encoding! **
    (Related lists -
    )
  • No.1 | | 1111 bytes | |

    5. Dezember 2005 16:57:03 +1300 Cameron Beattie <kjcsb (AT) orcon (DOT) net.nz
    wrI've read a bit about btrees on the zope wiki and wonder if that's the
    best way. However I am struggling with the basics:
    1. How do I get the data from MySQL into a btree in Zope? Something like:
    from BTrees.IIBTree import *
    t = IIBTree()
    t.update() # errr, no

    Reading helps:

    <>

    2. How do I find the matching node i.e. when I want to know that 6422
    relates to location_id 15?

    I still have no idea what your example should tell me. A BTree basically
    implements the same API as a Python dictionary. If you can implement your
    solution in pure Python then you can just switch to BTrees. But I have no
    idea about your example especially BTrees implement a 1:1
    relationship (when using an IIBTree).
    -aj

    Zope maillist - Zope (AT) zope (DOT) org

    ** No cross posts or HTML encoding! **
    (Related lists -

    )

    PGP SIGNATURE
    Version: GnuPG v1.4.2 (Darwin)

    0o9KLI19Upvbtta1up/ANAk=
    =4exX
    PGP SIGNATURE
  • No.2 | | 533 bytes | |

    Mon, Dec 05, 2005 at 04:57:03PM +1300, Cameron Beattie wrote:
    I have a large set of data (that will be stored in MySQL) that I wish to
    match to and am wondering what the best method is.

    Assume the following data in table LCATIN_MATCH:
    LCATIN_ID LCATIN_PATTERN PARENT_ID
    10 6
    11 4 10
    13 2 11
    12 9 11
    14 1 13
    15 2 13

    The string 6438 should return 11, 6421 14, 6422 15 and 6499 12.

    I stopped reading here because I couldn't make any sense of the example.
    How does 6438 return 11?
  • No.3 | | 784 bytes | |

    12/5/05, Cameron Beattie <kjcsb (AT) orcon (DOT) net.nzwrote:
    I have a large set of data (that will be stored in MySQL) that I wish to
    match to and am wondering what the best method is.

    Assume the following data in table LCATIN_MATCH:
    LCATIN_ID LCATIN_PATTERN PARENT_ID
    10 6
    11 4 10
    13 2 11
    12 9 11
    14 1 13
    15 2 13

    The string 6438 should return 11, 6421 14, 6422 15 and 6499 12.

    Eh, what? K, I understand what is happening, but why?
    It seems to me that this is an implementation of a btree, but why? SQL
    databases have indexes, why not just use them? :)

    I've read a bit about btrees on the zope wiki and wonder if that's the best
    way.

    Probably, but only if you actually let them do the indexing. :-)
  • No.4 | | 1122 bytes | |

    Thanks for the many replies. I apologise for the original message which was
    obviously very unclear - I will try to correct that.

    Basically I want to match a telephone number to a rate table. Assume the
    following rate table:
    Telephone code Rate
    649 1
    6421 2
    6422 2.5
    64 1.5

    The following table sets out the appropriate rate to return for a given
    telephone number
    Dialled number Returned rate
    649123 1
    643123 1.5
    6421111 2
    64221 2.5

    Things to note:
    If there is no match the right-most digit is removed until there is a match
    e.g. 643123 matches to 64
    The length of the "dialled number" is not consistent e.g. 649123 vs 64221
    The length of the "telephone code" is not consistent e.g. 64 vs 6422

    I want to do this frequently and at low cost i.e. ideally in memory. Perhaps
    the best way is to write a procedure in MySQL however I am interested in any
    python-based alternatives.

    Regards

    Cameron

    Zope maillist - Zope (AT) zope (DOT) org

    ** No cross posts or HTML encoding! **
    (Related lists -

    )
  • No.5 | | 597 bytes | |

    6. Dezember 2005 14:55:08 +1300 Cameron Beattie <kjcsb (AT) orcon (DOT) net.nz
    wrote:
    I want to do this frequently and at low cost i.e. ideally in memory.
    Perhaps the best way is to write a procedure in MySQL however I am
    interested in any python-based alternatives.

    I pointed you already to the BTrees documentation.
    -aj

    Zope maillist - Zope (AT) zope (DOT) org

    ** No cross posts or HTML encoding! **
    (Related lists -

    )

    PGP SIGNATURE
    Version: GnuPG v1.4.2 (Darwin)

    PZoejdAnmx4bhPI3yjVjxFY=
    =wvVn
    PGP SIGNATURE
  • No.6 | | 1295 bytes | |

    6. Dezember 2005 14:55:08 +1300 Cameron Beattie <kjcsb (AT) orcon (DOT) net.nz
    wrote:

    I want to do this frequently and at low cost i.e. ideally in memory.
    Perhaps the best way is to write a procedure in MySQL however I am
    interested in any python-based alternatives.

    Your problem is more an algorithmic problem than q question of the data
    representation. Store the mapping as some BTree (possibly IIBTree using
    normalized integers for the values). Then you have to iterate over all
    prefixes of the dialed number starting with the longest prefix to the
    smallest prefix. Then perform a lookup of the prefix in your BTree
    this is should not take more than five lines of code and requires a maximum
    of #(length_of_dialed_number) comparisons. If you want it more complicated
    construct a tree and you'll have a logarithmic number of comparisons. I
    doubt that it is worth the effort since the first solution should be fast
    enough unless your dialed number is hundreds of digits long :-)
    -aj

    Zope maillist - Zope (AT) zope (DOT) org

    ** No cross posts or HTML encoding! **
    (Related lists -

    )

    PGP SIGNATURE
    Version: GnuPG v1.4.2 (Darwin)

    NT9JEdWryfFiU3gYJ1B2omU=
    =BcEQ
    PGP SIGNATURE
  • No.7 | | 796 bytes | |

    Am Dienstag, den 06.12.2005, 14:55 +1300 schrieb Cameron Beattie:
    Thanks for the many replies. I apologise for the original message which was
    obviously very unclear - I will try to correct that.

    I want to do this frequently and at low cost i.e. ideally in memory. Perhaps
    the best way is to write a procedure in MySQL however I am interested in any
    python-based alternatives.

    *hint* use postgres instead where you can write stored functions in
    python too :-)
    (With some little modifications on the zope database adaptor you
    could even return a dictionary or btree object as result
    of your query then :-)

    SCNR
    Tino

    Zope maillist - Zope (AT) zope (DOT) org

    ** No cross posts or HTML encoding! **
    (Related lists -

    )

Re: Finding a match in a large dataset - btrees?


max 4000 letters.
Your nickname that display:
In order to stop the spam: 5 + 4 =
QUESTION ON "DSM"

EMSDN.COM