Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • LIKE and GLOB bug with numbers?

    6 answers - 1747 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hello,
    I've populated a datafile with 40.176 records which contain file
    attributes and file paths. I have two columns, CMKey and CMValues.
    The column CMKey contains the path to the file and the column
    CMValues contains the attribute values. For example:
    CMKey: Application
    CMValues:
    (
    0,
    NSFileTypeRegular,
    1,
    21508,
    0,
    staff,
    234881026,
    294022,
    2004-12-16 10:11:35 -0800,
    tciuro,
    384,
    2006-03-26 08:35:55 -0800,
    502,
    20
    )
    Both columns are of type TEXT.
    This is what I've found:
    1) SELECT * FRM FinderFiles WHERE CMKey GLB '*AbiWord.Profile*'
    returns 1 match. This is correct.
    2) SELECT * FRM FinderFiles WHERE CMKey LIKE '%ABIWRD.Profile%'
    returns 1 match. This is correct.
    3) SELECT * FRM FinderFiles WHERE CMValues GLB '*2004-12-16
    10:11:35 -0800*' returns 40.176 matches. This is not correct. There
    is no way I created these 40.176 file at the *very same* time. Just
    to be sure, I looked at one random file (of the 40.176) and I've
    obtained the following creation date attribute:
    NSFileCreationDate = 2004-02-21 06:12:43 -0800;
    The same problem occurs if I perform the query:
    SELECT * FRM FinderFiles WHERE CMValues LIKE '%2004-12-16 10:11:35
    -0800%'
    This problem seems to occur when trying to match something with numbers:
    - If I look for NSFilePosixPermissions 448 (which I know exists) I
    get zero matches
    - If I look for strings, such as in step #1 or #2, it works fine.
    Something is wrong, I just can't figure out why
    Any ideas? Is this a bug?
    Thanks,
    -- Tito
  • No.1 | | 2814 bytes | |

    There may be an issue, but
    Your design is fundamentaly wrong.
    I don't know what your intended use
    is for this data, but I am logging identical fstat file info along with an
    MD5 sums.

    Each informational element needs
    to be stored in an individual column.

    Stuffing all those fields into a single
    string that needs to be parsed to find
    ownership or last mod times removes
    all the benefit from having it in a database in the first place.

    Stuff it into a sorted flat file.
    that would be faster and simpler.

    A database would be 'much better' IMH but only if it it properly
    'normalized'.

    Mgc

    Sent with SnapperMail
    www.snappermail.com

    Message
    Sun, 26 Mar 2006 09:50:31 -0800 "Tito Ciuro" <tciuro (AT) mac (DOT) comwrote:
    >Hello,
    >
    >I've populated a datafile with 40.176 records which contain file
    >attributes and file paths. I have two columns, CMKey and CMValues.
    >The column CMKey contains the path to the file and the column
    >CMValues contains the attribute values. For example:
    >
    >CMKey: Application
    >
    >CMValues:
    >(

    0,
    NSFileTypeRegular,
    1,
    21508,
    0,
    staff,
    234881026,
    294022,
    2004-12-16 10:11:35 -0800,
    tciuro,
    384,
    2006-03-26 08:35:55 -0800,
    502,
    20
    >)
    >
    >Both columns are of type TEXT.
    >
    >This is what I've found:
    >
    >1) SELECT * FRM FinderFiles WHERE CMKey GLB '*AbiWord.Profile*'
    >returns 1 match. This is correct.
    >
    >2) SELECT * FRM FinderFiles WHERE CMKey LIKE '%ABIWRD.Profile%'
    >returns 1 match. This is correct.
    >
    >3) SELECT * FRM FinderFiles WHERE CMValues GLB '*2004-12-16
    >10:11:35 -0800*' returns 40.176 matches. This is not correct. There
    >is no way I created these 40.176 file at the *very same* time. Just
    >to be sure, I looked at one random file (of the 40.176) and I've
    >obtained the following creation date attribute:
    >
    >NSFileCreationDate = 2004-02-21 06:12:43 -0800;
    >
    >The same problem occurs if I perform the query:
    >
    >SELECT * FRM FinderFiles WHERE CMValues LIKE '%2004-12-16 10:11:35
    >-0800%'
    >
    >This problem seems to occur when trying to match something with numbers:
    >
    >- If I look for NSFilePosixPermissions 448 (which I know exists) I
    >get zero matches
    >- If I look for strings, such as in step #1 or #2, it works fine.
    >
    >Something is wrong, I just can't figure out why
    >
    >Any ideas? Is this a bug?
    >
    >Thanks,
    >

    Tito
  • No.2 | | 572 bytes | |

    26/03/2006, at 10:51, MGC wrote:

    Your design is fundamentaly wrong.
    I don't know what your intended use
    is for this data, but I am logging identical fstat file info along
    with an
    MD5 sums.

    Well if you don't know what is the intended use for the data, how
    can you say that my design is fundamentally wrong? :-)

    It's not wrong. That's the way it has to be. Now, if I could match
    the data properly with LIKE and GLB, that would be great.

    Thanks for your response though.

    Regards,
    -- Tito
  • No.3 | | 795 bytes | |

    Tito Ciuro wrote:
    26/03/2006, at 10:51, MGC wrote:

    >Your design is fundamentaly wrong.
    >I don't know what your intended use
    >is for this data, but I am logging identical fstat file info along
    >with an
    >MD5 sums.


    Well if you don't know what is the intended use for the data, how
    can you say that my design is fundamentally wrong? :-)

    It's not wrong. That's the way it has to be. Now, if I could match the
    data properly with LIKE and GLB, that would be great.

    Thanks for your response though.

    Regards,
    -- Tito
    LIKE and GLB do a row scan, and give you none of the advantages of an
    RDBMS. Why not use a flat file and grep and get simplicity and greater
    speed?
  • No.4 | | 1264 bytes | |

    John Stanton wrote:
    Tito Ciuro wrote:
    >26/03/2006, at 10:51, MGC wrote:
    >>

    Your design is fundamentaly wrong.
    I don't know what your intended use
    is for this data, but I am logging identical fstat file info along
    with an
    MD5 sums.
    >>
    >>

    >Well if you don't know what is the intended use for the data, how
    >can you say that my design is fundamentally wrong? :-)
    >>

    >It's not wrong. That's the way it has to be. Now, if I could match
    >the data properly with LIKE and GLB, that would be great.
    >>

    >Thanks for your response though.
    >>

    >Regards,
    >>

    >-- Tito

    LIKE and GLB do a row scan, and give you none of the advantages of an
    RDBMS. Why not use a flat file and grep and get simplicity and greater
    speed?

    even simply split your CSV attributes array into separate columns?
    at least those values which you need to use in queries if you're
    resisting doing that?

    Cheers!
  • No.5 | | 827 bytes | |

    Hello everybody,

    26/03/2006, at 10:08, John Stanton wrote:

    LIKE and GLB do a row scan, and give you none of the advantages of
    an RDBMS. Why not use a flat file and grep and get simplicity and
    greater speed?

    I'm very well aware that LIKE and GLB perform a row scan. I do
    appreciate your concerns about the presumed lack of suitability of my
    design, but that is not open for discussion.

    The original question was related about something very specific:
    whether there is a bug in LIKE and GLB when dealing with numbers.
    Regardless of whether my design is suitable or not, I think it's
    important to clarify what's happening. users might be suffering
    from this issue without knowing it.

    Again, thanks for your responses.

    Regards,
    -- Tito
  • No.6 | | 5195 bytes | |

    Tito,

    I knocked up a quick test with python and apsw and it worked as intended. My
    data isn't exactly the same as yours in that I don't have the variety in the
    keys, but you're not having problems with those. My test database contains
    your data with/without embedded carriage returns - as expected, it makes no
    difference.

    In the following, zip() is a quick hack to get all the results from the
    query. The spurious '[', ']' and other brackets surrounding the results are
    a result of the way that apsw returns data (as lists of python tuples).

    Apologies for the extreme width of the following lines. :(

    zip(csr.execute("select * from t"))

    [
    (('file000000005809',
    '(0,NSFileTypeRegular,0,22537,0,staff,234881026,29 4022,2004-12-16
    10:11:00 -0800,tciuro,384,2006-03-26 08:01:55 -0800,502,20)'),),
    (('file01010000581a',
    '(1,NSFileTypeRegular,1,22554,0,staff,234881026,29 4022,2004-12-16
    10:11:03 -0800,tciuro,384,2006-03-26 08:04:55 -0800,502,20)'),),
    (('file02020000582b',
    '(2,NSFileTypeRegular,2,22571,0,staff,234881026,29 4022,2004-12-16
    10:11:06 -0800,tciuro,384,2006-03-26 08:07:55 -0800,502,20)'),),

    (('file00000000595d', '(\n 0,\n NSFileTypeRegular,\n 0,\n
    22877,\n 0,\n staff,\n 234881026,\n 294022,\n 2004-12-16
    10:11:00 -0800,\n tciuro,\n 384,\n 2006-03-26 08:01:55 -0800,\n
    502,\n 20\n)'),),
    (('file01010000596e', '(\n 1,\n NSFileTypeRegular,\n 1,\n
    22894,\n 0,\n staff,\n 234881026,\n 294022,\n 2004-12-16
    10:11:03 -0800,\n tciuro,\n 384,\n 2006-03-26 08:04:55 -0800,\n
    502,\n 20\n)'),), (('file02020000597f',
    '(\n 2,\n NSFileTypeRegular,\n 2,\n 22911,\n 0,\n staff,\n
    234881026,\n 294022,\n 2004-12-16 10:11:06 -0800,\n tciuro,\n
    384,\n 2006-03-26 08:07:55 -0800,\n 502,\n 20\n)'),),

    ]

    zip(csr.execute("SELECT * FRM t WHERE CMValues GLB '*2004-12-16
    10:11:45 -0800*'"))

    [
    (('file151500005908',
    '(15,NSFileTypeRegular,15,22792,0,staff,234881026, 294022,2004-12-16
    10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),),
    (('file151500005a5c', '(\n 15,\n NSFileTypeRegular,\n 15,\n
    23132,\n 0,\n staff,\n 234881026,\n 294022,\n 2004-12-16
    10:11:45 -0800,\n tciuro,\n 384,\n 2006-03-26 08:46:55 -0800,\n
    502,\n 20\n)'),)
    ]

    zip(csr.execute("SELECT * FRM t WHERE CMValues LIKE '%2004-12-16
    10:11:45 -0800%'"))

    [
    (('file151500005908',
    '(15,NSFileTypeRegular,15,22792,0,staff,234881026, 294022,2004-12-16
    10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),),
    (('file151500005a5c', '(\n 15,\n NSFileTypeRegular,\n 15,\n
    23132,\n 0,\n staff,\n 234881026,\n 294022,\n 2004-12-16
    10:11:45 -0800,\n tciuro,\n 384,\n 2006-03-26 08:46:55 -0800,\n
    502,\n 20\n)'),)
    ]

    Could you try reducing your search strings and see if there's a point at
    which they start working?

    HTH,

    Martin Jenkins
    XQP Ltd
    Ascot, UK

    Message
    From: "Tito Ciuro" <tciuro (AT) mac (DOT) com>
    To: "Forum SQLite" <sqlite-users (AT) sqlite (DOT) org>
    Sent: Sunday, March 26, 2006 6:50 PM
    Subject: [sqlite] LIKE and GLB bug with numbers?

    Hello,

    I've populated a datafile with 40.176 records which contain file
    attributes and file paths. I have two columns, CMKey and CMValues. The
    column CMKey contains the path to the file and the column CMValues
    contains the attribute values. For example:

    CMKey: Application

    CMValues:
    (
    0,
    NSFileTypeRegular,
    1,
    21508,
    0,
    staff,
    234881026,
    294022,
    2004-12-16 10:11:35 -0800,
    tciuro,
    384,
    2006-03-26 08:35:55 -0800,
    502,
    20
    )

    Both columns are of type TEXT.

    This is what I've found:

    1) SELECT * FRM FinderFiles WHERE CMKey GLB '*AbiWord.Profile*' returns
    1 match. This is correct.

    2) SELECT * FRM FinderFiles WHERE CMKey LIKE '%ABIWRD.Profile%' returns
    1 match. This is correct.

    3) SELECT * FRM FinderFiles WHERE CMValues GLB '*2004-12-16
    10:11:35 -0800*' returns 40.176 matches. This is not correct. There is
    no way I created these 40.176 file at the *very same* time. Just to be
    sure, I looked at one random file (of the 40.176) and I've obtained the
    following creation date attribute:

    NSFileCreationDate = 2004-02-21 06:12:43 -0800;

    The same problem occurs if I perform the query:

    SELECT * FRM FinderFiles WHERE CMValues LIKE '%2004-12-16
    0:11:35 -0800%'

    This problem seems to occur when trying to match something with numbers:

    - If I look for NSFilePosixPermissions 448 (which I know exists) I get
    zero matches
    - If I look for strings, such as in step #1 or #2, it works fine.

    Something is wrong, I just can't figure out why

    Any ideas? Is this a bug?

    Thanks,

    -- Tito
    --

Re: LIKE and GLOB bug with numbers?


max 4000 letters.
Your nickname that display:
In order to stop the spam: 1 + 0 =
QUESTION ON "Databases"

EMSDN.COM