MYSQL

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Seeking Opinions

    13 answers - 981 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    I'm building a new clipart site. I need to have keyword searching. I'm
    seeking opinions about table design. Here are my proposed options. If
    anyone has any other suggestions, please make them.
    Related tables. Table one (clipart pieces) contains ClipartID and
    ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
    Keyword fields. This option will create an incredibly large related table
    (keywords) with each piece of clipart having tens of related fields in the
    keyword table. But, searching ought to be fast.
    Two
    Single table. Table one (clipart pieces) contains ClipartID, ClipartName
    and Keywords fields. The Keywords field would be a long text field that
    would be searched with a full-text search. Searching and maintenance would
    be easier but would searching be slowed down significantly?
    Please provide any input you have and make any alternate suggestions.
    Robb Kerr
    Digital IGUANA
  • No.1 | | 2594 bytes | |

    Message
    From: "Robb Kerr" <rkerr.news (AT) digitaliguana (DOT) com>
    To: <mysql (AT) lists (DOT) mysql.com>
    Sent: Friday, November 25, 2005 11:59 AM
    Subject: Seeking

    I'm building a new clipart site. I need to have keyword searching. I'm
    seeking opinions about table design. Here are my proposed options. If
    anyone has any other suggestions, please make them.

    Related tables. Table one (clipart pieces) contains ClipartID and
    ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
    Keyword fields. This option will create an incredibly large related table
    (keywords) with each piece of clipart having tens of related fields in the
    keyword table. But, searching ought to be fast.

    Two
    Single table. Table one (clipart pieces) contains ClipartID, ClipartName
    and Keywords fields. The Keywords field would be a long text field that
    would be searched with a full-text search. Searching and maintenance would
    be easier but would searching be slowed down significantly?

    Please provide any input you have and make any alternate suggestions.

    I'm not sure if you'll gain or lose by putting the keywords in a separate
    table. Your description of the data is too vague. Could you possibly type an
    example of a few rows of each scenario so that we can see what will actually
    be in the Keywords columns in each scenario? It would also be VERY useful to
    know what the primary and foreign keys of each table are going to be.

    There is one major performance issue that you don't appear to have
    considered yet: how will the clipart images themselves be stored? Are you
    going to store each one as a blob in the data row itself? are you going
    to store a URL or other URL-like description of where the clipart image is
    found? The latter approach keeps the MySQL tables very small and may give
    you performance advantages but also make your job a bit more complicated:
    you have to maintain some kind of directory structure for your clipart files
    and keep them consistent with the URL that you store in the database.

    I've barely touched blobs in MySQL so I don't feel qualified to recommend
    either approach to you from my own experience but I *think* the consensus
    among people with more blob experience is that the second approach I
    mentioned is better. However, it would be very wise of you to check the
    archives for this mailing list - search on 'blob' - to be sure I am getting
    that right.

    Rhino
  • No.2 | | 1095 bytes | |


    Related tables. Table one (clipart pieces) contains ClipartID and
    ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
    Keyword fields. This option will create an incredibly large related table
    (keywords) with each piece of clipart having tens of related fields in the
    keyword table. But, searching ought to be fast.

    Use this option but use a third table that contains just ClipartID and
    KeywordID to create the m:n relationship. Like this:

    Clipart: ClipartID (primary key) & Clipartname
    Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
    Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)

    I have a database like this with over 250,000 images, 50,000+ keywords and
    more than 2 million image - keyword links. All my keyword searches are very
    fast (under 0.05 seconds per query).

    This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
    memory) so performance on a faster computer with more memory should be
    excellent.

    HTH,

    Johan
  • No.3 | | 960 bytes | |

    Friday 25 November 2005 01:44 pm, Johan wrote:

    Related tables. Table one (clipart pieces) contains ClipartID and
    ClipartName fields. Table two (keywords) contains KeywordID, ClipartID
    and Keyword fields. This option will create an incredibly large related
    table (keywords) with each piece of clipart having tens of related fields
    in the keyword table. But, searching ought to be fast.

    Use this option but use a third table that contains just ClipartID and
    KeywordID to create the m:n relationship. Like this:

    Clipart: ClipartID (primary key) & Clipartname
    Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
    Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)

    In the Clipart table, are names required to be unique? If so, then you can
    use Clipartname as the index, and that would eliminate the 3rd table.

    so I think -- I'm still learning this.

    Hal
  • No.4 | | 1812 bytes | |

    Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:

    >
    >Related tables. Table one (clipart pieces) contains ClipartID and
    >ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
    >Keyword fields. This option will create an incredibly large related table
    >(keywords) with each piece of clipart having tens of related fields in the
    >keyword table. But, searching ought to be fast.


    Use this option but use a third table that contains just ClipartID and
    KeywordID to create the m:n relationship. Like this:

    Clipart: ClipartID (primary key) & Clipartname
    Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
    Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)

    I have a database like this with over 250,000 images, 50,000+ keywords and
    more than 2 million image - keyword links. All my keyword searches are very
    fast (under 0.05 seconds per query).

    This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
    memory) so performance on a faster computer with more memory should be
    excellent.

    HTH,

    Johan

    Please explain further your 3 table scenario. Is the following example
    correct

    Table - Clipart
    ClipartID (primary key) | ClipartName
    1 | artone.jpg
    2 | arttwo.jpg
    3 | artthree.jpg

    Table Two - Keywords
    KeywordID (primary key) | Keyword
    1 | black and white
    2 | color
    3 | christmas
    4 | thanksgiving

    Table Three - LinkTable
    ClipartID | KeywordID
    1 | 1
    1 | 3
    2 | 2
    2 | 3

    I don't understand what would be the primary key for the third table or
    what you mean by "(ClipartID + KeywordID = primary key)". Please elaborate.

    Thanx,
    Robb
  • No.5 | | 2807 bytes | |

    Hi Robb,

    Your table setup is correct. In the third table the two ID fields together
    form the Primary Key. This makes sure that you cannot add the same keyword
    twice to the same image. If you use MySQL Administrator to create your
    tables then you just add both columns to the primary key index.
    Programmatically it would look something like (to create the tabloe and
    indexes):

    CREATE TABLE "ClipartKeyword" (
    "ClipartID" INT(11) NT NULL,
    "KeywordID" INT(11) NT NULL,
    PRIMARY KEY ("ClipartID","KeywordID"),
    INDEX ("KeywordID")
    ) TYPE=InnoDB;

    This database uses ANSI, hence the quotes around the field names. The index
    on KeywordID makes it easy to find all images that have a certain keyword
    attached to it.

    Good luck,

    Johan

    11/25/05, Robb Kerr <rkerr.news (AT) digitaliguana (DOT) comwrote:

    Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
    >
    >
    >Related tables. Table one (clipart pieces) contains ClipartID and
    >ClipartName fields. Table two (keywords) contains KeywordID, ClipartID

    and
    >Keyword fields. This option will create an incredibly large related

    table
    >(keywords) with each piece of clipart having tens of related fields in

    the
    >keyword table. But, searching ought to be fast.
    >
    >

    Use this option but use a third table that contains just ClipartID and
    KeywordID to create the m:n relationship. Like this:

    Clipart: ClipartID (primary key) & Clipartname
    Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
    Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)

    I have a database like this with over 250,000 images, 50,000+ keywords
    and
    more than 2 million image - keyword links. All my keyword searches are
    very
    fast (under 0.05 seconds per query).

    This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
    memory) so performance on a faster computer with more memory should be
    excellent.

    HTH,

    Johan

    Please explain further your 3 table scenario. Is the following example
    correct

    Table - Clipart
    ClipartID (primary key) | ClipartName
    1 | artone.jpg
    2 | arttwo.jpg
    3 | artthree.jpg

    Table Two - Keywords
    KeywordID (primary key) | Keyword
    1 | black and white
    2 | color
    3 | christmas
    4 | thanksgiving

    Table Three - LinkTable
    ClipartID | KeywordID
    1 | 1
    1 | 3
    2 | 2
    2 | 3

    I don't understand what would be the primary key for the third table or
    what you mean by "(ClipartID + KeywordID = primary key)". Please
    elaborate.

    Thanx,
    Robb
  • No.6 | | 3695 bytes | |

    this note, I feel as though the best way to do this in MySQL is to
    use the ENUM or SET types, instead of indexing against a separate
    table -- I'm guessing SET, so each piece of clipart can be associated
    with more than one category. course, that eliminates relevancy
    searches (which you could do as in the first example given by the P,
    with a text field and fulltext search).

    I think the biggest problem with that is that SET only allows up to 64
    different items, so the clipart site could only have up to 64
    categories (and a Christmas floral arrangement might have the keywords
    "color", "flowers" and "Christmas").

    That being said, the point of this post is to ask -- Is MySQL working
    on allowing the SET limit to increase? is that just too much
    storage and math?
    -Sheeri

    11/25/05, Johan <beest1967 (AT) gmail (DOT) comwrote:
    Hi Robb,

    Your table setup is correct. In the third table the two ID fields together
    form the Primary Key. This makes sure that you cannot add the same keyword
    twice to the same image. If you use MySQL Administrator to create your
    tables then you just add both columns to the primary key index.
    Programmatically it would look something like (to create the tabloe and
    indexes):

    CREATE TABLE "ClipartKeyword" (
    "ClipartID" INT(11) NT NULL,
    "KeywordID" INT(11) NT NULL,
    PRIMARY KEY ("ClipartID","KeywordID"),
    INDEX ("KeywordID")
    ) TYPE=InnoDB;

    This database uses ANSI, hence the quotes around the field names. The index
    on KeywordID makes it easy to find all images that have a certain keyword
    attached to it.

    Good luck,

    Johan

    11/25/05, Robb Kerr <rkerr.news (AT) digitaliguana (DOT) comwrote:

    Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
    >
    >
    >Related tables. Table one (clipart pieces) contains ClipartID and
    >ClipartName fields. Table two (keywords) contains KeywordID, ClipartID

    and
    >Keyword fields. This option will create an incredibly large related

    table
    >(keywords) with each piece of clipart having tens of related fields in

    the
    >keyword table. But, searching ought to be fast.
    >
    >

    Use this option but use a third table that contains just ClipartID and
    KeywordID to create the m:n relationship. Like this:

    Clipart: ClipartID (primary key) & Clipartname
    Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
    Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)

    I have a database like this with over 250,000 images, 50,000+ keywords
    and
    more than 2 million image - keyword links. All my keyword searches are
    very
    fast (under 0.05 seconds per query).

    This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
    memory) so performance on a faster computer with more memory should be
    excellent.

    HTH,

    Johan

    Please explain further your 3 table scenario. Is the following example
    correct

    Table - Clipart
    ClipartID (primary key) | ClipartName
    1 | artone.jpg
    2 | arttwo.jpg
    3 | artthree.jpg

    Table Two - Keywords
    KeywordID (primary key) | Keyword
    1 | black and white
    2 | color
    3 | christmas
    4 | thanksgiving

    Table Three - LinkTable
    ClipartID | KeywordID
    1 | 1
    1 | 3
    2 | 2
    2 | 3

    I don't understand what would be the primary key for the third table or
    what you mean by "(ClipartID + KeywordID = primary key)". Please
    elaborate.

    Thanx,
    Robb
  • No.7 | | 5066 bytes | |

    I did answer my own question, and that's why I said:

    That being said, the point of this post is to ask -- Is MySQL working
    on allowing the SET limit to increase? is that just too much
    storage and math?

    Using a SET is, I believe, MUCH faster than making a special table
    simply to normalize the data. I believe in using MySQL's
    functionality to the fullest, and working smarter, not harder.

    Would folks use SET to help normalize their data instead of using a
    separate table? Should I put in a request for the feature I want?
    would very few folks benefit, and it would add too much storage and
    computational cost? (my original question)
    -Sheeri

    11/28/05, SGreen (AT) unimin (DOT) com <SGreen (AT) unimin (DOT) comwrote:

    You, yourself, explain why your suggestion would be a bad fit for this
    project: SET is limited to just 64 discrete values per table.

    I cannot remember reading that increasing the size of the SET features as a
    priority on any development list. I might have missed something but I don't
    think that SET>64 will be coming any time soon.

    Shawn Green
    Database Administrator
    Unimin Corporation - Spruce Pine

    sheeri kritzer <awfief (AT) gmail (DOT) comwrote on 11/28/2005 09:53:49 AM:

    this note, I feel as though the best way to do this in MySQL is to
    use the ENUM or SET types, instead of indexing against a separate
    table -- I'm guessing SET, so each piece of clipart can be associated
    with more than one category. course, that eliminates relevancy
    searches (which you could do as in the first example given by the P,
    with a text field and fulltext search).

    I think the biggest problem with that is that SET only allows up to 64
    different items, so the clipart site could only have up to 64
    categories (and a Christmas floral arrangement might have the keywords
    "color", "flowers" and "Christmas").

    That being said, the point of this post is to ask -- Is MySQL working
    on allowing the SET limit to increase? is that just too much
    storage and math?

    -Sheeri

    11/25/05, Johan <beest1967 (AT) gmail (DOT) comwrote:
    Hi Robb,

    Your table setup is correct. In the third table the two ID fields
    together
    form the Primary Key. This makes sure that you cannot add the same
    keyword
    twice to the same image. If you use MySQL Administrator to create your
    tables then you just add both columns to the primary key index.
    Programmatically it would look something like (to create the tabloe and
    indexes):

    CREATE TABLE "ClipartKeyword" (
    "ClipartID" INT(11) NT NULL,
    "KeywordID" INT(11) NT NULL,
    PRIMARY KEY ("ClipartID","KeywordID"),
    INDEX ("KeywordID")
    ) TYPE=InnoDB;

    This database uses ANSI, hence the quotes around the field names. The
    index
    on KeywordID makes it easy to find all images that have a certain
    keyword
    attached to it.

    Good luck,

    Johan

    11/25/05, Robb Kerr <rkerr.news (AT) digitaliguana (DOT) comwrote:

    Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
    >
    >
    >Related tables. Table one (clipart pieces) contains ClipartID and
    >ClipartName fields. Table two (keywords) contains KeywordID,

    ClipartID
    and
    >Keyword fields. This option will create an incredibly large

    related
    table
    >(keywords) with each piece of clipart having tens of related

    fields in
    the
    >keyword table. But, searching ought to be fast.
    >
    >

    Use this option but use a third table that contains just ClipartID
    and
    KeywordID to create the m:n relationship. Like this:

    Clipart: ClipartID (primary key) & Clipartname
    Keywords: KeywordID (primary key) & Keyword (just one so must be
    unique)
    Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary
    key)

    I have a database like this with over 250,000 images, 50,000+
    keywords
    and
    more than 2 million image - keyword links. All my keyword searches
    are
    very
    fast (under 0.05 seconds per query).

    This is on very mediocre hardware (AMD Sempron 2600+ with only
    512Mb of
    memory) so performance on a faster computer with more memory should
    be
    excellent.

    HTH,

    Johan

    Please explain further your 3 table scenario. Is the following
    example
    correct

    Table - Clipart
    ClipartID (primary key) | ClipartName
    1 | artone.jpg
    2 | arttwo.jpg
    3 | artthree.jpg

    Table Two - Keywords
    KeywordID (primary key) | Keyword
    1 | black and white
    2 | color
    3 | christmas
    4 | thanksgiving

    Table Three - LinkTable
    ClipartID | KeywordID
    1 | 1
    1 | 3
    2 | 2
    2 | 3

    I don't understand what would be the primary key for the third table
    or
    what you mean by "(ClipartID + KeywordID = primary key)". Please
    elaborate.

    Thanx,
    Robb
  • No.8 | | 1189 bytes | |

    Part of me agrees with you, on a "Pure SQL level". but then why would
    anyone ever use ENUM or SET?
    -Sheeri

    11/28/05, Martijn Tonies <m.tonies (AT) upscene (DOT) comwrote:
    >
    >
    >

    I did answer my own question, and that's why I said:

    That being said, the point of this post is to ask -- Is MySQL working
    on allowing the SET limit to increase? is that just too much
    storage and math?

    Using a SET is, I believe, MUCH faster than making a special table
    simply to normalize the data. I believe in using MySQL's
    functionality to the fullest, and working smarter, not harder.

    Would folks use SET to help normalize their data instead of using a
    separate table? Should I put in a request for the feature I want?
    would very few folks benefit, and it would add too much storage and
    computational cost? (my original question)

    Create a separate table.

    MySQL is a database engine.

    Tables is what database engines do.

    Your design should be clear and extendable without having to
    modify metadata (adding a category would need you to do an
    ALTER TABLE).
    --
  • No.9 | | 378 bytes | |

    Part of me agrees with you, on a "Pure SQL level". but then why would
    anyone ever use ENUM or SET?

    I wouldn't :-)

    IM, they're abominations that are to be avoided.

    Now, ENUM could be a bit useful. The Pure SQL way to avoid those
    would simply be a column of the right datatype and a CHECK constraint
    (which MySQL doesn't support, yet).
  • No.10 | | 1215 bytes | |

    11/28/05, Martijn Tonies <m.tonies (AT) upscene (DOT) comwrote:

    Part of me agrees with you, on a "Pure SQL level". but then why would
    anyone ever use ENUM or SET?

    I wouldn't :-)

    IM, they're abominations that are to be avoided.

    Again, part of me agrees with you. But part of me, particularly the
    part that says "our queries need to be optimized as much as possible,
    so our customers will be happy," thinks that ENUM and SET are
    non-standard, MySQL specific ways to create normalized data without
    having an extra table, and thus an extra join, and thus more
    computation, and thus a slower query.

    Tell me, do you also create your own sequences in MySQL instead of
    using the AUTINCREMENT feature? Because that, too, is an
    abomination in the same way.

    I use ENUM a lot, successfully, and when I want to add another ENUM
    value to a table (I never delete one), I have a script that runs a
    show table, gets the ENUM field in question, gets all the values, and
    alters the table modifying the column with the correct values (all the
    current values + the new one). I've never had a problem with it yet
    (knock wood).
    -Sheeri
  • No.11 | | 942 bytes | |

    I did answer my own question, and that's why I said:

    That being said, the point of this post is to ask -- Is MySQL working
    on allowing the SET limit to increase? is that just too much
    storage and math?

    Using a SET is, I believe, MUCH faster than making a special table
    simply to normalize the data. I believe in using MySQL's
    functionality to the fullest, and working smarter, not harder.

    Would folks use SET to help normalize their data instead of using a
    separate table? Should I put in a request for the feature I want?
    would very few folks benefit, and it would add too much storage and
    computational cost? (my original question)

    Create a separate table.

    MySQL is a database engine.

    Tables is what database engines do.

    Your design should be clear and extendable without having to
    modify metadata (adding a category would need you to do an
    ALTER TABLE).
  • No.12 | | 1516 bytes | |

    Part of me agrees with you, on a "Pure SQL level". but then why would
    anyone ever use ENUM or SET?

    I wouldn't :-)

    IM, they're abominations that are to be avoided.

    Again, part of me agrees with you. But part of me, particularly the
    part that says "our queries need to be optimized as much as possible,
    so our customers will be happy," thinks that ENUM and SET are
    non-standard, MySQL specific ways to create normalized data without
    having an extra table, and thus an extra join, and thus more
    computation, and thus a slower query.

    That's wrong, if you use "set", the data isn't normalized cause you're
    storing multiple values in a column.

    Tell me, do you also create your own sequences in MySQL instead of
    using the AUTINCREMENT feature? Because that, too, is an
    abomination in the same way.

    way to do it is to create an auto-inc table, but this is besides
    the point for ENUM/SET.

    I use ENUM a lot, successfully, and when I want to add another ENUM
    value to a table (I never delete one), I have a script that runs a
    show table, gets the ENUM field in question, gets all the values, and
    alters the table modifying the column with the correct values (all the
    current values + the new one). I've never had a problem with it yet
    (knock wood).

    In other words: modifying metadata to adjust the system for data
    requirements ;-) This also means there's no easy way to automate
    this.
  • No.13 | | 1000 bytes | |

    In article <@mail.gmail.com>,
    sheeri kritzer <awfief (AT) gmail (DOT) comwrites:

    11/28/05, Martijn Tonies <m.tonies (AT) upscene (DOT) comwrote:
    >
    >Part of me agrees with you, on a "Pure SQL level". but then why would
    >anyone ever use ENUM or SET?
    >
    >I wouldn't :-)
    >
    >IM, they're abominations that are to be avoided.


    Again, part of me agrees with you. But part of me, particularly the
    part that says "our queries need to be optimized as much as possible,
    so our customers will be happy," thinks that ENUM and SET are
    non-standard, MySQL specific ways to create normalized data without
    having an extra table, and thus an extra join, and thus more
    computation, and thus a slower query.

    I'd say SET is a denormalization which might give you some performance
    (at the price of being completely nonstandard), but ENUM is just
    syntactical sugar for a TINYINT column.

Re: Seeking Opinions


max 4000 letters.
Your nickname that display:
In order to stop the spam: 3 + 2 =
QUESTION ON "MYSQL"

EMSDN.COM