Seeking Opinions
13 answers - 981 bytes -

I'm building a new clipart site. I need to have keyword searching. I'm
seeking opinions about table design. Here are my proposed options. If
anyone has any other suggestions, please make them.
Related tables. Table one (clipart pieces) contains ClipartID and
ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
Keyword fields. This option will create an incredibly large related table
(keywords) with each piece of clipart having tens of related fields in the
keyword table. But, searching ought to be fast.
Two
Single table. Table one (clipart pieces) contains ClipartID, ClipartName
and Keywords fields. The Keywords field would be a long text field that
would be searched with a full-text search. Searching and maintenance would
be easier but would searching be slowed down significantly?
Please provide any input you have and make any alternate suggestions.
Robb Kerr
Digital IGUANA
No.1 | | 2594 bytes |
| 
Message
From: "Robb Kerr" <rkerr.news (AT) digitaliguana (DOT) com>
To: <mysql (AT) lists (DOT) mysql.com>
Sent: Friday, November 25, 2005 11:59 AM
Subject: Seeking
I'm building a new clipart site. I need to have keyword searching. I'm
seeking opinions about table design. Here are my proposed options. If
anyone has any other suggestions, please make them.
Related tables. Table one (clipart pieces) contains ClipartID and
ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
Keyword fields. This option will create an incredibly large related table
(keywords) with each piece of clipart having tens of related fields in the
keyword table. But, searching ought to be fast.
Two
Single table. Table one (clipart pieces) contains ClipartID, ClipartName
and Keywords fields. The Keywords field would be a long text field that
would be searched with a full-text search. Searching and maintenance would
be easier but would searching be slowed down significantly?
Please provide any input you have and make any alternate suggestions.
I'm not sure if you'll gain or lose by putting the keywords in a separate
table. Your description of the data is too vague. Could you possibly type an
example of a few rows of each scenario so that we can see what will actually
be in the Keywords columns in each scenario? It would also be VERY useful to
know what the primary and foreign keys of each table are going to be.
There is one major performance issue that you don't appear to have
considered yet: how will the clipart images themselves be stored? Are you
going to store each one as a blob in the data row itself? are you going
to store a URL or other URL-like description of where the clipart image is
found? The latter approach keeps the MySQL tables very small and may give
you performance advantages but also make your job a bit more complicated:
you have to maintain some kind of directory structure for your clipart files
and keep them consistent with the URL that you store in the database.
I've barely touched blobs in MySQL so I don't feel qualified to recommend
either approach to you from my own experience but I *think* the consensus
among people with more blob experience is that the second approach I
mentioned is better. However, it would be very wise of you to check the
archives for this mailing list - search on 'blob' - to be sure I am getting
that right.
Rhino
No.2 | | 1095 bytes |
| 
Related tables. Table one (clipart pieces) contains ClipartID and
ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
Keyword fields. This option will create an incredibly large related table
(keywords) with each piece of clipart having tens of related fields in the
keyword table. But, searching ought to be fast.
Use this option but use a third table that contains just ClipartID and
KeywordID to create the m:n relationship. Like this:
Clipart: ClipartID (primary key) & Clipartname
Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)
I have a database like this with over 250,000 images, 50,000+ keywords and
more than 2 million image - keyword links. All my keyword searches are very
fast (under 0.05 seconds per query).
This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
memory) so performance on a faster computer with more memory should be
excellent.
HTH,
Johan
No.3 | | 960 bytes |
| 
Friday 25 November 2005 01:44 pm, Johan wrote:
Related tables. Table one (clipart pieces) contains ClipartID and
ClipartName fields. Table two (keywords) contains KeywordID, ClipartID
and Keyword fields. This option will create an incredibly large related
table (keywords) with each piece of clipart having tens of related fields
in the keyword table. But, searching ought to be fast.
Use this option but use a third table that contains just ClipartID and
KeywordID to create the m:n relationship. Like this:
Clipart: ClipartID (primary key) & Clipartname
Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)
In the Clipart table, are names required to be unique? If so, then you can
use Clipartname as the index, and that would eliminate the 3rd table.
so I think -- I'm still learning this.
Hal
No.4 | | 1812 bytes |
| 
Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
>
>Related tables. Table one (clipart pieces) contains ClipartID and
>ClipartName fields. Table two (keywords) contains KeywordID, ClipartID and
>Keyword fields. This option will create an incredibly large related table
>(keywords) with each piece of clipart having tens of related fields in the
>keyword table. But, searching ought to be fast.
Use this option but use a third table that contains just ClipartID and
KeywordID to create the m:n relationship. Like this:
Clipart: ClipartID (primary key) & Clipartname
Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)
I have a database like this with over 250,000 images, 50,000+ keywords and
more than 2 million image - keyword links. All my keyword searches are very
fast (under 0.05 seconds per query).
This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
memory) so performance on a faster computer with more memory should be
excellent.
HTH,
Johan
Please explain further your 3 table scenario. Is the following example
correct
Table - Clipart
ClipartID (primary key) | ClipartName
1 | artone.jpg
2 | arttwo.jpg
3 | artthree.jpg
Table Two - Keywords
KeywordID (primary key) | Keyword
1 | black and white
2 | color
3 | christmas
4 | thanksgiving
Table Three - LinkTable
ClipartID | KeywordID
1 | 1
1 | 3
2 | 2
2 | 3
I don't understand what would be the primary key for the third table or
what you mean by "(ClipartID + KeywordID = primary key)". Please elaborate.
Thanx,
Robb
No.5 | | 2807 bytes |
| 
Hi Robb,
Your table setup is correct. In the third table the two ID fields together
form the Primary Key. This makes sure that you cannot add the same keyword
twice to the same image. If you use MySQL Administrator to create your
tables then you just add both columns to the primary key index.
Programmatically it would look something like (to create the tabloe and
indexes):
CREATE TABLE "ClipartKeyword" (
"ClipartID" INT(11) NT NULL,
"KeywordID" INT(11) NT NULL,
PRIMARY KEY ("ClipartID","KeywordID"),
INDEX ("KeywordID")
) TYPE=InnoDB;
This database uses ANSI, hence the quotes around the field names. The index
on KeywordID makes it easy to find all images that have a certain keyword
attached to it.
Good luck,
Johan
11/25/05, Robb Kerr <rkerr.news (AT) digitaliguana (DOT) comwrote:
Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
>
>
>Related tables. Table one (clipart pieces) contains ClipartID and
>ClipartName fields. Table two (keywords) contains KeywordID, ClipartID
and
>Keyword fields. This option will create an incredibly large related
table
>(keywords) with each piece of clipart having tens of related fields in
the
>keyword table. But, searching ought to be fast.
>
>
Use this option but use a third table that contains just ClipartID and
KeywordID to create the m:n relationship. Like this:
Clipart: ClipartID (primary key) & Clipartname
Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)
I have a database like this with over 250,000 images, 50,000+ keywords
and
more than 2 million image - keyword links. All my keyword searches are
very
fast (under 0.05 seconds per query).
This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
memory) so performance on a faster computer with more memory should be
excellent.
HTH,
Johan
Please explain further your 3 table scenario. Is the following example
correct
Table - Clipart
ClipartID (primary key) | ClipartName
1 | artone.jpg
2 | arttwo.jpg
3 | artthree.jpg
Table Two - Keywords
KeywordID (primary key) | Keyword
1 | black and white
2 | color
3 | christmas
4 | thanksgiving
Table Three - LinkTable
ClipartID | KeywordID
1 | 1
1 | 3
2 | 2
2 | 3
I don't understand what would be the primary key for the third table or
what you mean by "(ClipartID + KeywordID = primary key)". Please
elaborate.
Thanx,
Robb
No.6 | | 3695 bytes |
| 
this note, I feel as though the best way to do this in MySQL is to
use the ENUM or SET types, instead of indexing against a separate
table -- I'm guessing SET, so each piece of clipart can be associated
with more than one category. course, that eliminates relevancy
searches (which you could do as in the first example given by the P,
with a text field and fulltext search).
I think the biggest problem with that is that SET only allows up to 64
different items, so the clipart site could only have up to 64
categories (and a Christmas floral arrangement might have the keywords
"color", "flowers" and "Christmas").
That being said, the point of this post is to ask -- Is MySQL working
on allowing the SET limit to increase? is that just too much
storage and math?
-Sheeri
11/25/05, Johan <beest1967 (AT) gmail (DOT) comwrote:
Hi Robb,
Your table setup is correct. In the third table the two ID fields together
form the Primary Key. This makes sure that you cannot add the same keyword
twice to the same image. If you use MySQL Administrator to create your
tables then you just add both columns to the primary key index.
Programmatically it would look something like (to create the tabloe and
indexes):
CREATE TABLE "ClipartKeyword" (
"ClipartID" INT(11) NT NULL,
"KeywordID" INT(11) NT NULL,
PRIMARY KEY ("ClipartID","KeywordID"),
INDEX ("KeywordID")
) TYPE=InnoDB;
This database uses ANSI, hence the quotes around the field names. The index
on KeywordID makes it easy to find all images that have a certain keyword
attached to it.
Good luck,
Johan
11/25/05, Robb Kerr <rkerr.news (AT) digitaliguana (DOT) comwrote:
Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
>
>
>Related tables. Table one (clipart pieces) contains ClipartID and
>ClipartName fields. Table two (keywords) contains KeywordID, ClipartID
and
>Keyword fields. This option will create an incredibly large related
table
>(keywords) with each piece of clipart having tens of related fields in
the
>keyword table. But, searching ought to be fast.
>
>
Use this option but use a third table that contains just ClipartID and
KeywordID to create the m:n relationship. Like this:
Clipart: ClipartID (primary key) & Clipartname
Keywords: KeywordID (primary key) & Keyword (just one so must be unique)
Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary key)
I have a database like this with over 250,000 images, 50,000+ keywords
and
more than 2 million image - keyword links. All my keyword searches are
very
fast (under 0.05 seconds per query).
This is on very mediocre hardware (AMD Sempron 2600+ with only 512Mb of
memory) so performance on a faster computer with more memory should be
excellent.
HTH,
Johan
Please explain further your 3 table scenario. Is the following example
correct
Table - Clipart
ClipartID (primary key) | ClipartName
1 | artone.jpg
2 | arttwo.jpg
3 | artthree.jpg
Table Two - Keywords
KeywordID (primary key) | Keyword
1 | black and white
2 | color
3 | christmas
4 | thanksgiving
Table Three - LinkTable
ClipartID | KeywordID
1 | 1
1 | 3
2 | 2
2 | 3
I don't understand what would be the primary key for the third table or
what you mean by "(ClipartID + KeywordID = primary key)". Please
elaborate.
Thanx,
Robb
No.7 | | 5066 bytes |
| 
I did answer my own question, and that's why I said:
That being said, the point of this post is to ask -- Is MySQL working
on allowing the SET limit to increase? is that just too much
storage and math?
Using a SET is, I believe, MUCH faster than making a special table
simply to normalize the data. I believe in using MySQL's
functionality to the fullest, and working smarter, not harder.
Would folks use SET to help normalize their data instead of using a
separate table? Should I put in a request for the feature I want?
would very few folks benefit, and it would add too much storage and
computational cost? (my original question)
-Sheeri
11/28/05, SGreen (AT) unimin (DOT) com <SGreen (AT) unimin (DOT) comwrote:
You, yourself, explain why your suggestion would be a bad fit for this
project: SET is limited to just 64 discrete values per table.
I cannot remember reading that increasing the size of the SET features as a
priority on any development list. I might have missed something but I don't
think that SET>64 will be coming any time soon.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
sheeri kritzer <awfief (AT) gmail (DOT) comwrote on 11/28/2005 09:53:49 AM:
this note, I feel as though the best way to do this in MySQL is to
use the ENUM or SET types, instead of indexing against a separate
table -- I'm guessing SET, so each piece of clipart can be associated
with more than one category. course, that eliminates relevancy
searches (which you could do as in the first example given by the P,
with a text field and fulltext search).
I think the biggest problem with that is that SET only allows up to 64
different items, so the clipart site could only have up to 64
categories (and a Christmas floral arrangement might have the keywords
"color", "flowers" and "Christmas").
That being said, the point of this post is to ask -- Is MySQL working
on allowing the SET limit to increase? is that just too much
storage and math?
-Sheeri
11/25/05, Johan <beest1967 (AT) gmail (DOT) comwrote:
Hi Robb,
Your table setup is correct. In the third table the two ID fields
together
form the Primary Key. This makes sure that you cannot add the same
keyword
twice to the same image. If you use MySQL Administrator to create your
tables then you just add both columns to the primary key index.
Programmatically it would look something like (to create the tabloe and
indexes):
CREATE TABLE "ClipartKeyword" (
"ClipartID" INT(11) NT NULL,
"KeywordID" INT(11) NT NULL,
PRIMARY KEY ("ClipartID","KeywordID"),
INDEX ("KeywordID")
) TYPE=InnoDB;
This database uses ANSI, hence the quotes around the field names. The
index
on KeywordID makes it easy to find all images that have a certain
keyword
attached to it.
Good luck,
Johan
11/25/05, Robb Kerr <rkerr.news (AT) digitaliguana (DOT) comwrote:
Fri, 25 Nov 2005 10:44:44 -0800, Johan wrote:
>
>
>Related tables. Table one (clipart pieces) contains ClipartID and
>ClipartName fields. Table two (keywords) contains KeywordID,
ClipartID
and
>Keyword fields. This option will create an incredibly large
related
table
>(keywords) with each piece of clipart having tens of related
fields in
the
>keyword table. But, searching ought to be fast.
>
>
Use this option but use a third table that contains just ClipartID
and
KeywordID to create the m:n relationship. Like this:
Clipart: ClipartID (primary key) & Clipartname
Keywords: KeywordID (primary key) & Keyword (just one so must be
unique)
Linktable: ClipartID & KeywordID (ClipartID + KeywordID = primary
key)
I have a database like this with over 250,000 images, 50,000+
keywords
and
more than 2 million image - keyword links. All my keyword searches
are
very
fast (under 0.05 seconds per query).
This is on very mediocre hardware (AMD Sempron 2600+ with only
512Mb of
memory) so performance on a faster computer with more memory should
be
excellent.
HTH,
Johan
Please explain further your 3 table scenario. Is the following
example
correct
Table - Clipart
ClipartID (primary key) | ClipartName
1 | artone.jpg
2 | arttwo.jpg
3 | artthree.jpg
Table Two - Keywords
KeywordID (primary key) | Keyword
1 | black and white
2 | color
3 | christmas
4 | thanksgiving
Table Three - LinkTable
ClipartID | KeywordID
1 | 1
1 | 3
2 | 2
2 | 3
I don't understand what would be the primary key for the third table
or
what you mean by "(ClipartID + KeywordID = primary key)". Please
elaborate.
Thanx,
Robb
No.8 | | 1189 bytes |
| 
Part of me agrees with you, on a "Pure SQL level". but then why would
anyone ever use ENUM or SET?
-Sheeri
11/28/05, Martijn Tonies <m.tonies (AT) upscene (DOT) comwrote:
>
>
>
I did answer my own question, and that's why I said:
That being said, the point of this post is to ask -- Is MySQL working
on allowing the SET limit to increase? is that just too much
storage and math?
Using a SET is, I believe, MUCH faster than making a special table
simply to normalize the data. I believe in using MySQL's
functionality to the fullest, and working smarter, not harder.
Would folks use SET to help normalize their data instead of using a
separate table? Should I put in a request for the feature I want?
would very few folks benefit, and it would add too much storage and
computational cost? (my original question)
Create a separate table.
MySQL is a database engine.
Tables is what database engines do.
Your design should be clear and extendable without having to
modify metadata (adding a category would need you to do an
ALTER TABLE).
--
No.9 | | 378 bytes |
| 
Part of me agrees with you, on a "Pure SQL level". but then why would
anyone ever use ENUM or SET?
I wouldn't :-)
IM, they're abominations that are to be avoided.
Now, ENUM could be a bit useful. The Pure SQL way to avoid those
would simply be a column of the right datatype and a CHECK constraint
(which MySQL doesn't support, yet).
No.10 | | 1215 bytes |
| 
11/28/05, Martijn Tonies <m.tonies (AT) upscene (DOT) comwrote:
Part of me agrees with you, on a "Pure SQL level". but then why would
anyone ever use ENUM or SET?
I wouldn't :-)
IM, they're abominations that are to be avoided.
Again, part of me agrees with you. But part of me, particularly the
part that says "our queries need to be optimized as much as possible,
so our customers will be happy," thinks that ENUM and SET are
non-standard, MySQL specific ways to create normalized data without
having an extra table, and thus an extra join, and thus more
computation, and thus a slower query.
Tell me, do you also create your own sequences in MySQL instead of
using the AUTINCREMENT feature? Because that, too, is an
abomination in the same way.
I use ENUM a lot, successfully, and when I want to add another ENUM
value to a table (I never delete one), I have a script that runs a
show table, gets the ENUM field in question, gets all the values, and
alters the table modifying the column with the correct values (all the
current values + the new one). I've never had a problem with it yet
(knock wood).
-Sheeri
No.11 | | 942 bytes |
| 
I did answer my own question, and that's why I said:
That being said, the point of this post is to ask -- Is MySQL working
on allowing the SET limit to increase? is that just too much
storage and math?
Using a SET is, I believe, MUCH faster than making a special table
simply to normalize the data. I believe in using MySQL's
functionality to the fullest, and working smarter, not harder.
Would folks use SET to help normalize their data instead of using a
separate table? Should I put in a request for the feature I want?
would very few folks benefit, and it would add too much storage and
computational cost? (my original question)
Create a separate table.
MySQL is a database engine.
Tables is what database engines do.
Your design should be clear and extendable without having to
modify metadata (adding a category would need you to do an
ALTER TABLE).
No.12 | | 1516 bytes |
| 
Part of me agrees with you, on a "Pure SQL level". but then why would
anyone ever use ENUM or SET?
I wouldn't :-)
IM, they're abominations that are to be avoided.
Again, part of me agrees with you. But part of me, particularly the
part that says "our queries need to be optimized as much as possible,
so our customers will be happy," thinks that ENUM and SET are
non-standard, MySQL specific ways to create normalized data without
having an extra table, and thus an extra join, and thus more
computation, and thus a slower query.
That's wrong, if you use "set", the data isn't normalized cause you're
storing multiple values in a column.
Tell me, do you also create your own sequences in MySQL instead of
using the AUTINCREMENT feature? Because that, too, is an
abomination in the same way.
way to do it is to create an auto-inc table, but this is besides
the point for ENUM/SET.
I use ENUM a lot, successfully, and when I want to add another ENUM
value to a table (I never delete one), I have a script that runs a
show table, gets the ENUM field in question, gets all the values, and
alters the table modifying the column with the correct values (all the
current values + the new one). I've never had a problem with it yet
(knock wood).
In other words: modifying metadata to adjust the system for data
requirements ;-) This also means there's no easy way to automate
this.
No.13 | | 1000 bytes |
| 
In article <@mail.gmail.com>,
sheeri kritzer <awfief (AT) gmail (DOT) comwrites:
11/28/05, Martijn Tonies <m.tonies (AT) upscene (DOT) comwrote:
>
>Part of me agrees with you, on a "Pure SQL level". but then why would
>anyone ever use ENUM or SET?
>
>I wouldn't :-)
>
>IM, they're abominations that are to be avoided.
Again, part of me agrees with you. But part of me, particularly the
part that says "our queries need to be optimized as much as possible,
so our customers will be happy," thinks that ENUM and SET are
non-standard, MySQL specific ways to create normalized data without
having an extra table, and thus an extra join, and thus more
computation, and thus a slower query.
I'd say SET is a denormalization which might give you some performance
(at the price of being completely nonstandard), but ENUM is just
syntactical sugar for a TINYINT column.