implicit cast forces table scan?
6 answers - 3498 bytes -

Hi,
I am currently using mysql 4.0.18 as distributed with red hat Linux. I find
when I perform a select on a varchar(30) field, the index is used only if I
have quoted the value in the where clause. , mysql performs a
table scan.
The queries in question are:
This query uses the index:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = "170807";
This query performs a table scan:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = 170807;
My question is this: is the issue here that mysql is converting every single
itran_log_actionid value, from all 1.5 million rows, and hence the index is
not useful and not used? My initial assumption was that the constant value
170807 in the second query, would be converted to text before the query was
executed, and so the index could be used. This does not seem to be the
case.
I ask both for my own edification, and also because it seems to me this
should be mentioned in the manual for newbies like myself.
thanks,
Details on versions, table structures, indexes, etc. below
$ rpm -qa | grep -i mysql
MySQL-shared-compat-4.0.15-0
MySQL-client-4.0.18-0
php-mysql-4.1.2-7.2.6
MySQL-server-4.0.18-0
$ /usr/bin/mysql -V
/usr/bin/mysql Ver 12.22 Distrib 4.0.18, for pc-linux (i686)
mysqldescribe itran_log;
+
| Field | Type | Null | Key | Default | Extra
|
+
| itran_user_id | varchar(100) | | | |
|
| itran_log_date | date | | MUL | 0000-00-00 |
|
| itran_log_time | time | | | 00:00:00 |
|
| itran_log_filename | varchar(100) | | | |
|
| itran_log_action | varchar(25) | | MUL | |
|
| itran_log_actionid | varchar(30) | | MUL | |
|
| itran_site_id | varchar(100) | YES | MUL | NULL |
|
| itran_log_instructions | text | | | |
|
| itran_log_id | bigint(20) | | PRI | NULL |
auto_increment |
+
mysqlshow indexes from itran_log;
+
| Table | Non_unique | Key_name | Seq_in_index |
Column_name | Collation | Cardinality | Sub_part | Packed | Null |
Index_type | Comment |
+
| itran_log | 0 | PRIMARY | 1 |
itran_log_id | A | 1500793 | NULL | NULL | |
BTREE | |
| itran_log | 1 | itran_site_id_ix | 1 |
itran_site_id | A | NULL | 15 | NULL | YES |
BTREE | |
| itran_log | 1 | itran_log_action_ix | 1 |
itran_log_action | A | NULL | 3 | NULL | |
BTREE | |
| itran_log | 1 | itran_log_actionid_ix | 1 |
itran_log_actionid | A | NULL | NULL | NULL | |
BTREE | |
| itran_log | 1 | itran_log_date_ix | 1 |
itran_log_date | A | NULL | NULL | NULL | |
BTREE | |
+
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = 170807;
| table | type | possible_keys | key | key_len | ref | rows
| Extra |
| itran_log | ALL | itran_log_actionid_ix | NULL | NULL | NULL | 1500775
| Using where |
1 row in set (0.02 sec)
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = "170807";
| table | type | possible_keys | key | key_len
| ref | rows | Extra |
| itran_log | ref | itran_log_actionid_ix | itran_log_actionid_ix | 30
| const | 4 | Using where |
1 row in set (0.00 sec)
Faaland
Sovran Inc.
No.1 | | 1470 bytes |
| 
Wednesday 19 2005 01:15 pm, Faaland wrote:
The queries in question are:
This query uses the index:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = "170807";
This query performs a table scan:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = 170807;
My question is this: is the issue here that mysql is converting every
single itran_log_actionid value, from all 1.5 million rows, and hence the
index is not useful and not used? My initial assumption was that the
constant value 170807 in the second query, would be converted to text
before the query was executed, and so the index could be used. This does
not seem to be the case.
I ask both for my own edification, and also because it seems to me this
should be mentioned in the manual for newbies like myself.
It doesnt know what value your giving it. If it thought to assume converting
the data, you could have "17h120", and it would fail converting the data.
Mysql, nor any DB for that matter, should not, and do not, assume anything.
It just happens in the case your dealing with numeric data. If thats the
case, you should have made the column numeric in type. (int whatever)
Jeff
Jeff
PGP SIGNATURE
Version: GnuPG v1.4.1 (GNU/Linux)
qa/VghnUIFhtkboTG48/t/I=
=+prH
PGP SIGNATURE
No.2 | | 3442 bytes |
| 
Faaland wrote:
Hi,
I am currently using mysql 4.0.18 as distributed with red hat Linux. I find
when I perform a select on a varchar(30) field, the index is used only if I
have quoted the value in the where clause. , mysql performs a
table scan.
A varchar(30) field is a string, so only comparisons to strings really make
sense. If you don't quote the comparison value in the WHERE clause, you're
asking for trouble. First, the unquoted value must make sense. There are 3
possibilities: it's a number, it's a function, or it's a column name.
Hence, you can get away with
WHERE varchar_field = 170807
but probably not
WHERE varachar_field = dogs
(unless you have a column named "dogs").
I expect you know all that. My point is, why aren't you quoting the
comparison string?
The queries in question are:
This query uses the index:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = "170807";
This is a string column compared to a string constant, so the index can be used.
This query performs a table scan:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = 170807;
This is a string compared to a number. You are asking mysql to make an
implicit conversion so it can make the comparison. You expected the number
to be converted to a string, but that's not how it works -- it's the other
way around, the string is converted to a number. Why? Because many strings
convert to the same number. Consider:
mysqlSELECT '17' + 0, '17.0' + 0, '+17' + 0, '17,345' + 0, '17 dogs' + 0;
| '17' + 0 | '17.0' + 0 | '+17' + 0 | '17,345' + 0 | '17 dogs' + 0 |
| 17 | 17 | 17 | 17 | 17 |
1 row in set (0.00 sec)
The map from string to number is many-to-one, so the only safe course is to
convert the string to a number, then do a numeric comparison. course,
that renders the string index useless, so the full table scan is required.
My question is this: is the issue here that mysql is converting every single
itran_log_actionid value, from all 1.5 million rows, and hence the index is
not useful and not used?
Yes.
My initial assumption was that the constant value
170807 in the second query, would be converted to text before the query was
executed, and so the index could be used. This does not seem to be the
case.
Right, it's not.
I ask both for my own edification, and also because it seems to me this
should be mentioned in the manual for newbies like myself.
I thought it was mentioned in the manual, though I must admit I didn't find
it just now after a brief search.
I'm wondering, however, if we're really approaching this from the right
angle. Your column is named itran_log_actionid, and it seems to contain
numbers such as 170807. The obvious question, then, is why is
itran_log_actionid a VARCHAR(30) instead of one of the INT types? If it
were, you wouldn't need to quote the constant in the WHERE clause, and the
comparison would be numeric (faster than string) with no type conversion
thanks,
Michael
No.3 | | 3623 bytes |
| 
Hello.
I suggest you to check you query with MySQL 4.1.14 which might has a bit
clever optimizer.
Faaland wrote:
Hi,
I am currently using mysql 4.0.18 as distributed with red hat Linux. I find
when I perform a select on a varchar(30) field, the index is used only if I
have quoted the value in the where clause. , mysql performs a
table scan.
The queries in question are:
This query uses the index:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = "170807";
This query performs a table scan:
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = 170807;
My question is this: is the issue here that mysql is converting every single
itran_log_actionid value, from all 1.5 million rows, and hence the index is
not useful and not used? My initial assumption was that the constant value
170807 in the second query, would be converted to text before the query was
executed, and so the index could be used. This does not seem to be the
case.
I ask both for my own edification, and also because it seems to me this
should be mentioned in the manual for newbies like myself.
thanks,
Details on versions, table structures, indexes, etc. below
$ rpm -qa | grep -i mysql
MySQL-shared-compat-4.0.15-0
MySQL-client-4.0.18-0
php-mysql-4.1.2-7.2.6
MySQL-server-4.0.18-0
$ /usr/bin/mysql -V
/usr/bin/mysql Ver 12.22 Distrib 4.0.18, for pc-linux (i686)
mysqldescribe itran_log;
+
| Field | Type | Null | Key | Default | Extra
|
+
| itran_user_id | varchar(100) | | | |
|
| itran_log_date | date | | MUL | 0000-00-00 |
|
| itran_log_time | time | | | 00:00:00 |
|
| itran_log_filename | varchar(100) | | | |
|
| itran_log_action | varchar(25) | | MUL | |
|
| itran_log_actionid | varchar(30) | | MUL | |
|
| itran_site_id | varchar(100) | YES | MUL | NULL |
|
| itran_log_instructions | text | | | |
|
| itran_log_id | bigint(20) | | PRI | NULL |
auto_increment |
+
mysqlshow indexes from itran_log;
+
| Table | Non_unique | Key_name | Seq_in_index |
Column_name | Collation | Cardinality | Sub_part | Packed | Null |
Index_type | Comment |
+
| itran_log | 0 | PRIMARY | 1 |
itran_log_id | A | 1500793 | NULL | NULL | |
BTREE | |
| itran_log | 1 | itran_site_id_ix | 1 |
itran_site_id | A | NULL | 15 | NULL | YES |
BTREE | |
| itran_log | 1 | itran_log_action_ix | 1 |
itran_log_action | A | NULL | 3 | NULL | |
BTREE | |
| itran_log | 1 | itran_log_actionid_ix | 1 |
itran_log_actionid | A | NULL | NULL | NULL | |
BTREE | |
| itran_log | 1 | itran_log_date_ix | 1 |
itran_log_date | A | NULL | NULL | NULL | |
BTREE | |
+
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = 170807;
| table | type | possible_keys | key | key_len | ref | rows
| Extra |
| itran_log | ALL | itran_log_actionid_ix | NULL | NULL | NULL | 1500775
| Using where |
1 row in set (0.02 sec)
mysqlexplain
-select itran_log_date, itran_log_actionid from itran_log where
-itran_log_actionid = "170807";
| table | type | possible_keys | key | key_len
| ref | rows | Extra |
| itran_log | ref | itran_log_actionid_ix | itran_log_actionid_ix | 30
| const | 4 | Using where |
1 row in set (0.00 sec)
Faaland
Sovran Inc.
No.4 | | 2137 bytes |
| 
Gleb Paharenko wrote:
Hello.
>
I suggest you to check you query with MySQL 4.1.14 which might has a bit
clever optimizer.
Did you read my earlier reply? This is neither an optimizer nor version issue.
Consider:
mysqlSELECT VERSIN();
++
| VERSIN() |
++
| 4.1.15 |
++
1 row in set (0.01 sec)
CREATE TABLE ict (id INT NT NULL AUTINCREMENT PRIMARY KEY,
vcf VARCHAR(30),
INDEX (vcf)
);
INSERT INT ict (vcf)
VALUES ('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),(' 9'),('10'),
('11'),('12'),('13'),('14'),('15'),('16'),('17'),( '18'),('19'),('20');
EXPLAIN SELECT * FRM ict WHERE vcf = '17';
| id | select_type | table | type | p_keys | key | ref | rows | Extra |
| 1 | SIMPLE | ict | ref | vcf | vcf | const | 1 | Using where |
1 row in set (0.01 sec)
EXPLAIN SELECT * FRM ict WHERE vcf = 17;
| id | select_type | table | type | p_keys | key | ref | rows | Extra |
| 1 | SIMPLE | ict | ALL | vcf | NULL | NULL | 20 | Using where |
1 row in set (0.01 sec)
Why? Because there are many strings which evaluate to a given number. For example:
INSERT INT ict (vcf)
VALUES (' 17'), ('+17'), ('17.0'), ('17,34'), ('17 dogs');
SELECT * FRM ict WHERE vcf = '17';
| id | vcf |
| 17 | 17 |
1 row in set (0.00 sec)
SELECT * FRM ict WHERE vcf = 17;
| id | vcf |
| 17 | 17 |
| 21 | 17 |
| 22 | +17 |
| 23 | 17.0 |
| 24 | 17,34 |
| 25 | 17 dogs |
6 rows in set (0.00 sec)
You see? The two WHERE clauses are actually different, so the optimizer must
treat them differently. No amount of optimizer cleverness can change that. The
moral of the story: Use numeric types to store numbers, not string types.
Michael
No.5 | | 2675 bytes |
| 
Dear, Michael!
Did you read my earlier reply? This is neither an optimizer nor >
version issue.
I've sent my e-mail before I've read your. The problem is that I was
testing my guessing on the table with a bit different structure than
your (mine didn't have a primary key field 'id'). And optimizer has
shown that it uses indexes! There is a verified bug:
Thank you for good explanations!
Michael Stassen wrote:
Gleb Paharenko wrote:
>Hello.
>>
>I suggest you to check you query with MySQL 4.1.14 which might has a bit
>clever optimizer.
Did you read my earlier reply? This is neither an optimizer nor version
issue. Consider:
mysqlSELECT VERSIN();
++
| VERSIN() |
++
| 4.1.15 |
++
1 row in set (0.01 sec)
CREATE TABLE ict (id INT NT NULL AUTINCREMENT PRIMARY KEY,
vcf VARCHAR(30),
INDEX (vcf)
);
INSERT INT ict (vcf)
VALUES ('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),(' 9'),('10'),
('11'),('12'),('13'),('14'),('15'),('16'),('17'),( '18'),('19'),('20');
EXPLAIN SELECT * FRM ict WHERE vcf = '17';
| id | select_type | table | type | p_keys | key | ref | rows |
Extra |
| 1 | SIMPLE | ict | ref | vcf | vcf | const | 1 | Using
where |
1 row in set (0.01 sec)
EXPLAIN SELECT * FRM ict WHERE vcf = 17;
| id | select_type | table | type | p_keys | key | ref | rows |
Extra |
| 1 | SIMPLE | ict | ALL | vcf | NULL | NULL | 20 | Using
where |
1 row in set (0.01 sec)
Why? Because there are many strings which evaluate to a given number.
For example:
INSERT INT ict (vcf)
VALUES (' 17'), ('+17'), ('17.0'), ('17,34'), ('17 dogs');
SELECT * FRM ict WHERE vcf = '17';
| id | vcf |
| 17 | 17 |
1 row in set (0.00 sec)
SELECT * FRM ict WHERE vcf = 17;
| id | vcf |
| 17 | 17 |
| 21 | 17 |
| 22 | +17 |
| 23 | 17.0 |
| 24 | 17,34 |
| 25 | 17 dogs |
6 rows in set (0.00 sec)
You see? The two WHERE clauses are actually different, so the optimizer
must treat them differently. No amount of optimizer cleverness can
change that. The moral of the story: Use numeric types to store
numbers, not string types.
Michael
No.6 | | 1021 bytes |
| 
Gleb Paharenko wrote:
Dear, Michael!
>>Did you read my earlier reply? This is neither an optimizer nor >
>>version issue.
I've sent my e-mail before I've read your. The problem is that I was
testing my guessing on the table with a bit different structure than
your (mine didn't have a primary key field 'id'). And optimizer has
shown that it uses indexes! There is a verified bug:
Thank you for good explanations!
I've added a comment to bug 14220 explaining why I do not believe this is a bug.
The short version is that without the id column, "SELECT *" is the same as
"SELECT vcf". In that case, the index on vcf is a covering index, so mysql
reads the data from the index, rather than the table. You will note in the
EXPLAIN output that it is still doing a full scan. It's just that it's a full
index scan rather than a full table scan.
Michael