JOINed ORDERed LIMITed query not optimized enough - need help

Related threads:

  • 1. what's better one common table for something like category or multi different tables? and which better and why?
    Gordon Burditt wrote:
    >> what's better one common table for something like category or multi
    >> different tables? and which better and why?
    >
    > I'll take "Reasons to not drink and post" for $200, Bob.
    Ah, don't be so hard on him. I suspect English isn't his first language.
    But no, I don't understand the question, either. -- Jerry Stuckle, JDS
    Computer Training Corp.
  • 2. REGEXP and CREATE TRIGGER
    I currently have to run this query manually: UPDATE table SET column =
    REPLACE(column, '-', '') WHERE `column` REGEXP '-'. My question: how do I
    build a trigger so that the database removes the '-' sign on update by
    itself?
  • 3. Mysql preference setting optimize help...
    Dear all: can anyone help me optimize my MySQL settings? My server gets
    5,000 visits a day, and the database now serves my application slowly. I
    found that some SQL uses on-disk temporary tables when many people visit
    at the same time. How should I set my.ini?
    My DB environment: OS: Windows 2003 Server; CPU: Intel Core 2.3 GHz;
    MySQL version: 5.0.45-community-nt; RAM: 2 GB.
    My my.ini settings:
    default-storage-engine=INNODB
    sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
    max_connections=1000
    key_buffer_size=128M
    read_buffer_size=64K
    read_rnd_buffer_size=256K
    query_cache_size=192M
    max_allowed_packet=32M
    innodb_flush_log_at_trx_commit=2
    innodb_log_file_size=128M
    innodb_additional_mem_pool_size=20M
    innodb_log_buffer_size=64M
    innodb_buffer_pool_size=916M
    set-variable = back_log=50
    set-variable = key_buffer=64M
    set-variable = max_allowed_packet=32M
    set-variable = table_cache=2048
    set-variable = thread_cache=128
    set-variable = sort_buffer=256K
    set-variable = record_buffer=256K
    set-variable = max_connect_errors=30000
    set-variable = long_query_time=1
    set-variable = max_heap_table_size=384M
    set-variable = tmp_table_size=512M
    set-variable = myisam_sort_buffer_size=128M
    set-variable = net_buffer_length=16K
  • 4. autoincrement pk in many-to-many table?
    Here's a question. Say you have a many-to-many relationship: a table Students, a table Classes, and the m2m join table Schedule. The Schedule table would have at least two columns, student_id and class_id, which are foreign keys to Students and Classes, respectively. You would make them unique and index them. Should you also include an autoincrement primary key? I've found that, from a programming perspective, it's easier to write a DELETE statement when you only have one variable to worry about, that being the primary key. Otherwise you have to pass around two variables, which is slightly less simple. However, from a purely data point of view, a separate primary key field in that table is redundant. Would it also have an effect on the table's performance, either negative or positive? (See the schema sketch after this list.)
  • 5. Seperating select results by pages
    Is there a way of selecting a "page" of results from a select query? I know I can use LIMIT to get just a certain number of rows, but is there a way to say, for example, "LIMIT 20, page offset 3" and get results 40-59? (I hope this makes sense; see the sketch after this list.) Bill H
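
For thread 4 above, the common alternative to a surrogate key is a
composite primary key over the two foreign keys, which enforces
uniqueness and provides the index at the same time (a sketch, assuming
InnoDB and integer id primary keys on Students and Classes):

CREATE TABLE Schedule (
   student_id INT NOT NULL,
   class_id   INT NOT NULL,
   PRIMARY KEY (student_id, class_id),
   KEY class_student (class_id, student_id),
   FOREIGN KEY (student_id) REFERENCES Students (id),
   FOREIGN KEY (class_id) REFERENCES Classes (id)
) ENGINE = InnoDB;

-- DELETE then takes two values instead of one surrogate id (example ids):
DELETE FROM Schedule WHERE student_id = 42 AND class_id = 7;

For thread 5: MySQL's LIMIT already takes an optional offset before the
row count, so "page 3" of 20-row pages (results 40-59) is just an offset
of 40 (table and column names here are hypothetical):

SELECT * FROM t ORDER BY id LIMIT 40, 20;  -- skip 40 rows, return rows 40-59

LIMIT 40, 20 and LIMIT 20 OFFSET 40 are equivalent spellings.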

JOINed ORDERed LIMITed query not optimized enough - need help

Postby Joachim Durchholz » Mon, 02 Apr 2007 03:58:18 GMT

Hi all,

I need to retrieve 13 rows from a 100,000-row JOIN, starting with a row
with given values.
All required indexes are available AFAICT, yet EXPLAIN tells me mysql is
"Using index; Using temporary; Using filesort" on the first table it's
touching.

I don't know where I'm going wrong, or whether I'm hitting a limitation
of mysql's query optimizer. Any help appreciated.

DETAILS

mysql version is 5.0.15.

Tables are set up like this:

-- Main categories. An ID and a name.
CREATE TABLE main (
   id int(9),
   name varchar(40),
   PRIMARY KEY (id),
   KEY name (name)
) ENGINE = InnoDB;
-- Sample categories. Choose random numbers so that id order and name
-- order are different.
INSERT INTO main VALUES
   (0, CONCAT(RAND(), ' main 0')),
   ...
   (9, CONCAT(RAND(), ' main 9'));
-- Just not to leave this stone unturned:
ANALYZE TABLE main;

-- Subcategories. ID-plus-name.
-- An additional foreign key that links up with the 'main' table.
CREATE TABLE sub (
   id int(9),
   name varchar(40),
   main_id int(9),
   PRIMARY KEY (id),
   KEY name_main (name, main_id),
   KEY main_name (main_id, name)
) ENGINE = InnoDB;
INSERT INTO sub VALUES
   (000, CONCAT(RAND(), ' main 0 sub 000'), 0),
   ...
   (999, CONCAT(RAND(), ' main 9 sub 999'), 9);
ANALYZE TABLE sub;

-- Finally, detail records, all 100,000 of them.
CREATE TABLE detail (
   id int(9),
   name varchar(40),
   sub_id int(9),
   PRIMARY KEY (id),
   KEY name_sub (name, sub_id),
   KEY sub_name (sub_id, name)
) ENGINE = InnoDB;
INSERT INTO detail VALUES
   (00000, CONCAT(RAND(), ' sub 000 detail 00000'), 000),
   ...
   (99999, CONCAT(RAND(), ' sub 999 detail 99999'), 999);
ANALYZE TABLE detail;


Now I want to present a detail-sub-main list to the end user:

SELECT * FROM
   main
   JOIN sub ON main.id = sub.main_id
   JOIN detail ON sub.id = detail.sub_id

I want this to be sorted by detail name, then sub name, finally main name:

SELECT * FROM
   main
   JOIN sub ON main.id = sub.main_id
   JOIN detail ON sub.id = detail.sub_id
   ORDER by detail.name, sub.name, main.name

The user isn't supposed to see the entire list; he's scrolling through
it. Let's say he's at the record with detail name '0.5xxxx' and needs to
see the next 12 records. Then I have

SELECT * FROM
   main
   JOIN sub ON main.id = sub.main_id
   JOIN detail ON sub.id = detail.sub_id
   WHERE detail.name > '0.5'
   ORDER by detail.name, sub.name, main.name
   LIMIT 12

This takes just 0.46-0.54 seconds to complete on my machine, but the 
real thing with 150,000 detail records and four additional JOINs to 
produce auxiliary information is unbearably slow: 25-40 seconds.

Now EXPLAIN tells me mysql started with the 'main' table - that's just 
backwards, it should have started with 'detail' to find just the first 
twelve rows, then work up through 'sub' to 'main'!
It's no surprise that it then needed to "use temporary; use filesort".

Well, to speed up this particular case, I might be able to use a 
STRAIGHT_JOIN. However, that would require reordering the JOIN part of 
the query, and I'd like to avoid that if possible - I'd have to do some 
serious rewriting of the dynamic SQL part of the application.
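
For reference, the reordered STRAIGHT_JOIN variant would look something
like this (a sketch of what the rewrite would be, not a tested fix):

SELECT STRAIGHT_JOIN *
FROM detail
   JOIN sub ON sub.id = detail.sub_id
   JOIN main ON main.id = sub.main_id
WHERE detail.name > '0.5'
ORDER BY detail.name, sub.name, main.name
LIMIT 12

STRAIGHT_JOIN forces MySQL to join the tables in the order listed, so
detail would be read first.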

Hints, tips, improvements very much appreciated.

Regards,
Jo

Re: JOINed ORDERed LIMITed query not optimized enough - need help

Postby Joachim Durchholz » Mon, 02 Apr 2007 07:02:49 GMT

> Well, to speed up this particular case, I might be able to use a 
> STRAIGHT_JOIN.

Update: I just tried STRAIGHT_JOIN.
It doesn't help either.

Strangely enough, the mysql manual says this on 
 http://www.**--****.com/ :
"If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as 
soon as it has found the first row_count rows of the sorted result, 
rather than sorting the entire result. If ordering is done by using an 
index, this is very fast."

That's exactly what I hoped for.
However, the text goes on to say:

"If a filesort must be done, all rows that match the query without the 
LIMIT clause must be selected, and most or all of them must be sorted, 
before it can be ascertained that the first row_count rows have been found."

Now EXPLAIN indeed says that it's using a filesort:

id: 1
select_type: SIMPLE
table: detail
type: range
possible_keys: name_sub, sub_name
key: name_sub
key_len: 43
ref: NULL
rows: 50231 (that's probably the estimated number of rows with name > '0.5')
Extra: Using where; Using index; Using temporary; Using filesort


... but why?


Regards,
Jo

Re: JOINed ORDERed LIMITed query not optimized enough - need help

Postby Axel Schwenke » Tue, 03 Apr 2007 18:57:51 GMT



table main:        10 rows
table sub:       1,000 rows
table detail:  100,000 rows

This is potentially a 1,000,000,000-row JOIN result.

If you ORDER BY columns from different tables, then MySQL always uses
external sorting of the result set:
 http://www.**--****.com/ 


LIMIT doesn't help here. Before LIMIT can be applied, sorting has to be
done. Before sorting can be done, the complete result set must be known.

> it should have started with 'detail' to find just the first twelve
> rows, then work up through 'sub' to 'main'!
Nope. The first 12 rows in table detail may not have corresponding rows
in the sub table and may thus not be part of the unLIMITed join result.
If the first N rows from detail all have the same value in name, then
the sort order is unknown unless MySQL looks at the other JOINed tables.

LIMIT is done last, with one exception: if MySQL can retrieve rows in
sort order, that is: if there is an index that can be used for finding
rows in the specified result order - then MySQL will stop retrieving
rows when the LIMIT is reached. This is not the case here.
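
To illustrate that exception: on a single table, the same
WHERE/ORDER BY/LIMIT pattern can be satisfied entirely from the name_sub
index, so MySQL can stop after 12 rows (a sketch; the EXPLAIN comment is
what one would expect, not output taken from the thread):

SELECT id, name, sub_id
FROM detail
WHERE name > '0.5'
ORDER BY name
LIMIT 12
-- EXPLAIN Extra: "Using where; Using index" - no temporary, no filesort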


Also the decision of the optimizer depends on many things. The main
goal of the optimizer is to keep the intermediate JOIN result small.
If MySQL JOINs in order main -> sub -> detail there are [10], [at most
10,000], [final] rows in the intermediate results table.

For order sub -> detail -> main this is [100,000 * filter selectivity],
[at most 100,000 * filter selectivity], [final].

Now all depends on how selective your filter on the detail table is
(or rather: how selective the optimizer thinks it is). I guess
detail.name > '0.5' is not very selective. But even if your filter
yielded just 10,000 rows from the detail table, the overall query
cost would still be higher than with the original plan.

It would be interesting to see the query plan when the filter finds
only very few (say 10) rows from the detail table.

> mysql version is 5.0.15.
This is quite old. I recommend upgrading to the latest 5.0 release; there
have been many bug fixes and probably optimizer improvements since.


XL

Re: JOINed ORDERed LIMITed query not optimized enough - need help

Postby Joachim Durchholz » Tue, 03 Apr 2007 23:57:05 GMT

Axel Schwenke wrote:

I have
...
JOIN detail ON sub.id = ...
where sub.id is the primary key. So it's already known that there's at
most 1 sub record per detail record (and, likewise, at most 1 main
record per sub record).
In summary, we can infer that the result has no more rows than were
selected from the detail table.

If I had added referential integrity constraints, we'd also be able to
infer that there is always a main record for each sub record (and a sub
record for each detail record). (These are InnoDB tables, so that would
have been possible.)

The interesting question here is, of course, whether the optimizer is
smart enough to do the same kind of inference that we can.


Well, my (admittedly slightly short-sighted) expectation would have been
that mysql would read 12 detail rows and add the sub and main rows after
that, finally sorting the result.


Agreed.

Would adding referential integrity constraints have made a difference?


Right. That case could actually happen (not really with the example data,
which has randomized 'name' values, but with production data, real names
tend to have duplicates).
Though I would have expected that in this case, mysql would read the
first 12 rows via detail's name index, then continue reading on that index
as long as there are rows with the same name. Since mysql can inspect
the key distribution, it would even have at least a rough estimate of
how many rows that would be, even before reading them (something that I
cannot easily do from the application level). I.e. the selectivity would
not only be far better than 50%, mysql would even have an estimate of it.


The real problem seems to be that the ORDER BY clause happens on columns
from multiple tables. At least that's what I gathered from the 5.0
documentation (a day after posting, as usual).

I have googled around a bit. Paging in huge lists seems to be something
that few databases can do well - at least that's what people said on a
pgsql mailing list.

>
> For order sub -> detail -> main this is [100,000 * filter
> selectivity], [at most 100,000 * filter selectivity], [final].

Actually EXPLAIN showed that it would start with detail.
I guess the ORDER BY clause gave the incentive (without the ORDER BY, it
indeed started with main). Possibly plus the knowledge that each JOIN
has a fan-out of at most 1.


Indeed. I chose '0.5' because the standard usage would be paging through
the list with a browser, and the default selectivity would then be
roughly 50%.

I had hoped that the LIMIT clause would imply enough selectivity, which
is essentially
   12/100,000 * (factor for non-equidistribution of the 'name' key),
i.e. roughly 0.012%, which would have ensured the expected behavior even
if mysql's key statistics had been very coarse-grained.

> But even if your filter yielded just 10,000 rows from the detail
> table, the overall query cost would still be higher than with the
> original plan.

With a much more selective filter:

SELECT * FROM
main
JOIN sub ON main.id = sub.main_id
JOIN detail ON sub.id = detail.sub_id
WHERE detail.name LIKE '0.5000%'
ORDER by detail.name, sub.name, main.name

0.0041 sec, which is what I'd expect.

The JOIN order is still detail-sub-main, with "Using where; Using index;
Using temporary; Using filesort" on 'detail'.

On second thought, I don't find the JOIN order very surprising: the
selectivity is on detail anyway, so any other join order would have been
suicidal.

Since the LIKE clause happens to restrict the result to exactly 12 rows,
it's no surprise that the filesort is blindingly fast in this case.
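
One rewrite that sometimes helps with this pattern (a sketch, not
something proposed in the thread): push the filter, sort and LIMIT into
a derived table on detail, where the single-table sort can use the
name_sub index, and join only the 12 survivors. This assumes detail.name
values are effectively unique at the page boundary and that every detail
row has matching sub and main rows:

SELECT d.*, sub.*, main.*
FROM (
   SELECT id, name, sub_id
   FROM detail
   WHERE name > '0.5'
   ORDER BY name
   LIMIT 12
) AS d
JOIN sub ON sub.id = d.sub_id
JOIN main ON main.id = sub.main_id
ORDER BY d.name, sub.name, main.name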

Similar Threads:

1. Need to optimize a long query with JOIN

Hello,

I wanted to know if there's anyone that could help me with this
problem. My query takes too much time to execute and I don't know how
to make it better.

SELECT SQL_CALC_FOUND_ROWS i.*, DATE_FORMAT(i.date, '%d-%m-%y') as
date, u.login
FROM img i
LEFT JOIN comments c ON c.imgid=i.id
LEFT JOIN users u ON u.id=i.userid
LEFT JOIN flags f ON i.id=f.imgid
WHERE i.deleted=0
AND c.date > f.lastview
AND c.userid!=530
AND f.userid=530
GROUP BY i.id ORDER BY i.id DESC LIMIT 0,12

The problem is that this query needs to examine more than 1M rows
(according to EXPLAIN). There are 150k rows in table "comments", 4k in
"users", 35k in "img" and 1.8M in "flags". All tables have indexes and PKs.

Thanks a lot,


Jean-Baptiste
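
One thing worth checking here (a sketch, not a reply from the original
thread): the WHERE conditions on c and f discard NULL-extended rows, so
those LEFT JOINs are effectively INNER JOINs. Writing them as such, and
starting from flags (already filtered to one userid), gives the
optimizer more freedom:

SELECT SQL_CALC_FOUND_ROWS i.*, DATE_FORMAT(i.date, '%d-%m-%y') as date,
       u.login
FROM flags f
JOIN img i ON i.id = f.imgid AND i.deleted = 0
JOIN comments c ON c.imgid = i.id AND c.date > f.lastview
                AND c.userid != 530
LEFT JOIN users u ON u.id = i.userid
WHERE f.userid = 530
GROUP BY i.id
ORDER BY i.id DESC LIMIT 0,12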

2. Need help ordering by the count of a join table in x number of hours

I'm a bit stumped on this problem (definitely not a SQL guru).  I have
2 tables like so:

create table items (
  id              int             unsigned not null auto_increment,
  user_id         int             unsigned not null,
  title           varchar(64)     not null,
  link            varchar(128)    not null,
  created_on      timestamp,
  primary key (id)
) engine=innodb default charset=utf8;

create table hits (
  id              int             unsigned not null auto_increment,
  remote_ip       varchar(15)     not null,   -- length added; bare VARCHAR is not valid
  item_id         int             unsigned not null,
  created_at          datetime            not null,
  primary key (id)
) engine=innodb default charset=utf8;


How can I find all items ordered by the count of hits occurring within
x number of hours?  So, basically I don't want a where clause.  I want
to retrieve *all* items.  I just want them to be weighted by the
number of hits they've received in the last hour or 6 hours or
whatever...

Thanks in advance
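
A sketch of one way to do this (not from the thread; the 6-hour window
is an example value): put the time condition into the JOIN clause rather
than the WHERE clause, so items with no recent hits still appear with a
count of 0:

SELECT i.*, COUNT(h.id) AS recent_hits
FROM items i
LEFT JOIN hits h ON h.item_id = i.id
     AND h.created_at >= NOW() - INTERVAL 6 HOUR
GROUP BY i.id
ORDER BY recent_hits DESC;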

3. I need help optimizing this query

Here is the query I am currently using.
It seems to take forever: it keeps wanting to use a filesort and a
temporary table. Can anyone please help me? I will send the CREATEs for
both tables.

SELECT collections.`id`, collections.`subject`, collections.`numfiles`,
       collections.`groups`, collections.`size`, collections.`from`,
       collections.`date`, collections.`nfo`
FROM temp.collections, temp.nfo
WHERE MATCH (nfo.nfo) AGAINST ('$searchterm' IN BOOLEAN MODE)
  AND nfo.cid = collections.id
ORDER BY `date` DESC LIMIT 0,50

CREATE TABLE `nfo` (
`id` mediumint(6) unsigned NOT NULL auto_increment,
`cid` int(10) unsigned NOT NULL,
`fid` int(10) unsigned NOT NULL,
`nfo` mediumtext NOT NULL,
`filename` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
KEY `cid` (`cid`),
KEY `fid` (`fid`),
FULLTEXT KEY `nfo` (`nfo`)
) ENGINE=MyISAM AUTO_INCREMENT=152507 DEFAULT 
CHARSET=latin1

CREATE TABLE `collections` (
`id` int(10) unsigned NOT NULL auto_increment,
`subject` text NOT NULL,
`filename` text NOT NULL,
`numfiles` smallint(4) unsigned NOT NULL default '0',
`groups` text NOT NULL,
`partids` mediumtext NOT NULL,
`size` bigint(20) unsigned NOT NULL default '0',
`from` tinytext NOT NULL,
`date` int(10) unsigned NOT NULL default '0',
`group1` smallint(4) unsigned NOT NULL,
`group2` smallint(4) unsigned default NULL,
`group3` smallint(4) unsigned default NULL,
`group4` smallint(4) unsigned default NULL,
`group5` smallint(4) unsigned default NULL,
`nfo` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`id`),
UNIQUE KEY `filename_U` (`filename`(255)),
KEY `from` (`from`(50)),
KEY `date` (`date`),
KEY `Size` (`size`),
KEY `nfo` (`nfo`),
KEY `groupindex` USING BTREE (`group1`,`group2`,`group3`,`group4`,
`group5`),
FULLTEXT KEY `Subject_full` (`subject`)
) ENGINE=MyISAM AUTO_INCREMENT=1865103 DEFAULT 
CHARSET=latin1 MIN_ROWS=100000 MAX_ROWS=2500000 
AVG_ROW_LENGTH=500


4. HELP WITH QUERY - I THINK I need a JOIN, not sure

If anyone could help with this, I'd be much obliged.

Basically I have one table, users, with these columns:

user_id
name
email

and a 2nd table called user_interests with fields:

interest_id
user_id
interest_text

I want to run one query that gives me all the users in the "users"
table and adds columns that contain the "interest_text" of any matching
rows in the "user_interests" table

i.e.
user_id:  10
name:  andy
email:   XXXX@XXXXX.COM 
interest1:  soccer
interest2:  music
interest3:  movies


This seems fairly simple.  What do I need to do to make this happen?

Thanks!
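
A sketch of the usual approach (not from the thread): a SQL result set
has a fixed column count, so a variable number of interests per user is
normally returned as one concatenated column via GROUP_CONCAT rather
than as interest1, interest2, interest3:

SELECT u.user_id, u.name, u.email,
       GROUP_CONCAT(ui.interest_text SEPARATOR ', ') AS interests
FROM users u
LEFT JOIN user_interests ui ON ui.user_id = u.user_id
GROUP BY u.user_id, u.name, u.email;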

5. Need help optimizing a batch query

I have a sproc that will return several tables, and it looks something like 
this:

SELECT * FROM TABLEA WHERE somelongasswhereclause

SELECT * FROM TABLEB WHERE TableBID IN (SELECT foreignKeyToTableB FROM 
TABLEA WHERE somelongasswhereclause)

SELECT * FROM TABLEC WHERE TableBID IN (SELECT foreignKeyToTableC FROM 
TABLEA WHERE somelongasswhereclause)


So what I'm trying to do here is get a buncha records from TableA, and then 
get a buncha records from TableB that relate to TableA and so on.  I could 
just do a JOIN, but I really would like the data to be in separate tables. 
My problem is my somelongasswhereclause.  It's a pretty long WHERE clause 
with lots of conditions, and it seems to me that having to evaluate them 
over and over is a waste.  What options do I have here?  I was thinking of 
opening a cursor on TABLEA and creating an IN clause of the TABLEA IDs, so 
that in the following queries I would never have to say "...FROM TABLEA 
WHERE somelongasswhereclause" again.  I could just say "...FROM TABLEA WHERE 
tableAID IN (...)", but I still need to return TABLEA and how would I do 
that if I made a cursor out of it? 
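
A sketch of the temp-table idea (assuming SQL Server, since this is a
stored procedure; table and column names as in the post): evaluate the
long WHERE clause once, materialize the result, and reuse it:

SELECT * INTO #a FROM TABLEA WHERE somelongasswhereclause;

SELECT * FROM #a;   -- still returns the TABLEA rows
SELECT * FROM TABLEB WHERE TableBID IN (SELECT foreignKeyToTableB FROM #a);
SELECT * FROM TABLEC WHERE TableBID IN (SELECT foreignKeyToTableC FROM #a);

DROP TABLE #a;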

6. Need Help Optimizing A Query

7. Need help optimizing this query

8. Need help to optimize query


