Select record with MAX value in MySQL


 
# 8  
Old 06-29-2010
It's been a while since I've spoken MySQL... but I'd do it this way
Code:
CREATE UNIQUE INDEX IX_uploaded_name_time ON uploaded ( file_name, file_time )

But the ALTER TABLE syntax (the first one) should get you the same result. If not, you can always drop and re-create the index.

It would be interesting, though, to see a comparison of the execution timing and EXPLAIN plan with and without the indices.

And if you want to optimize further, here's an article on how to use the MySQL profiler.
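For the groupwise-max selection the thread title asks about, the usual pattern is to join the table against a per-group MAX() subquery. Here is a minimal, hypothetical sketch using Python's sqlite3 as a stand-in for MySQL (the table and index names mirror the ones above; the SQL itself is portable):

```python
import sqlite3

# Simplified stand-in for the thread's "uploaded" table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uploaded (file_name TEXT, file_time TEXT)")
con.execute(
    "CREATE UNIQUE INDEX IX_uploaded_name_time ON uploaded (file_name, file_time)"
)
con.executemany(
    "INSERT INTO uploaded VALUES (?, ?)",
    [
        ("/home/santiago/doc.txt", "2010-06-28 10:11:51"),
        ("/home/santiago/doc.txt", "2010-06-29 17:37:29"),
        ("/usr/local/bin/script.sh", "2010-06-27 11:42:56"),
    ],
)

# Classic "row holding the MAX value per group" pattern: join the table
# against a per-group MAX() subquery.  The (file_name, file_time) index
# lets the engine resolve the subquery without a full table scan.
rows = con.execute(
    """
    SELECT u.file_name, u.file_time
    FROM uploaded u
    JOIN (SELECT file_name, MAX(file_time) AS max_time
          FROM uploaded GROUP BY file_name) m
      ON u.file_name = m.file_name AND u.file_time = m.max_time
    ORDER BY u.file_name
    """
).fetchall()
print(rows)
```

Zero-padded `YYYY-MM-DD HH:MM:SS` strings compare correctly as text, which is why MAX() works here without a date conversion.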
# 9  
Old 06-29-2010
Quote:
In the real world, I have a big table containing a huge number of uploaded files.[...]
Well,
first of all, I think you should rethink the relational design.
# 10  
Old 06-29-2010
Quote:
Originally Posted by radoulov
Well,
first of all, I think you should rethink the relational design.
Thanks, radoulov, for this note, but could you be more precise, or at least give an example?
# 11  
Old 06-30-2010
OK, for example, you could try to normalize the data structures, at least to first normal form (1NF):

Quote:
First normal form excludes variable repeating fields and groups
You should move the owners and the groups out into separate structures, so you'll end up with:

Code:
users                  

uid pk
username

Code:
groups

gid pk
group

Can one archive contain more than one file? If so:

Code:
paths

path_id pk
path

And so on. The permission mask is also a candidate, because of its repeating data.

Then you could create the files table with all attributes referenced by the relative ids.

Why do you want separate tables for files and versions? You could have all the information in one table: filename, path_id, uid, gid, mod_time, etc.

If you need to know the most recent versions at any time, you could create a view that selects them.
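A minimal sketch of what such a normalized layout and a "latest versions" view could look like. All table and column names here are hypothetical (`grp` stands in for `group`, and `"groups"` is quoted, because GROUP/GROUPS are SQL keywords), and SQLite is used in place of MySQL:

```python
import sqlite3

# Hypothetical normalized schema: lookup tables plus one versions table,
# and a view that always exposes the most recent version of each file.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users    (uid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE "groups" (gid INTEGER PRIMARY KEY, grp TEXT);
CREATE TABLE files    (file_id INTEGER PRIMARY KEY, file_path TEXT);
CREATE TABLE versions (
    file_id  INTEGER REFERENCES files(file_id),
    uid      INTEGER REFERENCES users(uid),
    gid      INTEGER REFERENCES "groups"(gid),
    mod_time TEXT
);

-- The view selects, per file, the row carrying the MAX(mod_time).
CREATE VIEW latest_versions AS
SELECT v.file_id, v.uid, v.gid, v.mod_time
FROM versions v
JOIN (SELECT file_id, MAX(mod_time) AS max_time
      FROM versions GROUP BY file_id) m
  ON v.file_id = m.file_id AND v.mod_time = m.max_time;
""")

con.execute("INSERT INTO files VALUES (1, '/home/santiago/doc.txt')")
con.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)",
                [(1, 34, 34, "2010-06-28 10:11:51"),
                 (1, 3, 3, "2010-06-29 17:37:29")])

latest = con.execute("SELECT file_id, mod_time FROM latest_versions").fetchall()
print(latest)
```

Queries against `latest_versions` stay simple, while the base tables keep each path, owner and group stored exactly once.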


Just my 0.02€.
# 12  
Old 06-30-2010
Thanks radoulov, I got your point now.
I deliberately decided to allow the table uploaded to contain a lot of redundant data.
Why?
Data is meant to pass through the table uploaded, not to stay in it. Once a row is added to the table, it goes through various processes and is then deleted. It would take ages to explain it all, but here are the basics:
  • proftpd adds a row each time a file is uploaded
  • some processes work line by line, like the one that extracts information from the tgz, because it involves working with the filesystem.
  • some processes update numerous lines at the same time, like those that retrieve file_id, file_owner_id, file_mod_id, etc.
  • and there's more...
  • until the entry is deleted
So basically, there is a process that updates lines one by one with as few operations as I can manage. Actually, I just parse the output of tar -tzvvf "$path".
Then the other processes can work much faster by inserting the unknown file_owner and file_mod values and then updating the whole table at once with the corresponding ids.
This view might help you understand:
Code:
+----+---------------------+-----------+--------------+------------+----------------+------------+----------------+---------------------+--------------------------+---------+--------+
| id | tgz_path            | file_mod  | file_mod_id  | file_owner | file_owner_id  | file_group | file_group_id  | file_time           | file_name                | file_id | status |
+----+---------------------+-----------+--------------+------------+----------------+------------+----------------+---------------------+--------------------------+---------+--------+
|  1 | /home/ftp/18027.tgz | rw-r----- |           56 | root       |             34 | root       |             34 | 2010-06-28 10:11:51 | /home/santiago/doc.txt   |   94516 |      2 |
|  2 | /home/ftp/8400.tgz  | rw-r--r-- |           12 | santiago   |              3 | santiago   |              3 | 2010-06-29 17:37:29 | /home/santiago/doc.txt   |   94516 |      2 |
|  3 | /home/ftp/18695.tgz | rw-rw-r-- |              | root       |                | root       |                | 2010-06-27 11:42:56 | /usr/local/bin/script.sh |         |      1 |
|  4 | /home/ftp/27750.tgz | rwxr-x--- |              | foo        |                | foo        |                | 2010-06-29 00:53:32 | /usr/local/bin/script.sh |         |      1 |
|  5 | /home/ftp/4333.tgz  |           |              |            |                |            |                |                     |                          |         |      0 |
+----+---------------------+-----------+--------------+------------+----------------+------------+----------------+---------------------+--------------------------+---------+--------+

  • line 5 has just been added by proftpd; it has not gone through any process yet (status 0)
  • lines 3 and 4 have gone through the process that reads info from the tgz files (status 1)
  • lines 1 and 2 have gone through the process that retrieves ids (status 2)
  • the next process adds the record to the table versions and deletes it

To finish with your notes, I decided to separate the tables files and versions exactly for the reason you mentioned. If it were all in one table, the field file_path (255 chars) would be repeated very often (because a lot of files exist in many versions).
The whole system is a backup facility. The table files records the address of each file with its latest mod/owner/group information (because one should not be able to retrieve an old version just because once, in the past, one had access to the file). The table versions records where each version of each file is located; there is no need to repeat file_path, file_mod and file_owner there.

Your remarks are a great help to me; don't hesitate to comment further on what I said.
# 13  
Old 06-30-2010
I understand.
Why don't you just update the files table at the end of each file's processing? A single-record update vs. a scheduled mass update.

P.S. If I understand correctly, the last-time update in the files table is the slowest part, right?
# 14  
Old 06-30-2010
Well, my idea is that if I retrieved the ids one by one, I would run millions of queries:
Code:
for each record; do
    UPDATE uploaded with the output of tar -tzvvf "$path" WHERE id = $id
    INSERT IGNORE INTO mods (file_mod) VALUES (the current mod)
    INSERT IGNORE INTO owner (file_owner) VALUES (the current owner)
    INSERT IGNORE INTO files (file_path) VALUES (the current path)
    UPDATE uploaded JOIN mods, owner, files SET mod_id = ..., owner_id = ... WHERE id = $id
done

This is not real code, just an explanation of what the process does.
It means I would run 5 queries for each record.
Instead, I do that:
Code:
for each record; do
    UPDATE uploaded with the output of tar -tzvvf "$path" WHERE id = $id
done
# once it's all done
INSERT IGNORE INTO mods (file_mod) SELECT DISTINCT file_mod FROM uploaded
INSERT IGNORE INTO owner (file_owner) SELECT DISTINCT file_owner FROM uploaded
INSERT IGNORE INTO files (file_path) SELECT DISTINCT file_path FROM uploaded
UPDATE uploaded JOIN mods, owner, files SET mod_id = ..., owner_id = ...

Which means I run one query per record instead of five, plus 4 batch queries at the end.
Does it make sense?
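The batch "insert the distinct values, then back-fill the ids" pattern described here can be sketched end to end. Names are illustrative, and SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE:

```python
import sqlite3

# Illustrative staging table plus one lookup table for owners.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE uploaded (id INTEGER PRIMARY KEY, file_owner TEXT, owner_id INTEGER);
CREATE TABLE owners (uid INTEGER PRIMARY KEY AUTOINCREMENT, file_owner TEXT UNIQUE);
INSERT INTO uploaded (file_owner) VALUES ('root'), ('santiago'), ('root');
""")

# One statement registers every owner not seen before
# (the UNIQUE constraint plus OR IGNORE skips known ones)...
con.execute("INSERT OR IGNORE INTO owners (file_owner) "
            "SELECT DISTINCT file_owner FROM uploaded")

# ...and one statement back-fills the ids for the whole table at once.
con.execute("UPDATE uploaded SET owner_id = "
            "(SELECT uid FROM owners WHERE owners.file_owner = uploaded.file_owner)")

resolved = con.execute(
    "SELECT u.id, o.file_owner FROM uploaded u "
    "JOIN owners o ON u.owner_id = o.uid ORDER BY u.id"
).fetchall()
print(resolved)
```

Two statements resolve every row, however many rows there are, which is the whole point of batching over the per-record loop.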