Select record with MAX value in MySQL


 
# 8  
Old 06-29-2010
It's been a while since I've spoken MySQL... but I'd do it this way
Code:
CREATE UNIQUE INDEX IX_uploaded_name_time ON uploaded ( file_name, file_time )

But the ALTER TABLE syntax (the first one) should get you the same result. If not, you can always drop and re-create the index.

It would be interesting, though, to see a comparison of the execution timing and EXPLAIN plan with and without the indices.

And if you want to optimize further, here's an article on how to use the MySQL profiler.
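For the groupwise-max selection the thread title asks about, the usual pattern is to join the table against a per-group MAX() subquery. Here is a minimal, hypothetical sketch using Python's sqlite3 as a stand-in for MySQL (the table and index names mirror the ones above; the SQL itself is portable):

```python
import sqlite3

# Simplified stand-in for the thread's "uploaded" table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uploaded (file_name TEXT, file_time TEXT)")
con.execute(
    "CREATE UNIQUE INDEX IX_uploaded_name_time ON uploaded (file_name, file_time)"
)
con.executemany(
    "INSERT INTO uploaded VALUES (?, ?)",
    [
        ("/home/santiago/doc.txt", "2010-06-28 10:11:51"),
        ("/home/santiago/doc.txt", "2010-06-29 17:37:29"),
        ("/usr/local/bin/script.sh", "2010-06-27 11:42:56"),
    ],
)

# Classic "row holding the MAX value per group" pattern: join the table
# against a per-group MAX() subquery.  The (file_name, file_time) index
# lets the engine resolve the subquery without a full table scan.
rows = con.execute(
    """
    SELECT u.file_name, u.file_time
    FROM uploaded u
    JOIN (SELECT file_name, MAX(file_time) AS max_time
          FROM uploaded GROUP BY file_name) m
      ON u.file_name = m.file_name AND u.file_time = m.max_time
    ORDER BY u.file_name
    """
).fetchall()
print(rows)
```

Zero-padded `YYYY-MM-DD HH:MM:SS` strings compare correctly as text, which is why MAX() works here without a date conversion.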
# 9  
Old 06-29-2010
Quote:
In the real world, I have a big table containing a huge number of uploaded files.[...]
Well,
first of all, I think you should rethink the relational design.
# 10  
Old 06-29-2010
Quote:
Originally Posted by radoulov
Well,
first of all, I think you should rethink the relational design.
Thanks, radoulov, for this note, but could you be more precise, or at least give an example?
# 11  
Old 06-30-2010
OK, for example, you could try to normalize the data structures, at least to first normal form (1NF):

Quote:
First normal form excludes variable repeating fields and groups
You should move the owners and the groups out into separate structures, so you'll end up with:

Code:
users                  

uid pk
username

Code:
groups

gid pk
group

Can one archive contain more than one file? If so:

Code:
paths

path_id pk
path

And so on. The permission mask is also a candidate, because of its repeating data.

Then you could create the files table with all attributes referenced by the relative ids.

Why do you want separate tables for files and versions? You could have all the information in one table: filename, path_id, uid, gid, mod_time, etc.

If you need to know the most recent versions at any time, you could create a view that selects them.
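A minimal sketch of what such a normalized layout and a "latest versions" view could look like. All table and column names here are hypothetical (`grp` stands in for `group`, and `"groups"` is quoted, because GROUP/GROUPS are SQL keywords), and SQLite is used in place of MySQL:

```python
import sqlite3

# Hypothetical normalized schema: lookup tables plus one versions table,
# and a view that always exposes the most recent version of each file.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users    (uid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE "groups" (gid INTEGER PRIMARY KEY, grp TEXT);
CREATE TABLE files    (file_id INTEGER PRIMARY KEY, file_path TEXT);
CREATE TABLE versions (
    file_id  INTEGER REFERENCES files(file_id),
    uid      INTEGER REFERENCES users(uid),
    gid      INTEGER REFERENCES "groups"(gid),
    mod_time TEXT
);

-- The view selects, per file, the row carrying the MAX(mod_time).
CREATE VIEW latest_versions AS
SELECT v.file_id, v.uid, v.gid, v.mod_time
FROM versions v
JOIN (SELECT file_id, MAX(mod_time) AS max_time
      FROM versions GROUP BY file_id) m
  ON v.file_id = m.file_id AND v.mod_time = m.max_time;
""")

con.execute("INSERT INTO files VALUES (1, '/home/santiago/doc.txt')")
con.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)",
                [(1, 34, 34, "2010-06-28 10:11:51"),
                 (1, 3, 3, "2010-06-29 17:37:29")])

latest = con.execute("SELECT file_id, mod_time FROM latest_versions").fetchall()
print(latest)
```

Queries against `latest_versions` stay simple, while the base tables keep each path, owner and group stored exactly once.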


Just my 0.02€.
# 12  
Old 06-30-2010
Thanks radoulov, I got your point now.
I deliberately decided to allow the table uploaded to contain a lot of redundant data.
Why?
Data is meant to pass through the table uploaded, not to stay in it. Once a row is added to the table, it goes through various processes and is then deleted. It would take ages to explain it all, but here are the basics:
  • proftpd adds a row each time a file is uploaded
  • some processes work line by line, like the one that extracts information from the tgz, because it involves working with the filesystem.
  • some processes update numerous lines at the same time, like those that retrieve file_id, file_owner_id, file_mod_id, etc.
  • and there's more...
  • until the entry is deleted
So basically, there is a process that updates lines one by one with as few operations as I can manage. Actually, I just parse the output of tar -tzvvf "$path".
Then the other processes can work much faster by inserting the unknown file_owner and file_mod values and then updating the whole table at once with the corresponding ids.
This view might help you understand:
Code:
+----+---------------------+-----------+--------------+------------+----------------+------------+----------------+---------------------+--------------------------+---------+--------+
| id | tgz_path            | file_mod  | file_mod_id  | file_owner | file_owner_id  | file_group | file_group_id  | file_time           | file_name                | file_id | status |
+----+---------------------+-----------+--------------+------------+----------------+------------+----------------+---------------------+--------------------------+---------+--------+
|  1 | /home/ftp/18027.tgz | rw-r----- |           56 | root       |             34 | root       |             34 | 2010-06-28 10:11:51 | /home/santiago/doc.txt   |   94516 |      2 |
|  2 | /home/ftp/8400.tgz  | rw-r--r-- |           12 | santiago   |              3 | santiago   |              3 | 2010-06-29 17:37:29 | /home/santiago/doc.txt   |   94516 |      2 |
|  3 | /home/ftp/18695.tgz | rw-rw-r-- |              | root       |                | root       |                | 2010-06-27 11:42:56 | /usr/local/bin/script.sh |         |      1 |
|  4 | /home/ftp/27750.tgz | rwxr-x--- |              | foo        |                | foo        |                | 2010-06-29 00:53:32 | /usr/local/bin/script.sh |         |      1 |
|  5 | /home/ftp/4333.tgz  |           |              |            |                |            |                |                     |                          |         |      0 |
+----+---------------------+-----------+--------------+------------+----------------+------------+----------------+---------------------+--------------------------+---------+--------+

  • line 5 has just been added by proftpd; it has not gone through any process yet (status 0)
  • lines 3 and 4 have gone through the process that reads info from the tgz files (status 1)
  • lines 1 and 2 have gone through the process that retrieves ids (status 2)
  • the next process adds the record to the table versions and deletes it

To finish with your notes, I decided to separate the tables files and versions exactly for the reason you mentioned. If it were all in one table, the field file_path (255 chars) would be repeated very often (because a lot of files exist in many versions).
The whole system is a backup facility. The table files records the address of each file with its latest mod/owner/group information (because one should not be able to retrieve an old version just because once, in the past, one had access to the file). The table versions records where each version of each file is located; there is no need to repeat file_path, file_mod and file_owner there.

Your remarks are a great help to me; don't hesitate to comment further on what I said.
# 13  
Old 06-30-2010
I understand.
Why don't you just update the files table at the end of each file's processing? A single-record update vs. a scheduled mass update.

P.S. If I understand correctly, the last-time update in the files table is the slowest part, right?
# 14  
Old 06-30-2010
Well, my idea is that if I retrieved the ids one by one, I would run millions of queries:
Code:
for each record; do
    UPDATE uploaded with the output of tar -tzvvf "$path" WHERE id = $id
    INSERT IGNORE INTO mods (file_mod) VALUES (the current mod)
    INSERT IGNORE INTO owner (file_owner) VALUES (the current owner)
    INSERT IGNORE INTO files (file_path) VALUES (the current path)
    UPDATE uploaded JOIN mods, owner, files SET mod_id = ..., owner_id = ... WHERE id = $id
done

This is not real code, just an explanation of what the process does.
It means I would run 5 queries for each record.
Instead, I do that:
Code:
for each record; do
    UPDATE uploaded with the output of tar -tzvvf "$path" WHERE id = $id
done
# once it's all done
INSERT IGNORE INTO mods (file_mod) SELECT DISTINCT file_mod FROM uploaded
INSERT IGNORE INTO owner (file_owner) SELECT DISTINCT file_owner FROM uploaded
INSERT IGNORE INTO files (file_path) SELECT DISTINCT file_path FROM uploaded
UPDATE uploaded JOIN mods, owner, files SET mod_id = ..., owner_id = ...

Which means I run one query per record instead of five, plus 4 batch queries at the end.
Does it make sense?
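The batch "insert the distinct values, then back-fill the ids" pattern described here can be sketched end to end. Names are illustrative, and SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE:

```python
import sqlite3

# Illustrative staging table plus one lookup table for owners.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE uploaded (id INTEGER PRIMARY KEY, file_owner TEXT, owner_id INTEGER);
CREATE TABLE owners (uid INTEGER PRIMARY KEY AUTOINCREMENT, file_owner TEXT UNIQUE);
INSERT INTO uploaded (file_owner) VALUES ('root'), ('santiago'), ('root');
""")

# One statement registers every owner not seen before
# (the UNIQUE constraint plus OR IGNORE skips known ones)...
con.execute("INSERT OR IGNORE INTO owners (file_owner) "
            "SELECT DISTINCT file_owner FROM uploaded")

# ...and one statement back-fills the ids for the whole table at once.
con.execute("UPDATE uploaded SET owner_id = "
            "(SELECT uid FROM owners WHERE owners.file_owner = uploaded.file_owner)")

resolved = con.execute(
    "SELECT u.id, o.file_owner FROM uploaded u "
    "JOIN owners o ON u.owner_id = o.uid ORDER BY u.id"
).fetchall()
print(resolved)
```

Two statements resolve every row, however many rows there are, which is the whole point of batching over the per-record loop.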