Optimizing query

 
# 8  
Old 08-03-2007
Quote:
Originally Posted by kahuna
I don't know if this is more efficient, but it seems like a positive approach might be better where there are only a few duplicates.

DELETE FROM tableA A1
WHERE column1 in (SELECT column1 FROM tableA GROUP BY column1 having count(*) > 1)
and rowid != (select min(rowid) from tableA A2 where A1.column1 = A2.column1)

Correct me if I am wrong!

I am not able to understand the optimization that you have made to make the query run faster.

Basically, when the query already combines the selected column (column1) with specific rowids (rowid), what is the need for an extra condition with a separate subquery on column1?

Isn't that redundant? Or how does it make the query more efficient?

As I read it, for each record with a rowid 'x', two subqueries necessarily have to be executed, and that holds for every record in the table.

Last edited by reborg; 08-03-2007 at 03:30 PM..
# 9  
Old 08-03-2007
Quote:
Originally Posted by matrixmadhan
Basically, when the query already combines the selected column (column1) with specific rowids (rowid), what is the need for an extra condition with a separate subquery on column1?

Yes, you are right that it makes two sub-selects. If you have many duplicates, then your original post may be better. But suppose you only have a single duplicate. My subquery1 returns only a single value of column1 (the duplicate), whereas your subquery returns many values (the non-duplicates). So now I have a single value to compare against tableA, where you have many values. Yes, I still have to run subquery2, but it runs against a much smaller set.

Subquery2 makes sure that, for a given value of column1, we don't delete the row with the smallest rowid.
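
To make the two sub-selects concrete, here is a minimal sketch on a six-row table. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for Oracle, since both expose a rowid; the sample values are made up, and the outer table is referenced by name instead of the alias A1 because not every SQLite version accepts an alias in DELETE.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle here; both expose a rowid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (column1 TEXT)")
conn.executemany(
    "INSERT INTO tableA (column1) VALUES (?)",
    [("a",), ("b",), ("c",), ("b",), ("d",), ("b",)],  # "b" occurs three times
)

# Subquery 1: the short list of values that occur more than once.
dup_values = [row[0] for row in conn.execute(
    "SELECT column1 FROM tableA GROUP BY column1 HAVING COUNT(*) > 1"
)]

# Full delete: among rows holding a duplicated value, remove every row
# except the one with the smallest rowid (subquery 2).
conn.execute("""
    DELETE FROM tableA
    WHERE column1 IN (SELECT column1 FROM tableA
                      GROUP BY column1 HAVING COUNT(*) > 1)
      AND rowid != (SELECT MIN(A2.rowid) FROM tableA A2
                    WHERE A2.column1 = tableA.column1)
""")

remaining = conn.execute(
    "SELECT rowid, column1 FROM tableA ORDER BY rowid"
).fetchall()
print(dup_values)
print(remaining)
```

Only the value "b" comes back from subquery 1, so subquery 2 is evaluated for the three "b" rows rather than for every row in the table.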

Last edited by kahuna; 08-03-2007 at 04:03 PM.. Reason: Clarification
# 10  
Old 08-04-2007
Look - the query has to search the ENTIRE output of the IN (SELECT ..) for each set of rows it deletes. I know that rowids are what you get from an index.

Instead of me going on about this - look in Tom Kyte's book 'Expert One-on-One Oracle' and look in the analytic functions chapter - there is an example of how to optimize a query very like this. Or try the asktom Oracle site:
From there:
Code:
from 
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1224636375004

delete from tableA a
     where rowid <> ( select max(rowid)
                        from tableA b 
                       where b.column1 = a.column1)
    /
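
For comparison, one common analytic-function formulation of the same delete is sketched below. This is an assumption about what such a rewrite could look like, not necessarily the example from the book or from asktom. It again uses SQLite through Python's sqlite3 as a stand-in for Oracle (window functions need SQLite 3.25 or later), and the sample rows are made up.

```python
import sqlite3

def build():
    """Create a small in-memory table with one value repeated three times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (column1 TEXT)")
    conn.executemany(
        "INSERT INTO tableA (column1) VALUES (?)",
        [("a",), ("b",), ("c",), ("b",), ("d",), ("b",)],
    )
    return conn

def survivors(conn):
    """Return the rowids still present, in ascending order."""
    return [r[0] for r in conn.execute("SELECT rowid FROM tableA ORDER BY rowid")]

# Correlated form from the post above: keep the largest rowid per value.
correlated = build()
correlated.execute("""
    DELETE FROM tableA
    WHERE rowid <> (SELECT MAX(b.rowid) FROM tableA b
                    WHERE b.column1 = tableA.column1)
""")

# Analytic form: number the rows within each value, newest rowid first,
# and delete everything that is not numbered 1.
analytic = build()
analytic.execute("""
    DELETE FROM tableA
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY column1
                                      ORDER BY rowid DESC) AS rn
            FROM tableA
        )
        WHERE rn > 1
    )
""")

print(survivors(correlated))
print(survivors(analytic))
```

Both forms should leave the same rows behind; the analytic form numbers the rows per value in one windowed pass instead of running a correlated subquery for each row.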

# 11  
Old 08-04-2007
I know that it may not be possible, but it would be interesting to see a comparison timing of the different queries.
# 12  
Old 08-06-2007
I made a test table with a single varchar field and no index. I loaded it with 30,000 records and an additional 30 duplicate records. I ran the following queries.

Code:
select count(*) FROM tableA
WHERE rowid not in
(SELECT MIN(rowid) FROM tableA GROUP BY column1);

1 hour 12 minutes 58 seconds

Code:
select count(*) FROM tableA A1
WHERE column1 in (SELECT column1 FROM tableA GROUP BY column1 having count(*) > 1)
and rowid != (select min(rowid) from tableA A2 where A1.column1 = A2.column1);

1 second

Code:
select count(*) from tableA a
     where rowid <> ( select max(rowid)
                        from tableA b
                       where b.column1 = a.column1);

6 minutes 29 seconds

Your mileage may vary.
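
For anyone who wants to repeat the comparison, here is a scaled-down sketch of the same test. It assumes SQLite (through Python's sqlite3) instead of Oracle and uses 300 rows plus 3 duplicates, so the timings it prints say nothing about Oracle; it mainly confirms that the three queries count the same rows.

```python
import sqlite3
import time

# Assumption: SQLite stands in for Oracle; single text column, no index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (column1 TEXT)")
rows = [(f"v{i}",) for i in range(300)] + [("v0",), ("v1",), ("v2",)]
conn.executemany("INSERT INTO tableA (column1) VALUES (?)", rows)

queries = {
    "not in / min(rowid)": """
        SELECT COUNT(*) FROM tableA
        WHERE rowid NOT IN (SELECT MIN(rowid) FROM tableA GROUP BY column1)
    """,
    "having count(*) > 1, then min(rowid)": """
        SELECT COUNT(*) FROM tableA A1
        WHERE column1 IN (SELECT column1 FROM tableA
                          GROUP BY column1 HAVING COUNT(*) > 1)
          AND rowid != (SELECT MIN(rowid) FROM tableA A2
                        WHERE A1.column1 = A2.column1)
    """,
    "correlated max(rowid)": """
        SELECT COUNT(*) FROM tableA a
        WHERE rowid <> (SELECT MAX(rowid) FROM tableA b
                        WHERE b.column1 = a.column1)
    """,
}

counts = {}
for label, sql in queries.items():
    start = time.perf_counter()
    counts[label] = conn.execute(sql).fetchone()[0]
    elapsed = time.perf_counter() - start
    print(f"{label}: count={counts[label]}, {elapsed:.4f}s")
```

Raising the row count in `rows` shows how each form scales on a given engine, though the optimizer and the presence of an index on column1 will change the picture.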

Last edited by kahuna; 08-06-2007 at 11:04 AM..
# 13  
Old 08-16-2007
Matrixmadhan,

Have you decided if anything works better than your original query? If so, can you let us know what worked and some idea of the time difference? Thanks.
# 14  
Old 08-16-2007
Thanks for the follow up.

I had one more option for doing that.

I just tried it with a few records and it seems to be better.
Actually, I didn't compare it with a large number of records.

I will do that and post the results by the end of tomorrow!