![]() |
|
|
|
|
|||||||
| UNIX and Linux Applications Questions involving software not covered by other forum go here. This includes Databases and Middleware. |
|
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Tips for Optimizing Rails on Oracle | iBot | Oracle Updates (RSS) | 0 | 04-06-2008 02:10 AM |
| Optimizing for a Speed-up | switch | Shell Programming and Scripting | 3 | 04-07-2006 05:52 PM |
| optimizing disk performance | J.P | Filesystems, Disks and Memory | 4 | 03-01-2005 08:32 AM |
| Optimizing Apache Server Performance | Neo | News, Links, Events and Announcements | 0 | 02-14-2005 06:30 PM |
| Optimizing the system reliability | Deepa | Filesystems, Disks and Memory | 2 | 12-11-2002 07:09 PM |
|
|
Submit Tools | LinkBack | Thread Tools | Display Modes |
|
|||
|
Optimizing query
Hi All,
My first thread to this sub-forum and first thread of this sub-forum Here it is, Am trying to delete duplicates from a table retaining just 1 duplicate value out of the duplicate records for example : from n records of a table out of which x are duplicates, I want to remove x - 1 records retaining 1 record from the x duplicates This is the query am using, but I think this could still be optimized for this query really takes time for huge dumps. DELETE FROM tableA WHERE rowid not in (SELECT MIN(rowid) FROM tableA GROUP BY column1) Oracle 9i |
| Forum Sponsor | ||
|
|
|
|||
|
rowid is not an indexed column - it is a "pseudocolumn'. the in () subselect will read thru the entire select statement's result set each time. When I get back in a while I'll write something that is faster. You may need to add an index.
|
|
||||
|
I don't know if this is more efficient, but it seems like a positive approach might be better where there are only a few duplicates.
DELETE FROM tableA A1 WHERE column1 in (SELECT column1 FROM tableA GROUP BY column1 having count(*) > 1) and rowid != (select min(rowid) from tableA A2 where A1.column1 = A2.column1) |
|
|||
|
Quote:
rowid <1> - evaluate sub query rowid <2> - evaluate sub query . . . rowid <n> - evaluate sub query Last edited by reborg; 08-03-2007 at 11:29 AM. Reason: touch post to fix quotes |
|
|||
|
Shell_Life
out of the 4 potential hazards that you have listed since the query is executed only on a table with 0.25 million records, I just encounter the 4th hazard which is taking real long time. When it initially took such a long time, I though I might be receiving ' Long transaction aborted '. But didn't. Considering the alternative of programmatically deleting is a fine idea without filling the logs. |
|
||||
|
Quote:
It is also important to note that the database server does not assign rowids to rows in fragmented tables. Last edited by reborg; 08-03-2007 at 11:29 AM. Reason: touch post to fix quotes |
|
|||
|
Quote:
Correct me if am wrong ! Am not able to understand the optimization that you have made to make the query run faster. Basically, with combination of using selected columns ( column1 ) and the specific rowids ( rowid ), what is the need to specify an extra condition with a separate subquery to extract column1. Isnt that redundant ? Or how does it make the query optimized and more efficient. This is really equivalent to for each record with a rowid ' x ', necessarily two subqueries should be executed for all the records. Last edited by reborg; 08-03-2007 at 11:30 AM. |
|
||||
|
Quote:
Subquery2 makes sure that, for a given value of column1, we don't delete the row with the smallest rowid. Last edited by kahuna; 08-03-2007 at 12:03 PM. Reason: Clarification |
|
|||
|
Look - the query has to search the ENTIRE output of the IN (SELECT ..)
for each set of rows it deletes. I know that rowid's are what you get from an index. Instead of me going on about this - look in Tom Kyte's book 'EXpoert One on One' and look in the analytical function chapter - there is an example of how to optimize a query very like this. Or try the asktom oracle site: From there: Code:
from
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1224636375004
delete from tableA a
where rowid <> ( select max(rowid)
from tableA b
where b.column1 = a.column1)
/
|
|
||||
|
I made a test table with a single varchar field and no index. I loaded it with 30,000 records and an additional 30 duplicate records. I ran the following queries.
Code:
select count(*) FROM tableA WHERE rowid not in (SELECT MIN(rowid) FROM tableA GROUP BY column1); 1 hour 12 minutes 58 seconds Code:
select count(*) FROM tableA A1 WHERE column1 in (SELECT column1 FROM tableA GROUP BY column1 having count(*) > 1) and rowid != (select min(rowid) from tableA A2 where A1.column1 = A2.column1); 1 second Code:
select count(*) from tableA a
where rowid <> ( select max(rowid)
from tableA b
where b.column1 = a.column1);
6 min 29 sec
Last edited by kahuna; 08-06-2007 at 07:04 AM. |
|
|||
|
Thanks for the follow up.
I had one more option of doing that. Just tried with few records and that seems to be better. Actually I didnt compare with bulk number of records. Will do that and post the results positively by end of tomorrow ! |
| Thread Tools | |
| Display Modes | |
|
|
|
The 50 most popular UNIX and Linux searches.
Google Search Cloud for The UNIX and Linux Forums
|
| "inappropriate ioctl for device" 421 service not available, remote server has closed connection ^m autosys awk trim bash eval bash exec bash for loop boot: cannot open kernel/sparcv9/unix close_wait command copy/move folder in unix curses.h cut command in unix dead.letter find grep find null character in a unix file grep multiple lines grep or grep recursive grep unique inaddr_any inappropriate ioctl for device logrotate.conf lynx javascript mailx attachment mget mtime ping port remove first character from string in k shell replace space by comma , perl script scp recursive segmentation fault(coredump) sftp batch sftp script |