Optimizing query

 
Thread Tools Search this Thread
Special Forums UNIX and Linux Applications Optimizing query
# 1  
Old 08-03-2007
Optimizing query

Hi All,

My first thread to this sub-forum and first thread of this sub-forum Smilie

Here it is,

Am trying to delete duplicates from a table retaining just 1 duplicate value out of the duplicate records

for example : from n records of a table out of which x are duplicates, I want to remove x - 1 records retaining 1 record from the x duplicates

This is the query am using,

but I think this could still be optimized for this query really takes time for huge dumps.

DELETE FROM tableA
WHERE rowid not in
(SELECT MIN(rowid) FROM tableA GROUP BY column1)

Oracle 9i
# 2  
Old 08-03-2007
rowid is not an indexed column - it is a "pseudocolumn'. the in () subselect will read thru the entire select statement's result set each time. When I get back in a while I'll write something that is faster. You may need to add an index.
# 3  
Old 08-03-2007
Matrix,
Your solution is very simple and easy to understand.

Although it works fine for small to medium tables, or for tables with low access/update,
for very large tables, or tables with heavy access/update, it may:
1) Generate an abnormally long transaction.
2) Fill the logs.
3) Be involved with another process in a deadlock.
4) Run for a very long time.

It is also important to note that for large tables, the internals of your query will be
very ineficient as the system will store a rowid for each unique key and loop thru
each one for every row.

The best and optimized solution would be to write a program to loop thru each row
in the table, begin a transaction and commit every number of deleted rows -- usually
one to five thousand is very quick, safe and easy on the database.

Good luck!
# 4  
Old 08-03-2007
I don't know if this is more efficient, but it seems like a positive approach might be better where there are only a few duplicates.

DELETE FROM tableA A1
WHERE column1 in (SELECT column1 FROM tableA GROUP BY column1 having count(*) > 1)
and rowid != (select min(rowid) from tableA A2 where A1.column1 = A2.column1)
# 5  
Old 08-03-2007
Quote:
Originally Posted by jim mcnamara
rowid is not an indexed column - it is a "pseudocolumn'. the in () subselect will read thru the entire select statement's result set each time. When I get back in a while I'll write something that is faster. You may need to add an index.
Does that mean its executing in this fashion.

rowid <1> - evaluate sub query
rowid <2> - evaluate sub query
.
.
.
rowid <n> - evaluate sub query

Last edited by reborg; 08-03-2007 at 03:29 PM.. Reason: touch post to fix quotes
# 6  
Old 08-03-2007
Shell_Life

out of the 4 potential hazards that you have listed

since the query is executed only on a table with 0.25 million records, I just encounter the 4th hazard which is taking real long time.

When it initially took such a long time, I though I might be receiving ' Long transaction aborted '. But didn't.

Considering the alternative of programmatically deleting is a fine idea without filling the logs. Smilie
# 7  
Old 08-03-2007
Quote:
rowid is not an indexed column - it is a "pseudocolumn'.
By definition, rowids are the physical address of each row, thus it is also an index.

It is also important to note that the database server does not assign rowids to rows
in fragmented tables.

Last edited by reborg; 08-03-2007 at 03:29 PM.. Reason: touch post to fix quotes
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Web Development

Optimizing JS and CSS

Yes. Got few suggestions. - How about minifying resources - mod_expires - Service workers setup https://www.unix.com/attachments/web-programming/7709d1550557731-sneak-preview-new-unix-com-usercp-vuejs-demo-screenshot-png (8 Replies)
Discussion started by: Akshay Hegde
8 Replies

2. Shell Programming and Scripting

Optimizing bash loop

now, i have to search for a pattern within a particular time frame which the user will provide in the following format: 19/Jun/2018:07:04,21/Jun/2018:21:30 it is easy to get tempted to attempt this search with a variation of the following awk command: awk... (3 Replies)
Discussion started by: SkySmart
3 Replies

3. Shell Programming and Scripting

Optimizing find with many replacements

Hello, I'm looking for advice on how to optimize this bash script, currently i use the shotgun approach to avoid file io/buffering problems of forks trying to write simultaneously to the same file. i'd like to keep this as a fairly portable bash script rather than writing a C routine. in a... (8 Replies)
Discussion started by: f77hack
8 Replies

4. Shell Programming and Scripting

Optimizing search using grep

I have a huge log file close to 3GB in size. My task is to generate some reporting based on # of times something is being logged. I need to find the number of time StringA , StringB , StringC is being called separately. What I am doing right now is: grep "StringA" server.log | wc -l... (4 Replies)
Discussion started by: Junaid Subhani
4 Replies

5. Shell Programming and Scripting

Optimizing awk script

Can this awk statement be optimized? i ask because log.txt is a giant file with several hundred thousands of lines of records. myscript.sh: while read line do searchterm="${1}" datecurr=$(date +%s) file=$(awk 'BEGIN{split(ARGV,var,",");print var}' $line) ... (3 Replies)
Discussion started by: SkySmart
3 Replies

6. Shell Programming and Scripting

Optimizing the code

Hi, I have two files in the format listed below. I need to find out all values from field 12 to field 20 present in file 2 and list them in file3(format as file2) File1 : FEIN,CHRISTA... (2 Replies)
Discussion started by: nua7
2 Replies

7. OS X (Apple)

Optimizing OSX

Hi forum, I'm administrating a workstation/server for my lab and I was wondering how to optimize OSX. I was wondering what unnecessary background tasks I could kick off the system so I free up as much memory and cpu power. Other optimization tips are also welcome (HD parameters, memory... (2 Replies)
Discussion started by: deiphon
2 Replies

8. Shell Programming and Scripting

Optimizing for a Speed-up

How would one go about optimizing this current .sh program so it works at a more minimal time. Such as is there a better way to count what I need than what I have done or better way to match patterns in the file? Thanks, #declare variables to be used. help=-1 count=0 JanCount=0 FebCount=0... (3 Replies)
Discussion started by: switch
3 Replies

9. Filesystems, Disks and Memory

optimizing disk performance

I have some questions regarding disk perfomance, and what I can do to make it just a little (or much :)) more faster. From what I've heard the first partitions will be faster than the later ones because tracks at the outer edges of a hard drive platter simply moves faster. But I've also read in... (4 Replies)
Discussion started by: J.P
4 Replies

10. Filesystems, Disks and Memory

Optimizing the system reliability

My product have around 10-15 programs/services running in the sun box, which together completes a task, sequentially. Several instances of the each program/service are running in the unix box, to manage the load and for risk-management reasons. As of now, we dont follow a strict strategy in... (2 Replies)
Discussion started by: Deepa
2 Replies
Login or Register to Ask a Question