Delete unique rows - optimize script


 
# 1  
Old 09-16-2012

Hi all,

I have the following input; the unique row key is the 1st column.

Code:
cat file.txt
 
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request

The desired output should be

Code:
[1] C request
[1] C response
[2] C request
[2] C response

Now I have implemented the loop below, which does work, but when the input file is bigger than 300 MB the whole process of removing the non-paired rows takes ages, since it re-scans the whole file for every key.

Code:
#!/bin/bash
req=$(mktemp)
res=$(mktemp)
new=$(mktemp)
tmp=$(mktemp)
grep request  "$1" > "$req"
grep response "$1" > "$res"
for id in $(awk '{ print $1 }' "$req")
do
    id=$(echo "$id" | tr -d "[]")
    grep "$id" "$res" > "$tmp"
    if [[ -s $tmp ]]
    then
        grep "$id" "$req" >> "$new"
        cat "$tmp" >> "$new"
    fi
done
mv "$new" "$2"
rm "$req" "$res" "$tmp"

Any idea how I can optimize it, or do it differently, to remove the unique rows as per the above example and speed up the process?
# 2  
Old 09-16-2012
Code:
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request

I think you want all the response items that match a request?

Code:
#!/bin/bash
req=$(mktemp)
res=$(mktemp)
new=$(mktemp)
tmp=$(mktemp)
grep request  "$1" | sort | uniq > "$req"   # to get unique request lines
grep response "$1" > "$res"
# Use the whole "[1] C" part as the key - makes the match more unique
# than the bracketed id alone (grep -F treats the brackets literally).
while read -r id
do
    grep -F "$id" "$res" > "$tmp"
    if [[ -s $tmp ]]
    then
        grep -F "$id" "$req" >> "$new"
        cat "$tmp" >> "$new"
    fi
done < <(awk '{ print $1, $2 }' "$req")
mv "$new" "$2"
rm "$req" "$res" "$tmp"


# 3  
Old 09-16-2012
Thanks for quick reply.

I want only request/response pairs. If there is only a request or a response, without an associated pair, then it should be removed.

Cheers.
# 4  
Old 09-16-2012
Quote:
Originally Posted by varu0612
I want only request/response pairs. If there is only a request or a response, without an associated pair, then it should be removed.
Have you tried my suggestion?
# 5  
Old 09-16-2012
Yes I did, but there is no difference, since it is the same logic: for each row it needs to scan the whole file in a loop.

I think I'll gain more if I find a way to scan the file once and delete the unique rows/lines, something like this:

Check for the key "[1] request" or "[1] response" and, if you don't find its pair, delete it.
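One way to sketch that idea (this is only an illustration, not the script from the thread) is a two-pass awk: the first pass counts requests and responses per key, the second prints only the lines whose key has at least one of each. The temp file just stands in for the real input; it assumes the bracketed keys fit in memory.

```shell
#!/bin/sh
# Pass 1: count request/response lines per key.
# Pass 2: print only lines whose key has both.
f=$(mktemp)
cat > "$f" <<'EOF'
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request
EOF
awk 'NR==FNR { if (/request/)  q[$1]++
               if (/response/) r[$1]++
               next }
     q[$1] && r[$1]' "$f" "$f"
rm -f "$f"
```

This prints the four paired lines in their original order, and unlike a plain occurrence count it still works if one key somehow has two requests.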
# 6  
Old 09-16-2012
Quote:
Originally Posted by varu0612
check for key "[1] request" or "[1] response" and if you don't find the pair delete it..
Try this..

Code:
sort file | awk '/request/ {s=$0;p=$1;getline} {if ($0 ~ /response/ && $1 == p) { print s"\n"$0 }}'
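For reference, the one-liner above can be exercised on the sample data from post #1 as a self-contained snippet (the temp file stands in for `file`; note the approach assumes that, after sorting, each request is immediately followed by its matching response):

```shell
#!/bin/sh
# Run the sort|awk pairing one-liner on the thread's sample input.
f=$(mktemp)
cat > "$f" <<'EOF'
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request
EOF
sort "$f" | awk '/request/ {s=$0;p=$1;getline} {if ($0 ~ /response/ && $1 == p) { print s"\n"$0 }}'
rm -f "$f"
```

This prints the four paired lines for keys [1] and [2] and drops the unpaired [3], [4], and [5] rows.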

# 7  
Old 09-16-2012
One way:
Code:
~/unix.com$ awk 'NR==FNR{A[$1]++;next}A[$1]==2' file.txt file.txt
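A quick self-contained check of this two-file idiom on the sample input (the first pass counts occurrences of each key, the second prints lines whose key occurred exactly twice; note `A[$1]==2` assumes a key never appears as, say, two requests):

```shell
#!/bin/sh
# Two passes over the same file: count keys, then print lines
# whose key occurs exactly twice (a request/response pair).
f=$(mktemp)
cat > "$f" <<'EOF'
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request
EOF
awk 'NR==FNR{A[$1]++;next}A[$1]==2' "$f" "$f"
rm -f "$f"
```

This prints the four paired lines in their original order, with no sort needed.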
