Delete unique rows - optimize script


 
# 1  
Old 09-16-2012

Hi all,

I have the following input; the unique row key is the 1st column.

Code:
cat file.txt
 
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request

The desired output should be

Code:
[1] C request
[1] C response
[2] C request
[2] C response

Now I have implemented the loop below, which does work, but when the input file is bigger than 300 MB the whole process of removing the non-paired rows takes ages, since it re-scans the whole file for every key.

Code:
#!/bin/bash
req=$(mktemp)
res=$(mktemp)
new=$(mktemp)
tmp=$(mktemp)
grep request  "$1" > "$req"
grep response "$1" > "$res"
for id in $(awk '{ print $1 }' "$req")
do
    id=$(echo "$id" | tr -d "[]")
    grep "$id" "$res" > "$tmp"
    if [[ -s $tmp ]]
    then
        grep "$id" "$req" >> "$new"
        cat "$tmp" >> "$new"
    fi
done
mv "$new" "$2"
rm "$req" "$res" "$tmp"

Any idea how I can optimize it, or do it differently, to remove the unique rows as per the above example and speed up the process?
# 2  
Old 09-16-2012
Code:
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request

I think you want all the response items that match a request?

Code:
#!/bin/bash
req=$(mktemp)
res=$(mktemp)
new=$(mktemp)
tmp=$(mktemp)
grep request  "$1" | sort | uniq > "$req"   # to get unique request lines
grep response "$1" > "$res"
# Use the whole "[1] C" part as the key - makes the match more unique
# than the bracketed id alone (grep -F treats the brackets literally).
while read -r id
do
    grep -F "$id" "$res" > "$tmp"
    if [[ -s $tmp ]]
    then
        grep -F "$id" "$req" >> "$new"
        cat "$tmp" >> "$new"
    fi
done < <(awk '{ print $1, $2 }' "$req")
mv "$new" "$2"
rm "$req" "$res" "$tmp"


# 3  
Old 09-16-2012
Thanks for quick reply.

I want only request/response pairs. If there is only a request or a response, without an associated pair, then it should be removed.

Cheers.
# 4  
Old 09-16-2012
Quote:
Originally Posted by varu0612
I want only request/response pairs. If there is only a request or a response, without an associated pair, then it should be removed.
Have you tried my suggestion?
# 5  
Old 09-16-2012
Yes I did, but there is no difference, since it is the same logic: for each row it needs to scan the whole file in a loop.

I think I'll gain more if I find a way to scan the file once and delete the unique rows/lines, something like this:

Check for the key "[1] request" or "[1] response" and, if you don't find its pair, delete it.
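One way to sketch that idea (this is only an illustration, not the script from the thread) is a two-pass awk: the first pass counts requests and responses per key, the second prints only the lines whose key has at least one of each. The temp file just stands in for the real input; it assumes the bracketed keys fit in memory.

```shell
#!/bin/sh
# Pass 1: count request/response lines per key.
# Pass 2: print only lines whose key has both.
f=$(mktemp)
cat > "$f" <<'EOF'
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request
EOF
awk 'NR==FNR { if (/request/)  q[$1]++
               if (/response/) r[$1]++
               next }
     q[$1] && r[$1]' "$f" "$f"
rm -f "$f"
```

This prints the four paired lines in their original order, and unlike a plain occurrence count it still works if one key somehow has two requests.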
# 6  
Old 09-16-2012
Quote:
Originally Posted by varu0612
check for key "[1] request" or "[1] response" and if you don't find the pair delete it..
Try this..

Code:
sort file | awk '/request/ {s=$0;p=$1;getline} {if ($0 ~ /response/ && $1 == p) { print s"\n"$0 }}'
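For reference, the one-liner above can be exercised on the sample data from post #1 as a self-contained snippet (the temp file stands in for `file`; note the approach assumes that, after sorting, each request is immediately followed by its matching response):

```shell
#!/bin/sh
# Run the sort|awk pairing one-liner on the thread's sample input.
f=$(mktemp)
cat > "$f" <<'EOF'
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request
EOF
sort "$f" | awk '/request/ {s=$0;p=$1;getline} {if ($0 ~ /response/ && $1 == p) { print s"\n"$0 }}'
rm -f "$f"
```

This prints the four paired lines for keys [1] and [2] and drops the unpaired [3], [4], and [5] rows.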

# 7  
Old 09-16-2012
One way:
Code:
~/unix.com$ awk 'NR==FNR{A[$1]++;next}A[$1]==2' file.txt file.txt
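A quick self-contained check of this two-file idiom on the sample input (the first pass counts occurrences of each key, the second prints lines whose key occurred exactly twice; note `A[$1]==2` assumes a key never appears as, say, two requests):

```shell
#!/bin/sh
# Two passes over the same file: count keys, then print lines
# whose key occurs exactly twice (a request/response pair).
f=$(mktemp)
cat > "$f" <<'EOF'
[4] A response
[1] C request
[1] C response
[3] D request
[2] C request
[2] C response
[5] E request
EOF
awk 'NR==FNR{A[$1]++;next}A[$1]==2' "$f" "$f"
rm -f "$f"
```

This prints the four paired lines in their original order, with no sort needed.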
