Removing specific records from files when duplicate key


 
# 1  
Old 05-21-2014

Hello

I have been trying to remove a row from a file when it has the same first three columns as another row. I have tried lots of different combinations of suggestions from this forum, but I can't get it exactly right.

What I have is:
Code:
900 - 1000 = 0
900 - 1000 =  2562
1000 - 1100 = 0
1000 - 1100 =  931
1100 - 1200 = 0
1100 - 1200 =  469
1200 - 1300 = 0
1300 - 1400 = 0
1300 - 1400 =  175
1400 - 1500 = 0
1400 - 1500 =  112

What I want is:
Code:
900 - 1000 =  2562
1000 - 1100 =  931
1100 - 1200 =  469
1200 - 1300 = 0
1300 - 1400 =  175
1400 - 1500 =  112

Any help would be greatly appreciated.

# 2  
Old 05-21-2014
Let's give it a try.

Code:
awk '{a[$1]=$0; next}END{for (i in a) {print a[i]}}' filename | sort -n

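As a quick sanity check, running the one-liner against the sample data from post #1 (assuming it is saved as filename) produces exactly the desired output; the trailing sort -n restores the numeric order that iterating over the array loses:
Code:
$ awk '{a[$1]=$0; next} END{for (i in a) {print a[i]}}' filename | sort -n
900 - 1000 =  2562
1000 - 1100 =  931
1100 - 1200 =  469
1200 - 1300 = 0
1300 - 1400 =  175
1400 - 1500 =  112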
# 3  
Old 05-21-2014
1. Thanks for the quick reply.
2. Your recommendation also works with a larger amount of data, and now I have a large bunch of data that I need to parse, but I can handle that.
3. I owe you a tasty beverage if you are ever in my neck of the woods.
# 4  
Old 05-21-2014
Quote:
Originally Posted by Aia
Let's give it a try.

Code:
awk '{a[$1]=$0; next}END{for (i in a) {print a[i]}}' filename | sort -n

It works great, but I don't get it.

To me it looks like you save each record in an array, using $1 as the index. That's OK, I understand. But then you decide to jump to the next record... why?

And then it magically works: everything is saved in the array and printed at the end.

I can't see the light in this one. Could you explain it a little bit?
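A note on the next in that script: it is redundant. next tells awk to skip any remaining rules for the current record, but this script has only one main rule, so awk would read the next record anyway. A sketch of the same one-liner without it (again assuming the data is in filename):
Code:
awk '{a[$1] = $0} END {for (i in a) print a[i]}' filename | sort -n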
# 5  
Old 05-21-2014
Quote:
Originally Posted by Aia
Let's give it a try.

Code:
awk '{a[$1]=$0; next}END{for (i in a) {print a[i]}}' filename | sort -n

Since it is the first three columns, technically that would need to be:
Code:
awk '{a[$1,$2,$3]=$0; next} .....

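Putting the two posts together, the complete command would read something like this (a sketch, with the redundant next dropped and filename standing in for the actual data file):
Code:
awk '{a[$1,$2,$3] = $0} END {for (i in a) print a[i]}' filename | sort -n

The comma in a[$1,$2,$3] joins the three fields with awk's SUBSEP character, so every distinct combination of the first three fields gets its own array slot.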

# 6  
Old 05-21-2014
Quote:
Originally Posted by Scrutinizer
Since it is the first three columns, technically that would need to be:
Code:
awk '{a[$1,$2,$3]=$0; next} .....

Now I understand.

Because the first occurrence, the one that equals 0 and does not count, has the same array index, the second value overwrites the first. So whenever there is a second line with the same pattern, it is the second value that ends up saved.

This time the key was paying attention to the index and to how awk stores records in the array.

Thanks.
# 7  
Old 05-22-2014
Quote:
Originally Posted by Kibou
Now I understand.

Because the first occurrence, the one that equals 0 and does not count, has the same array index, the second value overwrites the first. So whenever there is a second line with the same pattern, it is the second value that ends up saved.

This time the key was paying attention to the index and to how awk stores records in the array.

Thanks.
There is no test for 0. It is not necessarily the second line with a given value for the first three fields that is saved in the array; it is the last such line. If there is one line with 900, -, and 1000 as the first three fields, respectively, a[$1, $2, $3]'s value (in this case a["900", "-", "1000"]'s value) will be that entire line. If there is more than one line with 900, -, and 1000 as the first three fields, a[$1, $2, $3]'s value will be the last line starting with those three values.

When processing an array with:
Code:
for(i in a)

the elements are processed in an unspecified order (not necessarily the order in which they appeared in the input file). This is why Aia used sort -n to print the output in the same order as the (sorted) input file.
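If the input were not already in numeric order, sort -n could not restore the original line order. One order-preserving alternative (a sketch, assuming GNU coreutils' tac is available) is to reverse the file, keep only the first occurrence of each key, which is the last occurrence in the original order, and reverse back:
Code:
tac filename | awk '!seen[$1,$2,$3]++' | tac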