awk - Remove duplicates during array build


 
# 1  
Old 07-04-2016

Greetings Experts,
Issue: Within an awk script, remove duplicate occurrences that are separated by a single space character.

Description: I am processing 2 files with awk, and during processing I build an array that ends up containing duplicates. How can I remove the duplicates inside awk, without leaving it? Put simply, I am building the array as

Code:
awk -F "@" '
......
v_array[$1 OFS $2]=(v_array[$1 OFS $2] ? v_array[$1 OFS $2] "," $3 : $3)
.....
' file1.txt  file2.txt

File1.txt
Code:
col1          col2          col3
abc           def           xyz
abc           efg           pqr
abc           def           qrs
stu           vwx           yz
abc           def           xyz

current contents in v_array:
Code:
v_array[abc def]=xyz,qrs,xyz
v_array[abc efg]=pqr
v_array[stu vwx]=yz

Expected contents in v_array: as you can see, xyz is repeated for the combination abc def, so it should be kept only once.
Code:
v_array[abc def]=xyz,qrs   
v_array[abc efg]=pqr
v_array[stu vwx]=yz

Ordering is not required. It can be xyz,qrs or qrs,xyz

I can check for the presence of $3 in v_array with the split function, like this:
Code:
v_key = $1 OFS $2
v_dup_check = "not present"
# Split the values already stored for this key and look for $3
v_cnt = split(v_array[v_key], v_a_tmp, ",")
for (k = 1; k <= v_cnt; k++) {
    if (v_a_tmp[k] == $3) {
        v_dup_check = "present"
        break
    }
}
# Append $3 only if it is not already in the list
if (v_dup_check == "not present") {
    v_array[v_key] = (v_array[v_key] ? v_array[v_key] "," $3 : $3)
}

This is all I can think of for now; I hope there is a much better way to handle this within awk.

Also, as I am still learning awk through the forums: how do I sort the array indices and the array elements once the array has been built? I mean
v_array[$1 OFS $2] -- how to process the elements in the order of $1 OFS $2
and also how to sort on the array values, as in
v_array[$1 OFS $2]=$3 -- how to process the elements of the array in the order of $3

Thank you for your valuable time.

Edit:
Please note that, for further processing, the array index v_array[$1 OFS $2] must not be changed.

Last edited by chill3chee; 07-04-2016 at 02:25 PM.. Reason: Added additional info.
# 2  
Old 07-04-2016
Make (and test) the combination of $1, $2, and $3 unique:
Code:
awk  '
FNR == 1        {next
                }
!T[$1,$2,$3]++  {v_array[$1 OFS $2]=v_array[$1 OFS $2] ? v_array[$1 OFS $2] "," $3 : $3
                }
END             {for (v in v_array) print v, v_array[v]
                }
' file
abc def xyz,qrs
abc efg pqr
stu vwx yz
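A side note on the key used above: T[$1,$2,$3] joins the fields with SUBSEP (by default a control character that rarely occurs in data), whereas $1 OFS $2 OFS $3 joins them with a space. The sketch below, using made-up "@"-separated records whose fields contain spaces, suggests why the SUBSEP form can be the safer choice for the uniqueness test. The variable names data, ofs_count and subsep_count are only for illustration.

```shell
# Hypothetical "@"-separated records; the fields themselves contain spaces.
data='a b@c@x
a@b c@x'

# Keys joined with OFS (a single space by default) collide:
# both records produce the key "a b c x".
ofs_count=$(printf '%s\n' "$data" |
    awk -F "@" '!T[$1 OFS $2 OFS $3]++ { n++ } END { print n }')

# Keys joined with SUBSEP, which is what T[$1,$2,$3] does, stay distinct.
subsep_count=$(printf '%s\n' "$data" |
    awk -F "@" '!T[$1,$2,$3]++ { n++ } END { print n }')

echo "OFS-joined keys: $ofs_count unique; SUBSEP-joined keys: $subsep_count unique"
```

With whitespace-separated input like File1.txt the two forms behave the same, since a field can never contain the separator.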

While some awk implementations provide a sort function, and you could build one yourself in others, piping through sort might be the easiest way to get what you want:

Code:
. . . | sort -k3
abc efg pqr
abc def xyz,qrs
stu vwx yz
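For the "build one yourself" route, here is a minimal sketch that sorts the comma-separated values of each element inside awk with a small insertion sort, and leaves the ordering of the keys to sort. The function name sort_list and the inline sample rows are assumptions made for the example; gawk users may prefer its built-in sorting functions instead.

```shell
# Hypothetical sample: the data rows of File1.txt, whitespace-separated.
sorted=$(printf '%s\n' 'abc def xyz' 'abc efg pqr' 'abc def qrs' 'stu vwx yz' 'abc def xyz' |
awk '
# Return the comma-separated list s with its items in ascending order
# (plain insertion sort; the extra parameters are local variables).
function sort_list(s,    n, a, i, j, tmp, out) {
    n = split(s, a, ",")
    for (i = 2; i <= n; i++) {
        tmp = a[i]
        for (j = i - 1; j >= 1 && a[j] > tmp; j--)
            a[j + 1] = a[j]
        a[j + 1] = tmp
    }
    out = a[1]
    for (i = 2; i <= n; i++)
        out = out "," a[i]
    return out
}
!T[$1,$2,$3]++ { v_array[$1 OFS $2] = v_array[$1 OFS $2] ? v_array[$1 OFS $2] "," $3 : $3 }
END { for (v in v_array) print v, sort_list(v_array[v]) }
' | sort -k1,1 -k2,2)

printf '%s\n' "$sorted"
```

The array index stays $1 OFS $2 throughout; only the printed value list is reordered. Items that look like numbers would be compared numerically by the > operator, which may or may not be what you want.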

This User Gave Thanks to RudiC For This Post:
# 3  
Old 07-06-2016
I adapted your post to suit my requirement and it works great. As always, awesome, and thank you RudiC.
I have a small question here. My script uses 2 input files and runs into trouble when reading the second file:
Code:
awk -F "@" ' NR==FNR { ....; next; } {  # second file processing
}' file1.txt file2.txt

It doesn't even read the 2nd file: I put some print statements in the second-file block and they never get printed. When I changed it to if (NR!=FNR), the same part works great.
Code:
awk -F "@" ' { if(NR==FNR) {
....; next; }
if(NR!=FNR){
# second file processing
} }' file1.txt file2.txt

and this works. I am not complaining about awk; I am sure that I messed up somewhere and just can't figure out where. But if(NR!=FNR) comes to my rescue at this point, hence I am using it.

Though I know it is not possible to pin down the issue without seeing the script and files, I am looking for a guess: has anyone ever faced a similar issue?
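For reference, a minimal sketch of the usual NR==FNR two-file pattern, which normally does reach the second block. The file contents, the seen array and the "no match" text are invented for the example and say nothing about what went wrong in the real script. One known pitfall of the pattern: if the first file is empty, NR==FNR stays true while the second file is read.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'k1@a' 'k2@b' > "$tmpdir/file1.txt"   # hypothetical lookup data
printf '%s\n' 'k2@x' 'k3@y' > "$tmpdir/file2.txt"   # hypothetical detail data

two_file=$(awk -F "@" '
# First file only: remember the value for each key, then skip to the next record.
NR == FNR { seen[$1] = $2; next }
# Second file only: report what the first file had for this key.
{ print $1, (($1 in seen) ? seen[$1] : "no match") }
' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf '%s\n' "$two_file"
rm -rf "$tmpdir"
```

Note that the comments sit on their own lines; a # comment placed before a closing brace on the same line would comment the brace out.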
!T[$1 OFS $2 OFS $3]++ should have worked in my script but, strangely, it didn't. So I tweaked it to
Code:
if (!T[$1 OFS $2 OFS $3]) {
v_array[$1 OFS $2]=(v_array[$1 OFS $2]? v_array[$1 OFS $2] "," $3 : $3)
T[$1 OFS $2 OFS $3]="1"
}

and this works. I am not sure what difference ++ and the if make, as they should be identical.
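As a quick check that the two forms really do select the same records, here is a small sketch on made-up rows. The pattern form tests the old counter value (empty, hence false, the first time a key is seen) and then increments it; the explicit form tests a flag, acts, and then sets it. The variable names rows, with_incr and with_if are only for illustration.

```shell
# Hypothetical rows: the third one repeats the first.
rows='abc def xyz
abc def qrs
abc def xyz'

# Pattern form: true only the first time a key is seen, then ++ bumps the count.
with_incr=$(printf '%s\n' "$rows" | awk '!T[$1 OFS $2 OFS $3]++ { print $3 }')

# Explicit form: test the flag, act, then set the flag.
with_if=$(printf '%s\n' "$rows" | awk '{
    if (!T[$1 OFS $2 OFS $3]) {
        print $3
        T[$1 OFS $2 OFS $3] = 1
    }
}')

printf 'pattern form:\n%s\nif form:\n%s\n' "$with_incr" "$with_if"
```

If the two behave differently in a larger script, the cause is probably elsewhere, for example in where the pattern sits relative to the surrounding braces.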

Thank you for your time.
# 4  
Old 07-06-2016
There shouldn't be any NR == FNR or NR != FNR; I simply put in FNR == 1 to exclude the header line(s). The scriptlet should work on any number of files supplied to it as one single stream of data (unless you terribly messed up something).
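To illustrate the point about several files being treated as one stream, here is a minimal sketch that splits the sample data over two hypothetical files (a.txt and b.txt, each with its own header line) and runs the same scriptlet over both. The final sort is only there to make the output order predictable.

```shell
tmpdir=$(mktemp -d)
# Hypothetical split of the File1.txt sample across two files, each with a header.
printf '%s\n' 'col1 col2 col3' 'abc def xyz' 'abc efg pqr' > "$tmpdir/a.txt"
printf '%s\n' 'col1 col2 col3' 'abc def qrs' 'stu vwx yz' 'abc def xyz' > "$tmpdir/b.txt"

merged=$(awk '
FNR == 1 { next }    # FNR restarts at 1 for each file, so every header is skipped
!T[$1,$2,$3]++ { v_array[$1 OFS $2] = v_array[$1 OFS $2] ? v_array[$1 OFS $2] "," $3 : $3 }
END { for (v in v_array) print v, v_array[v] }
' "$tmpdir/a.txt" "$tmpdir/b.txt" | sort)

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

The duplicate xyz in b.txt is dropped even though its first occurrence came from a.txt, because T is shared across all input files.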
This User Gave Thanks to RudiC For This Post: