Greetings Experts,
Issue: Within an awk script, remove duplicate occurrences that are separated by a single space character.
Description: I am processing 2 files using awk, and during processing I am building an array that ends up with duplicates. How can I delete the duplicates within awk, without moving out of it? To put it simply, I am building an array as
File1.txt
current contents in v_array:
Expected contents in v_array: as you can see, xyz is repeated for the combination abc def, so it needs to be picked only once.
Ordering is not required. It can be xyz,qrs or qrs,xyz
I can check for the presence of $3 in v_array using the split function as
This is what I can think of as of now; I hope there is a much better approach to handle this within awk.
Also, how do I sort the array indices and array elements once the array build completes? (I am learning awk through the forums.) I mean, for v_array[$1 OFS $2], how do I process the elements in the order of $1 OFS $2,
and also, how do I sort on the array values? Given v_array[$1 OFS $2]=$3, how do I process the elements in the order of $3?
Thank you for your valuable time.
Edit:
Please note that for further processing, the array index v_array[$1 OFS $2] should not be changed.
Last edited by chill3chee; 07-04-2016 at 02:25 PM..
Reason: Added additional info.
Hi All,
I have an array that contains duplicates as well as unique numbers.
ex- (21, 33, 35, 21, 33, 70, 33, 35, 50)
I need to arrange it in such a way that all the duplicates come first, followed by the unique numbers.
Result for the given example should be:
(21, 21, 33, 33, 35, 35, 70,... (4 Replies)
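One hedged sketch of this (the file name nums.txt and the numeric sort within each group are assumptions): tally every number in a first pass over the file, then select the count>1 entries before the count==1 entries.

```shell
printf '%s\n' 21 33 35 21 33 70 33 35 50 > nums.txt

# pass 1 tallies each number; pass 2 selects - run once per group
awk 'NR == FNR { c[$1]++; next } c[$1] > 1'  nums.txt nums.txt | sort -n   # duplicates first
awk 'NR == FNR { c[$1]++; next } c[$1] == 1' nums.txt nums.txt | sort -n   # then uniques
```

For the sample this yields 21 21 33 33 33 35 35 followed by 50 70 (all occurrences of duplicated values are kept).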
Hello Experts,
I have two files named old and new. Below are my example files. I need to compare them and print the records that only exist in my new file. I tried the below awk script; it works perfectly well if the records match exactly, but the issue I have is that my old file has got extra... (4 Replies)
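The usual two-file awk idiom for this looks like the sketch below, with invented file names old.txt/new.txt and whole-line comparison assumed. When the old file carries extra fields, key seen[] on just the fields that identify a record (e.g. seen[$1]) instead of $0.

```shell
printf 'a 1\nb 2\n' > old.txt
printf 'a 1\nb 2\nc 3\n' > new.txt

# load old records into seen[], then print new lines not present there
awk 'NR == FNR { seen[$0]; next } !($0 in seen)' old.txt new.txt
```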
I need to use a bash script to remove duplicate files from a download list, but I cannot use uniq because the URLs differ even when the file names match.
I need to go from this:
http://***/fae78fe/file1.wmv
http://***/39du7si/file1.wmv
http://***/d8el2hd/file2.wmv
http://***/h893js3/file2.wmv
to this:
... (2 Replies)
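Since the duplicates share only their trailing file name, splitting on "/" and deduplicating on the last field should work; a sketch (the real hosts are masked above, so example.com stands in, and keeping the first URL per file name is an assumption):

```shell
# example.com is a stand-in for the masked hosts
printf '%s\n' \
    'http://example.com/fae78fe/file1.wmv' \
    'http://example.com/39du7si/file1.wmv' \
    'http://example.com/d8el2hd/file2.wmv' \
    'http://example.com/h893js3/file2.wmv' > urls.txt

# split on "/" and keep only the first URL per trailing file name ($NF)
awk -F/ '!seen[$NF]++' urls.txt
```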
I have the following code for removing duplicate records based on fields in the input file; it moves the duplicate records to a duplicates file (1st awk), and in the 2nd awk I fetch the non-duplicate entries in the input file to a tmp file and use mv to update the original file.
Requirement:
Can both the awk... (4 Replies)
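It should be possible to fold both steps into a single two-pass awk that writes each output file directly. This is only a sketch, with invented data and an assumed key of $1 (the real key fields aren't shown in the excerpt):

```shell
printf 'a 1\nb 2\na 3\nc 4\n' > inputfile

awk '
NR == FNR { c[$1]++; next }                               # pass 1: count each key
{ print > (c[$1] > 1 ? "duplicates" : "inputfile.tmp") }  # pass 2: split by count
' inputfile inputfile
mv inputfile.tmp inputfile                                # non-duplicates replace the original
```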
Hi All,
I have searched many threads for a possible close solution, but I was unable to find a similar scenario.
I would like to print all duplicates based on the 3rd column except the first occurrence, and also print single (non-duplicate) entries.
i/P file
12 NIL ABD LON
11 NIL ABC... (6 Replies)
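A two-pass sketch of this selection (the sample file below is invented beyond the two rows shown, since the excerpt is truncated): count $3 in the first pass, then in the second pass print singles, plus every occurrence after the first of a duplicated key.

```shell
# invented sample; the 3rd column is the duplicate key
printf '%s\n' '12 NIL ABD LON' '11 NIL ABC PAR' '13 NIL ABC ROM' \
              '14 NIL ABD NYC' '15 NIL XYZ TOK' > input.txt

# singles (c==1), or 2nd+ occurrence of a duplicated key (seen[$3]++ is 0 the first time)
awk 'NR == FNR { c[$3]++; next } c[$3] == 1 || seen[$3]++' input.txt input.txt
```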
Hi all,
I need some help to remove duplicates from a file before merging.
I have got 2 files:
file1 has data in format
4300 23456
4301 2357
the 4 byte values on the right-hand side are unique, and are not repeated anywhere in the file
file 2 has data in same format but is not in... (10 Replies)
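Since the excerpt is cut off, this sketch assumes file1's records win ties and that duplicates are detected on the unique 2nd field; the file2 contents are invented.

```shell
printf '4300 23456\n4301 2357\n' > file1
printf '4302 23456\n4303 9999\n' > file2

# keep file1 whole; append only file2 records whose 2nd field is unseen
awk 'NR == FNR { seen[$2]; print; next } !($2 in seen)' file1 file2 > merged
cat merged
```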
I have a file with the following format:
fields separated by "|"
title1|something class|long...content1|keys
title2|somhing class|log...content1|kes
title1|sothing class|lon...content1|kes
title3|shing cls|log...content1|ks
I want to remove all duplicates with the same "title field"(the... (3 Replies)
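Deduplicating on the title field is a one-liner with "|" as the field separator; since the truncated text doesn't say which copy survives, keeping the first occurrence is an assumption here.

```shell
# sample rows copied from above
printf '%s\n' \
    'title1|something class|long...content1|keys' \
    'title2|somhing class|log...content1|kes' \
    'title1|sothing class|lon...content1|kes' \
    'title3|shing cls|log...content1|ks' > data.txt

# keep only the first record per title (field 1, "|"-delimited)
awk -F'|' '!seen[$1]++' data.txt
```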
Hi I have a below file structure.
200,1245,E1,1,E1,,7611068,KWH,30, ,,,,,,,,
200,1245,E1,1,E1,,7611070,KWH,30, ,,,,,,,,
300,20140223,0.001,0.001,0.001,0.001,0.001
300,20140224,0.001,0.001,0.001,0.001,0.001
300,20140225,0.001,0.001,0.001,0.001,0.001
300,20140226,0.001,0.001,0.001,0.001,0.001... (1 Reply)
Please help me on this
My script name is uniqueArray.csh
#!/bin/csh
set ARRAY = ( one teo three one three )
set ARRAY = ( $ARRAY one five three five )
How do I remove the duplicates in this array, sort it, and save the result in the same variable or a different one?
Thanks in advance
... (3 Replies)
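One way is to shell out to sort -u rather than deduplicate inside csh itself. The pipeline is shown below in POSIX sh for clarity; in the csh script it would be captured the same way with backquotes, e.g. set UNIQ = `echo $ARRAY | tr ' ' '\n' | sort -u`. The lexicographic result order is a consequence of sort.

```shell
ARRAY="one teo three one three one five three five"

# one word per line, unique-sort, rejoin on spaces
# ($ARRAY is deliberately unquoted so it word-splits)
UNIQ=$(printf '%s\n' $ARRAY | sort -u | tr '\n' ' ')
echo "$UNIQ"
```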