Unique Field


 
# 1  
Old 01-17-2012
I have this input file:

Code:
 
tilenet_test:clar_r5_performance:server_2:4.80762:0%:APM00083103999-009E,APM00083103999-009F
tilenet_int:clar_r5_performance:server_2:4.80762:0%:APM00083103999-00C4
tilenet_prod1_S1:clar_r5_performance:server_2:147.711:0%:FNM00085201122-00AA,FNM00085201122-0234,FNM00085204455-0131
tilenet_prod2_S1:clar_r5_performance:server_2:49.2373:0%:FNM00085201122-00AA,FNM00085204455-0131,FNM00085204455-0133,FNM00085207788-0131

How do I get only the unique instances from the last field, like this?


Code:
 
tilenet_test:clar_r5_performance:server_2:4.80762:0%:APM00083103999
tilenet_int:clar_r5_performance:server_2:4.80762:0%:APM00083103999
tilenet_prod1_S1:clar_r5_performance:server_2:147.711:0%:FNM00085201122,FNM00085204455
tilenet_prod2_S1:clar_r5_performance:server_2:49.2373:0%:FNM00085201122,FNM00085204455,FNM00085207788

Everything after the "-" can be ignored.

Thanks
# 2  
Old 01-17-2012
Hi, see if this works:
Code:
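awk -F: '{ m=split($NF,T,",")           # split the last field on commas
           for(i=1;i<=m;i++){           # split() numbers the elements from 1
             sub(/-.*/,x,T[i])          # drop the "-" and what follows (x is unset, so "")
             if (!(T[i] in A)) s=s (s?",":x) T[i]   # append only if not seen yet
             A[T[i]]                    # mark as seen
           }
           $NF=s
           s=x
           for(i in A) delete A[i]      # reset for the next record
          }1'  OFS=: infile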

I used the array A to test for uniqueness: each serial number is added to the output only the first time it appears in a record.
This User Gave Thanks to Scrutinizer For This Post:
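The uniqueness test can also be tried in isolation on a single comma-separated list. This is only a minimal sketch: the function name uniq_serials and the array name seen are made up for illustration and do not come from the thread.

```shell
# Hypothetical helper: print the unique serial numbers of one comma-separated
# list, in first-seen order, ignoring everything after the "-".
uniq_serials() {
  printf '%s\n' "$1" | awk '{
    n = split($0, T, ",")
    out = ""
    for (i = 1; i <= n; i++) {
      sub(/-.*/, "", T[i])             # drop the "-suffix"
      if (!(T[i] in seen)) {           # "in" tests membership without creating the key
        out = out (out == "" ? "" : ",") T[i]
        seen[T[i]]                     # a bare reference is enough to create the key
      }
    }
    print out
  }'
}

uniq_serials 'FNM00085201122-00AA,FNM00085204455-0131,FNM00085204455-0133,FNM00085207788-0131'
# prints: FNM00085201122,FNM00085204455,FNM00085207788
```

Because the helper only ever sees one line, it does not need to empty seen; a script that reads many records has to reset the array for each one, as the solution above does.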
# 3  
Old 01-17-2012
Something similar...
Code:
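awk -F: -v OFS=: '{ l=split($NF,a,"[-,]"); NF-=1; printf "%s:", $0; sep=""
        # odd-numbered elements are the serial numbers, even ones the parts after "-"
        for(i=1;i<=l;i+=2) if(++v[a[i]]==1){ printf "%s%s", sep, a[i]; sep="," }
        printf "\n"; delete v }' infile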

--ahamed
This User Gave Thanks to ahamed101 For This Post:
# 4  
Old 01-18-2012
As a note:
Using delete on an entire array (as in delete v) is an extension, supported by gawk and mawk among other implementations; the POSIX standard for awk did not include it when this was posted. To stay compatible with all the other compliant awks out there, delete the array elements one by one with a loop like for(i in A) delete A[i].
This User Gave Thanks to Scrutinizer For This Post:
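As a rough illustration of the portability note, the sketch below empties an array between records without relying on a bare delete. It uses split("", seen), which works because POSIX requires split() to delete all elements of the target array before splitting; the element-by-element loop from the note is shown as a comment. The function name distinct_per_line and the sample input are assumptions made for this example.

```shell
# Hypothetical helper: print the number of distinct comma-separated values
# on each input line. The "seen" array must be emptied between records.
distinct_per_line() {
  awk -F, '{
    count = 0
    for (i = 1; i <= NF; i++)
      if (!($i in seen)) { seen[$i]; count++ }
    print count
    split("", seen)                    # portable reset: split() empties the array first
    # equivalent portable loop: for (k in seen) delete seen[k]
  }'
}

printf 'a,b,a\nb,b\n' | distinct_per_line
# prints 2, then 1
```

Without the reset, the second line would report 0, because "b" would still be recorded from the first line.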