Print unique names in a specific column using awk


 
# 1  

Is it possible to modify a file like this?

1. Remove all duplicate names in a given column, i.e. the 4th column.
2. Count the number of unique names (separated by ";") and print the count as a 5th column.

Thanks in advance!
Q

input

Code:
c1	30	3	Eh2
c10	96	3	Frp
c41	396	3	Ua5;Lop;Kol;Kol
c62	2	30	Fmp;Fmp;Fmp

output

Code:
c1	30	3	Eh2	1
c10	96	3	Frp	1
c41	396	3	Ua5;Lop;Kol	3
c62	2	30	Fmp	1

# 2  
Try
Code:
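awk '{
        n = split($4, T, ";")
        # compare each entry with the ones before it; a duplicate only lowers n,
        # T itself is left unchanged
        for (i = n; i >= 1; i--) {
                for (j = i - 1; j >= 1; j--)
                        if (T[i] == T[j]) { n--; break }
        }
        # rebuild column 4 from the first n entries, then append the count
        $4 = T[1]
        for (i = 2; i <= n; i++) $4 = $4 ";" T[i]
        $5 = n
     }
     1
    ' OFS="\t" file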
c1     30    3    Eh2    1
c10    96    3    Frp    1
c41    396   3    Ua5;Lop;Kol    3
c62    2    30    Fmp    1


Last edited by RudiC; 05-02-2013 at 05:53 AM.. Reason: missed OFS assignment; simplification
# 3  
I just noticed that one of my 4th-column entries has 300 names (most of them duplicates). The script fails in this case.
# 4  
Quote:
Originally Posted by quincyjones
I just noticed that one of my 4th-column entries has 300 names (most of them duplicates). The script fails in this case.
Are you sure it's all on the same line and not split across two lines in your file?
# 5  
Yes, I am sure.

Last edited by quincyjones; 05-02-2013 at 09:44 AM..
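
A long list is not needed to see the problem. The following is a minimal sketch, assuming the failure comes from duplicates that are not at the end of the list. It reruns the logic from post #2 on a one-line input; the function name `dedupe_v1` and the `c99` row are hypothetical and only serve as illustration.

```shell
# dedupe_v1 is a hypothetical wrapper around the logic from post #2.
dedupe_v1() {
    awk -v OFS='\t' '{
        n = split($4, T, ";")
        for (i = n; i >= 1; i--)
            for (j = i - 1; j >= 1; j--)
                if (T[i] == T[j]) { n--; break }
        $4 = T[1]
        for (i = 2; i <= n; i++) $4 = $4 ";" T[i]
        $5 = n
    } 1' "$@"
}

# Made-up input: the duplicate "Kol" comes before the new name "Ua5".
printf 'c99\t1\t2\tKol;Kol;Ua5\n' | dedupe_v1
```

For this input the 4th column comes out as `Kol;Kol` with a count of 2. Lowering n only cuts the list off at the end, so the repeated entry stays and the distinct name after it is lost.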
# 6  
There was a glitch in the logic. I modified it:
Code:
 
 
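awk '{
        n = split($4, T, ";")
        # walk backwards and delete every entry that already occurs earlier
        for (i = n; i >= 1; i--) {
                for (j = i - 1; j >= 1; j--)
                        if (T[i] == T[j]) { delete T[i]; break }
        }
        $4 = T[1]                       # the first entry is never deleted
        # test membership with "in" so that a name such as 0 is not dropped
        for (i = 2; i <= n; i++) if (i in T) $4 = $4 ";" T[i]
        $5 = split($4, A, ";")          # count the names that survived
     }
     1
    ' OFS="\t" filename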
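
The nested loops above compare every name with every earlier one, which grows quadratically with the number of names in the column. As an alternative sketch (not the method posted in this thread), a per-line lookup array can record the names already seen, so each name is checked once and the order of first appearance is kept. The function name `uniq_col4` is hypothetical, and the input is assumed to be tab-delimited.

```shell
# uniq_col4 is a hypothetical helper; it reads the named files or standard input.
uniq_col4() {
    awk -F'\t' -v OFS='\t' '{
        n = split($4, T, ";")
        split("", seen)                  # clear the lookup table for this line
        out = ""
        cnt = 0
        for (i = 1; i <= n; i++) {
            if (T[i] in seen) continue   # name already kept on this line
            seen[T[i]] = 1
            out = (cnt == 0) ? T[i] : out ";" T[i]
            cnt++
        }
        $4 = out                         # unique names, first occurrence first
        $5 = cnt                         # number of unique names
    } 1' "$@"
}

# Two rows from the sample input in post #1.
printf 'c41\t396\t3\tUa5;Lop;Kol;Kol\nc62\t2\t30\tFmp;Fmp;Fmp\n' | uniq_col4
```

On a real file it would be called as `uniq_col4 filename`. The table is cleared with `split("", seen)` rather than `delete seen` because the former works in older awk implementations as well.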

This User Gave Thanks to vidyadhar85 For This Post:
