Search for string duplicates in column


 
# 1  
Old 02-19-2008

Hi

I have a file with one column. There are a few duplicates in this column, that is, some lines look exactly the same. I want to know which ones occur twice.

Inputfile.xml
"AAH.dbEUR"
"ECT.dbEUR"
"AEGN.dbEUR"
"AAH.dbEUR"
"AKZO.dbEUR"
...

Here I would like to be informed that "AAH.dbEUR" occurs twice.

Thanks
lulle
# 2  
Old 02-19-2008
Code:
sort filename|uniq -d

# 3  
Old 02-19-2008
Code:
awk '{ arr[$0]++}
       END{ for (i in arr) { if (arr[i]>1) {print i, arr[i]} } }' file

# 4  
Old 02-19-2008
Or (if I'm not missing something):

Code:
awk 'x[$0]++==1' filename

# 5  
Old 02-19-2008
Thanks a lot. Works perfectly!
lulle
# 6  
Old 02-19-2008
Quote:
Originally Posted by radoulov
Or (if I'm not missing something):

Code:
awk 'x[$0]++==1' filename

Wow.... So the output is any line that appears more than once - but only printed once.

Can you explain what's going on here?
# 7  
Old 02-19-2008
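awk arrays are associative: they are indexed by strings, which awk hashes internally.
The expression x[$0]++ says: add one to the array element indexed by $0, that is, by the whole current line.
But since the ++ comes after x[$0], the old value is used in the comparison before one is added.

The first time a line appears, x[$0] is unset, which counts as 0, so the comparison with 1 fails and nothing is printed. If x[$0] is one, meaning the line has been seen once before, the condition is true and awk prints $0 (the default action) because it is a duplicate, and then adds one to x[$0]. Now x[$0] == 2, so the line is never printed again no matter how many times it appears.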
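To see the post-increment at work, here is a minimal sketch that spells out the one-liner step by step. The names `input`, `dups`, `seen` and `count_before` are made up for illustration, and the sample data is the input from the first post with one extra "AAH.dbEUR" added (an assumption, not part of the original file) to show that a third occurrence is not printed again.

```shell
# Sample data modeled on the thread's input, plus one extra repeat of
# "AAH.dbEUR" (added here for illustration only).
input='"AAH.dbEUR"
"ECT.dbEUR"
"AEGN.dbEUR"
"AAH.dbEUR"
"AKZO.dbEUR"
"AAH.dbEUR"'

# Verbose equivalent of: awk 'x[$0]++==1'
dups=$(printf '%s\n' "$input" | awk '{
    count_before = seen[$0]   # value before the increment (unset counts as 0)
    seen[$0]++                # now count this occurrence
    if (count_before == 1)    # true only on the second occurrence
        print
}')

printf '%s\n' "$dups"   # prints "AAH.dbEUR" once
```

The short form works the same way: a pattern with no action block prints the line whenever the pattern is true.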