Identify duplicate words in a line using command


 
# 1  
Old 04-27-2007

Hi,
Let me explain the problem clearly:
Let the entries in my file be:
Code:
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink

Can we detect the lines in which one of the words (separated by the field separator) occurs more than once, using a single command (or command pipeline)?
In this case, the command should report lines 2, 3, and 5.

I accomplished it using a Perl script (shown below), but I wonder whether this could be done with a single command. The difficulty is that the number of columns is not constant.

Perl program that I used:
Code:
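use strict;
use warnings;

# Read the name of the input file from standard input.
my $fname = <STDIN>;
chomp $fname;
open(my $fh, '<', $fname) or die "Cannot open $fname: $!\n";
my $found_dups = 0;

while (my $line = <$fh>)
{
  chomp $line;
  my @arr = split(/,/, $line);
  # Compare every pair of fields. Start at index 0 so that the
  # first field is checked as well.
  for (my $i = 0; $i <= $#arr; $i++)
  {
     for (my $j = $i + 1; $j <= $#arr; $j++)
     {
        if ($arr[$i] eq $arr[$j])
        {
           # $. holds the current input line number.
           print "line $.: $arr[$i]\n";
           $found_dups++;
        }
     }
  }
}
close($fh);
print "Found $found_dups duplicates\n";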

Thanks,
Srini
# 2  
Old 04-27-2007
If you have Python, here's a neater alternative:
Code:
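#!/usr/bin/env python3
# Print each line that contains a repeated comma-separated word.
with open("file") as f:
    for line in f:
        fields = line.strip().split(",")
        # A set drops repeats, so equal lengths mean every word is unique.
        if len(fields) == len(set(fields)):
            print("No change")
        else:
            print(",".join(fields))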

output:
Code:
# ./test.py
No change
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
No change
orange,maroon,pink,violet,orange,pink

# 3  
Old 04-27-2007
Code:
awk -F, '{
  for (I=1;I<NF;I++)
  {
    for (J=I+1;J<=NF;J++)
    {
      if ($I == $J ) { print $I": " $0 }
    }
  }
}' << ENDOFFILE
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink
ENDOFFILE

output:
Code:
apple: apple,mango,orange,apple,grape
windows: unix,windows,solaris,windows,linux
orange: orange,maroon,pink,violet,orange,pink
pink: orange,maroon,pink,violet,orange,pink
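
The nested loops above compare every pair of fields, which is fine for short lines. As a rough sketch of the same report using a single counting pass per line, here is a Python version. The function name `repeated_words` and its `sep` parameter are made up for this illustration and do not come from the thread.

```python
from collections import Counter

def repeated_words(line, sep=","):
    """Return the words that occur more than once, in order of first appearance."""
    # Counter is a dict subclass, so it keeps first-seen order (Python 3.7+).
    counts = Counter(line.strip().split(sep))
    return [word for word, n in counts.items() if n > 1]

# Sample data copied from the first post.
sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]

for line in sample:
    for word in repeated_words(line):
        # Same "word: line" layout as the awk output.
        print(f"{word}: {line}")
```

One difference from the awk loop: a word that appears three times is reported once here, whereas the pairwise comparison prints it once for every matching pair.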
# 4  
Old 04-27-2007
Hi,
Thanks for the suggestions. I understand that the job can be done by different variations of scripts, but what I am after is a single command or command pipeline that can do the job. If there were only a fixed number of entries in each line, I could compare them manually on the command line using awk or perl. But since I don't know the number of entries in each line, the task is cumbersome.
I would be grateful for a command-pipeline version of these scripts.

Thanks
Srini
# 5  
Old 04-30-2007
Srini, I'm not sure I understand your reluctance to use the scripts posted. Having said that, you could try the script below. It is not very efficient but is short.
Code:
perl -nle 'print if /(^|,)([^,]+)(,|,.*,)\2(,|$)/;' <file
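
For anyone puzzling over the pattern in that one-liner, here is the same regular expression written for Python's `re` module with each piece annotated. This is only an illustrative sketch: the names `DUP` and `has_duplicate` are invented here, and the sample lines are copied from the first post.

```python
import re

# Same pattern as the perl one-liner, piece by piece:
#   (^|,)      the word starts at the beginning of the line or right after a comma
#   ([^,]+)    group 2: the candidate word (one or more non-comma characters)
#   (,|,.*,)   a single comma (adjacent repeat) or comma, anything, comma
#   \2         the same word again
#   (,|$)      the repeat must end at a comma or at the end of the line
DUP = re.compile(r'(^|,)([^,]+)(,|,.*,)\2(,|$)')

def has_duplicate(line):
    """Return True if any comma-separated word occurs more than once."""
    return DUP.search(line) is not None

sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]

# Print the line number and text of every line with a repeated word.
for number, line in enumerate(sample, start=1):
    if has_duplicate(line):
        print(number, line)
```

The anchors on both sides of the word are what keep a field such as `pine` from matching inside `pineapple`: the repeat has to be a whole field, not a substring of one.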

# 6  
Old 04-30-2007
Again with Perl, but much simpler:

Code:
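#! /opt/third-party/bin/perl

open(FILE, "<", "file") || die "Unable to open file <$!> \n";

# Test for definedness rather than chomp's return value, so that a
# final line without a trailing newline is still processed.
while(defined($var=<FILE>)) {
  chomp($var);
  @arr = split(/,/, $var);
  %fileHash = ();   # words seen so far on this line
  foreach(@arr) {
    if( exists $fileHash{$_} ) {
      # Word already seen on this line: print the line once and stop.
      print $var . "\n";
      last;
    }
    else {
      $fileHash{$_} = 1;
    }
  }
}

close(FILE);

exit 0;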

# 7  
Old 05-01-2007
Try...
Code:
$ grep -En '(^|,)([^,]+).*,\2($|,)' file
2:apple,mango,orange,apple,grape
3:unix,windows,solaris,windows,linux
5:orange,maroon,pink,violet,orange,pink

 