Identify duplicate words in a line using command

04-27-2007

Hi,
Let me explain the problem clearly:
Let the entries in my file be:

Code:

lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink

Can we detect the lines in which one of the words (separated by the field separator) occurs more than once, using a command (or command pipe)?
In this case, the command should detect lines 2, 3 and 5.

I accomplished it using a Perl script (cited below), although I wonder whether this could be done with a single command (the difficulty is that the number of columns is not constant).

Perl program that I used:

Code:

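use strict;
use warnings;

# Read the file name from standard input.
my $fname = <STDIN>;
chomp $fname;
open(my $fh, '<', $fname) or die "Cannot open $fname: $!\n";
my $found_dups = 0;

while (my $line = <$fh>)
{
  chomp $line;
  my @arr = split(/,/, $line);
  # Compare every pair of fields, starting from the first field (index 0).
  for (my $i = 0; $i <= $#arr; $i++)
  {
     for (my $j = $i + 1; $j <= $#arr; $j++)
     {
        if ($arr[$i] eq $arr[$j])
        {
           print "$arr[$i]: $line\n";
           $found_dups++;
        }
     }
  }
}
close($fh);
print "Found $found_dups duplicates\n";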

Thanks,
Srini

srinivasan_85

04-27-2007

If you have Python, here's a neater alternative:

Code:

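#!/usr/bin/env python3
# Print each line that contains a repeated field; otherwise print "No change".
with open("file") as f:
    for line in f:
        fields = line.strip().split(",")
        # A set drops repeats, so equal lengths mean every field is unique.
        if len(fields) == len(set(fields)):
            print("No change")
        else:
            print(",".join(fields))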

output:

Code:

# ./test.py
No change
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
No change
orange,maroon,pink,violet,orange,pink
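
If only the offending lines and their line numbers are wanted, the same set comparison can be wrapped in a small generator. This is just a sketch: the name `duplicate_lines` and the `sep` parameter are made up for illustration and are not part of the script above.

```python
def duplicate_lines(lines, sep=","):
    """Yield (line_number, line) for each line that holds a repeated field.

    duplicate_lines and sep are illustrative names, not from the thread.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        fields = line.split(sep)
        # A set drops repeats, so a shorter set means some field occurs twice.
        if len(set(fields)) < len(fields):
            yield number, line


sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]

for number, line in duplicate_lines(sample):
    print(f"{number}:{line}")
```

On the sample data this prints lines 2, 3 and 5. Passing `sys.stdin` (after `import sys`) instead of `sample` should turn it into a filter that can sit at the end of a pipe.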

ghostdog74

04-27-2007

Code:

awk -F, '{
  for (I = 1; I < NF; I++)
  {
    for (J = I + 1; J <= NF; J++)
    {
      if ($I == $J) { print $I ": " $0 }
    }
  }
}' << ENDOFFILE
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink
ENDOFFILE

output:

Code:

apple: apple,mango,orange,apple,grape
windows: unix,windows,solaris,windows,linux
orange: orange,maroon,pink,violet,orange,pink
pink: orange,maroon,pink,violet,orange,pink
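
The awk loops compare every pair of fields and print once per matching pair, so a word that appears three times on a line would be reported three times. A rough alternative, sketched here in Python rather than awk, counts each field once with `collections.Counter` and reports every repeated word a single time. The name `repeated_words` is invented for this example.

```python
from collections import Counter


def repeated_words(line, sep=","):
    """Return the fields that occur more than once, in first-seen order.

    repeated_words is an illustrative name, not from the thread.
    """
    counts = Counter(line.split(sep))
    # Counter keeps insertion order (Python 3.7+), so words come out
    # in the order in which they first appear on the line.
    return [word for word, n in counts.items() if n > 1]


sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]

for line in sample:
    for word in repeated_words(line):
        print(f"{word}: {line}")
```

For the five sample lines this gives the same four "word: line" reports as the awk command.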

awk

04-27-2007

Hi,
Thanks for the suggestions. I understand that the job can be done by different variations of scripts, but what I am after is a single command or command pipe that can do the job. If there were a fixed number of entries in each line, I could compare them manually on the command line using awk/perl. But since I don't know the number of entries in each line, the task is cumbersome.

I would be grateful for a command-pipe version of these scripts.

Thanks
Srini

srinivasan_85

04-30-2007

Srini, I'm not sure I understand your reluctance to use the scripts posted. Having said that, you could try the script below. It is not very efficient but is short.

Code:

perl -nle 'print if /(^|,)([^,]+)(,|,.*,)\2(,|$)/;' <file
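
To see why that pattern only matches whole fields, here is the same regular expression tried in Python's `re` module, which also supports the `\2` backreference. It is only an illustration with a couple of edge cases; `has_duplicate_field` is an invented name.

```python
import re

# (^|,)      the field starts at the beginning of the line or after a comma
# ([^,]+)    group 2: a run of non-comma characters
# (,|,.*,)   group 3 must begin with a comma, so group 2 is a complete field
# \2         the same text again ...
# (,|$)      ... ending at a comma or at the end of the line
PATTERN = re.compile(r"(^|,)([^,]+)(,|,.*,)\2(,|$)")


def has_duplicate_field(line):
    """Return True if some comma-separated field occurs at least twice."""
    return PATTERN.search(line) is not None


for line in [
    "apple,mango,orange,apple,grape",  # whole field repeated
    "lion,tiger,bear",                 # nothing repeated
    "apple,mango,app",                 # "app" is only a prefix of "apple"
    "app,mango,apple",                 # same, the other way round
]:
    print(line, has_duplicate_field(line))
```

Only the first line is reported as having a duplicate; the two prefix cases are rejected because both copies must be delimited by a comma or a line boundary on each side.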

kahuna

04-30-2007

Again with perl,
but much simpler

Code:

#! /opt/third-party/bin/perl

open(FILE, "<", "file") || die "Unable to open file <$!> \n";

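while($var=<FILE>) {
  chomp($var);  # chomp here, not in the loop test, so a last line without a newline is not skipped
  @arr = split(/,/, $var);
  foreach(@arr) {
    if( exists $fileHash{$_} ) {
      # Field already seen on this line: print the line once and stop.
      print $var . "\n";
      last;
    }
    else {
      $fileHash{$_} = 1;
    }
  }
  %fileHash = ();  # reset the seen-fields hash for the next line
}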

close(FILE);

exit 0

matrixmadhan

05-01-2007

Try...

Code:

$ grep -En '(^|,)([^,]+)(,.*)?,\2($|,)' file
2:apple,mango,orange,apple,grape
3:unix,windows,solaris,windows,linux
5:orange,maroon,pink,violet,orange,pink

Ygor
