Identify duplicate words in a line using command

UNIX for Dummies Questions & Answers
    #1  
04-27-2007
srinivasan_85

Hi,
Let me explain the problem clearly:
Let the entries in my file be:

Code:
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink

Can we detect the lines in which one of the words (separated by the field separator) occurs more than once, using a single command or command pipe?
In this case, the command should detect lines 2, 3 and 5.

I accomplished it using a Perl script (shown below), but I wonder whether this could be done with a command; the difficulty is that the number of columns is not constant.

Perl program that I used:

Code:
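# Read the name of the data file from standard input.
$fname = <STDIN>;
chomp $fname;
open(FILE, "<", $fname) || die "Unable to open $fname: $!\n";
$found_dups = 0;

for $line (<FILE>)
{
  chomp $line;
  @arr = split(/,/, $line);
  # Field 0 is treated as the record id (printed as "tid"), so the
  # comparison starts at index 1. Start at 0 to compare every field,
  # which the sample data above needs (e.g. "apple" on line 2).
  for ($i = 1; $i <= $#arr; $i++)
  {
     for ($j = $i + 1; $j <= $#arr; $j++)
     {
        if ($arr[$i] eq $arr[$j])
        {
           print "tid $arr[0]\n";
           $found_dups++;
        }
     }
  }
}
close(FILE);
print "Found $found_dups duplicates\n";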

Thanks,
Srini
    #2  
04-27-2007
ghostdog74
If you have Python, here's a neater alternative:

Code:
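#!/usr/bin/env python3
# Print each line that contains a repeated comma-separated field.
with open("file") as f:
    for line in f:
        fields = line.strip().split(",")
        # A set drops repeats, so equal lengths mean no field is repeated.
        if len(fields) == len(set(fields)):
            print("No change")
        else:
            print(",".join(fields))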

output:

Code:
# ./test.py
No change
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
No change
orange,maroon,pink,violet,orange,pink

    #3  
04-27-2007
awk
Code:
awk -F, '{
  for (I = 1; I < NF; I++) {
    for (J = I + 1; J <= NF; J++) {
      if ($I == $J) { print $I ": " $0 }
    }
  }
}' << ENDOFFILE
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink
ENDOFFILE

output:

Code:
apple: apple,mango,orange,apple,grape
windows: unix,windows,solaris,windows,linux
orange: orange,maroon,pink,violet,orange,pink
pink: orange,maroon,pink,violet,orange,pink
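
The nested loops above compare every pair of fields, so a word that occurs three times on one line would be reported three times. As a rough sketch of a single-pass alternative (Python 3.7 or later; the function name `repeated_words` and the inline `SAMPLE` list are made up for this illustration and are not from the thread), the fields can be counted and each repeated word reported once:

```python
from collections import Counter

# Sample data copied from the first post.
SAMPLE = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]

def repeated_words(line, sep=","):
    """Return the fields that occur more than once, in first-seen order."""
    counts = Counter(line.split(sep))
    # Counter keeps insertion order, so words come out in the order
    # they first appear on the line.
    return [word for word, n in counts.items() if n > 1]

for line in SAMPLE:
    for word in repeated_words(line):
        print(f"{word}: {line}")
```

For the sample data this should print the same four lines as the awk version; the two only differ when a word appears more than twice.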
    #4  
04-27-2007
srinivasan_85

Hi,
Thanks for the suggestions. I understand that the job can be done by different variations of scripts, but what I am really after is a single command or command pipe that can do the job. If there were only a fixed number of entries in each line, I could compare them manually on the command line using awk/perl. But since I don't know the number of entries in each line, the task is cumbersome.
I would be grateful for a command pipe version of these scripts.

Thanks
Srini
    #5  
04-30-2007
kahuna
Srini, I'm not sure I understand your reluctance to use the scripts posted. Having said that, you could try the script below. It is not very efficient but is short.

Code:
perl -nle 'print if /(^|,)([^,]+)(,|,.*,)\2(,|$)/;' <file
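
To see what the pattern above is doing, here is a small Python sketch that applies the same regular expression with the `re` module. The helper name `has_duplicate_field` and the test strings are hypothetical, chosen only for illustration. Group 2 captures one whole field (it must start at the beginning of the line or after a comma, and be followed directly by a comma), and `\2` then requires the same text to appear again as a whole field.

```python
import re

# Same pattern as the Perl one-liner: a whole field, then either a single
# comma or a comma-delimited stretch, then the same field again.
DUP_FIELD = re.compile(r"(^|,)([^,]+)(,|,.*,)\2(,|$)")

def has_duplicate_field(line):
    """Return True if some comma-separated field occurs at least twice."""
    return DUP_FIELD.search(line) is not None

examples = [
    "apple,mango,orange,apple,grape",  # "apple" is repeated
    "pine,pineapple",                  # "pine" is only a prefix here
    "red,red",                         # adjacent repeat
]
for text in examples:
    print(text, "->", has_duplicate_field(text))
```

Note that `[^,]+` needs at least one character, so two empty fields on a line are not treated as a repeat by this pattern.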

    #6  
04-30-2007
matrixmadhan
Again with Perl, but much simpler:


Code:
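#! /opt/third-party/bin/perl

open(FILE, "<", "file") || die "Unable to open file <$!> \n";

# Read first, then chomp: chomp returns the number of characters removed,
# so testing it would skip a last line that has no trailing newline.
while (defined($var = <FILE>)) {
  chomp($var);
  %fileHash = ();                 # fields seen so far on this line
  @arr = split(/,/, $var);
  foreach (@arr) {
    if (exists $fileHash{$_}) {
      print $var . "\n";          # repeated field: print the line once
      last;
    }
    $fileHash{$_} = 1;
  }
}

close(FILE);

exit 0;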

    #7  
04-30-2007
Ygor
Try...
Code:
$ grep -En '(^|,)([^,]+)(,.*)?,\2($|,)' file
2:apple,mango,orange,apple,grape
3:unix,windows,solaris,windows,linux
5:orange,maroon,pink,violet,orange,pink
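
One thing worth checking with any backreference pattern of this kind is that the captured text is a whole field and not just the start of one. The sketch below mimics `grep -n` in Python with a field-anchored pattern, `(^|,)([^,]+)(,.*)?,\2(,|$)`, in which the captured text must be followed directly by a comma, so that `pine` in `pineapple,pine` is not mistaken for a repeated field. The names `PATTERN`, `SAMPLE` and `numbered_matches` are assumptions made for this example, and Python's `re` stands in for grep's regex engine.

```python
import re

# Sample data copied from the first post.
SAMPLE = """lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink"""

# Group 2 is a whole field: it starts at ^ or after a comma and is
# followed either by ",<same field>" or by ",<anything>,<same field>".
PATTERN = re.compile(r"(^|,)([^,]+)(,.*)?,\2(,|$)")

def numbered_matches(text):
    """Yield (line_number, line) for lines with a repeated field, 1-based."""
    for number, line in enumerate(text.splitlines(), start=1):
        if PATTERN.search(line):
            yield number, line

for number, line in numbered_matches(SAMPLE):
    print(f"{number}:{line}")
```

On the sample data this selects lines 2, 3 and 5, the same lines the grep output above reports.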
