The UNIX and Linux Forums: UNIX for Dummies Questions & Answers

#1  04-27-2007  Registered User

Identify duplicate words in a line using command

Hi,
Let me explain the problem clearly. Suppose my file contains these entries:

Code:
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink

Can we detect the lines in which one of the words (separated by the field separator) occurs more than once, using a command (or a command pipe)?
In this case, the command should detect lines 2, 3 and 5.

I accomplished it using a Perl script (below), but I wonder whether it could be done with a command instead. The difficulty is that the number of columns is not constant.

Perl program that I used:

Code:
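#!/usr/bin/perl
use strict;
use warnings;

# Read the name of the input file from standard input
my $fname = <STDIN>;
chomp $fname;
open(my $fh, '<', $fname) or die "Cannot open $fname: $!\n";
my $found_dups = 0;

LINE: while (my $line = <$fh>)
{
  chomp $line;
  my @arr = split(/,/, $line);
  # Compare every pair of fields, starting with the first one
  for (my $i = 0; $i <= $#arr; $i++)
  {
     for (my $j = $i + 1; $j <= $#arr; $j++)
     {
        if ($arr[$i] eq $arr[$j])
        {
           print "$line\n";
           $found_dups++;
           next LINE;   # one report per line is enough
        }
     }
  }
}
close($fh);
print "Found $found_dups lines with duplicates\n";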

Thanks,
Srini
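
The nested loops in that script work for any number of columns, because split returns however many fields each line has. A minimal Python sketch of the same pairwise comparison (the function name `lines_with_duplicates` and the in-memory `sample` list are illustrative assumptions, not from the thread) returns the 1-based numbers of the matching lines:

```python
def lines_with_duplicates(lines, sep=","):
    """Return the 1-based numbers of lines in which some field repeats.

    Compares every pair of fields (i, j) with i < j, like the Perl script,
    so it handles any number of fields at O(k^2) cost per line of k fields.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        n = len(fields)
        if any(fields[i] == fields[j] for i in range(n) for j in range(i + 1, n)):
            hits.append(lineno)
    return hits


sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]
print(lines_with_duplicates(sample))  # [2, 3, 5]
```

Comparing every pair is quadratic in the number of fields, which is harmless for short lines like these.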

#2  04-27-2007  Registered User

If you have Python, here's a neater alternative:

Code:
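#!/usr/bin/env python3
with open("file") as f:
    for line in f:
        fields = line.strip().split(",")
        # A set drops repeated fields, so it is shorter when a duplicate exists
        if len(fields) == len(set(fields)):
            print("No change")
        else:
            print(",".join(fields))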

output:

Code:
# ./test.py
No change
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
No change
orange,maroon,pink,violet,orange,pink
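
Because a set keeps only one copy of each field, comparing its size with the field count answers the question for any number of columns. A minimal sketch of the same test written as a generator that skips the "No change" lines and prefixes each hit with its line number, in the style of `grep -n` (the name `duplicate_lines` is hypothetical):

```python
def duplicate_lines(stream, sep=","):
    """Yield (line_number, line) for each line whose fields are not all distinct."""
    for lineno, raw in enumerate(stream, start=1):
        line = raw.rstrip("\n")
        fields = line.split(sep)
        # The set is shorter than the list exactly when some field repeats
        if len(set(fields)) != len(fields):
            yield lineno, line


sample = [
    "lion,tiger,bear\n",
    "apple,mango,orange,apple,grape\n",
    "unix,windows,solaris,windows,linux\n",
    "red,blue,green,yellow\n",
    "orange,maroon,pink,violet,orange,pink\n",
]
for lineno, line in duplicate_lines(sample):
    print(f"{lineno}:{line}")
```

Passing `sys.stdin` instead of `sample` (after `import sys`) turns it into a filter that can sit in a pipe, for example `python3 dupes.py < file`, where `dupes.py` is a hypothetical script name.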

#3  04-27-2007  awk (Registered User)

Code:
awk -F, '{
  for (I = 1; I < NF; I++)
  {
    for (J = I + 1; J <= NF; J++)
    {
      if ($I == $J) { print $I ": " $0 }
    }
  }
}' << ENDOFFILE
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink
ENDOFFILE

output:

Code:
apple: apple,mango,orange,apple,grape
windows: unix,windows,solaris,windows,linux
orange: orange,maroon,pink,violet,orange,pink
pink: orange,maroon,pink,violet,orange,pink
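
The awk version prints once per matching pair of fields, so a word that occurs three times on a line (pairs 1-2, 1-3 and 2-3) is reported three times. A hedged Python sketch that counts the fields with `collections.Counter` and reports each repeated word once per line (the name `repeated_words` is illustrative) prints the same four lines as the awk run above for this sample:

```python
from collections import Counter


def repeated_words(line, sep=","):
    """Return each field that occurs more than once, in first-seen order."""
    counts = Counter(line.rstrip("\n").split(sep))
    # Counter keeps first-insertion order (Python 3.7+), so results follow the line
    return [word for word, n in counts.items() if n > 1]


sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]
for line in sample:
    for word in repeated_words(line):
        print(f"{word}: {line}")
```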

#4  04-27-2007  Registered User

Hi,
Thanks for the suggestions. I understand that the job can be done with different variations of scripts, but what I am eager to find is a single command or command pipe that does it. If each line had a fixed number of entries, I could compare them manually on the command line using awk/perl. But since I don't know the number of entries in each line, the task is cumbersome.
I would appreciate a command-pipe version of these scripts.

Thanks
Srini

#5  04-30-2007  kahuna (Registered User)

Srini, I'm not sure I understand your reluctance to use the scripts already posted. That said, you could try the one-liner below. It is not very efficient, but it is short.

Code:
perl -nle 'print if /(^|,)([^,]+)(,|,.*,)\2(,|$)/;' <file
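
The regex works because group 2 must be a whole field: `(^|,)` starts it at a field boundary, the following `,` or `,.*,` forces it to run to the end of that field, and `\2(,|$)` requires an identical field later on the line. Python's `re` module supports the same backreference syntax, so a minimal sketch of the check looks like this (the names `DUP_FIELD` and `sample` are illustrative):

```python
import re

# Same pattern as the Perl one-liner; group 2 captures the repeated field
DUP_FIELD = re.compile(r"(^|,)([^,]+)(,|,.*,)\2(,|$)")

sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]
for lineno, line in enumerate(sample, start=1):
    match = DUP_FIELD.search(line)
    if match:
        # search() returns the leftmost match, i.e. the first field that repeats later
        print(f"{lineno}: {match.group(2)} repeats in {line}")
```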

#6  04-30-2007  Technorati Master

Again with Perl, but much simpler:


Code:
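#! /opt/third-party/bin/perl
use strict;
use warnings;

open(my $fh, "<", "file") || die "Unable to open file <$!> \n";

# Read first, then chomp: chomp() returns the number of characters removed,
# which is 0 for a last line with no trailing newline
while (my $var = <$fh>) {
  chomp $var;
  my %seen;   # fields already seen on this line
  foreach (split(/,/, $var)) {
    if (exists $seen{$_}) {
      print $var . "\n";
      last;   # one duplicate is enough to report the line
    }
    $seen{$_} = 1;
  }
}

close($fh);

exit 0;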
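
The hash version stops scanning a line as soon as it meets a field it has already seen. The same early-exit idea in Python uses a set in place of the hash (a minimal sketch; `first_repeat` is an illustrative name) and also returns the field that triggered the exit:

```python
def first_repeat(line, sep=","):
    """Return the first field found to repeat an earlier one, or None if all differ."""
    seen = set()  # plays the role of the per-line hash in the Perl script
    for field in line.rstrip("\n").split(sep):
        if field in seen:
            return field  # stop early, like Perl's `last`
        seen.add(field)
    return None


sample = [
    "lion,tiger,bear",
    "apple,mango,orange,apple,grape",
    "unix,windows,solaris,windows,linux",
    "red,blue,green,yellow",
    "orange,maroon,pink,violet,orange,pink",
]
for line in sample:
    if first_repeat(line) is not None:
        print(line)
```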

#7  04-30-2007  Ygor (Moderator, Forum Staff)

Try...
Code:
$ grep -En '(^|,)([^,]+).*,\2($|,)' file
2:apple,mango,orange,apple,grape
3:unix,windows,solaris,windows,linux
5:orange,maroon,pink,violet,orange,pink
