The UNIX and Linux Forums  
Closed Thread
#1  04-27-2007  srinivasan_85 (Registered User)
Identify duplicate words in a line using command

Hi,
Let me explain the problem clearly:
Let the entries in my file be:
Code:
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink
Can we detect the lines in which one of the words (separated by the field separator) occurs more than once, using a single command or command pipeline?
In this case, the command should detect lines 2, 3 and 5.

I accomplished it with a Perl script (below), but I wonder whether it could be done with a command. The difficulty is that the number of columns is not constant.

Perl program that I used:
Code:
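use strict;
use warnings;

# Read the name of the input file from standard input.
my $fname = <STDIN>;
chomp $fname;
open(my $fh, '<', $fname) or die "Cannot open $fname: $!\n";
my $found_dups = 0;

while (my $line = <$fh>)
{
  chomp $line;
  my @arr = split(/,/, $line);
  # Compare every pair of fields. Start at index 0: starting at 1 skips
  # the first field and misses line 2 of the sample (apple ... apple).
  for (my $i = 0; $i <= $#arr; $i++)
  {
     for (my $j = $i + 1; $j <= $#arr; $j++)
     {
        if ($arr[$i] eq $arr[$j])
        {
           print "$arr[$i]: $line\n";
           $found_dups++;
        }
     }
  }
}
close($fh);
print "Found $found_dups duplicates\n";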
Thanks,
Srini
#2  04-27-2007  ghostdog74 (Forum Advisor, Registered User)
If you have Python, here's a neater alternative:
Code:
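#!/usr/bin/env python3
# Print each line that contains a repeated comma-separated word.
with open("file") as f:
    for line in f:
        fields = line.strip().split(",")
        # A set drops repeats, so equal lengths mean every word is unique.
        if len(fields) == len(set(fields)):
            print("No change")
        else:
            print(",".join(fields))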
output:
Code:
# ./test.py
No change
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
No change
orange,maroon,pink,violet,orange,pink
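The set-length comparison above tells you that a line has a repeat, but not which word repeats. Here is a minimal sketch, assuming the same comma-separated input, that also names the repeated words. The function name `repeated_words` is made up for illustration and is not part of the script above.

```python
from collections import Counter


def repeated_words(line, sep=","):
    """Return the words that occur more than once, in first-seen order."""
    counts = Counter(line.strip().split(sep))
    # Counter keeps insertion order (Python 3.7+), so results follow the line.
    return [word for word, n in counts.items() if n > 1]


if __name__ == "__main__":
    sample = [
        "lion,tiger,bear",
        "apple,mango,orange,apple,grape",
        "orange,maroon,pink,violet,orange,pink",
    ]
    for lineno, line in enumerate(sample, start=1):
        dups = repeated_words(line)
        if dups:
            print(f"{lineno}: {line}  (repeated: {' '.join(dups)})")
```

Counting once per line keeps the work roughly linear in the number of fields, which may matter for very wide lines.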
#3  04-27-2007  awk (Registered User)
Code:
awk -F, '{
  for (I = 1; I < NF; I++)
  {
    for (J = I + 1; J <= NF; J++)
    {
      if ($I == $J) { print $I ": " $0 }
    }
  }
}' << ENDOFFILE
lion,tiger,bear
apple,mango,orange,apple,grape
unix,windows,solaris,windows,linux
red,blue,green,yellow
orange,maroon,pink,violet,orange,pink
ENDOFFILE
output:
Code:
apple: apple,mango,orange,apple,grape
windows: unix,windows,solaris,windows,linux
orange: orange,maroon,pink,violet,orange,pink
pink: orange,maroon,pink,violet,orange,pink
#4  04-27-2007  srinivasan_85 (Registered User)

Hi,
Thanks for the suggestions. I understand that the job can be done with various scripts, but what I am after is a single command or command pipeline that can do it. If each line had a fixed number of entries, I could compare them by hand on the command line using awk or perl. But since I don't know the number of entries in each line, the task is cumbersome.
I would appreciate a command-pipe version of these scripts.

Thanks
Srini
#5  04-30-2007  kahuna (Registered User)
Srini, I'm not sure I understand your reluctance to use the scripts posted. Having said that, you could try the script below. It is not very efficient but is short.
Code:
perl -nle 'print if /(^|,)([^,]+)(,|,.*,)\2(,|$)/;' <file
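To make the pattern easier to read, here is the same idea spelled out piece by piece with Python's `re` module. This is only a sketch: the names `DUP_WORD` and `first_repeat` are made up for illustration, and because the helper groups are non-capturing here, the back-reference is `\1` rather than `\2`.

```python
import re

# The one-liner's pattern, annotated with re.VERBOSE comments.
DUP_WORD = re.compile(
    r"""
    (?:^|,)        # a word starts at the line start or right after a comma
    ([^,]+)        # capture the whole word (group 1)
    (?:,|,.*,)     # one comma, or a comma, other fields, and a comma
    \1             # the same word again
    (?:,|$)        # which must end at a comma or at the line end
    """,
    re.VERBOSE,
)


def first_repeat(line):
    """Return a word that repeats in the line, or None if none does."""
    match = DUP_WORD.search(line.rstrip("\n"))
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in ["lion,tiger,bear", "unix,windows,solaris,windows,linux"]:
        print(line, "->", first_repeat(line))
```

Because `[^,]+` needs at least one character, empty fields (as in `a,,b,,`) are not reported as repeats, which differs from a set-based check.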
#6  04-30-2007  matrixmadhan (Forum Advisor, Technorati Master)
Again with perl,
but much simpler

Code:
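#! /opt/third-party/bin/perl
use strict;
use warnings;

open(my $fh, "<", "file") || die "Unable to open file <$!> \n";

# Test the read itself and chomp afterwards: chomp returns 0 for a final
# line that has no newline, which would end the loop one line early.
while (my $var = <$fh>) {
  chomp $var;
  my %seen;                    # words seen so far on this line
  foreach (split(/,/, $var)) {
    if ( exists $seen{$_} ) {
      print $var . "\n";       # report the line once, then move on
      last;
    }
    $seen{$_} = 1;
  }
}

close($fh);

exit 0;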
#7  04-30-2007  Ygor (Forum Staff, Moderator)
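Try the following. A comma is required right after the captured word, so the capture always spans a whole field and a prefix such as app in apple,app is not mistaken for a repeat. Back-references in -E patterns are an extension that POSIX does not guarantee, though GNU grep supports them.
Code:
$ grep -En '(^|,)([^,]+)(,.*)?,\2($|,)' file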
2:apple,mango,orange,apple,grape
3:unix,windows,solaris,windows,linux
5:orange,maroon,pink,violet,orange,pink
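One thing to watch with patterns of this kind: if `.*` follows the captured word directly, the capture may stop partway through a word, so a line such as `apple,app` can be reported even though no whole word repeats. Here is a small sketch using Python's `re` module as a stand-in for grep. As far as I know the two regex flavors agree on these constructs. The names `LOOSE` and `STRICT` are made up for illustration.

```python
import re

# The capture may be followed by anything, so it can end mid-word.
LOOSE = re.compile(r"(^|,)([^,]+).*,\2($|,)")
# The capture must be followed by a comma, so it always spans a whole word.
STRICT = re.compile(r"(^|,)([^,]+)(,.*)?,\2($|,)")

if __name__ == "__main__":
    for line in ["apple,app", "apple,mango,apple", "red,blue,green"]:
        print(line, bool(LOOSE.search(line)), bool(STRICT.search(line)))
```

On `apple,app` the loose pattern matches and the strict one does not. Both agree on the other two lines.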