The UNIX and Linux Forums  
 
#1 Amruta Pitkar, 05-16-2007
Sort, Uniq, Duplicates

Input file:
-----------
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,

I am using the following in the script:
---------------------------------------
cat InputFile | sort -t, -k1,2 | uniq -d > "Duplicates"

This gets 25060008,0040,03, into the Duplicates file.
But I also want 84176527,0047,03, in the Duplicates file.

Basically, I want the script to sort on the first two comma-delimited fields and, if duplicates are found on those two fields, write them to the "Duplicates" file.

Please guide.
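A note on why the pipeline above misses 84176527: `sort -t, -k1,2` only decides the order of the lines, while `uniq -d` still compares whole lines, so `84176527,0047,03,` and `84176527,0047,03,1st` never count as equal. The sketch below reproduces that on the sample data and then tries GNU `uniq -w 13`, which compares only the first 13 characters of each line. It assumes GNU coreutils (`-w` is not in POSIX) and that fields 1 and 2 are always exactly 8 and 4 characters wide; the file name `whole_line_dups` is made up for the demonstration.

```shell
# Sketch only: assumes GNU coreutils (uniq -w is a GNU extension) and
# fixed-width fields (8 chars + comma + 4 chars = 13-character key).
export LC_ALL=C   # byte-wise collation so the sort order is predictable

# Sample data from the question
cat > InputFile <<'EOF'
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,
EOF

# -k1,2 only changes the ORDER of the lines; uniq -d still compares whole
# lines, so the two 84176527 records are not treated as equal.
sort -t, -k1,2 InputFile | uniq -d > whole_line_dups

# -w 13 makes GNU uniq compare only "field1,field2" (the first 13 chars).
# It prints one line for each duplicated group.
sort -t, -k1,2 InputFile | uniq -d -w 13 > Duplicates
```

With variable-width fields the `-w` trick no longer lines up with the fields, which is where building the key from awk's `$1` and `$2` is the safer choice.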
#2 aigles, 05-16-2007
Try this:
Code:
sort -t, -k1,2 InputFile | awk -F, '{ if ((key=$1 "," $2)==prv_key) print; prv_key=key}' > "Duplicates"
Jean-Pierre.
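Jean-Pierre's one-liner prints the second and later lines of each group of equal keys, but not the first line of the group. If every line whose first two fields are repeated is wanted, one option (a sketch, not something posted in the thread) is to let awk read the file twice: the first pass counts keys, the second prints the matching lines in their original order, so no sort is needed.

```shell
# Sample data from the question
cat > InputFile <<'EOF'
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,
EOF

# Pass 1 (NR == FNR holds only while reading the first copy of InputFile):
#   count how many lines share each "field1,field2" key.
# Pass 2 (second copy): print every line whose key occurred more than once,
#   in the original input order.
awk -F, '
    NR == FNR { count[$1 FS $2]++; next }
    count[$1 FS $2] > 1
' InputFile InputFile > Duplicates
```

The price of this approach is reading the input twice; in exchange, all four lines of the two duplicated groups end up in "Duplicates".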
#3 matrixmadhan, 05-16-2007
25060008,0040,03,

This is the only duplicated line.
#4 matrixmadhan, 05-16-2007
Quote:
This gets 25060008,0040,03, into the Duplicates file.
But I also want 84176527,0047,03, in the Duplicates file.

Basically I want the script to sort on the first 2 fields (delimited by comma) and if duplicates are found for first 2 fields I want it to be written to "Duplicates" file.

In the sample records above, only the third field ('03') is common;
the first and second fields are different.

How would you expect those to be treated as duplicates based on two fields?
#5 Amruta Pitkar, 05-16-2007

Hi MatrixMadhan,
Please look at these lines in the input file:
84176527,0047,03,1st
84176527,0047,03,
They are duplicate records when I sort on the 1st and 2nd fields.

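I solved the issue with:
sort -t, -k1,2 -u inputfile > unq      # one line per field1,field2 key
sort -t, -k1,2 inputfile > non-unq     # all lines, in the same key order
comm -23 non-unq unq > duplicates      # lines left over in non-unq: the extra copies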

MatrixMadhan, Jean-Pierre: thanks.
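The sort/comm approach can be checked on the sample data with the sketch below. `sort -u` keeps one line per `field1,field2` key, and `comm -23` then prints only the lines that occur in `non-unq` more often than in `unq`, which are the extra copies. `comm` expects both inputs to be sorted in the same collation order; that holds here because the keys are fixed-width digit strings, and the sketch sets `LC_ALL=C` (an assumption, not part of the original post) so that `sort` and `comm` agree.

```shell
export LC_ALL=C   # assumption: byte-wise collation for both sort and comm

# Sample data from the question
cat > inputfile <<'EOF'
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,
EOF

# unq: one representative line for each "field1,field2" key
sort -t, -k1,2 -u inputfile > unq

# non-unq: every line, ordered by the same key
sort -t, -k1,2 inputfile > non-unq

# comm -23 hides column 2 (lines only in unq) and column 3 (lines in both),
# leaving the lines that non-unq has in excess: the duplicates.
comm -23 non-unq unq > duplicates
```

POSIX does not say which line of an equal-key run `sort -u` keeps, so which of the two 84176527 records lands in "duplicates" may depend on the `sort` implementation; either way, exactly one line per extra copy is reported.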
#6 ghostdog74, 05-17-2007
Code:
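awk -F"," '{
    key = $1 FS $2       # first two fields, joined by the comma separator
    line[key] = $0       # remember the most recent line for this key
    count[key]++         # count how many lines share this key
}
END {
    for (k in count)     # note: for-in order is unspecified in awk
        if (count[k] > 1)
            print line[k] > "duplicates"
}' file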
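Two details of the single-pass approach above: it keeps only the last line seen for each repeated key, and awk's `for (k in ...)` loop visits keys in an unspecified order, so the lines in "duplicates" can come out in any order. A minimal variation, assuming a POSIX awk, runs on the sample data and pipes the result through `sort` for a stable order; the key is built as `$1 FS $2` so that, for example, `12,3` and `1,23` cannot collide into the same key.

```shell
export LC_ALL=C   # byte-wise collation for the final sort

# Sample data from the question
cat > file <<'EOF'
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,
EOF

# Single pass: remember the last line and a count for every key,
# then print one line per repeated key; sort fixes the output order.
awk -F, '
    { key = $1 FS $2; line[key] = $0; count[key]++ }
    END { for (k in count) if (count[k] > 1) print line[k] }
' file | sort > duplicates
```

Because only the last line per key is kept, the 84176527 entry here is `84176527,0047,03,`, the later of the two records in the input.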

All times are GMT -4.

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.