The UNIX and Linux Forums  
#1 orahi001, 01-24-2008
removing duplicates and sort -k

Hello experts,

I am trying to remove all lines in a csv file where the 2nd column is a duplicate. I am trying to use sort with the key parameter:


sort -u -k 2,2 File.csv > Output.csv


File.csv
File Name|Document Name|Document Title|Organization
Word Doc 1.doc|Word Document|Sample Doc|Org 1
Exl Doc 1.xls|Excel Sheet|Sample Sheet|Org 2
Pdf File 1.pdf|Pdf|Sample pdf|Org3
Exl Sheet 2.xls|Excel Sheet|Test Spreadsheet|Org 2



I want Output.csv to omit the 2nd Excel Sheet line:
Output.csv
File Name|Document Name|Document Title|Organization
Word Doc 1.doc|Word Document|Sample Doc|Org 1
Exl Doc 1.xls|Excel Sheet|Sample Sheet|Org 2
Pdf File 1.pdf|Pdf|Sample pdf|Org3


I believe the -k option uses spaces to determine the start and end of fields.

My file separator is a '|', so I want to remove the line with the duplicate Document Name (2nd column).

Can this be done using the -k option of sort or is there another way to perform this task?


thanks
#2 awk, 01-24-2008
Try "man sort" to see which options sort supports on your system.
#3 Yogesh Sawant (Forum Staff), 01-25-2008
You need to specify the delimiter:
Code:
-t '|'
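
For reference, a minimal sketch of the full command with the delimiter set, run against the sample data from the first post. The scratch directory and the head/tail split are assumptions of this sketch, not something posted in the thread: without the split, sort treats the header as an ordinary row and sorts it in with the data. Note that the result comes back ordered by Document Name rather than in the original file order, and POSIX does not say which of the duplicate lines -u keeps (GNU sort documents that it outputs the first of a run of equal lines).

```shell
# Recreate the sample data from the question in a scratch directory
# (the directory is hypothetical, only here to make the sketch runnable).
workdir=$(mktemp -d)
cat > "$workdir/File.csv" <<'EOF'
File Name|Document Name|Document Title|Organization
Word Doc 1.doc|Word Document|Sample Doc|Org 1
Exl Doc 1.xls|Excel Sheet|Sample Sheet|Org 2
Pdf File 1.pdf|Pdf|Sample pdf|Org3
Exl Sheet 2.xls|Excel Sheet|Test Spreadsheet|Org 2
EOF

# Copy the header through untouched, then de-duplicate the data rows on
# field 2 only: -t sets the separator, -k 2,2 limits the key to field 2,
# and -u keeps one line per distinct key.
{
  head -n 1 "$workdir/File.csv"
  tail -n +2 "$workdir/File.csv" | sort -t '|' -k 2,2 -u
} > "$workdir/Output.csv"

cat "$workdir/Output.csv"
```

If the original line order matters, sort is probably the wrong tool, since its output is always ordered by the key.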
#4 orahi001, 01-25-2008
The -t option worked great.

I also got it working using nawk, keyed on the 2nd column:

nawk -F'|' '!a[$2]++'


thanks
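
A sketch of how the awk one-liner behaves on the sample data when keyed on the second field (Document Name). It is written with plain awk on the assumption that nawk may not be installed under that name everywhere; the array name seen and the scratch directory are illustrative only. Unlike sort -u, this keeps the first occurrence of each key and preserves the original line order, so the header stays on top without special handling.

```shell
# Recreate the sample data from the question in a scratch directory
# (the directory is hypothetical, only here to make the sketch runnable).
workdir=$(mktemp -d)
cat > "$workdir/File.csv" <<'EOF'
File Name|Document Name|Document Title|Organization
Word Doc 1.doc|Word Document|Sample Doc|Org 1
Exl Doc 1.xls|Excel Sheet|Sample Sheet|Org 2
Pdf File 1.pdf|Pdf|Sample pdf|Org3
Exl Sheet 2.xls|Excel Sheet|Test Spreadsheet|Org 2
EOF

# seen[$2]++ evaluates to 0 (false) the first time a field-2 value appears
# and to a positive count afterwards, so !seen[$2]++ is true only for the
# first line carrying each value. A true pattern with no action prints the line.
awk -F'|' '!seen[$2]++' "$workdir/File.csv" > "$workdir/Output.csv"

cat "$workdir/Output.csv"
```

On the sample data this drops only the "Exl Sheet 2.xls" row, which matches the Output.csv requested in the first post.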