UNIX for Dummies Questions & Answers
searching text files on specific columns for duplicates
Is it possible to search through a large file full of rows and columns of text and retrieve only the rows that contain duplicate fields?
I am searching for duplicates on col4 & col6. Sample below:

Col1 col2 col3 col4 col5 col6
G405H SURG FERGUSON SG00308258 01/16/52 GGHB
G405H ORTHO FERGUSON SG00308258 05/21/23 A&C
G405H ENT HOUGHTON SG03102407 04/22/70 GGHB
G405H ENT HOUGHTON SG00308258 10/08/60 GGHB
G405H GYN TAGGART SG03132070 05/15/53 GGHB

I would expect the output to be:

G405H SURG FERGUSON SG00308258 01/16/52 GGHB
G405H ENT HOUGHTON SG00308258 10/08/60 GGHB
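input file = filename
Code:
G405H SURG FERGUSON SG00308258 01/16/52 GGHB
G405H ORTHO FERGUSON SG00308258 05/21/23 A&C
G405H ENT HOUGHTON SG03102407 04/22/70 GGHB
G405H ENT HOUGHTON SG00308258 10/08/60 GGHB
G405H GYN TAGGART SG03132070 05/15/53 GGHB
output
Code:
G405H ENT HOUGHTON SG00308258 10/08/60 GGHB
G405H SURG FERGUSON SG00308258 01/16/52 GGHB
Code:
sort -k 4.1,4.10 -k 6.1,6.4 filename |
awk '{
    # key on col4 and col6 together; the comma keeps "AB","C" distinct from "A","BC"
    if (($4, $6) in arr)
        { print arr[$4, $6]; print $0 }   # repeated key: emit the first row seen and this one
    else
        { arr[$4, $6] = $0 }              # first occurrence: remember the row
}' | sort -u    # awk reads the sorted pipe, not the file; sort -u drops rows printed twice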
Last edited by jim mcnamara; 08-17-2005 at 05:45 PM.
sorting
Jim,
Once again, a big thanks to you. Unfortunately for me, though, I use Data General UNIX, and the commands don't seem to have the -k option, but I'll play around with it once I have figured out which parts of your code are doing what. The other thing is that the real input file has many date columns, commas, etc., which makes it a little more complicated.
cheers
Last edited by Gerry405; 08-18-2005 at 11:12 AM.
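For a system where sort has no -k option, one possible workaround is to skip the sort entirely and let awk read the file twice: the first pass counts each col4/col6 pair, and the second prints the rows whose pair occurs more than once. This is only a sketch, not something posted in the thread. It assumes an awk that supports FNR and comma subscripts (POSIX awk, or nawk on older systems), and the function name dup_rows and the scratch file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every row whose col4/col6 pair occurs more than once.
# The file is read twice, so no sort (and no -k option) is needed.
dup_rows() {
    awk '
        NR == FNR { count[$4, $6]++; next }   # pass 1: count each col4/col6 pair
        count[$4, $6] > 1                     # pass 2: print rows whose pair repeats
    ' "$1" "$1"
}

# Sample data from the question, written to a scratch file (name is arbitrary).
sample="${TMPDIR:-/tmp}/dup_sample.$$"
cat > "$sample" <<'EOF'
G405H SURG FERGUSON SG00308258 01/16/52 GGHB
G405H ORTHO FERGUSON SG00308258 05/21/23 A&C
G405H ENT HOUGHTON SG03102407 04/22/70 GGHB
G405H ENT HOUGHTON SG00308258 10/08/60 GGHB
G405H GYN TAGGART SG03132070 05/15/53 GGHB
EOF

result=$(dup_rows "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On the sample this prints the SURG and ENT rows in their original file order, which matches the output asked for in the first post. If the real file turns out to be comma-separated, adding -F, to the awk call would split fields on commas instead of whitespace, but that depends on how the real data is laid out.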