Sponsored Content
Top Forums Shell Programming and Scripting remove duplicates based on single column Post 302525118 by Diya123 on Wednesday 25th of May 2011 07:11:03 PM
Old 05-25-2011
remove duplicates based on single column

Hello,

I am new to shell scripting. I have a huge file with multiple columns for example:

I have 5 columns below.

HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG
HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL
HWUSI-EAS000_29:1:108 + chr5 76654650 AATTGGAA CSmilieADH
HWUSI-EAS000_29:1:110 - chr6 86754325 GATCGTAA YYCHY

I want to remove duplicates based on column 4 (7664650). In the above case it should list me only row1 and row 4

Any help on this is greatly appreciated.

Thanks,

Diya
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How can i delete the duplicates based on one column of a line

I have my data something like this (08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb (08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa (08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts (08/03/2009 22:57:42.425)(:) Ravi... (11 Replies)
Discussion started by: rdhanek
11 Replies

2. UNIX for Dummies Questions & Answers

Remove duplicates based on a column in fixed width file

Hi, How to output the duplicate record to another file. We say the record is duplicate based on a column whose position is from 2 and its length is 11 characters. The file is a fixed width file. ex of Record: DTYU12333567opert tjhi kkklTRG9012 The data in bold is the key on which... (1 Reply)
Discussion started by: Qwerty123
1 Replies

3. Shell Programming and Scripting

Remove duplicates based on the two key columns

Hi All, I needs to fetch unique records based on a keycolumn(ie., first column1) and also I needs to get the records which are having max value on column2 in sorted manner... and duplicates have to store in another output file. Input : Input.txt 1234,0,x 1234,1,y 5678,10,z 9999,10,k... (7 Replies)
Discussion started by: kmsekhar
7 Replies

4. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies

5. UNIX for Dummies Questions & Answers

Remove duplicate rows when >10 based on single column value

Hello, I'm trying to delete duplicates when there are more than 10 duplicates, based on the value of the first column. e.g. a 1 a 2 a 3 b 1 c 1 gives b 1 c 1 but requires 11 duplicates before it deletes. Thanks for the help Video tutorial on how to use code tags in The UNIX... (11 Replies)
Discussion started by: informaticist
11 Replies

6. UNIX for Dummies Questions & Answers

remove duplicates based on a field and criteria

Hi, I have a file with fields like below: A;XYZ;102345;222 B;XYZ;123243;333 C;ABC;234234;444 D;MNO;103345;222 E;DEF;124243;333 desired output: C;ABC;234234;444 D;MNO;103345;222 E;DEF;124243;333 ie, if the 4rth field is a duplicate.. i need only those records where... (5 Replies)
Discussion started by: wanderingmind16
5 Replies

7. Shell Programming and Scripting

Remove duplicates based on a field's value

Hi All, I have a text file with three columns. I would like a simple script that removes lines in which column 1 has duplicate entries, but use the largest value in column 3 to decide which one to keep. For example: Input file: 12345a rerere.rerere len=23 11111c fsdfdf.dfsdfdsf len=33 ... (3 Replies)
Discussion started by: anniecarv
3 Replies

8. Shell Programming and Scripting

Trying to remove duplicates based on field and row

I am trying to see if I can use awk to remove duplicates from a file. This is the file: -==> Listvol <== deleting /vol/eng_rmd_0941 deleting /vol/eng_rmd_0943 deleting /vol/eng_rmd_0943 deleting /vol/eng_rmd_1006 deleting /vol/eng_rmd_1012 rearrange /vol/eng_rmd_0943 ... (6 Replies)
Discussion started by: newbie2010
6 Replies

9. Shell Programming and Scripting

Remove duplicates according to their frequency in column

Hi all, I have huge a tab-delimited file with the following format and I want to remove the duplicates according to their frequency based on Column2 and Column3. Column1 Column2 Column3 Column4 Column5 Column6 Column7 1 user1 access1 word word 3 2 2 user2 access2 ... (10 Replies)
Discussion started by: corfuitl
10 Replies
COLUMN(1)						    BSD General Commands Manual 						 COLUMN(1)

NAME
column -- columnate lists SYNOPSIS
column [-tx] [-c columns] [-s sep] [file ...] DESCRIPTION
The column utility formats its input into multiple columns. Rows are filled before columns. Input is taken from file operands, or, by default, from the standard input. Empty lines are ignored. The options are as follows: -c Output is formatted for a display columns wide. -s Specify a set of characters to be used to delimit columns for the -t option. -t Determine the number of columns the input contains and create a table. Columns are delimited with whitespace, by default, or with the characters supplied using the -s option. Useful for pretty-printing displays. -x Fill columns before filling rows. column exits 0 on success, >0 if an error occurred. ENVIRONMENT
COLUMNS The environment variable COLUMNS is used to determine the size of the screen if no other information is available. EXAMPLES
(echo "PERM LINKS OWNER GROUP SIZE MONTH DAY HH:MM/YEAR NAME"; ls -l | sed 1d) | column -t SEE ALSO
colrm(1), ls(1), paste(1), sort(1) HISTORY
The column command appeared in 4.3BSD-Reno. BSD
March 9, 2008 BSD
All times are GMT -4. The time now is 08:04 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy