UNIX for Dummies Questions & Answers: Post 302555265 by TheTransporter on Wednesday 14th of September 2011 09:40:30 AM
remove duplicate lines based on two columns and judging from a third one

Hello all,

I have an input file with four columns and a lot of lines, like this:

Quote:
2GOX03.output:Apol-Pol 10.64 -.79 (B)ALA3TRP
1R6Q20.output:Char-Pol 13.40 -.78 (B)ASP14SER
3SGB19.output:Char-Pol 13.40 -.58 (A)GLU177ATYR
2GOX13.output:Char-Pol 10.40 -.55 (B)ARG65GLN
2GOX14.output:Apol-Pol 10.40 -.55 (B)ALA3TRP
...
...
For example, lines 1 and 5 match because the first 4 characters of the first column match (2GOX) and the fourth column matches too ((B)ALA3TRP). I want to keep the line that has the lowest number in the third column, so here I keep line 1 (-.79) and discard line 5 (-.55). Is there a way to do this with awk for every possible match? Note that the file might contain more than two matching lines for the same key.

thanks
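
One possible approach, offered as a sketch rather than a definitive answer: build a key from the first four characters of column 1 plus column 4, remember the line with the numerically lowest column 3 for each key, and print the survivors in order of first appearance. The file names `input.txt` and `deduped.txt` and the function name `dedup_lowest` are hypothetical, and the sample data is just the five lines quoted above. It assumes whitespace-separated columns and that ties keep the line seen first.

```shell
# Hypothetical sample file holding the five lines quoted in the post.
cat > input.txt <<'EOF'
2GOX03.output:Apol-Pol 10.64 -.79 (B)ALA3TRP
1R6Q20.output:Char-Pol 13.40 -.78 (B)ASP14SER
3SGB19.output:Char-Pol 13.40 -.58 (A)GLU177ATYR
2GOX13.output:Char-Pol 10.40 -.55 (B)ARG65GLN
2GOX14.output:Apol-Pol 10.40 -.55 (B)ALA3TRP
EOF

# dedup_lowest (hypothetical name): for each key, keep the line whose
# third column is numerically lowest.
dedup_lowest() {
    awk '
    {
        # Key: first 4 characters of column 1 plus all of column 4.
        key = substr($1, 1, 4) SUBSEP $4
        val = $3 + 0                  # force numeric comparison
        if (!(key in best)) {
            order[++n] = key          # remember first-seen order of keys
            best[key] = val
            line[key] = $0
        } else if (val < best[key]) {
            best[key] = val           # strictly lower value replaces the kept line
            line[key] = $0
        }
    }
    END {
        for (i = 1; i <= n; i++) print line[order[i]]
    }' "$@"
}

dedup_lowest input.txt > deduped.txt
cat deduped.txt
```

On the sample above this keeps the 2GOX03 line (-.79) and drops the 2GOX14 line (-.55); the other three lines have unique keys and pass through unchanged. Because every key is held in memory until the `END` block, a very large file with many distinct keys may be better handled by sorting on the key and column 3 first.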
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.