Remove columns with duplicate entries


 
# 1  
Old 04-23-2014

I have a 13 GB file with three columns. The third column holds correlation values. I want to delete the rows in which the first two columns contain the same entry:
Code:
A B 0.04
B C 0.56
B B 1
A A 1
C D 1
C C 1

Desired output (preferably in .csv format):

Code:
A,B,0.04
B,C,0.56
C,D,1

I am not able to use an editor because of the file's size. Kindly help.

Last edited by Sanchari; 04-23-2014 at 01:41 PM..
# 2  
Old 04-23-2014
Code:
awk '$1 != $2' myFile
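
For readers unfamiliar with the idiom, here is a minimal sketch of how the one-liner above behaves. An awk pattern with no action prints every line for which the pattern is true, so `$1 != $2` keeps only the rows whose first two fields differ. awk reads its input one line at a time, so the 13 GB size should not be a problem. The file name `sample.txt` is hypothetical and simply mirrors the data in the question.

```shell
# Hypothetical sample file mirroring the data in the question.
printf '%s\n' 'A B 0.04' 'B C 0.56' 'B B 1' 'A A 1' 'C D 1' 'C C 1' > sample.txt

# A pattern without an action prints each matching line unchanged,
# so only rows whose first and second fields differ are printed.
awk '$1 != $2' sample.txt
```

On this sample the rows `B B 1`, `A A 1` and `C C 1` are dropped and the other three are printed as they are.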

# 3  
Old 04-23-2014
Thanks. Is it possible to write the output in CSV format as well?

For example: A,B,0.04
# 4  
Old 04-23-2014
Quote:
Originally Posted by Sanchari
Thanks. Is it possible to write the output in CSV format as well?

For example: A,B,0.04
Code:
awk '$1 != $2 && $1=$1' OFS=, myFile
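
A note on why this works, with a more explicit variant. Assigning a field to itself (`$1=$1`) makes awk rebuild the record using the output field separator `OFS`, which is how the spaces become commas. Because the assignment sits inside the pattern, its value (the first field) also helps decide whether the line is printed, so a row whose first field is `0` or empty would likely be dropped. That does not affect the data shown in this thread, but the sketch below moves the assignment into an action to avoid it. The file names `sample.txt` and `output.csv` are hypothetical.

```shell
# Hypothetical sample; the last row has "0" in the first field to show
# that this form keeps it.
printf '%s\n' 'A B 0.04' 'B C 0.56' 'B B 1' '0 C 0.3' > sample.txt

# For rows whose first two fields differ: rebuild the record so the
# fields are joined with OFS (a comma), then print it explicitly.
awk -v OFS=, '$1 != $2 { $1 = $1; print }' sample.txt > output.csv

cat output.csv
```

Redirecting with `>` writes the result straight to a file, so nothing has to be opened in an editor.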

This User Gave Thanks to vgersh99 For This Post: