Help with removing the column that appears twice


 
# 1  
Old 09-05-2013

Input file 1
Code:
             S1                   S2       S3
comp95_c1    1.00     comp95_c1    1.00     3.00
comp4_c0     6.00     comp4_c0     6.00     6.00
comp3_c0     0.00     comp3_c0     0.00     4.00
comp15_c1    3.00     comp15_c1    3.00     3.00
comp28_c0    33.00    comp28_c0    33.00    2.00
comp23_c0    4.00     comp23_c0    4.00     3.00

Desired output file 1
Code:
             S1       S2       S3
comp95_c1    1.00     1.00     3.00
comp4_c0     6.00     6.00     6.00
comp3_c0     0.00     0.00     4.00
comp15_c1    3.00     3.00     3.00
comp28_c0    33.00    33.00    2.00
comp23_c0    4.00     4.00     3.00

Input file 2
Code:
             S1       S2                    S3
comp5_c1     1.00     1.00     comp5_c1     3.00
comp40_c0    6.00     6.00     comp40_c0    6.00
comp31_c0    0.00     0.00     comp31_c0    4.00
comp51_c1    3.00     3.00     comp51_c1    3.00
comp82_c0    33.00    33.00    comp82_c0    2.00
comp3_c0     4.00     4.00     comp3_c0     3.00

Desired output file 2
Code:
             S1       S2       S3
comp5_c1     1.00     1.00     3.00
comp40_c0    6.00     6.00     6.00
comp31_c0    0.00     0.00     4.00
comp51_c1    3.00     3.00     3.00
comp82_c0    33.00    33.00    2.00
comp3_c0     4.00     4.00     3.00

I would like to remove the column (compXXX) that appears twice in each row.
All the files are tab-delimited.

Thanks for any advice.
# 2  
Old 09-05-2013
Try this, assuming you only want to compare each field against column 1.

Code:
awk '{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file
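For readers following along, here is the same one-liner spread out with comments. It is only a sketch: the wrapper name `drop_dup_key` is hypothetical and not part of the original post, and the awk body is meant to behave like the command above.

```shell
# Hypothetical wrapper; the awk body mirrors the one-liner from this post.
drop_dup_key() {
  awk '
  {
    out = $1                      # always keep the first field
    for (i = 2; i <= NF; i++)     # scan the remaining fields
      if ($i != $1)               # skip any field that repeats field 1
        out = out OFS $i
    print out                     # fields are joined with OFS (a tab)
  }' OFS='\t' "$1"
}
```

It would be called as `drop_dup_key file`.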

# 3  
Old 09-05-2013
Hi pamu,

I tried your awk command on Input file 1.
It returns the following result:
Code:
S1                S3
comp95_c1    1.00      1.00       3.00
comp4_c0      6.00      6.00      6.00
comp3_c0      0.00      0.00      4.00
comp15_c1    3.00      3.00      3.00
comp28_c0    33.00    33.00     2.00
comp23_c0    4.00      4.00       3.00

It is slightly different from the desired output.
The header cell above the "compXXXX" column is empty and followed by a "\t", and the values below "S1", "S2", "S3" are numbers.

Sorry for troubling you again.
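The missing leading cell in the header is most likely explained by awk's default field splitting: without `-F`, leading whitespace is ignored and runs of tabs or spaces count as a single separator, so empty header cells vanish. A small check, assuming the header of Input file 1 looks like the `header` string below (that exact layout is an assumption):

```shell
# Assumed header layout: empty cell, S1, empty cell, S2, S3 (tab-delimited).
header=$(printf '\tS1\t\tS2\tS3')

# Default splitting: whitespace runs collapse, empty cells are lost.
nf_default=$(printf '%s\n' "$header" | awk '{ print NF }')

# Tab splitting: every tab starts a new field, empty cells are kept.
nf_tab=$(printf '%s\n' "$header" | awk -F'\t' '{ print NF }')

echo "default FS: $nf_default fields, tab FS: $nf_tab fields"
# prints "default FS: 3 fields, tab FS: 5 fields"
```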
# 4  
Old 09-05-2013
Is this what you want?

Code:
awk '{T=NR==1?"\t":"";S=T $1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file

        S1      S2      S3
comp95_c1       1.00    1.00    3.00
comp4_c0        6.00    6.00    6.00
comp3_c0        0.00    0.00    4.00
comp15_c1       3.00    3.00    3.00
comp28_c0       33.00   33.00   2.00
comp23_c0       4.00    4.00    3.00

# 5  
Old 09-05-2013
Hi pamu,

It is almost there :)
But I am curious about the case where my header S1, S2, S3 is instead S1, S1, S3.
Is it possible to still print the following result?
Code:
                      S1      S1      S3
comp95_c1       1.00    1.00    3.00
comp4_c0        6.00    6.00    6.00
comp3_c0        0.00    0.00    4.00
comp15_c1       3.00    3.00    3.00
comp28_c0       33.00   33.00   2.00
comp23_c0       4.00    4.00    3.00

Sorry again.
I just noticed that some cases work fine, but others do not work correctly when the header S1, S2, S3 is instead S1, S1, S3.
# 6  
Old 09-05-2013
What about this?

Code:
 awk 'NR==1{$1=OFS OFS $1}1 NR>1{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file

                S1      S1      S3
comp95_c1       1.00    1.00    3.00
comp4_c0        6.00    6.00    6.00
comp3_c0        0.00    0.00    4.00
comp15_c1       3.00    3.00    3.00
comp28_c0       33.00   33.00   2.00
comp23_c0       4.00    4.00    3.00

# 7  
Old 09-05-2013
Hi pamu,

When I run the following command and then check each field of the output, I get:
Code:
awk 'NR==1{$1=OFS OFS $1}1 NR>1{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file > file.out

awk -F"\t" '{print $1"\t"}' file.out

comp95_c1       
comp4_c0        
comp3_c0        
comp15_c1       
comp28_c0       
comp23_c0       

awk -F"\t" '{print $2"\t"}' file.out

1.00   
6.00    
0.00   
3.00    
33.00   
4.00    

awk -F"\t" '{print $3"\t"}' file.out
S1
1.00   
6.00    
0.00   
3.00    
33.00   
4.00    

awk -F"\t" '{print $4"\t"}' file.out
S1
3.00
6.00
4.00
3.00
2.00
3.00

awk -F"\t" '{print $5"\t"}' file.out
S3

I would expect the following result instead:
Code:
awk -F"\t" '{print $1"\t"}' file.out

comp95_c1       
comp4_c0        
comp3_c0        
comp15_c1       
comp28_c0       
comp23_c0       

awk -F"\t" '{print $2"\t"}' file.out
S1
1.00   
6.00    
0.00   
3.00    
33.00   
4.00    

awk -F"\t" '{print $3"\t"}' file.out
S1
1.00   
6.00    
0.00   
3.00    
33.00   
4.00    

awk -F"\t" '{print $4"\t"}' file.out
S3
3.00
6.00
4.00
3.00
2.00
3.00

awk -F"\t" '{print $5"\t"}' file.out

Thanks for any advice on keeping the "S1, S1, S3" header aligned with its corresponding records for further analysis.
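The shift reported above is consistent with `$1=OFS OFS $1` in the command from post #6, which puts two tabs in front of the first label, so "S1" lands in field 3 instead of field 2. One possible way to get the expected layout is sketched below. It rests on assumptions: the function name `dedup_with_header` is hypothetical, the file is tab-delimited with the header on line 1, header labels are never empty, and no data field other than the repeated compXXX key equals field 1.

```shell
# Hypothetical helper; a sketch, not the thread's accepted answer.
dedup_with_header() {
  awk -F'\t' -v OFS='\t' '
  NR == 1 {                            # header line
    out = ""                           # one empty cell above the compXXX column
    for (i = 1; i <= NF; i++)
      if ($i != "") out = out OFS $i   # keep labels, skip empty cells
    print out
    next
  }
  {                                    # data rows: same test as in post #2
    out = $1
    for (i = 2; i <= NF; i++)
      if ($i != $1) out = out OFS $i
    print out
  }' "$1"
}
```

Running `dedup_with_header file > file.out` and then `awk -F"\t" '{print $2}' file.out` should show "S1" on the first line, because the header now starts with a single empty field. Splitting on tabs explicitly also keeps repeated labels such as "S1, S1, S3", since header cells are only dropped when they are empty.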
