Print the 1st column and the value in 2nd or 3rd column if that is different from the values in 1st


 
Thread Tools Search this Thread
Operating Systems Linux Print the 1st column and the value in 2nd or 3rd column if that is different from the values in 1st
# 1  
Old 10-29-2015
Print the 1st column and the value in 2nd or 3rd column if that is different from the values in 1st

I have file that looks like this,

Code:
DIP-17571N|refseq:NP_651151   DIP-17460N|refseq:NP_511165|uniprotkb:P45890      DIP-17571N|refseq:NP_651151
DIP-19241N|refseq:NP_524261    DIP-19241N|refseq:NP_524261       DIP-17151N|refseq:NP_524316|uniprotkb:O16797
DIP-19588N|refseq:NP_731165     DIP-19588N|refseq:NP_731165       DIP-19589N|refseq:NP_647684
DIP-20632N|refseq:NP_476602     DIP-492N|refseq:NP_477499|uniprotkb:P23647        DIP-20632N|refseq:NP_476602
DIP-23436N|refseq:NP_536784     DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-18269N|refseq:NP_523724     DIP-20786N|refseq:NP_649297       DIP-18269N|refseq:NP_523724
DIP-20861N|refseq:NP_647634    DIP-20861N|refseq:NP_647634       DIP-19344N|refseq:NP_572751
DIP-23837N|refseq:NP_573057   DIP-23837N|refseq:NP_573057       DIP-5N|refseq:NP_476859|uniprotkb:P07207
DIP-59926N|refseq:NP_228099     DIP-59926N|refseq:NP_228099       DIP-59927N|refseq:NP_228100
DIP-23655N|refseq:NP_648922    DIP-17971N|refseq:NP_648929       DIP-23655N|refseq:NP_648922
DIP-22713N|refseq:NP_524108    DIP-21138N|refseq:NP_722721       DIP-22713N|refseq:NP_524108
DIP-21320N|refseq:NP_730973     DIP-17533N|refseq:NP_611700       DIP-21320N|refseq:NP_730973
DIP-22051N|refseq:NP_573109     DIP-28047N        DIP-22051N|refseq:NP_573109

I want to print the 1st column and the value in 2nd or 3rd column if that is different from the values in 1st column, side by side.

This is how I want the output to be like,
Code:
DIP-17571N|refseq:NP_651151   DIP-17460N|refseq:NP_511165|uniprotkb:P45890
DIP-19241N|refseq:NP_524261   DIP-17151N|refseq:NP_524316|uniprotkb:O16797
DIP-19588N|refseq:NP_731165    DIP-19589N|refseq:NP_647684
DIP-20632N|refseq:NP_476602    DIP-492N|refseq:NP_477499|uniprotkb:P23647 
DIP-23436N|refseq:NP_536784    DIP-23130N|refseq:NP_652017
DIP-18269N|refseq:NP_523724    DIP-20786N|refseq:NP_649297

and so on...

Any help would be highly appreciated.
# 2  
Old 10-29-2015
Hello Syeda,

Could you please try following and let me know if this helps you.
Code:
awk '{A=$1}($1 != $2){A=$1 OFS $2} ($1 != $3){A=A?A OFS $3:$3} {print A;A=""}'  Input_file

Output will be as follows.
Code:
DIP-17571N|refseq:NP_651151 DIP-17460N|refseq:NP_511165|uniprotkb:P45890
DIP-19241N|refseq:NP_524261 DIP-17151N|refseq:NP_524316|uniprotkb:O16797
DIP-19588N|refseq:NP_731165 DIP-19589N|refseq:NP_647684
DIP-20632N|refseq:NP_476602 DIP-492N|refseq:NP_477499|uniprotkb:P23647
DIP-23436N|refseq:NP_536784 DIP-23130N|refseq:NP_652017
DIP-18269N|refseq:NP_523724 DIP-20786N|refseq:NP_649297
DIP-20861N|refseq:NP_647634 DIP-19344N|refseq:NP_572751
DIP-23837N|refseq:NP_573057 DIP-5N|refseq:NP_476859|uniprotkb:P07207
DIP-59926N|refseq:NP_228099 DIP-59927N|refseq:NP_228100
DIP-23655N|refseq:NP_648922 DIP-17971N|refseq:NP_648929
DIP-22713N|refseq:NP_524108 DIP-21138N|refseq:NP_722721
DIP-21320N|refseq:NP_730973 DIP-17533N|refseq:NP_611700
DIP-22051N|refseq:NP_573109 DIP-28047N

Also want to add here if both columns $2 and $3 are not equal to $1 then complete line will be printed(which is not in provided Input_file).


Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 10-29-2015
Unfortunately, the | char has a special meaning for regexes, so it must be circumvented; else it were much simpler:
Code:
 awk '{T="  *" $1; gsub (/\|/, "\\\|", T); sub (T, " ")} 1' file

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Compare 1st column from 2 file and if match print line from 1st file and append column 7 from 2nd

hi I have 2 file with more than 10 columns for both 1st file apple,0,0,0...... orange,1,2,3..... mango,2,4,5..... 2nd file apple,2,3,4,5,6,7... orange,2,3,4,5,6,8... watermerlon,2,3,4,5,6,abc... mango,5,6,7,4,6,def.... (1 Reply)
Discussion started by: tententen
1 Replies

2. UNIX for Dummies Questions & Answers

Want the UNIX code - I want to sum of the 1st column wherever the first 2nd and 3rd columns r equal

I have the code for the below things.. File1 has the content as below 8859 0 subscriberCreate 18 0 subscriberPaymentMethodChange 1650 0 subscriberProfileUpdate 7668 0 subscriberStatusChange 13 4020100 subscriberProfileUpdate 1 4020129 subscriberStatusChange 2 4020307 subscriberCreate 8831... (5 Replies)
Discussion started by: Mahen
5 Replies

3. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons
8 Replies

4. Shell Programming and Scripting

Changing values only in 3rd column and 4th column

#cat file testing test! nipw asdkjasjdk ok! what !ok host server1 check_ssh_disk!102.56.1.101!30!50!/ other host server 2 des check_ssh_disk!192.6.1.10!40!30!/ #grep check file| awk -F! '{print $3,$4}'|awk '{gsub($1,"",$1)}1' 50 30 # Output: (6 Replies)
Discussion started by: kenshinhimura
6 Replies

5. Shell Programming and Scripting

awk Print New Column For Every Two Lines and Match On Multiple Column Values to print another column

Hi, My input files is like this axis1 0 1 10 axis2 0 1 5 axis1 1 2 -4 axis2 2 3 -3 axis1 3 4 5 axis2 3 4 -1 axis1 4 5 -6 axis2 4 5 1 Now, these are my following tasks 1. Print a first column for every two rows that has the same value followed by a string. 2. Match on the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

6. Shell Programming and Scripting

Print every 5 4th column values as separate row with different first column

Hi, I have the following file, chr1 100 200 20 chr1 201 300 22 chr1 220 345 23 chr1 230 456 33.5 chr1 243 567 90 chr1 345 600 20 chr1 430 619 21.78 chr1 870 910 112.3 chr1 914 920 12 chr1 930 999 13 My output would be peak1 20 22 23 33.5 90 peak2 20 21.78 112.3 12 13 Here the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

7. Shell Programming and Scripting

Calculate 2nd Column Based on 1st Column

Dear All, I have input file like this. input.txt CE2_12-15 3950.00 589221.0 9849709.0 768.0 CE2_12_2012 CE2_12-15 3949.00 589199.0 9849721.0 768.0 CE2_12_2012 CE2_12-15 3948.00 589178.0 9849734.0 768.0 CE2_12_2012 CE2_12-52 1157.00 ... (3 Replies)
Discussion started by: attila
3 Replies

8. Shell Programming and Scripting

1st column,2nd column on first line 3rd,4th on second line ect...

I need to take one column of data and put it into the following format: 1st line,2nd line 3rd line,4th line 5th line,6th line ... Thanks! (6 Replies)
Discussion started by: batcho
6 Replies

9. Shell Programming and Scripting

comparing column of two different files and print the column from in order of 2nd file

Hi friends, My file is like: Second file is : I need to print the rows present in file one, but in order present in second file....I used while read gh;do awk ' $1=="' $gh'" {print >> FILENAME"output"} ' cat listoffirstfile done < secondfile but the output I am... (14 Replies)
Discussion started by: CAch
14 Replies

10. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file which is having 3 columns as (string string integer) a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies
Login or Register to Ask a Question