Filling a tab-separated file with known missing entries in columns


 
# 1  
Old 04-10-2012
Filling a tab-separated file with known missing entries in columns

Hello all,

I have a tab-separated file like this:


Code:
PHE_205_A    TIP_127_W    ARG_150_B
MET_1150_A    TIP_12_W    VAL_11_B
GLU_60_A    TIP_130_W    ARG_143_B
LEU_1033_A    TIP_203_W    ARG_14_B
SER_1092_A    TIP_203_W    
THR_1090_A    TIP_203_W    
SER_1092_A    TIP_25_W    SER_104_B
TYR_15_B    TIP_25_W    
ASP_61_A    TIP_34_W    THR_134_B
SER_1204_A    TIP_46_W    ASP_8_B
ASP_61_A    TIP_63_W    ARG_131_B
THR_90_A    TIP_76_W    TYR_49_B
THR_1090_A    TIP_91_W    SER_100_B


and I want it to look like this:
Code:
PHE_205_A    TIP_127_W    ARG_150_B
MET_1150_A    TIP_12_W    VAL_11_B
GLU_60_A    TIP_130_W    ARG_143_B
LEU_1033_A    TIP_203_W    ARG_14_B
SER_1092_A    TIP_203_W    ARG_14_B
THR_1090_A    TIP_203_W    ARG_14_B
SER_1092_A    TIP_25_W    SER_104_B
SER_1092_A    TIP_25_W    TYR_15_B   
ASP_61_A    TIP_34_W    THR_134_B
SER_1204_A    TIP_46_W    ASP_8_B
ASP_61_A    TIP_63_W    ARG_131_B
THR_90_A    TIP_76_W    TYR_49_B
THR_1090_A    TIP_91_W    SER_100_B

The missing entry is always the one found in the line before it (if a _B entry is missing, it is filled with the _B entry of the previous line; likewise for _A).

I would really appreciate it if somebody could give me an idea of how to reformat it.

Thanks!

# 2  
Old 04-10-2012
Try
Code:
awk '
NF == 3 { f = $3 }
NF == 2 { $3 = f }
1' FILE

# 3  
Old 04-10-2012
Quote:
Originally Posted by yazu
Try
Code:
awk '
NF == 3 { f = $3 }
NF == 2 { $3 = f }
1' FILE

This code does the job for the third column, but it breaks the first one.
Output:

Code:
PHE_205_A    TIP_127_W    ARG_150_B
MET_1150_A    TIP_12_W    VAL_11_B
GLU_60_A    TIP_130_W    ARG_143_B
LEU_1033_A    TIP_203_W    ARG_14_B
SER_1092_A    TIP_203_W    ARG_14_B
THR_1090_A    TIP_203_W    ARG_14_B
SER_1092_A    TIP_25_W    SER_104_B
TYR_15_B    TIP_25_W    SER_104_B
ASP_61_A    TIP_34_W    THR_134_B
SER_1204_A    TIP_46_W    ASP_8_B
ASP_61_A    TIP_63_W    ARG_131_B
THR_90_A    TIP_76_W    TYR_49_B
THR_1090_A    TIP_91_W    SER_100_B

The wrong line is the eighth one (TYR_15_B TIP_25_W SER_104_B): SER_1092_A should be in the first column.
To explain, the output for this line should be

SER_1092_A TIP_25_W TYR_15_B
*_A should always be in the first column, TIP* in the second and *_B in the third.
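
The layout rule stated above (*_A first, TIP* second, *_B third) can be checked mechanically. This is only a minimal awk sketch, assuming tab-separated input; the two sample lines are copied from the output above, and the variable name bad is purely illustrative:

```shell
# Feed two lines from the thread to awk: one correct, one with _B in column 1.
bad=$(printf 'SER_1092_A\tTIP_25_W\tSER_104_B\nTYR_15_B\tTIP_25_W\tSER_104_B\n' |
awk -F'\t' '
# Report every line that does not match the *_A, TIP*, *_B layout.
!($1 ~ /_A$/ && $2 ~ /^TIP/ && $3 ~ /_B$/) { print NR ": " $0 }')
printf '%s\n' "$bad"
```

On this sample only the second line is reported, prefixed with its line number.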

# 4  
Old 04-10-2012
Oh... Well, I didn't notice. But I think it's impossible without additional information: how should the program determine which field is absent? Is the second field always present, and is it always TIPxxx?
# 5  
Old 04-10-2012
Yes, the TIP entry is always there in the second field.

If a field is absent, it is either the entry ending in _A or the one ending in _B.

So the program can read the line with the missing information (it has only two columns):
- If the _B entry is missing,
then it can go to the previous line,
find the _B entry and copy it into the third column.
- Else, if the _A entry is missing,
go to the previous line,
find the _A entry and copy it into the first field, and then swap the TIP column and the *_B entry,
so that the three-column format is always *_A TIP* *_B
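
The steps above can be sketched in awk roughly as follows. This is only an illustration, assuming tab-separated input in which an incomplete line keeps its trailing tab or simply stops after the TIP column; the four sample lines and the variable names a, b and result are not from the original file layout beyond what is shown in this thread:

```shell
# Four lines from the thread: two complete, one missing _A, one missing _B.
result=$(printf 'SER_1092_A\tTIP_25_W\tSER_104_B\nTYR_15_B\tTIP_25_W\t\nLEU_1033_A\tTIP_203_W\tARG_14_B\nSER_1092_A\tTIP_203_W\t\n' |
awk -F'\t' -v OFS='\t' '
# Complete line: remember its _A and _B entries for later lines.
$1 ~ /_A$/ && $3 ~ /_B$/ { a = $1; b = $3; print; next }
# _B entry missing: the line starts with an _A entry, so reuse the last _B.
$1 ~ /_A$/ { a = $1; print $1, $2, b; next }
# _A entry missing: the _B entry sits in column 1, so move it to column 3.
$1 ~ /_B$/ { b = $1; print a, $2, b; next }
# Anything else is passed through unchanged.
{ print }
')
printf '%s\n' "$result"
```

Note that a filled-in line also updates the remembered value, so "the previous line" always means the previous line as it appears in the output.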

That's the idea.....

Thanks a lot!
# 6  
Old 04-10-2012
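Use three array elements to hold on to the last A, B or W value. The part after the last _ is the designator. Empty fields get stored in a fourth array element (P[""] when the field is truly empty), which never gets printed. In the sub() call, x is an unset variable, so it acts as the empty string. Try something like this: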
Code:
awk -F'\t' '{for(i=1;i<=NF;i++){c=$i; sub(/.*_/,x,c); P[c]=$i} print P["A"],P["W"],P["B"]}' OFS='\t' infile

Output:
Code:
PHE_205_A	TIP_127_W	ARG_150_B
MET_1150_A	TIP_12_W	VAL_11_B
GLU_60_A	TIP_130_W	ARG_143_B
LEU_1033_A	TIP_203_W	ARG_14_B
SER_1092_A	TIP_203_W	ARG_14_B
THR_1090_A	TIP_203_W	ARG_14_B
SER_1092_A	TIP_25_W	SER_104_B
SER_1092_A	TIP_25_W	TYR_15_B
ASP_61_A	TIP_34_W	THR_134_B
SER_1204_A	TIP_46_W	ASP_8_B
ASP_61_A	TIP_63_W	ARG_131_B
THR_90_A	TIP_76_W	TYR_49_B
THR_1090_A	TIP_91_W	SER_100_B
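
For readers who want to follow the one-liner step by step, here is the same idea spread over several lines with comments. It is a sketch rather than a replacement: the array name last, the variable names key and result, and the four sample lines are chosen for illustration, and the literal "" stands in for the unset variable x of the original.

```shell
# Four lines from the thread, including one line missing _B and one missing _A.
result=$(printf 'LEU_1033_A\tTIP_203_W\tARG_14_B\nSER_1092_A\tTIP_203_W\t\nSER_1092_A\tTIP_25_W\tSER_104_B\nTYR_15_B\tTIP_25_W\t\n' |
awk -F'\t' -v OFS='\t' '
{
    for (i = 1; i <= NF; i++) {
        key = $i
        sub(/.*_/, "", key)   # strip everything up to the last "_", leaving A, W or B
        last[key] = $i        # remember the most recent entry for that suffix
    }
    # Always print in the fixed order _A, TIP, _B, using the remembered values.
    print last["A"], last["W"], last["B"]
}')
printf '%s\n' "$result"
```

Because each output line is built from the remembered values, a missing entry is automatically taken from the nearest earlier line that had one. If the very first line of a file is incomplete, nothing has been remembered yet and that column is printed empty.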


# 7  
Old 04-10-2012
The code works and is extremely fast! Although I get the idea, I still couldn't have written it myself. I will look into it.

Thanks Scrutinizer.
 