Fill in missing rows with zero to have uniform table


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Fill in missing rows with zero to have uniform table
# 1  
Old 03-23-2015
Fill in missing rows with zero to have uniform table

Hello,
I have two files of same structure except some rows are missing randomly in each file. How do I fill the missing rows to have the exact ID column (S01 ~ S96) and rest columns filled with "0" with awk? The purpose of this step is to join the two files side by side. The closest thread is here.
Code:
file1:
S01    36407    53706    88540
S02    69343    87098    87316
S03    50133    59721    107923
S05    1290    41074    135810
S06    11285    30164    40361
S07    11285    30164    40361
S10    11298    30165    40361
S11    18311    37266    135798
S12    14567    35958    62691

Code:
file2:
S02    14644    37964    70990
S04    52922    84177    87225
S05    43427    68368    109344
S06    4212    15654    15664
S08    16257    41558    45595
S10    1290    41074    135810
S12    30731    60117    77166

Code:
file1_padded
S01    36407    53706    88540
S02    69343    87098    87316
S03    50133    59721    107923
S04    0    0    0
S05    0    41074    135810       #There was a typo 
S05   1290    41074    135810     #This is correct   
S06    11285    30164    40361
S07    11285    30164    40361
S08    0    0    0
S09    0    0    0
S10    11298    30165    40361
S11    18311    37266    135798
S12    14567    35958    62691

Code:
 file2_padded:
S01    0    0    0
S02    14644    37964    70990
S03    0    0    0
S04    52922    84177    87225
S05    43427    68368    109344
S06    4212    15654    15664
S07    0    0    0
S08    16257    41558    45595
S09    0    0    0
S10    1290    41074    135810
S11    0    0    0
S12    30731    60117    77166

Code:
file_joined
S01    36407    53706    88540        S01    0    0    0
S02    69343    87098    87316        S02    14644    37964    70990
S03    50133    59721    107923       S03    0    0    0
S04    0    0    0                    S04    52922    84177    87225
S05    1290    41074    135810           S05    43427    68368    109344    # Corrected
S06    11285    30164    40361        S06    4212    15654    15664
S07    11285    30164    40361        S07    0    0    0
S08    0    0    0                    S08    16257    41558    45595
S09    0    0    0                    S09    0    0    0
S10    11298    30165    40361        S10    1290    41074    135810
S11    18311    37266    135798       S11    0    0    0
S12    14567    35958    62691        S12    30731    60117    77166

I am not sure if there is a trick in awk to do this job,at least to fill the first column (the ID column). Thanks a lot!

Last edited by yifangt; 03-24-2015 at 04:43 PM.. Reason: add link
# 2  
Old 03-23-2015
Hi, try something like this:

Code:
awk '
  BEGIN {
    OFS="\t"
    zero=OFS 0 OFS 0 OFS 0
  }
  {
    i=$1
    $1=""
    C[i]
  }
  NR==FNR {
    A[i]=$0
    next
  } 
  {
    B[i]=$0
  }
  END {
    for(i in C)
      print i (i in A?A[i]:zero), "", i (i in B?B[i]:zero)
  }
' file1 file2

If it is not in the right order you could pipe the output through sort..


---
I assumed that the "padded" S05 line in file1 is incorrect in your example:
Code:
S05    0    41074    135810


Last edited by Scrutinizer; 03-23-2015 at 02:26 PM..
# 3  
Old 03-23-2015
I suppose that you could read each line assuming that the first column is meant to increase sequentially, however another method might be to use the some reverse logic. Could you:-
  • Get every first column from file1 into a file index1, perhaps with cut
  • Edit the file to insert a leading carat ^ to each line
  • Use this to find records in file2 that are not in file1 using grep -vf
  • Append new records (with zero values) to file1
  • Sort file1 to put the records in order
  • Repeat the process for file2 selecting records listed in file1 (original or the updated one)
It seems a bit convoluted but I'm trying to think of the fewest I/O operations to achieve your goal, especially if you have very large files.
Would that give you an option, or are there likely to be records which are missing from both files?


Do let me know if I've missed the point entirely.

Kind regards
Robin
This User Gave Thanks to rbatte1 For This Post:
# 4  
Old 03-23-2015
Thanks!
Could you please elaborate this part?
Code:
 {
    i=$1
    C[i]
    $1=""
  }

I was thinking to process each file and fill in all the missing IDs in the column first, then do "paste" any two files.
The good part is all the files have same IDs from S01 to S96. Thanks again!
------------------------------------
Hi, Robin,
Thanks for your reply. I tried exactly what you described, but that is not what I want. I was thinking awk would be easier to combine all your steps into a single script.
By the way, 1) Yes, all the IDs would be sorted. 2)There was a typo in S05 which has been corrected as indicated.

Last edited by yifangt; 03-23-2015 at 05:52 PM..
# 5  
Old 03-23-2015
Hi, that piece of awk code should do the following:
Code:
 {
    i=$1    set i to the first field ($1)
    $1=""   empty the first field
    C[i]    Create an index i in array C (which will contain all possible $1 values in the END part)
  }

set i to the first field ($1)
# 6  
Old 03-23-2015
Does this block before NR==FNR only apply to file1?
There is a bug for one situation for your script: say if S23 is missing in both file, the merged file will not have S23. That's why I thought of padding each file separately first with IDs from S01 to S96.
Maybe I should ask the question in another way: pad each file with IDs from S01 to S96, fill the missing field with "0" if the ID is missing in the original file.
Thanks!

Last edited by yifangt; 03-24-2015 at 12:11 PM..
# 7  
Old 03-23-2015
You could try an adaptation to the script:
Code:
awk -v n=96 '
  BEGIN {
    OFS="\t"
    zero=OFS 0 OFS 0 OFS 0
  }
  {
    i=$1
    $1=""
  }
  NR==FNR {
    A[i]=$0
    next
  }
  {
    B[i]=$0
  }
  END {
    for(j=1; j<=n; j++) {
      i=sprintf("S%02d",j)
      print i (i in A?A[i]:zero), "", i (i in B?B[i]:zero)
    }
  }
' file1 file2

This User Gave Thanks to Scrutinizer For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Fill in missing values

Hi, I have a data sample as shown below. I want to fill in the left column so that the line will be continuous. For example, between 1 and 5 should be 2,3,4. And corresponding values in the right column will be 0. Thus the expected data should look like that: 1 1 1 10 1 2 1 3 1 5 1 6 2 0... (6 Replies)
Discussion started by: theanh0508
6 Replies

2. UNIX for Beginners Questions & Answers

Fill in missing hours and interpolate values using awk.

I have a time series data like this 40754,35.6931,51.3092,201610160700,21.0 40754,35.6931,51.3092,201610160800,23.0 40754,35.6931,51.3092,201610160900,24.0 40754,35.6931,51.3092,201610161000,24.0 40754,35.6931,51.3092,201610161300,25.0 40754,35.6931,51.3092,201610161400,23.0... (6 Replies)
Discussion started by: emirzaei
6 Replies

3. Shell Programming and Scripting

How to get the missing date and day in a table?

Hi Am using unix Aix Ksh Have Created table called vv and i have inserted two date Select * from vv; Output :- New_date 21/02/2013 24/02/2013 I have tried Using One query but Unsuccessful so far.. SELECT l.new_date + '1 day' as miss from vv as l (7 Replies)
Discussion started by: Venkatesh1
7 Replies

4. Shell Programming and Scripting

Fill in missing Data

hello everyone, I have a task to input missing data into a file. example of my data below: Wed Feb 01 09:00:02 EST 2012,,,0.4,0.3,,0.3,,0.3,,0.5,,0.3,,,0.4,0.3, Wed Feb 01 09:00:11 EST 2012,,,,,,,0.2,,,,,,,,,, Wed Feb 01 09:00:22 EST... (23 Replies)
Discussion started by: Nolph
23 Replies

5. Shell Programming and Scripting

Fill missing values with 2

Hi All, I have 100 .txt files which look like this: 3 4 5 6 7 Now, some files have some numbers missing in them and they look like this: 4 5 6 (6 Replies)
Discussion started by: shoaibjameel123
6 Replies

6. Shell Programming and Scripting

Fill missing numbers in second column with zeros

Hi All, I have 100 files with names like this: 1.dat, 2.dat, 3.dat until 100.dat. My dat files look like this: 42323 0 438939 1 434 0 0.9383 3434 120.23 3 234 As you can see in the second column, some numbers are missing. I want to fill those missing places with 0's in all... (3 Replies)
Discussion started by: shoaibjameel123
3 Replies

7. Shell Programming and Scripting

Compare columns and rows with template, and fill empty slots.

Hi, I'm working on a script that will take the contents of a file, that is in a row and column format, and compare it to a arrangment file. Such that if there is any or all blanks in my content file, the blank will be filled with a flag and will retain the row and column configuration. Ex. ... (2 Replies)
Discussion started by: hizzle
2 Replies

8. Shell Programming and Scripting

fill in missing columns

It can't be that hard, but I just can't figure it out: I have file like: File Sub-brick M_1 S_1 M_2 S_2 M_4 S_4 ... xxx 1 214 731 228 621 132 578 ... and would like to get 0 0 where M_3 S_3 is missing xxx 1 214 731 228 621 0 0 132 578 ... I wrote following script, but can't figure out... (3 Replies)
Discussion started by: avvk
3 Replies

9. Shell Programming and Scripting

Fill in missing numbers in range

I need to edit a list of numbers on the following form: 1 1.0 2 1.4 5 2.1 7 1.9 I want: 1 1.0 2 1.4 3 0.0 4 0.0 5 2.1 6 0.0 7 1.9 (i want to add the missing number in column 1 together with 0.0 in column 2). I guess it is rather trivial but i didn't even manage to read column... (5 Replies)
Discussion started by: bistru
5 Replies
Login or Register to Ask a Question