How to combine and insert missing consecutive numbers - awk or script?


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers How to combine and insert missing consecutive numbers - awk or script?
# 1  
Old 05-19-2013
How to combine and insert missing consecutive numbers - awk or script?

Hi all,

I have two (2) sets of files that are based on some snapshots of database that I want to merge and insert any missing sequential number.

Below are example representation of these files:

Code:
 
file1:
DATE          TIME    COL1    COL2  COL3  COL4   ID
01/10/2013    0800    100     200   300   401    112
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
02/15/2013    1500    199     456   234   555    105
 
file2:
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/10/2013    0800    100     200   300   401    115
07/31/2013    1000    201     123   345   456    111
08/05/2013    1100    150     789   311   789    108
12/15/2013    1500    199     456   234   555    101
 
Desired output
DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
----------    ----    ----    ----  ----  ----   102
----------    ----    ----    ----  ----  ----   103
----------    ----    ----    ----  ----  ----   104
02/15/2013    1500    199     456   234   555    105
----------    ----    ----    ----  ----  ----   106
01/31/2013    1000    201     123   345   456    107
08/05/2013    1100    150     789   311   789    108
03/05/2013    1100    150     789   311   789    109
----------    ----    ----    ----  ----  ----   110
07/31/2013    1000    201     123   345   456    111
01/10/2013    0800    100     200   300   401    112
----------    ----    ----    ----  ----  ----   113
----------    ----    ----    ----  ----  ----   114
02/10/2013    0800    100     200   300   401    115

I did a sort -n -k7 file1 > file1.a and sort -n -k7 file2 > file2.a

Code:
 
$:>  cat file1
DATE          TIME    COL1    COL2  COL3  COL4   ID
01/10/2013    0800    100     200   300   401    112
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
02/15/2013    1500    199     456   234   555    105
$:>  cat file2
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/10/2013    0800    100     200   300   401    115
07/31/2013    1000    201     123   345   456    111
08/05/2013    1100    150     789   311   789    108
12/15/2013    1500    199     456   234   555    101
$:>  sort -n -k7 file1 > file1.a
$:>  sort -n -k7 file2 > file2.a
$:>  cat file1.a
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/15/2013    1500    199     456   234   555    105
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
01/10/2013    0800    100     200   300   401    112
$:>  cat file2.a
DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
08/05/2013    1100    150     789   311   789    108
07/31/2013    1000    201     123   345   456    111
02/10/2013    0800    100     200   300   401    115
$:>  comm -3 file1.a file2.a
02/15/2013    1500    199     456   234   555    105
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
01/10/2013    0800    100     200   300   401    112
        12/15/2013    1500    199     456   234   555    101
        08/05/2013    1100    150     789   311   789    108
        07/31/2013    1000    201     123   345   456    111
        02/10/2013    0800    100     200   300   401    115
$:>  comm -3 file1.a file2.a | awk '{ print $1,$2,$3,$4,$5,$6,$7 }' | sort -n -k 7 > file3.a
$:>  cat file3.a
12/15/2013 1500 199 456 234 555 101
02/15/2013 1500 199 456 234 555 105
01/31/2013 1000 201 123 345 456 107
08/05/2013 1100 150 789 311 789 108
03/05/2013 1100 150 789 311 789 109
07/31/2013 1000 201 123 345 456 111
01/10/2013 0800 100 200 300 401 112
02/10/2013 0800 100 200 300 401 115

Not sure why comm -3 file1.a file2.a give the spaces so I had to do the comm | awk thing instead.

Now I need to include the missing number sequence, can this be done via awk or does it has to be scripted, i.e. using head -1 to get the starting ID number and tail -1 to get the ending ID number and then check for the missing ID number in the sequence?

Perhaps there is a more intelligent way of doing what am trying to do? Smilie

Any advise much appreciated. Thanks in advance.
# 2  
Old 05-19-2013
You don't need comm nor multiple sort commands. Just one sort with both files as arguments whose output is piped directly into awk.

Within awk, you would have to track the value of the last field, $NF, to determine when to generate the missing lines.

Regards,
Alister
# 3  
Old 05-20-2013
Code:
awk 'FNR==1 {next} length($0)' file1 file2 |sort -n -k7 > tmp
awk 'BEGIN{dummy="--------      ------  ----    ----  ----  ----  "; n=0}
     FNR==1 {n=$(NF); 
             print "DATE          TIME    COL1    COL2  COL3  COL4   ID"
             print $0
             next}
            { while(++n < $(NF)) {print dummy, n}
              print $0
            } ' tmp > newfile

output:
Code:
$ cat newfile

DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
--------      ------  ----    ----  ----  ----   102
--------      ------  ----    ----  ----  ----   103
--------      ------  ----    ----  ----  ----   104
02/15/2013    1500    199     456   234   555    105
--------      ------  ----    ----  ----  ----   106
01/31/2013    1000    201     123   345   456    107
08/05/2013    1100    150     789   311   789    108
03/05/2013    1100    150     789   311   789    109
--------      ------  ----    ----  ----  ----   110
07/31/2013    1000    201     123   345   456    111
01/10/2013    0800    100     200   300   401    112
--------      ------  ----    ----  ----  ----   113
--------      ------  ----    ----  ----  ----   114
02/10/2013    0800    100     200   300   401    115

This User Gave Thanks to jim mcnamara For This Post:
# 4  
Old 05-21-2013
Hi Jim,

Thanks a lot.

I tried yours and it works exactly the way I wanted it to. It was giving errors until I used nawk and it comes out exactly like the way you have on the screen.

---------- Post updated at 01:25 AM ---------- Previous update was at 01:11 AM ----------

Quote:
Originally Posted by jim mcnamara
Code:
awk 'FNR==1 {next} length($0)' file1 file2 |sort -n -k7 > tmp
awk 'BEGIN{dummy="--------      ------  ----    ----  ----  ----  "; n=0}
     FNR==1 {n=$(NF); 
             print "DATE          TIME    COL1    COL2  COL3  COL4   ID"
             print $0
             next}
            { while(++n < $(NF)) {print dummy, n}
              print $0
            } ' tmp > newfile

output:
Code:
 
 
$ cat newfile
 
DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
--------      ------  ----    ----  ----  ----   102
--------      ------  ----    ----  ----  ----   103
--------      ------  ----    ----  ----  ----   104
02/15/2013    1500    199     456   234   555    105
--------      ------  ----    ----  ----  ----   106
01/31/2013    1000    201     123   345   456    107
08/05/2013    1100    150     789   311   789    108
03/05/2013    1100    150     789   311   789    109
--------      ------  ----    ----  ----  ----   110
07/31/2013    1000    201     123   345   456    111
01/10/2013    0800    100     200   300   401    112
--------      ------  ----    ----  ----  ----   113
--------      ------  ----    ----  ----  ----   114
02/10/2013    0800    100     200   300   401    115

Hi Jim,

- Just curious, what is the purpose of length($0)?

---------- Post updated at 01:29 AM ---------- Previous update was at 01:25 AM ----------

Quote:
Originally Posted by alister
You don't need comm nor multiple sort commands. Just one sort with both files as arguments whose output is piped directly into awk.

Within awk, you would have to track the value of the last field, $NF, to determine when to generate the missing lines.

Regards,
Alister
Thanks Alister, didn't realize I can do sort file1 file2 Smilie

I've tried Jim's suggestion and it works fine, albeit with nawk. Just trying to see how I can get it to work usinig just awk this time 'coz some of the servers has only awk.
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Check/print missing number in a consecutive range and remove duplicate numbers

Hi, In an ideal scenario, I will have a listing of db transaction log that gets copied to a DR site and if I have them all, they will be numbered consecutively like below. 1_79811_01234567.arc 1_79812_01234567.arc 1_79813_01234567.arc 1_79814_01234567.arc 1_79815_01234567.arc... (3 Replies)
Discussion started by: newbie_01
3 Replies

2. Shell Programming and Scripting

awk to insert missing string based on pattern in file

Using the file below, which will always have the first indicated by the digit after the - and last id in it, indicated by the digit after the -, I am trying to use awk to print the missing line or lines in file following the pattern of the previous line. For example, in the file below the next... (4 Replies)
Discussion started by: cmccabe
4 Replies

3. UNIX for Dummies Questions & Answers

Sum every 3 consecutive numbers in a column

Dear All, I have a file with only one column. And I want to add every 3 consecutive numbers together and print the result. Input File: 21.1 10 10 55 11 99 10 8 4 Expected Output: 41.1 (5 Replies)
Discussion started by: NamS
5 Replies

4. Shell Programming and Scripting

Help with awk script to get missing numbers in column 1

Hello to all, I have show below a file separated by commas. In first column has numbers where the last number is 13. 1,4 2,6 3,7 5,2 6,5 7,5 8,65 9,10 11,78 13,2 What I want to know is which numbers are missing from 1 to 13 (in this case 13 is last number in column 1). My real... (17 Replies)
Discussion started by: Ophiuchus
17 Replies

5. Shell Programming and Scripting

Adding the corresponding values for every 5th consecutive numbers

Dear All, I have a file which is as follows: Input File: 231 100.1 233 99 235 200.9 238 80.1 239 90.2 240 77.0 243 99.5 245 16.20 246 13.55 247 11.8 249 13.7 250 99.6 (1 Reply)
Discussion started by: NamS
1 Replies

6. Shell Programming and Scripting

Disruption of consecutive numbers

I do have a tab delimited file with the following format 200 46 201 67 204 89 205 98 206 89 208 890 210 23 .. ... 100's of rows I would like to output the missing consecutive number of the first column. The expected output will be: (1 Reply)
Discussion started by: Lucky Ali
1 Replies

7. Shell Programming and Scripting

Print consecutive numbers in column2

Hi, I have an input file of the following style input.txt The 4000 at the end indicates the total no. of columns in that row. I would like to replace all -1s with consecutive 1 and 2 and print the whole line again. So, the output would be output.txt Thanks in advance. (7 Replies)
Discussion started by: jacobs.smith
7 Replies

8. Shell Programming and Scripting

Insert missing field using perl,sed,awk

sample file (comma as field separators) MessageFlow,1,BusIntBatchMgr,a OOBEvent,1,BusIntBatchMgr,a TaskEvents,1,,a MTTrace,1,,a MTWarning,,1,a MessageFlow,1,Batch,a OOBEvent,1,Batch,a TaskEvents,1,,a EAISAPIdocWizard,1,BusIntMgr,a EAISAPBAPIWizard,1,BusIntMgr,a... (3 Replies)
Discussion started by: vrclm
3 Replies

9. Shell Programming and Scripting

Script in SED and AWK so that it treats consecutive delimiters as one

Hi All, I am trying to cut to do a cut operation, but since there are seems to be more than one deltimiters in some occasion I am not able to get the exact field. Can you please provide an SED and AWK script for treating the source file in such a way that all consecutive delimiters are treated... (3 Replies)
Discussion started by: rakesh.su30
3 Replies

10. Shell Programming and Scripting

ksh/awk help - output missing numbers

Here is what I am trying to do: I have a list of numbers that I pulled from an awk command in a column like so: 1 3 4 7 8 I want to find which numbers in the list are missing out of a range. So let's say I want to find out from the list above which numbers are missing from the... (6 Replies)
Discussion started by: afavis
6 Replies
Login or Register to Ask a Question