How to combine and insert missing consecutive numbers

05-19-2013

How to combine and insert missing consecutive numbers - awk or script?

Hi all,

I have two sets of files, taken from database snapshots, that I want to merge into one listing sorted by ID, inserting a placeholder row for every ID missing from the sequence.

Below is an example of these files and the output I want:

Code:

 
file1:
DATE          TIME    COL1    COL2  COL3  COL4   ID
01/10/2013    0800    100     200   300   401    112
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
02/15/2013    1500    199     456   234   555    105
 
file2:
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/10/2013    0800    100     200   300   401    115
07/31/2013    1000    201     123   345   456    111
08/05/2013    1100    150     789   311   789    108
12/15/2013    1500    199     456   234   555    101
 
Desired output
DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
----------    ----    ----    ----  ----  ----   102
----------    ----    ----    ----  ----  ----   103
----------    ----    ----    ----  ----  ----   104
02/15/2013    1500    199     456   234   555    105
----------    ----    ----    ----  ----  ----   106
01/31/2013    1000    201     123   345   456    107
08/05/2013    1100    150     789   311   789    108
03/05/2013    1100    150     789   311   789    109
----------    ----    ----    ----  ----  ----   110
07/31/2013    1000    201     123   345   456    111
01/10/2013    0800    100     200   300   401    112
----------    ----    ----    ----  ----  ----   113
----------    ----    ----    ----  ----  ----   114
02/10/2013    0800    100     200   300   401    115

So far I have sorted each file on the ID column, with sort -n -k7 file1 > file1.a and sort -n -k7 file2 > file2.a, and then combined the results using comm:

Code:

 
$:>  cat file1
DATE          TIME    COL1    COL2  COL3  COL4   ID
01/10/2013    0800    100     200   300   401    112
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
02/15/2013    1500    199     456   234   555    105
$:>  cat file2
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/10/2013    0800    100     200   300   401    115
07/31/2013    1000    201     123   345   456    111
08/05/2013    1100    150     789   311   789    108
12/15/2013    1500    199     456   234   555    101
$:>  sort -n -k7 file1 > file1.a
$:>  sort -n -k7 file2 > file2.a
$:>  cat file1.a
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/15/2013    1500    199     456   234   555    105
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
01/10/2013    0800    100     200   300   401    112
$:>  cat file2.a
DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
08/05/2013    1100    150     789   311   789    108
07/31/2013    1000    201     123   345   456    111
02/10/2013    0800    100     200   300   401    115
$:>  comm -3 file1.a file2.a
02/15/2013    1500    199     456   234   555    105
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
01/10/2013    0800    100     200   300   401    112
        12/15/2013    1500    199     456   234   555    101
        08/05/2013    1100    150     789   311   789    108
        07/31/2013    1000    201     123   345   456    111
        02/10/2013    0800    100     200   300   401    115
$:>  comm -3 file1.a file2.a | awk '{ print $1,$2,$3,$4,$5,$6,$7 }' | sort -n -k 7 > file3.a
$:>  cat file3.a
12/15/2013 1500 199 456 234 555 101
02/15/2013 1500 199 456 234 555 105
01/31/2013 1000 201 123 345 456 107
08/05/2013 1100 150 789 311 789 108
03/05/2013 1100 150 789 311 789 109
07/31/2013 1000 201 123 345 456 111
01/10/2013 0800 100 200 300 401 112
02/10/2013 0800 100 200 300 401 115

I'm not sure why comm -3 file1.a file2.a indents the lines from the second file, so I had to pipe comm through awk to strip the leading whitespace.
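On the indentation: comm writes its output in columns, and lines found only in the second file go in column 2, which is prefixed with a tab. A minimal sketch with two made-up files (the names one and two are hypothetical) makes the tab visible by turning it into ">":

```shell
work=$(mktemp -d)                 # scratch directory for the made-up files
printf 'a\nb\n' > "$work/one"     # lines a, b
printf 'b\nc\n' > "$work/two"     # lines b, c

# -3 suppresses column 3 (lines common to both files).
out=$(comm -3 "$work/one" "$work/two")

# Show each tab as ">" so the column-2 prefix is easy to see.
printf '%s\n' "$out" | tr '\t' '>'
```

Because awk splits fields on runs of blanks and tabs by default, printing $1 to $7 discards that prefix, which is why the comm | awk step tidies the output.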

Now I need to include the missing numbers in the sequence. Can this be done in awk, or does it have to be scripted, i.e. using head -1 to get the starting ID, tail -1 to get the ending ID, and then checking for each missing ID in between?
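For comparison, the head/tail idea could be sketched roughly as below. This is only an illustration under assumptions: the input is a header-less file already sorted by ID (the first three rows of file3.a are used here), and the placeholder text is made up.

```shell
work=$(mktemp -d)
cat > "$work/file3.a" <<'EOF'
12/15/2013 1500 199 456 234 555 101
02/15/2013 1500 199 456 234 555 105
01/31/2013 1000 201 123 345 456 107
EOF

# Lowest and highest ID come from the first and last line of the sorted file.
first=$(head -n 1 "$work/file3.a" | awk '{print $NF}')
last=$(tail -n 1 "$work/file3.a" | awk '{print $NF}')

awk -v first="$first" -v last="$last" '
    BEGIN { dummy = "---------- ---- --- --- --- ---" }   # hypothetical placeholder
    { row[$NF] = $0 }                                      # remember each row by its ID
    END {
        # Walk the whole ID range and print the stored row or a placeholder.
        for (id = first + 0; id <= last + 0; id++)
            print ((id in row) ? row[id] : dummy " " id)
    }
' "$work/file3.a" > "$work/filled"
cat "$work/filled"
```

This holds every row in memory and needs an awk that accepts -v, so it is more of a cross-check than a replacement for a streaming approach.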

Perhaps there is a smarter way of doing what I am trying to do?

Any advice is much appreciated. Thanks in advance.

newbie_01

05-19-2013

You need neither comm nor multiple sort commands. A single sort with both files as arguments, with its output piped directly into awk, is enough.

Within awk, you would have to track the value of the last field, $NF, to determine when to generate the missing lines.
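A minimal sketch of that single pipeline, using the sample data from the first post. The scratch directory, the variable names (prev, seen, hdr) and the handling of the two header lines are assumptions made for illustration, not something spelled out in the thread:

```shell
work=$(mktemp -d)    # scratch directory holding copies of the sample files
cat > "$work/file1" <<'EOF'
DATE          TIME    COL1    COL2  COL3  COL4   ID
01/10/2013    0800    100     200   300   401    112
01/31/2013    1000    201     123   345   456    107
03/05/2013    1100    150     789   311   789    109
02/15/2013    1500    199     456   234   555    105
EOF
cat > "$work/file2" <<'EOF'
DATE          TIME    COL1    COL2  COL3  COL4   ID
02/10/2013    0800    100     200   300   401    115
07/31/2013    1000    201     123   345   456    111
08/05/2013    1100    150     789   311   789    108
12/15/2013    1500    199     456   234   555    101
EOF

# One numeric sort on field 7; the non-numeric header lines sort first.
sort -n -k7 "$work/file1" "$work/file2" |
awk '
    BEGIN { dummy = "----------    ----    ----    ----  ----  ----  " }
    NF == 0 { next }                                 # ignore blank lines
    $NF !~ /^[0-9]+$/ { if (!hdr++) print; next }    # header lines: keep only the first
    seen { for (id = prev + 1; id < $NF; id++) print dummy, id }   # fill the gap
    { print; prev = $NF; seen = 1 }                  # print the row, remember its ID
' > "$work/merged"
cat "$work/merged"
```

Keeping the previous ID in prev rather than incrementing a shared counter means a repeated ID in the input would not cause a later gap to be skipped.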

Regards,
Alister

alister

05-20-2013

Code:

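# Skip each file's header line (FNR==1) and any blank lines, then sort the data rows numerically by ID (field 7).
awk 'FNR==1 {next} length($0)' file1 file2 | sort -n -k7 > tmp
awk 'BEGIN{dummy="--------      ------  ----    ----  ----  ----  "; n=0}
     # First row: remember its ID, print the header, then the row itself.
     FNR==1 {n=$(NF)
             print "DATE          TIME    COL1    COL2  COL3  COL4   ID"
             print $0
             next}
            # Later rows: print one placeholder line for every ID missing before this row.
            { while(++n < $(NF)) {print dummy, n}
              print $0
            } ' tmp > newfile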

output:

Code:

$ cat newfile

DATE          TIME    COL1    COL2  COL3  COL4   ID
12/15/2013    1500    199     456   234   555    101
--------      ------  ----    ----  ----  ----   102
--------      ------  ----    ----  ----  ----   103
--------      ------  ----    ----  ----  ----   104
02/15/2013    1500    199     456   234   555    105
--------      ------  ----    ----  ----  ----   106
01/31/2013    1000    201     123   345   456    107
08/05/2013    1100    150     789   311   789    108
03/05/2013    1100    150     789   311   789    109
--------      ------  ----    ----  ----  ----   110
07/31/2013    1000    201     123   345   456    111
01/10/2013    0800    100     200   300   401    112
--------      ------  ----    ----  ----  ----   113
--------      ------  ----    ----  ----  ----   114
02/10/2013    0800    100     200   300   401    115

jim mcnamara

05-21-2013

Hi Jim,

Thanks a lot.

I tried your solution and it does exactly what I wanted. It gave errors with awk, but once I switched to nawk the output matched what you posted.

---------- Post updated at 01:25 AM ---------- Previous update was at 01:11 AM ----------

Hi Jim,

- Just curious, what is the purpose of length($0)?
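For context, an awk pattern with no action prints every line for which the pattern is true, and length($0) is zero (false) only for an empty line. A quick sketch with made-up input shows the effect of that first awk command:

```shell
# Hypothetical four-line input: a header, a data row, an empty line, another data row.
out=$(printf 'HEADER\nrow1\n\nrow2\n' | awk 'FNR==1 {next} length($0)')

# Only the two data rows survive: the header is skipped by FNR==1 {next},
# and the empty line fails the length($0) test.
printf '%s\n' "$out"
```

A line containing only spaces still has a non-zero length, so it would pass through this filter.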

---------- Post updated at 01:29 AM ---------- Previous update was at 01:25 AM ----------

Thanks Alister, I didn't realize I could do sort file1 file2.

I've tried Jim's suggestion and it works fine, albeit only with nawk. I'm now trying to get it to work with plain awk, because some of the servers have only awk.

newbie_01
