Keep only the closest-matching timestamped row (including headers) from file1 to precede the file2 row(s)

08-11-2016

Registered User

446, 232

Join Date: May 2016

Last Activity: 12 May 2020, 4:52 AM EDT

Posts: 446

Thanks Given: 51

Thanked 232 Times in 163 Posts

Quote:

Is this a specific scenario question, because my examples are different?

I'm just asking. If what I'm asking will never happen, just say it.

From your response, I gather that the output file may contain one or more groups of lines like this:

Code:

G_HEADERF1_FIELD1,G_HEADERF1_FIELD2,....
F1_FIELD1,F1_FIELD2,...
HEADERF2_FIELD1,HEADERF2_FIELD2,...
F2_FIELD1,F2_FIELD2,...
F2_FIELD1,F2_FIELD2,...

Now that the task is defined, a solution will come shortly.

stomp

08-11-2016

Registered User

29, 0

Join Date: Jul 2016

Last Activity: 14 July 2017, 9:57 AM EDT

Posts: 29

Thanks Given: 6

Thanked 0 Times in 0 Posts

Here is a simple example with pseudo file data, where column 1 is the virtual timestamp. File1 has a timestamp every 5 seconds and file2 has one every second.

Code:

File1:

Header1
1
5
10
15
20

File2:

Header2
1
2
3
4
5
6
7
8
9
10
11
12

Desired output:

Header1
1
Header2
1
2
3
4

Header1
5
Header2
5
6
7
8
9

Header1
10
Header2
10
11
12
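The rule in this example can be sketched in a few lines: each file2 timestamp goes under the latest file1 timestamp that is less than or equal to it, and the headers are repeated only when that file1 row changes. This is a minimal Python sketch, assuming both files have already been loaded into memory as ascending lists of numeric timestamps. The helper name group_rows is hypothetical, and the blank separator lines between groups are left out.

```python
from bisect import bisect_right


def group_rows(header1, times1, header2, times2):
    """Return the output lines for sorted timestamp lists times1 and times2.

    Each file2 timestamp is placed under the latest file1 timestamp that is
    <= it. Both headers are emitted again whenever that file1 row changes.
    """
    out = []
    current = None  # index into times1 of the group currently being printed
    for t2 in times2:
        # bisect_right returns how many file1 timestamps are <= t2
        i = bisect_right(times1, t2) - 1
        if i < 0:
            continue  # no preceding file1 row; this sketch simply skips it
        if i != current:
            out += [header1, str(times1[i]), header2]
            current = i
        out.append(str(t2))
    return out


if __name__ == "__main__":
    lines = group_rows("Header1", [1, 5, 10, 15, 20],
                       "Header2", list(range(1, 13)))
    print("\n".join(lines))
```

Run on the pseudo data above, this prints the three groups of the desired output (1 with 1 to 4, 5 with 5 to 9, 10 with 10 to 12). The binary search keeps each lookup cheap, but the whole of file1 has to be held in memory.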

aachave1

08-11-2016

Registered User

446, 232

Join Date: May 2016

Last Activity: 12 May 2020, 4:52 AM EDT

Posts: 446

Thanks Given: 51

Thanked 232 Times in 163 Posts

4 is closer to 5 than to 1. Is 1 nevertheless the correct line for 4?

stomp

08-11-2016

Registered User

29, 0

Join Date: Jul 2016

Last Activity: 14 July 2017, 9:57 AM EDT

Posts: 29

Thanks Given: 6

Thanked 0 Times in 0 Posts

"4 is closer to 5 than to 1. Is 1 nevertheless the correct line for 4?"

Yes, 4 is closer to 5, but the match has to be the nearest file1 timestamp that precedes (or equals) the file2 timestamp, not the one after it. So yes, the file1 line with timestamp 1 is the correct one for 4.
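This small Python illustration contrasts the two rules on the first example's file1 timestamps. A plain nearest-timestamp match would pair 4 with 5. The rule asked for here takes the latest timestamp that is less than or equal to the file2 value. The function names are hypothetical. The linear scan is fine for short lists, and sorted data could use a binary search instead.

```python
def nearest(times1, t):
    """Nearest file1 timestamp in either direction (NOT the rule wanted here)."""
    return min(times1, key=lambda x: abs(x - t))


def closest_preceding(times1, t):
    """Latest file1 timestamp that is <= t, or None if there is none."""
    candidates = [x for x in times1 if x <= t]
    return max(candidates) if candidates else None


times1 = [1, 5, 10, 15, 20]
print(nearest(times1, 4))            # 5
print(closest_preceding(times1, 4))  # 1
print(closest_preceding(times1, 5))  # 5 (an equal timestamp counts)
```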

---------- Post updated at 06:16 PM ---------- Previous update was at 05:52 PM ----------

More examples: sometimes the timestamps of the two files will not line up exactly, as in the example below.

Code:

File1:

Header1
1
1.2
1.4
1.6
1.8
2
2.2

File2:

Header2
1.1
1.3
1.5
1.7
1.9
2.1
2.3


Output would be:

Header1
1
Header2
1.1
1.3
1.5
1.7
1.9

Header1
2
Header2
2.1
2.3

Last edited by aachave1; 08-11-2016 at 07:32 PM. Reason: more examples

aachave1

08-11-2016

Registered User

446, 232

Join Date: May 2016

Last Activity: 12 May 2020, 4:52 AM EDT

Posts: 446

Thanks Given: 51

Thanked 232 Times in 163 Posts

I hadn't looked at your solution yet. Thinking about it, awk may not be the best fit here: its normal pattern-action processing reads the input files sequentially, one after the other, not side by side (reading a second file in parallel is possible with getline var < "file2", but it gets awkward quickly). So the usual awk approach is to store the data of one file completely in memory. If the amount of data is small, this is irrelevant.

With a general-purpose scripting language, you can read from either file as needed.

Here's a try in perl:

Code:

#!/usr/bin/env perl
# Usage: ./prog.pl file1 file2
# Prints each file2 row under the closest file1 row whose timestamp is
# earlier than or equal to it, repeating both headers for each new group.
use strict;
use warnings;
use Time::Local qw(timelocal);

open(my $f1, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
open(my $f2, "<", $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";

# Return (epoch seconds, line) for the next timestamped line of a file,
# or an empty list at end of file. Lines without a timestamp are skipped.
sub read_file {
        my ($fh) = @_;
        while (defined(my $data = <$fh>)) {
                my ($y,$m,$d,$H,$M,$S) = $data =~ m#^([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})#
                        or next;
                # timelocal() expects a 0-based month (0..11)
                return (timelocal($S,$M,$H,$d,$m-1,$y), $data);
        }
        return;
}

my $header1 = <$f1>;
my $header2 = <$f2>;

# Current file1 candidate plus a one-row look-ahead
my ($time1, $data1) = read_file($f1);
my ($next_time1, $next_data1) = read_file($f1);
my $last_group;

# Loop over every file2 row, including the last one
while (my ($time2, $data2) = read_file($f2)) {
        # Advance file1 while its next row is not later than this file2 row
        while (defined $next_time1 && $next_time1 <= $time2) {
                ($time1, $data1) = ($next_time1, $next_data1);
                ($next_time1, $next_data1) = read_file($f1);
        }

        my $has_preceding = defined $time1 && $time1 <= $time2;
        my $group = $has_preceding ? $data1 : "No preceding f1-value\n";

        # New group: file1 header and row (or the marker), then the file2 header
        if (!defined $last_group || $last_group ne $group) {
                print $header1 if $has_preceding;
                print $group, $header2;
                $last_group = $group;
        }
        print $data2;
}

Use it like this:

Code:

./prog.pl file1 file2

With your data file1...

Code:

TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339
2014/04/07 16:03:00,0,0,1,572,3,0,1917,20550,57780339
2014/04/07 16:03:05,0,0,1,572,3,0,1917,20550,57785339
2014/04/07 16:03:10,0,0,1,572,3,0,1917,20550,57790339
2014/04/07 16:03:15,0,0,1,572,3,0,1917,20550,57795339

and file2 ...

Code:

TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:12,0,0,1,544,3,0,985,20550,57788894
2014/04/07 16:03:13,0,0,1,544,3,0,985,20550,57793894
2014/04/07 16:03:14,0,0,1,544,3,0,985,20550,57794894
2014/04/07 16:03:15,0,0,1,544,3,0,985,20550,57795894
2014/04/07 16:03:16,0,0,1,544,3,0,985,20550,57796894
2014/04/07 16:03:17,0,0,1,544,3,0,985,20550,57797894

the output is...

Code:

TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:03:10,0,0,1,572,3,0,1917,20550,57790339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:12,0,0,1,544,3,0,985,20550,57788894
2014/04/07 16:03:13,0,0,1,544,3,0,985,20550,57793894
2014/04/07 16:03:14,0,0,1,544,3,0,985,20550,57794894
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,572,3,0,1917,20550,57795339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,544,3,0,985,20550,57795894
2014/04/07 16:03:16,0,0,1,544,3,0,985,20550,57796894
2014/04/07 16:03:17,0,0,1,544,3,0,985,20550,57797894
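The same read-as-needed idea can be sketched in Python as a single pass that keeps only one look-ahead row of file1 in memory. This sketch assumes both inputs are sorted, start with a header line, and begin each data row with a YYYY/MM/DD HH:MM:SS timestamp as in the sample data above. That format is fixed-width and zero-padded, so the timestamp strings compare correctly as plain text and no date parsing is needed. The name merge_stream is hypothetical.

```python
def merge_stream(f1, f2):
    """Yield output lines while reading both inputs lazily, in one pass.

    Each f2 row is placed under the latest f1 row whose timestamp is <= its
    own; the two headers are repeated whenever that f1 row changes. Only one
    look-ahead row of f1 is held in memory. Assumes both inputs are sorted and
    each data row starts with a 'YYYY/MM/DD HH:MM:SS' timestamp.
    """
    header1, header2 = next(f1), next(f2)

    def rows(f):
        for line in f:
            if line.strip():
                # Fixed-width, zero-padded timestamps compare correctly as text
                yield line[:19], line

    r1 = rows(f1)
    cur = None              # latest f1 row with timestamp <= current f2 row
    nxt = next(r1, None)    # one-row look-ahead into f1
    printed = None          # f1 row whose group is currently open
    for t2, line2 in rows(f2):
        while nxt is not None and nxt[0] <= t2:
            cur, nxt = nxt, next(r1, None)
        if cur is None:
            continue        # no preceding f1 row; this sketch skips it
        if cur != printed:
            yield header1
            yield cur[1]
            yield header2
            printed = cur
        yield line2
```

With real files, something like sys.stdout.writelines(merge_stream(f1, f2)) inside a with open(...) block would print the grouped output. The last file2 row goes through the same loop as every other row, so it needs no special handling after the loop.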

Last edited by stomp; 08-11-2016 at 10:56 PM.

stomp

08-12-2016

Registered User

29, 0

Join Date: Jul 2016

Last Activity: 14 July 2017, 9:57 AM EDT

Posts: 29

Thanks Given: 6

Thanked 0 Times in 0 Posts

Stomp, that came very close to working with two of my "real" files, except that it missed the very last match in the output. For example, the file1 row "16:31:20" is matched with the file2 row "16:31:20" just fine. However, the file1 row "16:31:25" is missing right above the file2 row "16:31:25" (the last line below).

I verified that file1 does have a "16:31:25" row, but it was left out.

Code:

TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:31:20,0,0,1,572,3,0,1917,20550,57795339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:31:20,0,0,1,544,3,0,985,20550,57795894
2014/04/07 16:31:25,0,0,1,544,3,0,985,20550,57796894

Also, I know you don't have my "real" files, but it would be more accurate to match on column 21 of my files (not shown in these snippets), since that is a msec column. How would I match on column 21 of both files in your code? I did it with my awk code above, but I'm not sure how to do it with your Perl code.
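Switching the comparison to a millisecond column only changes how the key is extracted from each line. This is a minimal Python sketch with several assumptions. It assumes comma-separated rows with no quoted commas, and that column 21 (1-based, so index 20) holds an integer millisecond value. As a hedged suggestion, it also allows pairing that value with a day column. In the sample rows in this thread, G_CCSDS_MSEC looks like milliseconds since midnight (57775339 ms is 16:02:55.339), so it would wrap around at midnight on its own. The function row_key and its day_col and msec_col parameters are hypothetical.

```python
def row_key(line, msec_col=21, day_col=None):
    """Return a comparison key for one comma-separated row.

    Column numbers are 1-based, as in the thread. msec_col is assumed to hold
    an integer millisecond value; if day_col is given, the key becomes
    (day, msec) so that a milliseconds-since-midnight counter still orders
    correctly across midnight.
    """
    fields = line.rstrip("\n").split(",")
    msec = int(fields[msec_col - 1])
    if day_col is None:
        return msec
    return (int(fields[day_col - 1]), msec)


# Sample row from this thread: the day counter (G_CCSDS_DOY) is column 9
# and G_CCSDS_MSEC is column 10 here; the real files use other positions.
row = "2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339"
print(row_key(row, msec_col=10))             # 57775339
print(row_key(row, msec_col=10, day_col=9))  # (20550, 57775339)
```

In the Perl script, the corresponding change would be to have read_file return a field such as (split /,/, $data)[20] as the comparison value instead of the timelocal() result, again assuming the fields contain no embedded commas.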

Thank you!

Last edited by aachave1; 08-12-2016 at 02:47 AM.

aachave1

08-13-2016

Registered User

446, 232

Join Date: May 2016

Last Activity: 12 May 2020, 4:52 AM EDT

Posts: 446

Thanks Given: 51

Thanked 232 Times in 163 Posts

If you post the data files that aren't working, I may take a look.

stomp
