Keep only the closest match of timestamped row (include headers) from file1 to precede file2 row/s

08-13-2016

I'm trying to remote into my work PC at the moment and am having connection issues.

As for my files, I found that it seems to work fine if file1 is smaller than file2, but if file1 is larger than file2, it misses the last match between the files (as described above).

In my situation, there will be times when file1 is larger than file2 and times when file2 is larger, so I need the solution to adapt to either case. Thanks!

aachave1

08-14-2016

Here is another possibility, a bit more stream-oriented:

Code:

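perl -MTime::Local -pe '
BEGIN{$fc=1}
# timelocal() expects the month as 0..11, hence $3-1
s#(([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}))#timelocal($7,$6,$5,$4,$3-1,$2).",$fc,$1"#e;
$fc++ if eof;' FILE1 FILE2 | \
sort -n | \
awk -F, 'FNR<3{print $0;next}                # pass the two header lines through
SRC_OLD==1 && SRC_OLD!=$2 {print LINE_OLD}   # keep the newest file1 line before a file2 line
$2==2 {print $0} {SRC_OLD=$2;LINE_OLD=$0}' | \
awk -F, 'BEGIN{OFS="\n"}
FNR==1{H2=$0;next}                           # assumes the FILE2 header sorts first
FNR==2{H1=$0;next}                           # and the FILE1 header second
{FID=$2;sub(/^([^,]+,){2}/,"")}              # strip epoch and file number
FID==1{print H1,$0,H2;next}                  # wrap each file1 line in the headers
{print}'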

Explanation of the steps:

1. Use Perl to convert the timestamp of each line to epoch seconds and put it, together with the file number, at the beginning of the line.
2. Sort the lines numerically by epoch and file number (the headers stay first; if the headers change, this may break).
3. From each run of consecutive file1 lines, keep only the newest one (the line immediately before the next file2 line).
4. Detect the headers, print them before and after every remaining file1 line, and print the data without the additional fields.
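To make steps 3 and 4 concrete, here is a minimal sketch that feeds a hand-made stream through an awk stage doing both steps in one pass. The stream is assumed to be already tagged as epoch,file number,original line and already sorted, with the FILE2 header on line 1 and the FILE1 header on line 2 (the order sort -n happens to produce for the sample headers). The rows, the tiny epoch values and the function name pick_closest are hypothetical. The regex spells out the two leading fields instead of using an interval expression such as ([^,]+,){2}, which some older awk implementations do not support.

```shell
# Hypothetical helper: steps 3 and 4 on an already tagged and sorted stream.
pick_closest() {
  awk -F, 'BEGIN { OFS = "\n" }
    FNR == 1 { H2 = $0; next }            # header of FILE2 (sorted first)
    FNR == 2 { H1 = $0; next }            # header of FILE1 (sorted second)
    {
      FID = $2                            # file number added by the tagging step
      sub(/^[^,]*,[^,]*,/, "")            # drop epoch and file number
    }
    # first FILE2 row after a run of FILE1 rows: emit the newest FILE1 row
    FID_OLD == 1 && FID != 1 { print H1, LINE_OLD, H2 }
    FID == 2 { print }                    # every FILE2 row is kept
    { FID_OLD = FID; LINE_OLD = $0 }'
}

# Hypothetical sample stream (tiny epochs instead of real ones).
printf '%s\n' \
  'TIMEFORMATTED,CCSDS_VERSION' \
  'TIMEFORMATTED,G_CCSDS_VERSION' \
  '100,1,f1-a' \
  '105,1,f1-b' \
  '107,2,f2-x' \
  '110,2,f2-y' \
  '112,1,f1-c' \
  '115,2,f2-z' | pick_closest
# Output:
# TIMEFORMATTED,G_CCSDS_VERSION
# f1-b
# TIMEFORMATTED,CCSDS_VERSION
# f2-x
# f2-y
# TIMEFORMATTED,G_CCSDS_VERSION
# f1-c
# TIMEFORMATTED,CCSDS_VERSION
# f2-z
```

Note that f1-a disappears because f1-b, a newer file1 row, follows it before any file2 row does.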

...and a slightly optimized version (no need for two awk calls):

Code:

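perl -MTime::Local -pe '
BEGIN{$fc=1}
# timelocal() expects the month as 0..11, hence $3-1
s#(([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}))#timelocal($7,$6,$5,$4,$3-1,$2).",$fc,$1"#e;
$fc++ if eof;' FILE1 FILE2 | \
sort -n | \
awk -F, 'BEGIN{OFS="\n"}
FNR==1{H2=$0;next}                           # assumes the FILE2 header sorts first
FNR==2{H1=$0;next}                           # and the FILE1 header second
{FID=$2;sub(/^([^,]+,){2}/,"")}              # strip epoch and file number
FID_OLD==1 && FID!=1 {print H1,LINE_OLD,H2}  # newest file1 line before a file2 line
FID==2 {print $0}
{FID_OLD=FID;LINE_OLD=$0}'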

Last edited by stomp; 08-14-2016 at 12:50 PM. Reason: Compacted code and separated the different steps.

stomp

08-14-2016

Thanks, stomp, for all these options! This latest script above matches the data correctly; however, if file2 is larger than file1, the headers are swapped (file1's header appears with file2's data and vice versa). It works fine if file1 is larger than file2.

I will try to fix code so that it adapts to either file being larger/smaller than the other. I will reply if I have any luck.

aachave1

08-14-2016

After finally looking at your original attempt, I realize that creating an epoch timestamp is not necessary here at all. The dates in file1/file2 are sortable without any transformation.
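A quick way to convince yourself of this, assuming the timestamps are always zero-padded YYYY/MM/DD HH:MM:SS as in the sample lines of this thread: a plain byte-wise string sort already puts them in chronological order. The 2013 timestamp below is made up for illustration.

```shell
# Zero-padded "YYYY/MM/DD HH:MM:SS" strings compare character by character
# in chronological order, so a plain string sort needs no epoch step.
# LC_ALL=C forces byte-wise comparison regardless of the user locale.
printf '%s\n' \
  '2014/04/07 16:03:05' \
  '2013/12/31 23:59:59' \
  '2014/04/07 16:02:55' |
LC_ALL=C sort
# 2013/12/31 23:59:59
# 2014/04/07 16:02:55
# 2014/04/07 16:03:05
```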

Quote:

I will try to fix code so that it adapts to either file being larger/smaller than the other.

OK, I'll leave some of the fun for you. But it has nothing to do with either file being larger or smaller; it is probably point 2 of the list in my last post.

Last edited by stomp; 08-14-2016 at 04:38 PM.

stomp

08-14-2016

Okay, but why does it only have issues when one file is larger than the other, but work fine the other way around? This happens with both versions of your code. Any hints as to why?

Where can I actually attach my "real" files to this forum?

I will try a few things now. Thanks!

aachave1

08-14-2016

Quote:

Okay, but why does it only have issues when one file is larger than the other

I suppose that's a coincidence.

Code:

TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
1399471375,1,2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339
1399471380,1,2014/04/07 16:03:00,0,0,1,572,3,0,1917,20550,57780339
1399471385,1,2014/04/07 16:03:05,0,0,1,572,3,0,1917,20550,57785339

If you sort the above with the numeric sort I chose (sort -n), both of the first two lines have the numeric value 0. sort then falls back to comparing the whole lines as strings, and the header line of FILE2 (CCSDS_VERSION) is smaller than the header line of FILE1 (G_CCSDS_VERSION); this is decided at the first character where they differ (C < G). If that character in FILE2's header were greater than G (H, I, ...), the header lines would be wrongly switched.

The rest of the file is sorted correctly because the generated epoch timestamps sort exactly as numbers.

You can verify this by, for example, inserting an "A" at the beginning of the second field of the header line in your problematic file2. The output should then be correct.

If my diagnosis is correct, the obvious question is: What can be done about this error here?
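A small check of this diagnosis, and of one possible remedy, assuming GNU or BSD sort: their -s option (stable sort; not part of POSIX) disables the last-resort whole-line comparison, so lines whose keys are equal keep their input order. Since FILE1 is read first, its header would then stay on line 1, and the awk stage would have to assign FNR==1 to H1 and FNR==2 to H2 instead. The helper name headers is hypothetical; the header text comes from the sample above.

```shell
# Hypothetical helper: the two header lines in the order they reach sort
# (FILE1 is read first, so its header comes first).
headers() {
  printf '%s\n' \
    'TIMEFORMATTED,G_CCSDS_VERSION' \
    'TIMEFORMATTED,CCSDS_VERSION'
}

# sort -n: neither line starts with a number, so both keys are 0 and sort
# falls back to comparing whole lines; "C" < "G" moves the FILE2 header up.
headers | LC_ALL=C sort -n
# TIMEFORMATTED,CCSDS_VERSION
# TIMEFORMATTED,G_CCSDS_VERSION

# sort -s -n: no last-resort comparison, equal keys keep their input order.
headers | LC_ALL=C sort -s -n
# TIMEFORMATTED,G_CCSDS_VERSION
# TIMEFORMATTED,CCSDS_VERSION
```

Data lines are unaffected by -s as long as their epochs differ; for equal epochs, input order still puts the FILE1 line before the FILE2 line.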

---

You can attach files to your post in "Advanced mode".

Last edited by stomp; 08-14-2016 at 06:37 PM.

stomp

08-14-2016

Yeah, I think the file-size pattern may be a coincidence caused by my headers. And yes, the epoch is probably unnecessary, but I'm not that familiar with time manipulation in Perl. To remove the epoch conversion below, do I just remove the "red" portion?

Code:

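perl -MTime::Local -pe '
BEGIN{$fc=1}
# timelocal() expects the month as 0..11, hence $3-1
s#(([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}))#timelocal($7,$6,$5,$4,$3-1,$2).",$fc,$1"#e;
$fc++ if eof;' FILE1 FILE2 | \
sort -n | \
awk -F, 'BEGIN{OFS="\n"}
FNR==1{H2=$0;next}                           # assumes the FILE2 header sorts first
FNR==2{H1=$0;next}                           # and the FILE1 header second
{FID=$2;sub(/^([^,]+,){2}/,"")}              # strip epoch and file number
FID_OLD==1 && FID!=1 {print H1,LINE_OLD,H2}  # newest file1 line before a file2 line
FID==2 {print $0}
{FID_OLD=FID;LINE_OLD=$0}'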

Also, my headers differ a little between files: some have three more columns after the TIMEFORMATTED column.
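For the question of dropping the epoch conversion, here is a minimal sketch that tags the lines with awk instead of Perl and sorts on the timestamp string itself. It assumes (not confirmed in the thread) that the timestamp is the first comma-separated field of every data line and that line 1 of each file is its header; the function name closest_rows is hypothetical. Since only field 1 serves as the key, extra columns after TIMEFORMATTED make no difference, and because each header gets the key 0 plus its file number, FILE1's header always sorts first instead of depending on the header text.

```shell
# Hypothetical helper; usage: closest_rows FILE1 FILE2 > result.csv
# Stage 1 tags each line as "key,file number,original line": the key is the
#   timestamp in field 1, and each header gets the key 0 (sorts before dates).
# Stage 2 sorts by key as a byte string, then by file number (FILE1 first).
# Stage 3 keeps every FILE2 row plus the newest FILE1 row before it.
closest_rows() {
  awk -F, -v OFS=, '
    FNR == 1 { fid++; print "0", fid, $0; next }
    { print $1, fid, $0 }
  ' "$1" "$2" |
  LC_ALL=C sort -t, -k1,1 -k2,2n |
  awk -F, 'BEGIN { OFS = "\n" }
    FNR == 1 { H1 = $0; sub(/^[^,]*,[^,]*,/, "", H1); next }  # FILE1 header
    FNR == 2 { H2 = $0; sub(/^[^,]*,[^,]*,/, "", H2); next }  # FILE2 header
    { FID = $2; sub(/^[^,]*,[^,]*,/, "") }   # strip key and file number
    FID_OLD == 1 && FID != 1 { print H1, LINE_OLD, H2 }
    FID == 2 { print }
    { FID_OLD = FID; LINE_OLD = $0 }'
}
```

Only POSIX sort options (-t, -k) are used here; if several rows of the same file share a timestamp, their relative order is decided by the remaining line content.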

aachave1
