Keep only the closest match of timestamped row (include headers) from file1 to precede file2 row/s

 
# 15  
Old 08-13-2016
I'm trying to remote to my work PC at the moment and having connection issues.

As for my files, it seems to work fine if file1 is smaller than file2, but if file1 is larger than file2, it misses the last match of the files (as described above).

In my situation, there will be times when file1 is larger than file2 and situations where file2 is larger, so I need it to be adaptable to varying sizes. Thanks!
# 16  
Old 08-14-2016
Here is another possibility, a bit more stream-oriented:
Code:
perl -MTime::Local -pe '
BEGIN{$fc=1}
s#(([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}))#timelocal($7,$6,$5,$4,$3,$2).",$fc,$1"#e; 
$fc++ if eof;' FILE1 FILE2 | \
\
sort -n | \
\
awk -F, 'FNR<3{print $0;next} 
SRC_OLD==1 && SRC_OLD!=$2 {print LINE_OLD} 
$2==2 {print $0} {SRC_OLD=$2;LINE_OLD=$0}' | \
\
awk -F, 'BEGIN{OFS="\n"} 
FNR==1{H2=$0;next} 
FNR==2{H1=$0;next}  
{FID=$2;sub(/^([^,]+,){2}/,"")} 
FID==1{print H1,$0,H2;next} 
{print}'

Explanation of the steps:
  1. Use Perl to prepend the timestamp as an epoch value, plus the file number, to each line
  2. Sort the lines by epoch and file number (the headers stay first; if the headers change, this may break)
  3. Of consecutive file1 lines, keep only the newest one
  4. Detect the headers, insert them before and after every file1 line, and print the data without the additional fields
...and a bit optimized too (no need for two awk calls):


Code:
perl -MTime::Local -pe '
BEGIN{$fc=1}
s#(([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}))#timelocal($7,$6,$5,$4,$3,$2).",$fc,$1"#e;
$fc++ if eof;' FILE1 FILE2 | \
\
sort -n  | \
\
awk -F, 'BEGIN{OFS="\n"}
FNR==1{H2=$0;next}
FNR==2{H1=$0;next}
{FID=$2;sub(/^([^,]+,){2}/,"")}
FID_OLD==1 && FID!=1 {print H1,LINE_OLD,H2}
FID==2 {print $0}
{FID_OLD=FID;LINE_OLD=$0}'


Last edited by stomp; 08-14-2016 at 12:50 PM.. Reason: Compacted Code + separated the different steps
# 17  
Old 08-14-2016
Thanks, stomp, for all these options! The latest script above matches the data properly; however, if file2 is larger than file1, the headers are swapped (file1 headers appear with file2 data and vice versa). It works fine, though, if file1 is larger than file2.

I will try to fix code so that it adapts to either file being larger/smaller than the other. I will reply if I have any luck.
# 18  
Old 08-14-2016
After finally checking out your original try, I'm realizing that creating an epoch timestamp is absolutely not necessary here. The date given in file1/file2 is sortable without transformation.
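
A minimal sketch of why no conversion is needed: the timestamps are zero-padded and run from the most significant field (year) to the least significant (second), so a plain string sort already yields chronological order. The three sample values and the `LC_ALL=C` setting are assumptions chosen for a reproducible illustration.

```shell
# Three timestamps in the thread's format, deliberately out of order.
# A plain (non-numeric) sort puts them in chronological order.
sorted=$(printf '%s\n' \
  '2014/04/07 16:03:05' \
  '2014/04/07 16:02:55' \
  '2013/12/31 23:59:59' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

This only holds as long as every field keeps a fixed width; a date written as 2014/4/7 would break it.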

Quote:
I will try to fix code so that it adapts to either file being larger/smaller than the other.
OK, I'll leave some of the fun for you. But it has nothing to do with either file being larger or smaller. It is probably point 2 of the list in my last post.

Last edited by stomp; 08-14-2016 at 04:38 PM..
# 19  
Old 08-14-2016
Okay, but why does it only have issues when one file is larger than the other, but works fine the other way around? This is the case for both sets of code that you had. Any hints as to why?

Where can I actually attach my "real" files to this forum?

I will try a few things now. Thanks!
# 20  
Old 08-14-2016
Quote:
Okay, but why does it only have issues when one file is larger than the other
I suppose that's coincidence.

Code:
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
1399471375,1,2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339
1399471380,1,2014/04/07 16:03:00,0,0,1,572,3,0,1917,20550,57780339
1399471385,1,2014/04/07 16:03:05,0,0,1,572,3,0,1917,20550,57785339

If you sort the above with the numeric sort I chose (sort -n), the numeric value of both of the first two lines is 0. A string comparison is then used as a fallback, and the header line of FILE2 (CCSDS_VERSION) sorts before the header line of FILE1 (G_CCSDS_VERSION); this is decided at the first differing character (C < G). If the character in place of the C is something greater than G (H, I, ...), the header lines are wrongly switched.

The rest of the file is sorted correctly because the epoch timestamps I generate are meant to be sorted numerically.

You can verify this by inserting, for example, an "A" at the beginning of the second field of the first header line in your problematic file 2. The output should then be correct.
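
The effect can be reproduced in isolation with a small sketch. The helper name `sort_demo`, the shortened data row, and the second pair of headers are made up for illustration, and `LC_ALL=C` is an assumption so that the fallback comparison is a plain byte comparison.

```shell
# sort_demo FILE1_HEADER FILE2_HEADER   (hypothetical helper, illustration only)
# Feeds one epoch-prefixed data row plus the two headers through sort -n.
# Both headers have the numeric value 0, so their order is decided by the
# fallback comparison of the whole line.
sort_demo() {
  printf '%s\n' '1399471375,1,2014/04/07 16:02:55,0' "$1" "$2" | LC_ALL=C sort -n
}

# Headers from the thread: file2's header (C...) comes first, as the awk step expects.
sort_demo 'TIMEFORMATTED,G_CCSDS_VERSION' 'TIMEFORMATTED,CCSDS_VERSION'

# Made-up headers where file1's second field sorts lower: file1's header now
# comes first, so the awk step would store it as H2 and the headers get swapped.
sort_demo 'TIMEFORMATTED,A_VERSION' 'TIMEFORMATTED,Z_VERSION'
```

In both calls the data row stays last, which is why only the headers are affected.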

If my diagnosis is correct, the obvious question is: What can be done about this error here?
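
One possible answer, as a rough sketch rather than a tested fix for the real files: give each header line an explicit sort key and file number, so that its position no longer depends on the header text. The sketch assumes that the first line of each file is its header, that the timestamp is the first comma-separated field of every data row, and that the timestamps sort as plain strings; the function name `merge_closest` and the key value 0 are made up for illustration. The last awk step is the optimized one from post #16 with H1 and H2 assigned by position, and the prefix is stripped without a regex interval so that older awk versions can run it too.

```shell
# merge_closest FILE1 FILE2   (hypothetical helper, illustration only)
merge_closest() {
  # Step 1: prefix every line with "key,filenumber,".
  # Headers get the key 0, which sorts before any date starting with a year.
  awk -F, -v OFS=, '
    FNR==1 { fc++; print "0", fc, $0; next }
    { print $1, fc, $0 }' "$1" "$2" |
  # Step 2: sort by key as a string, then by file number.
  LC_ALL=C sort -t, -k1,1 -k2,2n |
  # Step 3: strip the prefix, keep the last file1 row before each run of
  # file2 rows, and wrap it in the two headers.
  awk -F, 'BEGIN{OFS="\n"}
    {FID=$2; sub(/^[^,]*,[^,]*,/,"")}
    FNR==1{H1=$0;next}
    FNR==2{H2=$0;next}
    FID_OLD==1 && FID!=1 {print H1,LINE_OLD,H2}
    FID==2 {print}
    {FID_OLD=FID;LINE_OLD=$0}'
}
```

It would be called as `merge_closest FILE1 FILE2`. Because the headers are tagged 0,1 and 0,2, file1's header always arrives first, whatever the header fields are called.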

---

You can attach files to your post in "Advanced mode".

Last edited by stomp; 08-14-2016 at 06:37 PM..
# 21  
Old 08-14-2016
Yeah, I think it may be a coincidence (file size) due to my headers. And yes, the epoch is probably unnecessary, but I'm not that familiar with perl time manipulation. To remove the epoch conversion below do I just remove the "red" portion?

Code:
perl -MTime::Local -pe '
BEGIN{$fc=1}
s#(([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}))#timelocal($7,$6,$5,$4,$3,$2).",$fc,$1"#e;
$fc++ if eof;' FILE1 FILE2 | \
\
sort -n  | \
\
awk -F, 'BEGIN{OFS="\n"}
FNR==1{H2=$0;next}
FNR==2{H1=$0;next}
{FID=$2;sub(/^([^,]+,){2}/,"")}
FID_OLD==1 && FID!=1 {print H1,LINE_OLD,H2}
FID==2 {print $0}
{FID_OLD=FID;LINE_OLD=$0}'

Also, my headers differ a little between files: some have 3 more header fields added after the TIMEFORMATTED column.