Keep only the closest match of timestamped row (include headers) from file1 to precede file2 row/s

 
# 22  
Old 08-14-2016
Hi aachave1,
In post #15 in this thread, you said that you couldn't access your work PC. Is your work PC running a Windows operating system? What operating system and shell are you using?

Please show us the output from the command:
Code:
tail -n 3 filename | od -bc

for both of the files you are using when a line is dropped from your output (replacing filename with the name of one of your files on both invocations).
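As a rough illustration of what that dump reveals, here is a small sketch. The file name sample.csv and its contents are invented for the example, and it uses od -c alone to keep the output short:

```shell
#!/bin/sh
# Invented sample: DOS (CR LF) line endings and no terminator on the last line.
tmpdir=$(mktemp -d)
printf 'h1,h2\r\na,1\r\nb,2' > "$tmpdir/sample.csv"

# od -c prints a carriage return as \r and a newline as \n.
dump=$(tail -n 3 "$tmpdir/sample.csv" | od -c)
printf '%s\n' "$dump"

# A \r anywhere in the dump means DOS line endings.
case $dump in
  *'\r'*) dos=yes ;;
  *)      dos=no ;;
esac

# Command substitution strips a trailing newline, so an empty result
# means the last byte of the file is a newline.
if [ -z "$(tail -c 1 "$tmpdir/sample.csv")" ]; then
  final_newline=yes
else
  final_newline=no
fi

echo "DOS line endings: $dos, final newline: $final_newline"
rm -rf "$tmpdir"
```

If the dump shows \r before each \n, the file is in DOS format; if the last character shown is not \n, the final line is unterminated.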

What is the format of the dates used in your timestamps? Is the timestamp in your input files:
Code:
2014/04/07 16:02:55

for the date April 7, 2014 or for the date July 4, 2014?

Are we correct in assuming that your input files (other than the header line) are sorted in increasing order by timestamp?
# 23  
Old 08-14-2016
Hmmm. Since you're not familiar with perl, a solution won't be easy for you. One solution for the sorting problem is to insert a marker (and remove it later) at line 1 of file1, so that the header of file1 always ends up at the same position. (Line 1 of file1 will always be at line 2 in the intermediate output of sort.)
Code:
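perl -MTime::Local -pe '
BEGIN{$fc=1}                  # $fc: number of the file being read
s/^/1,/ if($. == 1);          # marker: header of FILE1 sorts to line 2
# prepend "epoch,filenr," to each data line; timelocal wants the month as 0..11
s#(([0-9]{4})/([0-9]{1,2})/([0-9]{1,2}) ([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}))#timelocal($7,$6,$5,$4,$3-1,$2).",$fc,$1"#e;
$fc++ if eof;' FILE1 FILE2 |
sort -n |
awk -F, 'BEGIN{OFS="\n"}
FNR==1{H2=$0;next}                           # header of FILE2 (sorts first)
FNR==2{H1=substr($0,3);next}                 # header of FILE1, marker removed
{FID=$2;sub(/^([^,]+,){2}/,"")}              # remember file number, strip "epoch,filenr,"
FID_OLD==1 && FID!=1 {print H1,LINE_OLD,H2}  # last FILE1 row before a FILE2 block
FID==2 {print $0}
{FID_OLD=FID;LINE_OLD=$0}'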

The epoch conversion may be unnecessary, but it does no harm either, which is why I left it in place here. Removing it would also need some refactoring, because it is important that the file number is appended to the date, so that file1 rows always sort before file2 rows with the same timestamp.
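A minimal sketch of that tie-breaking idea without the epoch conversion, assuming the timestamps are in YYYY/MM/DD order in field 1 so that they sort correctly as plain strings. The sample files and their contents are invented for illustration, and this is not a drop-in replacement for the pipeline above:

```shell
#!/bin/sh
# Invented sample data; both files contain the timestamp 16:03:15.
tmpdir=$(mktemp -d)
printf '%s\n' 'TIME,VAL' '2014/04/07 16:03:10,x' '2014/04/07 16:03:15,y' > "$tmpdir/file1"
printf '%s\n' 'TIME,VAL' '2014/04/07 16:03:12,p' '2014/04/07 16:03:15,q' > "$tmpdir/file2"

# 1. awk:  skip each header and prefix data lines with "timestamp,filenumber,".
# 2. sort: order by timestamp (as a string), then by file number.
# 3. cut:  drop the timestamp key, keeping "filenumber,original line".
result=$(
    awk -F, 'FNR == 1 {file++; next} {print $1 "," file "," $0}' \
        "$tmpdir/file1" "$tmpdir/file2" |
    LC_ALL=C sort -t, -k1,1 -k2,2n |
    cut -d, -f2-
)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the file number as the second sort key, the file1 row for 16:03:15 comes out ahead of the file2 row with the same timestamp.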

P.S.: Looking forward to seeing another solution from Don

Last edited by stomp; 08-15-2016 at 07:58 AM..
# 24  
Old 08-15-2016
Quote:
Originally Posted by stomp
P.S.: Looking forward to seeing another solution from Don
Hi stomp,
To be honest, it looks to me like you have put a lot of work into this thread and I haven't really checked out your code. When I see that the last line in a file isn't being processed and I see a comment about the data being processed being on a PC, I have to wonder if the data is in DOS text format (with no line terminator on the last line). If there is no line terminator on the final line, the behavior of sort and awk is unspecified. I don't use perl enough to know how it deals with partial lines and its behavior is not specified by the standards.

If the date format in the files is YYYY/MM/DD, then the dates can be compared (without conversions) just by performing string comparisons on field 1 values. But, if the date format is YYYY/DD/MM, some kind of conversion will be required before timestamps can be compared unless we know that all lines in the files being processed are for a single date.
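To make the difference concrete, here is a small sketch that compares the same two dates (April 7 and May 1, 2014, chosen arbitrarily) as strings in both layouts:

```shell
#!/bin/sh
result=$(awk 'BEGIN {
    # YYYY/MM/DD: string order matches chronological order.
    early = "2014/04/07 16:03:10"    # April 7, 2014
    late  = "2014/05/01 09:00:00"    # May 1, 2014
    ymd = (early < late) ? "correct" : "wrong"

    # YYYY/DD/MM: the same two dates, written day before month.
    early = "2014/07/04 16:03:10"    # April 7, 2014
    late  = "2014/01/05 09:00:00"    # May 1, 2014
    ydm = (early < late) ? "correct" : "wrong"

    print "YYYY/MM/DD: " ymd ", YYYY/DD/MM: " ydm
}')
echo "$result"
```

The string comparison orders the YYYY/MM/DD pair correctly but puts May 1 before April 7 in the YYYY/DD/MM layout.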

And, for the record, two or more files can be read in parallel in awk using the normal record processing input methods for one of the files and statements of the form:
Code:
getline variable_name < filename

where filename is a variable name or constant string specifying the name of another file and variable_name is a variable that will be assigned the contents of the next line from that file. The return value from any call to getline is 1 for successful input, zero for end-of-file, and −1 for an error.
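A short sketch of that pattern, with invented file names (main.txt and side.txt) and an awk variable called side that is only for this example. It tests the return value explicitly, since −1 would also count as true in a bare condition:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmpdir/main.txt"
printf 'b1\nb2\n'     > "$tmpdir/side.txt"

result=$(awk -v side="$tmpdir/side.txt" '
{
    # Read the next line of the side file alongside each main-file record.
    status = (getline other < side)
    if (status == 1)
        print $0 " | " other
    else if (status == 0)
        print $0 " | <end of side file>"
    else
        print $0 " | <cannot read side file>"
}' "$tmpdir/main.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the side file has only two lines, the third record of the main file is paired with the end-of-file message.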
# 25  
Old 08-15-2016
Quote:
...,that the filenr is appended to the date, so that equal file1-date-values are always before file2 values at the sorting phase.
@aachave1: I also tried to fix your original code later in the thread, but that element was missing, so your short solution wasn't easy to fix. Appending the header to each line, tab-separated, was definitely an interesting move, however.
# 26  
Old 08-15-2016
Hi Don, sometimes IT at my work does maintenance on the weekends, which causes me to lose remote access to my PC. My PC is running Windows 8, but I remote in to a Linux server (one of many on my program) to run bash.

Yes, my column one timestamp (converted over from columns 17-22ish) is in this format
"
Code:
2014/04/07 16:02:55

for the date April 7, 2014 or for the date July 4, 2014?"

Yes, my files are sorted in ascending order.

Thanks.

---------- Post updated at 09:16 AM ---------- Previous update was at 08:58 AM ----------

Like I mentioned in my original post, the code below almost worked, but it didn't yield the proper row from file one. The bold awk is what I needed changed. As you can see, I sorted on time column 21, which is in msec and a little more accurate; however, stomp's last two code examples seem to sort on the column 1 timestamp just fine.

Code:
#!/bin/bash
function f() { awk 'NR==1{h=$0; next} {print $0 "\t" h}' "$1"; }; sort -t"," -k21,21 <(f file1) <(f file2)  | 
  awk -F'\t' '$2!=p{print $2; p=$2; b++; c=1} !(b%2)||c&&c--{print $1}' > temp5

This was my result with this code, in which the "red" line is what the output was (not correct). The "green" line is what I needed (correct line)

Code:
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:02:55,0,0,1,572,3,0,1917,20550,57775339
2014/04/07 16:03:00,0,0,1,572,3,0,1917,20550,57780339
2014/04/07 16:03:05,0,0,1,572,3,0,1917,20550,57785339
2014/04/07 16:03:10,0,0,1,572,3,0,1917,20550,57790339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:12,0,0,1,544,3,0,985,20550,57788894
2014/04/07 16:03:13,0,0,1,544,3,0,985,20550,57793894
2014/04/07 16:03:14,0,0,1,544,3,0,985,20550,57794894

TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,572,3,0,1917,20550,57795339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,544,3,0,985,20550,57795894

# 27  
Old 08-15-2016
Please show us the output from the command:
Code:
tail -n 3 filename | od -bc

as I requested before. I still need to know if the input files are in DOS text file format or UNIX text file format.

I think I can do this with a single awk script (instead of needing two awk scripts and sort, or perl, sort, and awk as suggested by stomp) if I know what the input file format is and where to find the date field(s).

Will the date field be the same field in both of your input files, or do I have to worry about file1 and file2 having the dates in different fields?

Assuming that your input files are in DOS text file format, is it OK for your script to add a <newline> character to the end of the input files on your Windows server? (If not, that is OK, but I need to know if I need to implement an alternative while processing your files.)
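One possible alternative that leaves the input files untouched is to normalize them on the fly. The sketch below assumes that no field legitimately contains a carriage return; the function name normalize and the sample file dos.csv are made up for illustration:

```shell
#!/bin/sh
# normalize FILE: write FILE to stdout as UNIX text, leaving FILE untouched.
normalize() {
    # Delete every carriage return (assumes none belongs to the data).
    tr -d '\r' < "$1"
    # Command substitution strips a trailing newline, so a non-empty result
    # means the last byte is not a newline and one has to be supplied.
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo
    fi
}

# Invented sample: DOS line endings, last line unterminated.
tmpdir=$(mktemp -d)
printf 'h1,h2\r\na,1\r\nb,2' > "$tmpdir/dos.csv"

# Downstream tools now see three complete lines without carriage returns.
lines=$(normalize "$tmpdir/dos.csv" | awk 'END {print NR}')
last=$(normalize "$tmpdir/dos.csv" | awk -F, '{v = $2} END {print v}')
echo "lines: $lines, last field of last line: $last"
rm -rf "$tmpdir"
```

The output of normalize file1 could then be piped into sort or awk in place of the file itself.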
# 28  
Old 08-15-2016
Would this come close to what you want (may need some polishing):
Code:
awk '
NR == 1         {getline HD1 < F1
                 HD2 = $0
                 next
                }

$1 >= T[1]      {do     {LAST = TMP
                         ST = getline TMP < F1
                         split (TMP, T, FS)
                        }
                 while (($1 >= T[1]) && (ST == 1))
                 if (ST == 0)   {LAST = TMP
                                 T[1] = "ZZZ"
                                }
                 print HD1
                 print LAST
                 print HD2
                 print
                 next
                }
                {print 
                }

' FS="," F1=file1 file2
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:03:10,0,0,1,572,3,0,1917,20550,57790339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:12,0,0,1,544,3,0,985,20550,57788894
2014/04/07 16:03:13,0,0,1,544,3,0,985,20550,57793894
2014/04/07 16:03:14,0,0,1,544,3,0,985,20550,57794894
TIMEFORMATTED,G_CCSDS_VERSION,G_CCSDS_TYPE,G_CCSDS_2HDR_FLAG,G_CCSDS_APID,G_CCSDS_GRP_FLAGS,G_CCSDS_SEQ_COUNT,G_CCSDS_PKT_LEN,G_CCSDS_DOY,G_CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,572,3,0,1917,20550,57795339
TIMEFORMATTED,CCSDS_VERSION,CCSDS_TYPE,CCSDS_2HDR_FLAG,CCSDS_APID,CCSDS_GRP_FLAGS,CCSDS_SEQ_COUNT,CCSDS_PKT_LEN,CCSDS_DOY,CCSDS_MSEC
2014/04/07 16:03:15,0,0,1,544,3,0,985,20550,57795894
2014/04/07 16:03:16,0,0,1,544,3,0,985,20550,57796894
2014/04/07 16:03:17,0,0,1,544,3,0,985,20550,57797894


Last edited by RudiC; 08-15-2016 at 01:53 PM..