awk script processing data from 2 files


 
# 1  
10-07-2010

Hi!
I have 2 files containing data that I need to process at the same time, and I have trouble reading a different number of lines from each file.
Here is an explanation of what I need to do (possibly with an awk script).

File "samples.txt" contains data in the format:
time_instant measure
Code:
903.0 -
906.43 18.4
912.7 17.5
918.05 11.2
File "time.txt" contains data in the format:
time_instant position
Code:
900 out
901 out
902 out
903 in
904 in
905 in
906 in
907 in
908 out
909 out
910 out
911 in
912 in
913 in
I need to compute temporal averages of the measures in samples.txt, but the averages must only take into account the time instants in which the position in time.txt is "in".

So, at the end I need to have:

a/b
where a=sum(measure_i*duration_measure_i)
b=total_measurement_duration=sum(duration_measure_i)

- measure_i is the second field in every line of samples.txt
- duration_measure_i is the difference between the 1st fields of two consecutive lines in samples.txt, minus the duration of any period in which the position in time.txt was "out"
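In symbols (the notation is mine, not the poster's): write t_0 for the start time on the first line of samples.txt, t_i and m_i for the time and measure on each following line, and o_i for the total length of the "out" periods between t_{i-1} and t_i. The requested quantity is then a time-weighted average:

```latex
\bar{m} = \frac{a}{b}
        = \frac{\sum_{i=1}^{n} m_i \,\bigl(t_i - t_{i-1} - o_i\bigr)}
               {\sum_{i=1}^{n} \bigl(t_i - t_{i-1} - o_i\bigr)}
```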

So, for example, for the first measurement (on the second line of samples.txt, since the 1st line only tells me when measurements start) I would compute:
a=18.4*(906.43-903.0-0)
b=(906.43-903.0-0)

Then for the second:
a+=17.5*(912.7-906.43-2)
b+=(912.7-906.43-2)

I am quite new to awk and don't know how to read two lines at a time from samples.txt and then search time.txt for the lines that fall between the two time instants read from samples.txt.

Any suggestion? Thank you very much!!
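A minimal sketch of the usual awk answer to "reading two lines at a time": instead of reading two lines, remember the previous line's time in a variable, so that every line from the second one onward yields one (start, end, measure) interval. The function name sample_intervals is hypothetical, chosen only for this illustration.

```shell
# sample_intervals is a hypothetical helper: it reads samples.txt-style
# lines ("time measure") from the given file(s) or stdin and prints
# "start end measure" for each pair of consecutive lines in a file.
sample_intervals() {
  awk '
    FNR > 1 { print prev, $1, $2 }   # previous time, current time, measure
    { prev = $1 }                    # remember this line for the next one
  ' "$@"
}

printf '%s\n' '903.0 -' '906.43 18.4' '912.7 17.5' '921.05 11.2' | sample_intervals
```

For the four sample lines this prints `903.0 906.43 18.4`, `906.43 912.7 17.5` and `912.7 921.05 11.2`. Using FNR rather than NR keeps lines from different files from being paired with each other.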
# 2  
10-07-2010
Quote:
So, for example I would do for the first measurement (that is on the second line in samples.txt, since the 1st line only tells me when measurements start):
a=18.4*(906.43-903.0-0)
b=(906.43-903.0-0)

Then for the second:
a+=17.5*(912.7-906.43-2)
b+=(912.7-906.43-2)
Where does the 2 in 912.7-906.43-2 come from?
# 3  
10-07-2010
2 is the time spent "out".
I know that, looking at the file, it would seem to be 3 (there are three "out" lines), but for the periods when the position is "out" I have to take final_time - initial_time as the length of the period (for example, if a single isolated "out" line appears, the length of the interval is 0 and I don't subtract any time).
So here, 910-908=2.

That's because my records of "out" periods are only instantaneous... sorry for not explaining that more clearly in the previous post. :-)
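The rule above (an "out" period lasts from its first to its last "out" instant, so an isolated "out" line counts as 0) amounts to grouping consecutive "out" lines into runs. A sketch, assuming the second field of time.txt is always exactly "in" or "out"; out_runs is a hypothetical helper name.

```shell
# out_runs is a hypothetical helper: it groups consecutive "out" lines of
# time.txt-style input ("instant in|out") and prints "first last length",
# where length = last - first (0 for an isolated "out" line).
out_runs() {
  awk '
    $2 == "out" { if (prev != "out") first = $1; last = $1 }
    $2 != "out" && prev == "out" { print first, last, last - first }
    { prev = $2 }
    END { if (prev == "out") print first, last, last - first }  # run at EOF
  ' "$@"
}

printf '%s\n' '907 in' '908 out' '909 out' '910 out' '911 in' \
              '914 out' '915 out' '916 out' '917 out' '918 in' '919 out' | out_runs
```

This prints `908 910 2`, `914 917 3` and `919 919 0`, matching the lengths 2 and 3 used in the thread and the "isolated line counts as 0" rule.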
# 4  
10-07-2010
Sorry, but I still don't understand how the data in time.txt is used.

First we have:

Code:
900 out 
901 out 
902 out

So the time spent out is 2, not 0, or am I missing something again?

What about the "in" values in time.txt, where are they used?
# 5  
10-07-2010
The time intervals in the two files don't coincide: in samples.txt I record my measures, then I have to search for the related time intervals of interest in time.txt.

For example, in samples.txt, measurements started at 903.0, then at 906.43 I got the first measure (2nd line). So I have to go to time.txt and check whether, in the time period between 903 and 906.43, I was "in" or "out". Since I was "in" the whole time, it's fine and I don't need to subtract anything.
The time instants preceding this interval are discarded.

Then I go to the second measure in samples.txt (3rd line) and check in time.txt whether, in the time interval between 906.43 and 912.7, I was "in" or "out". I was "out" for 2 seconds, so I subtract this value from the interval duration, and so on.

The file time.txt covers a much longer span, usually with many seconds before and after the instants at which measurements were taken. I drive the loop from samples.txt, taking one sample per iteration (i.e. the line for the current measure together with the preceding one, which gives the time interval of interest); then I need the corresponding lines of time.txt for that interval. If the total duration of "out" periods in that interval is zero, I don't need to subtract anything; if it is non-zero, I need to subtract it from the interval duration.

Hope this is much clearer; thanks a lot for your help anyway!
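The lookup described above (for each pair of consecutive samples, fetch the time.txt lines inside that interval) can be sketched with awk's common two-file idiom: NR == FNR is true only while the first file is being read, so time.txt is loaded into arrays before samples.txt is processed. lines_per_interval is a hypothetical name; the sketch assumes time.txt is passed first, treats both interval ends as inclusive, and only lists the lines, without computing the average.

```shell
# lines_per_interval is a hypothetical helper: $1 = time.txt, $2 = samples.txt.
# For each interval between consecutive samples it prints "start-end:" and
# the time.txt lines whose instant lies in [start, end].
lines_per_interval() {
  awk '
    NR == FNR { t[++n] = $1 + 0; pos[n] = $2; next }  # load time.txt in order
    FNR == 1  { t0 = $1; next }                       # start of measurements
    {
      printf "%s-%s:", t0, $1
      for (j = 1; j <= n; j++)
        if (t[j] >= t0 + 0 && t[j] <= $1 + 0)
          printf " %s=%s", t[j], pos[j]
      print ""
      t0 = $1                                         # end becomes next start
    }
  ' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' '903.0 -' '906.43 18.4' '912.7 17.5' > "$dir/samples.txt"
printf '%s\n' '902 out' '903 in' '904 in' '905 in' '906 in' '907 in' \
              '908 out' '909 out' '910 out' '911 in' '912 in' '913 in' > "$dir/time.txt"
lines_per_interval "$dir/time.txt" "$dir/samples.txt"
rm -r "$dir"
```

With these files the output is `903.0-906.43: 903=in 904=in 905=in 906=in` followed by `906.43-912.7: 907=in 908=out 909=out 910=out 911=in 912=in`, i.e. no "out" time in the first interval and the 908-910 run in the second.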



---------- Post updated at 05:11 AM ---------- Previous update was at 04:54 AM ----------

Hope this helps. Suppose you have
Code:
903.0 -
906.43 18.4
912.7 17.5
921.05 11.2

and
Code:
900 out
901 out
902 out
903 in
904 in
905 in
906 in
907 in
908 out
909 out
910 out
911 in
912 in
913 in
914 out
915 out
916 out
917 out
918 in
919 out
920 in
921 in
922 in
923 in

The result I need to get is:
{18.4*(906.43-903.0-0)+17.5*(912.7-906.43-2)+11.2*(921.05-912.7-3)}/{(906.43-903.0-0)+(912.7-906.43-2)+(921.05-912.7-3)}
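As an arithmetic check of this expression: the three effective durations are 3.43, 4.27 and 5.35 (13.05 in total) and the weighted sum is 197.757, so the result is about 15.15. A small sketch that evaluates the same expression with awk; expected_avg is only an illustrative name.

```shell
# expected_avg is an illustrative helper that evaluates the expression
# above literally; each duration is (t_i - t_{i-1} - "out" time).
expected_avg() {
  awk 'BEGIN {
    a  = 18.4 * (906.43 - 903.0 - 0)
    a += 17.5 * (912.7 - 906.43 - 2)
    a += 11.2 * (921.05 - 912.7 - 3)
    b  = (906.43 - 903.0 - 0) + (912.7 - 906.43 - 2) + (921.05 - 912.7 - 3)
    print a / b    # printed with the default OFMT, "%.6g"
  }'
}

expected_avg    # prints 15.1538
```

Any script run on the two example files above should reproduce this value.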

Last edited by Alice236; 10-07-2010 at 08:45 AM..
# 6  
10-07-2010
If I understand correctly, you could use something like this:

Code:
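awk '
# Pass 1 (time.txt): record each run of consecutive "out" lines
# as r[first_instant] = last_instant.
NR == FNR {
  if ($2 == "out") o[++i] = $1 + 0
  if ($2 == "in" && p2 == "out") {
    r[o[1]] = o[i]; i = 0
  }
  p2 = $2; next
}
# First line of samples.txt: close a run that reaches the end of
# time.txt, then remember when the measurements start.
FNR == 1 {
  if (p2 == "out") r[o[1]] = o[i]
  p1 = $1; next
}
# Other lines: subtract the length of every "out" run lying inside
# [p1, $1] from the interval, then accumulate the weighted sums.
{
  _o = 0
  for (R in r)
    if (R + 0 >= p1 && $1 >= r[R])   # R + 0 forces a numeric comparison
      _o += r[R] - R
  b = $1 - p1 - _o
  at += $2 * b; bt += b
  p1 = $1
}
END { print at / bt }' time.txt samples.txt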

This User Gave Thanks to radoulov For This Post:
# 7  
10-07-2010
Very impressive! :-)
Thank you so much!!!