awk reading from named pipe (fifo)


 
# 1  
Old 08-29-2011

I'm trying to read a FIFO using awk and am running into some problems. I'm writing to the FIFO from multiple processes invoked by GNU Parallel:

Code:
mkfifo my_fifo
awk '{ a[$1] = a[$1] + $2 } END { for (i in a) print i, a[i] }' my_fifo | sort -nk1 > sorted_output
grep -v '^@' massive_file | parallel --max-procs 16 --pipe -N 2500 a_program -h my_fifo > stdout 2> stderr

Reading from the FIFO appears to stop prematurely. If I run the awk command again, the command writing to the FIFO appears to continue. Is this an issue with awk reading from FIFOs, or with the fact that I'm writing to the FIFO from multiple processes invoked through GNU Parallel?

Cheers,
Nathan
# 2  
Old 08-30-2011
The problem seems to be what happens when the program reading from the pipe does not get data every time it asks for some. I am not sure how to work around this in awk, but in Perl you could try the following. Create a Perl script "namedpipe.pl":
Code:
#!/usr/bin/perl
$tmax=5; # timeout after 5 seconds without data from pipe
# open blocks until a writer has opened the FIFO
open(IN, '<', 'my_fifo') || die("cannot open my_fifo: $!");
$tstart=time();
LOOP: while(1){
  $_=<IN>; # returns undef at end-of-file, i.e. when no writer has the FIFO open
  if (defined($_) && length($_)) {
    # line contains something, process it
    chomp;
    @f=split;
    $a{$f[0]} += $f[1]; # sum column 2 per key in column 1
    # reset timestamp
    $tstart=time();
  }
  # leave loop on timeout
  last LOOP if (time()-$tstart>$tmax);
}
# final print, sorted numerically by key (like sort -nk1)
foreach $key (sort { $a <=> $b } keys(%a)) {
  print "$key $a{$key}\n"
}

and then use it like:
Code:
mkfifo my_fifo
perl namedpipe.pl > sorted_output &
grep -v '^@' massive_file | parallel --max-procs 16 --pipe -N 2500 a_program -h my_fifo > stdout 2> stderr; wait
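
For comparison, here is a minimal awk-only sketch. It assumes the premature stop comes from awk seeing end-of-file whenever the FIFO momentarily has no writer (a reader gets EOF once the last writer closes). Keeping one extra write descriptor open for the whole run delays that EOF until every writer is done. The echo commands are made-up stand-ins for the a_program jobs, and the temporary directory exists only for the demo.

```shell
#!/bin/sh
dir=$(mktemp -d)
fifo="$dir/my_fifo"
mkfifo "$fifo"

# Reader: sums column 2 per key and prints the totals at EOF.
awk '{ a[$1] += $2 } END { for (i in a) print i, a[i] }' "$fifo" |
  sort -nk1 > "$dir/sorted_output" &

# Hold a write end open so the reader sees no EOF between writers.
# This open blocks until awk has opened the FIFO for reading.
exec 3> "$fifo"

# Stand-in writers (hypothetical data); each opens and closes the FIFO.
(echo "1 2"; echo "2 3") > "$fifo"
sleep 1                      # a gap with no writer other than fd 3
echo "1 5" > "$fifo"

# Close the held descriptor: only now does awk get EOF and print.
exec 3>&-
wait
result=$(cat "$dir/sorted_output")
rm -r "$dir"
echo "$result"
```

With the stand-in data this prints "1 7" and "2 3". In the real pipeline, the `exec 3>&-` would go after the parallel command returns.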

# 3  
Old 08-30-2011
If you cannot accept that a few lines may get mixed together, you need to avoid races between concurrent writers like these:
Code:
mkfifo fifo
(echo program1_line1; sleep 2; echo program1_line2) >fifo &
(echo program2_line1; sleep 1; echo program2_line2) >fifo &
cat fifo

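In this case the lines do not get mixed up, but there is no guarantee of that in general: POSIX only promises that a single write of at most PIPE_BUF bytes to a FIFO is not interleaved with other writes.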

GNU Parallel guarantees that the output from different jobs is never mixed up, but that requires that the program can write its output to stdout:
Code:
grep -v '^@' massive_file | parallel --max-procs 16 --pipe -N 2500 a_program -h --no-debug-on-stdout - 2> stderr |
  awk '{ a[$1] = a[$1] + $2 } END { for (i in a) print i, a[i] }' | sort -nk1 > sorted_output

If 'a_program' cannot output to stdout, you should be able to do this:
Code:
grep -v '^@' massive_file | parallel --max-procs 16 --pipe -N 2500 'mkfifo out_{#}; a_program -h out_{#} >stdout_{#} 2> stderr_{#} & cat out_{#}' |
  awk '{ a[$1] = a[$1] + $2 } END { for (i in a) print i, a[i] }' | sort -nk1 > sorted_output

That creates a FIFO for each job; a_program writes its output into the FIFO while cat reads it back out. GNU Parallel then captures cat's output and passes it to awk once the job is done.
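
The per-job pattern can be tried in isolation without GNU Parallel. The sketch below is only an illustration: `fake_program` is a hypothetical stand-in for a_program (assumed to read records on stdin and write "key count" lines to the file named by its argument), and `run_job` is a made-up helper name. Unlike the one-liner above, it also removes the FIFO when the job is done.

```shell
#!/bin/sh
# Hypothetical stand-in for a_program: reads records on stdin and
# writes "key count" lines to the file named by its first argument.
fake_program() {
  awk '{ print $1, 1 }' > "$1"
}

# One job: a private FIFO, drained by cat, removed afterwards.
run_job() {
  job_dir=$(mktemp -d) || return 1
  mkfifo "$job_dir/out"
  cat "$job_dir/out" &            # reader in the background, writes to our stdout
  fake_program "$job_dir/out"     # writer in the foreground, keeps the job's stdin
  wait                            # let cat finish draining the FIFO
  rm -r "$job_dir"
}

printf '7\n7\n9\n' | run_job
```

The writer is kept in the foreground here because a POSIX shell without job control connects the standard input of a background command to /dev/null, which matters when the records arrive on stdin, as they do with --pipe.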

I have the feeling we are talking about a lot of data going into and coming out of 'a_program', and that you would prefer not to have temporary files (which GNU Parallel uses for buffering the output). In that case, consider running the awk script in parallel with a_program.

The awk script seems to count the frequency of a given input and it should not be too hard to merge several outputs from the awk script.
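
A rough sketch of that merge, under the assumption that each job's output is a two-column "key count" file: because summing is associative, the same awk script can reduce each job's output on its own and then reduce the partial results again. The `sum_counts` name and the sample data are made up for illustration.

```shell
#!/bin/sh
# Sum the second column per distinct first-column key.
# Works on raw per-job output and on already-summed partial results.
sum_counts() {
  awk '{ a[$1] += $2 } END { for (i in a) print i, a[i] }' "$@"
}

# Stand-ins for the output of two jobs (made-up data).
tmpdir=$(mktemp -d)
printf '1 3\n2 5\n1 1\n' > "$tmpdir/job_1"
printf '2 4\n10 1\n'     > "$tmpdir/job_2"

# Stage 1: reduce each job's output on its own (this part can run in parallel).
sum_counts "$tmpdir/job_1" > "$tmpdir/part_1"
sum_counts "$tmpdir/job_2" > "$tmpdir/part_2"

# Stage 2: merge the partial sums and sort numerically by key.
sum_counts "$tmpdir/part_1" "$tmpdir/part_2" | sort -nk1
rm -r "$tmpdir"
```

With the sample data this prints "1 4", "2 9" and "10 1". In the per-job command, the stage 1 call would presumably take the place of `cat out_{#}`, so each job hands GNU Parallel only its summed lines rather than its full output.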
# 4  
Old 08-31-2011
Quote:
Originally Posted by tange
If 'a_program' cannot output to stdout, you should be able to do this:
Code:
grep -v '^@' massive_file | parallel --max-procs 16 --pipe -N 2500 'mkfifo out_{#}; a_program -h out_{#} >stdout_{#} 2> stderr_{#} & cat out_{#}' |
  awk '{ a[$1] = a[$1] + $2 } END { for (i in a) print i, a[i] }' | sort -nk1 > sorted_output

That creates a FIFO for each job; a_program writes its output into the FIFO while cat reads it back out. GNU Parallel then captures cat's output and passes it to awk once the job is done.

I have the feeling we are talking about a lot of data going into and coming out of 'a_program', and that you would prefer not to have temporary files (which GNU Parallel uses for buffering the output). In that case, consider running the awk script in parallel with a_program.
Hi Ole,

You are correct, we are talking tens of millions of lines going into 'a_program', so there would be 5,000-10,000 intermediate files created. How do I go about running the awk script in parallel with 'a_program'?

Quote:
Originally Posted by tange
The awk script seems to count the frequency of a given input and it should not be too hard to merge several outputs from the awk script.
You are correct: the file written by 'a_program' is a two-column file. For each value in the first column, I increment the count by the corresponding value in the second column. I then simply sort the lines numerically by the first column.

Cheers!
Nathan