Problems understanding pipes


 
# 8  
Old 10-07-2011
Quote:
Originally Posted by ab_tall
Instead of creating n-1 pipes for n commands, would it be possible to use the same single pipe by suitable adjusting the Fds?
Nope. One pipe is one pipe, for a chain of 10 processes you need 9 pipes.

Not sure what you mean by "adjusting the FD". FD 6 isn't "pipe number 6"; it's just the slot your process's table of open files happened to hand out when you opened it. Add two to it and the kernel won't hand you the next pipe along, it'll just say "What?" (EBADF, bad file descriptor).

Or worse -- maybe there really does happen to be an FD 8. You just spun the roulette wheel and landed on a number your process opened already. What is it? Who knows, but whatever it is, you're reading it.
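If you want to see the kernel's "What?" for yourself, here's a minimal sketch. The helper name fd_is_open is made up for illustration; it just asks fcntl() about a number and checks for EBADF.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd refers to an open file in THIS
   process, 0 if the kernel has no such entry in our table (EBADF). */
int fd_is_open(int fd)
{
        return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

After pipe(pipefd), both pipefd[0] and pipefd[1] pass this check. pipefd[1] + 2 only passes if your process happens to have that number open for something unrelated, which is exactly the roulette wheel described above.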

Quote:
From your code I surmised we need to close anything in the parent that we don't use. But a close() call means that the FD no longer refers to any file. So when the parent closes the write end of the pipe with, say, curpipe {5,6}, how is the child allowed to use 6 to refer to the write end of the pipe? (Is it because, in its address space, 6 is not closed?)
6 is just a number, an index into your own process's table of open files; it doesn't mean "pipe number 6". Closing #6 in one process doesn't close anyone else's #6.

It's completely okay to have the same file open and used in many different processes, too -- that's how shells work. When you run echo a or cat, they receive copies of the shell's own open file descriptors. That's what fork() does: it creates an almost-perfect clone of the parent, right down to memory and open files. Then the clone runs exec() to become a different program, but keeps the same open files.

So echo, cat, et al. don't have to ask the shell to write to the terminal for them -- they write to it directly.
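A rough sketch of the "closing my #6 doesn't close yours" point, assuming nothing beyond plain POSIX calls. The function name read_from_child is invented for this example. The parent closes its copy of the write end, and the child can still write through its own copy.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of bytes the parent read from
   the child (6 for "hello\n"), or -1 if pipe() or fork() failed. */
ssize_t read_from_child(char *buf, size_t len)
{
        int p[2];
        pid_t pid;
        ssize_t n;

        if(pipe(p) < 0)
                return -1;

        pid = fork();
        if(pid < 0)
        {
                close(p[0]);
                close(p[1]);
                return -1;
        }

        if(pid == 0)
        {
                close(p[0]);    // child doesn't read
                // The child's own copy of the write end is still open,
                // whatever the parent does with its copy.
                if(write(p[1], "hello\n", 6) != 6)
                        _exit(1);
                _exit(0);
        }

        close(p[1]);    // closes only the PARENT's copy of the write end
        n = read(p[0], buf, len);
        close(p[0]);
        waitpid(pid, NULL, 0);
        return n;
}
```

It doesn't matter whether the parent's close() happens before or after the child's write(); the two descriptor tables are independent once fork() returns.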

A pipe's reader only sees end-of-file once every copy of the write end has been closed. That works for more than one process too. If you have two processes with copies of the write end and one process with the read end, the reader won't get EOF until both write ends are closed -- even if you just left that one open by accident, in which case the reader hangs forever. The same logic goes for the read end: a writer only gets SIGPIPE/EPIPE once every copy of the read end is gone.
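Here's one way to watch that happen inside a single process, as a sketch. pipe_at_eof is a made-up helper that peeks at the read end in non-blocking mode, and it assumes the pipe holds no unread data.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if read() on rfd reports end-of-file
   right now, 0 if the pipe is merely empty because some write end is
   still open, -1 on anything else.  Assumes no unread data in the pipe. */
int pipe_at_eof(int rfd)
{
        char c;
        ssize_t n;
        int saved;
        int flags = fcntl(rfd, F_GETFL);

        if(flags < 0)
                return -1;

        // Non-blocking, so an empty-but-open pipe gives EAGAIN instead of hanging
        if(fcntl(rfd, F_SETFL, flags | O_NONBLOCK) < 0)
                return -1;

        n = read(rfd, &c, 1);
        saved = errno;
        fcntl(rfd, F_SETFL, flags);     // put the original flags back

        if(n == 0)
                return 1;
        if(n < 0 && (saved == EAGAIN || saved == EWOULDBLOCK))
                return 0;
        return -1;
}
```

Make a pipe, dup() the write end, and close the original: pipe_at_eof() on the read end still says 0, because the forgotten duplicate counts. Close the duplicate as well and it says 1.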

Every process is independent. Close everything you don't need.
Quote:
Sorry if these questions sound too basic, but i am unable to clearly visualize the address spaces like that. I think we can debug the child process using gdb, but I am not very familiar debugging multiple threads.
fork() clones a new, independent process. Each process is its own separate little universe, and the only things linking it with any other are sockets, files, pipes, and/or shared memory.

Threads are something else entirely. When you create a thread it works in the same process, literally sharing all the same memory, all the same files. Change it in one thread and it changes in all of them. That's why threads can be so tricky -- it's easy to rip the floor out from under your threads by altering something they're using simultaneously.
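To make the "separate little universe" part concrete, a small sketch. The names counter and child_changes_counter are invented for the example. The child changes a global variable, and the parent's copy never notices.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0;

/* Hypothetical demo: forks a child that sets counter to 42 in ITS copy
   of memory.  Returns the child's exit status (0 if the child saw 42),
   or -1 on error. */
int child_changes_counter(void)
{
        int status;
        pid_t pid = fork();

        if(pid < 0)
                return -1;

        if(pid == 0)
        {
                counter = 42;                   // changes the child's copy only
                _exit(counter == 42 ? 0 : 1);
        }

        if(waitpid(pid, &status, 0) < 0)
                return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After this returns 0, counter in the parent is still 0. Had the same assignment happened in a thread instead of a forked child, the parent would see 42.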

Last edited by Corona688; 10-07-2011 at 07:24 PM..
# 9  
Old 10-07-2011
Quote:
Originally Posted by Corona688
Nope. One pipe is one pipe, for a chain of 10 processes you need 9 pipes.

Not sure what you mean by "adjusting the FD". The FD's are just numbers, sure, but they're numbers representing things in the kernel's table of open files. Add three to it and the kernel won't give you pipe number three, it'll say "What? You never opened file #67."

6 is just a number, perhaps the 6th file your process happened to open, it doesn't mean "pipe number 6". Closing #6 doesn't close everyone else's #6.

It's completely okay to have the same file open and used in many different processes, too -- that's how shells work. When you run echo a or cat, they receive copies of the shell's own open file descriptors, and just write to those direct: The shell doesn't have to write it for them.

Pipes obviously know to wait until the process writing to them finishes before saying they're done, but that works for more than one process too. If you have two processes with copies of the write-end, the kernel will wait for both of them to die before the pipe gives up -- even if one of the open ends was just left open by accident.

Every process is independent. Close everything you don't need. fork() creates a new, independent process. Threads are something else entirely.

When you create a thread it works in the same process, literally sharing all the same memory, all the same files. Change it in one thread and it changes in all of them. That's why threads can be so tricky -- it's easy to rip the floor out from under your threads by altering something they're using simultaneously.
Regarding the last portion,

Quote:
When you create a thread it works in the same process, literally sharing all the same memory, all the same files. Change it in one thread and it changes in all of them. That's why threads can be so tricky -- it's easy to rip the floor out from under your threads by altering something they're using simultaneously.
What I meant to say is I am having a hard time debugging the child process once it execs, as gdb is attached to the parent.
(I thought as there is no explicit concept of threads in Linux, it would be OK to call the child process a thread, but I guess that led to confusion.)

Quote:
the kernel will wait for both of them to die before the pipe gives up -- even if one of the open ends was just left open by accident.
This point is what I missed out. Will keep this in mind in the future.
# 10  
Old 10-07-2011
Quote:
Originally Posted by ab_tall
What I meant to say is I am having a hard time debugging the child process once it execs, as gdb is attached to the parent.
(I thought as there is no explicit concept of threads in Linux, it would be OK to call the child process a thread, but I guess that led to confusion.)
What system are you on? On Linux, you can use strace, which lists all the system calls your program is making; with -f ('follow forks') it lists the children's too. Just system calls, only system calls, and nothing but system calls -- not line numbers or source code. But it's useful for clearing up mysteries like "why is my program freezing" -- it's stuck in write(). I've also used it to track down where some silly programs were looking for their config files -- just hunt for open calls to see what files they're trying to open...

It ends up as an awful big list, but it's easy to cut down with grep.

Code:
$ gcc multipipe.c
$ strace -f ./a.out 2> log
<process runs and finishes>
$ egrep "(fork|execve|open|close|pipe|dup|read|write)\(" log
execve("./a.out", ["./a.out"], [/* 28 vars */]) = 0
open("/etc/ld.so.cache", O_RDONLY)      = 3
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20m\1\0004\0\0\0"..., 512) = 512
close(3)                                = 0
pipe([3, 4])                            = 0
[pid  9380] close(4 <unfinished ...>
[pid  9381] close(4 <unfinished ...>
[pid  9380] pipe( <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] execve("/usr/local/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9380] close(5 <unfinished ...>
[pid  9381] execve("/usr/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9380] close(3 <unfinished ...>
[pid  9381] execve("/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9382] close(3Process 9383 attached
[pid  9380] close(4 <unfinished ...>
[pid  9382] close(5 <unfinished ...>
[pid  9382] close(4 <unfinished ...>
[pid  9382] execve("/usr/local/bin/tac", ["tac"], [/* 28 vars */] <unfinished ...>
[pid  9381] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9382] execve("/usr/bin/tac", ["tac"], [/* 28 vars */] <unfinished ...>
[pid  9383] close(4 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/librt.so.1", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9382] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9382] close(3 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9382] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] execve("/usr/local/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9382] read(3,  <unfinished ...>
[pid  9383] execve("/usr/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9381] open("/lib/libacl.so.1", O_RDONLY) = 3
[pid  9381] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\32\0\0004\0\0\0"..., 512) = 512
[pid  9382] close(3 <unfinished ...>
[pid  9383] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libncurses.so.5", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9381] close(3)                    = 0
[pid  9381] open("/lib/libc.so.6", O_RDONLY) = 3
[pid  9381] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9382] open("/tmp/tacpBNVeU", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600 <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libdl.so.2", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9382] read(0,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/libpthread.so.0", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/libattr.so.1", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9383] open("/etc/terminfo/x/xterm", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3)                    = 0
[pid  9383] open("/usr/bin/.sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/etc/sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.less", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.lesshst", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/dev/tty", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9381] open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9383] write(1, "\33[?1049h\33[?1h\33=", 15 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9381] write(1, "a.out\nlog\nmultipipe.c\n", 22 <unfinished ...>
[pid  9382] read(0,  <unfinished ...>
[pid  9381] close(1 <unfinished ...>
[pid  9382] write(3, "a.out\nlog\nmultipipe.c\n", 22 <unfinished ...>
[pid  9381] close(2 <unfinished ...>
[pid  9382] read(3, "a.out\nlog\nmultipipe.c\n", 22) = 22
[pid  9382] close(0)                    = 0
[pid  9382] write(1, "multipipe.c\nlog\na.out\n", 22 <unfinished ...>
[pid  9383] write(1, "multipipe.c\33[m\nlog\33[m\na.out\33[m\n", 31 <unfinished ...>
[pid  9382] close(1 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9382] close(2 <unfinished ...>
[pid  9383] write(1, "\33[7mlines 1-3/3 (END) \33[27m\33[K", 30 <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] write(1, "\r\33[K\33[?1l\33>\33[?1049l", 19) = 19

# what about just process 9383?  what was it doing?
$ egrep "(fork|execve|open|close|pipe|dup|read|write)\(" log | grep 9383

[pid  9382] close(3Process 9383 attached
[pid  9383] close(4 <unfinished ...>
[pid  9383] execve("/usr/local/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9383] execve("/usr/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9383] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libncurses.so.5", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libdl.so.2", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/etc/terminfo/x/xterm", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3)                    = 0
[pid  9383] open("/usr/bin/.sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/etc/sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.less", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.lesshst", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/dev/tty", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] write(1, "\33[?1049h\33[?1h\33=", 15 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9383] write(1, "multipipe.c\33[m\nlog\33[m\na.out\33[m\n", 31 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9383] write(1, "\33[7mlines 1-3/3 (END) \33[27m\33[K", 30 <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] write(1, "\r\33[K\33[?1l\33>\33[?1049l", 19) = 19

$

Lots of options for strace too; see the man page.

---------- Post updated at 04:52 PM ---------- Previous update was at 04:36 PM ----------

Quote:
Originally Posted by ab_tall
how is the pipe underlying structure implemented?
If it is just another file, is it possible to see the contents of the pipe {so as to know what's being passed on between the read and write ends}.
It's not a file on disk. It's a memory buffer inside the kernel itself. The kernel keeps track of how many processes have which ends open and which processes need to be stopped or started when data becomes available or room in the buffer becomes available. When everything using the pipe finally closes it or exits, the kernel sees that nothing needs the buffer anymore and deletes it.

The buffer can vary in size on different systems. On Linux the default has been 64 kilobytes (65,536 bytes) since kernel 2.6.11.
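If you'd rather measure than trust a number off a forum, here's a rough sketch that fills a fresh pipe with non-blocking writes until the kernel refuses more. The function name measure_pipe_capacity is made up. On Linux, fcntl(fd, F_GETPIPE_SZ) reports the size directly, but the brute-force version below only assumes POSIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes a fresh pipe accepted
   before a non-blocking write() said "full", or -1 on error. */
long measure_pipe_capacity(void)
{
        int p[2];
        char chunk[512] = {0};
        long total = 0;
        ssize_t n;

        if(pipe(p) < 0)
                return -1;

        if(fcntl(p[1], F_SETFL, O_NONBLOCK) < 0)
        {
                close(p[0]);
                close(p[1]);
                return -1;
        }

        // Nobody is reading, so every accepted byte stays in the kernel's buffer
        while((n = write(p[1], chunk, sizeof chunk)) > 0)
                total += n;

        // The loop should end with "would block", meaning the buffer is full
        if(errno != EAGAIN && errno != EWOULDBLOCK)
                total = -1;

        close(p[0]);
        close(p[1]);
        return total;
}
```

The exact figure depends on the system and kernel version, so treat whatever it prints as a property of your machine, not of pipes in general.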
Quote:
Finally,
Is there a way to find out which FDs point to the same underlying description?
If it's hidden somewhere in lsof, i'll dig deeper, but if not do let me know.
In Linux and a few UNIXes, you can see that under /proc/. Try this:

Code:
$ echo | ls -l /proc/self/fd
lr-x------ 1 username users 64 Oct  7 16:47 0 -> pipe:[618262]
lrwx------ 1 username users 64 Oct  7 16:47 1 -> /dev/pts/0
lrwx------ 1 username users 64 Oct  7 16:47 2 -> /dev/pts/0
lr-x------ 1 username users 64 Oct  7 16:47 200 -> /home/username/.ssh-agent
lr-x------ 1 username users 64 Oct  7 16:47 3 -> /proc/10073/fd

'self' is just a special folder meaning 'my own process number', so ls is listing its own open files. You could 'ls -l /proc/1234/fd' to list process 1234's files.

0 is stdin, the pipe attaching it to 'echo'. 618262 is the pipe's inode number, unique to that particular pipe: any FDs showing the same pipe:[number] are attached to the same pipe. The text pipe:[618262] isn't a real path you can look up anywhere -- it's just informational.

1 and 2 are stdout and stderr, both attached to the same terminal here. They're actually valid symlinks, try
Code:
echo asdf > /proc/self/fd/1

200 is a file my terminal opens on login, just a little script I set up to keep my SSH keys straight.

3 is the directory /proc/self/fd, which ls opened so it could list its own open files. The kernel decided 'self' meant 10073.

Sockets also show up in this list, if you have any open, being that sockets are FD's too...
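You can ask the same question from inside a C program without going through /proc. A minimal sketch, with the made-up helper name same_file: fstat() both descriptors and compare device and inode numbers. Note this tells you the two FDs refer to the same underlying pipe or file, which is not quite the same thing as sharing one open file description (two separate open() calls on one file would also match).

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if both descriptors refer to the same
   underlying file or pipe, 0 if not, -1 if either fstat() fails. */
int same_file(int fd_a, int fd_b)
{
        struct stat a, b;

        if(fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
                return -1;

        return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A descriptor and its dup() always match. On Linux the two ends of one pipe should match each other too, since they show the same pipe:[number] in /proc, while ends of two different pipes don't.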

Last edited by Corona688; 10-07-2011 at 07:41 PM..
# 11  
Old 10-07-2011
I am on Ubuntu 64 11.04. Lots to digest for one post.
Will try out some stuff and get back.

---------- Post updated at 07:49 PM ---------- Previous update was at 07:13 PM ----------

Thanks guys for the input, but I was wondering if mainstream shells like bash/csh use a structure similar to Corona's example for their execution.

If that's the case, they'd need to create separate executables for all their builtin commands as a part of their initialization sequences. That doesn't really make the builtins much different from external commands then, does it?
# 12  
Old 10-07-2011
Quote:
Originally Posted by ab_tall
Thanks guys for the input, but I was wondering if mainstream shells like bash/csh use a structure similar to Corona's example for their execution. If that's the case, they'd need to create separate executables for all their builtin commands as a part of their initialization sequences.
There are external commands to match almost every shell builtin -- the UNIX standard requires it. You don't always get a shell as fancy as you want, so there has to be a fallback.

But no -- I don't think so. Shell builtins are definitely not the same as externals: builtins run inside the shell itself with no fork() or exec(), so they're clearly faster.

Look carefully at what builtins do for you. There are commands like read, which reads from stdin, and commands like echo, which writes to stdout -- but you don't get things like cat, which do both. That's intentional -- it keeps the builtin entirely inside the shell without risking deadlock (parts of the shell waiting on other parts of the shell). It just does whatever's next in the list and carries on; if there's any wait involved, it's not its fault.

echo is particularly simple to do as a builtin. I tried to build a shell once, and managed situations like echo | cat like this:

Code:
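#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        int pipefd[2], status;

        if(pipe(pipefd) < 0)
        {
                perror("pipe");
                return 1;
        }

        // The builtin "echo": write straight into the pipe, no fork needed
        write(pipefd[1], "the owls are not what they seem\n", 32);
        close(pipefd[1]);       // close the write end so cat sees EOF

        if(fork() == 0)
        {
                dup2(pipefd[0], 0);     // the pipe's read end becomes stdin
                close(pipefd[0]);
                execlp("cat", "cat", (char *)NULL);
                perror("couldn't exec");
                exit(1);
        }

        close(pipefd[0]);
        wait(&status);
        return 0;
}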

As long as the message fits in the pipe's buffer, write() won't block, so you don't have to fork anything or wait -- just squirt it in the write end and close.

---------- Post updated at 06:38 PM ---------- Previous update was at 06:20 PM ----------

I think what I ended up doing was building a list much more complicated than {"echo", "tac", "more", NULL}: it was a structure with all three file descriptors (stdin/stdout/stderr), and a string for the name.

I opened everything in advance, including all pipes and redirections.

If I wanted process 0 to read from file.txt I could just go processlist[0].fds[0]=open("file.txt", O_RDONLY);

If I wanted processes 2 and 3 joined with a pipe, I'd do
Code:
pipe(pipefd);
processlist[2].fds[1]=pipefd[1];
processlist[3].fds[0]=pipefd[0];
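For anyone trying to picture the structure, it might have looked roughly like this. The layout and field names are guesses reconstructed from the description, not the original shell's source, and join_with_pipe is a made-up wrapper around the three lines above.

```c
#include <unistd.h>

/* Assumed layout: one entry per command in the pipeline */
struct process
{
        const char *name;       // program to exec
        char **args;            // argv for execvp()
        int fds[3];             // replacements for stdin/stdout/stderr, -1 = leave alone
};

/* Hypothetical helper: connect writer's stdout to reader's stdin with
   a fresh pipe.  Returns 0 on success, -1 if pipe() fails. */
int join_with_pipe(struct process *writer, struct process *reader)
{
        int pipefd[2];

        if(pipe(pipefd) < 0)
                return -1;

        writer->fds[1] = pipefd[1];
        reader->fds[0] = pipefd[0];
        return 0;
}
```

Joining processes 2 and 3 would then be join_with_pipe(&processlist[2], &processlist[3]).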

If I wanted a builtin to print into the first process, I'd create a pipe, squirt in a message, and shove the read-end in the array along with everything else.

And then I'd do one big loop to create every process.
Code:
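for(n=0; n<numprocs; n++)
{
        if(fork()==0)
        {
                // Move this process's chosen files onto stdin/stdout/stderr
                for(fileno=0; fileno<3; fileno++)
                if(processlist[n].fds[fileno] >= 0)
                        dup2(processlist[n].fds[fileno], fileno);

                // Close all the pipes! ALL OF THEM
                // (fds[] has three entries, 0 to 2; skip -1 and the
                // standard descriptors we just set up)
                for(q=0; q<numprocs; q++)
                for(fileno=0; fileno<3; fileno++)
                if(processlist[q].fds[fileno] > 2)
                        close(processlist[q].fds[fileno]);

                execvp(processlist[n].name, processlist[n].args);
                perror(processlist[n].name);    // only reached if exec failed
                exit(255);
        }
}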

In retrospect, this was silly. Every time I fork()ed that huge wad of pipes I had to close so much junk that didn't need cloning in the first place.

Might be better to just do it as you go. Or maybe I should have played with close-on-exec and only copied the pipes I actually needed. (I slightly lied earlier. fork() does clone every open file, but not all of them have to survive exec(): you can flag FDs as close-on-exec with fcntl(), and the kernel closes them for you the moment the child execs.)
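A minimal sketch of the close-on-exec idea, using the standard fcntl() flag FD_CLOEXEC. The wrapper name set_cloexec is made up.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd so the kernel closes it automatically
   when this process calls exec().  Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
        int flags = fcntl(fd, F_GETFD);

        if(flags < 0)
                return -1;

        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

The nice part is that dup2() does not copy the flag to the new descriptor. So you could mark every pipe FD close-on-exec as you create it, dup2() the ones a child needs onto 0, 1 and 2, and exec: the copies on 0/1/2 survive and all the originals vanish without a close() loop. Linux also has pipe2() with O_CLOEXEC to set the flag at creation time.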

Last edited by Corona688; 10-07-2011 at 09:44 PM..
# 13  
Old 10-07-2011
Ok, I went the other way round,

I implemented all builtins first, thinking I could use them as needed in my pipes.
I tried to mesh that in with the previous example you gave of IPC via pipes. But then I need to create 2 versions of my builtins:

if the builtin is the 1st command in the pipe, it's executed in the same shell;
if the builtin is somewhere in the middle,
then it needs to be executed in a separately forked process, which means the builtin needs to have a corresponding external command for exec to work.

i.e
Code:
if( piped commands)
{
        if(isbuiltin())
        {
                execbuiltin() // function where i implemented all builtins.
                squirt into the first pipe
        }
        else
        {
                exec as in your sample prev.
                (here it fails as that code would need me to exec an external command always - which may or may not exist)
        }
}

Perhaps I need to rethink my approach.
# 14  
Old 10-07-2011
Are you checking for builtins after you fork? You should check before. The whole point is that builtins don't need fork at all since they can happen wholly inside the shell.

I don't understand why you'd be using builtins in the middle of a pipe chain in the first place. They don't work there in csh. Unless you're trying to build in things like cat, which I don't think is a good idea.

Of course external commands must exist to use external commands. What's wrong with that?
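To spell out "check before you fork", a small sketch. The builtin list and the name is_builtin are placeholders for whatever your shell actually implements.

```c
#include <string.h>

/* Illustrative list only; a real shell has its own set of builtins */
static const char *builtin_names[] = { "cd", "echo", "exit", NULL };

/* Hypothetical helper: 1 if name is one of the shell's builtins, else 0 */
int is_builtin(const char *name)
{
        int i;

        for(i = 0; builtin_names[i] != NULL; i++)
                if(strcmp(name, builtin_names[i]) == 0)
                        return 1;

        return 0;
}
```

The dispatch then goes: if is_builtin(argv[0]) is true, run your builtin function right there in the shell process; only otherwise fork() and execvp(). That way exec never has to find an external program for a builtin.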