Programming

10-07-2011

Registered User

17, 0

Join Date: Oct 2011

Last Activity: 18 September 2012, 5:31 AM EDT

Posts: 17

Thanks Given: 4

Thanked 0 Times in 0 Posts

Quote:

Originally Posted by Corona688

Nope. One pipe is one pipe, for a chain of 10 processes you need 9 pipes.

Not sure what you mean by "adjusting the FD". The FD's are just numbers, sure, but they're numbers representing things in the kernel's table of open files. Add three to it and the kernel won't give you pipe number three, it'll say "What? You never opened file #67."

6 is just a number, perhaps the 6th file your process happened to open, it doesn't mean "pipe number 6". Closing #6 doesn't close everyone else's #6.

It's completely okay to have the same file open and used in many different processes, too -- that's how shells work. When you run echo a or cat, they receive copies of the shell's own open file descriptors, and just write to those direct: The shell doesn't have to write it for them.

Pipes obviously know to wait until the process writing to them finishes before saying they're done, but that works for more than one process too. If you have two processes with copies of the write-end, the kernel will wait for both of them to die before the pipe gives up -- even if one of the open ends was just left open by accident.

Every process is independent. Close everything you don't need. fork() creates a new, independent process. Threads are something else entirely.

When you create a thread it works in the same process, literally sharing all the same memory, all the same files. Change it in one thread and it changes in all of them. That's why threads can be so tricky -- it's easy to rip the floor out from under your threads by altering something they're using simultaneously.

Regarding the last portion,

Quote:

When you create a thread it works in the same process, literally sharing all the same memory, all the same files. Change it in one thread and it changes in all of them. That's why threads can be so tricky -- it's easy to rip the floor out from under your threads by altering something they're using simultaneously.

What I meant to say is that I am having a hard time debugging the child process once it execs, as gdb is attached to the parent.
(I thought that, as there is no explicit concept of threads in Linux, it would be OK to call the child process a thread, but I guess that led to confusion.)

Quote:

the kernel will wait for both of them to die before the pipe gives up -- even if one of the open ends was just left open by accident.

This is the point I missed. I will keep it in mind in the future.

ab_tall

10-07-2011

Registered User

23,310, 4,623

Join Date: Aug 2005

Last Activity: 7 July 2020, 11:47 AM EDT

Location: Saskatchewan

Posts: 23,310

Thanks Given: 1,331

Thanked 4,623 Times in 4,217 Posts

Quote:

Originally Posted by ab_tall

What I meant to say is that I am having a hard time debugging the child process once it execs, as gdb is attached to the parent.
(I thought that, as there is no explicit concept of threads in Linux, it would be OK to call the child process a thread, but I guess that led to confusion.)

What system are you on? On Linux, you can use strace, which lists all the system calls your program and its children make (-f means 'follow children'). Just system calls, only system calls, and nothing but system calls -- not line numbers or source code. But it's useful for clearing up mysteries like "why is my program freezing" -- it's stuck in write(). I've also used it to track down where some silly programs were looking for their config files -- just hunt for open calls to see what files they're trying to open...

It ends up as an awful big list, but it's easy to cut down with grep.

Code:

$ gcc multipipe.c
$ strace -f ./a.out 2> log
<process runs and finishes>
$ egrep "(fork|execve|open|close|pipe|dup|read|write)\(" log
execve("./a.out", ["./a.out"], [/* 28 vars */]) = 0
open("/etc/ld.so.cache", O_RDONLY)      = 3
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20m\1\0004\0\0\0"..., 512) = 512
close(3)                                = 0
pipe([3, 4])                            = 0
[pid  9380] close(4 <unfinished ...>
[pid  9381] close(4 <unfinished ...>
[pid  9380] pipe( <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] execve("/usr/local/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9380] close(5 <unfinished ...>
[pid  9381] execve("/usr/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9380] close(3 <unfinished ...>
[pid  9381] execve("/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9382] close(3Process 9383 attached
[pid  9380] close(4 <unfinished ...>
[pid  9382] close(5 <unfinished ...>
[pid  9382] close(4 <unfinished ...>
[pid  9382] execve("/usr/local/bin/tac", ["tac"], [/* 28 vars */] <unfinished ...>
[pid  9381] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9382] execve("/usr/bin/tac", ["tac"], [/* 28 vars */] <unfinished ...>
[pid  9383] close(4 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/librt.so.1", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9382] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9382] close(3 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9382] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] execve("/usr/local/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9382] read(3,  <unfinished ...>
[pid  9383] execve("/usr/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9381] open("/lib/libacl.so.1", O_RDONLY) = 3
[pid  9381] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\32\0\0004\0\0\0"..., 512) = 512
[pid  9382] close(3 <unfinished ...>
[pid  9383] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libncurses.so.5", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9381] close(3)                    = 0
[pid  9381] open("/lib/libc.so.6", O_RDONLY) = 3
[pid  9381] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9382] open("/tmp/tacpBNVeU", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600 <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libdl.so.2", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9382] read(0,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/libpthread.so.0", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/libattr.so.1", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9383] open("/etc/terminfo/x/xterm", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3)                    = 0
[pid  9383] open("/usr/bin/.sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/etc/sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.less", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.lesshst", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/dev/tty", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9381] open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9383] write(1, "\33[?1049h\33[?1h\33=", 15 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9381] write(1, "a.out\nlog\nmultipipe.c\n", 22 <unfinished ...>
[pid  9382] read(0,  <unfinished ...>
[pid  9381] close(1 <unfinished ...>
[pid  9382] write(3, "a.out\nlog\nmultipipe.c\n", 22 <unfinished ...>
[pid  9381] close(2 <unfinished ...>
[pid  9382] read(3, "a.out\nlog\nmultipipe.c\n", 22) = 22
[pid  9382] close(0)                    = 0
[pid  9382] write(1, "multipipe.c\nlog\na.out\n", 22 <unfinished ...>
[pid  9383] write(1, "multipipe.c\33[m\nlog\33[m\na.out\33[m\n", 31 <unfinished ...>
[pid  9382] close(1 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9382] close(2 <unfinished ...>
[pid  9383] write(1, "\33[7mlines 1-3/3 (END) \33[27m\33[K", 30 <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] write(1, "\r\33[K\33[?1l\33>\33[?1049l", 19) = 19

# what about just process 9383?  what was it doing?
$ egrep "(fork|execve|open|close|pipe|dup|read|write)\(" log | grep 9383

[pid  9382] close(3Process 9383 attached
[pid  9383] close(4 <unfinished ...>
[pid  9383] execve("/usr/local/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9383] execve("/usr/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9383] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libncurses.so.5", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libdl.so.2", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/etc/terminfo/x/xterm", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3)                    = 0
[pid  9383] open("/usr/bin/.sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/etc/sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.less", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.lesshst", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/dev/tty", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] write(1, "\33[?1049h\33[?1h\33=", 15 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9383] write(1, "multipipe.c\33[m\nlog\33[m\na.out\33[m\n", 31 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9383] write(1, "\33[7mlines 1-3/3 (END) \33[27m\33[K", 30 <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] write(1, "\r\33[K\33[?1l\33>\33[?1049l", 19) = 19

$

There are lots of options for strace too; see the manpage.

---------- Post updated at 04:52 PM ---------- Previous update was at 04:36 PM ----------

Quote:

Originally Posted by ab_tall

how is the pipe underlying structure implemented?
If it is just another file, is it possible to see the contents of the pipe {so as to know what's being passed on between the read and write ends}.

It's not a file on disk. It's a memory buffer inside the kernel itself. The kernel keeps track of how many processes have which ends open and which processes need to be stopped or started when data becomes available or room in the buffer becomes available. When everything using the pipe finally closes it or exits, the kernel sees that nothing needs the buffer anymore and deletes it.

The buffer size varies between systems. On Linux since 2.6.11 the default capacity is 16 pages, which is 64 kilobytes with 4 KB pages.

Quote:

Finally,
Is there a way to find out which FDs point to the same underlying description?
If it's hidden somewhere in lsof, i'll dig deeper, but if not do let me know.

In Linux and a few UNIXes, you can see that under /proc/. Try this:

Code:

$ echo | ls -l /proc/self/fd
lr-x------ 1 username users 64 Oct  7 16:47 0 -> pipe:[618262]
lrwx------ 1 username users 64 Oct  7 16:47 1 -> /dev/pts/0
lrwx------ 1 username users 64 Oct  7 16:47 2 -> /dev/pts/0
lr-x------ 1 username users 64 Oct  7 16:47 200 -> /home/username/.ssh-agent
lr-x------ 1 username users 64 Oct  7 16:47 3 -> /proc/10073/fd

'self' is just a special folder meaning 'my own process number', so ls is listing its own open files. You could 'ls -l /proc/1234/fd' to list process 1234's files.

0 is stdin, the pipe attaching it to 'echo'. 618262 is the pipe's inode number, unique to that particular pipe, so any FD showing the same number refers to the same pipe. pipe:[618262] isn't a real path you could open by name -- it's just informational.

1 and 2 are stdout and stderr, both attached to the same terminal here. They're actually valid symlinks; try

Code:

echo asdf > /proc/self/fd/1

200 is a file my terminal opens on login, just a little script I set up to keep my SSH keys straight.

3 is the directory /proc/self/fd, which ls opened so it could list its own open files. The kernel decided 'self' meant 10073.

Sockets also show up in this list, if you have any open, being that sockets are FD's too...

Last edited by Corona688; 10-07-2011 at 07:41 PM..

Corona688

10-07-2011

I am on 64-bit Ubuntu 11.04. Lots to digest for one post.

Will try out some stuff and get back.

---------- Post updated at 07:49 PM ---------- Previous update was at 07:13 PM ----------

Thanks for the input, guys, but I was wondering if mainstream shells like bash/csh use a structure similar to Corona's example for their execution.

If that's the case, they'd need to create separate executables for all their builtin commands as part of their initialization sequences. That doesn't really make the builtins much different from external commands then, does it?

ab_tall

10-07-2011

Quote:

Originally Posted by ab_tall

Thanks for the input, guys, but I was wondering if mainstream shells like bash/csh use a structure similar to Corona's example for their execution. If that's the case, they'd need to create separate executables for all their builtin commands as part of their initialization sequences.

There are external commands to match almost every shell builtin -- the UNIX standard requires it. You don't always get a shell as fancy as you want, so there has to be a fallback.

But no -- I don't think so. Shell builtins are definitely not the same as externals; builtins are clearly faster.

Look carefully at what builtins do for you. There are commands like read which read from stdin, and commands like echo which write to stdout -- but you don't get things like cat which do both. That's intentional -- it keeps the builtin entirely inside the shell without risking deadlock (parts of the shell itself waiting for other parts of the shell itself). The shell just does whatever's next in the list and carries on; if there's any wait involved, it's not its fault.

echo is particularly simple to do as a builtin. I tried to build a shell once, and managed situations like echo | cat like this:

Code:

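#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
        int pipefd[2], status;
        const char msg[] = "the owls are not what they seem\n";

        if(pipe(pipefd) < 0) { perror("pipe"); return 1; }

        // The builtin's output goes straight into the pipe; no fork needed.
        write(pipefd[1], msg, sizeof(msg) - 1);
        close(pipefd[1]);

        if(fork() == 0)
        {
                // Child: make the read end its stdin, then become cat.
                dup2(pipefd[0], 0);
                close(pipefd[0]);
                execlp("cat", "cat", (char *)NULL);
                perror("couldn't exec");
                exit(1);
        }

        close(pipefd[0]);
        wait(&status);
        return 0;
}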

As long as the message is smaller than the pipe's buffer, you don't have to wait -- just squirt it in the write end and close.

---------- Post updated at 06:38 PM ---------- Previous update was at 06:20 PM ----------

I think what I ended up doing was building a list much more complicated than {"echo", "tac", "more", NULL}: a structure with all three file descriptors (stdin/stdout/stderr) and a string for the name.

I opened everything in advance, including all pipes and redirections.

If I wanted process 0 to read from file.txt I could just go processlist[0].fds[0]=open("file.txt", O_RDONLY);

If I wanted processes 2 and 3 joined with a pipe, I'd do

Code:

pipe(pipefd);
processlist[2].fds[1]=pipefd[1];
processlist[3].fds[0]=pipefd[0];

If I wanted a builtin to print into the first process, I'd create a pipe, squirt in a message, and shove the read-end in the array along with everything else.

And then I'd do one big loop to create every process.

Code:

for(n=0; n<numprocs; n++)
{
        if(fork()==0)
        {
                // Put this process's files on stdin/stdout/stderr; -1 means "leave alone"
                for(fdnum=0; fdnum<3; fdnum++)
                        if(processlist[n].fds[fdnum] >= 0)
                                dup2(processlist[n].fds[fdnum], fdnum);

                // Close all the pipes! ALL OF THEM
                // (the copies dup2 made on 0-2 stay open)
                for(q=0; q<numprocs; q++)
                        for(fdnum=0; fdnum<3; fdnum++)
                                if(processlist[q].fds[fdnum] > 2)
                                        close(processlist[q].fds[fdnum]);

                execvp(processlist[n].name, processlist[n].args);
                perror("couldn't exec");
                exit(255);
        }
}

In retrospect, this was silly. Every time I fork()ed that huge wad of pipes I had to close so much junk that didn't need cloning in the first place.

Might be better to just do it as you go. Or maybe I should have played with close-on-exec so that only the pipes each program actually needed survived into it. (I slightly lied about having to close everything by hand. fork() still clones every FD, but you can mark the ones you don't want passed on as close-on-exec, and they're closed automatically when the child calls exec.)

Last edited by Corona688; 10-07-2011 at 09:44 PM..

Corona688

10-07-2011

OK, I went the other way round.

I implemented all the builtins first, thinking I could use them as needed in my pipes.
I tried to mesh that in with the previous example you gave of IPC via pipes. But then I need to create two versions of my builtins:

if the builtin is the first command in the pipe, it's executed in the same shell;
if the builtin is somewhere in the middle,
then it needs to be executed in a separately forked process, which means the builtin needs a corresponding external command for exec to work.

i.e.

Code:

if (piped commands)
{
        if (isbuiltin())
        {
                execbuiltin();  // function where I implemented all builtins
                // squirt the output into the first pipe
        }
        else
        {
                // exec as in your previous sample
                // (here it fails, as that code would need me to always exec
                //  an external command, which may or may not exist)
        }
}

Perhaps I need to rethink my approach.

ab_tall

10-07-2011

Are you checking for builtins after you fork?

You should check before. The whole point is that builtins don't need fork at all since they can happen wholly inside the shell.

I don't understand why you'd be using builtins in the middle of a pipe chain in the first place. They don't work there in csh. Unless you're trying to build in things like cat, which I don't think is a good idea.

Of course external commands must exist to use external commands. What's wrong with that?

Corona688