Pipes with the < and >> meanings.


 
# 1  
Old 04-25-2011

Hey,

Well, we're starting scripting next week in my class, and I have my command done, but I don't actually understand what it means, so I was just wondering if someone could "translate" this into words so that I might be able to do this better...

Quote:
c1 | c2 | c3 < inputInfo.txt >> outputCollection.txt


I know the general shape is command1 | command2 | command3 etc.

But yeah...

Also, for the command I have now, I was wondering if there is a way to get the output of the pipes to feed 2 commands rather than just one...

Like, if I had ps -A | grep [numbers], then have this pipe to both wc and sort, so I can get the total number of processes and have them sorted...

Sorry for multi-questioning, I really only care about the translation though,

Thanks so much!
# 2  
Old 04-26-2011
Well, the shell is a tool that opens files of various sorts and runs executables. The shell can fork() to create clone processes of itself to run subshells, or execvp() programs and scripts. For the pipe between c1 and c2, the shell fork()'s a child that:
  1. creates a connected pair of file descriptors, a read end and a write end, using pipe(),
  2. fork()'s,
  3. in the child:
    1. close()'s stdout
    2. and the read end of the pipe() array,
    3. dup()'s the write end of the pipe() array to become stdout,
    4. close()'s the write end of the pipe() array, and
    5. execvp()'s c1;
  4. in the parent:
    1. close()'s stdin
    2. and the write end of the pipe() array,
    3. dup()'s the read end of the pipe() array to become stdin,
    4. close()'s the read end, and
    5. execvp()'s c2.
Voila: two processes running, with stdout of c1 pipe()'d to stdin of c2. Of course, the shell has to do similar steps for the pipe from c2 to c3.
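
You can watch the shell make exactly these calls. A minimal sketch, assuming a Linux box with strace installed (truss or tusc are the equivalents elsewhere):
Code:
# -f follows the fork()'d children; -e limits output to the pipeline plumbing.
strace -f -e trace=pipe,pipe2,dup2,execve sh -c 'ls | wc -l'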

The stdin redirect < is in the wrong place: it applies to c3, whose stdin was just dup()'d to the pipe, so the < quietly replaces the pipe from c2, and c2's output is thrown away. The input should go into c1, with the < filename after that command's arguments (none, in this case). Redirections conventionally come after the arguments because, once the shell does any expansion and removes any quotes on arguments, the arguments are the business of execvp() and the command, not the shell. For <, the shell closes stdin (originally your tty, inherited) and then open()'s the filename for reading at the beginning of the file to become stdin.
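
Concretely, what was probably intended is (a sketch, reusing the file names from post #1):
Code:
# The file feeds the head of the pipeline; the final result is appended.
c1 < inputInfo.txt | c2 | c3 >> outputCollection.txt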

The open() or dup() will use the lowest-numbered fd not in use, and stdin is 0, stdout is 1. For the overly organized, paranoid and OCD, there is dup2(a,b) to dup to a specific hole in the fd array. An fd, not to be confused with a FILE*, is just an integer indexing a pointer array in kernel space for each process. The same open file can be pointed to by many processes, and by many fd's in each of those processes, but it is the same open file structure. Web servers even pass open socket fd's to children through a pipe, with fancy calls to duplicate the parent's open file structure in a child fd. The fd's that all refer to the same open file structure will all be at the same offset (bytes from the beginning) for reading or writing. In shell, this means that if one command reads 3 lines and then starts another with stdin inherited, it starts reading at line 4.
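
You can see that inherited offset from the shell (a quick sketch):
Code:
printf 'line1\nline2\nline3\n' > /tmp/offset.$$
( read a ; echo "parent got: $a"          # consumes line1
  sh -c 'read b ; echo "child got: $b"'   # inherits stdin, picks up at line2
) < /tmp/offset.$$
rm /tmp/offset.$$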

The same filename can be open()'d many times by one or more processes, and these each get new open file structures in the kernel pointing to the same device and inode. This is not the same as many fd pointing to the same open file structure. The open file structures from each open() each have a separate offset pointer for reading or writing the file.
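
Contrast that with two separate open()'s of the same name, each with its own offset (a sketch, assuming ksh or bash for read -u):
Code:
printf 'line1\nline2\n' > /tmp/two.$$
exec 3< /tmp/two.$$ 4< /tmp/two.$$   # two open()'s: two open file structures
read -u3 a ; read -u4 b              # each reads at its own offset
echo "a=$a b=$b"                     # both print line1
exec 3<&- 4<&-                       # close both fd's
rm /tmp/two.$$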

A FILE* is a buffered fd wrapper inside a process that reads ahead, or saves up output to write bigger blocks. In the shell, if you 'read' a file this way, the FILE* on stdin reads ahead, so the next line and a good offset are not available to any inheriting process. To deal with this, you can use the command 'line', which read()'s exactly one line, one byte at a time, using a raw fd or an unbuffered FILE*. Every process starts up with three open fd's, 0, 1 and 2, wrapped in three open FILE*'s: 'stdin', 'stdout' and 'stderr'. stdout is buffered for bulk output; stderr is unbuffered, for error messages that must not be stuck in a buffer if the process crashes out with a fatal signal and perhaps a core file. Wise programmers use or reuse these first before making more fd/FILE*.
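
Here is the read-ahead problem in action; 'line' is not on every system, but the shell's read built-in behaves the same careful way:
Code:
# head buffers far past line 1, so nothing is left in the pipe for cat:
printf 'header\ndata1\ndata2\n' | ( head -1 >/dev/null ; cat )
# read, like 'line', takes one byte at a time, so cat gets the rest:
printf 'header\ndata1\ndata2\n' | ( read h ; cat )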

Output redirections >> and > are like <, but for stdout: they write to the filename, possibly creating it, and > truncates any old data. More does less, a recurrent theme in shell. For instance, adding '&' means don't wait() for the child to finish. By default, to keep things simple, the shell wait()'s for all the new children to finish before reading more input. Too many & background jobs can exhaust your resources or lock up the system.
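
A throwaway illustration:
Code:
echo one  > /tmp/out.$$   # > creates or truncates
echo two >> /tmp/out.$$   # >> creates or appends
cat /tmp/out.$$           # shows one, then two
rm /tmp/out.$$
sleep 3 &                 # & : the shell reads the next command immediately
wait                      # block until the background child is reaped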

A good friend of the pipe is the subshell, (). It shares/inherits one stdin, one stdout and one stderr for all commands executed inside it, concatenating their output if there is no '&', or mixing it if the processes all write in rotation. Further, if there are many background and child processes and you collect all their stderr using "( ... ) 2>&1 | ....", then all processes within must close stderr (that is, exit) before the downstream reader sees EOF. This ensures you can see all error messages before proceeding, not missing any from slow, somewhat disconnected child processes.
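
A sketch of that collecting pattern; step1, step2 and step3 here are stand-ins for your own commands:
Code:
# Everything inside ( ) shares one stderr; the pipe delivers EOF only
# after the last process inside, background or not, has closed it.
( step1 & step2 & wait ; step3 ) 2>&1 | tee run.log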

The reverse of that is tee, which takes one stdin stream and writes it to a file and to stdout both. Under ksh, where there is /dev/fd/#, and under bash, you can put a redirection to a subshell in place of a file name:
Code:
 >( something you can write to )
 <( something that writes to you )

Pipes are much nicer than temp files, as you never run out of space or have name collisions where someone else owns a file under that name, plus you get 'pipeline parallelism'. For instance, this sorts two files using up to 3 CPUs:
Code:
sort -m <( sort file1 ) <( sort file2 )
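
And tee with >() answers the earlier question about feeding one pipeline to both wc and sort at once; a sketch built on the ps example from post #1:
Code:
# Count the matching processes on stderr while sorting them on stdout.
ps -A | grep '[0-9]' | tee >( wc -l >&2 ) | sort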


# 3  
Old 04-26-2011
wow... what a thorough explanation. Maybe DGPickett's reply should be archived as a tutorial of sorts?
# 4  
Old 04-27-2011
You are very kind! It's dangerous to encourage me!



Knowing "how simple it really is" is simpler than learning tons by rote. Using truss/tusc/strace can be very educational, showing you all the calls under the covers. To round it out (never ends), this needs:
  1. <<WORD\n . . . \nWORD\n converts script lines to stdin (shell writes it to a temp file). It is nice to manage lists on separate lines and through pipes, not on the command line explicity and flowing horizontally unaligned all over the page. The <<-WORD variant allows you to add prefix tabs for indentation that are removed. Some command line expansions are done here, so for complex data, I prefer the more predictable "echo 'whatever . . .' | . . . ." (use single quotes first, always, as they are less meta-processed (more literal) than double quotes; slip out of single to double where you need it and then go back to single, e.g.: echo 'I got home at '"$time"', well before curfew.' Think of echo as a conversion tool to take command line strings to stdout, the opposite of the next item.
  2. $( command . . . ) When not in sh, this is nicer than `command . . .` 2 ways: nestable and has vi % support to match parens. Converts stdout to a string on the command line; may contain multiple commands whose stdout is inherited, concatenated,
  3. $(<file) Turns the content of a file into a command line string,
  4. ' . . . | command . . . $(cat)' Allows all of the output of a pipeline of commands to end up on the command line without the mess of nesting it all in `` or $(),
  5. ' . . . | xargs -n999 command . . .' Similar, but runs as many command lines of limited length as necessary (scales well), and starting sooner so you get lower latency,
  6. $*, $#, shift, $@, $0-9 Managing the command line args,
  7. . . . | read variable_name Getting the one value or first line only from a command or pipeline of commands,
  8. | while read variable_name do . . . done Processing each line comng out of the pipeline separately, with low latency, robust for all data set sizes but scaling not so well (shell processing speed possibly with one or more fork/exec per line processed).
  9. Named pipes from "mknode pipe_path p" or "mkfifo pipe_path", especially useful if you do not have the /dev/fd/# ksh <() >() (bash uses named pipes under the covers for >() <(), ksh does not).
  10. ${var_name#pattern} with #, ##, %, %%: Chews off nose # minimally or ## aggressively or tail %, %%. Mnemonic: Pound on the nose, get your percent in the end.
  11. trapping signals and kill,
  12. advanced data and storage: typedef, arrays, relational maps (value addressed arrays).
  13. export for child processes, subshells, and how to stay in the first shell so you can modify variables in subroutines and
  14. sourced files (". filename" includes another file in the stdin flow of this shell).
Mostly a basic ksh list, ksh93 and bash goes on much farther. Begs the need for the arithmetic bits, date/time bits, test, with regex matching, case. I love 'case', because it structures testing so all cases are addressed, and facilitates nice commenting! Just use balanced () in case patterns, so the vi % feature is not incapacitated.
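
Items 1, 2, 8 and 10, in one runnable sketch (bash or ksh; the names are made up):
Code:
# 1: here-document as stdin
tr 'a-z' 'A-Z' <<EOF
shout this
EOF
# 2: $( ) nests where backquotes cannot
today=$(date '+%Y-%m-%d')
echo "backup.$today.tar"
# 8: low-latency, line-at-a-time processing
printf 'red\ngreen\n' | while read color ; do echo "got $color" ; done
# 10: pound on the nose, percent in the end
path=/home/me/report.txt
echo "${path##*/}"   # report.txt      (## strips the longest */ prefix)
echo "${path%.txt}"  # /home/me/report (% strips the shortest .txt suffix)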

BTW, a nice use of 'line' and a subshell is to avoid a command that must pass along all the data just to remove the header line(s) before processing:
Code:
( . . . | ( line >&2 ; sort ) >output_file ) 2>>log_file
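
Where 'line' is not installed, the read built-in can stand in; a sketch, in which 'producer' stands for the upstream commands and IFS= with -r keeps the header byte-for-byte:
Code:
( producer | ( IFS= read -r header ; echo "$header" >&2 ; sort ) >output_file ) 2>>log_file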

Subshell is also great for collecting logging or multiple commands' output without saying >>file_name over and over, like the stderr to log_file above. Trailer lines are a different problem; maybe it is time to learn some sed:
Code:
 . . . | sed '
  ${
    w trailer_file
    d
   }
 ' | . . . .

The best training is for use, not to pass a certification by knowing every dumb feature: know which pragmas are useful, clear, fast and robust. It does not hurt to be taught normal formatting, so code reads well for review and maintenance. In all code, a common sin is not putting lists of predicates or actions on separate lines, and not throwing in blank lines and indentation, so that multi-line commands and container expressions like if stand out from the surrounding single-line commands. Both the quality of your work and the size and speed of the projects you can execute are enhanced by good formatting, good logging and good error handling. Don't learn to be a weak, hope-and-prayer, cryptic script writer; it is not worth the pain or the limitation. You, too, deserve well-behaved, robust, easily read, traceable code, even in shells!

PPS: If you only use cd inside subshells, you never leave ~ ($HOME), and all your commands can be recalled and rerun using $EDITOR=vi, ksh's set -o vi, and the like. Increase your $HISTSIZE to 32767; put your $HISTFILE on permanent (and, if possible, net-mounted) disk, not /tmp; periodically save your $HISTFILE; and recall how you did it the last time using escape, /pattern. This is the nicest Character User Interface (CUI) this side of becoming an emacs addict. It saves you more typing than the cd of your login shell ever did, and it reduces mistakes such as running a command in the wrong directory. The path you once typed into cd can be recalled and reused with the whole command line (which can be a many-line single shell command). Recalled commands are also good starters for scripts. Really long command lines need to be edited through vi (escape, /pattern, then v) to avoid being truncated, but they keep working over and over!
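
The cd-in-a-subshell habit, illustrated:
Code:
pwd                           # say, /home/me
( cd /tmp && ls >/dev/null )  # the cd lives and dies inside the ( )
pwd                           # still /home/me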

# 5  
Old 04-27-2011
That's awesome, thank you!
Although I've been using the parameter expansion ${var%pat} quite a bit, I've been confusing the pound and percent all along. The mnemonic is great!
One thing that has yet to be explained to me is redirection from a subshell in this manner:
Code:
$ while read i ; do echo i $i ; done <(echo blah | sed 's/la/LA/')
bash: syntax error near unexpected token `<(echo blah | sed 's/la/LA/')'

But the following works:
Code:
$ while read i ; do echo i $i ; done < <(echo blah | sed 's/la/LA/')
i bLAh

Why do I need two '<' in this construct?
# 6  
Old 04-28-2011
Well, it is important to remember that <(...) or >(...) becomes a WORD openable as a file name, like ksh's ' /dev/fd/7 ', not just a string but a word with implied separation, so you cannot use it inside a sed script like this, because it looks like 2 arguments. (David Korn says he hadn't thought of it for such usage. You can strip the separation by passing it through a shell subroutine call.):
Code:
sed '
  /xyz/w '>(...)'
  /abc/w '>(...)'
  s/ .*//
 ' infile >outfile

So, some of this is silly, as 'xxx < <(yyy)' is just 'yyy | xxx'. That is also the answer to the two-'<' question above: <(yyy) expands to nothing but a file name, so you still need the separate < operator to redirect stdin from that file. These <() >() constructs are for places where you need a file name but want a pipe, like this:
Code:
xxx | tee >( one-parallel-process-yyy ) | second-parallel-process-zzz

In some UNIX systems, you have /dev/stdin, /dev/stdout and /dev/stderr, really /dev/fd/0, 1 and 2, which is enough for the occasional obstinate program with no shorthand for stdin/stdout like the '-' of tar f, cat, etc. These automatically managed named pipes are great if there are more than 2 parallel processes. The bash management of these named pipes is flawed, BTW, bug reported -- they accumulate in /var/tmp. While the ksh on /dev/fd/# UNIX systems just does a pipe() call and harvests the /dev/fd/# names, the bash implementation uses mknod or mkfifo persistent named pipes, which work a bit differently: a pipe() inside open() for one side, and a connection to that pipe for the other side; I forget which goes first. For instance, this works to give a C program a robust parallel sort, using the same named pipe twice, first for input and then for output. Part of the reason this works is the nature of sort: it reads all of its input, sorts, and only then writes all of its output:
Code:
system( "rm -rf /tmp/mysort.p ; mknod /tmp/mysort.p p ; sort -o /tmp/mysort.p /tmp/mysort.p &" );
fp = fopen( "/tmp/mysort.p", wb );
while ( . . . ) { . . . ; fputs( data, data_len, fp ); }
fclose( fp );
fp = fopen( "/tmp/mysort.p", rb );
while( fgets( data, data_len, fp )){ . . . }
if ( ferror( fp )){ perror( "sort through pipe failed" ); exit( 1 ); }
fclose( fp );
system( "rm -rf /tmp/mysort.p" );

You can see the overhead of name collision, pipe creation and pipe cleanup; plus, if someone else came along and opened your pipe, it would mess things up. Named pipes were intended for a crude sort of server, where a server process keeps opening the pipe, waiting for a client to open the other side the other way, and spinning off a child, somewhat like inetd spinning off server processes for every tcp connection; but it seems tricky to use the fd bidirectionally in a script, and without that it is a unidirectional service, like writing to a queue.

Some shells consider <() or >() subshells to be your login shell's child jobs, with all the start/end notifications and such. Firing them up inside a subshell means they are not your children, so toss in a protective set of () if you want quiet.

Ksh also has {} to create redirection and inheritance like (), but without a fork() to a subshell; the rules are a bit demanding, so I ignore them. It turns out fork() is 10 times cheaper than execvp(), so for shell speed, count your exec's and use the built-in equivalents: read for line, pattern matching for grep, etc. Many nominal PERL scripts are amazingly shellish!
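
For the record, the demanding {} rules amount to a space after the { and a ; or newline before the }; a quick sketch of the difference:
Code:
( cd /tmp ) ; pwd     # ( ) forks, so the cd dies with the subshell
{ cd /tmp ; } ; pwd   # { } stays in this shell: no fork(), and the cd sticks
cd -                  # step back to where we were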