A twisted feature of sort when called in awk


 
# 1  
Old 12-04-2009

Hello,

In the book Classic Shell Scripting, there is an example showing how you can call a shell command through awk's pipe:

Code:
for (name in telephone){
    print name "\t" telephone[name] | "sort"
}
close("sort")

The output, as seen in the terminal, is a perfectly sorted list of name-tel pairs. Yet this result is not what I would expect in theory. Since sort appears within the loop body, it should run each time a name-tel pair is printed. It should therefore have sorted nothing, since the data are fed to it separately, one pair at a time.

Searching throughout the network I found this:
Quote:
The sort could even be done from within the program:
Code:
sort = "sort -k 2nr"
for (word in freq)
    printf "%s\t%d\n", word, freq[word] | sort

close(sort)

This way of sorting must be used on systems that do not have true pipes at the command-line (or batch-file) level. See the general operating system documentation for more information on how to use the sort program.
But what does this explanation really mean?

Thanks for reading the thread, and please share your thoughts!
# 2  
Old 12-04-2009
Quote:
Yet this result is not what I would expect in theory. Since sort appears within the loop body, it should run each time a name-tel pair is printed
The output is correct because this is how AWK pipes work: AWK opens the pipe once, the first time the command string is used (here, during the first iteration of the loop), and every later print to the same command string writes to that same pipe until it is closed. If I understand correctly, you would expect it to work this way:

Code:
% awk 'BEGIN {
  n = split("c a b", t)
  while (++i <= n) {
    print t[i] | "sort"
    close("sort")
  }
}'
c
a
b

Note that the close command is inside the loop.

But it works like this:

Code:
% awk 'BEGIN {
  n = split("c a b", t)
  while (++i <= n)
    print t[i] | "sort"
}'
a
b
c

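AWK output redirection is a source of confusion too, because it works the same way: print > "file" truncates the file only when it is first opened, and every later print to the same name appends to it, so you don't need >> to append within a single run.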
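The redirection behaviour can be checked with a small sketch. The scratch file created with mktemp and the variable names out and redirect_result are assumptions made for this illustration; they do not come from the thread.

```shell
# Hypothetical scratch file; any writable path would do.
tmp=$(mktemp)

awk -v out="$tmp" 'BEGIN {
    n = split("c a b", t)
    for (i = 1; i <= n; i++)
        print t[i] > out    # ">" truncates only when the file is first opened
    close(out)
}'

# All three lines survive, although ">" was used on every iteration.
redirect_result=$(cat "$tmp")
printf '%s\n' "$redirect_result"
rm -f "$tmp"
```

If ">" reopened the file on every print, only the last element would remain in the file.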

Hope this helps.
# 3  
Old 12-04-2009
Thanks, Radoulov. Your code made it clear how AWK behaves when it comes to pipes. It really helped me a lot.

But when I slightly changed your code as follows:

Code:
awk 'BEGIN {
    n = split("c a b", t)
    while (++i <= n)  
        print t[i] | "sort"
    print "hello world!"
    close("sort")
}'

the output becomes:
Code:
hello world!
a
b
c

What I take from this result is that the command is only really executed once the pipe is pushed with close() or fflush(); until then it just accumulates and holds the data coming from upstream. Am I right?
# 4  
Old 12-04-2009
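The command starts executing as soon as the pipe is opened.
The data are buffered, and you see the result only once sort has received end-of-input and its output has been flushed, which happens when AWK closes the pipe (explicitly with close(), or automatically when AWK exits).

Or did you expect to see this: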

Code:
% awk 'BEGIN {
  n = split("c a b", t)
  while (++i <= n) {
    print t[i] | "sort"
    print "hello world!"
  }
  close("sort")
}'   
hello world!
hello world!
hello world!
a
b
c

As you probably already know, you need the braces when the block contains more than one command.
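To make the ordering explicit, here is a minimal sketch that moves the extra print after close(). It relies on close() waiting for the piped command to finish before AWK continues; the variable name close_result is an assumption made for this illustration.

```shell
close_result=$(awk 'BEGIN {
    n = split("c a b", t)
    for (i = 1; i <= n; i++)
        print t[i] | "sort"
    close("sort")            # signals end-of-input to sort and waits for it to exit
    print "hello world!"     # reached only after sort has written its output
}')
printf '%s\n' "$close_result"
```

Here "hello world!" should come last. When the print precedes close(), as in the post above, the relative order depends on when each process flushes its own output.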
# 5  
Old 12-04-2009
Sorry, I'm lost here...

Quote:
The command begins executing when the pipe opens.
If this is the case, how can "sort" know where the data it has to process ends? For example, if we run "cat file | sort" in the shell, the second command, sort, gets the entire input from cat (at least that is how it felt to me) before it kicks off.

By the way, I did not mean to see
Code:
hello world!
hello world!
hello world!
a
b
c

I just wanted to confirm which would be displayed first: "hello world" (just one line) or the sorted "a b c".

Last edited by nrgbooster; 12-04-2009 at 08:53 AM..
# 6  
Old 12-04-2009
Quote:
Originally Posted by nrgbooster
If this is the case, how can "sort" know where the data it has to process ends? For example, if we run "cat file | sort" in the shell, the second command, sort, gets the entire input from cat (at least that is how it felt to me) before it kicks off.
No. As you say, sort needs the entire input before it can produce the final result, but it is the sort command itself that does that job (it manages and accumulates the input). It starts executing right after the pipe opens; that's what I meant.

If by "kicks off" you mean "begins sorting the data", maybe, but I'm not sure: I don't know whether it sorts the chunks it has already received before producing the final output.

That's why the Unix pipelines are so efficient, because the connected programs actually run in parallel (think about the programs that don't need the entire input in order to begin their work).
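A quick way to see that both sides of a pipeline run at the same time is to pair a producer that never finishes on its own with a consumer that stops early. This is only an illustrative sketch; the variable name parallel_result is an assumption.

```shell
# yes prints "y" forever; head exits after three lines. If head waited for
# yes to complete before starting, this pipeline would never return.
# stderr is discarded in case yes reports the broken pipe when head exits.
parallel_result=$(yes 2>/dev/null | head -n 3) || true
printf '%s\n' "$parallel_result"
```

The command returns almost immediately, which is only possible because head is already reading while yes is still writing.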
# 7  
Old 12-04-2009
Then I must have misunderstood the underlying mechanism of pipes. I had always thought that a command in a pipeline does not start its job until the preceding one completes and hands over its output.

In the case of sort, I would venture that if it allows parallel processing, it has to use insertion sort or something similar, so that it need not know the full data set in advance.

Anyway thanks for your help! I need to make some deeper investigation.
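A sort-like filter does not need an incremental algorithm in order to run alongside its producer: it can start immediately, keep reading until end-of-file, and do the ordering afterwards. The sketch below shows that shape in awk. It is not a description of how sort(1) is implemented; the insertion sort in the END block is chosen only for brevity, and the variable name sorted_result is an assumption.

```shell
sorted_result=$(printf 'c\na\nb\n' | awk '
    { line[NR] = $0 }                  # main rule: just store each incoming line
    END {                              # runs once the input has reached end-of-file
        for (i = 2; i <= NR; i++) {    # simple insertion sort over the stored lines
            key = line[i]
            for (j = i - 1; j >= 1 && line[j] > key; j--)
                line[j + 1] = line[j]
            line[j + 1] = key
        }
        for (i = 1; i <= NR; i++)
            print line[i]
    }')
printf '%s\n' "$sorted_result"
```

The process is running from the moment the pipe opens, but nothing is printed until the END block, which is how the filter "knows" it has seen all of its data.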