awk Behavior

# 1  
08-03-2016

Linux Release
Code:
[host@localhost ~]$ more /etc/*lease
::::::::::::::
/etc/centos-release
::::::::::::::
CentOS release 6.7 (Final)
Uname details
Code:
[host@localhost ~]$ uname -a
Linux localhost 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Data file
Code:

[host@localhost ~]$ more dafile
10.10.10.10,house
10.10.10.11,car
10.10.10.12,boat
10.10.10.13,truck
I've been working at the command line for a long time, going back as far as SCO and Interactive Unix, and I have always used this construct without issues. I want to isolate the IP address (field 1). As you can see, the first line is "skipped": it is printed whole instead of being split.

Code:
[host@localhost ~]$ awk '{FS="," }{print $1}' dafile
10.10.10.10,house
10.10.10.11
10.10.10.12
10.10.10.13
This works as expected. But again, what's changed?
Code:
[host@localhost ~]$ awk 'BEGIN { FS = "," };{print $1}' dafile
10.10.10.10
10.10.10.11
10.10.10.12
10.10.10.13
Thanks!
# 2  
08-03-2016
A BEGIN rule is executed only once, before the first input record is read. That is why the code below works as expected:
Code:
awk 'BEGIN { FS = "," };{print $1}' dafile

But in this code, FS is set only when the first input record is read:
Code:
awk '{FS="," }{print $1}' dafile
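To see the "once, before the first input record" part in action, here is a minimal sketch. It assumes a POSIX sh, any POSIX awk, and mktemp -d (common on Linux and the BSDs, though not itself POSIX). It recreates dafile in a scratch directory, and the variable name begin_out is only for illustration. NR is still 0 inside BEGIN, and the comma FS set there is already in force when record 1 is split.

```shell
#!/bin/sh
# Recreate the sample data file from this thread in a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10.10.10.10,house 10.10.10.11,car 10.10.10.12,boat 10.10.10.13,truck > "$tmp/dafile"

# BEGIN runs once, before any record is read (NR is still 0 there),
# so the comma separator applies to every record, including the first.
begin_out=$(awk 'BEGIN { FS = ","; print "BEGIN: NR=" NR }
                 { print NR ": " $1 }' "$tmp/dafile")
printf '%s\n' "$begin_out"
```

The expected output is "BEGIN: NR=0" followed by the four IP addresses, each prefixed with its record number.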

# 3  
08-03-2016
Quote:
Originally Posted by Yoda
A BEGIN rule is executed only once, before the first input record is read. That is why the code below works as expected:
Code:
awk 'BEGIN { FS = "," };{print $1}' dafile

But in this code, FS is set only when the first input record is read:
Code:
awk '{FS="," }{print $1}' dafile

I would change the last statement quoted above (shown in red in the original post) to:
Quote:
But in this code, FS is set after each input record has been read and split into fields. The new FS is therefore used to split every record after the first one, but the first record has already been split with the default field separator before FS is set.
Other ways to make sure that the FS you want is used to split every input line include:
Code:
awk -F',' '{print $1}' dafile
awk '{print $1}' FS=',' dafile
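A sketch of those two alternatives plus one pitfall, assuming a POSIX sh, any POSIX awk, and mktemp -d, with dafile recreated in a temporary directory (the variable names are illustrative). An assignment operand such as FS=',' is processed when awk reaches it in the argument list, so it only helps when it comes before the file name.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10.10.10.10,house 10.10.10.11,car 10.10.10.12,boat 10.10.10.13,truck > "$tmp/dafile"

# -F sets FS before any input is read.
opt_out=$(awk -F',' '{print $1}' "$tmp/dafile")

# The assignment operand is processed before the file that follows it is read.
before_out=$(awk '{print $1}' FS=',' "$tmp/dafile")

# Placed after the file name, the assignment only happens once dafile has been
# read completely, so every record is split on the default (blank) separator
# and $1 is the whole line.
after_out=$(awk '{print $1}' "$tmp/dafile" FS=',')

printf 'with -F:\n%s\n\nFS= before file:\n%s\n\nFS= after file:\n%s\n' \
    "$opt_out" "$before_out" "$after_out"
```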

These 2 Users Gave Thanks to Don Cragun For This Post:
# 4  
08-04-2016
Thank you, Don. I checked the gawk source in field.c (the routines for dealing with fields and record parsing).

So the first record is parsed with the default field separator, and the new field separator is used to parse subsequent records.

I also noticed that the function set_NF is called before record parsing, so gawk's behavior for this variable is different:
Code:
awk -F, '{NF=1}{print $NF}' dafile
10.10.10.10
10.10.10.11
10.10.10.12
10.10.10.13

Any idea why the developers didn't do the same with set_FS?
# 5  
08-04-2016
Old awk and nawk appear inconsistent:
Code:
nawk '{print $1; FS=","; print $1}' dafile
10.10.10.10,house
10.10.10.10,house
10.10.10.11
10.10.10.11
10.10.10.12
10.10.10.12
10.10.10.13
10.10.10.13

Code:
nawk '{FS=","; print $1}' dafile
10.10.10.10
10.10.10.11
10.10.10.12
10.10.10.13

It looks like they do "late field splitting": a record is split into fields only when a field is first referenced, using whatever FS is current at that moment.
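Which behavior a particular awk has can be checked directly. A minimal sketch, assuming a POSIX sh, mktemp -d, and a two-line copy of dafile in a temporary directory: FS is assigned inside the main rule and $1 of the first record is printed. An awk that splits each record when it is read (the behavior POSIX describes) prints the whole line; one that splits lazily, like the nawk above, prints just the IP. The mode labels are illustrative only.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10.10.10.10,house 10.10.10.11,car > "$tmp/dafile"

# Change FS inside the main rule, then reference $1 of the first record only.
first=$(awk '{ FS = ","; print $1; exit }' "$tmp/dafile")

case $first in
    10.10.10.10,house) mode="split when read (old FS used for record 1)" ;;
    10.10.10.10)       mode="late splitting (new FS used on first reference)" ;;
    *)                 mode="unexpected result: $first" ;;
esac
printf 'this awk: %s\n' "$mode"
```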
# 6  
08-04-2016
Even though this discussion about awk internals is fascinating and has broadened my horizons (a collective "thank you" to all of you in this thread), just for the record:

Wouldn't using the shell's own means (parameter expansion or field splitting) be less costly than running an external program? I suppose the thread's original poster does something with the values once they have been split, something along the lines of:

Code:
awk -F',' '{print $1}' datafile | while read IP ; do ..... done

In such a case it might be easier to do:

Code:
while IFS=, read IP junk ; do ..... done < datafile

or, depending on what else is done:

Code:
while read LINE ; do
     IP="${LINE%,*}"
     .....
done < datafile
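A minimal pure-shell sketch of both suggestions, assuming a POSIX sh and mktemp -d, with dafile recreated in a temporary directory and printf standing in for the elided loop bodies. It uses read -r so backslashes are not interpreted, and ${line%%,*} (strip from the first comma) rather than ${line%,*} (strip from the last comma); the two agree on two-field lines like these but not on lines with more fields.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10.10.10.10,house 10.10.10.11,car 10.10.10.12,boat 10.10.10.13,truck > "$tmp/dafile"

# 1) Let read split on IFS=, ; everything after the first comma goes into "rest".
via_ifs=$(while IFS=, read -r ip rest; do
    printf '%s\n' "$ip"
done < "$tmp/dafile")

# 2) Read whole lines and remove everything from the first comma onward.
via_expansion=$(while IFS= read -r line; do
    printf '%s\n' "${line%%,*}"
done < "$tmp/dafile")

printf '%s\n' "$via_ifs"
```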

bakunin
# 7  
08-04-2016
Quote:
Originally Posted by Yoda
Thank you, Don. I checked the gawk source in field.c (the routines for dealing with fields and record parsing).

So the first record is parsed with the default field separator, and the new field separator is used to parse subsequent records.

I also noticed that the function set_NF is called before record parsing, so gawk's behavior for this variable is different:
Code:
awk -F, '{NF=1}{print $NF}' dafile
10.10.10.10
10.10.10.11
10.10.10.12
10.10.10.13

Any idea why the developers didn't do the same with set_FS?
I have not looked at the gawk code (and for legal reasons choose not to do so). But one might guess that a function named set_NF() sets the value of the awk NF variable. Are you really telling me that gawk sets the value of NF for a new input record BEFORE parsing that record into fields? That makes absolutely no sense to me! How can it set NF before it has parsed the record into fields and determined what value NF should have? One would expect such a function to be called while parsing an input line, or AFTER parsing it, depending on the context. When a new record is read from an input file at the start of a new cycle, or by the awk command:
Code:
getline

with no variable name argument and no input redirection, that should happen (along with setting $x for 0 <= x <= NF, NR, and FNR). When a new record is read from an input file using the awk command:
Code:
getline variable

with a variable but no input redirection, NR and FNR should be updated, but NF and the current record's fields should not be modified. When a new record is read with input redirection, using either of the awk commands:
Code:
getline variable < file
        or
command | getline variable

with a variable and input redirection, none of NF, NR, FNR, or the current record's fields should change.
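The getline, getline variable, and getline variable < file cases can be checked directly. A sketch, assuming a POSIX sh, any POSIX awk, and mktemp -d, with dafile recreated in a temporary directory (the awk variable names nxt, other, and src are illustrative): on record 1, each run reads one more line with a different getline form, prints NR, FNR, $1 (and the variable, where there is one), then exits. The command | getline form is not covered here.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 10.10.10.10,house 10.10.10.11,car 10.10.10.12,boat 10.10.10.13,truck > "$tmp/dafile"
f="$tmp/dafile"

# Plain getline: replaces $0 (re-split with FS) and advances NR and FNR.
plain=$(awk -F, 'NR == 1 { getline; print NR, FNR, $1; exit }' "$f")
# prints: 2 2 10.10.10.11

# getline var: fills var and advances NR and FNR; $0 and its fields stay put.
into_var=$(awk -F, 'NR == 1 { getline nxt; print NR, FNR, $1, nxt; exit }' "$f")
# prints: 2 2 10.10.10.10 10.10.10.11,car

# getline var < file: reads from a separately opened copy of the file;
# NR, FNR and the current record are left untouched.
redirected=$(awk -F, -v src="$f" 'NR == 1 { getline other < src; print NR, FNR, $1, other; exit }' "$f")
# prints: 1 1 10.10.10.10 10.10.10.10,house

printf '%s\n' "$plain" "$into_var" "$redirected"
```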