awk Associative Array and/or Referring to Field by String (Nonconstant String Value)

02-02-2019

Quote:

Originally Posted by RudiC

... ... ...

BTW, awk's default FS is a bracket expression regular expression (/[ \t\n]+/) by itself.

... ... ...

This is a common misconception. With the input we have been discussing in this thread:

Code:

 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]

if that were the default ERE used for separating fields, the default first field would be the empty string before the space at the start of the line. But the actual default first field is PS028,006 (with no leading or trailing <space>s).

The actual default FS value is a single <space> character, which awk treats specially when it is used as a field separator; awk is the only utility in the standards where <space> has this special meaning in an ERE used as a field separator. When the entire field separator ERE is a single <space> character, awk is required to skip leading and trailing <blank> and <newline> characters (where a <blank> character is any character in the current locale's blank character class), and fields are then delimited by sequences of one or more <blank> or <newline> characters. In the C and POSIX locales, a <blank> is either a <space> character or a <tab> character; in other locales, additional characters may also be included in the blank character class (and are therefore ignored at the start and end of a record and treated as additional field-separator characters elsewhere).
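
As a quick check of the difference, the following sketch feeds the sample line to awk twice: once with the default FS and once with FS explicitly set to the ERE [ \t\n]+. The shell variable names are illustrative only, and the behavior described assumes a POSIX-conforming awk such as gawk or mawk.

```shell
# Sample line from this thread; note the leading <space>.
line=' PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]'

# Default FS (a single <space>): leading blanks are skipped.
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS as an ordinary ERE: the leading <space> delimits an empty first field.
regex_first=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t\n]+" } { print $1 }')
regex_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t\n]+" } { print NF }')

printf 'default FS: first field=<%s> NF=%s\n' "$default_first" "$default_nf"
printf 'regex FS:   first field=<%s> NF=%s\n' "$regex_first" "$regex_nf"
```

With the default FS the first field is PS028,006; with the explicit ERE the first field is empty and NF is one higher.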

Don Cragun

02-02-2019

Thanks, Don Cragun, for this clarification.
Indeed, man gawk is way more explicit:

Quote:

FS The input field separator, a space by default. See Fields, above.
...
In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines.

than is my man mawk:

Quote:

mawk defines <SPACE> as the regular expression /[ \t\n]+/.

which I used in my above post. man gawk does not have this statement.

RudiC

02-02-2019

Note that the default FS=" " behavior of skipping and delimiting on both blanks and newlines used to be different in older POSIX implementations, where blanks were used but not newlines. mawk and gawk still support this older POSIX-defined behavior through special compatibility command-line options.

Compare:

Code:

$> echo "1.   222   333.
444.   555.666" | mawk '{print $1}' RS=.
1
222
444
555
666
$>

Code:

$> echo "1.   222   333.
444.   555.666" | mawk -W posix_space '{print $1}' RS=.
1
222

444
555
666

$>

Likewise for gawk with the --posix option.
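
The current behavior can also be seen without any implementation-specific options by counting the fields of a record that contains an embedded <newline>. This is a minimal sketch, assuming an awk that treats <newline> as a separator under the default FS; the ";" record separator and the sample text are made up for illustration.

```shell
# With RS=";" a record may contain <newline> characters.
# Under the default FS they separate fields just as blanks do.
counts=$(printf 'a b\nc d;e\nf' | awk 'BEGIN { RS = ";" } { print NF }')
printf '%s\n' "$counts"
```

The first record ("a b", <newline>, "c d") is counted as four fields and the second ("e", <newline>, "f") as two.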

Scrutinizer

02-04-2019

Quote:

Originally Posted by RudiC

It does. Please apply what has been said to the respective line:

Code:

 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]
^          ^       ^ ^        ^ ^                ^--- last separator; $NF is empty
|          +-------+-+--------+-+-------------------- all FS
+---------------------------------------------------- field 1

Is that clearer now? If you want to remove the leading space from field 1, additional measures must be taken.

OK, thank you so much. I was under the impression that the field separator value was set to the *string* "][" rather than "]" or "[", thus I thought that $1 in the code would have been PS028,006 [KJ <Cj>, rather than PS028,006. This was very helpful. Thank you for taking the time to explain this.
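
The distinction can be sketched as follows. The code under discussion is not reproduced in this part of the thread, so the FS value "[][]" (a bracket expression matching either ] or [) is an assumption chosen to match the diagram above, and "[]] [[]" is a hypothetical separator that matches only the three-character sequence "] [".

```shell
line=' PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]'

# Assumed FS: a bracket expression, so every single "[" or "]" separates fields.
class_first=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[][]" } { print "<" $1 ">" }')
class_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[][]" } { print NF }')

# Hypothetical FS: only the literal sequence "] [" separates fields.
seq_first=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[]] [[]" } { print "<" $1 ">" }')

printf '%s (NF=%s)\n' "$class_first" "$class_nf"
printf '%s\n' "$seq_first"
```

With the bracket expression, $1 keeps its surrounding <space>s, because only the default FS skips leading and trailing blanks, and the last field is empty because the line ends with a separator.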

jvoot
