awk Associative Array and/or Referring to Field by String (Nonconstant String Value)


 
# 15  
Old 02-02-2019
Quote:
Originally Posted by RudiC
... ... ...

BTW, awk's default FS is a bracket expression regular expression (/[ \t\n]+/) by itself.

... ... ...
This is a common misconception. With the input we have been discussing in this thread:
Code:
 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]

if that were the default ERE used for separating fields, the default first field would be the empty string before the space at the start of the line. But the actual default first field is PS028,006 (with no leading or trailing <space>s).

The actual default FS value is a single <space> character, which awk treats specially rather than as an ordinary regex (a lone <space> has no such special meaning in most other utilities). awk is the only utility in the standards where <space> has this special meaning in an ERE used as a field separator. When the entire field separator is a single <space> character, awk is required to skip leading and trailing <blank> and <newline> characters (where a <blank> is any character in the current locale's blank character class), and fields are then delimited by sequences of one or more <blank> or <newline> characters.

In the C and POSIX locales, a <blank> is either a <space> or a <tab> character. Other locales may include additional characters in the blank character class; those characters are then also ignored at the start and end of a record and treated as field separators elsewhere.
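The difference can be checked directly. The following is only a small sketch, assuming a POSIX shell and any awk on the PATH; the variable names (line, default_first, ere_first, ere_second) are made up for this illustration:

```shell
# Sample line from this thread; note the leading <space>.
line=' PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]'

# Default FS (a single <space>): leading blanks are skipped before splitting.
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# FS set explicitly to the ERE: the leading <space> now counts as a
# separator, so $1 is the empty string and the real data moves to $2.
ere_first=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t\n]+" } { print $1 }')
ere_second=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t\n]+" } { print $2 }')

printf 'default FS: $1=[%s]\n' "$default_first"
printf 'ERE FS:     $1=[%s] $2=[%s]\n' "$ere_first" "$ere_second"
```

With the default FS the first field is PS028,006; with the explicit ERE the first field is empty and PS028,006 only shows up as $2.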
These 2 Users Gave Thanks to Don Cragun For This Post:
# 16  
Old 02-02-2019
Thanks, Don Cragun, for this clarification.
Indeed, man gawk is way more explicit:
Quote:
FS The input field separator, a space by default. See Fields, above.
...
In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines.
than is my man mawk:
Quote:
mawk defines <SPACE> as the regular expression /[ \t\n]+/.
which I used in my above post. man gawk does not have this statement.
This User Gave Thanks to RudiC For This Post:
# 17  
Old 02-02-2019
Note that the default FS=" " behavior of skipping and delimiting on both blanks and newlines used to be different in older POSIX implementations, where only blanks were used, not newlines. mawk and gawk still support this older POSIX-defined behavior through special compatibility command-line options.

Compare:
Code:
$> echo "1.   222   333.
444.   555.666" | mawk '{print $1}' RS=.
1
222
444
555
666
$>

to
Code:
$> echo "1.   222   333.
444.   555.666" | mawk -W posix_space '{print $1}' RS=.
1
222

444
555
666

$>

Likewise for gawk with the --posix option.

Last edited by Scrutinizer; 02-02-2019 at 10:13 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 18  
Old 02-04-2019
Quote:
Originally Posted by RudiC
It does. Please apply what has been said to the respective line:
Code:
 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]
^          ^       ^ ^        ^ ^                ^--- last separator; $NF is empty
|          +-------+-+--------+-+-------------------- all FS
+---------------------------------------------------- field 1

Is that clearer now? If you want to remove the leading space from field 1, additional measures must be taken.
OK, thank you so much. I was under the impression that the field separator was set to the *string* "][" rather than to either "]" or "[", so I thought that $1 in the code would have been PS028,006 [KJ <Cj>, rather than PS028,006. This was very helpful. Thank you for taking the time to explain this.
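For anyone else who trips over the same thing, here is a rough sketch of the two readings. The exact FS used earlier in the thread is not repeated in these posts, so the bracket expression [][] below is an assumption based on the diagram above, and the variable names are made up for this illustration:

```shell
line=' PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]'

# Bracket expression: every single "]" or "[" is a separator.
nf_class=$(printf '%s\n' "$line" | awk -F'[][]' '{ print NF }')
first_class=$(printf '%s\n' "$line" | awk -F'[][]' '{ print $1 }')

# Literal two-character string "][": escaped so neither bracket opens a
# bracket expression. That sequence never occurs in this line (there is
# always a <space> between "]" and "["), so nothing is split.
nf_literal=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\\]\\[" } { print NF }')

printf 'bracket expression: NF=%s $1=[%s]\n' "$nf_class" "$first_class"
printf 'literal "][":       NF=%s\n' "$nf_literal"
```

With the bracket expression there are 7 fields, $1 still carries its leading and trailing <space>, and the last field is empty because the line ends in a separator. With the literal "][" the whole line stays in $1.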