Awk: Dealing with whitespace in associative array indices
Post 302942823 by Michael Stora on Friday 1st of May 2015 02:30:50 PM
My attempts to demonstrate what I thought was happening with simpler command-line examples were flawed and introduced additional errors (although I did duplicate the issues using awk -F, I did not post those examples), but I don't believe my "real" code has the same flaws.

I believe that I am quoting the shell variables correctly (or else the getline reads in my code below would fail), and because I specify an alternate delimiter in my awk statement, the issues with whitespace in array indices are not coming from either splitting or my understanding of splitting.

First I am reading in a CSV file that may have non-separating commas, whitespace, and other potentially problematic characters inside double-quoted fields. So I start off with a double-quote-counting parser and an alternative delimiter (the old ASCII unit separator from punch card/paper tape days).

Code:
 
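delim=$'\037' # ASCII Unit Separator (US)
 
awk '{ quote=0; for(i=1;i<=length($0);i++)
                   { ch=substr($0, i, 1)
                     if ( ch == "\"" ) quote=!quote   # toggle the inside-quotes state
                         else if ( quote == 0 && ch == ",") ch="'"$delim"'"   # only unquoted commas become the delimiter
                     printf "%s", ch }   # explicit format so a literal % in the data is not read as a format spec
       print ""
     }' "$scratchDir""My_Input_File" |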
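
To illustrate what the quote-counting pass above does, here is a minimal sketch on one made-up line. The sample data and the variable names `sample`, `converted`, and `summary` are assumptions for the demo, not part of my real code. It uses `printf '\037'` instead of `$'\037'` so it also runs under a plain POSIX sh, and passes the delimiter with `-v` rather than splicing it into the program text.

```shell
# Hypothetical one-line input: the second field contains a comma inside quotes.
delim=$(printf '\037')            # ASCII Unit Separator, same byte as $'\037' in bash
sample='1,"Smith, John",ok'

converted=$(printf '%s\n' "$sample" |
  awk -v d="$delim" '{
      quote = 0; out = ""
      for (i = 1; i <= length($0); i++) {
          ch = substr($0, i, 1)
          if (ch == "\"") quote = !quote          # toggle inside/outside quotes
          else if (!quote && ch == ",") ch = d    # swap only unquoted commas
          out = out ch
      }
      print out
  }')

# Split on the new delimiter: expect 3 fields, with the quoted comma intact.
summary=$(printf '%s\n' "$converted" | awk -F "$delim" '{ print NF; print $2 }')
printf '%s\n' "$summary"
```

The second field comes through as `"Smith, John"`, quotes and embedded comma included, which is why later stages can safely use `-F "$delim"`.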

Next I have a rather cryptic and very long awk routine that transposes the data and then stacks 21 variable columns into two columns (a variable name column and a variable value column) and adds some other values from shell variables to the pipe. It is very long and very complicated (as well as cryptic), so I will omit that part (it is working exactly as expected).

The next part of the code manipulates the data based on values in three different configuration files. I think you will find that I am quoting the external file names from Bash correctly. If I don't get the nesting of single and double quotes exactly right, the getline reads fail.

The limit file contains a list of spec limits for different parameters (in the first column), both for areas considered sensitive and for areas considered insensitive (two different columns). Then there is a file with a list of sensitive areas. Finally, there is a file with a list of known issues that I wish to substitute.
Edit: In response to the question in your edit, these config files I am reading in are comma-separated values. They do not have the issue of commas inside quotes, but one of them does have spaces in a few columns of some rows.

Code:
# Look up and apply limits.
awk -F "$delim" 'BEGIN { OFS=FS
                 # Read in Vibration Limits
                 getline < "'"$limitFile"'" # Header Row
                 while (getline < "'"$limitFile"'") { split( $0, a, ","); vLim[a[1]]=a[2]; vLim[a[1]"Sen"]=a[3] }
                 close("'"$limitFile"'")
                 # Read in Sensitive Bay List 
                 getline < "'"$bayListFile"'" # Header Row
                 while (getline < "'"$bayListFile"'") { split( $0, a, ","); sBL[a[1]a[2]]="Sen" }
                 close("'"$bayListFile"'") 
                 # Read in Known Fail List
                 getline < "'"$failListFile"'"; getline < "'"$failListFile"'"; getline < "'"$failListFile"'" # Header Rows
                 while (getline < "'"$failListFile"'") { split( $0, a, ","); i=a[1]a[2]a[3]a[4]a[5]; gsub ( " ", "", i ); failMessage[i]=a[8]
                     fs=a[6]; sub ( "^$", "0000 01 01 00 00 00", fs ); failStart[i]=fs
                     fe=a[7]; sub ( "^$", "9999 12 31 23 59 59", fe ); failEnd[i]=fe }
                 close("'"$failListFile"'") }    
         NR == 1 { print $0 } 
         NR > 1 { if ( $9 ~ /Hz/ ) { limit=vLim[$9sBL[$2$3]]; $11=limit # works great since $9 $2 and $3 never have whitespace
                      if ( $10 > limit) { $12 = "Fail"; i=$2$3$4$5$9; gsub ( " ", "", i ) # sometimes $3, $4, or $5 have spaces.  My code now works with the gsub removing spaces from the index but my purpose in posting was to better understand how Awk handles whitespace in indexes ("help me understand more" not "help me write a script")
                         if ( i in failMessage ) { now = mktime(year" "month" "day" "hour" "min" "sec) # I am still writing this time-aware part and it is not part of my question
                        now = mktime("2015 01 01 00 00 00") # debug: mktime takes one string argument
                            if ( now >= mktime(failStart[i]) && now <= mktime(failEnd[i]) ) {
                                $12 = "Known Fail"; if ($7 == "\"\"") gsub ( "\"$", "Known Fail: "failMessage[i]"\"", $7 ) #column 7 is a double quoted string.  When "null" it is actually ""
                                                    else gsub ( "\"$", "|Known Fail: "failMessage[i]"\"", $7 ) } } } #I convert "|" into DOS style new lines at a later time in my code.  When there is already a comment in $7 I want a new line between my known failure mode message
                      else $12 = "Pass"; print $0 }
                }' |
# Remove alternative delimiter
sed -e 's/'"$delim"'/,/g' > "$scratchDir""My Output File"
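
As an aside on the quote nesting used above, a possible alternative is to hand the file name to awk with `-v`, so the program text stays inside one pair of single quotes. This is only a sketch under assumptions: the temporary file, its three made-up rows, and the names `line` and `lookup` are invented for the demo. One caveat I am aware of is that `-v` interprets backslash escapes in the value, so a path containing backslashes would need `ENVIRON` instead.

```shell
# Hypothetical limit file: header row, then parameter, limit, sensitive limit.
limitFile=$(mktemp)
printf '%s\n' 'Param,Limit,SenLimit' '10Hz,5,2' '20Hz,7,3' > "$limitFile"

lookup=$(printf '%s\n' '10Hz' '20Hz' |
  awk -v limitFile="$limitFile" '
    BEGIN {
        getline line < limitFile                  # skip the header row
        # Testing for > 0 ends the loop on end of file and on a read error (-1).
        while ((getline line < limitFile) > 0) {
            split(line, a, ",")
            vLim[a[1]] = a[2]
            vLim[a[1] "Sen"] = a[3]
        }
        close(limitFile)
    }
    { print $1, vLim[$1], vLim[$1 "Sen"] }')

printf '%s\n' "$lookup"
rm -f "$limitFile"
```

Reading into `line` instead of `$0` also leaves the current record and `NF` untouched, and the explicit `> 0` test avoids an endless loop when the file cannot be opened, since `getline` returns -1 in that case and -1 counts as true.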

Of course, a ton of code exists before and after what I included, but it is outside the context of my question.

Just as a reminder of the context of my question, the code works now that I am removing spaces from the parts in PINK using the gsub commands. My question is educational: what does awk do with whitespace in an index assignment?
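
For what it is worth, here is a small experiment on that question. As far as I know, awk subscripts are always strings and whitespace in them is kept as-is, so `"Bay 2"` and `" Bay 2"` are two different keys. The sketch below assumes a made-up value list and hypothetical array names (`seen`, `x`); it only shows how a stray space left behind by `split` on `","` can make a lookup miss, which may or may not be what happened in my data.

```shell
result=$(awk 'BEGIN {
    # Splitting on "," alone keeps the space after the comma in a[2].
    split("Bay 1, Bay 2", a, ",")
    seen[a[1]] = 1
    seen[a[2]] = 1

    # The key was stored with its leading space, so only the exact string matches.
    print (("Bay 2" in seen) ? "found" : "missing")
    print ((" Bay 2" in seen) ? "found" : "missing")

    # Normalising the key, as the gsub calls in the main script do.
    key = a[2]; gsub(/ /, "", key)
    print key

    # Numeric subscripts are converted to strings, but spaces still matter.
    x[1] = "one"
    print (("1" in x) ? "same" : "different")
    print ((" 1" in x) ? "same" : "different")
}')
printf '%s\n' "$result"
```

So stripping or trimming the spaces on both sides (when the array is built and when it is looked up) makes the keys agree, while awk itself never alters the whitespace in a subscript.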

Also, in general my questions on UNIX.com are not even about getting something to work but about getting something to work efficiently when parsing huge files. In this particular project I am dealing with about 10,000 files that end up in a ~200k-line database, but in some other projects I am dealing with more than 50 million lines of data. In the case of this particular project I got the running time down from 70 min for a year of data to less than 5 min for 2 years of data. This involved moving a lot of Bash code to awk, eliminating utility calls (using only built-ins in the interest of speed, even if more complex) and file interactions in any kind of loop or iterative part, and timing alternate versions of portions of my code in awk, Perl, sed, Bash, etc. and picking the best-performing one. I apologise if my questions about understanding more about how something works, or whether there are alternatives, are being misinterpreted as "this is broken, show me an implementation" types of questions. Generally when I ask these questions I have something working but I have a suspicion that there is a better/faster way.

Mike

Last edited by Michael Stora; 05-01-2015 at 04:01 PM..
 
