Awk: Dealing with whitespace in associative array indices
Is there a reliable way to deal with whitespace in array indices?
I am trying to annotate failures in a database using a table of known failures.
In a BEGIN block I have code like this:
And in the main part, code like this:
$10 is a test value, $12 is pass/fail, $7 is a comment, and $2-$5 are building, room, position in room, etc. Later in my code I translate "|" characters to newlines. Everything works fine except when some of my room names contain whitespace.
Mike
Last edited by Michael Stora; 04-30-2015 at 11:30 PM.
This is not a false positive: you are concatenating two null (unset) variables, which produces a null string, just like:
contains whitespace == contains purplespace
Consider:
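Here is a minimal sketch of that effect (a hypothetical example, not the code from the original post). Unquoted words in awk code are variable names; `contains`, `whitespace` and `purplespace` are never assigned, so each evaluates to the empty string, and so does their concatenation:

```shell
# Hypothetical demo: contains, whitespace and purplespace are unset awk variables.
null_concat_demo() {
    awk 'BEGIN {
        arr["contains whitespace"] = 1

        k = contains whitespace          # two unset variables concatenated: k is ""
        if (k in arr) print "unquoted: found"
        else          print "unquoted: not found"

        k = "contains whitespace"        # a string literal, space included
        if (k in arr) print "quoted: found"
        else          print "quoted: not found"

        # both sides are "" so this comparison is true
        if ((contains whitespace) == (contains purplespace)) print "null == null"
    }'
}

null_concat_demo
# unquoted: not found
# quoted: found
# null == null
```

Only the quoted form uses the literal text, space included, as the subscript.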
I meant "false positive" with respect to what I am trying to do, not in the sense of "hey, I found a bug in AWK!". I was just showing all possible combinations, with and without quoting. I realized that awk was parsing the unquoted words as variables.
The problem is the presence of whitespace in the variables, and I'm asking what the most elegant way is to deal with whitespace in an array index.
The best I can come up with is to remove the whitespace before both the array assignment and the contains check.
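A sketch of that approach, assuming a hypothetical comma-separated known-fails table with building, room and note columns (the data, `lookup_known_fail` and the `norm()` helper are illustrative, not from the original code). The important part is applying the same normalization on both the store side and the lookup side:

```shell
# Hypothetical known-fails table: building,room,note (one entry per line).
# The same norm() runs on store and on lookup, so "Room  A", "Room A"
# and "RoomA" all collapse to the key "B1|RoomA".
lookup_known_fail() {
    # $1 = building, $2 = room, as they appear in the data being annotated
    printf '%s\n' 'B1,Room A,known leak' 'B2,Lab 3,bad sensor' |
    awk -F, -v b="$1" -v r="$2" '
        function norm(s) { gsub(/[ \t]+/, "", s); return s }
        { known[norm($1) "|" norm($2)] = $3 }
        END {
            k = norm(b) "|" norm(r)
            if (k in known) print known[k]
            else            print "no match"
        }'
}

lookup_known_fail B1 'Room  A'    # prints: known leak
```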
Mike
Last edited by Michael Stora; 05-01-2015 at 05:02 AM.
Why is whitespace a problem? It seems the rest of your problem is that you're embedding shell variables in awk code without quoting them properly, and that you don't fully understand how awk splits records into fields.
With the default FS, awk splits at runs of spaces, however many there are. You then build the index without any spaces, because $1$2 concatenates the fields with the separators stripped out. Using $1 " " $2 instead would squeeze multiple spaces into a single one.
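A quick way to see the difference (hypothetical input; `key_styles` is just a throwaway helper name):

```shell
# Hypothetical helper: shows what two common key-building styles produce
# from one record under the default FS.
key_styles() {
    printf '%s\n' "$1" | awk '{
        print "[" $1 $2 "]"        # fields glued together: separators gone
        print "[" $1 " " $2 "]"    # one space re-inserted: runs squeezed to one
    }'
}

key_styles 'Room   A'
# [RoomA]
# [Room A]
```

If the goal is a composite key, `arr[$1, $2]` is another option: awk joins the subscripts with `SUBSEP` (by default the non-printing character `\034`), so the parts stay distinguishable as long as the data never contains that character.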
If you use the entire line as the index, you can see that spaces work fine.
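For example, with each whole record as the subscript (the sample lines are made up for illustration), lines that differ only in spacing stay distinct keys:

```shell
# Hypothetical sample: three lines that differ only in whitespace.
whole_line_keys() {
    printf '%s\n' 'Room A' 'Room  A' ' Room A' | awk '
        { seen[$0]++ }
        END {
            for (k in seen) n++
            print n " distinct keys"
            if ("Room  A" in seen) print "two-space key present"
            else                   print "two-space key missing"
        }'
}

whole_line_keys
# 3 distinct keys
# two-space key present
```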
Please rephrase the issue if I'm misunderstanding you...
Edit: I re-read the problem. What FS are you using? I see that split() uses ",".
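With `-F,` the only separator is the comma, so any spaces inside a field, including stray leading or trailing ones, become part of the subscript. A small sketch with a made-up record:

```shell
# Hypothetical record with a trailing space after the room name.
comma_fields() {
    printf '%s\n' 'B1,Room A ,3' | awk -F, '{
        split($0, f, ",")
        print "[" $2 "]"      # FS is ",": inner and trailing spaces stay in the field
        print "[" f[2] "]"    # split() with "," gives the same string
    }'
}

comma_fields
# [Room A ]
# [Room A ]
```

A trailing space like this one is invisible when printed without the brackets, yet `"Room A "` and `"Room A"` are different keys.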
My attempts to demonstrate what I thought was happening with simpler examples on the command line were flawed and introduced additional errors (although I did reproduce the issues using awk -F, I did not post those examples), but I don't believe my "real" code has the same flaws.
I believe that I am quoting the shell variables correctly (otherwise the readlines in my code below would fail), and since I specify an alternate delimiter in my AWK statement, the issues with whitespace in array indices are not coming from field splitting or from my understanding of it.
First, I am reading in a CSV file that may have non-separating commas, whitespace, and other potentially problematic characters inside double-quoted fields. So I start off with a double-quote-counting parser and an alternative delimiter (the old ASCII unit separator from punch card/paper tape days).
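A minimal sketch of a quote-aware converter along those lines; it is not the parser from this thread, and it assumes quoted fields never contain newlines or doubled `""` escapes. Separating commas become the ASCII unit separator (octal `\037`), commas inside quotes are kept as data, and the quote characters themselves are dropped:

```shell
# Sketch only: assumes no newlines and no "" escapes inside quoted fields.
csv_to_us() {
    awk '{
        out = ""; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"") { inq = !inq; continue }   # toggle quoted state, drop the quote
            if (c == "," && !inq) c = "\037"         # separating comma -> unit separator
            out = out c
        }
        print out
    }'
}

# Hypothetical record; the second awk splits on the unit separator only.
printf '%s\n' 'B1,"Room A, East",3' | csv_to_us | awk 'BEGIN { FS = "\037" } { print $2 }'
# Room A, East
```

`FS = "\037"` is a single literal character, so nothing in the data, spaces included, is split except at real field boundaries.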
Next I have a rather cryptic and very long awk routine that transposes the data, then stacks 21 variable columns into two columns (a variable-name column and a variable-value column), and adds some other values from shell variables to the pipe. It is very long and very complicated (as well as cryptic), so I will omit that part (it is working exactly as expected).
The next part of the code manipulates the data based on values in three different configuration files. I think you will find that I am quoting the external file names from BASH correctly; if I don't get the nesting of single and double quotes exactly right, the readlines fail.
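One way to avoid the quote nesting altogether is to hand the file name to awk with `-v` and read it with `getline` in a BEGIN block. This is a sketch assuming a hypothetical limits file laid out as parameter, sensitive-area limit, insensitive-area limit; note that `-v` processes backslash escapes in the value, so a file name containing backslashes would be altered:

```shell
# Sketch: load_limits and the column layout are hypothetical.
load_limits() {
    limit_file=$1
    awk -v limits="$limit_file" '
        BEGIN {
            while ((getline line < limits) > 0) {
                split(line, f, ",")
                lim_sens[f[1]] = f[2]     # limit for sensitive areas
                lim_insens[f[1]] = f[3]   # limit for insensitive areas
            }
            close(limits)
            for (p in lim_sens) print p, lim_sens[p], lim_insens[p]
        }'
}

# Usage: load_limits limits.csv
```

The shell only ever sees `"$limit_file"` in double quotes, and the awk program stays entirely inside single quotes.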
The limit file contains a list of spec limits for different parameters (named in the first column), for both areas considered sensitive and areas considered insensitive (two separate columns). Then there is a file with a list of sensitive areas. Finally, there is a file with a list of known issues that I wish to substitute.
Edit: In response to the question in your edit, these config files I am reading in are comma-separated values. They do not have the issue of commas inside quotes, but one of them does have spaces inside some fields in a few columns.
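If some config fields legitimately contain inner spaces, a gentler alternative to deleting all whitespace is trimming only leading and trailing blanks on both sides of the lookup. A sketch with a hypothetical sensitive-areas file (building, room) and a made-up `is_sensitive` helper:

```shell
# Hypothetical sensitive-areas file: building,room (spaces allowed inside fields).
# Keys are trimmed on store and on lookup; inner spaces such as "Room A" are kept.
is_sensitive() {
    # $1 = sensitive-areas file, $2 = building, $3 = room
    awk -v areas="$1" -v b="$2" -v r="$3" '
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        BEGIN {
            while ((getline line < areas) > 0) {
                split(line, f, ",")
                sens[trim(f[1]) "|" trim(f[2])] = 1
            }
            close(areas)
            k = trim(b) "|" trim(r)
            if (k in sens) print "sensitive"
            else           print "normal"
        }'
}

# Usage: is_sensitive areas.csv B1 'Room A'
```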
Of course, a ton of code exists before and after what I included, but it is outside the context of my question.
Just as a reminder of the context of my question: the code works now that I am removing spaces from the parts highlighted in pink using the gsub commands. My question is educational: what does awk do with whitespace in an index assignment?
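As far as awk itself is concerned, a subscript is just a string, and whitespace is part of it like any other character. A minimal demonstration (the room names are made up):

```shell
# Hypothetical keys that differ only in spacing are three separate elements.
index_is_plain_string() {
    awk 'BEGIN {
        a["Room A"]  = "one space"
        a["RoomA"]   = "no space"
        a["Room  A"] = "two spaces"
        for (k in a) n++
        print n
        print a["Room A"]
    }'
}

index_is_plain_string
# 3
# one space
```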
Also, in general my questions on UNIX.com are not even about getting something to work but about getting it to work efficiently when parsing huge files. In this particular project I am dealing with about 10,000 files that end up in a ~200k-line database, but in some other projects I am dealing with more than 50 million lines of data. In this project I got the running time down from 70 min for a year of data to less than 5 min for 2 years of data. This involved moving a lot of BASH code to AWK, eliminating utility calls (using only built-ins in the interest of speed, even if more complex) and file interactions in any kind of loop or iterative part, and timing alternate versions of portions of my code in AWK, PERL, SED, BASH, etc. and picking the best-performing one. I apologize if my questions about understanding how something works, or whether there are alternatives, are being misinterpreted as "this is broken, show me an implementation" questions. Generally, when I ask these questions I have something working but suspect that there is a better/faster way.
Mike
Last edited by Michael Stora; 05-01-2015 at 04:01 PM.
Trying to be more rigorous in demonstrating the question with simple code examples leaves me unable to reproduce the problem . . .
Now I am left to ponder whether the problem I thought I fixed with the gsub commands was even a problem, and whether I somehow accidentally fixed something else (entropy notwithstanding) . . .
Mike
---------- Post updated at 12:31 PM ---------- Previous update was at 12:20 PM ----------
Quote:
Originally Posted by neutronscott
Well, I won't pick apart your code much more, then.
To answer the question on whitespace: awk shouldn't be messing with it, aside from the default FS being roughly equivalent to [[:space:]][[:space:]]* (runs of blanks and newlines).
With that in mind, your first call to awk uses the default FS, which may cause leading/trailing whitespace to be stripped from your data.
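To illustrate the point with a made-up record: under the default FS, leading and trailing blanks never make it into a field, $0 keeps them until a field is assigned, and assigning any field rebuilds $0 with single OFS spaces:

```shell
# Hypothetical helper showing default-FS behaviour at the edges of a record.
default_fs_edges() {
    printf '%s\n' "$1" | awk '{
        print "[" $0 "]"     # record as read
        print "[" $1 "]"     # first field: leading blanks are not part of it
        $1 = $1              # rebuilding the record joins fields with OFS
        print "[" $0 "]"
    }'
}

default_fs_edges '  Room   A  '
# [  Room   A  ]
# [Room]
# [Room A]
```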
Print the indexes to stderr upon assignment, and again before the gsub in the second awk, then comb through the output. That's probably what's happening.
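A sketch of that debugging step, using made-up records. Wrapping the key in brackets makes trailing blanks visible. gawk treats `"/dev/stderr"` specially, and on most Unix-like systems it also exists as a device, so other awks can usually write to it too:

```shell
# Hypothetical records: the first has a trailing space after the room name.
debug_keys() {
    printf '%s\n' 'B1,Room A ,x' 'B1,Room A,y' | awk -F, '
        {
            k = $1 "|" $2
            printf "assign key=[%s]\n", k > "/dev/stderr"   # brackets expose stray blanks
            seen[k] = $3
        }
        END { for (k in seen) n++; print n }'
}

debug_keys
# stderr: assign key=[B1|Room A ]
# stderr: assign key=[B1|Room A]
# stdout: 2
```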
Thanks for the suggestion, but I think the error is creeping in from another source, since my first awk invocation uses only $0, which appears never to get parsed, and my second (omitted) awk statement uses the alternative delimiter.
Actually, AWK doesn't, but UNIX.com does, so you'll have to take my word for it.
However, you may be right about the error creeping in from something else in my input file. I will look in that direction, since I have exhausted other avenues.
BTW: I have already kicked myself several times for not using tab-separated values from the very beginning of the project. Tabs never occur in my fields.
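For comparison, a sketch of the tab-separated version with made-up data: with `FS` set to a single tab, spaces are ordinary characters, so keys like `"Room A"` need no special handling:

```shell
# Hypothetical TSV: building<TAB>room<TAB>note
tsv_keys() {
    printf 'B1\tRoom A\tleak\nB2\tLab 3\tsensor\n' | awk '
        BEGIN { FS = OFS = "\t" }
        { note[$1 FS $2] = $3 }
        END { print note["B1" FS "Room A"] }'
}

tsv_keys    # prints: leak
```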
Mike
Last edited by Michael Stora; 05-01-2015 at 04:36 PM.