awk Associative Array and/or Referring to Field by String (Nonconstant String Value)


 
# 8  
Old 02-01-2019
One could still try:
Code:
awk '$1 ~ /^PS/ {for(i=3; i<=NF; i++) if($i == "<Ob>]"){print $1,substr($(i-1), 2); next}}' file

without needing to use split() (unless I misunderstood and you changed your input file format to remove the <space> before the <Ob>]).
# 9  
Old 02-01-2019
I'm so sorry, Scrutinizer, but because my input is many thousands of lines long, I did not notice a potential complication that I was hoping you could help me address. There are times where the desired string between an initial "[" and "<Ob>]" contains a space.

So for example, given:
Code:
 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]
 Lexeme     KJ      # CM<      # QWL TXNWN J      #
 PhraseType  6(6) 1(1:2) 2(2.1,2.1,7)
 PhraseLab  509[0]    501[0]     503[0]
 ClauseType xQt0

Which I would pare down with INPUT | awk '$1 ~ /^PS/' to get:
Code:
PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]

In this case, the desired output would be:
Code:
PS028,006 QWL TXNWNJ-

or
Code:
PS028,006 [QWL TXNWNJ- <Ob>]

The code you helped me with only gives:
Code:
PS028,006 QWL

Again, I apologize that I did not see the possibility of a space within the desired string until I double-checked the output against INPUT | sed -e 's/.* \[\(.*\) <Ob>\].*/\1/', which gives me the desired string but not the $1 when $1 ~ /^PS/.

Would you be able to help me iron this out?

--- Post updated at 10:02 PM ---

Quote:
Originally Posted by Don Cragun
One could still try:
Code:
awk '$1 ~ /^PS/ {for(i=3; i<=NF; i++) if($i == "<Ob>]"){print $1,substr($(i-1), 2); next}}' file

without needing to use split() (unless I misunderstood and you changed your input file format to remove the <space> before the <Ob>]).
This works well, Don, except that I represented the desired output strings as "ABC" and "XYZ", which it seems you took as being three-character strings. I should have been more specific and said that "ABC" and "XYZ" represent strings of any length. Thus something like ["some amount of text" <Ob>].
# 10  
Old 02-01-2019
OK... One final attempt...

Based on the single sample in your latest input file, the following seems to do what you want and will at least show you the lines where it wasn't able to match:
Code:
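awk '
$1 ~ /^PS/ {
	# Match "[", then any run of non-"[" characters, then " <Ob>]".
	# On success, skip the leading "[" and drop the trailing " <Ob>]"
	# (1 + 6 = 7 characters in total).
	if(match($0, /[[][^[]* <Ob>[]]/))
		print $1, substr($0, RSTART + 1, RLENGTH - 7)
	else
		print "No Match Found on line " NR, $0
}' file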

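For anyone following along, here is a small self-contained sketch of the same match() idea that can be pasted into a shell. The function name extract_ob and the two sample lines fed through printf are only illustrative assumptions, not part of the original data file, and the "No Match Found" branch is left out to keep it short.

```shell
# Hypothetical helper (name is an assumption): reads records on stdin and
# prints "$1 text" for each PS line, where text sits between "[" and " <Ob>]".
extract_ob() {
    awk '
    $1 ~ /^PS/ && match($0, /[[][^[]* <Ob>[]]/) {
        # RSTART points at "[" and RLENGTH covers "[" + text + " <Ob>]",
        # so the text itself starts at RSTART + 1 and is RLENGTH - 7 long.
        print $1, substr($0, RSTART + 1, RLENGTH - 7)
    }'
}

# Sample records taken from the post above; only the PS line should print.
printf '%s\n' \
    ' PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]' \
    ' Lexeme     KJ      # CM<      # QWL TXNWN J      #' | extract_ob
```

Because [^[]* cannot cross another "[", the match can only begin at the bracket that opens the <Ob> group, which is why embedded spaces in the text are preserved.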
# 11  
Old 02-01-2019
Try also
Code:
awk -F"[][]" '/^ *PS.*<Ob>/ {sub(/ *<Ob>.*$/, ""); print $1, $NF}' file
 PS028,005  ABC 
 PS028,005  XYZ 
 PS028,006  QWL TXNWNJ-

# 12  
Old 02-01-2019
Quote:
Originally Posted by RudiC
Try also
Code:
awk -F"[][]" '/^ *PS.*<Ob>/ {sub(/ *<Ob>.*$/, ""); print $1, $NF}' file
 PS028,005  ABC 
 PS028,005  XYZ 
 PS028,006  QWL TXNWNJ-

That did it, RudiC! Such a simple and elegant way to accomplish it! Thanks so much also to Scrutinizer and Don Cragun for your help!

If I may, could I please ask two questions about how this code works? The first is about the field separator value. The awk man page seems to only imply, rather than state explicitly, that using square brackets when setting the field separator on the command line tells awk to interpret what is between them as a regex rather than as a fixed string, which would otherwise be indicated by "...". Is this correct?

Secondly, since the value of FS has been set to "][", how come {print $1} does not print from the beginning of the line to the first instance of "][", but rather prints what would be $1 when FS is set to whitespace? In other words, given:
Code:
 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]
 Lexeme     KJ      # CM<      # QWL TXNWN J      #
 PhraseType  6(6) 1(1:2) 2(2.1,2.1,7)
 PhraseLab  509[0]    501[0]     503[0]
 ClauseType xQt0

Why does RudiC's code not give PS028,006 [KJ <Cj> for {print $1} if FS is set to "]["?

Rather, it gives the (desired) first field as if FS were at its default: PS028,006.

Thanks again!
# 13  
Old 02-01-2019
Hi jvoot,
The standards clearly state that the value of the awk FS variable is an extended regular expression and it doesn't matter whether it is set using the -F option, using the -v option, using an assignment statement between pathname operands, or using an assignment statement in the awk script itself. When the ERE is set to [][] that is a bracket expression that specifies that the <open-square-bracket> character ([) and the <close-square-bracket> character (]) are each to be treated as separate field separators.

With the FS value RudiC used, field 1 is everything that appears in the record before the 1st open or close square bracket character (including the leading and trailing <space>). I chose to use the default FS value because I didn't think you wanted the leading and trailing <space> characters at the start of lines in your input data to be included in your output.
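As a rough illustration of the point about the different ways of setting FS, the sketch below sets it to the bracket expression [][] in each of the four ways mentioned and prints field 2 each time. The function name fs_four_ways and the shortened sample line are assumptions made for this example only.

```shell
# Hypothetical demo (name is an assumption): set FS to the ERE [][] in four
# different ways and print field 2 of the same record each time.
fs_four_ways() {
    line=' PS028,006 [KJ <Cj>] [QWL TXNWNJ- <Ob>]'

    # 1. -F option
    printf '%s\n' "$line" | awk -F'[][]' '{ print $2 }'
    # 2. -v assignment
    printf '%s\n' "$line" | awk -v 'FS=[][]' '{ print $2 }'
    # 3. assignment operand placed before the input operand ("-" is stdin)
    printf '%s\n' "$line" | awk '{ print $2 }' 'FS=[][]' -
    # 4. assignment inside the script itself
    printf '%s\n' "$line" | awk 'BEGIN { FS = "[][]" } { print $2 }'
}

fs_four_ways
```

All four commands should print KJ <Cj>, since in every case the record is split at each "[" and each "]", and field 1 is the " PS028,006 " part with its surrounding spaces.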

Hope this helps,
Don
# 14  
Old 02-02-2019
Quote:
Originally Posted by jvoot
...
The first is about the field separator value. The man AWK page seems to only imply rather than being explicit that the use of the square brackets when setting the field separator from the command line tells AWK to interpret what is between them as a regex rather than simply a fixed string which would otherwise be indicated by "..."? Is this correct?
You are partly right: the field separator string is always interpreted as a regex. In Scrutinizer's proposal (from which I stole shamelessly), he uses the bracket expression [][].
man regex:
Quote:
A bracket expression is a list of characters enclosed in "[]". It normally matches any single character from the list.
So awk splits the input line at any occurrence of either [ or ] .

BTW, awk's default FS (a single space) behaves much like the bracket expression regex /[ \t\n]+/, except that leading and trailing whitespace in the record is ignored as well.
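A quick way to see the difference between the default FS and an explicit whitespace regex is sketched below. The function name default_vs_regex_fs and the shortened sample line are assumptions for illustration only.

```shell
# Hypothetical demo (name is an assumption): compare the default FS with an
# explicit whitespace regex on a record that starts with a space.
default_vs_regex_fs() {
    line=' PS028,006 [KJ <Cj>]'

    # Default FS: the leading space is skipped, so $1 is "PS028,006".
    printf '%s\n' "$line" | awk '{ print NF ": <" $1 ">" }'

    # Regex FS: the leading space counts as a separator, so $1 is empty.
    printf '%s\n' "$line" | awk -F'[ \t\n]+' '{ print NF ": <" $1 ">" }'
}

default_vs_regex_fs
```

The first command should report 3 fields with <PS028,006> as field 1, while the second should report 4 fields with an empty field 1.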


Quote:
Secondly, since the value for FS has been set to "][" how come when the print statement calls for {print $1} is does not print from the beginning of the line to the first instance of "][" but rather prints what would be $1 when FS is set to whitespace?
It does. Please apply what has been said to the respective line:
Code:
 PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]
^          ^       ^ ^        ^ ^                ^--- last separator; $NF is empty
|          +-------+-+--------+-+-------------------- all FS
+---------------------------------------------------- field 1

Is that clearer now? If you want to remove the leading space from field 1, additional measures must be taken.
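One possible "additional measure", sketched here under the assumption that only plain spaces surround field 1, is to copy $1 into a variable and trim it before printing. The function name trim_first_field and the variable key are hypothetical.

```shell
# Hypothetical helper (name is an assumption): same bracket-expression FS as
# above, but the spaces around field 1 are trimmed before printing.
trim_first_field() {
    awk -F'[][]' '/^ *PS.*<Ob>/ {
        sub(/ *<Ob>.*$/, "")    # drop " <Ob>]" and everything after it
        key = $1                # work on a copy so $0 is not rebuilt
        sub(/^ +/, "", key)     # strip leading spaces
        sub(/ +$/, "", key)     # strip trailing spaces
        print key, $NF
    }'
}

printf '%s\n' ' PS028,006 [KJ <Cj>] [CM< <Pr>] [QWL TXNWNJ- <Ob>]' | trim_first_field
```

For the sample line this should print PS028,006 QWL TXNWNJ- with a single space between the two parts.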