awk Associative Array and/or Referring to Field by String (Nonconstant String Value)


 
# 1  
02-01-2019

I will start with an example of what I'm trying to do and then describe how I am approaching the issue.

File
Code:
 PS028,005 [JHRS-<Pr>] [ABC <Ob>]
 Lexeme     HRS       # M      #
 PhraseType  1(1:1) 7(7)
 PhraseLab  501[0]      503[0]
 ClauseType ZYq0

 PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
 Lexeme     W      # L>      # BNH      # M      #
 PhraseType  6(6) 11(11) 1(1:1) 7(7)
 PhraseLab  509[0]   510[0]    501[0]     503[0]
 ClauseType WxY0

Desired Output
Code:
 PS028,005 ABC

 PS028,005 XYZ

I would also be happy with the following where I can strip things off by piping into sed:

Code:
 PS028,005 [ABC <Ob>]

 PS028,005 [XYZ <Ob>]

In essence: when a line begins with / PS/, print $1 of that line along with the string between "[" and " <Ob>]". I can use sed to get the string between "[" and "<Ob>]", but I cannot get $1 of that same line to print along with it.
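A minimal sketch of that logic in plain awk, with no sed step. It assumes the object text never contains "]" and is always followed by exactly " <Ob>]". match() sets RSTART and RLENGTH for the leftmost match, so substr() can cut the name out directly. The sample lines are taken from the File above; the output is captured in out only so it can be checked.

```shell
# Sample: the two PS lines and one other line from the File above.
out=$(printf '%s\n' \
  ' PS028,005 [JHRS-<Pr>] [ABC <Ob>]' \
  ' Lexeme     HRS       # M      #' \
  ' PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]' |
awk '
  # With the default FS the leading space is dropped, so $1 starts with PS.
  $1 ~ /^PS/ {
    # Leftmost "[" followed by non-"]" text and then " <Ob>]".
    if (match($0, /\[[^]]* <Ob>\]/)) {
      # Skip the "[" (1 char) and drop the trailing " <Ob>]" (6 chars).
      print $1, substr($0, RSTART + 1, RLENGTH - 7)
    }
  }
')
printf '%s\n' "$out"
```

This prints "PS028,005 ABC" and "PS028,005 XYZ", and it does not depend on the <Ob> bracket being the last one on the line.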

I have attempted:
Code:
awk '/^ PS/{print $1, $(/\[.*\<Ob\>\]/)}' File

Here I am attempting to use a nonconstant field number; however, this prints the entire line containing the matching string.
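The reason the whole line appears: a bare /regex/ used as a value is shorthand for ($0 ~ /regex/), which is 1 or 0, so $(/regex/) can only ever be $1 or $0. In gawk, \< and \> are word-boundary operators rather than literal angle brackets, so /\[.*\<Ob\>\]/ never matches the text <Ob>], the expression is 0, and $(0) is the whole record; other awks may treat \< differently. A small sketch of the 0/1 behavior:

```shell
out=$(printf '%s\n' ' PS028,005 [JHRS-<Pr>] [ABC <Ob>]' | awk '{
  # A bare /regex/ used as a value means ($0 ~ /regex/): 1 or 0.
  hit = (/<Ob>/)
  miss = (/<Zz>/)
  print hit, miss
  # Hence $(/regex/) is either $1 (match) or $0, the whole line (no match).
  print $(miss)
}')
printf '%s\n' "$out"
```

It prints "1 0" followed by the unchanged input line.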

Another attempt has been this:
Code:
awk '/^PS/{a = $1; $2 = /\[.*\<Ob\>\]/}{print a,$2}' File
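Two things go wrong in this attempt. $2 = /regex/ stores the 0 or 1 result of the match, not the matched text. And /^PS/ is tested against $0, which in the sample begins with a space, so that rule never fires; the separate {print a,$2} block then runs on every line with a still empty. With the default field separator, leading blanks are not part of $1, which is why $1 ~ /^PS/ does match. A sketch showing the difference:

```shell
out=$(printf '%s\n' ' PS028,005 [ABC <Ob>]' | awk '{
  # $0 keeps the leading space; $1 (default FS) does not.
  on_record = ($0 ~ /^PS/)
  on_field = ($1 ~ /^PS/)
  print on_record, on_field
}')
printf '%s\n' "$out"
```

This prints "0 1".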

Finally, I have tried to use an array, and I must admit that even after reading the awk man page, I still find arrays confusing.
Code:
awk 'BEGIN{a[NR]=$0}{if(/\[.*\<Ob\>\]/ in a && $1 ~/^ PS/) print}' File
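In this attempt the BEGIN block runs before any input is read, so a[NR]=$0 only stores an empty string under key 0, and (/regex/ in a) asks whether the 0-or-1 match result is one of the keys of a rather than searching its contents. A minimal sketch of what "in" actually does, using the first field as a string key (count is a hypothetical array name):

```shell
out=$(printf '%s\n' \
  ' PS028,005 [JHRS-<Pr>] [ABC <Ob>]' \
  ' Lexeme     HRS       # M      #' \
  ' PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]' |
awk '
  # This main rule runs once per input line: count each first field.
  { count[$1]++ }
  END {
    # (key in array) tests whether a key exists; it does not search the
    # stored values and does not take a regex.
    if ("PS028,005" in count)
      print "PS028,005 seen", count["PS028,005"], "times"
    if (!("ClauseType" in count))
      print "ClauseType not seen"
  }
')
printf '%s\n' "$out"
```

It prints "PS028,005 seen 2 times" and "ClauseType not seen".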

Obviously, none of these has worked. I would greatly appreciate any help on what should be a relatively easy bit of code that I'm just not getting. Thanks in advance.

Last edited by RudiC; 02-01-2019 at 05:14 AM.
# 2  
02-01-2019
Hi, try using the square brackets as field separators, for example:
Code:
awk -F '[][]' '
  $1~/^[ \t]*PS/ {
    for(i=2; i<=NF; i+=2)
      if($i~/<Ob>/) {
        split($i,F," ")
        print $1 F[1]
        next
      }
  }
' file
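To see why the loop starts at 2 and steps by 2, it helps to print the fields that -F '[][]' produces for one sample line. This is only a diagnostic sketch:

```shell
out=$(printf '%s\n' ' PS028,005 [JHRS-<Pr>] [ABC <Ob>]' | awk -F '[][]' '{
  # Every "[" and "]" ends a field, so bracketed text lands in even fields.
  for (i = 1; i <= NF; i++)
    printf "$%d=\"%s\"\n", i, $i
}')
printf '%s\n' "$out"
```

The bracket contents end up in $2 and $4, and $1 keeps its surrounding spaces, which is why print $1 F[1] needs no extra separator.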

The code could perhaps be simplified if the file is always structured in a certain way, for instance if <Ob> always occurs in the last field:

Code:
awk -F '[][]' '
  $1~/^[ \t]*PS/ && $(NF-1)~/<Ob>/ {
    split($(NF-1),F," ")
    print $1 F[1]
  }
' file

In that case you could probably also do it without changing the field separator:
Code:
awk '$1~/^PS/ && $NF~/<Ob>/ { 
  sub(/\[/,"",$(NF-1))
  print $1, $(NF-1)
}' file


Last edited by Scrutinizer; 02-01-2019 at 01:34 AM.
These 2 Users Gave Thanks to Scrutinizer For This Post:
# 3  
02-01-2019
Deleted.

Last edited by jvoot; 02-01-2019 at 01:16 AM.
# 4  
02-01-2019
In your sample data, the [string <Ob>] always appears at the end of each line that starts with <space>s immediately followed by PS. Is that also true in your real data? If it is, we can simplify the code Scrutinizer suggested to something like:

Code:
awk '$1 ~ /^PS/ {sub(/\[/, "", $(NF - 1));print $1, $(NF - 1)}' file

or:
Code:
awk '$1 ~ /^PS/ {print $1, substr($(NF - 1), 2)}' file

This User Gave Thanks to Don Cragun For This Post:
# 5  
02-01-2019
I wrote some additional approaches in my post above. Also, there was an extra variable (used for debugging) that I have now removed from the first example. There is a space between the brackets in the field separator that should not be there in your example:
This User Gave Thanks to Scrutinizer For This Post:
# 6  
02-01-2019
Thanks so much, Scrutinizer. It looked like it was printing some manner of counter (possibly string length?) as the first field of every line. I adjusted your code slightly and, for simplicity's sake, also took out the leading space in the input file. I also needed to transcribe your code to a one-liner, as I was passing output into it via a pipe (I presented it as a file above for simplicity's sake).

Thus, your code transcribed as awk -F '[][]' '{for(i=2; i<=NF; i+=2) if($i~/<Ob>/){split($i,F," "); print i,$1 F[1]; next}}' gave me this:
Code:
4  PS028,005 M
8  PS028,005 M

I adjusted it to awk -F '[][]' '{for(i=2; i<=NF; i+=2) if($i~/<Ob>/){split($i,F," "); print $1 F[1]; next}}' and, while I haven't investigated in detail, that seems to have done the trick. Thanks so much!

--- Post updated at 09:18 PM ---

Quote:
Originally Posted by Don Cragun
In your sample data, the [string <Ob>] always appears at the end of each line that starts with <space>s immediately followed by PS. Is that also true in your real data?

Unfortunately not, Don; the string with <Ob> can appear anywhere in the line. Nevertheless, after a small adjustment, Scrutinizer's code seems to be working very well. Thank you so much, Don.
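Don's whitespace-split approach can also be adapted to <Ob> appearing anywhere by scanning the fields for the one that equals "<Ob>]". A sketch, assuming the object text is a single word (no spaces) written as "[NAME <Ob>]"; the first input line is hypothetical, with <Ob> deliberately not last:

```shell
# Hypothetical first line: the <Ob> bracket is NOT the last one.
out=$(printf '%s\n' \
  ' PS028,005 [ABC <Ob>] [JHRS-<Pr>]' \
  ' PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]' |
awk '
  $1 ~ /^PS/ {
    # Look at every field instead of assuming "<Ob>]" is the last one.
    for (i = 2; i <= NF; i++)
      if ($i == "<Ob>]") {
        # The field before "<Ob>]" is "[NAME"; drop the opening bracket.
        print $1, substr($(i - 1), 2)
        next
      }
  }
')
printf '%s\n' "$out"
```

Both lines yield the object name: "PS028,005 ABC" and "PS028,005 XYZ".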
# 7  
02-01-2019
You are welcome. BTW, you do not need to use a one-liner; you could do this:

Code:
INPUT |
awk ...
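For example, Scrutinizer's multi-line program from post #2 can sit at the end of a pipeline unchanged: a line ending in | tells the shell the command continues, and a single-quoted awk program may span several lines. In this sketch printf stands in for the real INPUT command (an assumption), and the output is captured in out for checking:

```shell
# printf stands in for whatever command produces the data (INPUT above).
out=$(printf '%s\n' ' PS028,005 [JHRS-<Pr>] [ABC <Ob>]' ' ClauseType ZYq0' |
awk -F '[][]' '
  $1 ~ /^[ \t]*PS/ {
    for (i = 2; i <= NF; i += 2)
      if ($i ~ /<Ob>/) {
        split($i, F, " ")
        print $1 F[1]
        next
      }
  }
')
printf '%s\n' "$out"
```

This prints " PS028,005 ABC" (the leading space comes from $1, since the field separator is only "[" and "]").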

This User Gave Thanks to Scrutinizer For This Post: