Awk: Performing "for" loop within text block with two files


 
# 1  
Old 01-22-2019

I am hoping to pull multiple strings from one file and use them to search within a block of text within another file.

File 1
Code:
PS001,001 HLK
PS002,004 MWQ
PS004,002 RXM
PS004,006 DBX
PS004,006 SBR
PS005,007 ML
PS005,009 DBR
PS005,011 MR
PS005,012 SBR
PS006,003 RXM
PS006,003 >SJ
PS006,010 QBL

File 2
Code:
 PS001,001 [VWB-WHJ <Su>] [L-GBR> <PC>]
 Lexeme     VWB HJ==     # L GBR       #
 PhraseType  2(2.1,7) 5(5,2.3)
 PhraseLab  502[0]         521[0]
 ClauseType NmCl

 PS001,001 [D-<Re>] [B->WRX> D-<WL> <Co>] [L> <Ng>] [HLK <Pr>]
 Lexeme     D      # B >WRX D <WL        # L>      # HLK      #
 PhraseType  6(6) 5(5,2.3,5,2.3) 11(11) 1(1:2)
 PhraseLab  519[0]   504[0]                510[0]    501[0]
 ClauseType xQt0 
 
 PS002,004 [W-<Cj>] [MRJ> <Su>] [NMJQ <Pr>] [B-HWN <Co>]
 Lexeme     W      # MRJ>      # MWQ       # B HWN=     #
 PhraseType  6(6) 3(3.2) 1(1:1) 5(5,7)
 PhraseLab  509[0]   502[0]      501[0]      504[0]
 ClauseType WXYq

 PS002,005 [HJ DJN <Mo>] [NMLL <Pr>] [<LJ-HWN <Co>] [B-RWGZ-H <Aj>]
 Lexeme     HJ= DJN=    # ML        # <L HWN=      # B RWGZ H      #
 PhraseType  4(8,4) 1(1:1) 5(5,7) 5(5,2.1,7)
 PhraseLab  508[0]        501[0]      504[0]         505[0]
 ClauseType xYq0

 PS005,012 [D-<Re>] [MSBRJN <PC>] [B-K <Co>]
 Lexeme     D      # SBR         # B K      #
 PhraseType  6(6) 1(1:6.2) 5(5,7)
 PhraseLab  519[0]   521[0]        504[0]
 ClauseType Ptcp

 PS005,012 [W-<Cj>] [L-<LM <Ti>] [NCBXWN-<Pr>] [K <Ob>]
 Lexeme     W      # L <LM      # CBX         # K      #
 PhraseType  6(6) 5(5,2.2) 1(1:1) 7(7)
 PhraseLab  509[0]   506[0]       501[0]        503[0]
 ClauseType WxY0

 PS005,013 [>JK SKR> MQBLT> <Aj>] [T<VP-<Pr>] [NJ <Ob>]
 Lexeme     >JK SKR QBL          # <VP       # NJ      #
 PhraseType  5(5,2.3,13:62.3) 1(1:1) 7(7)
 PhraseLab  505[0]                 501[0]      503[0]
 ClauseType xYq0

 PS006,002 [MRJ> <Vo>]
 Lexeme     MRJ>      #
 PhraseType  3(3.2)
 PhraseLab  562[0]
 ClauseType Voct

 PS006,002 [L> <Ng>] [B-RWGZ-K <Aj>] [TKS-<Pr>] [NJ <Ob>]
 Lexeme     L>      # B RWGZ K      # KS       # NJ      #
 PhraseType  11(11) 5(5,2.1,7) 1(1:1) 7(7)
 PhraseLab  510[0]    505[0]          501[0]     503[0]
 ClauseType xYq0

My hope was that when $1 of File 1 matches $1 in File 2, $0 in File 2 contains the string "<Co>", and $2 of File 1 matches a string *exactly* in File 2 on a line beginning with the word "Lexeme," then print.

Thus, my desired output would look like this:
Code:
 PS001,001 [D-<Re>] [B->WRX> D-<WL> <Co>] [L> <Ng>] [HLK <Pr>]
 Lexeme     D      # B >WRX D <WL        # L>      # HLK      #
 PhraseType  6(6) 5(5,2.3,5,2.3) 11(11) 1(1:2)
 PhraseLab  519[0]   504[0]                510[0]    501[0]
 ClauseType xQt0 

 PS002,004 [W-<Cj>] [MRJ> <Su>] [NMJQ <Pr>] [B-HWN <Co>]
 Lexeme     W      # MRJ>      # MWQ       # B HWN=     #
 PhraseType  6(6) 3(3.2) 1(1:1) 5(5,7)
 PhraseLab  509[0]   502[0]      501[0]      504[0]
 ClauseType WXYq

 PS005,012 [D-<Re>] [MSBRJN <PC>] [B-K <Co>]
 Lexeme     D      # SBR         # B K      #
 PhraseType  6(6) 1(1:6.2) 5(5,7)
 PhraseLab  519[0]   521[0]        504[0]
 ClauseType Ptcp

With the following code I am able to satisfy two of the three criteria listed above: I can match $1 of File 1 with $1 of File 2, and I can detect when $0 of File 2 contains the string "<Co>". However, I am having difficulty with the last criterion, viz., matching $2 of File 1 with the exact string in File 2 when the line begins with "Lexeme."

Code:
NR==FNR   {A[$1]
           B[$2]
           next
          }
/^ Cl/   {if (PR1 && PR2 && PR3)   {print"\n" BUF
                                    print
                             }
           PR1 = PR2 = PR3 = 0
           BUF = ""
           next
          }

          {BUF = BUF (BUF?ORS:_) $0
           if ($1 in A) PR1 = 1
           if ($0 ~/\<Co\>/) PR2 = 1
           for (b in B) if($0 ~ b) PR3 = 1
          }

I have also tried:
Code:
NR==FNR   {A[$1]
           B[$2]
           next
          }
/^ Cl/   {if (PR1 && PR2 && PR3)   {print"\n" BUF
                                    print
                             }
           PR1 = PR2 = PR3 = 0
           BUF = ""
           next
          }

          {BUF = BUF (BUF?ORS:_) $0
           if ($1 in A) PR1 = 1
           if ($0 ~/\<Co\>/) PR2 = 1
          if ($1 ~/ ^L/ && $0 in B) PR3 = 1
          }

I think there might be something wrong with the way that I'm defining the "B" array with $2 of File 1 or defining the "for" loop in the script. Thank you so much in advance for your help.

Last edited by jvoot; 01-22-2019 at 10:45 AM.. Reason: Highlighted problem areas in script example.
# 2  
Old 01-22-2019
Since you have multiple lines in File 1 with the same $1 values and different $2 values, there are a lot of similarities between the awk code needed to solve your problem and the problem that cmccabe presented to us three days ago: awk to add text to each line of matching id

Have you looked at the awk code that was provided to help cmccabe in that thread to see if it might help you solve your problem?
# 3  
Old 01-22-2019
Thank you so much Don. I'll take a look.

--- Post updated 01-22-19 at 07:06 AM ---

I read through that thread Don and while I must admit that I do not exactly follow all of it, it seems to be a bit different than what I am asking here. My problem is not so much dealing with repeated values, but rather getting that "for" loop to work in my awk script.

To isolate the problem further: I am having trouble getting $2 in File 1 to precisely match a string in File 2 on lines that begin with "Lexeme" (my attempt was ~/ ^L/). I have highlighted the relevant line in each of the two example *.awk scripts in red font. In principle something similar to awk 'FNR==NR{B[$2]; next}{for (b in B) if ($0 ~b) print} ' File[12] should do the trick, but: (A) I'm having difficulty getting something like that to work in my script; and (B) it does not produce exact matches, because it treats each value of $2 in File 1 as a pattern that may match anywhere in the line (rather than, say, $2 ~/^pattern$/). Thus for the line in File 1 that reads PS005,007 ML I get matches such as MLL> MLT> TQML, etc.
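
The difference between the two kinds of match described above can be reproduced with a small sketch. The two Lexeme lines and the temporary file names are made up for illustration; only the key PS005,007 ML comes from File 1.

```shell
tmp=$(mktemp -d)
printf 'PS005,007 ML\n' > "$tmp/file1"
# Hypothetical Lexeme lines: the first only contains ML inside longer words.
printf ' Lexeme     MLL>      # TQML      #\n Lexeme     D      # ML      #\n' > "$tmp/file2"

# Regex match against the whole line: ML is found anywhere, even inside MLL>.
loose=$(awk 'NR == FNR { B[$2]; next }
             { for (b in B) if ($0 ~ b) print FNR }' "$tmp/file1" "$tmp/file2")

# Array lookup per field: a field has to be identical to a key.
exact=$(awk 'NR == FNR { B[$2]; next }
             $1 == "Lexeme" { for (i = 2; i <= NF; i++) if ($i in B) print FNR }' "$tmp/file1" "$tmp/file2")

printf 'regex hits on lines: %s\n' "$(echo $loose)"
printf 'exact hits on lines: %s\n' "$exact"
rm -rf "$tmp"
```

The regex version reports both lines, while the field lookup reports only the second one, where ML stands alone as a field.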

Hopefully that homes in on the particular impasse I'm experiencing.
# 4  
Old 01-22-2019
You mean loop through the fields and each lookup in the array, like this?
Code:
  if ($1 == "Lexeme") {
    for (b=2; b<=NF; b++) if ($b in B) PR3 = 1
  }
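
As a rough sketch of how that loop might slot into the script from post #1 (the two-group File 2 below is invented and trimmed down for illustration, and the second File 1 key is paired with a made-up lexeme MLL to show that near-misses are rejected):

```shell
tmp=$(mktemp -d)
printf 'PS001,001 HLK\nPS005,007 ML\n' > "$tmp/file1"
# Hypothetical, trimmed-down File 2 with two groups.
cat > "$tmp/file2" <<'EOF'
 PS001,001 [L> <Ng>] [HLK <Pr>] [B-K <Co>]
 Lexeme     L>      # HLK      # B K      #
 ClauseType xQt0

 PS005,007 [NMLL <Pr>] [B-K <Co>]
 Lexeme     MLL      # B K      #
 ClauseType xYq0
EOF

out=$(awk '
NR == FNR { A[$1]; B[$2]; next }          # File 1: remember IDs and lexemes
/^ Cl/ {                                  # a ClauseType line closes a group
    if (PR1 && PR2 && PR3) { print BUF; print }
    PR1 = PR2 = PR3 = 0
    BUF = ""
    next
}
NF {                                      # ignore blank separator lines
    BUF = BUF (BUF ? ORS : "") $0
    if ($1 in A) PR1 = 1
    if (/<Co>/)  PR2 = 1
    if ($1 == "Lexeme")                   # exact, field-by-field lookup
        for (i = 2; i <= NF; i++) if ($i in B) PR3 = 1
}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the first group is printed: the second group has a known ID and a <Co>, but MLL is not identical to the key ML.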

Also, do you want to search for <Co> in all the records, or only in the records where $1 matched? In the latter case it should be
Code:
  if (($1 in A) && /<Co>/) PR1 = 1

or (again) loop through the fields and compare each field:
Code:
  if ($1 in A) {
    for (b=2; b<=NF; b++) if ($b == "<Co>]") PR1 = 1   # in the sample data the field is "<Co>]", closing bracket included
  }
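
When comparing whole fields, it can help to check what the fields actually look like first. A quick sketch using one first line copied from the sample File 2:

```shell
line=' PS002,004 [W-<Cj>] [MRJ> <Su>] [NMJQ <Pr>] [B-HWN <Co>]'
# Print the number and content of every field that contains "<Co>".
fields=$(printf '%s\n' "$line" |
    awk '{ for (i = 2; i <= NF; i++) if (index($i, "<Co>")) print i ": " $i }')
printf '%s\n' "$fields"
```

With default field splitting the tag arrives together with its closing bracket, so an equality test has to compare against "<Co>]" rather than "<Co>", or fall back to index() or a regex.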

# 5  
Old 01-22-2019
The suggestions MadeInGermany provided you will work for your stated requirements.

But, when I look at your examples it seems that your requirements might be more stringent than what you have stated. If we look at your sample File 1:
Code:
PS001,001 HLK
PS002,004 MWQ
PS004,002 RXM
PS004,006 DBX
PS004,006 SBR
PS005,007 ML
PS005,009 DBR
PS005,011 MR
PS005,012 SBR
PS006,003 RXM
PS006,003 >SJ
PS006,010 QBL

and the output you say you are trying to produce:
Code:
PS001,001 [D-<Re>] [B->WRX> D-<WL> <Co>] [L> <Ng>] [HLK <Pr>]
 Lexeme     D      # B >WRX D <WL        # L>      # HLK      #
 PhraseType  6(6) 5(5,2.3,5,2.3) 11(11) 1(1:2)
 PhraseLab  519[0]   504[0]                510[0]    501[0]
 ClauseType xQt0 

 PS002,004 [W-<Cj>] [MRJ> <Su>] [NMJQ <Pr>] [B-HWN <Co>]
 Lexeme     W      # MRJ>      # MWQ       # B HWN=     #
 PhraseType  6(6) 3(3.2) 1(1:1) 5(5,7)
 PhraseLab  509[0]   502[0]      501[0]      504[0]
 ClauseType WXYq

 PS005,012 [D-<Re>] [MSBRJN <PC>] [B-K <Co>]
 Lexeme     D      # SBR         # B K      #
 PhraseType  6(6) 1(1:6.2) 5(5,7)
 PhraseLab  519[0]   521[0]        504[0]
 ClauseType Ptcp

I note that each of the selected output line groups does not just have a line 1 $1 value that is in your A[] array and a word in a line that starts with Lexeme that is in your B[] array; it has a matched pair where the word matched in the Lexeme line had to be from a line in File 1 that had a $1 value that matched $1 in that 1st line.

The requirements you stated do not require that both of those values found in a group of lines in File 2 come from a single line in File 1. But, in each of your sample output groups of lines, both values came from the same line in File 1.
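
To make the distinction concrete, here is a minimal sketch with an invented File 2 group whose ID comes from one line of File 1 and whose lexeme comes from another. The variable names and temporary files are only for illustration.

```shell
tmp=$(mktemp -d)
printf 'PS001,001 HLK\nPS005,007 ML\n' > "$tmp/file1"
# Hypothetical group: ID from line 1 of File 1, lexeme from line 2.
printf ' PS001,001 [NMLL <Pr>] [B-K <Co>]\n Lexeme     ML      # B K      #\n' > "$tmp/file2"

# Two independent arrays: any known ID plus any known lexeme is accepted.
independent=$(awk '
    NR == FNR { A[$1]; B[$2]; next }
    $1 in A { id = 1 }
    $1 == "Lexeme" { for (i = 2; i <= NF; i++) if ($i in B) lex = 1 }
    END { print ((id && lex) ? "selected" : "skipped") }
' "$tmp/file1" "$tmp/file2")

# One array keyed on the pair: ID and lexeme must come from the same File 1 line.
paired=$(awk '
    NR == FNR { Keys[$1, $2]; next }
    FNR == 1 { id = $1 }
    $1 == "Lexeme" { for (i = 2; i <= NF; i++) if ((id, $i) in Keys) hit = 1 }
    END { print (hit ? "selected" : "skipped") }
' "$tmp/file1" "$tmp/file2")

printf 'independent arrays: %s\npaired keys: %s\n' "$independent" "$paired"
rm -rf "$tmp"
```

The independent arrays select the group, while the paired key does not, because File 1 never lists PS001,001 together with ML.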

Am I reading too much into your example? Or are your requirements more stringent than what you stated?

If you do want the more stringent requirements (and I am correct in assuming that the $1 value from File 1 is supposed to appear only as $1 in the first line of a group of lines in File 2 and the $2 from that same line in File 1 is required to appear in a line of a group of lines in File 2 starting with Lexeme), then you might want something more like:
Code:
awk '
NR == FNR {
	Keys[$1, $2]
	next
}
function PrintGroup() {
	if(PrintThisGroup) {
		if(GroupsPrinted++)
			print ""
		printf("%s",  Group)
	}
	Group = KeyField1 = ""
	LinesInGroup = PrintThisGroup = 0
}
NF == 0 {
	PrintGroup()
	next
}
++LinesInGroup == 1 && /<Co>/ {
	KeyField1 = $1
}
{	Group = Group $0 "\n"
}
$1 == "Lexeme" {
	for(i = 2; i <= NF; i++)
		if((KeyField1, $i) in Keys) {
			PrintThisGroup = 1
			next
		}
}
END {	PrintGroup()
}' "File 1" "File 2"

# 6  
Old 01-22-2019
Reply deleted.

Last edited by jvoot; 01-22-2019 at 06:15 PM.. Reason: Significant error in description.
# 7  
Old 01-22-2019
There is a leading space on all of the lines in the samples of the second file.
If that is not there in the actual file, you could try:

Code:
awk '
  NR==FNR {
    A[$1,$2]
    next
  }
  {
    for(i=2; i<=NF; i++)
      if(($1,$i) in A) {
        print
        next
      }
  }
' file1 RS= ORS='\n\n' file2

---
Or you can preprocess the second file, like so:
Code:
sed 's/^ //' file2 > file2.new
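
One way the stray space can matter for the RS= approach, sketched with made-up two-record input: in the awk implementations I am aware of (gawk, mawk), paragraph mode only splits on truly empty lines, so a separator line holding a single space does not end a record.

```shell
# Two hypothetical records; the line between them contains one space.
spaced=$(printf 'a 1\n \nb 2\n' | awk -v RS= 'END { print NR }')
# Same input after stripping one leading space per line, as the sed command does.
clean=$(printf 'a 1\n \nb 2\n' | sed 's/^ //' | awk -v RS= 'END { print NR }')
printf 'records with space-only separator: %s\n' "$spaced"
printf 'records after sed cleanup: %s\n' "$clean"
```

If the first count comes out lower than expected on the real file, whitespace-only separator lines are a likely cause.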


Last edited by Scrutinizer; 01-22-2019 at 05:12 PM..