(g)awk: Matching strings from one file in another file between two strings

12-29-2018

Registered User

58, 2

Join Date: Aug 2014

Last Activity: 6 April 2020, 3:03 PM EDT

Posts: 58

Thanks Given: 61

Thanked 2 Times in 2 Posts

(g)awk: Matching strings from one file in another file between two strings

Hello all, I can get close to what I am looking for but cannot seem to hit it exactly and was wondering if I could get your help.

I have the following sample from textfile with many thousands of lines: File 1

Code:

PS001,001 HLK
PS002,004 L<G
PS004,002 XNN
PS004,006 BVX
PS004,006 ZBX=
PS005,007 DBR=
PS005,011 MRH
PS005,012 XSH
PS006,003 RP>
PS006,003 XNN
PS006,010 LQX
PS007,002 XSH
PS009,011 BVX

I have another large text file with many lines such as this: File 2

Code:

           * 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
 PS004,002 <NH                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
 PS004,006 ZBX=                0   1  1  0  7 -1 -1    3  2  3  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,006 ZBX                 0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1       2      -1      -1   -1   -1    -1
 PS004,006 YDQ                 0   2 -1 -1 -1  1 -1   -1 -1  1  2     2   2   2   1  -10002      -1      -1    0  503     0
           * 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
          * 0 -2 523 1 122 0 0 .. 3 LineNr 32 ClauseNr 1: 1: 4: 142: 0 0 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:xQtX
 PS006,010 CM<                 0   1  0  0  1 -1 -1    2  3  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS006,010 JHWH                0   3 -1 -1 -1  1 -1   -1 -1  1  2     2   3   3   2      -1      -1      -1    0  502     0
 PS006,010 TXNH                0   2 -1 -1 -1  3 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
 PS006,010 J                  -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0
           * 0 -1 122 1 112 0 0 .. 4 LineNr 33 ClauseNr 2: 1: 3: 112: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:ZQtX
           * 0 -1 122 1 112 0 0 .. 4 LineNr 33 ClauseNr 2: 1: 3: 112: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:ZQtX
 PS006,010 JHWH                0   3 -1 -1 -1  1 -1   -1 -1  1  2     2   3   3   2      -1      -1      -1    0  502     0
 PS006,010 TPLH                0   2 -1 -1 -1  3 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
 PS006,010 J                  -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0
 PS006,010 LQX                 0   1  2  0  1 -1 -1    1  3  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
           * 0 -1 112 0 0 .. 5 LineNr 34 ClauseNr 3: 1: 3: 121: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:XYqt

My desire is that when $1 && $2 of File 1 match $1 && $2 of File 2 and that match is between lines beginning with "*" and also has $22=="503" in that same group of lines between "*", then print. So:

Code:

            * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
           * 0 -1 103 0 0 m. 7 LineNr 23 ClauseNr 1: 1: 1: 304: 0 0 SentenceNr 13 TxtType: Q Pargr: 2.1 ClType:MSyn
 PS004,006 ZBX=                0   1  1  0  7 -1 -1    3  2  3  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,006 ZBX                 0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1       2      -1      -1   -1   -1    -1
 PS004,006 YDQ                 0   2 -1 -1 -1  1 -1   -1 -1  1  2     2   2   2   1  -10002      -1      -1    0  503     0
           * 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
           * 0 -1 122 1 112 0 0 .. 4 LineNr 33 ClauseNr 2: 1: 3: 112: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:ZQtX
 PS006,010 JHWH                0   3 -1 -1 -1  1 -1   -1 -1  1  2     2   3   3   2      -1      -1      -1    0  502     0
 PS006,010 TPLH                0   2 -1 -1 -1  3 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
 PS006,010 J                  -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0
 PS006,010 LQX                 0   1  2  0  1 -1 -1    1  3  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
           * 0 -1 112 0 0 .. 5 LineNr 34 ClauseNr 3: 1: 3: 121: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:XYqt

My current tactic was to take File 2 and print only matches between "*" that have $22=="503"

Code:

gawk '{BUF = BUF ORS $0} $22=="503"{PRT=1}/^ *\*/{if(PRT) print BUF; BUF=$0; PRT=DL=""}' File 2

Then I was taking File 1 iterating over the previous output to find matches:

Code:

gawk 'FNR==NR{a[$1]; next} ($1) in a || $0 ~/\*/' File 1 <(awk '{BUF = BUF ORS $0} $22=="503"{PRT=1}/^ *\*/{if(PRT) print BUF;BUF=$0; PRT=DL=""}' File2)

However, this method produces many false matches because the search criteria ($1 of File 1) is too ambiguous to match the specific matches I need. If I include the other field in the search criteria of File 1, it becomes too specific and will not include the surrounding lines.

So for example, given a hypothetical:
File 1a

Code:

PS004,002 XNN

File 2a

Code:

 * 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
 PS004,002 <NH                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

My sample code gives:

Code:

 * 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
 PS004,002 <NH                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

Rather than the desired:

Code:

           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

Thanks so much and sorry for the lengthy post. Hopefully I have described this accurately.

Last edited by RudiC; 12-29-2018 at 05:30 PM.. Reason: corrected ICODE --> CODE

jvoot

View Public Profile for jvoot

Find all posts by jvoot

12-29-2018

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Let me try to paraphrase your request: In file2, "blocks" (or "records"?) are delimited by a leading and a trailing * line. Whenever a block has a line whose $1,$2 matches any $1,$2 in file1, AND its $22 is "503", then print the block.
This is untested, as I'm on a windows PC. Try

Code:

awk '
NR==FNR   {T[$1,$2]
           next
          }
/^ *\*/   {if (!STARONE)    {STARONE = 1
                             BUF = $0
                            }
           else             {if (PRT)   {print BUF
                                         print
                                        }
                             STARONE = PRT = 0
                            }
           next
          }
          {BUF = BUF ORS $0
           if (($1,$2) in T) && ($22 == 503)) PRT = 1
          }
 ' file1 file2

and report back.

These 2 Users Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

12-30-2018

Registered User

58, 2

Join Date: Aug 2014

Last Activity: 6 April 2020, 3:03 PM EDT

Posts: 58

Thanks Given: 61

Thanked 2 Times in 2 Posts

Actually RudiC, now that I read your paraphrase a bit more closely, it is slight off:

Quote:

Everything was correct up until your "AND" statement. The value "503" can be in $22 in any line within the block of text between two "*" where $1,$2 of file1 match $1,$2 in file2.

So for example, above in my example File 1a I had:

Code:

PS004,002 XNN

And for example file 2a as:

Code:

 * 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
 PS004,002 <NH                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

Given this set of example input the desired output would be:

Code:

           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

In this case $22==503 does not occur on the same line as the match between file 1a and file 2a. Thus, I would need $1,$2 in file 1 to match $1,$2 in file 2 but only between blocks of texts beginning and ending with "*" and one of (any one of) the lines in this block where a match occurs also has $22==503. Hopefully that makes better sense.

Thanks again.

Last edited by jvoot; 12-30-2018 at 02:23 AM.. Reason: Update

jvoot

View Public Profile for jvoot

Find all posts by jvoot

12-30-2018

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

You are right, it's clearly written in your spec. He who can read... try

Code:

                else      {if (PR1 && PR2)   {print BUF
                                              print
                                             }
                           STARONE = PR1 = PR2 = 0
                          }
                next 
               }
               {BUF = BUF ORS $0
                if ($1,$2) in T) PR1 = 1
                if ($22 == 503)  PR2 = 1
               }

These 2 Users Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

12-31-2018

Registered User

58, 2

Join Date: Aug 2014

Last Activity: 6 April 2020, 3:03 PM EDT

Posts: 58

Thanks Given: 61

Thanked 2 Times in 2 Posts

Thanks so much for this RudiC. I don't want to wear out your patience, but it seems that when I took samples from file 2, I took too many "*" lines and effectively made two consecutive lines beginning with "*" when in fact there are only one. Your code is very close and I suspect going awry due to my error in representing the data.

If it is not too much trouble, could I get your help in correcting my error? What is happening is that it is printing the block of lines separated by star-lines immediately after the block where the match should occur again, most likely due to my copy/paste error. I'll give you a snapshot of the output from your modified code and a correct representation of File 2. I really, really appreciate your help.

Here are the real first handful of records from file 1:

Code:

PS001,001 HLK
PS002,004 L<G
PS004,002 XNN
PS004,006 BVX
PS004,006 ZBX=
PS005,007 DBR=
PS005,009 NXH
PS005,011 MRH
PS005,012 XSH
PS006,003 RP>
PS006,003 XNN
PS006,010 LQX
PS007,002 XSH
PS009,011 BVX
PS009,014 XNN

The third record of file 1, which is PS004,002 XNN, should return a match from file 2. However, it is returning the immediately subsequent block.

Thus, the relevant portion of the corrected version of file 2 below. It should be said that the first line of the file does not begin with "*", but each block of text is separated by a line beginning with a "*". Again, I apologize as when I copied and pasted relevant portions from my file I took too many "*" lines. I am very sorry.

Code:

 PS004,002 B                   0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
 PS004,002 H                   0   0 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   0   0  -1      -1      -1      -1   -1   -1    -1
 PS004,002 YR=                 0  13 -1 -1 -1  1 -1   -1 -1  1  0     2   2   5   2      -1      -1      -1    0  505     0
 PS004,002 RXB                 0   1  0  2  3 -1 -1    2  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 L                   0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
 PS004,002 J                  -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   5   2      -1      -1      -1    0  504     0
           * 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
 PS004,002 W                   0   6 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
 PS004,002 CM<                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 TPLH                0   2 -1 -1 -1  3 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
 PS004,002 J                  -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0
           * 0 -1 201 0 0 .. 4 LineNr 9 ClauseNr 1: 1: 3: 153: 0 0 SentenceNr 4 TxtType: Q Pargr: 2 ClType:WIm0

The block of text that your latest modification returns is:

Code:

 PS004,002 W                   0   6 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
 PS004,002 CM<                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 TPLH                0   2 -1 -1 -1  3 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
 PS004,002 J                  -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0
           * 0 -1 201 0 0 .. 4 LineNr 9 ClauseNr 1: 1: 3: 153: 0 0 SentenceNr 4 TxtType: Q Pargr: 2 ClType:WIm0

Rather than:

Code:

 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

It seems like there is something simple that is slightly off, but after several iterations, I cannot seem to spot it. For review, here is the code that is returning what I just described:

Code:

NR==FNR   {T[$1,$2]
           next
          }
/^ *\*/   {if (!STARONE)    {STARONE = 1
                             BUF = $0
                            }
   else      {if (PR1 && PR2)   {print BUF
                                              print
                                             }
                           STARONE = PR1 = PR2 = 0
                          }
                next
               }
               {BUF = BUF ORS $0
                if (($1,$2) in T) PR1 = 1
                if ($22 == 503)  PR2 = 1
               }

Last edited by jvoot; 12-31-2018 at 01:25 AM..

jvoot

View Public Profile for jvoot

Find all posts by jvoot

12-31-2018

Registered User

489, 285

Join Date: Nov 2018

Last Activity: 30 October 2021, 10:47 AM EDT

Location: undefined

Posts: 489

Thanks Given: 382

Thanked 285 Times in 215 Posts

You can even do so

Code:

NR==FNR   {T[$1$2]
...
if ($1$2 in T) PR1 = 1

This User Gave Thanks to nezabudka For This Post:

nezabudka

View Public Profile for nezabudka

Find all posts by nezabudka

12-31-2018

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

So we have learned that posting correct specs AND samples (admittedly, ALSO reading them) saves time and effort on all sides. With a single * line separating blocks, the "STARONE" identification token can be dropped. Try

Code:

awk '
NR==FNR   {T[$1,$2]
           next
          }
/^ *\*/   {if (PR1 && PR2)   {print BUF
                              print
                             }
           PR1 = PR2 = 0
           BUF = ""
           next
          }

          {BUF = BUF (BUF?ORS:_) $0
           if (($1,$2) in T) PR1 = 1
           if ($22 == 503)   PR2 = 1
          }

 ' file1 file2
 PS004,002 XNN                 0   1  1  0  1 -1 -1    3  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,002 NJ                 -1   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   7   2      -1      -1      -1    0  503     0
           * 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0

Last edited by RudiC; 12-31-2018 at 06:11 AM..

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

UNIX for Beginners Questions & Answers

(g)awk: Matching strings from one file in another file between two strings

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Use strings from nth field from one file to match strings in entire line in another file, awk

Discussion started by: jvoot

2. UNIX for Beginners Questions & Answers

How to pass strings from a list of strings from another file and create multiple files?

Discussion started by: nubie2linux

3. Shell Programming and Scripting

Output counts of all matching strings lessthan a number using awk

Discussion started by: cmccabe

4. UNIX for Dummies Questions & Answers

1st time awk user strings not matching right....

Discussion started by: squidGreen

5. Shell Programming and Scripting

awk extract strings matching multiple patterns

Discussion started by: chrissycc

6. Shell Programming and Scripting

Need to append matching strings in a file

Discussion started by: nanduri

7. Shell Programming and Scripting

Extract two strings from a file and create a new file with these strings

Discussion started by: Tuxidow

8. UNIX for Dummies Questions & Answers

Extraction of strings from a file, after pattern matching

Discussion started by: abkush

9. Shell Programming and Scripting

Delete lines in file containing duplicate strings, keeping longer strings

Discussion started by: raidzero

10. Shell Programming and Scripting

AWK- delimiting the strings and matching the fields

Discussion started by: kajolo