(g)awk: Matching strings from one file in another file between two strings
Hello all, I can get close to what I am looking for but cannot seem to hit it exactly and was wondering if I could get your help.
I have the following sample from a text file with many thousands of lines: File 1
I have another large text file with many lines such as this: File 2
My desire is that when $1 && $2 of File 1 match $1 && $2 of File 2 and that match is between lines beginning with "*" and also has $22=="503" in that same group of lines between "*", then print. So:
My current tactic was to take File 2 and print only matches between "*" that have $22=="503"
Then I was taking File 1 iterating over the previous output to find matches:
However, this method produces many false matches because the search criteria ($1 of File 1) is too ambiguous to match the specific matches I need. If I include the other field in the search criteria of File 1, it becomes too specific and will not include the surrounding lines.
So for example, given a hypothetical:
File 1a
File 2a
My sample code gives:
Rather than the desired:
Thanks so much and sorry for the lengthy post. Hopefully I have described this accurately.
Last edited by RudiC; 12-29-2018 at 05:30 PM..
Reason: corrected ICODE --> CODE
Let me try to paraphrase your request: In file2, "blocks" (or "records"?) are delimited by a leading and a trailing * line. Whenever a block has a line whose $1,$2 matches any $1,$2 in file1, AND its $22 is "503", then print the block.
This is untested, as I'm on a windows PC. Try
Actually RudiC, now that I read your paraphrase a bit more closely, it is slightly off:
Quote:
Let me try to paraphrase your request: In file2, "blocks" (or "records"?) are delimited by a leading and a trailing * line. Whenever a block has a line whose $1,$2 matches any $1,$2 in file1, AND its $22 is "503", then print the block.
Everything was correct up until your "AND" statement. The value "503" can be in $22 in any line within the block of text between two "*" where $1,$2 of file1 match $1,$2 in file2.
So for example, above in my example File 1a I had:
And for example file 2a as:
Given this set of example input the desired output would be:
In this case $22==503 does not occur on the same line as the match between file 1a and file 2a. So I need $1,$2 in file 1 to match $1,$2 in file 2, but only within a block of text delimited by lines beginning with "*" in which at least one line (any line) also has $22==503. Hopefully that makes better sense.
Thanks again.
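The clarified requirement (a key match and a 503 anywhere in the same block, not necessarily on the same line) can be sketched as a two-pass read of file 2. This is not code posted in the thread, only a minimal sketch under assumptions: fields are whitespace-separated, any line starting with "*" is a block delimiter, and the delimiter lines themselves are not printed. The function name `print_blocks_two_pass` and the variable names are hypothetical.

```shell
# Hypothetical helper: print_blocks_two_pass KEYFILE DATAFILE
# Assumes whitespace-separated fields and "*"-prefixed delimiter lines.
print_blocks_two_pass() {
    awk '
        FNR == 1  { blk = 1 }                 # restart block numbering for each pass
        pass == 1 { keys[$1, $2]; next }      # key file: remember every $1,$2 pair
        /^\*/     { blk++; next }             # delimiter line: move to the next block
        pass == 2 {                           # first read of the data file: flag blocks
            if (($1, $2) in keys) hit[blk]    # some line matches a key pair
            if ($22 == "503")     code[blk]   # some line carries 503 in field 22
            next
        }
        # Second read of the data file: print lines of blocks with both flags set.
        pass == 3 && (blk in hit) && (blk in code)
    ' pass=1 "$1" pass=2 "$2" pass=3 "$2"
}
```

Because blocks are only numbered, two consecutive "*" lines just produce an empty block that never qualifies, so this variant does not care whether blocks are wrapped in a leading and trailing "*" line or separated by a single one. A call would look like `print_blocks_two_pass file1 file2`.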
Last edited by jvoot; 12-30-2018 at 02:23 AM..
Reason: Update
Thanks so much for this RudiC. I don't want to wear out your patience, but it seems that when I took samples from file 2, I took too many "*" lines and effectively made two consecutive lines beginning with "*" when in fact there is only one. Your code is very close and I suspect it is going awry due to my error in representing the data.
If it is not too much trouble, could I get your help in correcting my error? What is happening is that it is printing the block of lines separated by star-lines immediately after the block where the match should occur again, most likely due to my copy/paste error. I'll give you a snapshot of the output from your modified code and a correct representation of File 2. I really, really appreciate your help.
Here are the real first handful of records from file 1:
The third record of file 1, which is PS004,002 XNN, should return a match from file 2. However, it is returning the immediately subsequent block.
Thus, the relevant portion of the corrected version of file 2 is below. It should be said that the first line of the file does not begin with "*", but each block of text is separated by a line beginning with a "*". Again, I apologize: when I copied and pasted relevant portions from my file, I took too many "*" lines. I am very sorry.
The block of text that your latest modification returns is:
Rather than:
It seems like there is something simple that is slightly off, but after several iterations, I cannot seem to spot it. For review, here is the code that is returning what I just described:
So we have learned that posting correct specs AND samples (admittedly, ALSO reading them) saves time and effort on all sides. With a single * line separating blocks, the "STARONE" identification token can be dropped. Try
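The code from this reply is not reproduced here. As a minimal sketch of the single-separator case described above, and not the code actually posted, a block can be buffered line by line and emitted when the next "*" line (or the end of the file) is reached. Assumptions: fields are whitespace-separated, the key file is non-empty (the `NR == FNR` idiom misfires on an empty first file), and the "*" lines are not printed. The names `print_blocks_one_pass`, `emit`, `hit` and `code` are hypothetical.

```shell
# Hypothetical helper: print_blocks_one_pass KEYFILE DATAFILE
# Assumes a single "*"-prefixed line separates blocks and KEYFILE is non-empty.
print_blocks_one_pass() {
    awk '
        # Print the buffered block if it had a key match and a 503, then reset.
        function emit() {
            if (hit && code) printf "%s", buf
            buf = ""; hit = 0; code = 0
        }
        NR == FNR { keys[$1, $2]; next }      # first file: load the $1,$2 pairs
        /^\*/     { emit(); next }            # separator: decide on the block just read
        {
            buf = buf $0 ORS                  # collect the current block
            if (($1, $2) in keys) hit = 1     # key pair seen somewhere in the block
            if ($22 == "503")     code = 1    # 503 seen somewhere in the block
        }
        END       { emit() }                  # the last block has no trailing "*"
    ' "$1" "$2"
}
```

The `END` rule matters because the file neither starts nor necessarily ends with a "*" line, so the final block would otherwise never be evaluated.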