Using strings in one file as regex to search field of another file


 
Forum: UNIX for Beginners Questions & Answers
#1 (12-30-2016)

I have a data file, file1.txt, such as the following:

Code:
1,2    "TWRTW",   "TWRH/"  [nominal ending: "T"] [pronominal suffix: "W", gender: masculine, number: singular, person: third] [gender: feminine, number: singular, part of speech: noun, state: absolute]
1,2    "JHGH",    "HGJ["  [preformative: "J"] [verbal ending: ""] [gender: masculine, number: singular, person: third, part of speech: verb, verbal stem: qal, verbal tense: imperfect]
1,2    "JWMM",    "JWMM"  [part of speech: adverb]
1,2    "W",       "W"  [part of speech: conjunction]
1,2    "LJLH",    "LJLH/"  [nominal ending: ""] [gender: masculine, number: singular, lexical set: potential adverb, part of speech: noun]
1,3    "W",       "W"  [part of speech: conjunction]
1,3    "HJH",     "HJJ["  [verbal ending: ""] [gender: masculine, number: singular, person: third, lexical set: copulative verb, part of speech: verb, verbal stem: qal, verbal tense: perfect]

I have another file, file2.txt, that contains only the strings from field 3 ($3), stripped of every character other than [A-Z<>] because $3 contains so many metacharacters (e.g., a literal "[" and "/").

File2.txt contents:
Code:
TWRH
HGJ
JWMM
W
LJLH
W
HJJ
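For reference, file2.txt can probably be generated from file1.txt rather than built by hand. This sketch assumes the layout shown above, where the field 3 value is always the second double-quoted string on the line; make_file2 is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print the stripped field-3 value of every line of $1.
# With -F'"' the double quotes act as separators, so on a line such as
#   1,2    "TWRTW",   "TWRH/"  [...]
# $2 is TWRTW and $4 is TWRH/ (the text inside the second pair of quotes).
make_file2() {
  awk -F'"' '{
    s = $4                    # quoted contents of the original $3
    gsub(/[^A-Z<>]/, "", s)   # keep only A-Z, < and >
    print s
  }' "$1"
}

# Usage: make_file2 file1.txt > file2.txt
```

On the sample above this should reproduce the seven lines listed under "File2.txt contents".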

My goal is to print only those lines of file1.txt whose $2 does not contain any of the strings in file2.txt.

Desired output:
Code:
1,2    "TWRTW",   "TWRH/"  [nominal ending: "T"] [pronominal suffix: "W", gender: masculine, number: singular, person: third] [gender: feminine, number: singular, part of speech: noun, state: absolute]
1,2    "JHGH",    "HGJ["  [preformative: "J"] [verbal ending: ""] [gender: masculine, number: singular, person: third, part of speech: verb, verbal stem: qal, verbal tense: imperfect]
1,3    "HJH",     "HJJ["  [verbal ending: ""] [gender: masculine, number: singular, person: third, lexical set: copulative verb, part of speech: verb, verbal stem: qal, verbal tense: perfect]

I've attempted to accomplish this with the following bit of awk code:
Code:
awk 'BEGIN { while(getline l < "file2.txt") PATS[l] } ok=0;{for (p in PATS) if ($2 ~ p) ok=1}; !ok {print}' < file1.txt

I've used similar code on other files of mine that needed the same kind of filtering. Here, however, it returns no output whatsoever, and I cannot seem to pinpoint the problem.
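A likely explanation for the empty output, offered as a hedged diagnosis: getline returns -1 when it cannot open its file, and -1 counts as true, so if file2.txt is missing (for instance because it is actually named File2.txt) the BEGIN loop never ends and nothing is printed. Even with the file present, the whitespace-separated $2 still includes the quotes and comma (e.g. `"TWRTW",`), and `~` treats each entry as a substring regex, so the one-letter entry W suppresses the TWRTW line that the desired output keeps. The sketch below keeps the getline structure but tests `> 0` and compares whole words; filter_unmatched, rc, w and the exit status 2 are assumptions for illustration.

```shell
# Hypothetical helper: $1 = word list (e.g. File2.txt), $2 = data file.
# Fix 1: getline is compared with > 0, so a missing word list stops the loop
#        (with an error message and exit status 2) instead of spinning on -1.
# Fix 2: $2 is reduced to its letters and tested for exact membership with
#        "in", instead of treating every word-list entry as a substring regex.
filter_unmatched() {
  awk -v pats="$1" '
    BEGIN {
      while ((rc = (getline l < pats)) > 0) PATS[l]
      if (rc < 0) {
        print "cannot read " pats > "/dev/stderr"
        exit 2
      }
      close(pats)
    }
    {
      w = $2
      gsub(/[^A-Z<>]/, "", w)   # "TWRTW", becomes TWRTW
    }
    !(w in PATS)
  ' "$2"
}

# Usage: filter_unmatched File2.txt file1.txt
```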

I would very much appreciate a solution that uses the two-file approach described above or, alternatively, one that works on file1.txt alone, using $3 as a regex that the string in $2 must not match. Something like:
Code:
awk '{if($2 !~ $3){print}}' file1.txt
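The single-file idea can work if the fields are compared as plain strings instead of regexes. With the default field separator, $3 is `"HGJ["` including its quotes, and when used as a dynamic regex the unbalanced `[` is an invalid bracket expression that awk will typically reject with an error. A sketch, assuming the two quoted strings are always the first two on each line; print_mismatches is a hypothetical helper name.

```shell
# Hypothetical helper: print lines of $1 whose first quoted string differs
# from the second quoted string once the latter is stripped to A-Z, < and >.
# With -F'"', $2 is the text inside the first pair of quotes and $4 the text
# inside the second pair. Nothing is used as a regex except the fixed gsub
# pattern, so characters such as "[" in HGJ[ are harmless.
print_mismatches() {
  awk -F'"' '{
    s = $4
    gsub(/[^A-Z<>]/, "", s)
  }
  $2 != s' "$1"
}

# Usage: print_mismatches file1.txt
```

On the sample data this should print the same three lines as the desired output. If substring containment rather than equality is the real requirement, `index($2, s) == 0` in place of `$2 != s` performs a plain substring test, again without any regex interpretation.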

#2 (12-30-2016)
Code:
awk 'FNR==NR{a[$1]=$1; next} {c=$2; gsub(/[^[:alpha:]]/, "", c)} !a[c]' file2.txt file1.txt
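Aia's one-liner relies on the common FNR==NR idiom: that condition is true only while awk reads its first file argument (file2.txt), so the words are loaded into array a before file1.txt is processed. Below is the same logic spread out with comments, wrapped in a hypothetical helper function, aia_filter. One caveat of the idiom: if the first file is empty, FNR==NR stays true for the second file as well, and nothing is printed.

```shell
# Hypothetical wrapper around Aia's one-liner: $1 = word list, $2 = data file.
aia_filter() {
  awk '
    # First file (FNR==NR only while reading it): remember every word.
    FNR == NR { a[$1] = $1; next }
    # Second file: copy $2 (e.g. "TWRTW",) and delete everything but letters.
    { c = $2; gsub(/[^[:alpha:]]/, "", c) }
    # a[c] is empty (false) for unknown words, so !a[c] prints those lines.
    !a[c]
  ' "$1" "$2"
}

# Usage: aia_filter file2.txt file1.txt
```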

(The original poster thanked Aia for this post.)
#3 (12-31-2016)
You could also try this slight simplification of Aia's suggestion:
Code:
awk -F'"' 'FNR==NR{a[$1]; next} !($2 in a)' File2.txt file1.txt

Note that your code uses the filenames file1.txt and file2.txt, but your description says that the second file is named File2.txt. The code above uses the filenames given in the description of your problem and produces the output you said you wanted.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
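Don's variant avoids gsub entirely: with `-F'"'` the text between the first pair of double quotes is already $2, and File2.txt contains no quotes, so each of its lines arrives whole in $1. It also uses `($2 in a)`, which tests membership without creating array elements, whereas Aia's `!a[c]` adds an empty entry for every unmatched word (harmless here). The sketch below comments the one-liner and picks /usr/xpg4/bin/awk when it exists, following the Solaris note above; don_filter and awk_cmd are hypothetical names.

```shell
# Hypothetical wrapper around Don Cragun's one-liner: $1 = word list, $2 = data file.
don_filter() {
  # On Solaris/SunOS prefer the XPG4 awk; elsewhere plain awk is fine.
  if [ -x /usr/xpg4/bin/awk ]; then awk_cmd=/usr/xpg4/bin/awk; else awk_cmd=awk; fi
  "$awk_cmd" -F'"' '
    # Word list: it has no double quotes, so $1 is the whole line.
    FNR == NR { a[$1]; next }
    # Data file: $2 is the text inside the first pair of quotes (e.g. TWRTW).
    # The in operator only tests membership and adds no new array entries.
    !($2 in a)
  ' "$1" "$2"
}

# Usage: don_filter File2.txt file1.txt
```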
(The original poster thanked Don Cragun for this post.)
#4 (01-01-2017)
Thank you so much for this, Don, and I apologize for the typo. I should have caught that.