Regex within IF statement in awk


 
# 8  
Old 05-12-2013
The reason why those fail is explained in my post and in the link that I included.

Regards,
Alister
# 9  
Old 05-12-2013
Hello alister,

Thanks for the answer and the link you shared. I read about the escape sequences for a regexp constant in that link.

I haven't escaped the "|" because awk understands it as a literal "|", but I don't know why the code below keeps failing.

Code:
awk '
BEGIN {
X="5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|3337128943"
Y="|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3"
Z=X "(15|20|45|70)" Y
}
$0 !~ Z {print}' input.txt

Is it not possible because the string contains "|", or what is wrong?

Thanks in advance.

Regards
# 10  
Old 05-12-2013
Thanks for sending the exact input file and script you are using. That really helped.

I took another look, tried several things, learned it's a sticky wicket, and found something that seems to work well.

------------------------------

When I added a diagnostic statement print "Z=" Z at the end of the BEGIN segment, it printed a message that shows awk disregards the single back-slash:
Code:
awk: cmd. line:3: warning: escape sequence `\|' treated as plain `|'
Z=5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|3337128943(15|20|45|70)|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
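A likely reason the unescaped version fails: in a dynamic regex a bare | is alternation, so the Z printed above is really a long list of alternatives such as 5, 35 and 87, and almost any line matches at least one of them. The following is a minimal sketch with a shortened, made-up pattern and made-up input lines (not the real data from this thread). Only a line containing none of the alternatives gets past !~.

```shell
# Shortened stand-in for Z with unescaped pipes (illustrative data only).
# Bare | means "or", so any line containing 5, 35, 15, 20 or 87 matches Z.
out=$(printf '%s\n' '5|35|15|87' '5|35|34|87' 'abc' |
  awk 'BEGIN { Z = "5|35|(15|20)|87" } $0 !~ Z')

# Only the line with none of the alternatives is printed.
printf '%s\n' "$out"
```

Both pipe-delimited lines contain a 5, so they match Z and are filtered out regardless of the 15 or 34 in the middle.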

------------------------------

I tried adding another \ character, in line with what awk expects, to make sure the | is escaped.
Code:
awk '
BEGIN {
X="5\\|35\\|998367383\\|5\\|3\\|\\|,7\\|44\\|783738002\\|3\\|55\\|JK\\|,97\\|16\\|3337128943"
Y="\\|87\\|50\\|2\\|,8,3,32,0,1,0,1,7,8,9,2,2,3"
Z=X "(15|20|45|70)" Y
print "Z=" Z
}
{if($0 !~ Z); print}' input.txt

That got rid of the warning message, and produced the expected Z string, but unfortunately did not seem to help (I thought it would work):
Code:
$ ./test.sh
Z=5\|35\|998367383\|5\|3\|\|,7\|44\|783738002\|3\|55\|JK\|,97\|16\|3337128943(15|20|45|70)\|87\|50\|2\|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894315|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894334|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894320|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894302|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894391|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894345|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894320|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894345|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894370|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894315|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3

------------------

I tried adding further backslash characters, but it would not work. Maybe there is some way to add a series of preceding backslash characters and make it work, but at this point the large number of confusing backslash characters is a turn-off anyway.

The business of escaping escape characters, and of dealing with the two passes that awk makes to process the string expression, made me look for a way to avoid the byzantine complications of two parser passes. The following solution uses a plain /.../ regexp constant but creates it on the fly from within a shell script:
Code:
$ cat test.sh
B='5\|35\|998367383\|5\|3\|\|,7\|44\|783738002\|3\|55\|JK\|,97\|16\|3337128943'
M='(15|20|45|70)'
E='\|87\|50\|2\|,8,3,32,0,1,0,1,7,8,9,2,2,3'
Z="$B$M$E"
echo "\$0 !~ /$Z/ {print}" > script.awk
awk -f script.awk input.txt

It works correctly:
Code:
$ ./test.sh
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894334|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894302|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894391|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3

Here is the dynamically generated script.awk file, which you can examine if you need to troubleshoot:
Code:
$ cat script.awk
$0 !~ /5\|35\|998367383\|5\|3\|\|,7\|44\|783738002\|3\|55\|JK\|,97\|16\|3337128943(15|20|45|70)\|87\|50\|2\|,8,3,32,0,1,0,1,7,8,9,2,2,3/ {print}

You can pass in the B (begin) and E (end) strings as shell script arguments, so this seems adaptable to changing them as needed. Hope this works!
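
One way to sketch that idea is below. The function name filter_lines and its argument order are made up for illustration, and the middle part M is still hard-coded as in test.sh. It writes the generated program to a temporary file instead of script.awk. Note that B and E are pasted into a /.../ constant unchanged, so they must already be escaped for awk and must not contain a / character.

```shell
# filter_lines B E FILE: print lines of FILE that do NOT match B(15|20|45|70)E.
# Hypothetical helper; B and E must be pre-escaped regex fragments without "/".
filter_lines() {
    M='(15|20|45|70)'
    script=$(mktemp) || return 1
    # printf avoids the backslash processing that some echo implementations do.
    printf '%s\n' "\$0 !~ /$1$M$2/ {print}" > "$script"
    awk -f "$script" "$3"
    status=$?
    rm -f "$script"
    return "$status"
}
```

A call would then look something like filter_lines 'x\|' '\|y' input.txt, with the real B and E strings from test.sh in place of the short placeholders.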
# 11  
Old 05-12-2013
Quote:
Originally Posted by hanson44
... awk disregards the single back-slash:
Code:
awk: cmd. line:3: warning: escape sequence `\|' treated as plain `|'

"\|" is an undefined sequence. It looks like an escape sequence, but there is no such defined escape sequence in string literals. Some awk string parsers (as the quoted error message makes clear) will discard the backslash and keep the pipe symbol, which is not special in a string. Other awks will keep both characters. However, an implementation that aborts with a compilation error (to my knowledge, none do) is not violating any standard.

The same is true when any character that is not part of a defined escape sequence (\n, \t, \\, etc.) follows a backslash.
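
A small sketch of what this means in practice, using a made-up three-line input rather than the data from this thread. Doubling the backslash in the string literal is well defined: the string holds \| and the regex parser then sees an escaped pipe. A bracket expression such as [|] sidesteps backslashes altogether. Both variants assume the goal is to match a literal pipe.

```shell
lines='a|15|b
a|34|b
a15b'

# "\\|" in the string literal becomes \| in the regex: a literal pipe.
escaped=$(printf '%s\n' "$lines" | awk 'BEGIN { Z = "a\\|(15|20)\\|b" } $0 !~ Z')

# [|] is a bracket expression containing only a pipe; no backslashes needed.
bracket=$(printf '%s\n' "$lines" | awk 'BEGIN { Z = "a[|](15|20)[|]b" } $0 !~ Z')

printf '%s\n' "$escaped"
[ "$escaped" = "$bracket" ] && echo "both patterns agree"
```

Only the first input line matches either pattern, so the other two lines are printed.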

This is also an issue with sed escape sequence handling. If someone actually wrote an implementation that strictly refused undefined escape sequences, most non-trivial scripts posted in these forums would fail (which one could argue would be preferable to unknowingly harboring unreliable behavior).

Quote:
Originally Posted by hanson44
Code:
awk '
BEGIN {
X="5\\|35\\|998367383\\|5\\|3\\|\\|,7\\|44\\|783738002\\|3\\|55\\|JK\\|,97\\|16\\|3337128943"
Y="\\|87\\|50\\|2\\|,8,3,32,0,1,0,1,7,8,9,2,2,3"
Z=X "(15|20|45|70)" Y
print "Z=" Z
}
{if($0 !~ Z); print}' input.txt

That got rid of the warning message, and produced the expected Z string, but unfortunately did not seem to help (I thought it would work):
The problem has nothing to do with the string literals or with the fact that their value is later processed by the regular expression parser. The problem is a misplaced semicolon forming an empty if-statement.

You can move or drop the semicolon, or just use a bare pattern. With your string literals as I quoted them, the following will work fine.
Code:
BEGIN { X=...; Y=...; Z=X...Y }
$0 !~ Z
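
For anyone wondering why the semicolon matters: a ; on its own is an empty statement, so if (cond); is a complete if whose body does nothing, and the print that follows is a separate statement that runs for every record. A minimal sketch with a made-up two-line input:

```shell
input='match
other'

# Semicolon right after the condition: empty if body, print always runs.
buggy=$(printf '%s\n' "$input" | awk '{ if ($0 !~ /match/); print }')

# No semicolon: print is the body of the if, so only non-matching lines print.
fixed=$(printf '%s\n' "$input" | awk '{ if ($0 !~ /match/) print }')

printf 'buggy:\n%s\nfixed:\n%s\n' "$buggy" "$fixed"
```

The buggy variant echoes both lines, while the fixed one prints only the line that does not match.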

Regards,
Alister

Last edited by alister; 05-12-2013 at 10:52 PM..
# 12  
Old 05-12-2013
Quote:
The problem is a misplaced semicolon forming an empty if-statement.
That is correct. Thanks. As I said before, I expected that adding the extra \ would work and was surprised when it did not. I copied the posted script verbatim and did not see the extra semicolon. Here is the corrected script, so now you have two ways that work:
Code:
$ cat test.sh
awk '
BEGIN {
X="5\\|35\\|998367383\\|5\\|3\\|\\|,7\\|44\\|783738002\\|3\\|55\\|JK\\|,97\\|16\\|3337128943"
Y="\\|87\\|50\\|2\\|,8,3,32,0,1,0,1,7,8,9,2,2,3"
Z=X "(15|20|45|70)" Y
}
{if($0 !~ Z) print}' input.txt

Code:
$ ./test.sh
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894334|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894302|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3
5|35|998367383|5|3||,7|44|783738002|3|55|JK|,97|16|333712894391|87|50|2|,8,3,32,0,1,0,1,7,8,9,2,2,3

# 13  
Old 05-13-2013
Hello hanson and alister,

Many thanks for taking the time to help, and thanks for your explanations. Some things are clearer to me now.

hanson,

I also tried double escaping, but I had the semicolon in the same place too.

Alister,

Could you please explain why that misplaced semicolon generates an empty if-statement?

When do I need to put a semicolon (when is it mandatory and when is it not)?

In this case, the semicolon was actually ruining the output.

Thanks for all the help.

Last edited by Ophiuchus; 05-13-2013 at 01:39 AM..