This is a problem I've been working on for a while and can't figure out.
I have a file, file.txt.
The awk program tries to extract the year portion of the birth and death dates ("98" and "2nd C.") using the technique below.
There are other ways to do it from the command line, but I need to do it inside a function in a script that uses readfile().
The above code returns the correct birth year, but the death year is mangled because the regex is grabbing both the birth and death strings.
It works when I use getline instead of the readfile() function, with this regex:
The ".*" grabs everything to the end of the line, and since readfile() makes the entire file a single line, it grabs everything to the end of that "line" (the whole file). That's why I'm using a word boundary ("\y"), which works, but not when the data contains a space, as with the death string ("2nd C."). I tried adding "[:space:]", but that didn't work. I think this is solvable with the right regex, but I'm out of ideas.
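The greedy-match problem described above can be reproduced in a short Python sketch. The layout of file.txt is a hypothetical stand-in (the real contents aren't shown), with values wrapped in "[[...]]". In gawk, "." matches any character including newline; Python's re.DOTALL gives the same behavior, so a greedy ".*" over the whole-file string runs to the last "]]" in the file. POSIX EREs (and therefore awk) have no lazy ".*?", so the usual fix is a character class that cannot contain "]" or a newline:

```python
import re
from typing import Optional

# Hypothetical stand-in for file.txt (the thread doesn't show the real data).
# gawk's readfile() returns the whole file as one string like this one.
WHOLE_FILE = (
    "| birth_date = [[98]]\n"
    "| death_date = [[2nd C.]]\n"
    "| notes = [[see talk page]]\n"
)


def extract_greedy(field: str) -> Optional[str]:
    """Greedy '.*' with DOTALL, mimicking awk, where '.' also matches newline."""
    m = re.search(field + r" = \[\[(.*)\]\]", WHOLE_FILE, re.DOTALL)
    return m.group(1) if m else None


def extract_class(field: str) -> Optional[str]:
    """Letters, digits, space and '.' only: the match cannot cross ']' or a newline."""
    m = re.search(field + r" = \[\[([0-9A-Za-z .]+)\]\]", WHOLE_FILE)
    return m.group(1) if m else None


if __name__ == "__main__":
    for name in ("birth_date", "death_date"):
        print(name, repr(extract_greedy(name)), repr(extract_class(name)))
```

The greedy version returns everything from "98" up to the last "]]" in the file, while the class-based version returns just "98" and "2nd C.".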
Last edited by Scrutinizer; 10-26-2014 at 03:36 AM..
Reason: extra code tags
OK, I found the problem. The match line should look like this:
The problem was that [:punct:] was matching the "[" and "]" characters in file.txt, so to match the "." in "2nd C." I now list it explicitly (right before the A-z).
Thanks.
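The bracket problem is easy to check outside awk. In the C/POSIX locale, [:punct:] is the ASCII punctuation set, which is exactly Python's string.punctuation, and that set contains "[" and "]". This Python sketch (with hypothetical sample text) builds a class equivalent to "[0-9A-Za-z [:punct:]]" and compares it with one that lists only the period:

```python
import re
import string

# In the C/POSIX locale, [:punct:] == string.punctuation, which includes '[' and ']'.
WITH_PUNCT = "[0-9A-Za-z " + re.escape(string.punctuation) + "]+"
PERIOD_ONLY = r"[0-9A-Za-z .]+"  # list '.' explicitly instead of all punctuation

# Hypothetical sample; the real file.txt isn't shown in the thread.
SAMPLE = "death_date = [[2nd C.]] notes = [[x]]"


def capture(char_class: str) -> str:
    """Return what char_class matches right after 'death_date = [['."""
    m = re.search(r"death_date = \[\[(" + char_class + ")", SAMPLE)
    return m.group(1) if m else ""


if __name__ == "__main__":
    print(repr(capture(WITH_PUNCT)))   # runs past ']]' to the end of SAMPLE
    print(repr(capture(PERIOD_ONLY)))  # stops at the first ']'
```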
The RE [A-z] also includes the characters [, \, ], ^, _, and `. The RE [[:space:]] matches all whitespace characters (including newline); I'm guessing that you just want a space character instead. And if you're trying to catch common forms of dates, you probably also want to include a comma (for dates like "December 25, 1999"). So a better RE would probably be:
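To illustrate the [[:space:]] point, here is a Python sketch in which "\s" stands in for [[:space:]] (both include newline). On a whole-file string, a class containing "\s" can run across line breaks, while a class with a literal space, a period and a comma stops at the end of the date. The sample text is hypothetical, and the class is only an approximation of the kind of RE described above:

```python
import re
from typing import Optional

# Hypothetical whole-file string (as readfile() would return it).
WHOLE_FILE = "birth = December 25, 1999\ndeath = 2nd C.\n"

SPACE_CLASS = r"[0-9A-Za-z\s.,]+"  # \s ~ [[:space:]]: also matches '\n'
BLANK_CLASS = r"[0-9A-Za-z .,]+"   # a plain space only, plus '.' and ','


def date_after(label: str, char_class: str) -> Optional[str]:
    """Return the text matched by char_class right after 'label = '."""
    m = re.search(label + " = (" + char_class + ")", WHOLE_FILE)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(repr(date_after("birth", SPACE_CLASS)))  # spills onto the next line
    print(repr(date_after("birth", BLANK_CLASS)))
    print(repr(date_after("death", BLANK_CLASS)))
```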
This User Gave Thanks to Don Cragun For This Post:
Note: A-z0-9 will probably not do what you want:
This is because the square brackets (along with \, ^, _ and `) fall within that range. Moreover, ranges like that are locale-dependent, which could produce other unexpected results, so it would be better to use [:alnum:] instead.
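A quick way to see which non-letters hide inside A-z is to walk the code points, as in this Python sketch. It assumes plain ASCII/C-locale ordering; in other locales a range may collate differently, which is exactly why [:alnum:] is the more portable choice:

```python
import string


def non_letters_between(lo: str, hi: str) -> str:
    """Characters whose code points fall between lo and hi but are not ASCII letters."""
    return "".join(
        chr(cp)
        for cp in range(ord(lo), ord(hi) + 1)
        if chr(cp) not in string.ascii_letters
    )


if __name__ == "__main__":
    print(non_letters_between("A", "z"))  # prints: [\]^_`
```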
---
Also, the code looks a bit convoluted for such a simple task. I don't see why you would need gawk or need to read the entire file into memory, when this could be done with awk's normal line-by-line processing in the main section (the rules between BEGIN and END), which is what awk is typically used for. I would suggest reading up on that.
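For comparison, a Python sketch of the line-at-a-time approach, assuming a hypothetical field layout for file.txt since the real contents aren't shown. Because each record is matched on its own, a greedy ".*" can only run to the end of the current line, which is the behavior awk's main rules give you without readfile():

```python
import io
import re
from typing import Dict, Iterable

# Hypothetical field layout; the real file.txt isn't shown in the thread.
FIELD_RE = re.compile(r"^\|\s*(birth_date|death_date)\s*=\s*\[\[(.*)\]\]")


def extract_dates(lines: Iterable[str]) -> Dict[str, str]:
    """Process one line (record) at a time, as awk's main rules do."""
    found: Dict[str, str] = {}
    for line in lines:
        m = FIELD_RE.match(line)
        if m:
            # '.*' cannot cross a newline here, so it stops at this line's last ']]'.
            found[m.group(1)] = m.group(2)
    return found


if __name__ == "__main__":
    sample = io.StringIO("| birth_date = [[98]]\n| death_date = [[2nd C.]]\n| notes = [[x]]\n")
    print(extract_dates(sample))  # {'birth_date': '98', 'death_date': '2nd C.'}
```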
---
You could also consider a different line-processing tool, such as GNU sed:
which may produce similarly acceptable results.
I would like to extract "1333 Fairlane" from the text below.
The word "Building:" is always present. The wording between "Building:" and the beginning of the address can be almost anything. It appears that the hyphen is there most of the time.
Campus: Fairlane Business Park
Building:...
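One possible approach, sketched in Python under the assumption (not confirmed in the post) that the address is the first "number followed by a word" after "Building:" on the same line. The sample lines are hypothetical, since the original text is cut off. A lazy ".*?" skips the free-form wording whether or not the hyphen is present. Note that a building name that itself contains a number followed by a word would be picked up first:

```python
import re
from typing import Optional

# Assumption: the address is the first "<digits> <word>" after "Building:".
ADDRESS_RE = re.compile(r"Building:.*?(\d+\s+[A-Za-z]+)")


def building_address(text: str) -> Optional[str]:
    """Return e.g. '1333 Fairlane', or None if no address follows 'Building:'."""
    m = ADDRESS_RE.search(text)  # '.' does not match '\n', so this stays on one line
    return m.group(1) if m else None


if __name__ == "__main__":
    samples = [
        "Campus: Fairlane Business Park\nBuilding: Service Center - 1333 Fairlane Drive\n",
        "Building: Annex B 1333 Fairlane Drive",
    ]
    for s in samples:
        print(building_address(s))
```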
My input file looks like this:
13154|X,the deer hunter
13154|Y,the good life
1316|,american idol
1316|,bowling
1316|,chuck
etc...
The X, Y, or any other character (besides a comma) after the pipe is a "Device Type". I want to strip out lines that do not have a device type.
I have...
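The rule in the post (keep a line only if the character right after the pipe is something other than a comma) can be sketched in Python using the sample lines from the post. In awk, a similar test would be roughly awk -F'|' '$2 ~ /^[^,]/' file, which likewise drops lines with nothing after the pipe:

```python
import re
from typing import Iterable, List

# A line has a device type if the character right after the first '|'
# exists and is not a comma.
HAS_DEVICE_TYPE = re.compile(r"^[^|]*\|[^,\n]")


def with_device_type(lines: Iterable[str]) -> List[str]:
    """Keep only the lines that carry a device type."""
    return [line for line in lines if HAS_DEVICE_TYPE.match(line)]


if __name__ == "__main__":
    data = [
        "13154|X,the deer hunter",
        "13154|Y,the good life",
        "1316|,american idol",
        "1316|,bowling",
        "1316|,chuck",
    ]
    print(with_device_type(data))  # ['13154|X,the deer hunter', '13154|Y,the good life']
```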
Good day,
I'm new to scripting, especially awk and sed. I'd like to ask for help with a sed command that prints the line immediately after a regexp match, but not the matching line itself.
sed -n '/regexp/{n;p;}' filename
What if my regexp is three words or a sentence? I'm...
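A multi-word regexp works the same way in sed as long as the whole script stays inside single quotes, e.g. sed -n '/disk quota exceeded/{n;p;}' filename (the phrase is hypothetical). To make the semantics explicit, here is a Python sketch of what that sed script does: "n" replaces the matching line with the next input line and "p" prints it, so the printed line is not itself re-tested against the regexp:

```python
import re
from typing import Iterable, List


def lines_after_match(lines: Iterable[str], pattern: str) -> List[str]:
    """Mimic sed -n '/pattern/{n;p;}': collect the line after each match."""
    regex = re.compile(pattern)
    it = iter(lines)
    out: List[str] = []
    for line in it:
        if regex.search(line):
            following = next(it, None)  # 'n': read the next line
            if following is None:       # no next line: sed just stops
                break
            out.append(following)       # 'p': print it (it is not re-tested)
    return out


if __name__ == "__main__":
    log = ["start", "disk quota exceeded", "user: andy", "ok",
           "disk quota exceeded", "user: bob"]
    print(lines_after_match(log, r"disk quota exceeded"))  # ['user: andy', 'user: bob']
```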
I'd like to know if there is a catch-all command for renaming the following patterns:
s01e03 -> 01x03
s4e9 -> 04x09
s10e08 -> 10x08
and possibly even:
318 -> 03x18
1002 -> 10x02
if it's the first 3- or 4-digit number in the string.
Thanks!
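A Python sketch of one catch-all rule, using the examples from the post: try the "sNeN" form first, and otherwise treat the first standalone 3- or 4-digit number as season plus a two-digit episode. The helper names are illustrative only. As requested, it takes the first such number, so names containing a year or a resolution such as "720" before the episode number would be rewritten wrongly:

```python
import re

# sNNeNN / sNeN style, e.g. s01e03 or s4e9
SXXEYY = re.compile(r"[sS](\d{1,2})[eE](\d{1,2})")
# A standalone 3- or 4-digit number: last two digits = episode, rest = season
BARE = re.compile(r"(?<!\d)(\d{1,2})(\d{2})(?!\d)")


def _season_episode(m):
    """Format a match's two groups as zero-padded SSxEE."""
    return f"{int(m.group(1)):02d}x{int(m.group(2)):02d}"


def to_nxnn(name: str) -> str:
    """Rewrite the first season/episode token in name as SSxEE."""
    new, count = SXXEYY.subn(_season_episode, name, count=1)
    if count:
        return new
    return BARE.sub(_season_episode, name, count=1)


if __name__ == "__main__":
    for n in ["s01e03", "s4e9", "s10e08", "318", "1002"]:
        print(n, "->", to_nxnn(n))
```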
I have 2 files called stuff-egress-filter and stuff-ingress filter. There are also files called something like stuff-egress-F/0
I want to match the first two. I tried the following (I realize there is no filename; I'm piping this from the ls command):
grep stuff-*-filter
It finds nothing. If I...
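grep takes a regular expression, not a shell glob. In "stuff-*-filter" the "-*" means "zero or more hyphens", so it only matches names like "stuff-filter" or "stuff--filter". The regex counterpart of the glob is "stuff-.*-filter", and it is safest to quote it (ls | grep 'stuff-.*-filter'), because an unquoted pattern containing "*" may be expanded by the shell before grep ever sees it. A Python sketch contrasting the two, using the file names from the post and assuming the second one is stuff-ingress-filter:

```python
import fnmatch
import re
from typing import List

NAMES = ["stuff-egress-filter", "stuff-ingress-filter", "stuff-egress-F/0"]


def regex_matches(pattern: str) -> List[str]:
    """Names matched by pattern interpreted as a regular expression (like grep)."""
    return [n for n in NAMES if re.search(pattern, n)]


def glob_matches(pattern: str) -> List[str]:
    """Names matched by pattern interpreted as a shell-style glob."""
    return [n for n in NAMES if fnmatch.fnmatchcase(n, pattern)]


if __name__ == "__main__":
    print(regex_matches(r"stuff-*-filter"))   # [] because '-*' means "zero or more hyphens"
    print(regex_matches(r"stuff-.*-filter"))  # both *-filter names
    print(glob_matches("stuff-*-filter"))     # both *-filter names
```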
Please help:
I want to add a space between the letters and the numbers:
input file:
abcd12345
output file:
abcd 12345
The following sed command does not work:
sed 's/\(+\)\(+\)/\1 \2/' file
Any ideas, please?
Andy
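As it appears in the post, that sed expression can't work: in a basic regular expression a bare "+" is a literal plus sign, so "\(+\)\(+\)" only matches "++". One POSIX BRE spelling of the intended substitution is sed 's/\([[:alpha:]]\{1,\}\)\([0-9]\{1,\}\)/\1 \2/' file (GNU sed also accepts "\+" for "one or more"). The same idea as a Python sketch:

```python
import re


def space_between_letters_and_digits(s: str) -> str:
    """Insert one space between a run of letters and the run of digits that follows it."""
    return re.sub(r"([A-Za-z]+)([0-9]+)", r"\1 \2", s)


if __name__ == "__main__":
    print(space_between_letters_and_digits("abcd12345"))  # abcd 12345
```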
Hi guys,
Does anyone know how to test whether a string is a valid regular expression? I want to include it in a script to make sure the variable holds a valid regexp.
Cheers
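Validity depends on the regex engine, so the practical test is to hand the pattern to the engine you will actually use and see whether it complains. In a shell script, one option is grep -E -- "$re" </dev/null, treating an exit status greater than 1 as an invalid pattern (status 1 only means "no match"). The same idea as a Python sketch, where "valid" means "Python's re module accepts it":

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """True if Python's re module can compile pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    for p in ["^[0-9]+$", "a(b", "[abc"]:
        print(repr(p), is_valid_regex(p))
```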