While doing some further testing, I came up with a few questions. If you had the following input file:
what, if any, valid IP addresses would you like your script to report? I'm guessing that none should be found here, but one of the scripts you posted early in this thread will come up with something like the following:
The current script would not match all of those, and has not done so since I moved to an Awk-only solution. I posted about this in post number 17; see the paragraph "The code has failed to handle one simple limitation." Post number 17 is here: https://www.unix.com/showpost.php?p=3...9&postcount=17
Which more greedily matches number dot sequences and leaves it to the conditional in the END section to determine actual IP validity.
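To illustrate why greedy matching followed by a separate validity check can be useful, here is a small Python sketch. It is not the Awk from this thread: both patterns and both function names are assumptions made purely for illustration. A strict four-group pattern will pick a four-group prefix out of a longer dotted run, whereas a greedy pattern captures the whole run so that a later check can reject it.

```python
import re

# Assumption: a strict pattern that accepts exactly four dotted groups.
STRICT = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")

# Assumption: a greedy pattern that takes the whole run of digits and dots.
GREEDY = re.compile(r"[0-9]+(?:\.[0-9]+)+")


def strict_candidates(text: str) -> list[str]:
    """Return every substring that looks like exactly four dotted groups."""
    return STRICT.findall(text)


def greedy_candidates(text: str) -> list[str]:
    """Return every complete run of dot-separated numbers."""
    return GREEDY.findall(text)


sample = "version 1.2.3.4.5 released"
print(strict_candidates(sample))  # ['1.2.3.4']
print(greedy_candidates(sample))  # ['1.2.3.4.5']
```

With the greedy pattern the five-group run reaches the validity check intact and can be discarded there, rather than being truncated into something that looks like an address.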
Putting your input file (above) into the script should have given just this IP address: 11.22.33.44 - that being the only valid IP address in your data (as far as my intentions are concerned).
What it actually gave was this:
Well done, and thanks: you've spotted a bug. What happened was this:
The regex encInFwdSlashesNotIP = "[/]" ipLikeSequence "[/]"; replaced the IP surrounded by forward slashes with an x, leaving 2 valid IP addresses on either side of the x - both of which should have been removed because, in the input line, one begins with a forward slash while the other ends with one.
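The failure can be reproduced in a small Python sketch. This is a reconstruction under assumptions, not the Awk from this thread: the IP_LIKE pattern stands in for ipLikeSequence, the sample line is invented, and old_pass and new_pass are hypothetical names. The sketch also shows what changes when the slash-adjacent substitutions put a '/' back instead of an 'x', so that neighbouring sequences stay slash-adjacent.

```python
import re

# Assumption: a stand-in for the thread's ipLikeSequence pattern.
IP_LIKE = r"[0-9]+(?:\.[0-9]+)+"


def old_pass(line: str) -> str:
    """Enclosed-in-slashes substitution first, using 'x' as the replacement."""
    line = re.sub("/" + IP_LIKE + "/", "x", line)  # consumes both slashes
    line = re.sub("/" + IP_LIKE, "x", line)        # begins with a slash
    line = re.sub(IP_LIKE + "/", "x", line)        # ends with a slash
    return line


def new_pass(line: str) -> str:
    """Only the begins/ends substitutions, using '/' as the replacement."""
    line = re.sub("/" + IP_LIKE, "/", line)  # begins with a slash
    line = re.sub(IP_LIKE + "/", "/", line)  # ends with a slash
    return line


# Hypothetical input line: three slash-joined sequences plus one free-standing IP.
sample = "a 1.2.3.4/5.6.7.8/9.10.11.12 b 11.22.33.44"

print(re.findall(IP_LIKE, old_pass(sample)))  # ['1.2.3.4', '9.10.11.12', '11.22.33.44']
print(re.findall(IP_LIKE, new_pass(sample)))  # ['11.22.33.44']
```

In old_pass the middle sequence takes both slashes with it, so its neighbours no longer touch a slash and survive. In new_pass each substitution leaves a slash behind, so every slash-adjacent sequence is still caught.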
I've modified the code. I had already introduced the self-explanatory beginsWithFwdSlashNotIP and endsWithFwdSlashNotIP regexes back in post number 20 to handle version numbers in URLs (which look like IPs) more robustly. I've now removed enclosedInFwdSlashesNotIP, realizing that it is redundant (which also makes the thread title redundant), and solved the issue by using '/' instead of 'x' as the replacement character in the 'begins with fwd slash' and 'ends with fwd slash' gsub() calls. So now I have:
and with your input file I now get just 11.22.33.44 which is what I want. The problem line now has all 3 IP addresses which begin or end with a forward slash removed:
Quote:
Originally Posted by Don Cragun
I'm looking at a different way to evaluate possible IP addresses, but I need to know what you want to be required to appear before and after a valid IP address. Am I correct in assuming that a valid IP address should appear at the start of a line or be preceded by a white-space character, be followed by a white-space character or appear at the end of a line, and contain four 1 to 3 digit numbers separated by single occurrences of a period where the values of the numbers are 0 <= number <= 255?
Note that if my assumption is correct, an IP address surrounded by alphabetic or punctuation characters (in addition to slashes) should also be rejected. If my assumption is correct, should an exception be made allowing commas (or comma followed by space) to separate IP addresses?
No, your assumptions are not correct. Typical input is HTML, though sometimes it is just plain text, and sometimes it is a file consisting of nothing at all except a single IP address. The script as a whole retrieves a computer's WAN IP address from behind a router by downloading web pages which display a user's IP address. The script randomly downloads 2 or more such pages to use as verification. Do you want a copy of the whole thing to have a look at?
You are however correct in thinking I want (quote Don): "four 1 to 3 digit numbers separated by single occurrences of a period where the values of the numbers are 0 <= number <= 255". [I know that that notation means the inclusive range of 0..255 but I've never understood why the accepted notation is not "0 >= number <= 255" since what is wanted is greater than or equal to 0 and not less than or equal to 0 which is how it reads to me.]
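The rule quoted above (four groups of 1 to 3 digits, separated by single periods, each in the range 0 to 255) can be written down as a short Python sketch. This is an illustration only, not the conditional from the Awk script; the function name is_valid_ipv4 and the OCTET pattern are assumptions.

```python
import re

# Assumption: one group is 1 to 3 ASCII digits.
OCTET = re.compile(r"[0-9]{1,3}")


def is_valid_ipv4(candidate: str) -> bool:
    """Check a dotted candidate against the four-groups, 0..255 rule."""
    parts = candidate.split(".")
    if len(parts) != 4:
        return False  # too few or too many groups
    for part in parts:
        if OCTET.fullmatch(part) is None:
            return False  # empty group (double period) or more than 3 digits
        if int(part) > 255:
            return False  # out of range
    return True


print(is_valid_ipv4("11.22.33.44"))  # True
print(is_valid_ipv4("256.1.1.1"))    # False
print(is_valid_ipv4("1.2.3.4.5"))    # False
```

Note that this sketch accepts leading zeros (for example 001.2.3.4), since the stated rule only limits the digit count and the numeric range.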
All of the below are real-world examples of valid IPs that should be accepted (many of these are sections of a line rather than the whole line, since the lines are often quite long):
Quote:
Originally Posted by Don Cragun
Are we having fun yet?
Yes Sir.
The current Awk code with some helpful debugging print statements is:
Thanks again,