Top Forums Shell Programming and Scripting Grep regex to ignore sequence only if surrounded by fwd-slashes Post 302878921 by Don Cragun on Monday 9th of December 2013 02:22:02 PM
12-09-2013
Quote:
Originally Posted by gencon
… … …
What I don't understand is why both ipUnique[ip] = ip; and ipUnique[ip]; appear to function equivalently BECAUSE please note that the bit I've made bold in what you wrote below is not correct...

Below I am reposting Don's code from the end of his most recent post, which is here: https://www.unix.com/showpost.php?p=3...8&postcount=15

In his code he uses this line ipUnique[ip] and not ipUnique[ip] = ip; but then in the END section he is able to access the values of the array (which 'should be' the null string but aren't) as if he had used ipUnique[ip] = ip;. The relevant bits are highlighted in red in the code below.

Code:
#!/bin/bash
tempFileName=${1:-file}

    awkExtractIPAddresses='
    BEGIN {
        ipSequence = "[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+"
        digitSequenceTooLongNotIP = "[0-9][0-9][0-9][0-9]+"
        encInFwdSlashesNotIP = "[/]" ipSequence "[/]"
        versioningNotIP = "[Vv]([Ee][Rr]([Ss][Ii][Oo][Nn])?)?[ .]*" ipSequence
    }
    {
        line = $0
        gsub(digitSequenceTooLongNotIP, "x", line)
        gsub(encInFwdSlashesNotIP, "x", line)
        gsub(versioningNotIP, "x", line)
        while (match(line, ipSequence)) {
            ip = substr(line, RSTART, RLENGTH)
            ipUnique[ip]
            line = substr(line, RSTART + RLENGTH + 1)
        }
    }
    END {
        ipRangeMin = 0
        ipRangeMax = 255
        ipNumSegments = 4
        ipDelimiter = "."
        for (ip in ipUnique) {
            numSegments = split(ip, ipSegments, ipDelimiter)
            if (numSegments == ipNumSegments &&
                ipSegments[1] >= ipRangeMin && ipSegments[1] <= ipRangeMax &&
                ipSegments[2] >= ipRangeMin && ipSegments[2] <= ipRangeMax &&
                ipSegments[3] >= ipRangeMin && ipSegments[3] <= ipRangeMax &&
                ipSegments[4] >= ipRangeMin && ipSegments[4] <= ipRangeMax) {
                    print ip
            }
        }
    }'

    ipAddressMatches=$(awk "$awkExtractIPAddresses" "$tempFileName")

printf "%s\n" $ipAddressMatches

Do you need convincing? I certainly did !!

… … …

So I repeat myself (at least in essence): ipUnique[ip] = ip; and ipUnique[ip]; function equivalently in the code above. I do not understand why ipUnique[ip]; works at all. As I said in my reply to Don, my best guess is that it has something to do with stack manipulation because, as you pointed out and the manual clearly says, when an array is referenced (with no assignment) the null string is assigned to that array element's value.

Here's hoping the Don Craguneleone will get back into the action; if ever I needed The Godfather, it's now. Cue a (somewhat slimmer) Marlon Brandoesque figure in the heavily shaded study of his mansion, with a blinking cursor whizzing across the line like a speeding bullet and wedding guests waiting patiently with their own coding problems.

All the best, thanks for taking the time to read this,

Gencon

… … …
I apologize for taking so long to get back to you. But when I have a choice between spending some time with the grandkids or evaluating an awk script, the grandkids are going to win every time.

There is no stack manipulation going on… I am not referencing the value of any ipUnique[] array elements in the END clause.

I'm working on a much lengthier response to message #17 in this thread, but I may not be ready to post it for a couple of days (while I get caught up on other things). But this point seems to be bothering you and (I hope) will be easy to explain. As you have said, the command ipUnique[ip] creates an element in the array ipUnique with index ip and assigns a null value to it. But the command
Code:
for (ip in ipUnique)

never looks at the value assigned to any element in the array; it only looks at the indices of the elements in the array. Perhaps a simpler example will help:
Code:
awk '
BEGIN {
    ipUnique["index1"] = 1
    ipUnique["index 200"] = 2
}
END {
    for (ip in ipUnique)
        printf("ipUnique index: %s, ipUnique[index] value: %s\n",
            ip, ipUnique[ip]);
}' /dev/null

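which produces the output (the order in which for (ip in ipUnique) visits the indices is unspecified, so the two lines may appear in either order):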
Code:
ipUnique index: index 200, ipUnique[index] value: 2
ipUnique index: index1, ipUnique[index] value: 1

Your full script (and mine) never use ipUnique[ip] (which is the value of an array element) in the END clause; they only reference ip (which is the index of an array element).
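To make that concrete, here is a minimal sketch (the array name seen, the function name demo_bare_reference, and the sample addresses are made up for illustration; they are not from the scripts above). It shows that a bare reference creates an element whose value is the empty string, that referencing the same index again does not create a second element, and that the in operator tests for an index without creating one:

```shell
#!/bin/sh
# Hypothetical demo: a bare array reference in awk creates the element.
demo_bare_reference() {
    awk '
    BEGIN {
        seen["10.0.0.1"]          # bare reference: element is created, value is empty
        seen["10.0.0.2"] = "x"    # explicit assignment
        seen["10.0.0.1"]          # same index again: no duplicate element
        n = 0
        for (ip in seen)          # iterates over indices, never looks at values
            n++
        printf("elements: %d\n", n)
        printf("value of 10.0.0.1 is empty: %s\n", (seen["10.0.0.1"] == "") ? "yes" : "no")
        printf("10.0.0.3 present: %s\n", ("10.0.0.3" in seen) ? "yes" : "no")
    }'
}
demo_bare_reference
```

Because only the indices matter here, the array behaves like a set of unique strings, which is all the IP extraction script needs.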

We would need to use:
Code:
        ipUnique[ip] = ip;

instead of:
Code:
        ipUnique[ip];

if we used:
Code:
        numSegments = split(ipUnique[ip], ipSegments, ipDelimiter);

instead of:
Code:
        numSegments = split(ip, ipSegments, ipDelimiter);

Do you understand now why we don't need to waste time or space assigning the index of each array element as the value of the array element as well instead of just using the index itself?
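If it helps, the difference between the two split() calls can be seen in a small sketch (the function name split_demo, the scratch arrays a and b, and the sample address are assumptions for illustration only). With a bare ipUnique[ip] reference, splitting the index gives the four segments, while splitting the value gives none because the value is the empty string:

```shell
#!/bin/sh
# Hypothetical demo: split() on the index versus on the (empty) value.
split_demo() {
    awk '
    BEGIN {
        ipUnique["192.168.1.10"]      # bare reference, value is the empty string
        for (ip in ipUnique) {
            fromIndex = split(ip, a, ".")            # splits the index itself
            fromValue = split(ipUnique[ip], b, ".")  # splits the empty value
            printf("split(ip): %d segments, split(ipUnique[ip]): %d segments\n",
                fromIndex, fromValue)
        }
    }'
}
split_demo
```

With ipUnique[ip] = ip; in place of the bare reference, both calls would report the same number of segments.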

Last edited by Don Cragun; 12-10-2013 at 04:57 AM.. Reason: Remove extraneous end code tag.