Assign zero to strings that don't appear in block, store result in AWK array


 
# 1  
Old 10-22-2011

Hi to all,

I have this input:
Code:
<group>                    
<x "2">Group D</x>
<x "3">Group B</x>
<x "1">Group A</x>
</group>                    
<group>                    
<x "1">Group E</x>
<x "0">Group B</x>
<x "1">Group C</x>
</group>                    
<group>                    
<x "3">Group C</x>
<x "2">Group B</x>
<x "7">Group A</x>
</group>

And I would like this output stored in an AWK array.
Code:
2|Group D
3|Group B
1|Group A
0|Group C
0|Group E
1|Group E
0|Group B
1|Group C
0|Group A
0|Group D
3|Group C
2|Group B
7|Group A
0|Group E
0|Group D

The unique groups are:
Code:
Group A
Group B
Group C
Group D
Group E

As you can see, some groups may appear in every block, but sometimes one or more groups appear in only some of the blocks.

For the groups that don't appear in a given block, I need to generate output that assigns them a zero value for that block. In the desired output above, these are the 0|... lines that have no counterpart in the input, for example 0|Group C and 0|Group E after the first block.

I am able to get the output in the format "number|Group X" with the script below, but I don't know how to
add the groups that don't appear in a specific block and assign them zero value.
Code:
awk '/<x "/{A[1 + c++]=gensub(/(.+")([0-9]+)(">)(.+)(<\/.+)/, "\\2|\\4", "g")}
     END{for (i=1;i<=length(A);i++) print A[i]}' groups

I really need this in awk because the array has to be used inside a larger awk program.

Many thanks for your help in advance.
# 2  
Old 10-23-2011
Try this...
Code:
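awk '/<x /{
  # turn <x "N">Name</x> into N|Name (gensub needs GNU awk)
  A=gensub(/.+"([0-9]+)">(.+)<.*/, "\\1|\\2", "g")
  split(A, arr, "|");
  a[++j]=arr[1]"|"arr[2]; u[arr[2]] }   # u collects every group name seen
/<\/g/{a[++j]=-1}                        # -1 marks the end of a block

END{
  for(i=1;i<=j;i++) {
    if(a[i] == -1) {
      # end of block: print a zero for each known group not seen in it
      for(k in u) {
        if(!(k in t)) {print "0|"k}
      } delete t; continue 
    }
    print a[i]
    split(a[i],arr,"|"); t[arr[2]]      # t holds the groups seen in this block
  }
}' input_file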

--ahamed
# 3  
Old 10-23-2011
I'd do it this way:

Code:
awk  '
    BEGIN {
        soup = "Group A,Group B,Group C,Group D,Group E"
        nlist = split( soup, list, "," );
    }

    /^<x / {
        n = substr( $2, 2 ) + 0;
        gsub( "<[^>]*>", "" );  # assumes <x ...>stuff</x> is the ONLY tag on the line!
        group[$0] = n;
        next;
    }

    /^<\/group>/ {   # print out last collection including zeros
        for( i = 1; i <= nlist; i++ )
            printf( "%d|%s\n", group[list[i]], list[i] );
        delete group;   # clear for next go round
        next;
    }
' input-file

It makes some BIG assumptions; if your input file is more complex than you've indicated, it might have issues. Specifically, if there is more than one 'tag' on the <x ... </x> line, it will break. Also, the groups are printed in the order in which they are defined in 'soup', not in the order in which they appear in the input file.
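To make the ordering caveat concrete, here is a minimal sketch of the same fixed-list idea using only POSIX awk features. It assumes every <x ...> line has exactly the shape shown in the sample input; the function name zero_fill_fixed is hypothetical, and split() on the characters <, > and " stands in for the substr/gsub pair above.

```shell
# Hypothetical helper: reads <group> blocks on stdin, prints N|Name lines.
zero_fill_fixed() {
  awk '
    BEGIN { nlist = split("Group A,Group B,Group C,Group D,Group E", list, ",") }

    /^<x / {
        split($0, p, /[<>"]/)      # p[3] = number, p[5] = group name
        group[p[5]] = p[3]
        next
    }

    /^<\/group>/ {                 # end of block: print every listed group
        for (i = 1; i <= nlist; i++)
            printf "%d|%s\n", group[list[i]], list[i]
        split("", group)           # portable way to empty an array
    }
  '
}

# First block of the sample input
printf '%s\n' '<group>' '<x "2">Group D</x>' '<x "3">Group B</x>' \
              '<x "1">Group A</x>' '</group>' | zero_fill_fixed
```

For the first block this prints 1|Group A, 3|Group B, 0|Group C, 2|Group D, 0|Group E, one per line: the list order, not the D, B, A order of the input.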
# 4  
Old 10-23-2011
Hi ahamed and agama,

Thanks for your help, both work great!

But how can I include it in the first part of the main awk code I already have?

I would like the array that holds the output of your code to be indexed numerically, in ascending order from 1 to the last element.

My code looks like this:
Code:
awk 'NR==FNR{
        if($0 ~ /XYZ/){Var1++} #Counting occurrences of "XYZ" and storing in Var1
        if($0 ~ /<x "/){           
            A[1 + c++]=gensub(/(.+")([0-9]+)(">)(.+)(<\/.+)/, "\\2|\\4", "g")  # I would like your output in this array
            
            B[gensub(/pattern/,"how","g")] #Storing desired data in array B
            C[gensub(/pattern/,"how","g")] # #Storing desired data in array C
            D[1 + c++]=gensub(/pattern/, "\\2|\\4", "g") # #Storing desired data in array D
        }
next}
{
        some other code
} 
END{
    for ( i=1;i<=N;i++ ) { #Loop for printing values of 4 arrays after some manipulations
        some code to manipulate 4 arrays created when NR=FNR
    }
}' file1 file2

As you can see in the code, the data stored in array A (the line commented "I would like your output in this array") comes from the gensub() function, and the array is indexed from 1 to the last element.

I would like array A to hold the output of your code instead of the output of gensub(). Is it possible to generate only the data for A that way, so that I can use it later in my code?

Something like:
Code:
awk 'NR==FNR{
        if($0 ~ /XYZ/){Var1++}
        if($0 ~ /<x "/){
            A[1 + c++]="new output" # (new output = output generated by your codes)
            
            B[gensub(/pattern/,"how","g")] ...
            C[gensub(/pattern/,"how","g")] ...
            D[1 + c++]=gensub(/pattern/, "\\2|\\4", "g") ..
        }
.
.
.

Thanks for help so far.

Last edited by Ophiuchus; 10-23-2011 at 03:15 AM..
# 5  
Old 10-23-2011
You could try something like this:
Code:
awk  '
    BEGIN {
        soup = "Group A,Group B,Group C,Group D,Group E"
        nlist = split( soup, list, "," );
    }

    NR != FNR {
        # some other processing for file2
        next;
    }

    # ----------- blocks for processing file 1 ------------------------
    /XYZ/ { Var1++; }   # count lines with XYZ

    /^<x / {
        str = gensub(/(.+")([0-9]+)(">)(.+)(<\/.+)/, "\\2|\\4", "g")
        split( str, a, "|" );
        agroup[a[2]] = a[1];

        # your original code
        B[gensub(/pattern/,"how","g")]      #Storing desired data in array B
        C[gensub(/pattern/,"how","g")]      # #Storing desired data in array C

        # small change to match D with A
        dgroup[a[2]] = gensub(/pattern/, "\\2|\\4", "g") # #Storing desired data in array D

        next;
    }

    /^<\/group>/ {
        for( i = 1; i <= nlist; i++ )       # end of group, it is now safe to fill in D and A
        {
            A[++aidx] = sprintf( "%d|%s", agroup[list[i]], list[i] );
            D[aidx] = dgroup[list[i]];
        }
        delete agroup;
        delete dgroup;
        next;
    }

    END {
        for( i = 1; i <= length( A ); i++ )         # my testing to ensure they align
            printf( "(%s) (%s)\n", A[i], D[i] );
    }
' file1 file2

Notice the change to test NR != FNR first, so that the rules for the first file that follow can be written as separate blocks.
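A minimal sketch of that two-file layout on its own, in case it helps to see it without the rest of the program. The function name two_file_demo and the counter names are made up for illustration. One caveat to keep in mind: the NR != FNR test assumes the first file is not empty.

```shell
# Hypothetical helper: $1 is the first file, $2 the second.
two_file_demo() {
  awk '
    NR != FNR { second++; next }   # any line that is not from the first file
    /XYZ/     { hits++ }           # only reached for lines of the first file
    END       { printf "hits=%d second=%d\n", hits, second }
  ' "$1" "$2"
}
```

Because the first rule ends with next, every rule after it sees only lines from the first file, so those rules no longer need to be wrapped in one big NR==FNR block.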

There was one bug in the sample you posted. You incremented c twice in the same block of code. The result would have been that the array A would have had values stored with odd indexes, and D would have values stored starting at 2 with even indexes. The code above doesn't have this issue, and ensures that the values in array D match the values in array A -- they aren't in the order seen in the input, but the order that matches the list in the BEGIN block.
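The effect of the double increment can be reproduced in isolation. This is only an illustration with made-up values (a1, d1, ...) and a hypothetical function name, not part of the solution:

```shell
# Hypothetical demo: one shared counter incremented twice per input line.
double_increment_demo() {
  awk 'BEGIN {
      for (n = 1; n <= 3; n++) {       # three simulated <x ...> lines
          A[1 + c++] = "a" n           # first increment
          D[1 + c++] = "d" n           # second increment of the same counter
      }
      for (i = 1; i <= 6; i++) {
          av = (i in A) ? A[i] : "-"   # "-" marks an index that was never set
          dv = (i in D) ? D[i] : "-"
          printf "%d A=%s D=%s\n", i, av, dv
      }
  }'
}

double_increment_demo
```

Indexes 1, 3 and 5 end up holding only A values and indexes 2, 4 and 6 only D values, so a loop from 1 to N over both arrays would find gaps in each.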

Hope this gets you closer.
# 6  
Old 10-23-2011
Hi agama,

Thanks for your reply and help. Yes, I now see the bug where c is incremented twice. Thanks.

I've been trying to adapt your code to my main awk code, but the main problem is that the "soup" list is predefined at the beginning. Since the values Group A, Group B, ... Group E, etc. are taken from the same file (groups), the list should be generated from that file first.

I've been trying a modified version of your code, shown below, in which I removed the BEGIN{} block and build the list[] array on the first line of the /^<x / rule:

Code:
awk  '
    /^<x / {
        list[gensub(/(.+">)(.+)(<\/.+$)/,"\\2","g")];asorti(list,A) # Generating list[] array that will contain unique group strings
         n=gensub(/.+ "|">.+/,"","g") # extracting only values (numbers)
        group[gensub(/.+">|<\/.+/,"","g")]=n;
        next;
    }
    /^<\/group>/ {   # print out last collection including zeros
        for( i = 1; i <= length(list); i++ )
            A[++w]=sprintf("%d|%s", group[A[i]], A[i] );
        delete group;   # clear for next go round
        next;
    } END{for (i=1;i<=length(A);i++) print A[i]}' groups

I think I'm not getting the correct output because the code needs a predefined "soup" list.

Many thanks for help so far.
# 7  
Old 10-23-2011
Having a predefined list is key to knowing what is missing, and yes, that is why you're getting odd output.

Using a technique similar to ahamed101's suggestion might help. Since you're not processing the contents of A and D until the end, you could save everything using doubly indexed agroup and dgroup arrays while building your list. The problem with this, and why I didn't suggest it, is that if there is a group X that is missing from all blocks of the input file, it will not be accounted for in the output.
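As a rough illustration of what "doubly indexed" means in awk: a subscript such as val[g, name] is stored under a single key built from both parts, and (g, name) in val tests for it without creating the element. The sketch below uses made-up data and a hypothetical function name. It also shows that the membership test can tell a group that is missing from a block apart from one that is present with an explicit 0, like Group B in the second block of the sample input.

```shell
# Hypothetical demo of two-part subscripts and the (i, j) in arr test.
subsep_demo() {
  awk 'BEGIN {
      val[0, "Group A"] = 1      # block 0 contains Group A
      val[1, "Group B"] = 0      # block 1 contains Group B with an explicit 0
      nlist = split("Group A,Group B", list, ",")
      for (g = 0; g < 2; g++)
          for (i = 1; i <= nlist; i++) {
              state = ((g, list[i]) in val) ? "seen" : "missing"
              printf "%d %s %d %s\n", g, list[i], val[g, list[i]], state
          }
  }'
}

subsep_demo
```

Both Group B in block 0 and Group B in block 1 print the value 0, but only the first is reported as missing.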

I'll think about it some more.

---------- Post updated at 19:59 ---------- Previous update was at 19:35 ----------

Ok, this collects the various group names as it reads through the file and builds the A and D arrays at the end. If a group name is missing completely it will not be accounted for:

Code:
awk  '
    NR != FNR {
        # some other processing for file2
        next;
    }

    # ----------- blocks for processing file 1 ------------------------
    /^<x / {
        str = gensub(/(.+")([0-9]+)(">)(.+)(<\/.+)/, "\\2|\\4", "g")
        split( str, a, "|" );

        if( !seen[a[2]]++ )              # new group name, add it to the list
            list[++nlist] = a[2];

        agroup[group+0,a[2]] = a[1];   # changed to track across whole file

        # your original code
        B[gensub(/pattern/,"how","g")]      #Storing desired data in array B
        C[gensub(/pattern/,"how","g")]      # #Storing desired data in array C

        # small change to match D with A
        dgroup[group+0,a[2]] = gensub(/pattern/, "\\2|\\4", "g") # changed to track across whole file

        next;
    }

    /^<\/group>/ {
        group++;
        next;
    }

    END {
        asort( list );
        for( g = 0; g < group; g++ )            # build A and D with groups seen
        {
            for( i = 1; i <= nlist; i++ )       
            {
                A[++aidx] = sprintf( "%d|%s", agroup[g,list[i]], list[i] );
                D[aidx] = dgroup[g,list[i]];
            }
        }

        # whatever end processing on A and D can be done here
        for( i = 1; i <= length( A ); i++ )         # my testing to ensure they align
            printf( "(%s) (%s)\n", A[i], D[i] );
    }
' file1 file2
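If gawk is not available, the same approach can be sketched with POSIX awk only. This is an untuned sketch under a few assumptions: the <x ...> lines look exactly like the sample, only array A is built (B, C and D are left out), and a small insertion sort stands in for asort(). The function name zero_fill_discovered is hypothetical.

```shell
# Hypothetical helper: reads file arguments or stdin, prints the contents of A.
zero_fill_discovered() {
  awk '
    /^<x / {
        split($0, p, /[<>"]/)              # p[3] = number, p[5] = group name
        if (!(p[5] in seen)) {             # first time this name appears
            seen[p[5]] = 1
            list[++nlist] = p[5]
        }
        val[nblocks + 0, p[5]] = p[3]      # value per (block, group)
        next
    }

    /^<\/group>/ { nblocks++ }

    END {
        # insertion sort on list[1..nlist]; stands in for gawk asort()
        for (i = 2; i <= nlist; i++) {
            name = list[i]
            for (j = i - 1; j >= 1 && list[j] > name; j--)
                list[j + 1] = list[j]
            list[j + 1] = name
        }
        for (g = 0; g < nblocks; g++)          # build A, zero-filling gaps
            for (i = 1; i <= nlist; i++)
                A[++aidx] = sprintf("%d|%s", val[g, list[i]], list[i])
        for (i = 1; i <= aidx; i++)            # A is indexed 1..aidx
            print A[i]
    }
  ' "$@"
}

# Two small blocks as a quick check
printf '%s\n' '<group>' '<x "2">Group D</x>' '<x "1">Group A</x>' '</group>' \
              '<group>' '<x "0">Group A</x>' '<x "4">Group E</x>' '</group>' |
  zero_fill_discovered
```

With that input the output is 1|Group A, 2|Group D, 0|Group E for the first block and 0|Group A, 0|Group D, 4|Group E for the second. As with the gawk version, a group that never appears anywhere in the file is not accounted for.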
