Get values from 2 files - Complex "for loop and if" awk problem


 
# 1  
Old 10-21-2011

Hi everyone,

I've spent all day trying and changing the code below. Maybe an awk expert could help me fix the for loop I came up with;
I think I'm very close to the correct output.

file1 is:
Code:
<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes</quantity>
      <quantity f="8">Apples</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes</quantity>
      <quantity f="12">Apples</quantity>
    </box>
  </boxes>

file2 is:
Code:
<some text...>
<some text...>        
        <f><v>Begin</v></f>
        <f><v>Prod No</v></f>
        <f><v>Serial</v></f>
        <f><v>Grapes and Apples</v></f>
        <f><v>Begin 1</v></f>
        <f><v>Box MT. 53</v></f>
        <f><v>XMT. 5563</v></f>
        <f><v>Begin 2</v></f>
        <f><v>Box MJ 62</v></f>
        <f><v>JJKD. 772</v></f>
        <f><v>Apples</v></f>
        <f><v>Grapes</v></f>
</abc>

My code so far is:
Code:
# Arr1: stores the info for the first block; ignore this array here.
# Arr3: first line of blocks 2 and 3 (the unique fruit names in file1,
#       Apples and Grapes). They appear in alphabetical order in file2.
# Arr5: the values for each block, i.e. the f="..." numbers in file1.

awk 'BEGIN{  B = 66 }
    FNR==NR{
        if ($0 ~ "box No.=")
            {Arr1[FNR]=gensub(/^[^"]+"|".+$/,"","g");asorti(Arr1,Arr2)}
        else if ( $0 ~ "quantity f=" )
            {Arr3[gensub(/.+">|<.+$/,"","g")];asorti(Arr3,Arr4) 
             Arr5[FNR]=gensub(/^[^"]+"|".+$/,"","g");asorti(Arr5,Arr6) 
             }
        next
    }
{
###############  for loop to generate blocks #####################
    for ( j=2;j<=length(Arr3)+1;j++ ) {  # Loop to generate blocks 2 and 3, which is why j starts at 2.
        if($0 ~ ">"Arr4[j-1]"<") {
            {printf("<begin \"%d\" >\n\t<b ln=\"A%d\" t=\"s\"><v>%d</v></b>\n", j,j,FNR);} # Print the first line of each block
            for ( k=(j-1);k<=(j-2)+length(Arr5);k=k+length(Arr1) ) { # Loop to print the rest of the values for each fruit
                if ( k < length(Arr5)/length(Arr1) ) {
                    printf("\t<b ln=\"%c%d\"><v>%d</v></b>\n", B, j, Arr6[k]); # Print the value
                    B++
                }
                else {
                    printf("\t<b ln=\"%c%d\"><v>%d</v></b>\n</begin>", B, j, Arr6[k]); # Print the last line of each block
                    B=66  # 66 is the decimal ASCII code of the letter B.
                }            
            }
        }
    }
}' file1 file2

The for loop is meant to generate blocks 2, 3...N of the output (in the sample, only blocks 2 and 3). Blocks 2 and 3 hold the info for the
unique fruits in file1 and their respective values. Block 2 is for Apples and contains its values from file1 (8 and 12); Block 3 is for
Grapes and contains its values from file1 (4 and 7).
- In alphabetical order, Apples comes before Grapes, so block 2 is for Apples and block 3 is for Grapes.
- Within each fruit block, the values must appear in the same order as in file1, e.g. for Apples 8 and 12, not 12 and 8.
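
The second rule (values stay in file1 order) could be sketched as follows in plain POSIX awk. This is only an illustration, not the poster's code: the field separator `[<>"]`, the scratch file name `file1.sample`, and the helper name `group_values` are all assumptions made for the sketch. Sorting the fruit names is deliberately left out here.

```shell
# Sample data: file1 as posted above, written to a scratch file.
cat > file1.sample <<'EOF'
<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes</quantity>
      <quantity f="8">Apples</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes</quantity>
      <quantity f="12">Apples</quantity>
    </box>
  </boxes>
EOF

# group_values is a hypothetical helper name.
group_values() {
    awk -F'[<>"]' '
    $2 == "quantity f=" {                 # $3 is the f value, $5 the fruit name
        vals[$5] = vals[$5] " " $3        # append, so file order is preserved
        if (!seen[$5]++) name[++n] = $5   # remember each fruit once
    }
    END {
        for (i = 1; i <= n; i++)          # numeric loop keeps first-seen order
            print name[i] ":" vals[name[i]]
    }' "$1"
}

group_values file1.sample
```

With the sample data this prints `Grapes: 4 7` and then `Apples: 8 12`. Appending to one string per fruit avoids any index arithmetic over a flat array of values.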

I'm getting this output:
Code:
<begin "2" >
        <b ln="A2" t="s"><v>13</v></b>
        <b ln="B2"><v>3</v></b>
        <b ln="C2"><v>7</v></b>
</begin><begin "3" >
        <b ln="A3" t="s"><v>14</v></b>
        <b ln="B3"><v>4</v></b>
</begin>        <b ln="B3"><v>8</v></b>
</begin>

and the correct output should be:
Code:
<begin ln="2" >
    <c ln="A2" t="s"><v>13</v></b>
    <c ln="B2"><v>8</v></b>
    <c ln="C2"><v>12</v></b>
</begin>
<begin ln="3" >
    <c ln="A3" t="s"><v>14</v></b>
    <c ln="B3"><v>4</v></b>
    <c ln="C3"><v>7</v></b>
</begin>

The first line of each block holds the line number from file2, e.g. Apples appears on line 13 of file2 and Grapes on line 14.
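
The line-number lookup described here might be sketched like this. It relies on awk's `FNR`, which counts lines within the current file, while `NR` keeps counting across files, so `NR == FNR` is true only while the first file is being read. The scratch file names and the helper name `line_numbers` are assumptions made for the sketch.

```shell
# Sample data copied from the post above.
cat > file1.sample <<'EOF'
<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes</quantity>
      <quantity f="8">Apples</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes</quantity>
      <quantity f="12">Apples</quantity>
    </box>
  </boxes>
EOF
cat > file2.sample <<'EOF'
<some text...>
<some text...>
        <f><v>Begin</v></f>
        <f><v>Prod No</v></f>
        <f><v>Serial</v></f>
        <f><v>Grapes and Apples</v></f>
        <f><v>Begin 1</v></f>
        <f><v>Box MT. 53</v></f>
        <f><v>XMT. 5563</v></f>
        <f><v>Begin 2</v></f>
        <f><v>Box MJ 62</v></f>
        <f><v>JJKD. 772</v></f>
        <f><v>Apples</v></f>
        <f><v>Grapes</v></f>
</abc>
EOF

# line_numbers is a hypothetical helper name.
line_numbers() {
    awk -F'[<>"]' '
    NR == FNR {                              # first file: collect the fruit names
        if ($2 == "quantity f=") want[$5] = 1
        next
    }
    $4 == "v" && ($5 in want) {              # second file: <f><v>NAME</v></f>
        print $5, FNR                        # FNR restarts at 1 for each file
    }' "$1" "$2"
}

line_numbers file1.sample file2.sample
```

For the sample files this prints `Apples 13` and `Grapes 14`. The line `Grapes and Apples` is skipped because the whole tag content must match a fruit name exactly.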

Maybe someone could fix my for loop. I'm stuck on printing the values for each fruit block in the correct order.

PS: I have another for loop that generates the first block (not shown), so it would be great if the solution could be added to the first loop.

Many thanks in advance.

Last edited by Ophiuchus; 10-21-2011 at 06:07 AM..
# 2  
Old 10-21-2011
Can you please explain once more where file2 fits in here?

--ahamed
# 3  
Old 10-21-2011
Quote:
Originally Posted by ahamed101
Can you please explain once more where file2 fits in here?

--ahamed
Hi ahamed, thanks for the reply.

file2 is needed to find the line numbers of Apples and Grapes within file2 and put them in the first line of each block.

To see this, check the line number of Apples within file2: it is 13,
and for Grapes it is 14. In the output, 13 then appears in the first line of block 2 and 14 in the first line of block 3.

Thanks for any help.
# 4  
Old 10-21-2011
See if this works for you...

Code:
awk 'NR==FNR {
  gsub(/"|>|</," ");
  if($3 ~ /^[0-9]/){ a[$4]=a[$4]" "$3 }
  next
}
{
  gsub(/"|>|</," ");
  x++;  if(NF==5 && $3 in a){ b[$3,1]=x; }
}
END{ j=2;
  for(i in a) {
    al=65; split(a[i],arr," ")
    print "<begin ln=\""j"\" >"
    printf("\t<c ln=\"%c%d\" t=\"s\"><v>"b[i,1]"</v></b>\n",al++,j)
    for(v in arr) {
      printf("\t<c ln=\"%c%d\"><v>"arr[v]"</v></b>\n",al++,j)
    } print "</begin>";j++
  }
}' file1 file2

Sorry, I didn't have the patience to go through your code... :)

--ahamed
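
The field positions that the script above relies on (`$3`, `$4`, `NF==5`) can be checked by running the same `gsub` on one sample line from each file. This is an illustrative sketch only; the helper name `show_fields` is hypothetical.

```shell
# show_fields is a hypothetical helper: it applies the same gsub as the
# script above and prints every resulting field as "number:value".
show_fields() {
    printf '%s\n' "$1" | awk '{
        gsub(/"|>|</, " ")                   # quotes and angle brackets become spaces
        for (i = 1; i <= NF; i++)
            printf "%s%d:%s", (i > 1 ? " " : ""), i, $i
        print ""
    }'
}

show_fields '      <quantity f="4">Grapes</quantity>'
show_fields '        <f><v>Apples</v></f>'
show_fields '      <quantity f="8">Apples B</quantity>'
```

The first line yields `1:quantity 2:f= 3:4 4:Grapes 5:/quantity`, so the value is field 3 and the name is field 4. The second yields five fields with the name in field 3. The third yields six fields, with `Apples` and `B` split apart, which shows that this approach assumes names without spaces.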
# 5  
Old 10-21-2011
Hi ahamed,

Thanks for your help. It's much appreciated.

The code almost works; the only issue is that it prints the blocks in a different order. The solution would be to sort array "a"
alphabetically in the first part of the awk code (when NR==FNR). I've been trying to use the same logic to sort it with asorti(), but
it doesn't work (the line number in the first line of each block is not printed if I include asorti in the code).

To test that, I've slightly modified file1 and file2 as below:
*(If you test with the new file1 and new file2, you'll see that 14 appears in the first block and 13 in the second block; they should be in ascending order)

file1:
Code:
<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes A</quantity>
      <quantity f="8">Apples B</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes A</quantity>
      <quantity f="12">Apples B</quantity>
    </box>
  </boxes>

file2:
Code:
<some text...>
<some text...>        
        <f><v>Begin</v></f>
        <f><v>Prod No</v></f>
        <f><v>Serial</v></f>
        <f><v>Grapes and Apples</v></f>
        <f><v>Begin 1</v></f>
        <f><v>Box MT. 53</v></f>
        <f><v>XMT. 5563</v></f>
        <f><v>Begin 2</v></f>
        <f><v>Box MJ 62</v></f>
        <f><v>JJKD. 772</v></f>
        <f><v>Apples B</v></f>
        <f><v>Grapes A</v></f>
</abc>

The code I have so far is:
Code:
# I've added to or slightly modified your code so that it can handle strings with spaces.
# (E.g. instead of "Apples" and "Grapes" the string could be "Apples XXX YYY" or "Grapes abc" etc.)

awk 'NR==FNR {
  $0=gensub(/(.+=")([0-9]+)(">)(.+)(<\/.+)/, "\\2 \\4", "g");
  if($1 ~ /^[0-9]/){ t=$1; gsub(/^[0-9]+[ ]+/,""); a[$0]=a[$0]" "t }
  next
}
{
  gsub(/.+<.>|<\/.+$/,"")
  x++;  if($0 in a){ b[$0,1]=x; }
}
END{ j=2;
  for(i in a) {
    al=65; split(a[i],arr," ")
    print "<begin ln=\""j"\" >"
    printf("\t<c ln=\"%c%d\" t=\"s\"><v>"b[i,1]"</v></b>\n",al++,j)
    for(v in arr) {
      printf("\t<c ln=\"%c%d\"><v>"arr[v]"</v></b>\n",al++,j)
    } print "</begin>";j++
  }
}' file1 file2

Many thanks for your help so far.

Greetings
# 6  
Old 10-22-2011
Try this...

Code:
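awk 'NR==FNR {
  # file1: reduce each quantity line to "VALUE NAME" (NAME may contain spaces)
  $0=gensub(/(.+=")([0-9]+)(">)(.+)(<\/.+)/, "\\2 \\4", "g");
  if($1 ~ /^[0-9]/){ t=$1; gsub(/^[0-9]+[ ]+/,""); a[$0]=a[$0]" "t; }
  next
}
{
  # file2: strip the tags, then record the line number of each known name
  gsub(/.+<.>|<\/.+$/,"")
  x++;  if($0 in a){ b[$0,1]=x; }
}
END{ j=2;
  n=asorti(a,d)                      # d[1..n] holds the names in sorted order
  for(i=1;i<=n;i++) {                # numeric loop: the order of "for (i in d)" is not guaranteed
    al=65; m=split(a[d[i]],arr," ")
    print "<begin ln=\""j"\" >"
    printf("\t<c ln=\"%c%d\" t=\"s\"><v>%s</v></b>\n",al++,j,b[d[i],1])
    for(v=1;v<=m;v++) {              # keeps the values in file1 order
      printf("\t<c ln=\"%c%d\"><v>%s</v></b>\n",al++,j,arr[v])
    } print "</begin>";j++
  }
}' file1 file2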

--ahamed
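
A note on the sorting step. In gawk, `asorti(a)` called with a single argument replaces the contents of `a` with its own sorted indices, so the value lists stored under the fruit names are discarded. That may be why the earlier sorting attempt lost the line numbers, but the failing code is not shown, so this is an assumption. The two-argument form used in the reply above writes the sorted names to a second array and leaves `a` intact. A minimal sketch, with a hypothetical helper name `sorted_demo` and array contents that mirror the sample data:

```shell
# sorted_demo is a hypothetical helper; the array contents mirror the sample data.
sorted_demo() {
    gawk 'BEGIN {
        a["Grapes A"] = "4 7"
        a["Apples B"] = "8 12"
        n = asorti(a, d)               # d[1..n] = sorted keys; a keeps its values
        for (i = 1; i <= n; i++)       # numeric loop; the order of "for (i in d)" is unspecified
            print d[i] " -> " a[d[i]]
    }'
}

# asorti is a gawk extension, so only run the demo where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    sorted_demo
else
    echo "gawk not found; skipping demo"
fi
```

Where gawk is available this prints `Apples B -> 8 12` followed by `Grapes A -> 4 7`. Looping from 1 to the count returned by `asorti` is what makes the sorted order reliable.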
# 7  
Old 10-22-2011
Great, ahamed! It works.

I had thought of sorting it before, but for some reason it wasn't working.

Now that part works just fine.

I really appreciate all your help.

Best regards.