Trying to AWK beyond my level


 
# 1  
Old 02-09-2010

Hey everyone,

So I have a task that I want to complete with awk (+ find, or something similar), but can't quite achieve it by myself...

I have 60 GB of files that I want to modify. They each consist of 2 columns of numbers, with up to 50,000 lines in a file.
e.g.
Code:
   1.607743       5.928237    
   1.578281       5.941602    
   1.480728       5.867404    
   1.592771       5.980338   
    ....               ....

The files are in subdirectories one layer down from the parent, and look like e.g.
Code:
322_b3.447/resu_1_1_0
322_b3.447/resu_2_1_0

I want to reduce the file sizes by averaging the numbers in the columns in groups of 10, and creating new files with names e.g. smaller/322_b3.447/resu_1_1_0

I've tried to make a do loop with general structure:
Code:
for file in $(find -type f -name 'resu_*')
do 
   some command to specify "$newfile"
   awk '
   {s1+=$1;s2+=$2}
   !(NR%10){
          {
       print (s1/10,s2/10) >  "$newfile"
     }
     s1=0; s2=0
   }' $file
done

However, I have trouble specifying the name of the new file because 'find' returns for $file names like ./resu_2_1_0, and I don't know how to remove the "./" and include a different directory there.

Also, my awk loop only works when the number of lines per file is exactly a multiple of 10. Otherwise the output for the remaining, say, 7 lines, will still be divided by 10 and hence be too small.

I would be extremely grateful if anyone can help me with this - I'm out of my depth!

# 2  
Old 02-09-2010
You could try something like this:
Code:
find . -type f -name 'resu_*' |
  while IFS= read -r; do
    _newd=smaller/${REPLY%/*}
    _newn=smaller/$REPLY
    [ -d "$_newd" ] || mkdir -p "$_newd"  
    awk > "$_newn" 'END { 
      if (f) print f/10, s/10 
      }
    !(NR%10) { 
      print f/10, s/10  
      f = s = 0
      }
    { f += $1; s += $2 }' "$REPLY"
  done

You should add error handling (for example, for a failed mkdir or an unreadable input file).
You could also try to improve the script using xargs in order to reduce the number of calls made to awk (in this case the awk code should be modified also).
On Solaris you should use gawk, nawk or /usr/xpg4/bin/awk.
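One way the "fewer awk calls" idea could look is sketched below. It uses find's `-exec ... {} +`, which batches file names much as xargs does, so a single awk process handles many files. The function name `shrink_all` and the awk helper `emit` are made up for illustration, and the sketch assumes it is run from the parent directory with all data files exactly one level down. awk cannot create directories, so they are made first.

```shell
# shrink_all: hypothetical wrapper; run it from the parent directory.
shrink_all() {
    # Create the output directories up front, skipping smaller/ itself.
    for d in */; do
        case $d in smaller/) continue ;; esac
        [ -d "$d" ] || continue
        mkdir -p "smaller/$d"
    done

    # One awk process per batch of files instead of one per file.
    find . -type f -name 'resu_*' ! -path './smaller/*' -exec awk '
        function emit() {            # write the mean of the current group
            if (n) print s1 / n, s2 / n > out
            s1 = s2 = n = 0
        }
        FNR == 1 {                   # first line of a new input file
            emit()                   # finish the previous file, if any
            if (out != "") close(out)
            out = "smaller/" substr(FILENAME, 3)   # drop the leading "./"
        }
        { s1 += $1; s2 += $2; n++ }  # accumulate before testing
        n == 10 { emit() }
        END { emit() }               # leftover lines of the last file
    ' {} +
}
```

Calling `shrink_all` in the parent directory should then produce `smaller/322_b3.447/resu_1_1_0` and so on. Counting lines in `n` rather than testing `NR` keeps the groups aligned per file, because `NR` keeps increasing across files.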
# 3  
Old 02-09-2010
Quote:
Originally Posted by symphonic1985
Hey everyone,

So I have a task that I want to complete with awk (+ find, or something similar), but can't quite achieve it by myself...

I have 60 GB of files that I want to modify. They each consist of 2 columns of numbers, with up to 50,000 lines in a file.
e.g.
Code:
   1.607743       5.928237    
   1.578281       5.941602    
   1.480728       5.867404    
   1.592771       5.980338   
    ....               ....

The files are in subdirectories one layer down from the parent, and look like e.g.
Code:
322_b3.447/resu_1_1_0
322_b3.447/resu_2_1_0

I want to reduce the file sizes by averaging the numbers in the columns in groups of 10, and creating new files with names e.g. smaller/322_b3.447/resu_1_1_0

I've tried to make a do loop with general structure:
Code:
for file in $(find -type f -name 'resu_*')


Why are you using find?

Code:
for file  in */resu_*

Quote:
Code:
do 
   some command to specify "$newfile"
   awk '
   {s1+=$1;s2+=$2}
   !(NR%10){
          {
       print (s1/10,s2/10) >  "$newfile"
     }
     s1=0; s2=0
   }' $file
done

However, I have trouble specifying the name of the new file because 'find' returns for $file names like ./resu_2_1_0, and I don't know how to remove the "./" and include a different directory there.

Code:
do 
   newfile=${file#*/}    ## this removes the directory; adjust to taste
   awk '
   {s1+=$1;s2+=$2}
   !(NR%10){
          {
       print (s1/10,s2/10)
     }
     s1=0; s2=0
   }' "$file"  >  "$newfile"
done
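
For reference, here is a small sketch of the parameter expansions that are useful for this kind of renaming. The variable names `rel`, `dir` and `base` are only illustrative, and the sample path is the one from the question with the "./" prefix that find adds.

```shell
file=./322_b3.447/resu_1_1_0   # sample name as returned by find

rel=${file#./}        # 322_b3.447/resu_1_1_0  (leading "./" removed)
dir=${rel%/*}         # 322_b3.447             (shortest "/*" suffix removed)
base=${rel##*/}       # resu_1_1_0             (longest "*/" prefix removed)
newfile=smaller/$rel  # smaller/322_b3.447/resu_1_1_0
```

`#` and `%` strip the shortest matching prefix or suffix; doubling them (`##`, `%%`) strips the longest.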

Quote:
Also, my awk loop only works when the number of lines per file is exactly a multiple of 10. Otherwise the output for the remaining, say, 7 lines, will still be divided by 10 and hence be too small.

Actually, it prints nothing at all for a trailing group of fewer than 10 lines. You need to add an END block that averages whatever is left over, dividing by the number of leftover lines rather than by 10.
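
A minimal sketch of such an END block, assuming a counter `n` (not in the original script) that tracks how many lines are in the current group. The wrapper name `avg_blocks` is hypothetical.

```shell
# avg_blocks FILE: print the mean of each column for every block of 10 lines;
# a shorter final block is averaged over its actual line count.
avg_blocks() {
    awk '
    { s1 += $1; s2 += $2; n++ }                        # accumulate first
    n == 10 { print s1 / n, s2 / n; s1 = s2 = n = 0 }  # full group of 10
    END { if (n) print s1 / n, s2 / n }                # leftover lines, if any
    ' "$1"
}
```

Inside the loop this would be used as `avg_blocks "$file" > "$newfile"`. For a 12-line file it prints two lines: the mean of lines 1 to 10 and the mean of lines 11 and 12.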
# 4  
Old 02-10-2010
Quote:
Originally Posted by cfajohnson

Why are you using find?

Code:
for file  in */resu_*

[...]
Good point!
# 5  
Old 02-10-2010
Thanks radoulov and cfajohnson. I've been playing around with your code and I think that I basically have what I need. The first bit of code from radoulov gave me the file structure that I need, but seems to take the 1st 9 records and divide them by 10, then goes in blocks of 10 thereafter, leaving 1 remaining record (for my test file with 100 lines).

I think that the following combination works. Are there any landmines in this that I should watch out for?


Code:
for file  in */resu_*
do 
    _newd=smaller/${file%/*}
    _newn=smaller/$file
    [ -d "$_newd" ] || mkdir -p "$_newd" 

   awk '
   {s1+=$1;s2+=$2}
   !(NR%10){
          {
       print (s1/10,s2/10)
     }
     s1=0; s2=0
   }' "$file"  >  "$_newn"
done

# 6  
Old 02-10-2010
Quote:
Originally Posted by symphonic1985
Thanks radoulov and cfajohnson. I've been playing around with your code and I think that I basically have what I need. The first bit of code from radoulov gave me the file structure that I need, but seems to take the 1st 9 records and divide them by 10, then goes in blocks of 10 thereafter, leaving 1 remaining record (for my test file with 100 lines).
[...]
This could happen if you have some kind of header (or simply a newline) before the first record.
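
Another possible explanation, judging only from the script in post #2, is the order of the awk rules. awk runs the rules top to bottom for every record, so when the `!(NR%10)` test comes before the accumulating rule, line 10 has not been added yet when the first average is printed. The two hypothetical helpers below compare both orders on a single column.

```shell
# Order used in post #2: test first, then accumulate.
test_first() {
    awk '!(NR % 10) { print s / 10; s = 0 } { s += $1 }' "$@"
}

# Accumulate first, then test.
accumulate_first() {
    awk '{ s += $1 } !(NR % 10) { print s / 10; s = 0 }' "$@"
}
```

On a file of 20 lines that each contain `1`, `test_first` prints 0.9 and then 1, while `accumulate_first` prints 1 twice. With the first order the last line of the file is also left in `s` at the end, which matches the one remaining record seen with the 100-line test file.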