Sorting operations on several files

# 1  
06-12-2013

I have over 250 files (named grad.1000, grad.1001, grad.1002) - see attachment - that have this format:

Code:
# 0.004 0.692758
# 6.23025467936 6.23025467936 6.23025467936 44.9620453206 44.9620453206 44.9620453206
# 0.8 1.19989 0.99914606306 1.0117015948 1.03761854021 1.07717125288 1.13095455063
0.008 0 887 7.27461421247 11.2755138444 6.64899792979 0.000974619185608 4.33760040716e-06 97.4619182195 0.0867520800533 0.984433771469 -0.170586579294 -0.0423127469799 0.145837674011 0.658470875326 0.738341031764 
0.008 887 0 7.27461421247 11.2755138444 6.64899792979 0.000974619185608 4.33760040716e-06 97.4619182195 0.0867520800533 -0.984433771469 0.170586579294 0.0423127469799 -0.145837674011 -0.658470875326 -0.738341031764 
0.008 0 8845 7.90284821362 11.4658264202 7.26529902842 0.110530350443 0.00536257173099 11053.0350414 107.251434471 0.213274962743 -0.438677902466 -0.872969351212 -0.411755829948 -0.850655035361 0.326868700427 
0.008 8845 0 7.90284821362 11.4658264202 7.26529902842 0.110530350443 0.00536257173099 11053.0350414 107.251434471 -0.213274962743 0.438677902466 0.872969351212 0.411755829948 0.850655035361 -0.326868700427 
0.008 0 521 7.6646990021 10.5040106009 6.88903752062 0.00504147840528 0.000128931622848 504.147840346 2.57863215449 0.497796648814 0.79640579983 -0.343418547004 0.0167640041807 0.387056165816 0.921903732863 
0.008 521 0 7.6646990021 10.5040106009 6.88903752062 0.00504147840528 0.000128931622848 504.147840346 2.57863215449 -0.497796648814 -0.79640579983 0.343418547004 -0.0167640041807 -0.387056165816 -0.921903732863 
0.008 0 8462 8.51563871619 10.5252189655 6.56377074841 0.0695062123246 0.00333349601709 6950.62123031 66.6699200957 -0.593216665384 0.802236178796 0.0671647328714 -0.475994807994 -0.416810585143 0.774401626338

The first three lines, which start with #, should be ignored, including when calculating the average.

What I want to do is to generate two sub-files from each file according to the condition:

Code:
if (col[9] / average(col[9])) > 1

print the whole line to the first sub-file, named grad_high.1000, grad_high.1001, etc.

else

Code:
if (col[9] / average(col[9])) < 1

print the whole line to the second sub-file, named grad_low.1000, grad_low.1001, etc.

so that for 250 original grad files I get 500 new sub-files. I have attached two files as examples.

Can someone please help me with this using awk or any other scripting method?
# 2  
06-12-2013
OK. Is the column 9 average based on all files (the average of column 9 over grad.1000, grad.1001 ... grad.nnnn)? Or is it based only on the file you are currently working on?

BTW, the comparison
Code:
((col[9] / average(col[9])) < 1)

is the same as
Code:
(col[9] < average(col[9]))

without the division, provided the average is positive.
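A quick way to check the equivalence is to evaluate both forms in awk. The sketch below uses a hypothetical helper, `classify`, that prints the outcome of the division test and of the division-free test for one value; the average of 1000 in the loop is an arbitrary number chosen for illustration, not something taken from the data.

```bash
#!/bin/bash
# classify VALUE AVG (hypothetical helper): prints the outcome of the
# division test and of the division-free test, e.g. "high high".
classify() {
    awk -v v="$1" -v avg="$2" 'BEGIN {
        x = v + 0; a = avg + 0                 # force numeric comparison
        ratio = (x / a > 1) ? "high" : "low"   # (col[9] / average) > 1
        plain = (x > a)     ? "high" : "low"   # col[9] > average
        print ratio, plain
    }'
}

# Column-9 values from the sample data, against an arbitrary positive average.
for v in 97.4619182195 504.147840346 11053.0350414; do
    echo "$v -> $(classify "$v" 1000)"
done
```

For example, `classify 100 100` prints `low low` (a ratio of exactly 1 is not greater than 1), while `classify -50 -100` prints `low high`: with a negative average the division flips the inequality, which is why the equivalence needs a positive average.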
# 3  
06-12-2013
The average is based on the file I am currently working on, so it will be different for different files.
Yes!
Code:
((col[9] / average(col[9])) < 1)

is the same as

Code:
(col[9] < average(col[9]))

# 4  
06-12-2013
In any event, this is going to be either a memory hog or require two passes through each file. Neither is a great choice; I'm opting for two passes.

Code:
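#!/bin/bash
cd /path/to/directory || exit 1
# grad.* matches grad.1000, grad.1001, ... but not the grad_high.*/grad_low.* outputs
for fname in grad.*
do
    ext=${fname##*.}
    hi=grad_high.${ext}
    lo=grad_low.${ext}
    # pass 1: average of column 9
    avg=$(awk '{sum+=$9; rows++} END {printf("%f", sum/rows)}' "$fname")
    # pass 2: lines above the average go to grad_high.*, the rest to grad_low.*
    awk -v avg="$avg" -v hi="$hi" -v lo="$lo" '$9 > avg {print > hi; next} {print > lo}' "$fname"
done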
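For comparison, the "memory hog" option (a single read of each file, with every data line held in memory until the average is known) might look like the sketch below. The function name `split_buffered` is hypothetical, and the sketch assumes, as in the question, that header lines start with `#` and should be left out of both the average and the output files. Memory use grows with the size of each file, which is the trade-off against reading the file twice.

```bash
#!/bin/bash
# split_buffered FILE (hypothetical helper): reads FILE once, keeps every data
# line in memory, and writes grad_high.EXT / grad_low.EXT once the column-9
# average is known.
split_buffered() {
    local f=$1
    local ext=${f##*.}
    awk -v hi="grad_high.$ext" -v lo="grad_low.$ext" '
        /^#/ { next }                          # skip the "#" header lines
        { line[++n] = $0; val[n] = $9 + 0; sum += $9 }
        END {
            if (n == 0) exit                   # no data lines: write nothing
            avg = sum / n
            for (i = 1; i <= n; i++) {
                if (val[i] > avg)
                    print line[i] > hi         # above the file average
                else
                    print line[i] > lo         # at or below the average
            }
        }' "$f"
}
```

A value exactly equal to the average goes to grad_low.*, matching the strict "> 1" condition in the question.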

This User Gave Thanks to jim mcnamara For This Post:
# 5  
06-12-2013
Thanks Jim, this should work well. I will run the script on a cluster, so memory is not a problem. Would it be possible to omit the first three lines (with #) of each file from the average calculation? It seems to make a tiny difference in the final result.
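One way to leave the `#` lines out of the average, and out of the output files as well, is to skip them with `/^#/ { next }` in both passes. The sketch below also folds the two passes into one awk call by giving it the same file twice and using the common `NR == FNR` idiom to tell the first read from the second. The function name `split_two_pass` is hypothetical; values exactly equal to the average go to grad_low.*.

```bash
#!/bin/bash
# split_two_pass FILE (hypothetical helper): FILE is given to awk twice.
# First read (NR == FNR): sum column 9 over the data lines.
# Second read: write each data line to grad_high.EXT or grad_low.EXT.
split_two_pass() {
    local f=$1
    local ext=${f##*.}
    awk -v hi="grad_high.$ext" -v lo="grad_low.$ext" '
        /^#/ { next }                        # ignore header lines in both reads
        NR == FNR { sum += $9; n++; next }   # first read: accumulate column 9
        $9 > sum / n { print > hi; next }    # second read: above the average
        { print > lo }                       # at or below the average
    ' "$f" "$f"
}
```

To process every file, call it from a loop such as `for f in grad.*; do split_two_pass "$f"; done` in the data directory. Because NR and FNR count every record read, skipped header lines do not break the `NR == FNR` test, and `sum / n` is only evaluated on the second read, when at least one data line has been counted.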