Grouping files according to certain fields in their name


 
# 1  
Old 03-01-2012

I have a list of files stored in sortedLst, and want to select certain fields so that specific files are grouped together:

An example of the files is given below:

Code:
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run3.log

I use the following command to group files whose chosen fields match:

Code:
        echo $sortedLst | tr ' ' '\n' \
          | awk -F- '{ c=($4$5$6!=p && FNR!=1)?ORS:""; p=$4$5$6 } { printf("%c%s\n",c,$0) }'

I now want the user to be able to specify the grouping fields himself, rather than hardwiring '$4$5$6' in the awk script.

With the code above, the output looks like this:

Code:
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run3.log
   
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run3.log
   
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run3.log
   
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run3.log
   
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run3.log


Last edited by kristinu; 03-01-2012 at 11:10 AM..
# 2  
Old 03-01-2012
If the user selects a grouping field, would a grep be sufficient?

Can you give us some examples of what the user could select?
# 3  
Old 03-01-2012
Quote:
Originally Posted by Shell_Life
If the user selects a grouping field, would a grep be sufficient?

Can you give us some examples of what the user could select?
The user would want to list, for example, the log files present in a directory.

Let us assume that the files are the ones below. They have already been sorted by the numeric values in their fields rather than purely alphabetically, because if certain fields are not sorted numerically, everything gets mixed up.

Code:
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.006-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.008-16x12drw-run3.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run3.log

As an example, the three files below are grouped together because they come from running the same program with the same parameters; they differ only in the run number. Each such group is then separated from the others by a blank line.
Code:
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run1.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run2.log
n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run3.log

For example, fields 4, 5 and 6 give ndelt0.25, varp0.002 and 16x12drw (the awk expression $4$5$6 concatenates them without the hyphens). When this key changes, I insert a blank line. The user should be able to supply the fields he wants to use for grouping.

For example, he could pass the command-line argument --group=4/5/6.
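
One possible way to avoid hardwiring the fields is to hand the specification to awk as a variable and build the comparison key in a loop. The sketch below assumes the slash-separated syntax from the example above (4/5/6) and hyphen-separated file names; the function name group_by_fields is made up for illustration and is not part of any existing tool.

```shell
#!/bin/sh
# group_by_fields SPEC
# Hypothetical helper: reads file names on stdin (one per line) and prints a
# blank line whenever the key built from the fields listed in SPEC changes.
# SPEC is a slash-separated list of field numbers, e.g. 4/5/6 (assumed syntax).
group_by_fields() {
    awk -F- -v spec="$1" '
        BEGIN { n = split(spec, idx, "/") }      # idx[1..n] = field numbers
        {
            key = ""
            for (i = 1; i <= n; i++)
                key = key FS $(idx[i])           # keep the separator in the key
            if (NR > 1 && key != prev)
                print ""                         # group boundary
            prev = key
            print
        }
    '
}

# Example: group three names on fields 4, 5 and 6
printf '%s\n' \
    n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run1.log \
    n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run2.log \
    n02-z30-dsr65-ndelt0.25-varp0.004-16x12drw-run1.log \
  | group_by_fields 4/5/6
```

Keeping the field separator inside the key is a deliberate choice: plain concatenation could make two different field combinations look identical. The input still has to arrive already sorted, since only adjacent lines are compared.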

As you say, the user can use grep, but then he would also need to sort by numeric values and separate the files belonging to the various runs himself. I would like him to just run a script that does the work for him: sort numerically by certain fields first, then by others, then perform the grouping. The only things he would need to specify are the sort order (e.g. --sort=3/1/5) and the grouping fields (e.g. --group=4/5/6, as in this example).
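
A rough sketch of such a script is given below, under several assumptions: the option syntax is exactly --sort=A/B/C and --group=X/Y/Z, names are hyphen-separated and contain no tabs, and "numeric sorting" means ordering by the first number found inside each chosen field (so varp0.010 sorts by 0.010). The function name list_grouped is hypothetical. It decorates each line with the extracted numbers, sorts on them, strips them again, and then inserts the blank lines.

```shell
#!/bin/sh
# list_grouped --sort=A/B/C --group=X/Y/Z   (hypothetical wrapper, assumed syntax)
# Reads hyphen-separated names on stdin; sorts them by the first number found
# in each --sort field, then prints a blank line when the --group key changes.
list_grouped() {
    sort_spec= group_spec=
    for arg in "$@"; do
        case $arg in
            --sort=*)  sort_spec=${arg#--sort=} ;;
            --group=*) group_spec=${arg#--group=} ;;
            *) echo "unknown option: $arg" >&2; return 2 ;;
        esac
    done
    if [ -z "$sort_spec" ] || [ -z "$group_spec" ]; then
        echo "usage: list_grouped --sort=A/B/C --group=X/Y/Z" >&2
        return 2
    fi

    # One "-kN,Nn" option per sort field, addressing the decorated columns.
    nkeys=$(printf '%s\n' "$sort_spec" | awk -F/ '{ print NF }')
    keys=$(awk -v n="$nkeys" 'BEGIN { for (i = 1; i <= n; i++) printf "-k%d,%dn ", i, i }')
    tab=$(printf '\t')

    # Decorate: prefix each line with the numeric part of every sort field.
    awk -F- -v spec="$sort_spec" '
        BEGIN { n = split(spec, idx, "/") }
        {
            for (i = 1; i <= n; i++) {
                f = $(idx[i]); num = 0
                if (match(f, /[0-9]+(\.[0-9]+)?/)) num = substr(f, RSTART, RLENGTH)
                printf "%s\t", num
            }
            print
        }' |
    LC_ALL=C sort -t "$tab" $keys |            # numeric sort on the prefixes
    cut -f "$((nkeys + 1))-" |                 # undecorate: drop the prefixes
    awk -F- -v spec="$group_spec" '
        BEGIN { n = split(spec, idx, "/") }
        {
            key = ""
            for (i = 1; i <= n; i++) key = key FS $(idx[i])
            if (NR > 1 && key != prev) print ""
            prev = key
            print
        }'
}

# Example: sort by field 5, then by field 7, and group on fields 4, 5 and 6
printf '%s\n' \
    n02-z30-dsr65-ndelt0.25-varp0.010-16x12drw-run1.log \
    n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run2.log \
    n02-z30-dsr65-ndelt0.25-varp0.002-16x12drw-run1.log \
  | list_grouped --sort=5/7 --group=4/5/6
```

Prefixing the extracted numbers lets the standard sort utility do the ordering without needing to know where the digits begin in each field. Fields that contain more than one number, such as 16x12drw, are ordered by the first one only in this sketch.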

Last edited by kristinu; 03-01-2012 at 11:39 AM..