Grouping files on pattern


 
# 1  
Old 11-09-2017

I have a requirement to group files.
I have a folder, say "temp", that contains many files with names like these:
Code:
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt
010020009_S-JKL-Sort_MNOLO_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt
010020009_S-JKL-Sort-MNOLO_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt

So here, in every filename there are three patterns, i.e.
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt

These files should be grouped based on the marked words (words are delimited by either "-" or "_"), and all 3 patterns always have the described length.
After grouping, the files should be moved into their own directory, e.g.

one folder will be created with the name
/temp/010020001_S-ABC-Sort-DEFAW
and inside this folder there will be the corresponding grouped files, i.e.
Code:
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt

/temp/010020004_S-PQR-Sort-DRTON
Code:
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt

/temp/010020009_S-JKL-Sort-MNOLO
Code:
010020009_S-JKL-Sort-MNOLO_YYYYMMDD_HHMMSS.txt
010020009_S-JKL-Sort_MNOLO_YYYYMMDD_HHMMSS.txt

Please help me with this. Let me know if you require any other information.

TIA


Moderator's Comments:
Please use CODE tags as required by forum rules!

Last edited by RudiC; 11-09-2017 at 03:13 PM.. Reason: Added CODE tags.
# 2  
Old 11-09-2017
Any attempts / ideas / thoughts from your side?

What OS / shell version do you use?
# 3  
Old 11-10-2017
My bash version :
-bash-4.1$ echo $BASH_VERSION
4.1.2(1)-release

Is this the right way to get the version, or is there another way?

I guess I have made my requirement a bit complicated,

so basically my first requirement is to group the files based on 3 patterns, i.e.
Code:
010020001_S-ABC_Sort_DEFAW_YYYYMMDD_HHMMSS.txt

First pattern: the last 3 characters of the first word

Code:
010020001_S-ABC_Sort_DEFAW_YYYYMMDD_HHMMSS.txt

Second pattern: after S there can be "-" or "_" (it doesn't matter which), followed by 3 characters

Code:
010020001_S-ABC_Sort_DEFAW_YYYYMMDD_HHMMSS.txt

Third pattern: after Sort there can be "-" or "_" (it doesn't matter which), followed by 5 characters

This file naming and these patterns are constant.
After grouping, the files need to be moved into separate directories, and those directories can have any name, like
Dir ABC: first set of grouped files
Dir BDC: second set of grouped files, and so on

Last edited by gnnsprapa; 11-10-2017 at 05:56 AM.. Reason: formatting
# 4  
Old 11-10-2017
That doesn't really explain your attempts ... however, try
Code:
for FN in *.txt; do TMP="${FN:0:20}-${FN:21:5}"; echo mkdir -p "$TMP";  echo mv "${FN}" "$TMP"; done

Remove the echo commands if the output looks acceptable.
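
To see what the two substring expansions in that one-liner pick out, here is a minimal sketch. It assumes the fixed-width layout from post #1 (a 9-character first word, then S, a 3-character word, Sort, and a 5-character word); the function name group_name is made up for illustration.

```shell
#!/bin/bash
# group_name is a hypothetical helper, not part of the one-liner above.
# It builds the directory name from fixed character offsets.
group_name() {
    local fn=$1
    # Characters 0-19 are "010020001_S-ABC-Sort", character 20 is the
    # "-" or "_" separator, and characters 21-25 are the 5-character word.
    printf '%s-%s\n' "${fn:0:20}" "${fn:21:5}"
}

# Both spellings of the separator after "Sort" give the same name.
group_name "010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt"
group_name "010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt"
```

Both calls print 010020001_S-ABC-Sort-DEFAW. Note that only the separator at offset 20 is normalised; a "_" instead of "-" after S or after ABC would still end up in a different directory.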
# 5  
Old 11-10-2017
The question RudiC presented was not about your requirements, but about what operating system you're using (which you have yet to answer) and what shell you're using (which you have answered). He also asked what you have tried to solve this problem on your own. We are here to help you learn how to use the tools available on your operating system to do what you need to do. We are not here to act as your unpaid programming staff.

Please tell us what operating system you're using. The output from:
Code:
uname -a

is a great way to tell us what we need to know.

And, please show us what you have tried to solve this problem on your own.

If the various parts of your filenames are not all fixed length, you could also try something like:
Code:
#!/bin/bash
for file in 010020*[_-]S[_-]*[_-]*[_-]*[_-][0-9][0-9][0-9][0-9][01][0-9][0-3][0-9][_-][0-2][0-9][0-6][0-9][0-6][0-9].txt
do	IFS='[_-]'
	set -- $file
	unset IFS
	destination=/tmp/${1}_$2-$3-$4-$5
	mkdir -p "$destination"
	mv "$file" "$destination/$file"
done


Last edited by Don Cragun; 11-10-2017 at 06:17 AM.. Reason: Add possible solution.
# 6  
Old 11-10-2017
Thanks a lot, Don Cragun.
Please find the output below:

Code:
-bash-4.1$ uname -a
Linux ucsdv181.symprod.com 2.6.32-696.10.3.el6.x86_64 #1 SMP Thu Sep 21 12:12:50 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

I tried your code and it worked perfectly fine. I just want to understand a couple of things in that script:

Code:
#!/bin/bash
for file in 010020*[_-]S[_-]*[_-]*[_-]*[_-][0-9][0-9][0-9][0-9][01][0-9][0-3][0-9][_-][0-2][0-9][0-6][0-9][0-6][0-9].txt
do	IFS='[_-]' # what is the use of IFS here?
	set -- $file # what is this command doing?
	unset IFS
	destination=/tmp/${1}_$2-$3-$4-$5
	mkdir -p "$destination"
	mv "$file" "$destination/$file"
done

Also, the first word in my filenames, e.g. in
010020001_S-ABC-Sort-DEFAW_20170412_121224.txt, need not start with the same values. What I am trying to say is that

010020001_S-ABC-Sort-DEFAW_20170412_121224.txt
111020001_S-ABC-Sort-DEFAW_20180412_121224.txt
should be in the same group and move to one directory instead of creating two directories. In the first word, only the last 3 characters matter for grouping.
Again, thanks.

Last edited by gnnsprapa; 11-10-2017 at 08:44 AM.. Reason: Updating
# 7  
Old 11-10-2017
The IFS variable is used by the shell when splitting fields. Each character in the value of the string assigned to IFS will be used as a field delimiter (although there are some special cases for strings of adjacent characters in the space character class when characters in that class are field separators). The value I used happens to work for the filenames in your example, but it should have just been:
Code:
IFS='_-'

or:
Code:
IFS=_-

to have the shell split fields on just <underscore> and <hyphen> characters. The value I used in the script would also split fields on open and close square brackets. When IFS is unset, the shell behaves as if IFS had been set to a string containing the three characters <space>, <tab>, and <new-line>.
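
A small sketch of the difference between the two IFS values, for anyone who wants to see it. The function count_fields and the test string a[b_c are invented for this illustration; set -f is only there so that the "[" in the test string is not treated as a glob pattern.

```shell
#!/bin/bash
# count_fields is a hypothetical helper: it splits string $2 on the
# characters in $1 and prints how many fields result.
count_fields() {
    local IFS=$1      # every character in $1 becomes a field delimiter
    set -f            # turn off pathname expansion while splitting
    set -- $2         # unquoted on purpose: this is where splitting happens
    set +f
    echo $#
}

count_fields '_-'   'a[b_c'   # prints 2: "a[b" and "c"
count_fields '[_-]' 'a[b_c'   # prints 3: "a", "b" and "c"
```

With IFS='[_-]' the square brackets are delimiters too, which is why that value only happened to work for filenames that contain no brackets.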

The command:
Code:
set -- $file

first clears all positional parameters and then sets the positional parameters for the current shell execution environment to the values obtained by performing field splitting on the expansion of the file variable. When $file expands to the string:
Code:
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt

that sets the positional parameters as follows:
Code:
$1 010020001
$2 S
$3 ABC
$4 Sort
$5 DEFAW
$6 YYYYMMDD
$7 HHMMSS.txt

and sets the special parameter # to the number of positional parameters (i.e., $# expands to 7).
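
If you would rather not overwrite the positional parameters or change IFS for the rest of the script, a different technique (not the one used in the script above) is to let the bash read builtin do the splitting into an array. This is only a sketch; split_name and parts are names made up for the example.

```shell
#!/bin/bash
# split_name is a hypothetical helper; it fills the global array "parts".
split_name() {
    # IFS is changed only for this one read command, so nothing
    # needs to be unset afterwards.
    IFS=_- read -r -a parts <<< "$1"
}

split_name "010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt"
echo "number of fields: ${#parts[@]}"
echo "third field: ${parts[2]}, fifth field: ${parts[4]}"
```

Array indexes start at 0, so ${parts[2]} corresponds to $3 (ABC) and ${parts[4]} to $5 (DEFAW) in the listing above.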

In post #1 in this thread you said that all of the filenames you would be processing started with 010020, and I wrote the filename matching pattern used to identify the files to be processed by the for loop in the script I gave you to match that statement. If those 1st six characters aren't always 010020, how do we know what name is supposed to be used for the directory into which files are to be moved?

Are you now saying that files to be processed have names starting with 010020 or 111020, or can there be any string of six digits there? Or, any string of six characters? Or, any string of an arbitrary number of characters?

Making lots of unwarranted assumptions, maybe the following will come closer to what you want:
Code:
#!/bin/bash
for file in ??????[0-9][0-9][0-9][_-]S[_-]*[_-]*[_-]*[_-]*[_-]*.txt
do	IFS=_-
	set -- $file
	unset IFS
	destination=/tmp/010020${1#??????}_$2-$3-$4-$5
	mkdir -p "$destination"
	mv "$file" "$destination/$file"
done
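
If it turns out that only the last three characters of the first word, the word after S, and the word after Sort matter, and that the directory name really can be anything, a variation along these lines might be closer. It is a sketch under those assumptions only: the names group_key and group_files and the NNN_XXX_YYYYY directory naming are my invention, not something taken from your requirements.

```shell
#!/bin/bash
# group_key (hypothetical): print the grouping key for one filename,
# built from the last 3 characters of field 1 plus fields 3 and 5.
group_key() {
    local parts
    IFS=_- read -r -a parts <<< "$1"
    printf '%s_%s_%s\n' "${parts[0]: -3}" "${parts[2]}" "${parts[4]}"
}

# group_files (hypothetical): move matching files found in directory $1
# into subdirectories of $1 named after their key.
group_files() {
    local dir=$1 file key
    for file in "$dir"/*[0-9][0-9][0-9][_-]S[_-]*[_-]Sort[_-]*[_-]*[_-]*.txt
    do	[ -f "$file" ] || continue	# the pattern matched nothing
	key=$(group_key "${file##*/}")
	mkdir -p "$dir/$key" && mv "$file" "$dir/$key/"
    done
}

# Both of these print 001_ABC_DEFAW, so both files land in one directory.
group_key "010020001_S-ABC-Sort-DEFAW_20170412_121224.txt"
group_key "111020001_S-ABC-Sort_DEFAW_20180412_121224.txt"
```

Running something like group_files /temp would then do the actual moves; try it on a copy of the directory first.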


Last edited by Don Cragun; 11-10-2017 at 11:59 PM.. Reason: Remove the echo from the mv command.