UNIX Desktop Questions & Answers: Combining files with specific patterns of naming in a directory
Post 302736869 by Don Cragun on Wednesday 28th of November 2012 05:16:21 AM
Quote:
Originally Posted by A-V
Sorry for the confusion.
Q1) Yes, it is a capital X.
Q2) The directory name can be anything: XXXX, count, or ...
Q3) Since 64 is a fixed number, it doesn't play any important role. The name should show the letter that indicates which area the files come from, plus whether they are train or test and, if so, which group (letter+# for train, # only for test).
Q4) I don't know what difference it will make.
Q5) I am not sure I understand the question.

Q6) I am still learning Unix, so: what is a source file? The results can go in another directory; that would make them easier to see.


Oh wow... I just tested it and it works like magic.

May I ask you to explain what "f%%" does?
And how can I make it read from a higher directory and put the results in another one, such as from puredate/* to count/*?

---------- Post updated at 05:56 PM ---------- Previous update was at 11:05 AM ----------

One more question:

Would it be possible to put the files for each letter into one new folder (64X, 64Y, ...) that includes both the train and the test files?

OK. I think I understand what you want.

In this context, a source file is any one of the input files whose name matches either your Train set pattern or your Test set pattern.
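As a rough illustration (a sketch, not part of the script), the hypothetical function classify below uses a case statement with the same two patterns that the script's find command uses, and reports whether a given name would be treated as a Test source file, a Train source file, or neither:

```shell
#!/bin/sh
# Hypothetical helper, not part of the consolidate script: report whether a
# pathname names a source file, using the same patterns as the script's find.
classify() {
        # Only the last pathname component matters, just as with find -name.
        name=${1##*/}
        case $name in
        64[A-Z]test[0-9][0-9].txt-*.txt)        echo test ;;
        64[A-Z]train[A-Z][0-9].txt-*.txt)       echo train ;;
        *)                                      echo none ;;
        esac
}

classify puredate/64Xtest14.txt-James-Maggie.txt        # prints: test
classify puredate/64YtrainB7.txt-Ann.txt                # prints: train
classify puredate/notes.txt                             # prints: none
```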

The construct ${var%%pattern} expands to the contents of the shell variable var with the longest string that matches pattern at the end of the string removed. The related constructs work the same way:
  ${var%%pattern}  removes the longest string matching pattern from the end of $var.
  ${var%pattern}   removes the shortest string matching pattern from the end of $var.
  ${var##pattern}  removes the longest string matching pattern from the start of $var.
  ${var#pattern}   removes the shortest string matching pattern from the start of $var.
If the given pattern doesn't match the appropriate part of the expansion of $var, $var is expanded in full.
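To see the four operators side by side, the following can be pasted into any POSIX shell. The value assigned to var is simply the example pathname used in this post, and the trailing comments show what each line prints:

```shell
#!/bin/sh
# Example value only (the pathname used as an example in this post).
var=/home/dwc/test/puredate/64Xtest14.txt-James-Maggie.txt

echo "${var%%.txt*}"    # /home/dwc/test/puredate/64Xtest14
echo "${var%.txt*}"     # /home/dwc/test/puredate/64Xtest14.txt-James-Maggie
echo "${var##*/}"       # 64Xtest14.txt-James-Maggie.txt
echo "${var#*/}"        # home/dwc/test/puredate/64Xtest14.txt-James-Maggie.txt
echo "${var#nomatch}"   # no match, so the whole value is printed unchanged
```

Note how %% stops at the first ".txt" (removing the longest possible suffix) while % only removes the final ".txt".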

So, for example, if $src is set to
Code:
puredate/64Xtest14.txt-James-Maggie.txt

or to
Code:
/home/dwc/test/puredate/64Xtest14.txt-James-Maggie.txt

then the command:
Code:
sf=${src##*/}

will set sf to 64Xtest14.txt-James-Maggie.txt, and then the command:
Code:
df="${sf%%.txt*}"

will set df to 64Xtest14, and then the commands:
Code:
df=${df#64[A-Z]train}
df=${df#64[A-Z]test}

will set df to 14: the 1st command leaves df unchanged and the 2nd command removes the leading 64Xtest. (With a source filename matching the pattern with train in it, the 1st command would remove the leading part of the string up to and including train, and the 2nd command would leave the value unchanged.)
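The same sequence can be wrapped in a small function to check which file a given source file would be appended to. This is only an illustrative sketch: the function name dest_of is made up here, and it just repeats the expansions the script uses (including the target directory step) without creating or writing anything:

```shell
#!/bin/sh
# Hypothetical helper: print "directory/file" that one source file would be
# appended to, using the same parameter expansions as the consolidate script.
dest_of() {
        sf=${1##*/}             # e.g. 64Xtest14.txt-James-Maggie.txt
        dir=${sf%%t*}           # e.g. 64X (everything from the first "t" removed)
        df=${sf%%.txt*}         # e.g. 64Xtest14
        df=${df#64[A-Z]train}   # unchanged for a test file
        df=${df#64[A-Z]test}    # e.g. 14 (unchanged for a train file)
        echo "$dir/${dir#64}$df.txt"
}

dest_of puredate/64Xtest14.txt-James-Maggie.txt    # prints: 64X/X14.txt
dest_of puredate/64XtrainA3.txt-Ann.txt            # prints: 64X/XA3.txt
```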

If you save the following script in a file named consolidate, make it executable, and run it, it will append the text of every file in or under the current working directory whose name matches the pattern 64[A-Z]test[0-9][0-9].txt-*.txt or the pattern 64[A-Z]train[A-Z][0-9].txt-*.txt to a file named 64[A-Z]/[A-Z][0-9][0-9].txt or 64[A-Z]/[A-Z][A-Z][0-9].txt, respectively, under the current working directory:
Code:
#!/bin/ksh
# Usage: consolidate
#  The consolidate utility copies the contents of source files with
#  names matching one of two patterns in or under the current working
#  directory into summary files in directories (with the directory
#  name and file name derived from the name of the source file).
#   */64[A-Z]test[0-9][0-9].txt-*.txt -> 64[A-Z]/[A-Z][0-9][0-9].txt
#   */64[A-Z]train[A-Z][0-9].txt-*.txt -> 64[A-Z]/[A-Z][A-Z][0-9].txt
find .  -name '64[A-Z]test[0-9][0-9].txt-*.txt' -o \
        -name '64[A-Z]train[A-Z][0-9].txt-*.txt' | {
# ec is set, updated, and used inside these braces so that its value is not
# lost in shells (such as bash or dash) that run every element of a pipeline
# in a subshell.
ec=0    # Script exit code.
while IFS= read -r src  # IFS= and -r keep blanks and backslashes intact.
do
        # Get last component of pathname of source file ($sf).
        sf="${src##*/}"
        # Target directory ($dir) will be "64x" (where x is a single upper case
        # letter) after throwing away train* or test*.
        dir="${sf%%t*}"
        # Create the target directory if it doesn't already exist.
        if [ ! -d "$dir" ]
        then    mkdir "$dir"
                rc=$?
                if [ $rc -ne 0 ]
                then    ec=1
                        printf "%s: \"%s\" not processed.\n" "$0" "$src" >&2
                        continue
                fi
        fi
        # Change source filename ($sf) to destination filename ($df):
        df="${sf%%.txt*}"       # Get rid of trailing ".txt-*.txt"
        df="${df#64[A-Z]train}" # Get rid of leading "64[A-Z]train" or
        df="${df#64[A-Z]test}"  #   "64[A-Z]test".
        df="${dir#64}$df.txt"   # Put back the "[A-Z]" removed in last step and
                                #   add trailing ".txt".
        cat "$src" >> "$dir"/"$df"
        rc=$?
        # ":" is the null command; it keeps the then clause from being empty.
        if [ $rc -eq 0 ]
        then    : # printf "%s: cat %s >> %s succeeded\n" "$0" "$src" "$dir/$df"
                # rm "$src"
        else    ec=1
                printf "%s: cat %s >> %s failed (%d)\n" \
                        "$0" "$src" "$dir/$df" "$rc" >&2
        fi
done
exit $ec
}

This was written and tested using ksh, but only uses shell features specified by the POSIX standards and the Single UNIX Specification (so it should work the same with any shell that conforms to these standards). It could be made a little more efficient using features that are only available in more recent versions of ksh, but the script shown here should work with any version of ksh as well as any other standards-conforming shell.

If you would like to see a status report of the files successfully processed while this script is running, remove the ": #" from the then clause of the last if command.

If you want to remove the source files after they have been successfully written into one of the consolidation files, remove the # in front of the rm command in the same then clause. Note that if you do this, you should also check the exit status of this rm command, as the script does with the mkdir and cat commands.
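One way to do that is sketched below. The function name append_and_remove is hypothetical (it is not in the script above); it appends a source file to its target, removes the source only if the append worked, and records a failure of either cat or rm in ec the same way the script records mkdir and cat failures:

```shell
#!/bin/sh
# Hypothetical helper: append source file $1 to target file $2, then remove
# $1 only if the append succeeded.  Any failure sets ec=1 and is reported.
ec=0
append_and_remove() {
        cat "$1" >> "$2"
        rc=$?
        if [ $rc -ne 0 ]
        then    ec=1
                printf "%s: cat %s >> %s failed (%d)\n" "$0" "$1" "$2" "$rc" >&2
                return 1
        fi
        rm "$1"
        rc=$?
        if [ $rc -ne 0 ]
        then    ec=1
                printf "%s: rm %s failed (%d)\n" "$0" "$1" "$rc" >&2
                return 1
        fi
}

# Small demonstration in a scratch directory (removed again at the end).
demo=${TMPDIR:-/tmp}/append-demo.$$
mkdir "$demo"
printf 'hello\n' > "$demo/64Xtest14.txt-James-Maggie.txt"
append_and_remove "$demo/64Xtest14.txt-James-Maggie.txt" "$demo/X14.txt"
left=$(ls "$demo")      # only X14.txt should remain
rm -r "$demo"
```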

You could also add options to be interpreted by this script to enable removing the source files that have been successfully copied, to enable printing of successfully completed copies, to set a different source directory, and to set a different destination directory, but I'll leave that as an exercise for the reader.
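For the source/destination part of that exercise (which is also what the puredate/* to count/* question asks for), here is one possible sketch. It is not the script above: the function name consolidate_to and its -s, -d and -v options are assumptions made up for this illustration, and it relies only on POSIX features (getopts, find, mkdir -p):

```shell
#!/bin/sh
# Hypothetical variant of consolidate, written as a shell function:
#   consolidate_to [-v] [-s srcdir] [-d destdir]
#   -s srcdir   search srcdir instead of "." for source files
#   -d destdir  create the 64[A-Z] directories under destdir instead of "."
#   -v          report each successful copy on standard error
# The body is a subshell ( ... ) so that exit, OPTIND and the variables
# below do not leak into the caller.
consolidate_to() (
        src_dir=. dest_dir=. verbose=0
        OPTIND=1
        while getopts s:d:v opt
        do
                case $opt in
                s)      src_dir=$OPTARG ;;
                d)      dest_dir=$OPTARG ;;
                v)      verbose=1 ;;
                *)      echo "usage: consolidate_to [-v] [-s srcdir] [-d destdir]" >&2
                        exit 2 ;;
                esac
        done
        find "$src_dir" -name '64[A-Z]test[0-9][0-9].txt-*.txt' -o \
                -name '64[A-Z]train[A-Z][0-9].txt-*.txt' | {
                ec=0
                while IFS= read -r src
                do
                        sf=${src##*/}           # e.g. 64Xtest14.txt-James-Maggie.txt
                        sub=${sf%%t*}           # e.g. 64X
                        dir=$dest_dir/$sub      # e.g. count/64X
                        df=${sf%%.txt*}
                        df=${df#64[A-Z]train}
                        df=${df#64[A-Z]test}
                        df=${sub#64}$df.txt     # e.g. X14.txt
                        if ! mkdir -p "$dir"
                        then    ec=1
                                printf '%s: "%s" not processed.\n' "$0" "$src" >&2
                                continue
                        fi
                        if cat "$src" >> "$dir/$df"
                        then    if [ "$verbose" -eq 1 ]
                                then    printf '%s: cat %s >> %s succeeded\n' \
                                                "$0" "$src" "$dir/$df" >&2
                                fi
                        else    ec=1
                                printf '%s: cat %s >> %s failed\n' \
                                        "$0" "$src" "$dir/$df" >&2
                        fi
                done
                exit $ec
        }
)
```

Run from the directory that contains puredate, consolidate_to -v -s puredate -d count would append each matching file under puredate to files such as count/64X/X14.txt, creating count/64X as needed. As with the script above, files are appended in whatever order find visits them, which is not necessarily sorted.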

Hope this helps,
Don
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.