Find a list of files in directory, move to new, allow duplicates


 
# 1  
Old 08-04-2014
Find a list of files in directory, move to new, allow duplicates

Greetings. I know enough Unix to be dangerous (!), and I know there is a clever way to do the following. It will save me about a day of agony this time, and I will use it forever after (many days of agony saved in the future)!

Basically
I need to find any image files (JPGs, PSDs, etc.) whose filenames contain any one of about 300 different product SKUs, searching a directory and its subdirectories, and then move them into one new directory - on my desktop, for example - keeping (not overwriting) duplicates.


Spelled out in detail:

find
I need to search a directory and all its subdirectories for any filenames that contain a certain set of characters (e.g. the fragment CP37-BL, which might appear as CP37-BL.jpg or CP37-BL+140806.jpg, and there might be several CP37-BL.jpg files in multiple different subdirectories). I have 300 DIFFERENT finds to perform and would love to be able to cut and paste a space- or comma-separated list, or draw all the different filename portions from a .txt or .csv file, e.g.:
Code:
CP36-BL
CP36-BR
CP36-GR
CP36-OR
CP36-PK

-- matching any file whose name contains any of those fragments. The above is 5 of the 300. I would like to run the command once, not 300 times.

mv
So, I need to find all those different files and MOVE them to a particular target directory - AND allow duplicates (e.g. if there are 2 CP37-BL.jpg files then I want to keep both, appending a _v1 or _copy1 or something to the end of the filename in the case of duplicates).

That's it!
Any thoughts? And THANK YOU in advance!


PS: I am running this in Terminal on Apple OS X 10.9.4.

# 2  
Old 08-05-2014
Your duplicates requirement is not met in this code - I did not quite get it. Sorry.
Code:
find /path/to/files -type f | grep -Ff text_file_with_file_names > outputfile

while read filename
do
  mv "$filename" /path/to/somewhere/
done < outputfile

That said - DO NOT agglomerate zillions of user data files ad hoc in one file tree.
It would have been entirely possible to park those files in directories with meaningful names - i.e., named after the text you are using as a key. Then you could simply look for a directory and go from there. Pre-planning beats a kludge like this every time.

PS: you only want to run this ONE time, which is just as well, because there is a huge performance penalty:
grep -Ff has to check every filename that find produces against all 300 lines of the pattern file. And since you seem to have large numbers of files, be prepared to wait.
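To make that concrete: text_file_with_file_names is assumed to be a plain text file holding one SKU fragment per line (the fragments below are taken from the original post). grep -F treats each line as a fixed string, so every path containing any of the fragments lands in outputfile:

Code:
$ cat text_file_with_file_names
CP36-BL
CP36-BR
CP36-GR
CP36-OR
CP36-PK

$ find /path/to/files -type f | grep -Ff text_file_with_file_names > outputfile
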

Stuff like this should probably be written in C. dirent.h is your friend.
# 3  
Old 08-05-2014
Quote:
Originally Posted by jim mcnamara
Code:
find /path/to/files -type f | grep -Ff text_file_with_file_names > outputfile

while read filename
do
  mv "$filename" /path/to/somewhere/
done < outputfile

That said - DO NOT agglomerate zillions of user data files ad hoc in one file tree.
It would have been entirely possible to park those files in directories with meaningful names - i.e., named after the text you are using as a key. Then you could simply look for a directory and go from there. Pre-planning beats a kludge like this every time.
Concur! Especially with the last sentence!

Quote:
Originally Posted by jim mcnamara
PS: you only want to run this ONE time, which is just as well, because there is a huge performance penalty:
grep -Ff has to check every filename that find produces against all 300 lines of the pattern file. And since you seem to have large numbers of files, be prepared to wait.
True. Here is my take on it, based on yours. The rationale is that after the first scan the inode list is cached in memory, so the subsequent finds will go a lot faster:

Code:
while read FILEMASK ; do
     find /path/to/sourcedir -type f -name "*${FILEMASK}*" |\
          while read MOVEFILE ; do
               FNAME="${MOVEFILE##*/}"
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    mv "$MOVEFILE" "/path/to/targetdir/${FNAME}.$$"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi
          done
done < /path/to/list.of.filemasks

Note that no safety provisions are in place: checks for exhausted disk space, successful move operations, etc. are all missing, and you should add them before putting this sketch into a fire-and-forget script.
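For instance, here is a minimal sketch of one such check - a free-space test before the loops start. The 1 GB threshold and the target path are only assumptions for you to adjust:

Code:
# sketch: refuse to start if the target filesystem has less than ~1 GB free
FREE_KB=$(df -k /path/to/targetdir | awk 'NR==2 {print $4}')
if [ "$FREE_KB" -lt 1048576 ] ; then
     echo "less than 1 GB free on /path/to/targetdir - aborting" >&2
     exit 1
fi
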

I hope this helps.

bakunin
# 4  
Old 08-05-2014
You guys are great - thank you!

I hear you both re: the pre-planning, but actually I'm like the janitor here - housekeeping some shared directories where product photos are put into Dropbox by multiple users/employees. Images are placed in folders based on vendor and sale name and then shared via the cloud, so these images could be buckshot all over the place and duplicated x number of times, as they might be needed in one folder for one particular sale and then duplicated again to cover another sale with a different vendor, and so forth. We have thousands of product SKUs, and on this occasion these 300 are now sold out and will not be restocked, so the task is to pull out any photos that have the SKU number in the file name ... hence my conundrum and visit to this forum - which I am glad to have discovered!

I spend my days as the staff photographer, but because I have some aptitude for computers I get the IT hat thrown at me on a regular basis. Hopefully this sheds some light on things.

So, a couple of things:
1. I don't mind waiting for it to run - I can even run something overnight if it's going to take a very long time.

2. As the script finds and moves files, it would be great if files with duplicate filenames are not overwritten; rather, if a duplicate filename is found, it should just append something like copy, copy1, copy2, etc. to the duplicate's filename.
and ...

3. I understand this is a script and that there are some things I need to replace here - like path/to/targetdir ... but a few questions:
a. What identifies the file that is the list of my "SKUs" (the search criteria)?
b. Is my file with the list of SKUs a text file, one SKU per line? (Probably yes.)
c. In what way do I save this code and run it? I gather this is a script - what do I do with it? (Now I'm really showing my "dummy" status on this, but I learn fast, so hang in there with me!)

Thank you again - I appreciate expanding my knowledge in this area and thank you all for your time.

Clyde
# 5  
Old 08-06-2014
The suggestion you have from bakunin does try to avoid overwriting: if a matching file name is already present in the target, it moves the file to a name suffixed with the current process id using $$ (which is likely to be unique every time). However, if these files are being referenced somehow, how will you update the index that points to them?

Might I suggest something more like this would be appropriate:-
Code:
find /path/to/sourcedir -type f -name "*${FILEMASK}*" -mtime +90

This will list off all the files that are over 90 days old. You can then adjust it to be:-
Code:
find /path/to/sourcedir -type f -name "*${FILEMASK}*" -mtime +90 -exec echo rm {} \+

..... to remove the files (the echo only shows the rm commands that would run; drop it to actually delete). If it's a Dropbox, tell people to get anything they need saved somewhere sensible within 2 months, then delete anything over 3 months old (to allow a bit of grace).

That would make tidying up the reference file/table/index a little easier if that has a timestamp built in to the record.

It might be rather harsh, but the alternative is that you will fill your disk and have nowhere to go with an ever growing problem. Just my opinion though.
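If that retention policy gets adopted, the sweep could even be scheduled instead of run by hand - a rough sketch only, assuming cron is available; the path and the 90-day cutoff are placeholders:

Code:
# nightly at 02:30: delete files older than 90 days under the shared tree
30 2 * * * find /path/to/dropbox -type f -mtime +90 -exec rm {} +
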



Robin
# 6  
Old 08-06-2014
Quote:
Originally Posted by Clyde Lovett
You guys are great - thank you!

I hear you both re: the pre-planning, but actually I'm like the janitor here - housekeeping some shared directories where product photo is put into Dropbox by multiple users / employees etc. Images are placed in folders based on Vendor and sale name and then shared via the cloud. So these images could be buckshot all over the place & duplicated x-number of times as they might be needed in one folder for one particular sale and then some duplication to cover another sale with a different vendor and so forth. We have thousands of product SKUs and on this occasion these 300 are now sold out and we will not restock them so the task is to remove any photos that have the SKU number in the file name ... hence my conundrum and visit to this forum - which I am glad to have discovered!

I spend my day as the staff photographer but because I have some aptitude for computers I get the IT hat thrown at me on a regular basis. Hopefully this sheds some light on things Smilie
If this is any help: you have my pity. ;-))


Quote:
Originally Posted by Clyde Lovett
So a couple of things
1. I don't mind waiting for it to run - I can even run something overnight if it's going to take a very long time.
OK, but in this case error reporting is a MUST. Suppose there are some 1000 files to move and, as the script works on the 457th of them, something goes wrong. How would you find out? And how would you correct that? Or, another scenario: the 457th and every following move fails (because, say, the disk is full). How are you going to correct that?

As general advice: it doesn't matter much that an automated procedure fails from time to time, but it should do so with a traceable, intelligible error message. Compare "could not move /source/fileA to /target/B because disk is full. Aborting..." to "error in line 153. Exit." and ask yourself which better supports a prospective attempt at error correction.

If you do not want to program complex reporting features into your program you might want to sit and watch it run so that you can react immediately.
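A middle ground, sketched here only as a suggestion (the logfile path is a placeholder): instead of a bare mv, log every failed move to a file you can inspect after an overnight run:

Code:
# sketch: record failures with a traceable message instead of failing silently
LOG=/path/to/move_errors.log
if ! mv "$MOVEFILE" "/path/to/targetdir/$FNAME" ; then
     echo "could not move $MOVEFILE to /path/to/targetdir/$FNAME (disk full? permissions?)" | tee -a "$LOG" >&2
fi

Anything that ends up in that logfile tells you exactly which file to retry and where it was supposed to go.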

Quote:
Originally Posted by Clyde Lovett
2. As the script finds and moves files, it would be great if there is no over-write of files with duplicate filenames, but rather if a duplicate filename is found that it just append the duplicate filename with something like copy, copy1, copy2, etc.
My sketch of a script attempts that (see the commented version below). I was under the impression that every filename could occur only once more per run of the script, so only provisions for one additional copy per run are in place. It should be trivial to add code to cover for more (see the end of this post).


Quote:
Originally Posted by Clyde Lovett
a. what identifies the file that is the list of my "SKUs" (the search criteria)?
My script expects a file "/path/to/list.of.filemasks" (see the last line) containing the search criteria, one per line. An asterisk is prepended and appended to every criterion to create a wildcard expression. A possible content would look like this:

Code:
foo.jpg
bar.gif
baz

which would process all files matching "*foo.jpg*", then "*bar.gif*", then "*baz*", etc.

Quote:
Originally Posted by Clyde Lovett
b. is my file with the list of SKUs a text file, one SKU per line? (probably yes)
As said above, yes.

Quote:
Originally Posted by Clyde Lovett
c. in what way to I save this code and run it - I gather this is a script - what do I do with it (now I'm really showing my "dummy" status on this, but I learn fast so hang in there with me!).
First, you copy it and save it as a simple text file. The name and extension do not matter; take whatever you like. I suggest you explicitly state which shell is to execute it, so add a line like this as the first line:

Code:
#! /path/to/some/shell

If you do not know which shell to use: issue "echo $SHELL" at the command line and take its output. Here is an example from one of my systems; yours might look different:

Code:
$ echo $SHELL
/usr/bin/ksh

$ cat template.ksh 
#! /usr/bin/ksh
# ----------------------------------------------------------------------
# template.ksh                               template for ksh scripts/functions
# ----------------------------------------------------------------------
...

Notice that everything after "#" is treated as a comment, but the first line (also called the "shebang") has to be exactly as it is; e.g. inserting a space before "#!" would stop it from working.

Now you need to make this file executable: execute

Code:
chmod 754 /your/filename

This sets read, write and execute rights for you (7), read and execute for members of your group (5) and read only for all other users (4). After this you can execute the file. Notice, though, that the current directory is NOT automatically in the path, unlike in Windoze. To execute a file in your current directory issue "./filename", not "filename".
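For example (move_skus.sh is just a made-up name for the file you saved the script into):

Code:
$ chmod 754 move_skus.sh     # make the saved text file executable
$ ./move_skus.sh             # run it from the current directory
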

Here is a commented version of my script. I have put echo statements in place of the processing parts, so that you can try it out and see the inner workings:

Code:
while read FILEMASK ; do
     echo $FILEMASK
done < /path/to/list.of.filemasks

This reads the file /path/to/list.of.filemasks and puts each line's content into the variable FILEMASK.


Code:
while read FILEMASK ; do
     find /path/to/sourcedir -type f -name "*${FILEMASK}*" |\
          while read MOVEFILE ; do
               echo $MOVEFILE
          done
done < /path/to/list.of.filemasks

"find" searches a complete directory hierarchy and produces a list of filenames. As you see it filters for "*FILEMASK*". This is where the content of your list of filemasks comes into play. Every filename found this way is fed to another while-loop and read into a variable "MOVEFILE". The content of this might be "/path/to/sourcedir/sub1/foo.FILEMASK.bar". If the wildcard delivers false positives then tinker withe the argument to "-name". Instead of "*${FILEMASK}*" you might want to try "*${FILEMASK}" (this will find "foo.FILEMASK" but not "FILEMASK.bar"), etc..

Code:
while read FILEMASK ; do
     find /path/to/sourcedir -type f -name "*${FILEMASK}*" |\
          while read MOVEFILE ; do
               echo "BEFORE: $MOVEFILE"
               FNAME="${MOVEFILE##*/}"
               echo "AFTER: $FNAME"
          done
done < /path/to/list.of.filemasks

This part just strips all the path information from the filename and assigns the stripped part to a variable "FNAME", like this:

MOVEFILE: "/path/to/sourcedir/sub1/some.FILEMASK.bla"
FNAME: "some.FILEMASK.bla"

Code:
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    mv "$MOVEFILE" "/path/to/targetdir/${FNAME}.$$"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi

This innermost part checks if the filename already exists at the prospective target place. If yes, the file is moved to a name with the current process-number ("$$") appended, else (if no such target exists), the original name is used.

To cover for multiple instances replace this part with the following:

Code:
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    (( IDX = 1 ))
                    while [ -f "/path/to/targetdir/${FNAME}.${IDX}" ] ; do
                         (( IDX += 1 ))
                    done
                    mv "$MOVEFILE" "/path/to/targetdir/${FNAME}.${IDX}"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi

If no file of that name exists yet in the target, the file is simply moved (the else-part). If such a file already exists, a counter is initialized to "1" and incremented until a free name is found: the "while [ -f ...]" tests for "file.1", then "file.2", etc., until it finds a name that is not already taken. That name is then used to move the file.
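Since the original request was for names like CP37-BL_copy1.jpg rather than CP37-BL.jpg.1, here is a variant of that replacement block, offered only as a sketch; it assumes every filename has an extension after a final dot, which may not hold for all of your files:

Code:
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    BASE="${FNAME%.*}"        # name without the extension
                    EXT="${FNAME##*.}"        # extension after the last dot
                    (( IDX = 1 ))
                    while [ -f "/path/to/targetdir/${BASE}_copy${IDX}.${EXT}" ] ; do
                         (( IDX += 1 ))
                    done
                    mv "$MOVEFILE" "/path/to/targetdir/${BASE}_copy${IDX}.${EXT}"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi
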

I hope this helps.

bakunin

 