Find a list of files in directory, move to new, allow duplicates


 
# 1  
Old 08-04-2014
Find a list of files in directory, move to new, allow duplicates

Greetings. I know enough Unix to be dangerous (!), and I know there is a clever way to do the following. It will save me about a day of agony this time, and I will use it forever after (many days of agony saved in the future)!

Basically
I need to find any image files (JPGs, PSDs, etc.) whose filenames contain any one of about 300 different product SKUs, searching a directory and its subdirectories, and then move them into one new directory - on my desktop, for example - keeping (not overwriting) duplicates.


Spelled out in detail:

find
I need to search a directory and all its subdirectories for any filenames that contain a certain set of characters (e.g. the fragment CP37-BL, which might appear as CP37-BL.jpg or CP37-BL+140806.jpg, and there might be several CP37-BL.jpg files in multiple different subdirectories). I have 300 DIFFERENT finds to perform and would love to be able to cut and paste a space- or comma-separated list, or draw all the different filename portions from a .txt or .csv file, e.g.:
Code:
CP36-BL
CP36-BR
CP36-GR
CP36-OR
CP36-PK

-- matching any file whose name contains any of those fragments. The above is 5 of the 300. I would like to run the command once, not 300 times.

mv
So, I need to find all those different files and MOVE them to a particular target directory - AND allow duplicates (e.g. if there are 2 CP37-BL.jpg files then I want to keep both, appending a _v1 or _copy1 or something to the end of the filename in the case of duplicates).

That's it!
Any thoughts? And THANK YOU in advance!


PS: I am running this in Terminal on Apple OS X 10.9.4.

# 2  
Old 08-05-2014
Your duplicates requirement is not met in this code - I did not quite get it. Sorry.
Code:
find /path/to/files -type f | grep -Ff text_file_with_file_names > outputfile

while read filename
do
  mv "$filename" /path/to/somewhere/
done < outputfile

That said - DO NOT agglomerate zillions of user data files ad hoc in one file tree.
It would have been entirely possible to park those files in directories with meaningful names - i.e., named after the text you are using as a key. Then you could simply look for a directory and go from there. Pre-planning beats a kludge like this every time.

PS: you only want to run this ONE time, which is just as well, because there is a huge performance penalty:
grep -Ff has to check every filename that find produces against all 300 lines of the pattern file. And since you seem to have large numbers of files, be prepared to wait.
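To make that concrete: text_file_with_file_names is assumed to be a plain text file holding one SKU fragment per line (the fragments below are taken from the original post). grep -F treats each line as a fixed string, so every path containing any of the fragments lands in outputfile:

Code:
$ cat text_file_with_file_names
CP36-BL
CP36-BR
CP36-GR
CP36-OR
CP36-PK

$ find /path/to/files -type f | grep -Ff text_file_with_file_names > outputfile
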

Stuff like this should probably be written in C. dirent.h is your friend.
# 3  
Old 08-05-2014
Quote:
Originally Posted by jim mcnamara
Code:
find /path/to/files -type f | grep -Ff text_file_with_file_names > outputfile

while read filename
do
  mv "$filename" /path/to/somewhere/
done < outputfile

That said - DO NOT agglomerate zillions of user data files ad hoc in one file tree.
It would have been entirely possible to park those files in directories with meaningful names - i.e., named after the text you are using as a key. Then you could simply look for a directory and go from there. Pre-planning beats a kludge like this every time.
Concur! Especially with the last sentence!

Quote:
Originally Posted by jim mcnamara
PS: you only want to run this ONE time, which is just as well, because there is a huge performance penalty:
grep -Ff has to check every filename that find produces against all 300 lines of the pattern file. And since you seem to have large numbers of files, be prepared to wait.
True. Here is my take on it, based on yours. The rationale is that after the first scan the inode list is cached in memory, so the subsequent finds will go a lot faster:

Code:
while read FILEMASK ; do
     find /path/to/sourcedir -type f -name "*${FILEMASK}*" |\
          while read MOVEFILE ; do
               FNAME="${MOVEFILE##*/}"
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    mv "$MOVEFILE" "/path/to/targetdir/${FNAME}.$$"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi
          done
done < /path/to/list.of.filemasks

Note that no safety provisions are in place: checks for exhausted disk space, successful move operations, etc. are all missing, and you should add them before putting this sketch into a fire-and-forget script.
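For instance, here is a minimal sketch of one such check - a free-space test before the loops start. The 1 GB threshold and the target path are only assumptions for you to adjust:

Code:
# sketch: refuse to start if the target filesystem has less than ~1 GB free
FREE_KB=$(df -k /path/to/targetdir | awk 'NR==2 {print $4}')
if [ "$FREE_KB" -lt 1048576 ] ; then
     echo "less than 1 GB free on /path/to/targetdir - aborting" >&2
     exit 1
fi
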

I hope this helps.

bakunin
# 4  
Old 08-05-2014
You guys are great - thank you!

I hear you both re: the pre-planning, but actually I'm like the janitor here - housekeeping some shared directories where product photos are put into Dropbox by multiple users/employees. Images are placed in folders based on vendor and sale name and then shared via the cloud, so these images could be buckshot all over the place and duplicated x number of times, as they might be needed in one folder for one particular sale and then duplicated again to cover another sale with a different vendor, and so forth. We have thousands of product SKUs, and on this occasion these 300 are now sold out and will not be restocked, so the task is to pull out any photos that have the SKU number in the file name ... hence my conundrum and visit to this forum - which I am glad to have discovered!

I spend my days as the staff photographer, but because I have some aptitude for computers I get the IT hat thrown at me on a regular basis. Hopefully this sheds some light on things.

So, a couple of things:
1. I don't mind waiting for it to run - I can even run something overnight if it's going to take a very long time.

2. As the script finds and moves files, it would be great if files with duplicate filenames are not overwritten; rather, if a duplicate filename is found, it should just append something like copy, copy1, copy2, etc. to the duplicate's filename.
and ...

3. I understand this is a script and that there are some things I need to replace here - like path/to/targetdir ... but a few questions:
a. What identifies the file that is the list of my "SKUs" (the search criteria)?
b. Is my file with the list of SKUs a text file, one SKU per line? (Probably yes.)
c. In what way do I save this code and run it? I gather this is a script - what do I do with it? (Now I'm really showing my "dummy" status on this, but I learn fast, so hang in there with me!)

Thank you again - I appreciate expanding my knowledge in this area and thank you all for your time.

Clyde
# 5  
Old 08-06-2014
The suggestion you have from bakunin does try to avoid overwriting: if a matching file name is already present in the target, it moves the file to a name suffixed with the current process id using $$ (which is likely to be unique every time). However, if these files are being referenced somehow, how will you update the index that points to them?

Might I suggest something more like this would be appropriate:-
Code:
find /path/to/sourcedir -type f -name "*${FILEMASK}*" -mtime +90

This will list off all the files that are over 90 days old. You can then adjust it to be:-
Code:
find /path/to/sourcedir -type f -name "*${FILEMASK}*" -mtime +90 -exec echo rm {} \+

..... to remove the files (the echo only shows the rm commands that would run; drop it to actually delete). If it's a Dropbox, tell people to get anything they need saved somewhere sensible within 2 months, then delete anything over 3 months old (to allow a bit of grace).

That would make tidying up the reference file/table/index a little easier if that has a timestamp built in to the record.

It might be rather harsh, but the alternative is that you will fill your disk and have nowhere to go with an ever growing problem. Just my opinion though.
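If that retention policy gets adopted, the sweep could even be scheduled instead of run by hand - a rough sketch only, assuming cron is available; the path and the 90-day cutoff are placeholders:

Code:
# nightly at 02:30: delete files older than 90 days under the shared tree
30 2 * * * find /path/to/dropbox -type f -mtime +90 -exec rm {} +
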



Robin
# 6  
Old 08-06-2014
Quote:
Originally Posted by Clyde Lovett
You guys are great - thank you!

I hear you both re: the pre-planning, but actually I'm like the janitor here - housekeeping some shared directories where product photo is put into Dropbox by multiple users / employees etc. Images are placed in folders based on Vendor and sale name and then shared via the cloud. So these images could be buckshot all over the place & duplicated x-number of times as they might be needed in one folder for one particular sale and then some duplication to cover another sale with a different vendor and so forth. We have thousands of product SKUs and on this occasion these 300 are now sold out and we will not restock them so the task is to remove any photos that have the SKU number in the file name ... hence my conundrum and visit to this forum - which I am glad to have discovered!

I spend my day as the staff photographer but because I have some aptitude for computers I get the IT hat thrown at me on a regular basis. Hopefully this sheds some light on things Smilie
If this is any help: you have my pity. ;-))


Quote:
Originally Posted by Clyde Lovett
So a couple of things
1. I don't mind waiting for it to run - I can even run something overnight if it's going to take a very long time.
OK, but in this case error reporting is a MUST. Suppose there are some 1000 files to move and, as the script works on the 457th of them, something goes wrong. How would you find out? And how would you correct that? Or, another scenario: the 457th and every following move fails (because, say, the disk is full). How are you going to correct that?

As general advice: it doesn't matter much that an automated procedure fails from time to time, but it should do so with a traceable, intelligible error message. Compare "could not move /source/fileA to /target/B because disk is full. Aborting..." to "error in line 153. Exit." and ask yourself which better supports a prospective attempt at error correction.

If you do not want to program complex reporting features into your program you might want to sit and watch it run so that you can react immediately.
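A middle ground, sketched here only as a suggestion (the logfile path is a placeholder): instead of a bare mv, log every failed move to a file you can inspect after an overnight run:

Code:
# sketch: record failures with a traceable message instead of failing silently
LOG=/path/to/move_errors.log
if ! mv "$MOVEFILE" "/path/to/targetdir/$FNAME" ; then
     echo "could not move $MOVEFILE to /path/to/targetdir/$FNAME (disk full? permissions?)" | tee -a "$LOG" >&2
fi

Anything that ends up in that logfile tells you exactly which file to retry and where it was supposed to go.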

Quote:
Originally Posted by Clyde Lovett
2. As the script finds and moves files, it would be great if there is no over-write of files with duplicate filenames, but rather if a duplicate filename is found that it just append the duplicate filename with something like copy, copy1, copy2, etc.
My sketch of a script attempts that (see the commented version below). I was under the impression that every filename could occur only once more per run of the script, so only provisions for one additional copy per run are in place. It should be trivial to add code to cover for more (see the end of this post).


Quote:
Originally Posted by Clyde Lovett
a. what identifies the file that is the list of my "SKUs" (the search criteria)?
My script expects a file "/path/to/list.of.filemasks" (see the last line) containing the search criteria, one per line. An asterisk is prepended and appended to every criterion to create a wildcard expression. A possible content would look like this:

Code:
foo.jpg
bar.gif
baz

which would process all files matching "*foo.jpg*", then "*bar.gif*", then "*baz*", etc.

Quote:
Originally Posted by Clyde Lovett
b. is my file with the list of SKUs a text file, one SKU per line? (probably yes)
As said above, yes.

Quote:
Originally Posted by Clyde Lovett
c. in what way to I save this code and run it - I gather this is a script - what do I do with it (now I'm really showing my "dummy" status on this, but I learn fast so hang in there with me!).
First, you copy it and save it as a simple text file. The name and extension do not matter; take whatever you like. I suggest you explicitly state which shell is to execute it, so add a line like this as the first line:

Code:
#! /path/to/some/shell

If you do not know which shell to use: issue "echo $SHELL" at the command line and take its output. Here is an example from one of my systems; yours might look different:

Code:
$ echo $SHELL
/usr/bin/ksh

$ cat template.ksh 
#! /usr/bin/ksh
# ----------------------------------------------------------------------
# template.ksh                               template for ksh scripts/functions
# ----------------------------------------------------------------------
...

Notice that everything after "#" is treated as a comment, but the first line (also called the "shebang") has to be exactly as it is; e.g. inserting a space before "#!" would stop it from working.

Now you need to make this file executable: execute

Code:
chmod 754 /your/filename

This sets read, write and execute rights for you (7), read and execute for members of your group (5) and read only for all other users (4). After this you can execute the file. Notice, though, that the current directory is NOT automatically in the path, unlike in Windoze. To execute a file in your current directory issue "./filename", not "filename".
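For example (move_skus.sh is just a made-up name for the file you saved the script into):

Code:
$ chmod 754 move_skus.sh     # make the saved text file executable
$ ./move_skus.sh             # run it from the current directory
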

Here is a commented version of my script. I have put echo statements in place of the processing parts, so that you can try it out and see the inner workings:

Code:
while read FILEMASK ; do
     echo $FILEMASK
done < /path/to/list.of.filemasks

This reads the file /path/to/list.of.filemasks and puts each line's content into the variable FILEMASK.


Code:
while read FILEMASK ; do
     find /path/to/sourcedir -type f -name "*${FILEMASK}*" |\
          while read MOVEFILE ; do
               echo $MOVEFILE
          done
done < /path/to/list.of.filemasks

"find" searches a complete directory hierarchy and produces a list of filenames. As you see it filters for "*FILEMASK*". This is where the content of your list of filemasks comes into play. Every filename found this way is fed to another while-loop and read into a variable "MOVEFILE". The content of this might be "/path/to/sourcedir/sub1/foo.FILEMASK.bar". If the wildcard delivers false positives then tinker withe the argument to "-name". Instead of "*${FILEMASK}*" you might want to try "*${FILEMASK}" (this will find "foo.FILEMASK" but not "FILEMASK.bar"), etc..

Code:
while read FILEMASK ; do
     find /path/to/sourcedir -type f -name "*${FILEMASK}*" |\
          while read MOVEFILE ; do
               echo "BEFORE: $MOVEFILE"
               FNAME="${MOVEFILE##*/}"
               echo "AFTER: $FNAME"
          done
done < /path/to/list.of.filemasks

This part just strips all the path information from the filename and assigns the stripped part to a variable "FNAME", like this:

MOVEFILE: "/path/to/sourcedir/sub1/some.FILEMASK.bla"
FNAME: "some.FILEMASK.bla"

Code:
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    mv "$MOVEFILE" "/path/to/targetdir/${FNAME}.$$"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi

This innermost part checks if the filename already exists at the prospective target place. If yes, the file is moved to a name with the current process-number ("$$") appended, else (if no such target exists), the original name is used.

To cover for multiple instances replace this part with the following:

Code:
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    (( IDX = 1 ))
                    while [ -f "/path/to/targetdir/${FNAME}.${IDX}" ] ; do
                         (( IDX += 1 ))
                    done
                    mv "$MOVEFILE" "/path/to/targetdir/${FNAME}.${IDX}"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi

If no file of that name exists yet in the target, the file is simply moved (the else-part). If such a file already exists, a counter is initialized to "1" and incremented until a free name is found: the "while [ -f ...]" tests for "file.1", then "file.2", etc., until it finds a name that is not already taken. That name is then used to move the file.
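Since the original request was for names like CP37-BL_copy1.jpg rather than CP37-BL.jpg.1, here is a variant of that replacement block, offered only as a sketch; it assumes every filename has an extension after a final dot, which may not hold for all of your files:

Code:
               if [ -f "/path/to/targetdir/$FNAME" ] ; then
                    BASE="${FNAME%.*}"        # name without the extension
                    EXT="${FNAME##*.}"        # extension after the last dot
                    (( IDX = 1 ))
                    while [ -f "/path/to/targetdir/${BASE}_copy${IDX}.${EXT}" ] ; do
                         (( IDX += 1 ))
                    done
                    mv "$MOVEFILE" "/path/to/targetdir/${BASE}_copy${IDX}.${EXT}"
               else
                    mv "$MOVEFILE" "/path/to/targetdir"
               fi
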

I hope this helps.

bakunin

 