One directory, extracting only files to a new directory


 
# 1  
Old 09-05-2015

I need a hint for extracting and copying only the many .mp3 files from one directory and its subdirectories, so that they all end up in a single directory. The way I am doing it now gives me an identical copy with hundreds of subdirectories, and I just can't get it to extract only the files. I tried cpio as well, with even less success. Any hints on how to handle this task? Thanks in advance, really. BTW, I need the checksum routine to avoid duplicates. That is the implicit second task.

Code:
find /home/directory/origin ./ -iname "*.mp3" -exec sha256sum {} \; | 
sort -k 64 > /tmp/runner.txt | cp -nr /home/directory/origin /home/directory/destination | cat -n /tmp/runner.txt | cut -d " " -f1; exit 0;


Last edited by 1in10; 09-05-2015 at 09:43 PM.. Reason: one liner too long
# 2  
Old 09-06-2015
Try this and see if it gives you an idea of how to continue.
Code:
find /home/directory/origin -type f -iname "*.mp3" -exec sha256sum {} \+ | sort -u -k1,1 | cut -c 67- | xargs -I{} echo "cp {} /home/directory/destination" > copying_list_test

A file named copying_list_test will be created. Take a look at it to see what would have been done. If that's what you want, modify the command to run as:
Code:
find /home/directory/origin -type f -iname "*.mp3" -exec sha256sum {} \+ | sort -u -k1,1 | cut -c 67- | xargs -I{} cp {} /home/directory/destination
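A note on why cut -c 67- is used: GNU sha256sum prints 64 hex digits, then two separator characters, then the file name, so the name starts at column 67. Here is a minimal sketch for checking this on your own system. It assumes GNU coreutils, and the scratch file name is made up for the demonstration.

```shell
# Create a scratch directory and a file whose name contains spaces.
tmp=$(mktemp -d)
printf 'dummy audio data\n' > "$tmp/01 - Some Song.mp3"

# One line of sha256sum output: <64 hex digits><2 characters><file name>
line=$(sha256sum "$tmp/01 - Some Song.mp3")

# Characters 1-64 are the checksum, characters 67 onward are the path.
hash=$(printf '%s\n' "$line" | cut -c 1-64)
name=$(printf '%s\n' "$line" | cut -c 67-)

printf 'hash length: %s\n' "${#hash}"
printf 'file name:   %s\n' "$name"
```

One caveat: if a file name contains a backslash or a newline, GNU sha256sum escapes it and starts the line with a backslash, which shifts the columns by one.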


Last edited by Aia; 09-06-2015 at 01:30 AM..
# 3  
Old 09-06-2015
okay, I'll do so, just woke up, returning with the result in a few hours, thanks a lot for your quick reply.

---------- Post updated at 06:56 AM ---------- Previous update was at 06:06 AM ----------

Code:
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option

Whole petaflops on duty, waiting for a result, and this is what shows up first. The stat command was not involved, yet a stat error surges on some file. BTW, the same message appeared with different versions of my one-liner.

Code:
cp: calling of  stat for „/home/directory/origin/transit/World Party - Goodbye Jumbo - 05 - Aint Gonna Come Till Im Ready.mp3“ not possible: file or directory not found.

As if there should be a call to stat %A, though I do not understand this. And the destination directory, which should hold the content of hundreds of subdirectories, contains just one single tune. Something went wrong. Reading the list, it just lists one tune after another, a single tune out of each album that contains up to twenty songs. I am afraid this is not the result I expected.

---------- Post updated at 03:49 PM ---------- Previous update was at 06:56 AM ----------

I hardly dare to offer my third attempt to flatten the hierarchy, but in a way it works. So here comes an attempt without the checksum and without cat. Be aware that, as a user, you should run it from within the origin directory. It gave me some headache to do this as root.

Code:
find . -type f -iname "*.mp3" -exec cp -nr {} destination \;

A bit better:

Code:
mkdir /path/to/destination | find /home/path/origin -type f -iname "*.mp3" -exec cp -nr {} destination \;

Looking at the result, two questions remain. The original directory has a size of 22.8 GB; after shoveling the files around to flatten the hierarchy, the result is 22.1 GB. Why?
And BTW, could anyone help me to still insert the checksum and cut the value off? Thanks in advance.

Last edited by 1in10; 09-06-2015 at 03:51 PM.. Reason: reading the edited list and understanding the result, better switch
# 4  
Old 09-06-2015
I'm having trouble understanding your code.
The mkdir utility doesn't write anything to standard output and the find utility and the cp utilities it invokes don't read anything from standard input. Why do you have a pipeline connecting those two commands?
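If the intent was simply "create the destination, then copy", the shell's && list expresses that: the second command runs only if the first one succeeded. Here is a hedged sketch, with flatten_mp3 as a made-up function name and both paths passed as arguments. The -r option is dropped from cp because find already hands over plain files.

```shell
# flatten_mp3 SRC DEST
# Copy every .mp3 below SRC into the single directory DEST.
# (Hypothetical helper name; not from the thread.)
flatten_mp3() {
    src=$1
    dest=$2
    # mkdir -p succeeds even if DEST already exists; && runs find only on success.
    # cp -n never overwrites a file that is already present in DEST.
    mkdir -p "$dest" &&
        find "$src" -type f -iname '*.mp3' -exec cp -n {} "$dest" \;
}
```

It would be called as flatten_mp3 /home/path/origin /path/to/destination. Keep the destination outside the source tree, otherwise find will also walk the copies it has just made.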

And, in the 1st post in this thread you had the single pipeline:
Code:
find /home/directory/origin ./ -iname "*.mp3" -exec sha256sum {} \; | 
sort -k 64 > /tmp/runner.txt | cp -nr /home/directory/origin /home/directory/destination | cat -n /tmp/runner.txt | cut -d " " -f1; exit 0;

where sort (with the arguments presented) does not write anything to standard output, cp (with the arguments presented) does not read anything from standard input, cp does not write anything to standard output, and cat (with the arguments presented) does not read anything from standard input. So, in this pipeline, the find, sort, cat, and cut have absolutely no effect on how the cp command in that pipeline behaves.
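The effect of a redirection in the middle of a pipeline is easy to demonstrate. In this small sketch (the scratch file is created with mktemp just for the demo), sort sends its output to a file, so the next command in the pipeline reads an empty stream:

```shell
tmp=$(mktemp)

# sort's standard output goes to the file, so nothing flows through the pipe.
piped=$(printf 'b\na\n' | sort > "$tmp" | wc -c)

# After the pipeline has finished, the sorted data is in the file.
saved=$(cat "$tmp")

printf 'bytes through the pipe: %s\n' "$piped"
printf 'file contents:\n%s\n' "$saved"
rm -f "$tmp"
```

The one-liner has a further catch: all stages of a pipeline start at the same time, so cat -n /tmp/runner.txt may read the file before sort has finished writing it. Running the steps one after another avoids that.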

I don't have a sha256sum utility on my system. So I can't see what the 64th field is in the output it produces, and I don't understand why sorting on the 64th field is important when it looks like all that happens is that you print the 1st field to the terminal after sorting the data into a file, reading the file back in, and piping it into cut. There is nothing here that makes any attempt to compare the checksum for any file to the checksum of any other file.

If you are trying to copy files from multiple source directories into a single target directory, why does a checksum on a source file or on a target file matter? Only one file with a given name can exist in your target directory. Why does a checksum matter in deciding whether or not you want to replace an existing file in your destination directory?
# 5  
Old 09-06-2015
Well, sha256sum is present on this system. The point is that among thousands of files there may be one that looks identical to another at first glance but, when you listen to it, isn't. There are tons of samples, for example, not only complete songs.
So I want to
- first create the new destination,
- then look for the mp3 files and compute their checksums,
- sort them by checksum into a given .txt file,
- print this to the display after cutting off the first field (the checksum),
- and finally copy them to the new destination.
mkdir should only create the new destination.
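The steps above can be sketched as follows. This is only one possible reading of the list, assuming GNU coreutils (sha256sum) and file names without newlines or backslashes; copy_unique_mp3 and its list-file argument are names made up for this sketch. The pieces run one after another and are connected by a pipe or a file, not chained inside a single find.

```shell
# copy_unique_mp3 SRC DEST LIST
# Hypothetical helper: hash every .mp3 under SRC, keep one file per checksum,
# record the sorted checksum list in LIST, print the kept names, copy them to DEST.
copy_unique_mp3() {
    src=$1 dest=$2 list=$3

    # 1. Create the destination first; stop if that fails.
    mkdir -p "$dest" || return 1

    # 2.+3. Checksum all .mp3 files and sort on the checksum (field 1).
    #       sort -u -k1,1 keeps a single line per distinct checksum.
    find "$src" -type f -iname '*.mp3' -exec sha256sum {} + |
        sort -u -k1,1 > "$list"

    # 4.+5. Cut off the 64-character checksum plus 2 separator characters,
    #       then print and copy each remaining path, one line at a time.
    cut -c 67- "$list" |
    while IFS= read -r path; do
        printf '%s\n' "$path"
        cp -n "$path" "$dest"/
    done
}
```

A call might look like copy_unique_mp3 /home/directory/origin /home/directory/destination /tmp/runner.txt. Files with different content but the same name still collide in the flat destination; with cp -n the first one copied stays.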

And
Code:
find

could come along like
Code:
./ -iname

or like
Code:
find . -type f

I tried diff3, diff, fdupes, Fslint, compare, nothing worked properly. So I want to chain these -exec-functions.

Code:
\( -exec sha256sum {} \; -o -exec sort -k64 > /tmp/runner.txt {} \;  -o -exec cut -d " " -f1 {} \; -o -exec cp -nr /path/to/origin  /path/to/destination {}\; \)

okay, for top pros this is rubbish, but I'll keep on trying. Sorry to bother you.
# 6  
Old 09-06-2015
Quote:
Originally Posted by Don Cragun
If you are trying to copy files from multiple source directories into a single target directory, why does a checksum on a source file or on a target file matter? Only one file with a given name can exist in your target directory. Why does a checksum matter in deciding whether or not you want to replace an existing file in your destination directory?
I took it as a way to prevent copying the same file under different names.

Hi 1in10,

I am going to address just your initial follow up post after my suggestion in post #2, since you appear to be a moving target.

Quote:
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
This happens because xargs does not like to receive arguments in the input stream if they contain a single unmatched quote. Apparently, some of your songs have an apostrophe in their file names.

Quote:
cp: calling of stat for „/home/directory/origin/transit/World Party - Goodbye Jumbo - 05 - Aint Gonna Come Till Im Ready.mp3“ not possible: file or directory not found.
The meaning of this error is that cp cannot find a song named "World Party - Goodbye Jumbo - 05 - Aint Gonna Come Till Im Ready.mp3" in the directory /home/directory/origin/transit to copy.

Go there manually and check whether that file exists as reported by the error.
Pay attention to whether Aint has an apostrophe in the real file name, as in Ain't. If it does, the apostrophe is being removed somewhere along the pipeline, most likely by xargs, which treats a pair of apostrophes as quote characters and strips them.

To address the issue of the apostrophe, I am not quite sure if any of the following techniques might help, but you may try:

Code:
find /home/directory/origin -type f -iname "*.mp3" -exec sha256sum {} \+ |
 sort -u -k1,1 | cut -c 67- | tr \\n \\0 | xargs -0 -I{} cp {} /home/directory/destination

The tr command replaces every newline with a NUL character, and xargs -0 is told to separate arguments at NUL. Done this way, my version of xargs ignores the apostrophe and leaves it alone. I am following the suggestion given by the original error:
Quote:
by default quotes are special to xargs unless you use the -0 option
or

Code:
find /home/directory/origin -type f -iname "*.mp3" -exec sha256sum {} \+ |
sort -u -k1,1 | cut -c 67- | sed "s/'/\\\'/g"  | xargs -I{} cp {} /home/directory/destination

In this version, sed is escaping any apostrophes; by the time xargs sees the stream they are protected by a `\' and treated correctly. That's the hope.
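A third possibility, assuming GNU xargs: the -d option sets the input delimiter explicitly and, like -0, switches off the special treatment of quotes and backslashes, so no tr or sed step is needed. Here is a small self-contained sketch with made-up scratch paths:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/origin" "$tmp/destination"
printf 'dummy\n' > "$tmp/origin/Ain't Gonna Come.mp3"

# -d '\n': one argument per input line; quotes and backslashes are taken literally.
find "$tmp/origin" -type f -iname '*.mp3' |
    xargs -d '\n' -I{} cp {} "$tmp/destination"

ls "$tmp/destination"
```

Like the other two variants, this still breaks on file names that contain a newline.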

Quote:
[...]Something went wrong. And the directory with the content of hundreds of subdirectories just contains one single tune.[...]
This could happen if the whole process stopped prematurely, or if sha256sum tagged those files with the same hash, which sort -u would have removed from the stream passed to xargs.

You can troubleshoot that by just running part of the command up to sha256sum and checking manually what's being done there.
Code:
find /home/directory/origin -type f -iname "*.mp3" -exec sha256sum {} \+

Check the hashes of some of the files that you consider missing.
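To see which files share a checksum, rather than silently dropping them with sort -u, something like the following may help. It assumes GNU uniq, whose -w 64 option compares only the first 64 characters (the checksum) and whose -D option prints every line of each duplicate group. show_duplicate_mp3 is a name invented for this sketch.

```shell
# show_duplicate_mp3 SRC
# Print the checksum lines of all .mp3 files under SRC whose content
# is identical to that of at least one other file.
show_duplicate_mp3() {
    find "$1" -type f -iname '*.mp3' -exec sha256sum {} + |
        sort |
        uniq -w 64 -D
}
```

Running show_duplicate_mp3 /home/directory/origin lists the duplicates grouped by hash, so you can decide by hand which copy of each group to keep.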

Last edited by Aia; 09-06-2015 at 09:25 PM..
# 7  
Old 09-06-2015
You're not bothering me! I just don't understand what you're trying to do.

Please show us a sample of the output produced by the command:
Code:
sha256sum file

with file replaced by the name of one of the files you are processing.

If you sort the output of sha256sum file on the field that contains the checksum it produces, then lines in the output with the same checksum would presumably be copies of each other.

But, if you have several files with the same name with different checksums, the checksum will not tell you anything at all about which file contains the longest recording nor about which file has the highest quality audio.

Please just explain how your code is supposed to determine which file you want to end up in your destination directory if more than one pathname in your source file hierarchy has the same final component.
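Whether this matters in practice can be checked before copying anything. The sketch below lists file names that occur more than once in the source hierarchy. It assumes GNU find for -printf '%f\n' (the name without leading directories), and list_name_clashes is a made-up name.

```shell
# list_name_clashes SRC
# Print every .mp3 base name that appears in more than one directory below SRC.
list_name_clashes() {
    find "$1" -type f -iname '*.mp3' -printf '%f\n' |
        sort |
        uniq -d
}
```

Each name it prints could land in a flat destination directory only once. Such clashes would be one possible explanation for a flattened copy that is smaller than the original.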

The only thing in your current code that copies files into your destination directory is the simple command:
Code:
cp -nr /home/directory/origin /home/directory/destination

which recursively copies the file hierarchy rooted in /home/directory/origin, subdirectories included, into /home/directory/destination (as /home/directory/destination/origin if the destination directory already exists). The -n option only keeps cp from overwriting a file that already exists at the same path in the destination; it neither flattens the hierarchy nor restricts the copy to .mp3 files. The order in which cp walks the source hierarchy is unspecified.

Assuming that the destination directory exists at the time you invoke the above cp command, nothing in the commands you have shown us (except that cp command) has any effect on copying files or on selecting which files will be copied to the destination directory.