find -size -7M finds files, but won't cp them all


 
# 8 (12-29-2010)
Quote:
Originally Posted by unclecameron
Code:
find /somefolder -type f -size -7M -exec /bin/cp -v {} /someotherfolder/ \;

Have you tried with xargs?

Code:
find /somefolder -type f -size -7M | xargs -I SourceFile cp -v SourceFile /path/to/some_other_folder
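(Note: the lowercase `-i` flag is deprecated in GNU xargs; `-I` is the current spelling.) If any of those filenames contain spaces, a plain pipe into xargs will split them apart. A null-delimited sketch, using throwaway directories as stand-ins for /somefolder and the target:

```shell
# Minimal sketch: null-delimited find | xargs so filenames containing
# spaces survive the pipe. -print0 / -0 are GNU/BSD extensions, not POSIX.
src=$(mktemp -d); dst=$(mktemp -d)
printf 'data' > "$src/file with spaces.csv"
printf 'data' > "$src/plain.csv"

find "$src" -type f -size -7M -print0 | xargs -0 -I SourceFile cp -v SourceFile "$dst/"
```

With `-I`, xargs runs one cp per file anyway, so the gain over `-exec` here is the null delimiting, not batching.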

# 9 (12-29-2010)
One way to test the theory that there are duplicate filenames.

Search for, say, the first 5 duplicate filenames, then re-search the folder tree to find out where they are.

Code:
find /somefolder -type f -size -7M -exec basename {} \; | sort | \
uniq -d | head -5 | while read filename
do
        find /somefolder -type f -name "${filename}" -exec ls -lad {} \;
done

# 10 (01-03-2011)
zedex -> that returned 173 files, so I'm not sure exactly what happened

methyl -> that returned only 8 duplicate file names. I searched for duplicatfile.that.it.found.csv and indeed found only 2 copies, which is what your script predicted.

Now I'm really baffled
# 11 (01-03-2011)
@unclecameron
My script limited the number of times we searched for duplicates (or it would have run for hours and hours).

This seems to be a design error. You are copying files from multiple directory trees and some of those filenames are not unique. This does not mean that the data in the duplicate files is identical (the actual content is a matter for local knowledge), and there is a real danger of arbitrarily overwriting one file with another which happens to have the same name.
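If the copy really has to stay flat, one way to at least stop the silent overwrites is GNU cp's numbered backups (a GNU coreutils extension, not POSIX). A small sketch with throwaway directories standing in for the real trees:

```shell
# Sketch: duplicate basenames from two subtrees land in one flat target
# directory; --backup=numbered keeps both instead of overwriting.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/a" "$src/b"
echo "first"  > "$src/a/report.csv"
echo "second" > "$src/b/report.csv"

find "$src" -type f -size -7M -exec cp -v --backup=numbered {} "$dst/" \;

ls "$dst"    # report.csv and report.csv.~1~
```

The backup still has an opaque name, so this only preserves the data; it does not tell you which source tree each copy came from.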

We seem to have reached stalemate. The "rsync" option in an earlier post seems sensible, but you do not wish to preserve the directory tree for some reason unknown.
I don't recall a post where you had a reason for this exercise.

In similar circumstances I have created a prefix for every copied filename composed from the source directory name. With good local knowledge of the filenames this is quite feasible. You could, for example, substitute "__" for every "/" in the full hierarchical filename to create a unique name in your copy directory.
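A sketch of that flattening idea, assuming no existing filename already contains "__" (the directories here are throwaway stand-ins):

```shell
# Flatten each file's relative path into a unique name in the copy
# directory by turning every "/" into "__".
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/a" "$src/b"
echo 1 > "$src/a/dup.csv"
echo 2 > "$src/b/dup.csv"

find "$src" -type f -size -7M | while read -r f
do
        # strip the source prefix, then substitute "__" for each "/"
        flat=$(printf '%s\n' "${f#$src/}" | sed 's|/|__|g')
        cp -v "$f" "$dst/$flat"
done
```

The original path is recoverable by reversing the substitution, which is exactly why the "no __ in filenames" assumption matters.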
# 12 (01-03-2011)
The reason is that I am having trouble with a process I need to run which iterates through around 88K files, running a binary app on each of them, and speed is the key. It turns out in testing that the optimum is 6 threads running at the same time, each on a share of the files sorted by size, and that if the directory structure is preserved within those sorts it adds around 30% to the processing time. So I start the laborious process of cp'ing them as files with no path into /somedir and then running md5sum against them to correlate each copy to its original file/path. It's really a pain, but I can't modify the binary app I'm running against them, and I need the overall processing time to be under a specific limit, so we've added memory, CPUs, servers, more RAID volumes, and mounted /tmp on SSDs to make it happen, all of which helped. This is a dynamic set of files which changes, so we have to re-sync/re-sort every few days.
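For that md5sum correlation step, a manifest written at copy time saves re-hashing everything afterwards. A rough sketch (md5sum is GNU coreutils; the paths are stand-ins for the real trees):

```shell
# Copy flat and record "checksum  original-path" as we go, so each
# flat copy can be traced back to its source later.
src=$(mktemp -d); flatdir=$(mktemp -d)
mkdir -p "$src/x"
echo "payload" > "$src/x/input.csv"

find "$src" -type f | while read -r f
do
        cp "$f" "$flatdir/"
        md5sum "$f" >> "$flatdir/manifest.txt"
done
```

To map a flat copy back, hash it and look the sum up in manifest.txt; identical-content duplicates will share a sum, which is worth knowing before relying on this.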
# 13 (01-04-2011)
An alternative approach.
A sort by file size followed by a deal out to 6 lists should take about 10-15 mins as long as we use "du" not "ls".
I don't have your Operating System so do check the commands.

Code:
counter=0
du -ak /somefolder|sort -n -r|while read size filename
do
        counter=`expr ${counter} + 1`
        if [ ${counter} -gt 6 ]
        then
                counter=1
        fi
        if [ ! -f "${filename}" ]
        then
                continue
        fi
        echo "${filename}" >> "/tmp/selection_${counter}"
done
#
wc -l /tmp/selection_*
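The six selection lists can then drive six background workers. A sketch with two small lists and a stand-in command (substitute the real binary app for the echo):

```shell
# One background worker per selection list; "wait" blocks until all
# workers finish. Lists here are tiny throwaway examples.
workdir=$(mktemp -d)
printf '%s\n' one two > "$workdir/selection_1"
printf '%s\n' three   > "$workdir/selection_2"

for list in "$workdir"/selection_*
do
        (
                while read -r filename
                do
                        # the real binary app would run here
                        echo "processed $filename"
                done < "$list"
        ) > "$workdir/log_$(basename "$list")" &
done
wait
```

Because the lists were dealt out by descending size, each worker gets a roughly equal byte load rather than an equal file count.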

 