Weird 'find' results

 
# 8  
Old 02-16-2017
CentOS 7, kernel 3.10.0-327.el7.x86_64. I've got multiple instances running on both VMware and VirtualBox. I've just tested it on a different CentOS 7 guest and I get similar results: when I use the exact "-size" option, it returns files that don't meet the criteria I specified. P.S. Thanks for the assistance!

---------- Post updated at 09:54 PM ---------- Previous update was at 07:37 PM ----------

Quote:
Originally Posted by Don Cragun
Hi bodisha,
Guessing that the find that you're using behaves differently than the macOS/BSD find utility I'm using and that you really do only want to select files that are exactly of size 1G bytes, try:
Code:
find /home -type f -user database -size 1073741824c -ls

If you're looking for files that are at least 1G bytes, try:
Code:
find /home -type f -user database -size +1073741823c -ls



Here's a screenshot of the problem. It's the same version of CentOS 7 but on a different laptop. As you can see, when I use the "-size 1M" option I get more files than expected. When I use "-size 1000k" I get the results I expected.



Image
# 9  
Old 02-17-2017
I have no idea what is going on with the CentOS (presumably GNU) find utility. With BSD and macOS find, -size 1024k and -size 1M should produce identical results.
# 10  
Old 02-17-2017
@Don Cragun: thanks for the explanation.

@bodisha: your posted screenshot suggests that the file is indeed 1000k in size (which is NOT 1M, since 1M = 1024k), which might add to the confusion.

After Don so concisely refuted my attempt at an explanation, I am at a loss myself as to what is going on.

I hope this helps.

bakunin
# 11  
Old 02-17-2017
Quote:
Originally Posted by bodisha
...
Here's a screenshot of the problem. It's the same version of CentOS 7 but on a different laptop. As you can see, when I use the "-size 1M" option I get more files than expected. When I use "-size 1000k" I get the results I expected.

Image

I am using Debian 8 "Jessie", which has GNU find 4.4.2.
After running your commands on similarly sized files, here's my guess about what is happening.

When you say "-size nS", where "n" is an integer specifying "units of space" and "S" is the suffix (M, k etc.), then the find command searches for files that have a rounded up size of "nS".
That effectively means that the size of the file is > (n-1)S and <= nS.

So, as per this theory, "-size 1M" means files with "rounded up size of 1M", or sizes > 0M and <= 1M. 0M bytes = 0 bytes. Hence files with sizes 183, 2112 etc are displayed.

Code:
$ find . -type f -size 1M -ls
6029496 1000 -rw-r--r--   1 r2d2     r2d2      1024000 Feb 17 02:32 ./large1.log
6029492    4 -rw-r--r--   1 r2d2     r2d2          183 Feb 17 02:29 ./graphical.txt
6029493    4 -rw-r--r--   1 r2d2     r2d2         2112 Feb 17 02:29 ./strace.out
6029494    8 -rw-r--r--   1 r2d2     r2d2         4659 Feb 17 02:29 ./vmstat.out
6029495    8 -rw-r--r--   1 r2d2     r2d2         4660 Feb 17 02:30 ./vmstat1.out
$
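
For what it's worth, the rounding guessed at above can be written down as plain shell arithmetic. This is only a model of the theory, not GNU find's actual code, and rounded_units is a made-up helper name used for illustration:

```shell
# Hypothetical helper (not part of find): how many whole units a file of
# size_bytes occupies when any partial unit is counted as a full one.
rounded_units() {
    size_bytes=$1
    unit_bytes=$2
    echo $(( (size_bytes + unit_bytes - 1) / unit_bytes ))
}

M=$((1024 * 1024))
rounded_units 183 "$M"        # prints 1 (graphical.txt from the listing above)
rounded_units 1024000 "$M"    # prints 1 (large1.log)
rounded_units 1048577 "$M"    # prints 2 (one byte more than 1M)
```

Under this model every non-empty file of up to 1048576 bytes counts as "1M", which would explain the listing above.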

If you say "-size 2M", then it would mean files with "rounded up size of 2M", or sizes > 1M and <= 2M. That will not display anything, since there is no file with size > 1M or 1048576 bytes and <= 2M or 2097152 bytes.

Code:
$ find . -type f -size 2M -ls
$

A similar case can be made for "-size 3M".
Code:
$ find . -type f -size 3M -ls
$

Now, in the case of "-size 1000k", notice that k = 1024 bytes, so it searches for files with a rounded up size of 1000k, i.e. sizes > 999k or (999 * 1024 =) 1022976 bytes and <= 1000k or (1000 * 1024 =) 1024000 bytes.
That displays your one file.

Code:
$ find . -type f -size 1000k -ls
6029496 1000 -rw-r--r--   1 r2d2     r2d2      1024000 Feb 17 02:32 ./large1.log
$

To test this logic further, I created three files with sizes:
Code:
(1) 999k bytes - 1 byte = 1022975 bytes
(2) 999k bytes          = 1022976 bytes
(3) 999k bytes + 1 byte = 1022977 bytes

using the following commands:

Code:
perl -e 'foreach (1..1022){foreach (1..999){print chr(97+int(rand(26)))}; print "\n"}'  >large1_v1.log
perl -e 'foreach (1..1)   {foreach (1..974){print chr(97+int(rand(26)))}; print "\n"}' >>large1_v1.log

perl -e 'foreach (1..1022){foreach (1..999){print chr(97+int(rand(26)))}; print "\n"}'  >large1_v2.log
perl -e 'foreach (1..1)   {foreach (1..975){print chr(97+int(rand(26)))}; print "\n"}' >>large1_v2.log

perl -e 'foreach (1..1022){foreach (1..999){print chr(97+int(rand(26)))}; print "\n"}'  >large1_v3.log
perl -e 'foreach (1..1)   {foreach (1..976){print chr(97+int(rand(26)))}; print "\n"}' >>large1_v3.log
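
If the file contents don't matter, a shorter way to get files of an exact byte count might be the sketch below. It assumes that head accepts "-c BYTES" and that mktemp supports "-d" (both true for the GNU and BSD tools as far as I know, but neither is guaranteed by POSIX); the workdir variable is just an illustrative name:

```shell
# Assumption: head -c N and mktemp -d are available on this system.
workdir=$(mktemp -d)

# Zero-filled files; only the byte count matters for find -size.
head -c 1022975 /dev/zero > "$workdir/large1_v1.log"   # 999k - 1 byte
head -c 1022976 /dev/zero > "$workdir/large1_v2.log"   # 999k exactly
head -c 1022977 /dev/zero > "$workdir/large1_v3.log"   # 999k + 1 byte

# Show the resulting byte counts.
wc -c "$workdir"/large1_v*.log
```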

My pwd now looks like this:

Code:
$ ls -l
total 4024
-rw-r--r-- 1 r2d2 r2d2     183 Feb 17 02:29 graphical.txt
-rw-r--r-- 1 r2d2 r2d2 1024000 Feb 17 02:32 large1.log
-rw-r--r-- 1 r2d2 r2d2 1022975 Feb 17 02:55 large1_v1.log
-rw-r--r-- 1 r2d2 r2d2 1022976 Feb 17 02:55 large1_v2.log
-rw-r--r-- 1 r2d2 r2d2 1022977 Feb 17 02:56 large1_v3.log
-rw-r--r-- 1 r2d2 r2d2    2112 Feb 17 02:29 strace.out
-rw-r--r-- 1 r2d2 r2d2    4660 Feb 17 02:30 vmstat1.out
-rw-r--r-- 1 r2d2 r2d2    4659 Feb 17 02:29 vmstat.out
$

Now specifying "-size 1000k" should display files large1.log and large1_v3.log since they both have sizes > 1022976 (999k) and <= 1024000 (1000k)

Code:
$ find . -type f -size 1000k -ls
6029496 1000 -rw-r--r--   1 r2d2     r2d2      1024000 Feb 17 02:32 ./large1.log
6029500 1000 -rw-r--r--   1 r2d2     r2d2      1022977 Feb 17 02:56 ./large1_v3.log
$

And "-size 999k" should display files large1_v1.log and large1_v2.log since they both have sizes > 1021952 (998k) and <= 1022976 (999k)

Code:
$ find . -type f -size 999k -ls
6029497 1000 -rw-r--r--   1 r2d2     r2d2      1022975 Feb 17 02:55 ./large1_v1.log
6029498 1000 -rw-r--r--   1 r2d2     r2d2      1022976 Feb 17 02:55 ./large1_v2.log
$

##################
More tests follow:

Code:
$ 
$ # size 1k = sizes in the range (0k, 1k] or (0, 1024]
$ find . -type f -size 1k -ls
6029492    4 -rw-r--r--   1 r2d2     r2d2          183 Feb 17 02:29 ./graphical.txt
$ 
$ # size 2k = sizes in the range (1k, 2k] or (1024, 2048]
$ find . -type f -size 2k -ls
$ 
$ # size 3k = sizes in the range (2k, 3k] or (2048, 3072]
$ find . -type f -size 3k -ls
6029493    4 -rw-r--r--   1 r2d2     r2d2         2112 Feb 17 02:29 ./strace.out
$ 
$ # size 4k = sizes in the range (3k, 4k] or (3072, 4096]
$ find . -type f -size 4k -ls
$ 
$ # size 5k = sizes in the range (4k, 5k] or (4096, 5120]
$ find . -type f -size 5k -ls
6029494    8 -rw-r--r--   1 r2d2     r2d2         4659 Feb 17 02:29 ./vmstat.out
6029495    8 -rw-r--r--   1 r2d2     r2d2         4660 Feb 17 02:30 ./vmstat1.out
$ 
$

So essentially, if a file is using up:
(a) 10.3 blocks, i.e. 10 blocks plus a fraction of the next block, then its size is considered to be 11 blocks
(b) 4 units of 1k plus a fraction of the next 1k unit, then its size is considered to be 5k
(c) 2 units of 1M plus a fraction of the next 1M unit, then its size is considered to be 3M

Last edited by durden_tyler; 02-17-2017 at 05:09 AM..
# 12  
Old 02-17-2017
@durden_tyler: Thanks, this is exactly what I found (admittedly in less detail) when testing with my find (GNU findutils) 4.7.0-git on Linux, hence my highlighting of the "rounding up to unit size" in the man page citation (commented on by Don Cragun in post #3).
I see that with other versions on other systems, the size test is handled differently. In FreeBSD, for instance, rounding is done only for 512 byte blocks, and 1k means exactly 1024 bytes, -1k includes 1023 bytes but excludes 1024, +1k shows 1025 but not 1024.
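
To make the difference concrete, here is a small model of the two interpretations. It does not call find at all; the function names are invented for this post, and extending the rounding-up theory to the "+" and "-" forms is my assumption:

```shell
# Model only. Usage: <function> SIZE_BYTES OP N
# OP is "+", "-" or "=", and N counts 1k (1024-byte) units.
compare() {   # compare VALUE OP TARGET
    case $2 in
        +) [ "$1" -gt "$3" ] ;;
        -) [ "$1" -lt "$3" ] ;;
        =) [ "$1" -eq "$3" ] ;;
    esac
}

# Rounding model: round the size up to whole units, then compare unit counts.
match_rounded_k() { compare $(( ($1 + 1023) / 1024 )) "$2" "$3"; }

# Exact model (the FreeBSD behaviour described above): compare raw byte counts.
match_exact_k() { compare "$1" "$2" $(( $3 * 1024 )); }

# A 1023-byte file tested against -1k under each model:
match_rounded_k 1023 - 1 && echo "rounded: match" || echo "rounded: no match"   # prints "rounded: no match"
match_exact_k 1023 - 1 && echo "exact: match" || echo "exact: no match"         # prints "exact: match"
```

In the rounding model, "-1k" can only ever match empty files, because any non-empty file already rounds up to at least one unit.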
# 13  
Old 02-17-2017
Quote:
Originally Posted by RudiC
@durden_tyler: Thanks, this is exactly what I found (admittedly in less detail) when testing with my find (GNU findutils) 4.7.0-git on Linux
Now, this is funny - this is exactly what I thought to be the case, until Don said it can't be that way. The same reasoning led me to think that any file sized >0c is selected by -size 1G - because it is "rounded up" to the next full GB.

Now completely confused.

bakunin
# 14  
Old 02-18-2017
Quote:
Originally Posted by bakunin
Now, this is funny - this is exactly what I thought to be the case, until Don said it can't be that way. The same reasoning led me to think that any file sized >0c is selected by -size 1G - because it is "rounded up" to the next full GB.

Now completely confused.

bakunin
Hi Bakunin,
Don't be confused. What we see here is another case where GNU utilities and BSD utilities behave differently. (And, some UNIX systems don't offer the extension at all.) You get exactly the same behavior on BSD, Linux, and UNIX systems for:
Code:
find file... ... -size [+|-]number[c] ...

which are the -size primary argument formats required by the POSIX standards, but the behavior of:
Code:
find file... ... -size [+|-]number[k|M|G|T|P] ...

where one of the optional size multipliers is supplied, depends on the system: you are likely to get a syntax error on some UNIX-branded systems, one of the two behaviors discussed in this thread on Linux systems (and maybe on some UNIX-branded systems), and the other behavior on BSD-based systems and at least one UNIX-branded system.
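
As a sketch of how to stay within the portable forms, a size range can be spelled out in bytes with two -size primaries. The helper name find_size_range is hypothetical, and the bounds below simply reuse the 999k/1000k numbers tested earlier in this thread:

```shell
# Hypothetical helper: list regular files under directory $1 whose size in
# bytes is greater than $2 and at most $3, using only the "c" suffix.
find_size_range() {
    find "$1" -type f -size +"$2"c -size -"$(( $3 + 1 ))"c
}

# The range that "-size 1000k" was observed to select with GNU find above:
# more than 999k (1022976 bytes) and at most 1000k (1024000 bytes).
find_size_range . "$((999 * 1024))" "$((1000 * 1024))"
```

Because both bounds are plain byte counts, this should select the same files whichever way a given find treats the k and M multipliers.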
This User Gave Thanks to Don Cragun For This Post: