Performance Hit With Many Files: post 302293868 by cyner on Wednesday 4th of March 2009 03:33:45 AM
Windows XP Test

I haven't had the time to do a test on Linux yet, but I just finished a test on my Windows XP desktop machine (NTFS). I'm not sure how valuable this test is, but it's very interesting... Please give your thoughts on this.

Opening and closing 100 selected files at random, 100,000 opens per directory, from directories containing different numbers of files (relative times, with the 100-file directory set to 100):

100 files: 100.0
1,000 files: 100.4
10,000 files: 101.3
100,000 files: 109.6
1,000,000 files: 130.9

A performance hit of about 31% when going from 100 to 1,000,000 files in a directory!

When I ran the tests again (the figures below match the third pass in the raw results further down), they were not only faster, but the differences between the directories were almost zero:

100 files: 100.0
1,000 files: 100.0
10,000 files: 100.6
100,000 files: 100.2
1,000,000 files: 100.3

Obviously, some caching is going on. So, if you open the same files over and over (and the number of files is small enough), it doesn't seem to matter how many files you keep in the directories.

This caching suggests that the performance hit above could be larger if I had opened more than 100 different files. Another way of doing this test would be to read every single file in random order.
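A minimal sketch of that alternative test, not something I have run on the machine above. The function name `time_open_all` and the `seed` parameter are made up for illustration. Note that listing the directory first may itself warm some caches, so the timing only covers the open/close loop.

```python
import os
import random
import time

def time_open_all(directory, seed=None):
    """Open and close every regular file in `directory` once, in random order.

    Returns (number_of_files, elapsed_seconds) for the open/close loop only.
    """
    # Collect the names of regular files; subdirectories are skipped.
    names = [entry.name for entry in os.scandir(directory) if entry.is_file()]
    # A seeded generator makes the visiting order repeatable between runs.
    random.Random(seed).shuffle(names)

    start = time.perf_counter()
    for name in names:
        with open(os.path.join(directory, name), "rb"):
            pass
    return len(names), time.perf_counter() - start
```

Something like `time_open_all("c:\\temp\\test1000000")` would then touch each file exactly once, so repeated opens of the same file cannot hide the directory size.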

Maybe I should have used the same 1,000,000 files in each test case and instead distributed them differently (100 files per directory, 1000 files per directory etc). But then other variables would affect the results, such as how I distributed them -- path depth, number of directories etc.
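To make the "other variables" concrete, here is a rough sketch of one possible distribution scheme: a single level of numbered subdirectories. The names `bucket_path` and `directories_needed` and the `d000000` naming are assumptions for illustration only; a deeper tree would change the path depth as well.

```python
import os

def directories_needed(total_files, files_per_dir):
    """Number of subdirectories required (ceiling division)."""
    return -(-total_files // files_per_dir)

def bucket_path(root, file_index, file_name, files_per_dir):
    """Path for the file with running number `file_index` when files are
    spread over one level of subdirectories of `files_per_dir` files each."""
    bucket = file_index // files_per_dir
    return os.path.join(root, "d%06d" % bucket, file_name)

if __name__ == "__main__":
    for per_dir in (100, 1000, 10000, 100000, 1000000):
        print(per_dir, "files per directory ->",
              directories_needed(1000000, per_dir), "directories")
```

With 1,000,000 files, 100 files per directory means 10,000 directories under the root, while 1,000 per directory means 1,000, so the size of the parent directory changes together with the size of the leaves.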

Details

I used a script to create files with random names of 10+3 characters. I copied the files from the "100 directory" to the other directories, then added additional files. The files were almost empty (72 bytes).
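The creation script is not reproduced here; the following is only a sketch of what such a script could look like. It assumes "10+3" means a 10-character base name plus a 3-character extension, and the character set, the filler byte and the function names are my own choices for illustration.

```python
import os
import random
import string

# Assumed character set; lowercase only, so names stay distinct on
# case-insensitive file systems such as NTFS under Windows.
ALPHABET = string.ascii_lowercase + string.digits

def random_name(rng):
    """Random file name: 10-character base plus 3-character extension."""
    base = "".join(rng.choice(ALPHABET) for _ in range(10))
    ext = "".join(rng.choice(ALPHABET) for _ in range(3))
    return base + "." + ext

def create_files(directory, count, rng=None, size=72):
    """Create `count` new files of `size` bytes each; return their names."""
    rng = rng or random.Random()
    os.makedirs(directory, exist_ok=True)
    names = []
    while len(names) < count:
        name = random_name(rng)
        try:
            # Mode "xb" refuses to overwrite, so a name collision is retried.
            with open(os.path.join(directory, name), "xb") as fh:
                fh.write(b"x" * size)
        except FileExistsError:
            continue
        names.append(name)
    return names
```

Writing the returned names of the first 100 files to files.txt, one per line, would give the list that the test script reads.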

Then I ran a Python script that opened and closed randomly selected files (from the 100 files above) in each directory. The source code is:

Code:
import random
import time

def getMS():
    # Elapsed-time clock in milliseconds. perf_counter() is monotonic,
    # so the value cannot wrap at midnight or jump with the system clock.
    return int(time.perf_counter() * 1000)

# files.txt is expected to hold one file name per line: the 100 files
# that exist in every test directory.
with open("files.txt", "r") as fh:
    filenames = [fn.strip() for fn in fh]

random.seed()

NUMBER_OF_OPENS = 100000
TIMES_PER_CASE = 3

# Directory suffixes: c:\temp\test1000000, c:\temp\test100000, ...
testcases = ["1000000", "100000", "10000", "1000", "100"]

for i in range(TIMES_PER_CASE):
    for testcase in testcases:
        starttime = getMS()
        for j in range(NUMBER_OF_OPENS):
            filename = "c:\\temp\\test" + testcase + "\\" + random.choice(filenames)
            open(filename, "rb").close()
        endtime = getMS()

        # Columns: directory size, pass number, elapsed milliseconds
        print(testcase, i, endtime - starttime)

And the results (columns: number of files in the directory, pass number, elapsed time in milliseconds):

Code:
C:\Temp>python -OO openfiles.py
1000000 0 16156
100000 0 13531
10000 0 12508
1000 0 12399
100 0 12346
1000000 1 12291
100000 1 12274
10000 1 11886
1000 1 11265
100 1 11117
1000000 2 11199
100000 2 11183
10000 2 11232
1000 2 11166
100 2 11166
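The relative times quoted at the top can be recomputed from these raw numbers. The short sketch below does that by dividing each time by the 100-file time of the same pass; the function name `relative_times` is made up for illustration, and the data is copied from the listing above.

```python
RAW = """\
1000000 0 16156
100000 0 13531
10000 0 12508
1000 0 12399
100 0 12346
1000000 1 12291
100000 1 12274
10000 1 11886
1000 1 11265
100 1 11117
1000000 2 11199
100000 2 11183
10000 2 11232
1000 2 11166
100 2 11166
"""

def relative_times(raw):
    """Return {pass: {directory_size: time as a percentage of the
    100-file time in the same pass, rounded to one decimal}}."""
    passes = {}
    for line in raw.splitlines():
        size, run, ms = line.split()
        passes.setdefault(int(run), {})[int(size)] = int(ms)
    return {
        run: {size: round(100.0 * ms / times[100], 1)
              for size, ms in times.items()}
        for run, times in passes.items()
    }

if __name__ == "__main__":
    for run, rel in sorted(relative_times(RAW).items()):
        print("pass", run, rel)
```

Pass 0 gives the first table (130.9 for 1,000,000 files) and pass 2 gives the second one (100.3). Pass 1 sits in between, which fits the idea that the caches were still warming up.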

Machine Specifications

I ran the tests on my old Dell Optiplex 280 desktop with a 2.8 GHz Pentium 4 CPU, 2 GB of DDR2 SDRAM and an 80 GB, 7200 rpm Serial ATA-150 hard drive (cache size unknown).

I'm using Windows XP SP3 with NTFS. I shut down all anti-virus, indexing and updating services and most programs before running the tests.

The hard drive was defragmented after creating the small files and before running the tests. I also rebooted before running the tests.
 
