

deleting 100k log files quickly


 
# 8  
Old 08-29-2002
Quote:
Originally posted by oombera
You're saying it's easier on Unix to remove a thousand files from a thousand different directories than it is to remove them from one directory?
I didn't say that. There is overhead in opening a directory. Even if each of the thousand directories contained only one file, that would not be a win.

Directories can grow in size but they cannot shrink. Consider a directory with 100,000 files in it. Now you want to unlink the very last file. This will take some time because Unix must scan all 100,000 entries looking for that directory entry. Now suppose that you know which file is the very last entry in the directory, and you first delete the 99,999 other files. Now you go and unlink() that final entry. It still takes the same amount of time, because the directory still holds 100,000 slots to scan.

If we delete each of 100,000 files from a directory, we must scan the directory 100,000 times. On average, we will scan halfway before we find the entry we want. That is 100,000 * 100,000 / 2 directory entries read, otherwise known as 5,000,000,000. That is a lot. Now suppose that the 100,000 files are evenly distributed across 10 directories. That is 10,000 * 10,000 / 2 directory entry scans, or 50,000,000, per directory. We need to do that 10 times, once for each directory. That brings us up to 500,000,000 directory entry scans, or one tenth of the total work. We pay for this improvement by needing to open 9 more directories, but that is a win. 9 directory opens beats reading 4,500,000,000 directory entries.

If we use 100 directories, that is one hundredth of the total directory entries to read, balanced against the need to open 100 times as many directories. And so on. And, yes, by the time you get to one file per directory, that is dumb. But so is 100,000 files per directory.
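The arithmetic above can be reproduced with a few lines of Python. This is only a rough sketch of the model described in this post (each unlink scans about half of a directory that never shrinks); the function name `directory_entry_scans` is made up for illustration, and it assumes the files divide evenly among the directories. It does not try to put a cost on opening a directory.

```python
def directory_entry_scans(total_files: int, num_dirs: int) -> int:
    """Estimate directory entries read while unlinking every file.

    Assumes the files are spread evenly over num_dirs directories and
    that each unlink scans, on average, half of its directory.
    """
    files_per_dir = total_files // num_dirs
    # A directory of n files needs n unlinks at about n / 2 entries each.
    scans_per_dir = files_per_dir * files_per_dir // 2
    return scans_per_dir * num_dirs


if __name__ == "__main__":
    # Compare one big directory against 10, 100 and 1000 smaller ones.
    for num_dirs in (1, 10, 100, 1000):
        scans = directory_entry_scans(100_000, num_dirs)
        print(f"{num_dirs:>5} directories: {scans:>13,} entries read")
```

Each tenfold increase in the number of directories cuts the entries read by a factor of ten, which is the trade-off against the extra directory opens.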

I would not suggest that you try to find the exact optimum number of entries per directory and always go with that. The exact number will vary from filesystem to filesystem, and it wouldn't be convenient for a user. But a directory with 100,000 files is way over the top. Users will Control-C out of an ls rather than let it finish. And they can't figure out how to prune the directory down. When a command like "wc -l *" fails because there are too many filenames, that's a good sign that things have gotten out of control. At that point, the directory is too large for the user to handle. And if, on a quiet system, "ls -l" takes more than 3 seconds to start printing, that's a good sign that the directory is too large for Unix to handle efficiently.
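For a directory that has already grown past what "*" can expand, one way to prune it is to read the entries directly instead of handing a giant argument list to a command. The following is a minimal sketch, not a recommendation from this thread: the function name `remove_matching_files` and the ".log" suffix are assumptions chosen for illustration. It collects the matching names first and unlinks afterwards, so it does not rely on how the directory scan behaves while entries are being removed.

```python
import os


def remove_matching_files(directory: str, suffix: str = ".log") -> int:
    """Unlink regular files in `directory` whose names end with `suffix`.

    Returns the number of files removed. Subdirectories and symlinks
    are left alone.
    """
    # Read the directory once; no shell glob, so no argument-list limit.
    with os.scandir(directory) as entries:
        names = [
            entry.name
            for entry in entries
            if entry.name.endswith(suffix)
            and entry.is_file(follow_symlinks=False)
        ]

    removed = 0
    for name in names:
        os.unlink(os.path.join(directory, name))
        removed += 1
    return removed
```

Calling `remove_matching_files("/path/to/logs")` (a placeholder path) would remove the ".log" files there and report how many went. It does nothing about the scan cost per unlink discussed above; it only avoids the "too many filenames" failure.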
# 9  
Old 08-29-2002
Okay, now that I've read your post and thought about it, that makes complete sense. Usually, you don't even think about the overhead in reading a directory, finding the correct file, etc. when removing files.

Then again, I have wondered why deleting one 100MB file in Windows is a lot faster than deleting a hundred files that are each much smaller than a meg... same principle, I'm sure.
 

