Delete files having less than 200 lines


 
# 1  
Old 07-23-2011

Hi All,

I have some 30,000 files in one directory. The files look like this:

Code:
computer
networks
router
wire

I want to remove the files that have fewer than 200 lines. For example, the file above has only 4 lines, so it should be deleted.

I am trying something like this:
Code:
find /path/to/dir -type f -size -200c -exec rm {}  \;

I am not sure whether this is the correct way to delete files with fewer than 200 lines. I am using Linux with Bash.
# 2  
Old 07-23-2011
Try this (make a backup first):
Code:
find . -type f | xargs -i bash -c 'if [ $(wc -l {}|cut -d" " -f1) -lt 200 ]; then rm -f {}; fi'

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 07-23-2011
Code:
find . -type f | while IFS= read -r
do
    # Redirect into wc so it prints only the count; quote $REPLY for odd filenames
    (( $(wc -l < "$REPLY") < 200 )) && rm -vf -- "$REPLY"
done

This User Gave Thanks to leogtzr For This Post:
# 4  
Old 07-23-2011
And another one:

Code:
find . -type f -exec sh -c '
  [ $(wc -l < "$1") -lt 200 ] &&
    rm -- "$1"
' inline {} \;

This User Gave Thanks to radoulov For This Post:
# 5  
Old 07-23-2011
Quote:
Originally Posted by bartus11
Try this (make a backup first):
Code:
find . -type f | xargs -i bash -c 'if [ $(wc -l {}|cut -d" " -f1) -lt 200 ]; then rm -f {}; fi'

While that solution is just fine for most cases, it doesn't scale very well. For 30,000 files it needs to fork-exec 1 find + 1 xargs + 30,000 bash + 30,000 wc + 30,000 cut + somewhere between 0 and 30,000 rm processes.
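To make that count concrete, here is a rough sketch (not from the thread) of the same wc-based test with most of the per-file processes removed. It assumes a POSIX sh and a find that supports `-exec ... {} +`; the function name `rm_short_files` and its arguments are made up for illustration. find hands the files to sh in large batches, so the cost drops to roughly one wc per file plus one rm per deleted file.

```shell
#!/bin/sh
# rm_short_files DIR [MIN]: delete regular files under DIR that have
# fewer than MIN lines (default 200). Hypothetical helper name.
rm_short_files() {
    find "$1" -type f -exec sh -c '
        min=$1; shift
        for f do
            # wc reads the file on stdin, so it prints only the count
            if [ "$(wc -l < "$f")" -lt "$min" ]; then
                rm -f -- "$f"
            fi
        done
    ' sh "${2:-200}" {} +
}
```

Called as `rm_short_files /path/to/dir`, this still reads every file in full, so it only addresses the process count, not the I/O.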

I used the following script to generate some test files in an empty directory:
Code:
$ ./create.sh 30000
$ cat create.sh
#!/bin/sh

jot 200 > 200
i=$1
while [ $i -gt 0 ]; do
    > empty.$i
    i=$((i-1))
done

Except for the file named "200", all files are empty.
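Note that jot is a BSD utility and may not be installed on a typical Linux box. A rough equivalent using seq, as a sketch only: the function name `make_test_files` and its directory argument are additions for illustration, and it assumes seq is available (it ships with GNU coreutils).

```shell
#!/bin/sh
# make_test_files DIR COUNT: create DIR/200 containing 200 lines, plus
# COUNT empty files named empty.1 .. empty.COUNT. Hypothetical helper.
make_test_files() {
    mkdir -p "$1" || return 1
    seq 200 > "$1/200"          # the numbers 1..200, one per line, like "jot 200"
    i=$2
    while [ "$i" -gt 0 ]; do
        : > "$1/empty.$i"       # create (or truncate) an empty file
        i=$((i - 1))
    done
}
```

For example, `make_test_files test 30000` would build a directory comparable to the one timed in this post.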

On my 4-year-old laptop, even though all but one file is empty and I used redirection to eliminate the need for cut (wc prints no filename when it reads from stdin), it takes a while:
Code:
$ cat bartus.sh
#!/bin/sh

find "$1" -type f | xargs -I {} bash -c 'if [ $(wc -l < {}) -lt 200 ]; then rm -f {}; fi'

Code:
$ time ./bartus.sh test

real    6m13.537s
user    1m15.476s
sys     4m7.636s

The following solution is not as succinct as bartus11's, but it is much faster. Aside from find and xargs, it only fork-execs one shell and (on my system) one rm per 5000 files to delete:
Code:
$ cat alister.sh
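#!/bin/sh

# List every regular file, then print only those with fewer than 200 lines.
find "$1" -type f |
while IFS= read -r f; do
	i=0
	# Count lines, giving up on this file as soon as line 200 is seen.
	while read -r line; do
		i=$((i+1))
		[ $i -eq 200 ] && continue 2
	done < "$f"
	printf %s\\n "$f"
done |
xargs rm -f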

The run-time difference is substantial:
Code:
$ time ./alister.sh test

real    0m8.800s
user    0m2.287s
sys     0m5.345s

That is over 6 minutes versus 9 seconds to process 30,001 files, of which 30,000 are empty and the remaining one has only 200 lines totaling just 692 bytes.

Note that as the files grow, the disparity between the approaches will only increase. wc in the original solution must read each file in its entirety, whereas the while-read loop in my alternative stops processing a file as soon as the threshold line count (200) is reached. If the files become large, so do the I/O savings.
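The same early-exit idea can be had from standard tools, at the price of two extra processes per file. A sketch, not from the thread, assuming head stops reading once it has printed the requested number of lines (the name `has_fewer_lines` is hypothetical): head passes along at most N lines, so wc never counts more than that, however big the file is.

```shell
#!/bin/sh
# has_fewer_lines FILE N: succeed if FILE has fewer than N lines.
# head stops after N lines, so a huge file is not read to the end.
has_fewer_lines() {
    [ "$(head -n "$2" "$1" | wc -l)" -lt "$2" ]
}

# Possible use (prints candidates instead of deleting them):
#   find test -type f | while IFS= read -r f; do
#       has_fewer_lines "$f" 200 && printf '%s\n' "$f"
#   done
```

This trades the pure-shell loop's low process count for simpler code, so it fits best when the files are few but large.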

Again, I realize that bartus11's solution is perfectly fine for 99.9% of cases; I'm not criticizing it. My intent is only to show that there's a lot to be gained should a 0.1% situation arise.

Regards,
Alister
These 2 Users Gave Thanks to alister For This Post:
# 6  
Old 07-23-2011
Very good point again alister!
Going to the next file right after the line limit is reached is definitely better.
This User Gave Thanks to radoulov For This Post: