While that solution is just fine for most cases, it doesn't scale very well. For 30,000 files it will need to fork-exec 1 find + 1 xargs + 30,000 bash + 30,000 wc + 30,000 cut + somewhere between 0 and 30,000 rm processes.
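Roughly, the shape being counted is the following (a reconstruction for illustration, not necessarily bartus11's literal command; the demo files and names are mine):

```shell
#!/bin/sh
# Per-file approach: each file costs one bash, one wc, one cut, and,
# if it falls under the 200-line threshold, one rm.
# Demo setup (illustrative): two empty files and one 200-line file.
mkdir -p demo
: > demo/a
: > demo/b
seq 1 200 > demo/200

# xargs -I runs one bash per input line, so the process count scales
# linearly with the number of files.
find demo -type f | xargs -I{} bash -c '
    if [ "$(wc -l "{}" | cut -d" " -f1)" -lt 200 ]; then rm -- "{}"; fi
'
```

After it runs, only demo/200 survives; the two empty files are removed, each at the cost of a full bash/wc/cut/rm pipeline.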
I used the following script to generate some test files in an empty directory:
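A stand-in along these lines reproduces the set described (a reconstruction; the file names are illustrative, but note that the numbers 1 through 200, one per line, come to exactly the 692 bytes quoted below):

```shell
#!/bin/sh
# 30,001 files named 0 .. 30000, all empty except "200",
# which holds the numbers 1 through 200: 200 lines, 692 bytes.
mkdir -p testdir
seq 0 30000 | sed 's|^|testdir/|' | xargs touch
seq 1 200 > testdir/200
```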
Except for the file named "200", all files are empty.
On my four-year-old laptop, even when all but one of the files are empty, and after using redirection to eliminate the need for cut (no filename in wc's output), it takes a while:
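For those unfamiliar with the cut-elimination trick: when wc reads from stdin it has no file name to print, so the count stands alone ("sample" here is just an illustrative file):

```shell
#!/bin/sh
seq 1 200 > sample
wc -l sample      # the output includes the name, which must be trimmed off with cut
wc -l < sample    # stdin has no name, so wc prints the count alone; no cut needed
```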
The following solution is not as succinct as bartus11's, but it is much faster. Aside from find and xargs, it fork-execs only one shell and (on my system) one rm per 5,000 files to delete.
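A POSIX-sh sketch matching that description (a reconstruction, not the exact script; the "sandbox" directory and its contents are demo scaffolding): each batch of up to 5,000 file names goes to one sh, which counts lines itself, rotates the deletion candidates back into its positional parameters, and issues a single rm for the whole batch.

```shell
#!/bin/sh
# Demo setup (illustrative): 50 empty files plus one 200-line file.
mkdir -p sandbox
for i in $(seq 1 50); do : > "sandbox/f$i"; done
seq 1 200 > sandbox/200

find sandbox -type f -print0 |
xargs -0 -n 5000 sh -c '
    n=$#
    while [ "$n" -gt 0 ]; do
        f=$1; shift; n=$((n - 1))
        lines=0
        while IFS= read -r _; do
            lines=$((lines + 1))
            [ "$lines" -ge 200 ] && break   # threshold reached: stop reading, keep file
        done < "$f"
        # re-append only deletion candidates to "$@"
        [ "$lines" -lt 200 ] && set -- "$@" "$f"
    done
    if [ "$#" -gt 0 ]; then rm -- "$@"; fi
' sh
```

Because n is fixed at the original batch size before the loop starts, the candidates appended to "$@" are never reprocessed, and the one rm at the end removes them all at once.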
The run-time difference is substantial:
Over 6 minutes versus 9 seconds to process 30,001 files, of which 30,000 are empty and the remaining file has only 200 lines totaling just 692 bytes.
Note that as the files grow, the disparity between the approaches will only increase. wc in the original solution must read each file in its entirety, whereas the while-read loop in my alternative stops processing a file as soon as the threshold line count (200) is reached. If the files become large, so do the I/O savings.
Again, I realize that bartus11's solution is perfectly fine for 99.9% of cases; I'm not criticizing it. My intent is only to show that there's a lot to be gained should a 0.1% situation arise.
Regards,
Alister