Speed test: 20,000+ file existence checks too slow

12-13-2008


I need to make a very fast file existence checker that handles 20,000 to 50,000 files.

In the code below, ${file} is a file listing more than 20,000 files, and test_speed is the script. The output of "time test_speed <try>" is recorded in the comments above each attempt.

The normal "test -f" is much too slow when run as a system call from inside awk or perl. A basic grep on 20,000+ files is super fast, so why does a file existence test slow things down so much?

Yes, I am on try 55, and I still cannot get this thing to go faster. I think try 55 would be very fast, but I cannot actually pass a listing of 20,000+ files into a for loop because I run out of memory. Does anyone have any ideas on how to speed up a file check inside awk, perl, or the shell?

This would be fast if it actually worked.

How can I pipe into the parameter $1?

awk '{print $10}' ${file} | if [ -f $1 ];then echo 1; else echo 0; fi

How can you pipe into an if statement?

Quote:

#!/bin/ksh

file=spySD.Dec10_aha~
u=aha2231

user=$USER

## No file existence test.

## time test_speed 1
## real 0m3.32s
## user 0m0.68s
## sys 0m0.19s

if [[ $1 = 1 ]];then
awk -v u=${u} '$5~u {print}' ${file} > /tmp/junk_${user}_f1
fi

## With existence test: Try 22

## time test_speed 22
##
## real 3h13m25.76s
## user 1h14m20.86s
## sys 52m23.13s

if [[ $1 = 22 ]];then

awk -v u=${u} '
$5~u {
sysA="if [[ -f " $10 " ]] ;then echo 1;else echo 0;fi"
sysA | getline chk
close(sysA)
if(chk=="1") {print}
}
' ${file} > /tmp/junk_${user}_f2

fi

## With existence test: Try 3
## This is slow too....

if [[ $1 = 3 ]];then

awk -v u=${u} '
$5~u {
sysA="ls " $10 " | grep -c " $10 " 2>/dev/null"
sysA | getline chk
close(sysA)
if(chk=="1") {print}
}
' ${file} > /tmp/junk_${user}_f3

fi

## With existence test: Try 55
if [[ $1 = 55 ]];then
for i in `awk '{print $10}' ${file}`
do

[ -f "$i" ] && echo 1 || echo 0

done > /tmp/junk_${user}_f55
fi

nullwhat


12-15-2008


Quote:

Originally Posted by nullwhat

Need to make a very fast file existence checker. Passing in 20-50K num of files

Write it in C.

Quote:

how can you pipe into an if statement?

You can't; an if statement is not a loop and it doesn't read standard input.
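To illustrate the point about loops, here is a minimal sketch of a while/read loop that does read standard input, so the awk output can be piped into it. The function name check_paths and the throwaway listing are assumptions made for the demo and are not part of the original script. Because the loop reads one line at a time and "[" is a shell builtin in ksh and most other shells, it should avoid both the per-file process start and the memory problem of expanding the whole listing in a for loop.

```shell
#!/bin/sh
# Hypothetical helper: print 1 or 0 for each path read from standard input.
check_paths() {
    while IFS= read -r path; do
        if [ -f "$path" ]; then
            echo 1
        else
            echo 0
        fi
    done
}

# Demo with a throwaway listing; field 10 holds the path, as in the original post.
tmpdir=$(mktemp -d)
touch "$tmpdir/present"
printf 'a b c d e f g h i %s\n' "$tmpdir/present" "$tmpdir/missing" > "$tmpdir/listing"

# The pipe feeds the loop inside check_paths, one path per line.
awk '{print $10}' "$tmpdir/listing" | check_paths

rm -rf "$tmpdir"
```

With the real data, the last pipeline would read ${file} instead of the demo listing.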

cfajohnson


12-18-2008


Don't pipe "it" into an if statement; create the shell script on the fly and pipe that to sh:

Code:

awk '{print "if [ -f \"" $10 "\" ]; then echo 1; else echo 0; fi"}' ${file} | sh
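A small sketch of how the generated script can be inspected before it is run. The function name gen_checks and the demo listing are assumptions for illustration only. One caveat worth stating: the paths are pasted into shell source text, so a path containing a double quote, a dollar sign, or a backquote would be interpreted by sh rather than treated literally.

```shell
#!/bin/sh
# Hypothetical helper: turn a listing (path in field 10) into a shell script,
# one "if" test per input line, written to standard output.
gen_checks() {
    awk '{print "if [ -f \"" $10 "\" ]; then echo 1; else echo 0; fi"}' "$1"
}

# Demo listing: one existing file and one missing file.
tmpdir=$(mktemp -d)
touch "$tmpdir/present"
printf 'a b c d e f g h i %s\n' "$tmpdir/present" "$tmpdir/missing" > "$tmpdir/listing"

gen_checks "$tmpdir/listing" | head -n 1   # look at the first generated line
gen_checks "$tmpdir/listing" | sh          # run every test in a single sh process

rm -rf "$tmpdir"
```

All of the tests run inside one sh process, so the cost of starting a shell is paid once rather than once per file.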

cjcox

