Finding Files with Perl or awk?


 
# 1  
05-19-2015

Last week I posted about how the find command (known to be slow to begin with) is slowing down by 75x on a Windows remote share.

Can awk or Perl find files? (I'm pretty sure the answer for Perl is yes.) I want to duplicate find "$dataDir" -type d -name '*.aps' (a recursive search for directories with a given name, returning the full path).

Mike
# 2  
05-19-2015
I seriously doubt you'll be able to magically speed things up. Find isn't a slouch: there's only so "fast" you can make an exhaustive search. Now... if you can change the logic of whatever you are doing so that there is a way to do a fast lookup of the file, then fine...

(e.g. whenever I create a *.aps, I also create a name-value pair in my high-speed indexed database... or something similar)
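A minimal sketch of that idea, using a plain text file as a stand-in for the database (a swapped-in simplification, not a real database). The names APS_INDEX, register_aps and lookup_aps are hypothetical. It only works if every code path that creates or renames a *.aps directory also calls register_aps; a lookup then costs one stat per indexed directory instead of a walk over every directory entry on the share.

```shell
# Hypothetical index file location; override APS_INDEX to move it.
APS_INDEX=${APS_INDEX:-"$HOME/.aps_index"}

# Record the path of a *.aps directory that was just created or renamed.
# Absolute paths are recommended so lookups work from any directory.
register_aps() {
    printf '%s\n' "$1" >> "$APS_INDEX"
}

# List indexed *.aps directories instead of walking the tree.
# Entries whose directory no longer exists (deleted or renamed) are skipped.
lookup_aps() {
    [ -f "$APS_INDEX" ] || return 0
    while IFS= read -r dir; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done < "$APS_INDEX"
}
```

Stale entries are filtered out at lookup time but never removed from the file, so an occasional rebuild of the index (for example from a single find run) would keep it small.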
# 3  
05-19-2015
The problem is the time it takes to read 100,000 directory entries on the remote share in question. The language, be it C, Perl, Java, or Python, does not matter. They all use the same system calls.
# 4  
05-19-2015
I agree with cjcox: there is no magic here... plus you are searching a remote filesystem.

Depending on your OS (e.g. Red Hat), you may want to use the locate, mlocate or slocate command. See the man pages for locate, slocate, mlocate and updatedb. This facility creates an index for finding files.
# 5  
05-19-2015
I know the remote share is responsible for a good part of the overhead, but I am accustomed to seeing 3-4x differences in speed when trying the same thing in sed, awk, perl, etc. When I searched online for a faster find, I came across some enticing mentions of Perl being particularly fast/efficient.

Even a 2-3x improvement would help tremendously.

Mike
# 6  
05-19-2015
Quote:
Originally Posted by Michael Stora
Even a 2-3x improvement would help tremendously.
Computers do not work that way.

A slow system call is slow in any language, and the slower it is, the less there is to be gained by 'optimizing' it.

Suppose your program spends 98% of its time waiting for NFS and 2% of its time actually running. If you find a program that is twice as fast, it will spend about 1% of its time actually running and 99% of its time waiting on NFS, for a theoretical speed gain of about 1% and a realistic speed gain of absolutely zip.
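The arithmetic in that paragraph is Amdahl's law. With p the fraction of run time that a faster program can actually speed up (here the 2% spent running) and s the speedup of that part (cutting 2% down to 1% means s = 2), the overall speedup S works out as:

```latex
S = \frac{1}{(1 - p) + \frac{p}{s}}
  = \frac{1}{0.98 + \frac{0.02}{2}}
  = \frac{1}{0.99}
  \approx 1.0101
```

That is about a 1% gain, and no matter how fast the program gets, S can never exceed 1/(1 - p) = 1/0.98, roughly 1.02.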

This is also why you can't turbocharge a slow disk with a fast program. No matter how fast your program is, the underlying I/O can't actually move faster.

You might be able to parallelize it, but only to a point.
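One way to parallelize, sketched under two assumptions: a GNU or BSD userland (-print0, xargs -0 and xargs -P are not POSIX), and a share that can actually serve several directory listings at once, which may well not be true. The function name parallel_find_aps and the default of 4 jobs are illustrative. It runs a separate find for each top-level subdirectory; since find tests its own starting point, top-level *.aps directories are still reported.

```shell
# Sketch: search each top-level subdirectory of $1 in its own find process,
# running up to $2 (default 4) of them at once.
# parallel_find_aps is a hypothetical name; output order is not preserved.
parallel_find_aps() {
    # The starting directory itself, in case its own name ends in .aps.
    find "$1" -maxdepth 0 -type d -name '*.aps'
    # One find per top-level subdirectory, fanned out by xargs -P.
    find "$1" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "${2:-4}" -I{} find {} -type d -name '*.aps'
}
```

Usage: parallel_find_aps "$dataDir" 8. Results arrive in whatever order the jobs finish, so pipe through sort if order matters. Any gain is capped by how many concurrent requests the server and network can handle.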

Perhaps your network connection or NFS can be fine-tuned? That's beyond my expertise, though.

P.S. The find command is not 'known to be slow', and it is certainly not slower than any other file tree walker I know. If you don't understand why it's 'slow' when used on huge file trees, you don't actually know what it's doing.

Last edited by Corona688; 05-19-2015 at 04:12 PM.
# 7  
05-19-2015
The script runs side by side with an extremely user-unfriendly (and expensive) data acquisition package. It lets the techs edit comments after the fact and rename runs (which requires editing an XML file, renaming 6 data files, and renaming the *.aps file). When running alongside the acquisition software, it needs to update its file and folder lists in real time. In share-drive mode it is only used for editing after the fact, so it doesn't need to. I just added code to skip the find command when running in remote mode, so the old result is parsed unless the user hits "r" for reset. You don't need constant updating when you are not running alongside the data acquisition software.

I understand how bottlenecks work but am not 100% certain that NFS is the bottleneck.

Mike
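The skip-unless-reset logic described above could look roughly like this. APS_CACHE and get_aps_dirs are hypothetical names, and the cache is assumed to be a plain text file holding the last find result.

```shell
# Hypothetical cache file for the last directory listing.
APS_CACHE=${APS_CACHE:-"${TMPDIR:-/tmp}/aps_dirs.cache"}

# Print the *.aps directories under $1.
# Pass "r" as $2 (the user pressed "r" for reset) to force a fresh find;
# otherwise the cached list is reused whenever one exists.
get_aps_dirs() {
    if [ "$2" = r ] || [ ! -f "$APS_CACHE" ]; then
        # Write to a temporary file first so an interrupted find
        # does not leave a truncated cache behind.
        find "$1" -type d -name '*.aps' > "$APS_CACHE.tmp" &&
            mv -f "$APS_CACHE.tmp" "$APS_CACHE"
    fi
    cat "$APS_CACHE"
}
```

The cache does not record which directory it was built from, so switching $dataDir requires a reset. In local (real-time) mode the script would simply always pass "r".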