Speeding up a Shell Script (find, grep and a for loop) Post: 302222477

Posted by era in UNIX for Dummies Questions & Answers on Thursday 7th of August 2008 01:14:17 AM

Quote:

Originally Posted by Dave Stockdale

Code:

echo "Finding All PDFs..."
ls -R | grep .pdf > /tmp/pdfs/all_pdfs.out
echo "Done."

# Remove rubbish from list

echo "Removing Rubbish From List..."
sed 's|^\./[a-zA-Z0-9_ &./:]*$||g' /tmp/pdfs/all_pdfs.out > /tmp/pdfs/all_pdfs2.out
sed '/^$/d' /tmp/pdfs/all_pdfs2.out > /tmp/pdfs/all_pdfs.out
echo "Done."

You could trim this down to avoid using so many temporary files.

Code:

ls -R | sed -e '/\.pdf$/!d' -e 's|^\./[a-zA-Z0-9_ &./:]*$||g' -e '/^$/d' >/tmp/pdfs/all_pdfs.out

The first sed command is somewhat more specific than just grep .pdf. In the grep pattern the unescaped dot matches any character, so it accepts any character followed by "pdf" anywhere in the file name; the sed expression looks specifically for a literal .pdf at the end of the line. Maybe that's not what you want; if so, take out the $.
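To make the difference concrete, here is a minimal sketch that runs both patterns over a small made-up list. The file names are invented for illustration and do not come from the original directory tree.

```shell
# Hypothetical file names, one per line (assumed for illustration only).
names='report.pdf
xpdf-notes.txt
manual.pdf.bak
slides.PDF'

# Unescaped dot: any character followed by "pdf", anywhere in the line.
loose=$(printf '%s\n' "$names" | grep '.pdf')

# Escaped dot plus $: only a literal ".pdf" at the end of the line.
strict=$(printf '%s\n' "$names" | sed -e '/\.pdf$/!d')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern also lets through xpdf-notes.txt and manual.pdf.bak, while the anchored one keeps only report.pdf. Both are case-sensitive, so slides.PDF is dropped either way.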

Quote:

Originally Posted by Dave Stockdale

Code:

echo "Finding All PDFs..."
# List all PDFs Linked to

echo "Gathering List of PDF Links..."
find . -name "*.htm*" -exec grep -o "[a-zA-Z0-9_]\{1,\}\.pdf" {} \; > /tmp/pdfs/all_links.out
find . -name "*.php" -exec grep -o "[a-zA-Z0-9_]\{1,\}\.pdf" {} \; >> /tmp/pdfs/all_links.out
echo "Done."

Also, you could run a single find here; it walks the directory tree once instead of twice, which should reduce running time significantly if the directory tree is big.

Code:

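find . \( -name "*.htm*" -o -name "*.php" \) \
  -exec grep -o "[a-zA-Z0-9_]\{1,\}\.pdf" {} \; > /tmp/pdfs/all_links.out

(The escaped parentheses are needed: without them -exec binds only to the last -name test, and the *.htm* files would be matched but never passed to grep. The wrapping with a backslash is insignificant; I just did that here to avoid getting a very wide forum posting.)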
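One thing to watch when combining -name tests with -o is operator precedence: the implicit "and" in front of -exec binds more tightly than -o. The sketch below shows the effect on a scratch directory. The file names and contents are hypothetical, and it assumes a grep that supports -o and -h (GNU and BSD grep do). It also uses the "{} +" form of -exec, which passes many file names to one grep process instead of starting one per file.

```shell
# Scratch tree with hypothetical files, so nothing real is touched.
tmp=$(mktemp -d)
printf '<a href="guide_1.pdf">guide</a>\n' > "$tmp/index.html"
printf '<?php echo "price_list.pdf"; ?>\n' > "$tmp/page.php"
printf 'ignored.pdf\n' > "$tmp/notes.txt"

# \( ... \) groups the two -name tests so that -exec applies to both.
# "{} +" batches file names into as few grep invocations as possible;
# -h keeps grep from prefixing each match with the file name.
links=$(find "$tmp" \( -name "*.htm*" -o -name "*.php" \) \
  -exec grep -oh "[a-zA-Z0-9_]\{1,\}\.pdf" {} + | sort)

# Without the grouping, -exec is attached only to the last test (*.php).
ungrouped=$(find "$tmp" -name "*.htm*" -o -name "*.php" \
  -exec grep -oh "[a-zA-Z0-9_]\{1,\}\.pdf" {} + | sort)

printf 'grouped:\n%s\n\nungrouped:\n%s\n' "$links" "$ungrouped"
rm -rf "$tmp"
```

The grouped command reports the link from both the HTML and the PHP file; the ungrouped one reports only the link found in page.php.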

era
