Script to batch pdfjoin based on pdfgrep output Post: 302606162

Top Forums Shell Programming and Scripting Script to batch pdfjoin based on pdfgrep output Post 302606162 by nopposan on Friday 9th of March 2012 05:11:47 PM

printf is nifty

Thanks!

Here's my updated script. I haven't tested it yet as I don't have any practice material at the moment.

Code:

#!/bin/bash
# This script is written for files that are named with the pattern
#pg_0001.pdf, with leading zeros.

a=1 # Sets the starting page. This could be set from user input.
b=$((a + 1)) # Sets the next page to one more than the starting page.

Filename="pg_$(printf '%04d' "$a").pdf" # This is the filename of the
# starting page. The printf command pads the number with leading zeros
# to four digits, so up to three zeros are added.

until [ "$b" -eq 680 ]; do # The loop stops once b reaches 680, so this
# sets the upper limit on the pages considered for concatenation.
Filename2="pg_$(printf '%04d' "$b").pdf" # This is the filename of the
# next file to be considered for concatenation to the current file.
x=$(pdfgrep -C 0 '[0-9]{7}' "$Filename" | head -n 1) # pdfgrep with
# the -C 0 option, which requests 0 characters of context around the
# 7-digit employee ID; the output is piped to "head" to keep only the
# first occurrence. The pattern is quoted so the shell leaves it alone.
y=$(pdfgrep -C 0 '[0-9]{7}' "$Filename2" | head -n 1)

if [ "$x" = "$y" ]; then
    pdfjoin --rotateoversize 'false' "$Filename" "$Filename2" --outfile "$Filename" # If the employee IDs are equal, the PDF files
# are concatenated into a new file, which is given the name of the
# first file, the one being added to.
    rm "$Filename2" # Once a file has been concatenated to the previous
# file, it is removed.
else
    cp "$Filename" "Empl_ID_$x.pdf" # Copy the file under a name based on
# Empl_ID in place of the page-number name.
    #rm "$Filename" # Uncomment to remove the original file.
    Filename=$Filename2 # If no match is found and the file is not
# concatenated, advance the current origination file, Filename, to
# the name of the non-concatenated/non-matching file.
fi

b=$(($b+1)) # Advance the page number of the next file, Filename2.

done

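x=$(pdfgrep -C 0 '[0-9]{7}' "$Filename" | head -n 1) # Look up the ID
# again, because x is stale if the last comparison did not match.
cp "$Filename" "Empl_ID_$x.pdf" # Finally, when all else is done, copy
# the last file under a name based on its Empl_ID.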

#rm $Filename # Uncomment to remove the original last file.

exit
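
Since the script above is untested, the grouping logic can be tried out without any PDF tools. The following is a minimal sketch, not the script itself: page_name and group_pages are hypothetical helper names, and the employee IDs are passed in as plain arguments rather than extracted with pdfgrep. It prints one line per run of consecutive pages that share an ID, which is the set of files the loop would join.

```shell
#!/bin/bash
# Hypothetical helpers for dry-running the grouping logic; no PDF tools needed.

# Build a zero-padded page filename, e.g. 7 -> pg_0007.pdf.
page_name() {
    printf 'pg_%04d.pdf' "$1"
}

# Arguments: one ID per page, in page order (page 1 first).
# Output: one line per group, in the form "ID: file file ...".
group_pages() {
    local page=1 current="" files="" id
    for id in "$@"; do
        # A change of ID closes the group collected so far.
        if [ "$page" -gt 1 ] && [ "$id" != "$current" ]; then
            printf '%s:%s\n' "$current" "$files"
            files=""
        fi
        current=$id
        files="$files $(page_name "$page")"
        page=$((page + 1))
    done
    # Flush the last group, which the loop itself never prints.
    if [ "$page" -gt 1 ]; then
        printf '%s:%s\n' "$current" "$files"
    fi
}
```

For example, group_pages 1234567 1234567 7654321 prints "1234567: pg_0001.pdf pg_0002.pdf" followed by "7654321: pg_0003.pdf". The final flush after the loop mirrors the last cp in the script: the last group is only complete once there are no more pages to compare.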

Last edited by Corona688; 03-09-2012 at 06:32 PM.. Reason: Code tags, please.

pdfseparate

pdfseparate(1)						      General Commands Manual						    pdfseparate(1)

NAME

       pdfseparate - Portable Document Format (PDF) page extractor

SYNOPSIS

       pdfseparate [options] PDF-file PDF-page-pattern

DESCRIPTION

       pdfseparate extracts single pages from a Portable Document Format (PDF) file.

       pdfseparate reads the PDF file PDF-file, extracts one or more pages, and writes one PDF file for each page to PDF-page-pattern.
       PDF-page-pattern should contain %d, which is replaced by the page number.

       The PDF-file should not be encrypted.

OPTIONS

       -f number
	      Specifies the first page to extract. If -f is omitted, extraction starts with page 1.

       -l number
	      Specifies the last page to extract. If -l is omitted, extraction ends with the last page.

       -v     Print copyright and version information.

       -h     Print usage information.	(-help and --help are equivalent.)

EXAMPLE

       pdfseparate sample.pdf sample-%d.pdf

       extracts all pages from sample.pdf. For example, if sample.pdf has 3 pages, it produces

       sample-1.pdf, sample-2.pdf, sample-3.pdf
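
       The sample-%d.pdf pattern yields unpadded page numbers, while the script earlier in this thread expects names such as pg_0001.pdf. Below is a minimal sketch of one way to bridge the two, assuming the split pages are named PREFIX-N.pdf in the current directory. pad_pages is a hypothetical helper name and is not part of pdfseparate or poppler.

```shell
#!/bin/bash
# Hypothetical helper: rename PREFIX-N.pdf to pg_NNNN.pdf in the current directory.
pad_pages() {
    local prefix=$1 f n
    for f in "$prefix"-*.pdf; do
        [ -e "$f" ] || continue        # No matches: the glob stays literal.
        n=${f#"$prefix"-}              # Strip the leading "PREFIX-".
        n=${n%.pdf}                    # Strip the trailing ".pdf".
        case $n in
            ''|0*|*[!0-9]*) continue ;;  # Skip anything but an unpadded positive number.
        esac
        mv -- "$f" "$(printf 'pg_%04d.pdf' "$n")"
    done
}
```

       After splitting with the command shown in this example, running pad_pages sample would rename sample-1.pdf to pg_0001.pdf, and so on. Files whose names do not follow the pattern are left untouched.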

AUTHOR

       The pdfseparate software and documentation are copyright 1996-2004 Glyph & Cog, LLC  and  copyright  2005-2011  The  Poppler  Developers  -
       http://poppler.freedesktop.org

SEE ALSO

       pdfunite(1)

								 15 September 2011						    pdfseparate(1)