03-06-2012
Script to batch pdfjoin based on pdfgrep output
I've been given a batch of PDF files, each a single page with an employee ID on a line of its own. I need to collate all of the pages by employee ID.
Piecemeal, I can find a particular employee ID by just using pdfgrep.
I could also do something like this:
find . -name "*.pdf" -print0 | xargs -0 -I FILENAME bash -c "if { pdftotext FILENAME - | grep -q <IDnumberHere>; } ; then echo FILENAME; fi"
Searching for an employee ID with pdfgrep lists every file containing that ID, along with the full text of the line that contains it. (That line is the same for each employee; the ID number is all that changes.)
The task of joining these files together with pdfjoin is quite simple.
However, I'm a novice at writing bash scripts. Doing all of this piecemeal takes longer than just shuffling actual pages! I need to know how to automate the joining of files that have identical employee ID lines output by pdfgrep.
The number of pages per employee ID varies but is, currently, a maximum of six.
Pseudocode would be something like:
Filename=pg_0001.pdf
until [ "$Filename" = "pg_<Lastpage>.pdf" ]; do
    Filename2=<somehow increment Filename by 1>
    x=$(pdfgrep "Employee ID" "$Filename")   # Not sure how to insert a variable to be read as the filename for pdfgrep
    y=$(pdfgrep "Employee ID" "$Filename2")
    if [ "$x" = "$y" ]; then
        # Intended to join the two files under the name of Filename, i.e. replacing the first file with the joined file in the same directory.
        pdfjoin "$Filename" "$Filename2" --outfile "$Filename"
    fi
    Filename=$Filename2
done
Thanks for any help you can offer.
Last edited by nopposan; 03-06-2012 at 01:29 PM..
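The pseudocode above compares only neighbouring pages. A minimal sketch of one alternative, which groups every page by its ID line and then joins each group in a single pdfjoin call, might look like the following. It reuses the pdftotext and grep combination from the find command rather than pdfgrep. The names `PATTERN`, `id_line`, `group_by_key`, `collate_by_id`, the `collated` output directory and the `employee_NNN.pdf` naming are all assumptions made for illustration, and the sketch assumes filenames contain no tabs or newlines.

```bash
#!/usr/bin/env bash
# Sketch: collate single-page PDFs that share the same "Employee ID" line.
# Assumptions (not from the original post): the PATTERN text, the output
# directory, the employee_NNN.pdf naming, and filenames free of tabs/newlines.

PATTERN="${PATTERN:-Employee ID}"

# Print the first line of a PDF's text that contains PATTERN.
# Returns non-zero when no line matches.
id_line() {
    pdftotext "$1" - 2>/dev/null | grep -m 1 -F -- "$PATTERN"
}

# Read "file<TAB>key" lines on stdin; print one tab-separated list of files
# per key, keeping keys in first-seen order and files in input order.
group_by_key() {
    awk -F '\t' '
        {
            file = $1
            key = substr($0, length(file) + 2)   # everything after the first tab
            if (!(key in files)) { order[++n] = key; files[key] = file }
            else files[key] = files[key] "\t" file
        }
        END { for (i = 1; i <= n; i++) print files[order[i]] }
    '
}

# Join the PDFs in src_dir that share an ID line into out_dir/employee_NNN.pdf.
collate_by_id() {
    src_dir=${1:-.}
    out_dir=${2:-collated}
    mkdir -p "$out_dir" || return 1
    tab=$(printf '\t')
    count=0
    for f in "$src_dir"/*.pdf; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        if key=$(id_line "$f"); then
            printf '%s\t%s\n' "$f" "$key"
        else
            echo "skipping $f: no line matching '$PATTERN'" >&2
        fi
    done | group_by_key | while IFS= read -r group; do
        count=$((count + 1))
        old_ifs=$IFS; IFS=$tab
        set -f; set -- $group; set +f            # split the group on tabs only
        IFS=$old_ifs
        # stdin is redirected so pdfjoin cannot consume the loop's input
        pdfjoin "$@" --outfile "$out_dir/$(printf 'employee_%03d.pdf' "$count")" < /dev/null
    done
}

# Run only when a source directory is given on the command line.
if [ "$#" -gt 0 ]; then
    collate_by_id "$@"
fi
```

Saved under a hypothetical name such as collate.sh, this could be run as `bash collate.sh . collated`. Pages within each group are joined in filename order, so zero-padded names like pg_0001.pdf keep their sequence, and the original files are left untouched rather than overwritten.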