Script to batch pdfjoin based on pdfgrep output
Post 302604995 by nopposan on Tuesday 6th of March 2012 12:02:43 PM

I have a situation in which I'm given a bunch of PDF files, each a single page with an employee ID on a line of its own. I need to collate all of the pages by employee ID.

Piecemeal, I can find a particular employee ID by just using pdfgrep.

I could also do something like this:
find . -name "*.pdf" -print0 | xargs -0 -I FILENAME bash -c 'if pdftotext "$1" - | grep -q "<IDnumberHere>"; then echo "$1"; fi' _ FILENAME
Searching for an employee ID with pdfgrep lists every file containing that ID, along with the full text of the line on which it appears. (That line is the same for each employee; only the ID number changes.)

The task of joining these files together with pdfjoin is quite simple.

However, I'm a novice at writing bash scripts. Doing all of this piecemeal takes longer than just shuffling actual pages! I need to know how to automate the joining of files that have identical employee ID lines output by pdfgrep.

The number of pages per employee ID varies, but is currently at most six.

Pseudocode would be something like:
Filename=pg_0001.pdf

until [ "$Filename" = "pg_<Lastpage>.pdf" ]; do
    Filename2=<somehow increment Filename by 1>
    x=`pdfgrep "Employee ID" "$Filename"`   # Not sure how to insert variable to be read as filename for pdfgrep
    y=`pdfgrep "Employee ID" "$Filename2"`

    if [ "$x" == "$y" ]; then
        # Intended to join the two files under the name of Filename, i.e. replacing the first file with the joined file in the same directory.
        pdfjoin "$Filename" "$Filename2" --outfile "$Filename"
    fi

    Filename=$Filename2
done
Thanks for any help you can offer.

Last edited by nopposan; 03-06-2012 at 01:29 PM..
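
One way the pseudocode above might be turned into working bash is sketched below. It is only a sketch under some assumptions that are not in the post: the pages sort correctly by name (pg_0001.pdf, pg_0002.pdf, ...), pages belonging to one employee are adjacent, and the output names joined_0001.pdf, joined_0002.pdf, ... are made up for illustration. The function names id_line, flush_group and join_by_employee are likewise hypothetical. Instead of overwriting the first page, it collects each run of pages with an identical "Employee ID" line and joins the run once.

```shell
#!/usr/bin/env bash
# Sketch: join runs of adjacent single-page PDFs that share the same "Employee ID" line.
# Assumed (not from the post): input files match pg_*.pdf and sort in page order;
# output files are named joined_NNNN.pdf.

# Print the first line of a page that contains "Employee ID".
# -h suppresses the file name prefix (see pdfgrep(1)).
id_line() {
    pdfgrep -h "Employee ID" "$1" | head -n 1
}

# flush_group OUT FILE...: write one employee's pages to OUT.
flush_group() {
    [ "$#" -gt 1 ] || return 0      # need an output name and at least one page
    local out=$1
    shift
    if [ "$#" -eq 1 ]; then
        cp "$1" "$out"              # a single page needs no joining
    else
        pdfjoin "$@" --outfile "$out"
    fi
}

join_by_employee() {
    local prev="" cur f n=0
    local -a group=()
    for f in pg_*.pdf; do
        [ -e "$f" ] || continue     # the glob matched nothing
        cur=$(id_line "$f")
        # A different ID line ends the current run of pages.
        if [ "$cur" != "$prev" ] && [ "${#group[@]}" -gt 0 ]; then
            n=$((n + 1))
            flush_group "$(printf 'joined_%04d.pdf' "$n")" "${group[@]}"
            group=()
        fi
        group+=("$f")
        prev=$cur
    done
    # Write out the last run.
    if [ "${#group[@]}" -gt 0 ]; then
        n=$((n + 1))
        flush_group "$(printf 'joined_%04d.pdf' "$n")" "${group[@]}"
    fi
}
```

To try it, source the file in the directory that holds the pages and run join_by_employee. The original pg_*.pdf files are left untouched, so the result can be checked before anything is deleted.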
 

pdfgrep(1)                        USER COMMANDS                        pdfgrep(1)

NAME
       pdfgrep - search pdf files for a regular expression

SYNOPSIS
       pdfgrep [OPTION...] PATTERN FILE...

DESCRIPTION
       Search for PATTERN in each FILE. PATTERN is an extended regular expression.

       pdfgrep works much like grep, with one distinction: It operates on pages and not on lines.

OPTIONS
       -i, --ignore-case
              Ignore case distinctions in both the PATTERN and the input files.

       -H, --with-filename
              Print the file name for each match. This is the default setting when there is more than one file to search.

       -h, --no-filename
              Suppress the prefixing of file name on output. This is the default setting when there is only one file to search.

       -n, --page-number
              Prefix each match with the number of the page where it was found.

       -c, --count
              Suppress normal output. Instead print the number of matches for each input file. Note that unlike grep, multiple matches on the same page will be counted individually.

       -C, --context NUM
              Print at most NUM characters of context around each match. The exact number will vary, because pdfgrep tries to respect word boundaries. If NUM is "line", the whole line will be printed. If this option is not set, pdfgrep tries to print lines that are not longer than the terminal width.

       --color WHEN
              Surround file names, page numbers and matched text with escape sequences to display them in color on the terminal. (The default setting is auto). WHEN can be:

              always Always use colors, even when stdout is not a terminal.

              never  Do not use colors.

              auto   Use colors only when stdout is a terminal.

       -R, -r, --recursive
              Recursively search all files (restricted by --include and --exclude) under each directory.

       --exclude=GLOB
              Skip files whose base name matches GLOB. See glob(7) for wildcards you can use. You can use this option multiple times to exclude more patterns. It takes precedence over --include. Note, that in- and excludes apply only to files found via --recursive and not to the argument list.

       --include=GLOB
              Only search files whose base name matches GLOB. See --exclude for details. The default is *.pdf.

       --unac Remove accents and ligatures from both the search pattern and the PDF documents. This is useful if you want to search for a word containing 'ae', but the PDF uses the single character 'æ' instead. See unac(3) and unaccent(1) for details.

              [This option is experimental and only available if pdfgrep is compiled with unac support.]

       -q, --quiet
              Suppress all normal output to stdout. Errors will be printed and the exit codes will be returned (see below).

       --help Print a short summary of the options.

       -V, --version
              Show version information

ENVIRONMENT VARIABLES
       The behavior of pdfgrep is affected by the following environment variable.

       GREP_COLORS
              Specifies the colors and other attributes used to highlight various parts of the output. The syntax and values are like GREP_COLORS of grep. See grep(1) for more details. Currently only the capabilities mt, ms, mc, fn, ln and se are used by pdfgrep, where mt, ms and mc have the same effect on pdfgrep.

EXIT STATUS
       Normally, the exit status is 0 if at least one match is found, 1 if no match is found and 2 if an error occurred. But if the --quiet or -q option is used and a match was found, pdfgrep will return 0 regardless of errors.

AUTHOR
       Hans-Peter Deifel <hpdeifel at gmx.de>

SEE ALSO
       grep(1), regex(7)

version 1.2                       February 14, 2012                       pdfgrep(1)
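
The exit status and -q option described above can be used directly in a shell conditional. The following is a minimal sketch, assuming the PDFs sit in the current directory; pages_with_id is a hypothetical helper name, not part of pdfgrep. It prints the name of every file whose text matches the given pattern and reports pdfgrep errors separately.

```shell
# pages_with_id PATTERN: list the PDFs in the current directory that match PATTERN.
# Hypothetical helper; relies only on -q and the documented exit status of pdfgrep.
pages_with_id() {
    local id=$1 f status
    for f in *.pdf; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        if pdfgrep -q "$id" "$f"; then
            status=0
        else
            status=$?
        fi
        case $status in
            0) printf '%s\n' "$f" ;;            # at least one match
            1) ;;                               # no match
            *) printf 'pdfgrep failed on %s\n' "$f" >&2 ;;   # 2: an error occurred
        esac
    done
}
```

For example, pages_with_id "12345" would print one file name per line, which could then be handed to pdfjoin. Because PATTERN is an extended regular expression, characters such as "." or "+" in an ID would need escaping.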