Shell Programming and Scripting: Post 302705015 by lost.identity on Monday 24th of September 2012 05:59:41 AM
Search pdfs in command line

Hi,

I'm trying to search for a particular phrase in a large number of PDFs in a particular directory.

What I have so far prints only the matching line; I haven't been able to show which file the phrase appears in.

Code:
find . -name '*.pdf' -exec pdftotext {} - \; | grep "search phrase"

I've been told that pdfgrep can do this, but I don't have root access on this machine, and when I tried to install it some libraries were missing. I'd therefore prefer a solution that uses pdftotext. Many thanks!
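The pipeline above loses the file name because find sends the text of every PDF into a single stream before grep sees it. A minimal sketch of one possible workaround, assuming only pdftotext, find, grep and a POSIX sh are available: convert and search one file at a time, and prefix each matching line with the path. The function name search_pdfs is made up for illustration, and grep -F treats the phrase as a fixed string rather than a regular expression.

```shell
#!/bin/sh
# search_pdfs PHRASE [DIR]   (hypothetical helper name, not a standard tool)
# Convert every *.pdf under DIR (default: current directory) to text with
# pdftotext and print each matching line as "path:line".
search_pdfs() {
    phrase=$1
    dir=${2:-.}
    # find hands batches of PDFs to a small inline shell; the phrase is
    # passed as the first argument so no quoting tricks are needed.
    find "$dir" -name '*.pdf' -exec sh -c '
        phrase=$1
        shift
        for f in "$@"; do
            # Search this one file, then prefix every hit with its path.
            pdftotext "$f" - 2>/dev/null |
                grep -F -- "$phrase" |
                while IFS= read -r line; do
                    printf "%s:%s\n" "$f" "$line"
                done
        done
    ' sh "$phrase" {} +
}
```

Called as `search_pdfs "search phrase" .`, this should print lines in the same "file:match" shape that grep uses for multiple files. Errors from pdftotext are discarded here to keep the output clean; remove the `2>/dev/null` to see them.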
 

pdfgrep(1)                        USER COMMANDS                        pdfgrep(1)

NAME
       pdfgrep - search pdf files for a regular expression

SYNOPSIS
       pdfgrep [OPTION...] PATTERN FILE...

DESCRIPTION
       Search for PATTERN in each FILE. PATTERN is an extended regular
       expression. pdfgrep works much like grep, with one distinction: It
       operates on pages and not on lines.

OPTIONS
       -i, --ignore-case
              Ignore case distinctions in both the PATTERN and the input
              files.

       -H, --with-filename
              Print the file name for each match. This is the default setting
              when there is more than one file to search.

       -h, --no-filename
              Suppress the prefixing of file name on output. This is the
              default setting when there is only one file to search.

       -n, --page-number
              Prefix each match with the number of the page where it was
              found.

       -c, --count
              Suppress normal output. Instead print the number of matches for
              each input file. Note that unlike grep, multiple matches on the
              same page will be counted individually.

       -C, --context NUM
              Print at most NUM characters of context around each match. The
              exact number will vary, because pdfgrep tries to respect word
              boundaries. If NUM is "line", the whole line will be printed.
              If this option is not set, pdfgrep tries to print lines that
              are not longer than the terminal width.

       --color WHEN
              Surround file names, page numbers and matched text with escape
              sequences to display them in color on the terminal. (The
              default setting is auto). WHEN can be:

              always Always use colors, even when stdout is not a terminal.

              never  Do not use colors.

              auto   Use colors only when stdout is a terminal.

       -R, -r, --recursive
              Recursively search all files (restricted by --include and
              --exclude) under each directory.

       --exclude=GLOB
              Skip files whose base name matches GLOB. See glob(7) for
              wildcards you can use. You can use this option multiple times
              to exclude more patterns. It takes precedence over --include.
              Note, that in- and excludes apply only to files found via
              --recursive and not to the argument list.

       --include=GLOB
              Only search files whose base name matches GLOB. See --exclude
              for details. The default is *.pdf.

       --unac Remove accents and ligatures from both the search pattern and
              the PDF documents. This is useful if you want to search for a
              word containing 'ae', but the PDF uses the single character
              'æ' instead. See unac(3) and unaccent(1) for details.
              [This option is experimental and only available if pdfgrep is
              compiled with unac support.]

       -q, --quiet
              Suppress all normal output to stdout. Errors will be printed
              and the exit codes will be returned (see below).

       --help Print a short summary of the options.

       -V, --version
              Show version information

ENVIRONMENT VARIABLES
       The behavior of pdfgrep is affected by the following environment
       variable.

       GREP_COLORS
              Specifies the colors and other attributes used to highlight
              various parts of the output. The syntax and values are like
              GREP_COLORS of grep. See grep(1) for more details. Currently
              only the capabilities mt, ms, mc, fn, ln and se are used by
              pdfgrep, where mt, ms and mc have the same effect on pdfgrep.

EXIT STATUS
       Normally, the exit status is 0 if at least one match is found, 1 if no
       match is found and 2 if an error occurred. But if the --quiet or -q
       option is used and a match was found, pdfgrep will return 0 regardless
       of errors.

AUTHOR
       Hans-Peter Deifel <hpdeifel at gmx.de>

SEE ALSO
       grep(1), regex(7)

version 1.2                     February 14, 2012                     pdfgrep(1)
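
Where pdfgrep is installed, the exit statuses described in the man page can drive a script. The following is only a sketch: the function name report_match is invented for illustration, and it relies solely on the -q and -r options and the 0/1/2 statuses documented above for version 1.2.

```shell
#!/bin/sh
# report_match PATTERN DIR   (hypothetical helper name)
# Search DIR recursively and quietly, then translate pdfgrep's exit
# status into a short message.
report_match() {
    pdfgrep -q -r "$1" "$2"
    status=$?
    case $status in
        0) echo "match found" ;;
        1) echo "no match" ;;
        *) echo "pdfgrep reported an error (status $status)" >&2 ;;
    esac
    # Pass the original status back to the caller.
    return "$status"
}
```

For example, `report_match "search phrase" . && echo "at least one PDF matches"` would run the second command only when a match exists.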