Scanning a pdf file in Linux shell
Post 302954897 by protocomm on Saturday 12th of September 2015 11:09:08 AM
Convert your PDF file to text with the command:

Code:
pdftotext

then parse the title and the author's name from the resulting text.
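
A minimal sketch of that idea follows. It is only a heuristic, not a general solution: it assumes the title is the first non-blank line of page 1 and the author is the second, which will not hold for every PDF. The function names parse_title_author and pdf_title_author are made up for this example; the -l, -nopgbrk and '-' arguments are the ones documented in the pdftotext man page.

```shell
#!/bin/sh
# Heuristic sketch: treat the first two non-blank lines of page 1 as
# title and author. Real PDFs vary; adjust the parsing to your layout.

# parse_title_author (hypothetical helper): reads plain text on stdin and
# prints "Title: ..." and "Author: ..." from the first two non-blank lines.
parse_title_author() {
    awk 'NF {
             gsub(/^[ \t]+|[ \t]+$/, "")   # trim surrounding whitespace
             n++
             if (n == 1) print "Title: " $0
             if (n == 2) { print "Author: " $0; exit }
         }'
}

# pdf_title_author (hypothetical helper): converts only page 1 (-l 1),
# without form feeds (-nopgbrk), writes the text to stdout ("-") and
# hands it to the parser.
pdf_title_author() {
    pdftotext -l 1 -nopgbrk "$1" - | parse_title_author
}

# Usage: pdf_title_author paper.pdf
```

If the layout of your files differs (for example, a journal header above the title), only parse_title_author needs to change; the conversion step stays the same.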
 

pdftotext(1)                General Commands Manual                pdftotext(1)

NAME
       pdftotext - Portable Document Format (PDF) to text converter (version 3.00)

SYNOPSIS
       pdftotext [options] [PDF-file [text-file]]

DESCRIPTION
       Pdftotext converts Portable Document Format (PDF) files to plain text.

       Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.

OPTIONS
       -f number
              Specifies the first page to convert.

       -l number
              Specifies the last page to convert.

       -r number
              Specifies the resolution, in DPI. The default is 72 DPI.

       -x number
              Specifies the x-coordinate of the crop area top left corner.

       -y number
              Specifies the y-coordinate of the crop area top left corner.

       -W number
              Specifies the width of crop area in pixels (default is 0).

       -H number
              Specifies the height of crop area in pixels (default is 0).

       -layout
              Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.

       -raw   Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended.

       -htmlmeta
              Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta headers.

       -bbox  Generate an XHTML file containing bounding box information for each word in the file.

       -enc encoding-name
              Sets the encoding to use for text output. This defaults to "UTF-8".

       -listenc
              Lists the available encodings.

       -eol unix | dos | mac
              Sets the end-of-line convention to use for text output.

       -nopgbrk
              Don't insert page breaks (form feed characters) between pages.

       -opw password
              Specify the owner password for the PDF file. Providing this will bypass all security restrictions.

       -upw password
              Specify the user password for the PDF file.

       -q     Don't print any messages or errors.

       -v     Print copyright and version information.

       -h     Print usage information. (-help and --help are equivalent.)

BUGS
       Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from these files.

EXIT CODES
       The Xpdf tools use the following exit codes:

       0      No error.
       1      Error opening a PDF file.
       2      Error opening an output file.
       3      Error related to PDF permissions.
       99     Other error.

AUTHOR
       The pdftotext software and documentation are copyright 1996-2004 Glyph & Cog, LLC.

SEE ALSO
       pdffonts(1), pdfimages(1), pdfinfo(1), pdftocairo(1), pdftohtml(1), pdftoppm(1), pdftops(1)

22 January 2004                                                    pdftotext(1)
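
As an illustration of how the options and exit codes documented above might be combined in a script, here is a small sketch. The helper names describe_exit and convert_pages are invented for this example; the flags (-q, -f, -l, -layout) and the exit code meanings are taken from the man page text, and the fallback message for unlisted codes is an assumption.

```shell
#!/bin/sh
# describe_exit (hypothetical helper): maps the exit codes listed in the
# man page to their descriptions.
describe_exit() {
    case "$1" in
        0)  echo "No error" ;;
        1)  echo "Error opening a PDF file" ;;
        2)  echo "Error opening an output file" ;;
        3)  echo "Error related to PDF permissions" ;;
        99) echo "Other error" ;;
        *)  echo "Unknown exit code $1" ;;   # not listed in the man page
    esac
}

# convert_pages (hypothetical helper): converts pages $2..$3 of PDF $1
# into text file $4, keeping the physical layout, and reports a failure
# on stderr. -q silences pdftotext's own messages.
convert_pages() {
    pdftotext -q -f "$2" -l "$3" -layout "$1" "$4"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "pdftotext failed: $(describe_exit "$status")" >&2
    fi
    return "$status"
}

# Usage: convert_pages report.pdf 2 5 report.txt
```

Returning the original status from convert_pages lets a calling script decide for itself whether, say, a permissions error (3) should be retried with -opw or -upw.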