06-07-2013
Remove last newline character
As I mentioned in my mail, when I take the line count of the file, the count leaves out the last line. However, when I open the file in vi, the cursor goes down to the line 'ijkl'. This looks strange. Any suggestions regarding the file?
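Assuming the count comes from `wc -l`, this behaviour is expected. `wc -l` counts newline characters rather than lines, so once the final newline is removed, the last line ('ijkl') is no longer counted even though vi still displays it (Vim typically flags such a file with `[noeol]`). A minimal sketch for a POSIX shell; `demo.txt` and `ends_with_newline` are illustrative names, not part of the original post:

```shell
#!/bin/sh
# wc -l counts newline characters, so a last line with no trailing
# newline is not counted. demo.txt is just an example file name.
printf 'abcd\nefgh\nijkl\n' > demo.txt
wc -l < demo.txt            # 3 newlines -> reports 3

# Strip the trailing newline. Note: $(...) removes *all* trailing
# newlines, and this approach is only safe for plain text files.
content=$(cat demo.txt)
printf '%s' "$content" > demo.txt
wc -l < demo.txt            # 'ijkl' now has no newline -> reports 2

# Succeeds when the file's last byte is a newline (or the file is empty):
# tail -c 1 prints the last byte, and $(...) drops it if it is '\n'.
ends_with_newline() {
  [ -z "$(tail -c 1 "$1")" ]
}

# Restore the final newline only when it is missing.
if ! ends_with_newline demo.txt; then
  printf '\n' >> demo.txt
fi
wc -l < demo.txt            # reports 3 again
```

BSD `wc` pads its number with leading spaces; GNU `wc` does not.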
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
Hi,
I want to print the following lines:
"Message from bac logistics
The Confirmation File has not been received."
When I write it like this in the code:
"Message from bac logistics\n The Confirmation File has not been received."
It prints only
Message from bac logistics\n The... (9 Replies)
Discussion started by: trichyselva
2. Shell Programming and Scripting
Hi All,
I have named a file with the current date, time and year as follows:
month=`date | awk '{print $2}'`
date=`date | awk '{print $3}'`
year=`date | awk '{print $6}'`
time=`date +%Hh_%Mm_%Ss`
filename="test_"$month"_"$date"_"$year"_"$time".txt"
> $filename
The file is created with a... (2 Replies)
Discussion started by: amio
3. Shell Programming and Scripting
Hi All,
We append a file's size to a log file, but a newline character is appended after the variable.
Please help me get rid of it.
filesize=`ls -l test.txt | awk '{print $5}'`
echo File size of test.txt is $filesize bytes >> logfile.txt
The output we got is:
File size of... (4 Replies)
Discussion started by: amio
4. Shell Programming and Scripting
Hi All,
I have 5000 records like this
Request_id|Type|Status|Priority|Ticket Submitted Date and Time|Actual Resolved Date and Time|Current Ticket Owner Group|Case final Ticket Owner Group|Customer Severity|Reported Symptom/Request|Component|Hot Topic|Reason for Missed SLA|Current Ticket... (2 Replies)
Discussion started by: j_53933
5. Shell Programming and Scripting
I'd like to remove (with a pattern or an exact replacement; this I can handle in sed using a regex)
--- after the 1st occurrence (i.e. starting at the 2nd occurrence: from the 2nd to the 4th occurrence) of a specific string: type 1
-- after the 1st occurrence of string1 until the 1st occurrence of... (4 Replies)
Discussion started by: sieger007
6. Shell Programming and Scripting
Hi,
In my file, I have '\n' characters inside a single record. Because of this, a single record appears on many lines and looks like multiple records, as in the file below.
File 1
====
1,nmae,lctn,da\n
t
2,ghjik,o\n
ut,de\n
fk
Expected output after the \n removed
File 2
=====... (5 Replies)
Discussion started by: machomaddy
7. Shell Programming and Scripting
Hi,
I have a very large file, around 1 GB of data.
I want to remove the newline characters in the file that are not preceded by the original end delimiter {}.
Sample data looks like this:
1234567
abcd{}
1234sssss
as67
abcd{}
12dsad3dad
4sdad567
abcdsadd{}
this should look like this... (6 Replies)
Discussion started by: ratheeshjulk
8. Shell Programming and Scripting
Hi, I have a delimited .dat file with content like the following.
test.dat (5 lines of records)
======
PT2~Stag~Pt2 Stag Test.
Updated~PT2 S T~Area~~UNCEF R20~~2012-05-24 ~2014-05-24~~
PT2~Stag y~Pt2 Stag Test.
Updated~PT2 S T~Area~METR~~~2012-05-24~2014-05-24~~test
PT2~Pt2 Stag Test~~PT2 S... (4 Replies)
Discussion started by: sushine11
9. Shell Programming and Scripting
I have a file which comes every day, and its data looks like this:
vi abc.txt
a|b|c|d\n
a|g|h|j\n
Sometimes we receive the file with only a newline character in it, like:
vi abc.txt
\n (8 Replies)
Discussion started by: rak Kundra
10. UNIX for Beginners Questions & Answers
Hi,
I came across an issue recently where the output from one of the columns of the table from which I am creating the input file has newline characters; hence, records in the file are spread over multiple lines. Fields in the file are separated by the pipe (|) delimiter. As the header will never have newline... (4 Replies)
Discussion started by: Prathmesh
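For discussion 1 above, one portable way to print the two-line message is `printf` rather than `echo`, whose handling of `\n` differs between shells (bash's builtin `echo` only interprets it with `-e`). A minimal sketch, assuming a POSIX shell; `print_message` is a hypothetical helper name:

```shell
#!/bin/sh
# printf reuses its format for each remaining argument, so '%s\n'
# prints every argument on its own line. The arguments themselves are
# printed literally, so no escape sequences are needed inside them.
print_message() {
  printf '%s\n' "Message from bac logistics" \
    "The Confirmation File has not been received."
}

print_message
```

Putting the escape in the format string instead, as in `printf 'Message from bac logistics\nThe Confirmation File has not been received.\n'`, gives the same two lines.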
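For discussion 2 above, the same style of file name can be built from a single `date` call. This is a sketch that assumes the original field order (month, day, year, time) is what is wanted; the text produced by `%b` depends on the locale:

```shell
#!/bin/sh
# One date call formats every field from the same instant; separate
# calls could straddle a minute boundary or midnight and disagree.
# %d is zero-padded (07), whereas in the C locale
# `date | awk '{print $3}'` yields the unpadded day (7).
filename="test_$(date +%b_%d_%Y_%Hh_%Mm_%Ss).txt"
: > "$filename"    # create the file (or truncate it if it already exists)
echo "$filename"
```

Quoting `"$filename"` also keeps the redirection safe if a locale ever puts spaces into the month name.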
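For discussion 7 above, the joining step can be sketched in awk, which streams the file line by line and so copes with a 1 GB input. The expected output in the question is cut off, so this assumes the pieces of a record are simply concatenated with nothing between them; `join_records` is a hypothetical name:

```shell
#!/bin/sh
# Join physical lines into one record until a line ends with the
# record terminator "{}". [{][}] matches the literal braces without
# relying on how a given awk treats \{ in a regex.
join_records() {
  awk '
    { rec = rec $0 }                   # append this line; its newline is dropped
    /[{][}]$/ { print rec; rec = "" }  # terminator reached: emit the record
    END { if (rec != "") print rec }   # flush an unterminated final record
  ' "$@"
}
```

Usage would look like `join_records big_input.txt > joined.txt` (both file names are placeholders). With the sample data from the question, the first two output records would be `1234567abcd{}` and `1234sssssas67abcd{}`.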
PDF2TXT(1) PDFMiner Manual PDF2TXT(1)
NAME
pdf2txt - extracts text contents of PDF files
SYNOPSIS
pdf2txt [option...] file...
DESCRIPTION
pdf2txt extracts text contents from a PDF file. It extracts all the text that is to be rendered programmatically, i.e. text represented as
ASCII or Unicode strings. It cannot recognize text drawn as images that would require optical character recognition. It also extracts the
corresponding locations, font names, font sizes, writing direction (horizontal or vertical) for each text portion. You need to provide a
password for protected PDF documents whose access is restricted. You cannot extract any text from a PDF document that does not have
extraction permission.
OPTIONS
-o file
Specifies the output file name. The default is to print the extracted contents to standard output in text format.
-p pageno[,pageno,...]
Specifies the comma-separated list of the page numbers to be extracted. Page numbers start at one. By default, it extracts text from
all the pages.
-c codec
Specifies the output codec.
-t type
Specifies the output format. The following formats are currently supported:
text
Text format. This is the default.
html
HTML format. It is not recommended.
xml
XML format. It provides the most information.
tag
"Tagged PDF" format. A tagged PDF has its own contents annotated with HTML-like tags. pdf2txt tries to extract its content streams
rather than inferring its text locations. Tags used here are defined in the PDF Reference, Sixth Edition[1] (§10.7 "Tagged PDF").
-D writing-mode
Specifies the writing mode of text outputs:
lr-tb
Left-to-right, top-to-bottom.
tb-rl
Top-to-bottom, right-to-left.
auto
Determine the writing mode automatically.
-M char-margin, -L line-margin, -W word-margin
These are the parameters used for layout analysis. In an actual PDF file, text portions might be split into several chunks in the
middle of a run, depending on the authoring software. Therefore, text extraction needs to splice text chunks. Two text chunks whose
distance is closer than the char-margin are considered continuous and get grouped into one. Also, two lines whose distance is closer
than the line-margin are grouped as a text box, which is a rectangular area that contains a "cluster" of text portions. Furthermore,
it may be necessary to insert blank characters (spaces) if the distance between two words is greater than the word-margin, as a blank
between words might not be represented as a space, but indicated by the positioning of each word.
Each value is specified not as an actual length, but as a proportion of the length to the size of each character in question. The
default values are char-margin = 1.0, line-margin = 0.3, and word-margin = 0.2, respectively.
-n
Suppress layout analysis.
-A
Force layout analysis for all the text strings, including text contained in figures.
-V
Enable detection of vertical writing.
-s scale
Specifies the output scale. This option can be used in HTML format only.
-m n
Specifies the maximum number of pages to extract. By default, all the pages in a document are extracted.
-P password
Provides the user password to access PDF contents.
-d
Increase the debug level.
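The char-margin and word-margin rules described under -M and -W can be illustrated with a simplified one-dimensional sketch. This is not PDFMiner's implementation: it assumes the chunks are already sorted left to right on a single line, uses a hypothetical input format of one chunk per line (x0 x1 size text, where text contains no spaces), and ignores line-margin and vertical positions. `group_chunks` is a hypothetical name; its two optional arguments default to the man page's values 1.0 and 0.2.

```shell
#!/bin/sh
# Simplified illustration of char-margin / word-margin grouping.
# Input (stdin), one chunk per line, sorted by x0:  x0 x1 size text
# Both margins are proportions of the character size, as in pdf2txt.
group_chunks() {
  awk -v cm="${1:-1.0}" -v wm="${2:-0.2}" '
    NR == 1 { line = $4; prev_x1 = $2; next }
    {
      gap = $1 - prev_x1
      if (gap >= cm * $3) {          # farther than char-margin: new group
        print line; line = $4
      } else if (gap > wm * $3) {    # same group, but a word break: add a space
        line = line " " $4
      } else {                       # close enough: splice chunks directly
        line = line $4
      }
      prev_x1 = $2
    }
    END { if (NR > 0) print line }
  '
}
```

With the defaults and a character size of 10, a gap of at most 2 splices two chunks directly, a gap above 2 but below 10 inserts a space, and a gap of 10 or more starts a new group. For example, feeding the chunks `Hel`, `lo`, `world` and `next` at gaps of 0, 3 and 20 yields the two groups `Hello world` and `next`.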
EXAMPLES
Extract text as an HTML file whose filename is output.html:
$ pdf2txt -o output.html samples/naacl06-shinyama.pdf
Extract a Japanese HTML file in vertical writing:
$ pdf2txt -c euc-jp -D tb-rl -o output.html samples/jo.pdf
Extract text from an encrypted PDF file:
$ pdf2txt -P mypassword -o output.txt secret.pdf
SEE ALSO
dumppdf(1)
AUTHORS
Jakub Wilk <jwilk@debian.org>
Wrote this manual page for the Debian system.
Yusuke Shinyama <yusuke@cs.nyu.edu>
Author of PDFMiner and its original HTML documentation.
NOTES
1. PDF Reference, Sixth Edition
http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
pdf2txt 08/24/2011 PDF2TXT(1)