Man Page: cam::pdf::pagetext
Operating Environment: debian
Section: 3pm
CAM::PDF::PageText(3pm)    User Contributed Perl Documentation    CAM::PDF::PageText(3pm)

NAME
       CAM::PDF::PageText - Extract text from PDF page tree

SYNOPSIS
           my $pdf = CAM::PDF->new($filename);
           my $pageone_tree = $pdf->getPageContentTree(1);
           print CAM::PDF::PageText->render($pageone_tree);

DESCRIPTION
       This module attempts to extract sequential text from a PDF page. This
       is not a robust process, as PDF text is graphically laid out in
       arbitrary order. This module uses a few heuristics to try to guess
       what text goes next to what other text, but may be fooled easily by,
       say, subscripts, non-horizontal text, changes in font, form fields
       etc.

       All those disclaimers aside, it is useful for a quick dump of text
       from a simple PDF file.

LICENSE
       Same as CAM::PDF

FUNCTIONS
       $pkg->render($pagetree)
       $pkg->render($pagetree, $verbose)
           Turn a page content tree into a string. This is a class method
           that should be called like:

               CAM::PDF::PageText->render($pagetree);

AUTHOR
       See CAM::PDF

perl v5.14.2                      2012-07-08              CAM::PDF::PageText(3pm)
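
The synopsis handles a single page. As a minimal sketch, the same calls can be looped over a whole document to get the "quick dump of text" the description mentions. The helper name extract_pages and the command-line handling are hypothetical and not part of the module; numPages() and $CAM::PDF::errstr are taken from the CAM::PDF documentation, and the guard for a missing content tree is a defensive assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): returns one string per page.
sub extract_pages {
    my ($filename) = @_;

    # Loaded lazily so this file compiles even where CAM::PDF is absent.
    require CAM::PDF;
    require CAM::PDF::PageText;

    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open $filename: $CAM::PDF::errstr\n";

    my @pages;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);

        # Assumption: guard against pages for which no content tree
        # comes back, and record them as empty text.
        push @pages,
            defined $tree ? CAM::PDF::PageText->render($tree) : q{};
    }
    return @pages;
}

# When run as a script with a file argument, print each page's text.
if (!caller && @ARGV) {
    my @pages = extract_pages($ARGV[0]);
    for my $i (0 .. $#pages) {
        print "--- page ", $i + 1, " ---\n", $pages[$i], "\n";
    }
}
```

Because the extraction is heuristic, the output is best treated as a rough dump: text order within a page may not match reading order for anything beyond simple layouts.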