I have several hundred PDF files numbered 01.pdf, 02.pdf, 03.pdf, etc. in one folder. These are very long documents with a lot of information (text, tables, figures, etc.). I need to extract the information associated with one disease in particular (Varicella): the number of cases in each and every state, saved into a CSV file (please see attached examples). In the output file, the first data column should hold the values from file 01, the second the values from file 02, and so on. Thus, I will end up with a file that looks something like this:
Georgia          1283    583
South Carolina   9454     54
Alabama          1049   1089
Florida          5409   7409
California      10475   8475
I only need the information on that particular page. The following page presents a different table that lists the states in the same order but covers other diseases.
Any help will be greatly appreciated!
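A minimal Python sketch of the parse-and-merge step, assuming each PDF's text has already been extracted and split into per-page strings (for example with the pdftohtml command whose man page appears on this page), and that on the Varicella page each state sits on its own line as the state name followed by a case count. The `Varicella` marker, the line regex and all function names are assumptions for illustration, not a tested solution for the real reports.

```python
import csv
import io
import re

# Assumed line shape: a state name followed by an integer count,
# optionally with thousands separators, e.g. "South Carolina 9,454".
STATE_LINE = re.compile(r"^([A-Za-z][A-Za-z .]*?)\s+(\d[\d,]*)\b")


def find_varicella_page(pages):
    """Return the first page whose text mentions Varicella, or None."""
    for page in pages:
        if "varicella" in page.lower():
            return page
    return None


def parse_state_counts(page_text):
    """Map state name -> case count, keeping the order the states appear in."""
    counts = {}
    for line in page_text.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            counts[match.group(1).strip()] = int(match.group(2).replace(",", ""))
    return counts


def merge_to_csv(per_file_counts):
    """Build CSV text: one row per state, one column per file (in list order)."""
    # First-seen order of states across all files; missing values stay empty.
    states = list(dict.fromkeys(s for counts in per_file_counts for s in counts))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for state in states:
        writer.writerow([state] + [counts.get(state, "") for counts in per_file_counts])
    return out.getvalue()
```

To keep the columns in file order, process the files sorted numerically on the file stem, e.g. `sorted(Path(folder).glob("*.pdf"), key=lambda p: int(p.stem))`, so that 100.pdf lands after 99.pdf. `find_varicella_page` returns None when no page matches, and the loose regex will also catch lines such as a "Total" row, so filtering the result against a list of expected state names is the safer choice.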
PROJECT: Extracting data from an employee timesheet. The timesheets are made in Excel (for ease of use) and then converted to .csv files that look like this (see color code key below):
,,,,,,,,,,,,,,,,,,,
9/14/2003,<-- Week Ending,,,,,,,,,,,,,,,,,,
Craig Brennan,,,,,,,,,,,,,,,,,,,...
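The rest of the timesheet (and the color code key) is cut off, so this Python sketch only reads the header rows that are visible. It assumes the week-ending date is the first cell of the row whose second cell is `<-- Week Ending`, in month/day/year form, and that the employee name is the first cell of the next non-empty row; `read_header` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime


def read_header(csv_text):
    """Return (week_ending_date, employee_name) from an exported timesheet."""
    # Drop rows that are nothing but empty cells, like the leading ",,,," line.
    rows = [row for row in csv.reader(io.StringIO(csv_text))
            if any(cell.strip() for cell in row)]
    for i, row in enumerate(rows):
        if len(row) > 1 and row[1].strip() == "<-- Week Ending":
            week_ending = datetime.strptime(row[0].strip(), "%m/%d/%Y").date()
            name = rows[i + 1][0].strip() if i + 1 < len(rows) else None
            return week_ending, name
    return None, None
```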
Friends,
I'm having a problem doing two or three tasks at the same time.
What I want is this:
I have lots (lakhs, i.e. hundreds of thousands) of files in a directory.
I want this information from around 2-3 months' worth of files.
The filename convention is abc20080403sdas.xyz (for today's files).
I want:
1. total number of files for 1 dec...
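A minimal Python sketch for the counting part, assuming every name follows the `abc20080403sdas.xyz` pattern (three letters, an eight-digit YYYYMMDD date, then anything). The prefix length and the `count_by_date` and `names_in` helpers are assumptions; `os.scandir` is used because it yields entries lazily instead of returning one huge list.

```python
import os
import re
from collections import Counter
from datetime import date

# Assumed convention: 3-letter prefix, YYYYMMDD, then anything.
NAME_DATE = re.compile(r"^[A-Za-z]{3}(\d{4})(\d{2})(\d{2})")


def count_by_date(names, start=None, end=None):
    """Count file names per embedded date, optionally limited to [start, end]."""
    counts = Counter()
    for name in names:
        match = NAME_DATE.match(name)
        if not match:
            continue  # skip files that do not follow the convention
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # eight digits that are not a real calendar date
        if (start is None or day >= start) and (end is None or day <= end):
            counts[day] += 1
    return counts


def names_in(directory):
    """Yield regular file names in directory without building a full list."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name
```

For example, `count_by_date(names_in("/path/to/dir"), start=date(2007, 12, 1))` would cover roughly the last few months; the directory path is a placeholder.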
I've got two large CSV text table files, each with a different number of columns.
I have to compare them on the first two columns and create a resulting file
that, for rows whose first two columns match, includes all values from the first file and all values (except the first two columns) from the second one. I...
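One way to do this in Python, assuming an inner join (rows of the first file without a match are dropped) and that each key pair appears once in the second file. Only the second file is indexed in memory; the first is streamed row by row, which helps with large inputs. The function names and file paths are hypothetical.

```python
import csv


def join_on_first_two(rows_a, rows_b):
    """Yield rows of A extended with B's columns (minus B's first two) on matching keys."""
    # Index B by its (col1, col2) key; a later duplicate key overwrites an earlier one.
    lookup = {tuple(row[:2]): row[2:] for row in rows_b if len(row) >= 2}
    for row in rows_a:
        extra = lookup.get(tuple(row[:2]))
        if extra is not None:
            yield row + extra


def join_files(path_a, path_b, path_out):
    """File-based wrapper around join_on_first_two."""
    with open(path_b, newline="") as fb:
        rows_b = list(csv.reader(fb))
    with open(path_a, newline="") as fa, open(path_out, "w", newline="") as fo:
        csv.writer(fo).writerows(join_on_first_two(csv.reader(fa), rows_b))
```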
I have a .csv file
equipment,bandtype
abc,aws
def,mmds
ghi,umts
jkl,mmds
I can get the equipment name from `hostname`.
In my script I want to check the hostname, then see if it exists in the .csv file. If it does, I want to store the second field (bandtype) for the corresponding...
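A Python sketch of that lookup, assuming the `equipment,bandtype` header row shown above and that the value to match is the short host name from `socket.gethostname()`, cut at the first dot in case the system returns a fully qualified name. `band_for_host` and `current_band` are hypothetical names.

```python
import csv
import io
import socket


def band_for_host(csv_text, hostname):
    """Return the bandtype listed for hostname, or None if it is not listed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["equipment"].strip() == hostname:
            return row["bandtype"].strip()
    return None


def current_band(csv_text):
    """Look up this machine's short host name (text before the first dot)."""
    return band_for_host(csv_text, socket.gethostname().split(".")[0])
```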
Hi all,
I am new to shell scripting and need your help writing a shell script.
I need to extract data from a .csv file whose columns are ',' separated.
The file has 5 columns, say column 1, column 2, ... column 5, as below along with their values...
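The rest of this question is cut off, so this is only a generic Python sketch of pulling one column out of a comma-separated file. It uses the `csv` module rather than a plain split on `,` so quoted values that contain commas stay intact; the 1-based `column` argument is an assumption chosen to match "column 1 ... column 5".

```python
import csv
import io


def extract_column(csv_text, column, skip_header=False):
    """Return the values of a 1-based column; rows that are too short give ""."""
    reader = csv.reader(io.StringIO(csv_text))
    if skip_header:
        next(reader, None)
    return [row[column - 1] if len(row) >= column else "" for row in reader]
```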
Hi, I am a newbie in shell scripting.
I need your help to solve my problem.
I have two CSV files and I want to compare their contents, then write the output to a new CSV file.
File1:
SourceFile,DateTimeOriginal
/home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07...
I have a series of CSV files in the following format, e.g. file1:
Experiment Name,XYZ_07/28/15,
Specimen Name,Specimen_001,
Tube Name, Control,
Record Date,7/28/2015 14:50,
$OP,XYZYZ,
GUID,abc,
Population,#Events,%Parent
All Events,10500,
P1,10071,95.9
Early Apoptosis,1113,11.1
Late...
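The question is cut off before the goal is stated, so this Python sketch only parses the layout shown: `key,value,` metadata lines until the `Population,#Events,%Parent` header, then one row per population. Using the `Population` cell as the switch point is an assumption based on the sample; `parse_experiment` and its return shape are hypothetical.

```python
import csv
import io


def parse_experiment(csv_text):
    """Split one file into a metadata dict and a list of population rows."""
    metadata, populations, header = {}, [], None
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        if header is None:
            if cells[0] == "Population":
                header = cells  # e.g. ["Population", "#Events", "%Parent"]
            else:
                metadata[cells[0]] = cells[1] if len(cells) > 1 else ""
        else:
            # Pad short rows so every population has all header fields.
            cells += [""] * (len(header) - len(cells))
            populations.append(dict(zip(header, cells)))
    return metadata, populations
```

Running it over each file and collecting, say, `metadata["Specimen Name"]` alongside the population rows would give one combined table across the series.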
Hi All,
I have log files as below.
log1.txt
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
Time taken: 1.008 seconds
ID = gd54321bbvbvbcvb
<table name="content_analyzer"...
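What the poster wants from these logs is cut off, so as one hedged example, this Python sketch collects the `name` attribute of every `<table ...>` line together with the `Time taken` and `ID` values. Regular expressions are used instead of an XML parser because lines like `<type="global" />` are not well-formed XML; the patterns are assumptions based only on the lines shown.

```python
import re

TABLE_NAME = re.compile(r'<table\s+name="([^"]+)"')
TIME_TAKEN = re.compile(r"^Time taken:\s*([\d.]+)\s*seconds", re.MULTILINE)
RUN_ID = re.compile(r"^ID\s*=\s*(\S+)", re.MULTILINE)


def summarize_log(text):
    """Return table names, timings in seconds, and IDs found in one log."""
    return {
        "tables": TABLE_NAME.findall(text),
        "times": [float(t) for t in TIME_TAKEN.findall(text)],
        "ids": RUN_ID.findall(text),
    }
```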
Discussion started by: ROCK_PLSQL
pdftohtml
PDFTOHTML(1) General Commands Manual PDFTOHTML(1)
NAME
pdftohtml - program to convert pdf files into html, xml and png images
SYNOPSIS
pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
DESCRIPTION
This manual page briefly documents the pdftohtml command. It was written for the Debian GNU/Linux distribution because the
original program does not have a manual page.
pdftohtml is a program that converts pdf documents into html. It generates its output in the current working directory.
OPTIONS
A summary of options is included below.
-h, -help
       Show summary of options.
-f <int>
       First page to print.
-l <int>
       Last page to print.
-q     Don't print any messages or errors.
-v     Print copyright and version info.
-p     Exchange .pdf links with .html.
-c     Generate complex output.
-i     Ignore images.
-noframes
       Generate no frames. Not supported in complex output mode.
-stdout
       Use standard output.
-zoom <fp>
       Zoom the PDF document (default 1.5).
-xml   Output for XML post-processing.
-enc <string>
       Output text encoding name.
-opw <string>
       Owner password (for encrypted files).
-upw <string>
       User password (for encrypted files).
-hidden
       Force hidden text extraction.
-dev   Output device name for Ghostscript (png16m, jpeg, etc.).
-nomerge
       Do not merge paragraphs.
-nodrm Override document DRM settings.
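Tying this back to the Varicella question: the -f, -l, -q, -i, -noframes and -stdout options listed above should be enough to convert a single page to HTML on standard output. This Python sketch builds and runs that command and then crudely strips the markup. It assumes pdftohtml is installed and that text lines are separated by `<br>` tags in its output (check this against real output); the page number must be supplied by the caller, and the helper names are hypothetical.

```python
import html
import re
import subprocess


def pdftohtml_page_cmd(pdf_path, page):
    """Build a pdftohtml command that writes one page as HTML to stdout."""
    return ["pdftohtml", "-q", "-i", "-noframes", "-stdout",
            "-f", str(page), "-l", str(page), str(pdf_path)]


def html_to_text(markup):
    """Crude tag stripper: <br> becomes a newline, other tags are dropped."""
    text = re.sub(r"(?i)<br\s*/?>", "\n", markup)
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text)


def page_text(pdf_path, page):
    """Run pdftohtml and return the requested page as plain text."""
    result = subprocess.run(pdftohtml_page_cmd(pdf_path, page),
                            capture_output=True, text=True, check=True)
    return html_to_text(result.stdout)
```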
AUTHOR
Pdftohtml was developed by Gueorgui Ovtcharov and Rainer Dorsch. It is based on, and benefits a lot from, Derek Noonburg's xpdf package.
This manual page was written by Soren Boll Overgaard <boll@debian.org>, for the Debian GNU/Linux system (but may be used by others).