UNIX for Dummies Questions & Answers: Post 302553000 by Xterra on Tuesday 6th of September 2011 10:05:14 AM
Extracting data from PDF files into CSV file

Hi,

I have several hundred PDF files numbered 01.pdf, 02.pdf, 03.pdf, etc. in one folder. These are very long documents with a lot of information (text, tables, figures, etc.). I need to extract the information associated with one disease in particular (Varicella). The information I need to retrieve is the number of cases in each and every state, saved into a CSV file (please see attached examples). In the output file, the first column after the state name should hold the data from file 01, the second the data from file 02, and so on. Thus, I will end up with a file that looks something like this:
Quote:
Georgia 1283 583
South Carolina 9454 54
Alabama 1049 1089
Florida 5409 7409
California 10475 8475
I only need the information on that particular page. The following page presents a different table that lists the states in the same order but relates to other diseases.
Any help will be greatly appreciated!
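
One possible way to approach the merging step, once the relevant page of each PDF has been converted to plain text by some external tool, is sketched below in Python. This is only an illustration under assumptions not stated in the post: it assumes each table row reduces to a state name followed by a single count (the real table layout is unknown and the pattern would need adapting), and the names `ROW`, `parse_counts` and `merge_to_csv` are hypothetical. The sample values are taken from the quoted example above.

```python
import csv
import io
import re

# Assumed row shape: a state name followed by exactly one count,
# e.g. "South Carolina 9454". Adjust the pattern to the real table layout.
ROW = re.compile(r"^\s*([A-Za-z][A-Za-z .]*?)\s+([\d,]+)\s*$")


def parse_counts(page_text):
    """Return {state: count} for every line that looks like 'Name 123'."""
    counts = {}
    for line in page_text.splitlines():
        match = ROW.match(line)
        if match:
            # Strip thousands separators such as "1,283" before converting.
            counts[match.group(1)] = int(match.group(2).replace(",", ""))
    return counts


def merge_to_csv(pages):
    """pages: list of (label, page_text) in file order. Returns CSV text."""
    parsed = [(label, parse_counts(text)) for label, text in pages]

    # Keep states in the order they are first seen.
    states = []
    for _, counts in parsed:
        for state in counts:
            if state not in states:
                states.append(state)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["State"] + [label for label, _ in parsed])
    for state in states:
        # Leave the cell empty when a file has no value for this state.
        writer.writerow([state] + [counts.get(state, "") for _, counts in parsed])
    return out.getvalue()


# Example with two already-extracted pages (text standing in for 01.pdf, 02.pdf).
pages = [
    ("01", "Georgia 1283\nSouth Carolina 9454\nAlabama 1049\n"),
    ("02", "Georgia 583\nSouth Carolina 54\nAlabama 1089\n"),
]
print(merge_to_csv(pages))
```

Each input file contributes one column, and a state missing from a file gets an empty cell rather than shifting the other values.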
 

PDFTOHTML(1)						      General Commands Manual						      PDFTOHTML(1)

NAME
       pdftohtml - program to convert pdf files into html, xml and png images

SYNOPSIS
       pdftohtml [options] <PDF-file> [<html-file> <xml-file>]

DESCRIPTION
       This manual page documents briefly the pdftohtml command. This manual page was written for the Debian GNU/Linux distribution because the original program does not have a manual page. pdftohtml is a program that converts pdf documents into html. It generates its output in the current working directory.

OPTIONS
       A summary of options are included below.

       -h, -help       Show summary of options.
       -f <int>        first page to print
       -l <int>        last page to print
       -q              dont print any messages or errors
       -v              print copyright and version info
       -p              exchange .pdf links with .html
       -c              generate complex output
       -i              ignore images
       -noframes       generate no frames. Not supported in complex output mode.
       -stdout         use standard output
       -zoom <fp>      zoom the pdf document (default 1.5)
       -xml            output for XML post-processing
       -enc <string>   output text encoding name
       -opw <string>   owner password (for encrypted files)
       -upw <string>   user password (for encrypted files)
       -hidden         force hidden text extraction
       -dev            output device name for Ghostscript (png16m, jpeg etc)
       -nomerge        do not merge paragraphs
       -nodrm          override document DRM settings

AUTHOR
       Pdftohtml was developed by Gueorgui Ovtcharov and Rainer Dorsch. It is based and benefits a lot from Derek Noonburg's xpdf package. This manual page was written by Soren Boll Overgaard <boll@debian.org>, for the Debian GNU/Linux system (but may be used by others).
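
As a rough illustration of how the options listed in this manual page might be combined to pull a single page out of a PDF, here is a small Python sketch. It uses only flags that appear in the option summary (`-f`, `-l`, `-i`, `-q`, `-xml`, `-stdout`). The helper names `pdftohtml_cmd` and `page_as_xml` and the page number are hypothetical, and whether `-stdout` behaves as expected together with `-xml` may depend on the installed version, so treat this as a starting point rather than a tested recipe.

```python
import subprocess


def pdftohtml_cmd(pdf_path, page):
    """Build a pdftohtml command that dumps one page as XML to stdout."""
    return [
        "pdftohtml",
        "-f", str(page),  # first page to convert
        "-l", str(page),  # last page to convert (the same page)
        "-i",             # ignore images
        "-q",             # suppress messages and errors
        "-xml",           # XML output for post-processing
        "-stdout",        # write to standard output instead of files
        pdf_path,
    ]


def page_as_xml(pdf_path, page):
    """Run pdftohtml and return its output as text.

    Requires pdftohtml to be installed and on PATH; raises
    subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        pdftohtml_cmd(pdf_path, page),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical usage: page 12 of 01.pdf (the real page number is not given here).
print(" ".join(pdftohtml_cmd("01.pdf", 12)))
```

Restricting the conversion to one page with `-f` and `-l` keeps the output small, which matters when the source documents are long and only one table is of interest.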