Download pdf's using wget convert to txt


 
# 1  
Old 08-13-2014
Download pdf's using wget convert to txt

Code:
wget -i genedx.txt

The code above will download multiple PDF files from a site, but how can I download them and convert them to .txt?

I have attached the master list (genedx.txt, which contains the URLs and file names) as well as the two PDFs that are downloaded. I am trying to have those two files end up as text files. Thank you.
# 2  
Old 08-13-2014
pdftotext
# 3  
Old 08-13-2014
Is that a separate command, or can it be used with the wget command? Thanks.
# 4  
Old 08-13-2014
It is a separate command, which -- like any other separate command -- you can use with wget, either by piping the output or by feeding the resulting file into it once wget is done.
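
A minimal sketch of the second option (let wget finish, then feed each resulting file to pdftotext). It assumes pdftotext from the Poppler utilities is installed (packaged as poppler-utils on many Linux distributions) and that the downloaded files end in .pdf; the function name convert_pdfs is made up for illustration.

```shell
# Hypothetical helper: convert every PDF in a directory to a .txt file beside it.
convert_pdfs() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue            # the glob matched nothing
        pdftotext "$pdf" "${pdf%.pdf}.txt"   # second argument is the output file
    done
}

# Usage, assuming genedx.txt lists one URL per line:
#   wget -i genedx.txt && convert_pdfs .
```

Quoting "$pdf" keeps file names with spaces intact, and the && means the conversion only runs if wget exits successfully.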
# 5  
Old 08-14-2014
So would the command be:

Code:
 wget -i genedx.txt | info_sheet_ube.pdf Info_Sheet_XomeDx.pdf

And where do I download pdftotext? Thanks.
# 6  
Old 08-14-2014
No, pipes do not work that way.

What you would actually do depends on the contents of genedx.txt, and what you want to do with it.

Here is the second Google hit.
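
For reference, a pipe connects the standard output of the command on its left to the standard input of the command on its right; it does not hand file names along. Below is a sketch of a pipeline that would work for a single URL. It assumes the Poppler build of pdftotext, which as far as I know accepts - in place of the input file to read from stdin (other builds may not). The function name url_to_text and the example URL are placeholders.

```shell
# Hypothetical helper: stream one PDF from a URL straight into pdftotext.
url_to_text() {
    url=$1
    out=$2
    # -q: quiet; -O -: write the download to stdout instead of a file
    wget -q -O - "$url" | pdftotext - "$out"
}

# Usage (placeholder URL):
#   url_to_text "http://example.com/some.pdf" some.txt
```

With a list of URLs such as genedx.txt, downloading first and converting afterwards is usually simpler than piping.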
# 7  
Old 08-17-2014
After installing PDFMiner, do the batch conversion with a for loop. This has nothing to do with pipes.

Code:
$ for f in *.pdf; do pdf2txt.py "$f" > "${f%.pdf}.txt"; done
