Script to extract forum posts
Post 302492673 by KidCactus on Tuesday 1st of February 2011 06:07:40 AM
That would of course be a better idea.

I have this html file (attached to the post), and I want to cut out all text between:

thread_id=666&page=6666#666666">

and

</a>

And then between

<div style="padding:2px 0px 3px 0px;">

and

</div>

wherever that occurs in the html file.
 
