Harvesting posts from HTML code


 
# 1  
Old 12-11-2010

How can I use sed to find a string and then write the contents of the following line to a new file? I want to collect the posts from a forum thread. Looking at the page's HTML, it seems I could cut out all the junk by keying on the phrase <div class="postmsg"> and printing the next line to a new file, which I could then refine further with sed. How is this best accomplished with just bash?
# 2  
Old 12-11-2010
With just bash? That'll be painful. So would sed, for that matter, or any other line-based tool. It would be hard even in awk: nothing tracks tag nesting for you, so you wouldn't know which </div> closes the post, and tracking the nesting yourself is a mountain of work.
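To make the line-based problem concrete, here is a minimal sketch of the "find the marker, take the next line" idea. It is written in Python only for illustration (the thread itself asks about sed and bash). The function name `lines_after_marker` and the three sample strings are hypothetical, not taken from any real page.

```python
def lines_after_marker(text, marker='<div class="postmsg">'):
    """Return the line that follows each line containing `marker`."""
    lines = text.splitlines()
    found = []
    # Stop one short of the end: the last line has no "next line".
    for i, line in enumerate(lines[:-1]):
        if marker in line:
            found.append(lines[i + 1].strip())
    return found


# Hypothetical inputs: the same kind of post laid out three different ways.
tidy = '<div class="postmsg">\n<p>first post</p>\n</div>\n'
one_line = '<div class="postmsg"><p>first post</p></div>\n'
multi = '<div class="postmsg">\n<p>part one</p>\n<p>part two</p>\n</div>\n'

print(lines_after_marker(tidy))      # the whole post
print(lines_after_marker(one_line))  # nothing: marker and post share a line
print(lines_after_marker(multi))     # only the first paragraph of the post
```

The approach only works when the page happens to put each post on exactly one line directly below the marker, which HTML does not guarantee.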

Using a Perl module that actually parses HTML is much more reliable than trying to sed/grep for something in a file that doesn't even have proper lines.
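As a rough sketch of the parser-based approach, the example below uses Python's standard-library `html.parser` in place of the Perl module suggested above, which is a swapped-in tool and not the one the reply names. It assumes posts live in `<div class="postmsg">` as described in the question. The class `PostExtractor`, the function `extract_posts`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text inside each <div class="postmsg">, nested divs included."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside the current post; 0 = outside
        self.posts = []     # finished posts, one string each
        self._chunks = []   # text fragments of the post being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A div nested inside a post: remember it so its </div>
            # is not mistaken for the end of the post.
            self.depth += 1
        elif "postmsg" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join fragments and collapse runs of whitespace.
                self.posts.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def extract_posts(html):
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts


# Hypothetical markup: everything on one line, with a div nested inside a post.
sample = ('<div class="post"><div class="postmsg"><p>Hello <b>world</b></p>'
          '<div class="quote">quoted text</div><p>bye</p></div></div>'
          '<div class="postmsg">second post</div>')
print(extract_posts(sample))  # ['Hello world quoted text bye', 'second post']
```

The depth counter is the nesting bookkeeping that a line-based tool would have to reimplement by hand. Joining fragments with spaces is a simplification: it would also put a space inside a word that is split by an inline tag.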