Script to delete HTML tag


 
# 1  
Old 11-20-2011

Guys,

I have a little script that I got off the internet and that I use with Squid to block ads.
I used that script on Linux, but I have now moved my servers to FreeBSD. I have a steep learning curve there, but it is fun. Back to the script issue.

The script used to work on Linux, but FreeBSD is a bit different.
This line is causing me problems:
Code:
# cat /tmp/temp_ad_file | grep "(^|\.)" > "/usr/local/etc/squid/squid.adservers"

If I use the line above in the script below, the destination file is completely emptied. The goal is to get rid of the HTML tags "(^|\.)" in the list of known ad servers served by "pgl.yoyo.org". The list is then used by the Squid proxy.
As it stands, the line above is unusable. The script works well if I change the line to drop the pipe, the grep and the tags:
Code:
# cat /tmp/temp_ad_file > "/usr/local/etc/squid/squid.adservers"

Then the list is updated correctly and not emptied, but it still has the HTML tags in it.
Code:
#!/bin/sh
# Get new ad server list
/usr/local/bin/wget -O /tmp/temp_ad_file \
        'http://pgl.yoyo.org/adservers/serverlist.php?hostformat=squid-dstdom-regex;showintro=0&mimetype=plaintext'
# Clean HTML headers out of the list
cat /tmp/temp_ad_file > "/usr/local/etc/squid/squid.adservers"
# cat /tmp/temp_ad_file | grep "(^|\.)" > "/usr/local/etc/squid/squid.adservers"
# Refresh Squid
/usr/local/sbin/squid -k reconfigure
# Remove tmp file
rm -rf /tmp/temp_ad_file

Any help is much appreciated

Kind Regards,

Last edited by Scott; 11-20-2011 at 02:23 PM.. Reason: Code tags
# 2  
Old 11-20-2011
Can you paste the HTML tags you are referring to? The actual lines from the HTML...
Quote:
The goal is to get rid of the HTML tags "(^|\.)"
--ahamed

Last edited by ahamed101; 11-20-2011 at 12:36 PM.. Reason: removed the suggestion, wasn't sure!
# 3  
Old 11-20-2011
ahamed101, thanks for replying.
I have pasted the start of the file (a text file) below.
The HTML tags would be "(^|\.)" and "$". If they are left in the list, the Squid acl can't use the file.
Code:
(^|\.)www\.sponsor2002\.de$
(^|\.)www1\.gto-media\.com$
(^|\.)www8\.glam\.com$
(^|\.)x-traceur\.com$
(^|\.)x\.mycity\.com$
(^|\.)x6\.yakiuchi\.com$
(^|\.)xchange\.awmcenter\.eu$
(^|\.)xchange\.ro$
(^|\.)xclicks\.net$
(^|\.)xertive\.com$
(^|\.)xiti\.com$
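
If the aim really is to remove the "(^|\.)" prefix and the trailing "$" from every entry, a sed filter along the following lines might do it. This is only a sketch: the function name to_plain_domains is made up for illustration, and whether Squid wants the regex form (a dstdom_regex acl) or plain host names (a dstdomain acl) depends on the squid.conf, which is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): turn regex-style entries like
#   (^|\.)xiti\.com$
# into plain host names like
#   xiti.com
to_plain_domains() {
    # 1st expression: drop the literal prefix (^|\.) at the start of the line
    # 2nd expression: drop a literal $ at the end of the line
    # 3rd expression: turn every escaped dot \. into a plain dot
    sed -e 's/^(\^|\\\.)//' \
        -e 's/\$$//' \
        -e 's/\\\././g'
}

# Demo on two entries copied from the extract above
printf '%s\n' '(^|\.)x\.mycity\.com$' '(^|\.)xiti\.com$' | to_plain_domains
# prints:
# x.mycity.com
# xiti.com
```

In the update script this could be used as `to_plain_domains < /tmp/temp_ad_file`, with the output redirected wherever the list is needed.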

Kind Regards,

Last edited by Scott; 11-20-2011 at 02:22 PM.. Reason: Code tags
# 4  
Old 11-20-2011
Try this...
Code:
grep '(^|\\.)' /tmp/temp_ad_file > "/usr/local/etc/squid/squid.adservers"

--ahamed
# 5  
Old 11-20-2011
ahamed101, thanks, but I am back where I was before: squid.adservers gets emptied completely with
Code:
# grep ' (^|\\.) ' /tmp/temp_ad_file > "/usr/local/etc/squid/squid.adservers"

My first step was to download the list as a text file and save it in the Squid folder as squid.adservers.
The line above is then meant to update the list once every 3 days. With the line above, the destination file squid.adservers gets emptied when the script runs, and the acl inside Squid then complains.
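
One way to keep a bad run from wiping the live list would be to filter into a temporary file first and replace squid.adservers only when the filter produced something. The sketch below assumes that approach; the function name install_if_nonempty is made up for illustration, and the chmod is a guess at permissions that let Squid read the file.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption):
#   install_if_nonempty SRC DEST
# Filters SRC and replaces DEST only if at least one line matched.
install_if_nonempty() {
    src=$1
    dest=$2
    # Temporary file next to DEST, so the final mv stays on one filesystem
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    # grep exits 0 only when it selected at least one line
    if grep -F '(^|\.)' "$src" > "$tmp"; then
        chmod 644 "$tmp"    # assumption: world-readable is acceptable here
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        echo "no matching lines in $src; keeping existing $dest" >&2
        return 1
    fi
}

# Possible use in the update script (paths taken from this thread):
# install_if_nonempty /tmp/temp_ad_file /usr/local/etc/squid/squid.adservers \
#     && /usr/local/sbin/squid -k reconfigure
```

With this, an empty or unexpected download leaves the previous list in place and skips the reconfigure.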

Regards,

Last edited by Franklin52; 11-21-2011 at 08:04 AM.. Reason: Code tags
# 6  
Old 11-20-2011
I am confused. You have pasted the contents of /tmp/temp_ad_file, right? The grep statement looks for the pattern (^|\.) and writes the matching lines to the squid.adservers file. Try the grep statement without the redirection and see if you get anything on the screen.
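
To illustrate what the grep line is meant to do, here is a minimal sketch on made-up sample data. The HTML wrapper lines and the function name filter_ad_lines are assumptions for illustration, not taken from the real download. It uses grep -F, so the pattern is treated as a fixed string rather than a regular expression, which sidesteps any differences in how a given grep reads "(", "^", "|" and backslashes. Note that any spaces inside the quotes become part of the pattern.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print only the lines of
# the given file that contain the literal string (^|\.)
filter_ad_lines() {
    grep -F '(^|\.)' "$1"
}

# Made-up sample: two ad-server entries wrapped in HTML-looking lines
sample=$(mktemp /tmp/adsample.XXXXXX)
cat > "$sample" <<'EOF'
<html><body>
(^|\.)xchange\.ro$
(^|\.)xiti\.com$
</body></html>
EOF

filter_ad_lines "$sample"
# prints:
# (^|\.)xchange\.ro$
# (^|\.)xiti\.com$

rm -f "$sample"
```

So the filter keeps the entries as they are and only drops lines that do not contain the marker; it does not strip "(^|\.)" from the entries themselves.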

--ahamed
# 7  
Old 11-20-2011
Sorry for the confusion, ahamed101.
I tried it without the redirection, staying in the tmp directory with the file "temp_ad_file":
grep '(^|\\.)' /tmp/temp_ad_file. When I look in temp_ad_file, the list is complete, with '(^|\\.)' still in it.
Here is an extract below.
Basically, what I am trying to do is clean out the HTML part and redirect the entire list to a file called squid.adservers in the Squid folder.
Code:
(^|\.)zedo\.com$
(^|\.)zencudo\.co\.uk$
(^|\.)zenzuu\.com$
(^|\.)zeus\.developershed\.com$
(^|\.)zeusclicks\.com$
(^|\.)zintext\.com$
(^|\.)zmedia\.com$

Kind Regards,

Last edited by Scott; 11-20-2011 at 02:22 PM.. Reason: Code tags