Full Discussion: Script to delete HTML tag
Post 302575115 by agama on Sunday 20th of November 2011 02:38:16 PM
Boy, is this thread confusing!

A couple of observations that might help clarify the problem. First, the original post refers to 'removing html' from the file. However, the file pulled from yoyo.org with wget using text/plain does not contain any HTML. Moreover, the reference to
Quote:
The HTML tags would be "(^|\.)" & $. If left in that list, the acl squid can't use the file.
seems to indicate that, while (^|\.) is incorrectly being called HTML, these strings are not desired. Depending on how squid is configured, this is true, and they need to be removed.

The file from yoyo.org is a list of regular expressions, and if squid isn't configured with acl ads dstdom_regex -i "[/usr/local]/etc/squid.adservers" then the regex parts will cause problems. I believe the reason things have stopped working is that the configuration on the old machine isn't the same as on FreeBSD. Given this, the original code, which extracts the regex lines from the yoyo.org data using the regex string, makes sense.
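
One way to compare the two machines is to look at which acl type each squid.conf uses for the ad-server list. The following is only a sketch: acl_type_for is a hypothetical helper name, and the config path in the usage comment is an assumption based on the paths mentioned in this thread.

```shell
# Print the acl type (third field) of every "acl" line in a squid.conf
# that mentions the given list file. Commented-out lines are skipped
# because their first field is not exactly "acl".
# acl_type_for is a hypothetical helper, not part of squid.
acl_type_for() {
    conf=$1
    list=$2
    awk -v list="$list" '$1 == "acl" && index($0, list) { print $3 }' "$conf"
}

# Usage (path is an assumption; adjust it for your install):
# acl_type_for /usr/local/etc/squid/squid.conf squid.adservers
```

If this prints dstdom_regex, squid reads the file as regular expressions; if it prints dstdomain, squid expects plain domain names and the regex parts have to go.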

This doesn't explain why the output file is ending up empty, but it might shift the focus of the problem a bit. If the squid config is changed to match the old machine, then the regex file can be used as is; otherwise the regex portions should be stripped:

Code:
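# Delete the regex metacharacters and backslash escapes, but keep the literal dots in the domain names.
sed 's/[()|$^\\]//g' /tmp/temp_ad_file >/usr/local/etc/squid/squid.adservers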

Care should be taken if these strings are used without the regex as they might match more URLs than desired.
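
To illustrate that caution, here is a small sketch using a made-up domain. It uses grep -E as a stand-in for squid's matching, which is an approximation, and count_matches is a hypothetical helper. It compares an anchored entry with the same name used as a bare, unanchored pattern.

```shell
# Hypothetical entries: the anchored regex form, and the bare domain
# name treated as an unanchored regex.
anchored='(^|\.)ads\.example\.com$'
stripped='ads.example.com'

# Count how many of the given host names a pattern matches.
count_matches() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -E -c -- "$pattern"
}

# Matches the domain and its subdomain only: prints 2
count_matches "$anchored" ads.example.com x.ads.example.com badsxexample.community.test

# Also matches the unrelated third host: prints 3
count_matches "$stripped" ads.example.com x.ads.example.com badsxexample.community.test
```

Without the anchors, and with each dot free to match any character, the third host slips through even though it has nothing to do with the intended domain.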

I'm interested in knowing if the sed above has the same problem -- generates an empty file. If it does, then I question the permissions on the output file. What happens if the output file is changed to something like >/tmp/foo?
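
A rough way to narrow this down is to check the input file and the output location before running the sed. This is only a sketch; diagnose is a hypothetical helper, and it also checks for an empty input file, which is my assumption about another possible cause rather than something established in the thread. Note that the write checks always pass when run as root.

```shell
# Report the most obvious reasons a redirect to "$2" could leave an
# empty file: empty input, or no permission to write the output.
# diagnose is a hypothetical helper name.
diagnose() {
    in=$1
    out=$2
    if [ ! -s "$in" ]; then
        echo "input is missing or empty: $in"
        return 1
    fi
    if [ -e "$out" ]; then
        # Output file exists: we need write permission on the file itself.
        if [ ! -w "$out" ]; then
            echo "no write permission on: $out"
            return 1
        fi
    else
        # Output file does not exist: we need to create it in its directory.
        dir=$(dirname "$out")
        if [ ! -w "$dir" ]; then
            echo "cannot create files in: $dir"
            return 1
        fi
    fi
    echo "ok: input is non-empty and $out is writable"
}

# Usage with the paths from this thread:
# diagnose /tmp/temp_ad_file /usr/local/etc/squid/squid.adservers
# diagnose /tmp/temp_ad_file /tmp/foo
```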
squid(8)						      System Manager's Manual							  squid(8)

NAME
squid - proxy caching server

SYNOPSIS
squid [ -dhsvzCDFNRVYX ] [ -f config-file ] [ -[ au ] port ] [ -k signal ]

DESCRIPTION
squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Unlike traditional caching software, squid handles all requests in a single, non-blocking, I/O-driven process.

squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests.

squid supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings.

squid consists of a main server program squid, a Domain Name System lookup program dnsserver, some optional programs for rewriting requests and performing authentication, and some management and client tools. When squid starts up, it spawns a configurable number of dnsserver processes, each of which can perform a single, blocking Domain Name System (DNS) lookup. This reduces the amount of time the cache waits for DNS lookups.

squid is derived from the ARPA-funded Harvest Project http://harvest.cs.colorado.edu/

This manual page only lists the command line arguments. For details on how to configure squid see the file /etc/squid/squid.conf, the FAQ included with the distribution and the documentation at the squid home page http://www.squid-cache.org

OPTIONS
-a port
    Specify HTTP port number (default: 3128).
-d level
    Write debugging to stderr also.
-f file
    Use the given config-file instead of /etc/squid/squid.conf
-h
    Print help message.
-k reconfigure | rotate | shutdown | interrupt | kill | debug | check | parse
    Parse configuration file, then send signal to running copy (except -k parse) and exit.
-s
    Enable logging to syslog.
-u port
    Specify ICP port number (default: 3130), disable with 0.
-v
    Print version.
-z
    Create swap directories
-C
    Do not catch fatal signals.
-D
    Disable initial DNS tests.
-F
    Don't serve any requests until store is rebuilt.
-N
    No daemon mode.
-R
    Do not set REUSEADDR on port.
-V
    Virtual host httpd-accelerator.
-X
    Force full debugging.
-Y
    Only return UDP_HIT or UDP_MISS_NOFETCH during fast reload.

FILES
/etc/squid/squid.conf
    The main configuration file. You must initially make changes to this file for squid to work. For example, the default configuration does not allow access from any browser.

squid version 2.0 squid(8)