Learning scrapers, webcrawlers, search engines and cURL
Post 303019102 by Neo on Friday 22nd of June 2018 11:16:30 PM
06-23-2018
Quote:
Originally Posted by TBotNik
  • Text-only vs. regular browser: which is best?
  • wget vs. PHP fileopen vs. cURL: which is best?
  • HTML tag find/parse: are there libraries that do this effectively?
  • HTML tag find/parse: is regex the best way to parse these? Where can I find examples?
  • Checking for the new meta-tags of:
I think you are better off fetching web page content with PHP scripts and parsing it with regular expressions (regex).

If you Google around, I am sure you can find many sample PHP scripts that do most of what you want. This is very old technology, and there is no need to reinvent the wheel when parsing HTML data.
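A minimal PHP sketch of that approach follows. It assumes simple, well-formed <a href="..."> markup; fetch_page() and extract_links() are hypothetical helper names, the user-agent string is made up, and fetching through file_get_contents() only works when allow_url_fopen is enabled. A regex like this handles the cases it was written for and is not a general HTML parser.

```php
<?php
// Minimal sketch: fetch a page with PHP, then pull out links with a regex.
// Assumes simple, well-formed <a href="..."> markup (not a full HTML parser).

// Hypothetical helper: download a page over HTTP(S).
// Requires allow_url_fopen to be enabled in php.ini.
function fetch_page(string $url): string
{
    $context = stream_context_create([
        'http' => ['user_agent' => 'ExampleScraper/0.1'], // made-up UA string
    ]);
    $html = file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}

// Hypothetical helper: return [href => link text] pairs found in $html.
function extract_links(string $html): array
{
    $links = [];
    // i: match tag and attribute names case-insensitively
    // s: let . span newlines inside the link text
    $pattern = '/<a\s[^>]*href\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)<\/a>/is';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // strip_tags() removes markup nested inside the link text
            $links[$m[1]] = trim(strip_tags($m[2]));
        }
    }
    return $links;
}

// Offline sample input; a live run would use fetch_page('https://example.com/').
$sample = '<p>See <a href="https://example.com/">the <b>forums</b></a> and '
        . '<A HREF=\'/man-page/\' class="x">man pages</A>.</p>';
print_r(extract_links($sample));
```

Only extract_links() runs against the inline sample, so the script works offline; pass it the result of fetch_page() to scrape a live page. Using strip_tags() on the captured text keeps nested tags such as <b> out of the results.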
 

STRIP_TAGS(3)

strip_tags - Strip HTML and PHP tags from a string

SYNOPSIS
string strip_tags (string $str, [string $allowable_tags])

DESCRIPTION
This function tries to return a string with all NULL bytes, HTML and PHP tags stripped from a given $str. It uses the same tag stripping state machine as the fgetss(3) function.

PARAMETERS
o $str - The input string.

o $allowable_tags - You can use the optional second parameter to specify tags which should not be stripped.

  Note
  HTML comments and PHP tags are also stripped. This is hardcoded and cannot be changed with $allowable_tags.

  Note
  This parameter should not contain whitespace. strip_tags(3) sees a tag as a case-insensitive string between < and the first whitespace or >.

  Note
  In PHP 5.3.4 and later, you will also need to include the self-closing XHTML tag to keep these tags in the output. For example, to keep both <br> and <br/>, you should use:

      <?php
      strip_tags($input, '<br><br/>');
      ?>

RETURN VALUES
Returns the stripped string.

CHANGELOG
+--------+---------------------------------------------------+
|Version | Description                                       |
+--------+---------------------------------------------------+
| 5.3.4  | strip_tags(3) no longer keeps self-closing XHTML  |
|        | tags unless the self-closing XHTML tag is also    |
|        | given in $allowable_tags.                         |
+--------+---------------------------------------------------+
| 5.0.0  | strip_tags(3) is now binary safe.                 |
+--------+---------------------------------------------------+

EXAMPLES
Example #1 strip_tags(3) example

    <?php
    $text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
    echo strip_tags($text);
    echo "\n";

    // Allow <p> and <a>
    echo strip_tags($text, '<p><a>');
    ?>

The above example will output:

    Test paragraph. Other text
    <p>Test paragraph.</p> <a href="#fragment">Other text</a>

NOTES
Warning
Because strip_tags(3) does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.

Warning
This function does not modify any attributes on the tags that you allow using $allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.

Note
Tag names within the input HTML that are longer than 1023 bytes will be treated as though they are invalid, regardless of the $allowable_tags parameter.

SEE ALSO
htmlspecialchars(3).

PHP Documentation Group
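The DESCRIPTION, the $allowable_tags notes and both NOTES warnings above can be checked with a short script. This is a sketch assuming PHP 7 or later on the command line; it avoids self-closing XHTML tags because, as the CHANGELOG shows, their handling has changed between PHP versions. The variable names are illustrative only.

```php
<?php
// Sketch: exercising the strip_tags() behavior described in this man page.

// DESCRIPTION: NUL bytes, HTML tags and PHP tags are all removed.
$mixed = "Hello\0 <b>world</b><?php echo 'hidden'; ?>!";
$plain = strip_tags($mixed);
var_dump($plain); // string(12) "Hello world!"

// PARAMETERS: allowed tags match case-insensitively and are kept verbatim,
// while HTML comments are stripped regardless of $allowable_tags.
$html = '<P>Intro</P><!-- note --><B>bold</B> and <i>italic</i>';
$kept = strip_tags($html, '<p><b>');
echo $kept, "\n"; // <P>Intro</P><B>bold</B> and italic

// NOTES, warning 1: an unmatched "<" opens a tag that never closes,
// so everything after it is discarded.
$broken = strip_tags('Price <5 today');
var_dump($broken); // string(6) "Price "

// NOTES, warning 2: allowed tags keep every attribute, event handlers included.
$unsafe = strip_tags('<p onmouseover="alert(1)">Hi</p><script>x</script>', '<p>');
echo $unsafe, "\n"; // <p onmouseover="alert(1)">Hi</p>x

// For text shown to other users, escaping with htmlspecialchars() (SEE ALSO)
// neutralizes the markup instead of trusting an allow-list of tags.
echo htmlspecialchars($unsafe), "\n";
```

Note that the <script> tags are removed but their content ("x") is not: strip_tags() drops markup, not the text between tags.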
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.