Top Forums Shell Programming and Scripting Problem when extracting the title of HTML doc Post 302263232 by i007 on Monday 1st of December 2008 05:36:24 AM
Problem when extracting the title of HTML doc

Dear all,

I need to extract the title (the text between <title> and </title>) from a set of HTML documents.
I've found a command that does the job of extracting the text, but it does not always work.

It works with the following example:
Code:
cat a.txt 
htmltext<title>This is a HTML title</title>blablalbla

Code:
grep title a.txt | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'
This is a HTML title

However, it does not work with a real example:

Code:
cat b.txt 
<head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></meta> <title>This my new page
</title> <link href...></link>

Code:
grep title b.txt | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'

The last command does not return anything.

I would appreciate any comments or suggestions.
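
A likely cause, for what it is worth: sed applies the substitution to one input line at a time, and in b.txt the opening <title> and the closing </title> sit on different lines, so the pattern never sees both tags together. A minimal sketch of one workaround is to join the lines before sed runs. The function name extract_title and the scratch directory are assumptions made for illustration, not part of the original command, and the case-insensitive match is spelled out with bracket expressions instead of the GNU-only i flag. It assumes one title per file and a <title> tag without attributes.

```shell
#!/bin/sh
# Recreate the failing sample from the post (b.txt) in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' \
  '<head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></meta> <title>This my new page' \
  '</title> <link href...></link>' > "$workdir/b.txt"

# extract_title is a hypothetical helper name, not taken from the post.
extract_title() {
    # Turn every newline into a space so sed receives the file as one line,
    # then add a single trailing newline so sed gets a well-formed line.
    { tr '\n' ' ' < "$1"; echo; } |
        # Keep only the text between the tags, in any letter case.
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee]>\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' |
        # Trim the whitespace left over where the line break used to be.
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

extract_title "$workdir/b.txt"
```

With the sample above this should print "This my new page". For a set of documents, the same function can be called in a loop, for example: for f in *.html; do extract_title "$f"; done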
 
