Shell Programming and Scripting: help with sed needed to extract content from html tags
Post 302604852 by seb001 on Tuesday 6th of March 2012 06:44:43 AM
Yes, all the sed commands from the 5th and 6th posts return the same string:
Code:
abc123def678TextText

Code:
 sed -n "/1st/{s/.*1st'>//;s/<\/textarea>.*//p;}" infile

This one does the trick and correctly returns the content of the 1st textarea. Thanks!
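The thread never shows infile, so the one-line sample below is a hypothetical input made up to fit the sed patterns (the id='1st' attribute and the other tags are assumptions). With such a line, the command above should print only the body of the first textarea:

```shell
#!/bin/sh
# ASSUMPTION: infile is hypothetical; the thread never shows its contents.
printf '%s\n' "<textarea id='1st'>abc123</textarea><textarea id='2nd'>def678</textarea><p>TextText</p>" > infile

# On lines containing "1st": delete everything up to and including 1st'>,
# then delete from the first </textarea> to the end of the line, and print.
sed -n "/1st/{s/.*1st'>//;s/<\/textarea>.*//p;}" infile
# prints: abc123
```

Because .* is greedy, s/.*1st'>// removes everything up to the last 1st'> on the line, while the match for <\/textarea>.* starts at the leftmost </textarea>, so only the text of that one textarea survives.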

Code:
 sed -n '/1st/{s/<[^>]*>//gp;}' infile

This one strips every tag and returns all of the text between the tags, just like the sed commands in the 5th and 6th posts.
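Using a hypothetical one-line infile (an assumption, since the thread never shows the real file), the tag-stripping command deletes every <...> run and leaves the text of all three elements joined together, which reproduces the string quoted at the top of this post:

```shell
#!/bin/sh
# ASSUMPTION: hypothetical input; the real infile is not shown in the thread.
printf '%s\n' "<textarea id='1st'>abc123</textarea><textarea id='2nd'>def678</textarea><p>TextText</p>" > infile

# [^>]* cannot run past a '>', so each match is exactly one tag;
# the g flag removes all of them and p prints the result.
sed -n '/1st/{s/<[^>]*>//gp;}' infile
# prints: abc123def678TextText
```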
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to supplement HTML tags with SED

I am cleaning up HTML with sed. With the regexp <a name="+"></a><h>*<span class="mw-headline" >+</span></h> I can find the tags I need. But when I place them in a sed command, sed fails. So I started building up from a smaller command. This is where I am now: sed -r -e s/"<a... (3 Replies)
Discussion started by: DocBrewer

2. UNIX for Advanced & Expert Users

sed to extract HTML content

Hiya, I am trying to extract a news article from a web page. The sed I have written brings back a lot of Javascript code and sometimes advertisements too. Can anyone please help with this one ??? I need to fix this sed so it picks up the article ONLY (don't worry about the title or date .. i got... (2 Replies)
Discussion started by: stargazerr

3. Shell Programming and Scripting

sed to extract only floating point numbers from HTML

Hi All, I'm trying to extract some floating point numbers from within some HTML code like this: <TR><TD class='awrc'>Parse CPU to Parse Elapsd %:</TD><TD ALIGN='right' class='awrc'> 64.50</TD><TD class='awrc'>% Non-Parse CPU:</TD><TD ALIGN='right' class='awrc'> ... (2 Replies)
Discussion started by: pondlife

4. Shell Programming and Scripting

Extract URLs from HTML code using sed

Hello, I am trying to extract URLs from Google search results, but I have a problem with sed filtering of the HTML code. What I want is just a list of the URLs that appear between ........<p><a href=" and the next following " in the HTML code. Here is my code; I use wget and pipelines for filtering. wget works, but... (13 Replies)
Discussion started by: L0rd

5. Shell Programming and Scripting

sed - stripping out html tags

I have pasted the contents of a log file (swmbackup.wrkstn.1262071383.sales2a) below: Workstation: sales2a<BR> Vault sales2a-hogwarts will be initialized.<BR> <font color="red">There was a problem mounting /mnt/sales2a/desktop$ </FONT><BR> <font color="red">There was a problem mounting... (4 Replies)
Discussion started by: bigtonydallas

6. Shell Programming and Scripting

SED to extract HTML text data, not quite right!

I am attempting to extract weather data from the following website, but for the Victoria area only: Text Forecasts - Environment Canada. I use this: sed -n "/Greater Victoria./,/Fraser Valley./p" But that phrasing sometimes does not get it all, and I think perhaps the website has more... (2 Replies)
Discussion started by: lagagnon

7. Shell Programming and Scripting

awk -- Extract data from html within multiple tags as reference

Hi, I'm trying to get some data from an html file, but the problem is before it can extract the information I have multiple patterns that need to be passed through. https://www.unix.com/shell-programming-scripting/150711-extract-data-awk-html-files.html Is a similar problem. The only... (5 Replies)
Discussion started by: counfhou

8. UNIX for Dummies Questions & Answers

Replacing HTML tags with sed

Ok, so this is stupid simple, and I know I am going to feel like an idiot when I get help. I am altering a HTML report that has contraband in it so that the links to said contraband and the images are not shown. The link/img pairs are in the form of : <a... (5 Replies)
Discussion started by: twjolson

9. Shell Programming and Scripting

Print content between two html tags

Hi Expert, Is there any other way to print and write to a same filename the content between two html tags? Here the sample: cat file.html <div id="outline"> hello world<br> </div> <div id="container_faq"> test1<br> </div> <div class="widget_quick"> thead test<br> </div> ... (3 Replies)
Discussion started by: lxdorney

10. Shell Programming and Scripting

Awk/sed HTML extract

I'm extracting text between table tags in HTML <th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th> using this: awk -F "</*th>" '/<\/*th>/ {print $2}' auto2 > auto3 then this (text between a href): sed -e 's/\(<*>\)//g' auto3 > auto4 How to shorten this into one... (8 Replies)
Discussion started by: p1ne
mdbGeneral(5)							 The m17n Library						     mdbGeneral(5)

NAME
mdbGeneral - General Format

DESCRIPTION
The mdatabase_load() function returns the data specified by tags in the form of a plist if the first tag is neither Mchartable nor Mcharset. The keys of the returned plist are limited to Minteger, Msymbol, Mtext, and Mplist. The type of the value is unambiguously determined by the corresponding key: if the key is Minteger, the value is an integer; if the key is Msymbol, the value is a symbol; and so on.

A number of expressions are possible to represent a plist. For instance, we can use the form (K1:V1, K2:V2, ..., Kn:Vn) to represent a plist whose first property key and value are K1 and V1, second key and value are K2 and V2, and so on. However, we can use a simpler expression here because the types of plists used in the m17n database are fairly restricted.

Hereafter, we use an expression similar to an S-expression to represent a plist. (Actually, the default database loader of the m17n library is designed to read data files written in this expression.)

The expression consists of one or more elements. Each element represents a property, i.e. a single element of a plist. Elements are separated by one or more whitespace characters, i.e. a space (code 32), a tab (code 9), or a newline (code 10). Comments begin with a semicolon (;) and extend to the end of the line.

The key and the value of each property are determined by the type of the element, as explained below.

o INTEGER
  An element that matches the regular expression -?[0-9]+ or 0[xX][0-9A-Fa-f]+ represents a property whose key is Minteger. An element matching the former expression is interpreted as an integer in decimal notation, and one matching the latter is interpreted as an integer in hexadecimal notation. The value of the property is the result of interpretation. For instance, the element 0xA0 represents a property whose value is 160 in decimal.

o SYMBOL
  An element that matches the regular expression [^-0-9(]([^\()]|\.)+ represents a property whose key is Msymbol. In the element, \t, \n, \r, and \e are replaced with tab (code 9), newline (code 10), carriage return (code 13), and escape (code 27) respectively. Any other character following a backslash is interpreted as is. The value of the property is the symbol having the resulting string as its name. For instance, the element abc\ def represents a property whose value is the symbol having the name 'abc def'.

o MTEXT
  An element that matches the regular expression "([^"]|\")*" represents a property whose key is Mtext. The backslash escapes explained above also apply here. In addition, each part of the element matching the regular expression \[xX][0-9A-Fa-f][0-9A-Fa-f] is replaced with its hexadecimal interpretation. After the backslash escapes have been resolved, the byte sequence between the double quotes is interpreted as a UTF-8 sequence and decoded into an M-text. This M-text is the value of the property.

o PLIST
  Zero or more elements surrounded by a pair of parentheses represent a property whose key is Mplist. Whitespace before and after a parenthesis can be omitted. The value of the property is a plist, which is the result of recursively interpreting the elements between the parentheses.

SYNTAX NOTATION
In an explanation of a plist format of data, a BNF-like notation is used. In the notation, non-terminals are represented by a string of uppercase letters (including '-' in the middle), and terminals are represented by a quoted string such as '('. The special non-terminals INTEGER, SYMBOL, MTEXT, and PLIST represent an integer, symbol, M-text, or plist property, respectively.

EXAMPLE
Here is an example of database data that is read into a plist of this simple format:

    DATA-FORMAT ::=
        [ INTEGER | SYMBOL | MTEXT | FUNC ] *

    FUNC ::=
        '(' FUNC-NAME FUNC-ARG * ')'

    FUNC-NAME ::=
        SYMBOL

    FUNC-ARG ::=
        INTEGER | SYMBOL | MTEXT | '(' FUNC-ARG * ')'

For instance, a data file that contains this text matches the above syntax:

    abc 123 (pqr 0xff) "m\"text" (_\_ ("string" xyz -456))

and is read into this plist:

    1st element: key: Msymbol, value: abc
    2nd element: key: Minteger, value: 123
    3rd element: key: Mplist, value: a plist of these elements:
        1st element: key: Msymbol, value: pqr
        2nd element: key: Minteger, value: 255
    4th element: key: Mtext, value: m"text
    5th element: key: Mplist, value: a plist of these elements:
        1st element: key: Msymbol, value: __
        2nd element: key: Mplist, value: a plist of these elements:
            1st element: key: Mtext, value: string
            2nd element: key: Msymbol, value: xyz
            3rd element: key: Minteger, value: -456

COPYRIGHT
Copyright (C) 2001 Information-technology Promotion Agency (IPA)
Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and Technology (AIST)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License <http://www.gnu.org/licenses/fdl.html>.

Version 1.6.2                    12 Jan 2011                    mdbGeneral(5)
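As a rough illustration of the element rules in the mdbGeneral(5) DESCRIPTION above, the POSIX shell sketch below reports which property key a single, already-separated element would receive. The function name classify is hypothetical and is not part of the m17n library; it does not split a data file into elements, resolve escapes, or recurse into parentheses, and its MTEXT pattern is an assumed, slightly stricter reading of the documented one (a backslash always escapes the next character).

```shell
#!/bin/sh
# Hypothetical helper, NOT an m17n API: prints the plist key that one
# already-separated element would get under the mdbGeneral(5) rules.
# It only classifies; it does not resolve escapes or parse nested plists.
classify() {
  elem=$1
  # INTEGER: -?[0-9]+ (decimal) or 0[xX][0-9A-Fa-f]+ (hexadecimal)
  if printf '%s\n' "$elem" | grep -Eq '^(-?[0-9]+|0[xX][0-9A-Fa-f]+)$'; then
    echo Minteger
  # MTEXT (assumed stricter form): double-quoted, backslash escapes next char
  elif printf '%s\n' "$elem" | grep -Eq '^"([^"\\]|\\.)*"$'; then
    echo Mtext
  # PLIST: wrapped in parentheses (the contents are not checked here)
  elif printf '%s\n' "$elem" | grep -Eq '^\(.*\)$'; then
    echo Mplist
  # SYMBOL: documented as [^-0-9(]([^\()]|\.)+
  elif printf '%s\n' "$elem" | grep -Eq '^[^-0-9(]([^\\()]|\\.)+$'; then
    echo Msymbol
  else
    echo unknown
  fi
}

# Elements taken from the DESCRIPTION and EXAMPLE sections.
for e in 0xA0 -456 'abc\ def' '"m\"text"' '(pqr 0xff)'; do
  printf '%s -> %s\n' "$e" "$(classify "$e")"
done
```

MTEXT is tested before SYMBOL on purpose: a leading double quote also satisfies [^-0-9(], so checking SYMBOL first would misclassify quoted strings.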
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.