Sponsored Content
Top Forums UNIX for Beginners Questions & Answers Multiline html tag parse shell script Post 303040321 by RudiC on Saturday 26th of October 2019 12:57:47 PM
Old 10-26-2019
Why don't you paint the whole picture with your requirements (including but not limited to "get other texts out of the html", "get rid of the new lines", "replacing the commas with semicolon") and input data so people could work towards a final, optimal solution? E.g. the sed, awk, and dual tr invocations could be combined into a single run of one of the tools,
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How do I extract text only from html file without HTML tag

I have a html file called myfile. If I simply put "cat myfile.html" in UNIX, it shows all the html tags like <a href=r/26><img src="http://www>. But I want to extract only text part. Same problem happens in "type" command in MS-DOS. I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111
4 Replies

2. Shell Programming and Scripting

how to use html tag in shell scripting

Hai friends I have a small doubt.. how can we use html tag in shell scripting code : echo "<html>" echo "<body>" echo " welcome to peace world " echo "</body>" echo "</html>" output displayed like this: <html> <body> welcome to peace world </body> </html> (5 Replies)
Discussion started by: jrex1983
5 Replies

3. UNIX for Advanced & Expert Users

shell script to parse html file

hi all, i have a html file something similar to this. <tr class="evenrow"> <td class="data">added</td><td class="data">xyz@abc.com</td> <td class="data">filename.sql</td><td class="modifications-data">08/25/2009 07:58:40</td><td class="data">Added TK prof script</td> </tr> <tr... (1 Reply)
Discussion started by: sais
1 Replies

4. Shell Programming and Scripting

Parse HTML tag parameters and text

Hi! I have a bunch of HTML files, which I want to parse to CSV files. Every page has a table in it, and I need to parse each row into a csv record. With awk and sed, I managed to put every table row in separate lines. So my file looks like this: <TR> .... </TR> <TR> .... </TR> ...One... (1 Reply)
Discussion started by: senszey
1 Replies

5. Shell Programming and Scripting

Script to delete HTML tag

Guys, I have a little script that I got of the internet and that I use in Squid to block ads. I used that script with linux but now i have moved my servers to freebsd. I have a step learning curve there but it is fun: Back to the script issue. The script used to work i with linux but... (15 Replies)
Discussion started by: zongo
15 Replies

6. Shell Programming and Scripting

awk Script to parse a XML tag

I have an XML tag like this: <property name="agent" value="/var/tmp/root/eclipse" /> Is there way using awk that i can get the value from the above tag. So the output should be: /var/tmp/root/eclipse Help will be appreciated. Regards, Adi (6 Replies)
Discussion started by: asirohi
6 Replies

7. Shell Programming and Scripting

Search for a html tag and print the entire tag

I want to print from <fruits> to </fruits> tag which have <fruit> as mango. Also i want both <fruits> and </fruits> in output. Please help eg. <fruits> <fruit id="111">mango<fruit> . another 20 lines . </fruits> (3 Replies)
Discussion started by: Ashik409
3 Replies

8. Shell Programming and Scripting

Using shell command need to parse multiple nested tag value of a XML file

I have this XML file - <gp> <mms>1110012</mms> <tg>988</tg> <mm>LongTime</mm> <lv> <lkid>StartEle=ONE, Desti = Motion</lkid> <kk>12</kk> </lv> <lv> <lkid>StartEle=ONE, Source = Velocity</lkid> <kk>2</kk> </lv> <lv> ... (3 Replies)
Discussion started by: NeedASolution
3 Replies

9. Shell Programming and Scripting

XML Parse between to tag with upper tag

Hi Guys Here is my Input : <?xml version="1.0" encoding="UTF-8"?> <xn:MeContext id="01736"> <xn:VsDataContainer id="01736"> <xn:attributes> <xn:vsDataType>vsDataMeContext</xn:vsDataType> ... (12 Replies)
Discussion started by: pareshkp
12 Replies

10. Shell Programming and Scripting

How to remove html tag which has multiple lines in SHELL?

I want to clean a html file. I try to remove the script part in the html and remove the rest of tags and empty lines. The code I try to use is the following: sed '/<script/,/<\/script>/d' webpage.html | sed -e 's/<*>//g' | sed '/^\s*$/d' > output.txt However, in this method, I can not... (10 Replies)
Discussion started by: YuhuiFeng
10 Replies
regex(1F)                                                          FMLI Commands                                                         regex(1F)

NAME
regex - match patterns against a string SYNOPSIS
regex [-e] [ -v "string"] [ pattern template] ... pattern [template] DESCRIPTION
The regex command takes a string from the standard input, and a list of pattern / template pairs, and runs regex() to compare the string against each pattern until there is a match. When a match occurs, regex writes the corresponding template to the standard output and returns TRUE. The last (or only) pattern does not need a template. If that is the pattern that matches the string, the function simply returns TRUE. If no match is found, regex returns FALSE. The argument pattern is a regular expression of the form described in regex(). In most cases, pattern should be enclosed in single quotes to turn off special meanings of characters. Note that only the final pattern in the list may lack a template. The argument template may contain the strings $m0 through $m9, which will be expanded to the part of pattern enclosed in ( ... )$0 through ( ... )$9 constructs (see examples below). Note that if you use this feature, you must be sure to enclose template in single quotes so that FMLI does not expand $m0 through $m9 at parse time. This feature gives regex much of the power of cut(1), paste(1), and grep(1), and some of the capabilities of sed(1). If there is no template, the default is $m0$m1$m2$m3$m4$m5$m6$m7$m8$m9. OPTIONS
The following options are supported: -e Evaluates the corresponding template and writes the result to the standard output. -v "string" Uses string instead of the standard input to match against patterns. EXAMPLES
Example 1: Cutting letters out of a string To cut the 4th through 8th letters out of a string (this example will output strin and return TRUE): `regex -v "my string is nice" '^.{3}(.{5})$0' '$m0'` Example 2: Validating input in a form In a form, to validate input to field 5 as an integer: valid=`regex -v "$F5" '^[0-9]+$'` Example 3: Translating an environment variable in a form In a form, to translate an environment variable which contains one of the numbers 1, 2, 3, 4, 5 to the letters a, b, c, d, e: value=`regex -v "$VAR1" 1 a 2 b 3 c 4 d 5 e '.*' 'Error'` Note the use of the pattern '.*' to mean "anything else". Example 4: Using backquoted expressions In the example below, all three lines constitute a single backquoted expression. This expression, by itself, could be put in a menu defini- tion file. Since backquoted expressions are expanded as they are parsed, and output from a backquoted expression (the cat command, in this example) becomes part of the definition file being parsed, this expression would read /etc/passwd and make a dynamic menu of all the login ids on the system. `cat /etc/passwd | regex '^([^:]*)$0.*$' ' name=$m0 action=`message "$m0 is a user"`'` DIAGNOSTICS
If none of the patterns match, regex returns FALSE, otherwise TRUE. NOTES
Patterns and templates must often be enclosed in single quotes to turn off the special meanings of characters. Especially if you use the $m0 through $m9 variables in the template, since FMLI will expand the variables (usually to "") before regex even sees them. Single characters in character classes (inside []) must be listed before character ranges, otherwise they will not be recognized. For exam- ple, [a-zA-Z_/] will not find underscores (_) or slashes (/), but [_/a-zA-Z] will. The regular expressions accepted by regcmp differ slightly from other utilities (that is, sed, grep, awk, ed, and so forth). regex with the -e option forces subsequent commands to be ignored. In other words, if a backquoted statement appears as follows: `regex -e ...; command1; command2` command1 and command2 would never be executed. However, dividing the expression into two: `regex -e ...``command1; command2` would yield the desired result. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWcsu | +-----------------------------+-----------------------------+ SEE ALSO
awk(1), cut(1), grep(1), paste(1), sed(1), regcmp(3C), attributes(5) SunOS 5.10 12 Jul 1999 regex(1F)
All times are GMT -4. The time now is 08:51 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy