Retrieve information Text/Word from HTML code using awk/sed
Post 302897217 by Yoda on Friday 11th of April 2014 05:28:57 PM
Ok, here is what I got:
Code:
$ cat file.html
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version:  4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc01_process.txt>abc01</a> NDK Version:  4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc045_process.txt>abc045</a> NDK Version:  4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/cdf_process.txt>cdf</a> NDK Version:  4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/Manhattan_process.txt>Manhattan</a> NDK Version:  4.0 </li>

Code:
$ awk -F'[<>]' '{ print $7 }' file.html
abc
abc01
abc045
cdf
Manhattan
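
To see why it is $7, here is a small sketch that prints every field awk produces for one of the sample lines when the separator is `<` or `>`. The variable names `line` and `fields` are only for illustration and are not part of the original command.

```shell
# One sample line copied from file.html above.
line='<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version:  4.0 </li>'

# Split on "<" or ">" and print each field with its index.
# Empty fields appear wherever two separators are adjacent (e.g. "><").
fields=$(printf '%s\n' "$line" |
  awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Field 6 is the `a href=...` tag body and field 7 is the link text. Note that this depends on every line having exactly the same tags in front of the link; if the markup varies, the field number will shift.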

You can also try:
Code:
sed 's#.*txt>##;s#<.*##' file.html
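
The sed version is two substitutions with `#` as the delimiter. As a rough sketch of what each one does, the snippet below runs the first substitution alone and then both together on two of the sample lines. The temporary directory and the variable names `step1` and `names` are assumptions made for the demo only.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)

# Two of the sample lines from file.html above.
cat > "$tmpdir/file.html" <<'EOF'
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version:  4.0 </li>
<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/Manhattan_process.txt>Manhattan</a> NDK Version:  4.0 </li>
EOF

# Step 1 only: delete everything up to and including the last "txt>" on the line.
step1=$(sed 's#.*txt>##' "$tmpdir/file.html")

# Steps 1 and 2: also delete everything from the first remaining "<" to end of line.
names=$(sed 's#.*txt>##;s#<.*##' "$tmpdir/file.html")

printf '%s\n' "$step1"
printf '%s\n' "$names"

rm -rf "$tmpdir"
```

After step 1 each line starts with the link text followed by `</a> ...`, and step 2 cuts that tail off. This relies on every href ending in `txt`; links with a different ending would need a different pattern.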

Text::WordDiff::HTML(3pm)				User Contributed Perl Documentation				 Text::WordDiff::HTML(3pm)

Name
       Text::WordDiff::HTML - XHTML formatting for Text::WordDiff

Synopsis
	   use Text::WordDiff;

	   my $diff = word_diff 'file1.txt', 'file2.txt', { STYLE => 'HTML' };
	   my $diff = word_diff \$string1,   \$string2,   { STYLE => 'HTML' };
	   my $diff = word_diff \*FH1,       \*FH2,       { STYLE => 'HTML' };
	   my $diff = word_diff \&reader1,   \&reader2,   { STYLE => 'HTML' };
	   my $diff = word_diff \@records1,  \@records2,  { STYLE => 'HTML' };

	   # May also mix input types:
	   my $diff = word_diff \@records1,  'file_B.txt', { STYLE => 'HTML' };

Description
       This class subclasses Text::WordDiff::Base to provide XHTML formatting for Text::WordDiff. See Text::WordDiff for usage details. This
       class should never be used directly.

       Text::WordDiff::HTML formats word diffs for viewing in a Web browser. The diff content is highlighted as follows:

       o   "<div class="file">"

	   This element contains the entire contents of the diff "file" returned by "word_diff()". All of the following elements are subsumed by
	   this one.

	   o   "<span class="fileheader">"

	       The header section for the files being "diff"ed, usually something like:

		 --- in.txt    Thu Sep	1 12:51:03 2005
		 +++ out.txt   Thu Sep	1 12:52:12 2005

	       This element immediately follows the opening "file" "<div>" element, but will not be present if Text::WordDiff cannot determine the
	       file names for both files being compared.

	   o   "<span class="hunk">"

	       This element contains a single diff "hunk". Each hunk may contain the following elements:

	       o   "<ins>"

		   Inserted content.

	       o   "<del>"

		   Deleted content.

       You may do whatever you like with these elements and classes; I highly recommend that you style them using CSS. You'll find an example CSS
       file in the eg directory in the Text-WordDiff distribution.

See Also
       Text::WordDiff
       Text::WordDiff::ANSIColor

Support
       This module is stored in an open repository at the following address:

       <https://svn.kineticode.com/Text-WordDiff/trunk/>

       Patches against Text::WordDiff are welcome. Please send bug reports to <bug-text-worddiff@rt.cpan.org>.

Author
       David Wheeler <david@kineticode.com>

Copyright and License
       Copyright (c) 2005-2008 David Wheeler. Some Rights Reserved.

       This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.10.0							    2009-09-24						 Text::WordDiff::HTML(3pm)