awk to extract value after keyword in html


# 1  

I am using awk to extract the value that follows a keyword in an HTML file and store it in ts. The awk command runs, but ts is empty. I use the tags as the delimiter and the keyword as the pattern, but there is probably a better way. Thank you.

file
Code:
<html><head><title>xxxxxx xxxxx</title><style type="text/css">
 @media screen {
  div.summary {
    width: 18em;
    position:fixed;
    top: 3em;
    margin:1em 0 0 1em;
  }
...
...
...
<th>Measure</th><th>Value</th></tr></thead><tbody><tr><td>Filename</td><td>xxxxx</td></tr><tr><td>File type</td><td>Conventional base calls</td></tr><tr><td>Encoding</td><td>Sanger / Illumina 1.9</td></tr><tr><td>Total Sequences</td><td>49531132</td></tr><tr><td>Sequences flagged as poor quality</td><td>0</td></tr><tr><td>Sequence length</td><td>151</td></tr><tr><td>%GC</td><td>51</td></tr></tbody></table></div><div class="module"><h2 id="M1"><img
...
...
...

awk
Code:
ts=$(awk -F "[/tr><tr><td></td>]" '/Total Sequences/{print $2}' file)
echo $ts

desired
Code:
$ts=49531132
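
A likely reason ts comes back empty: a -F value wrapped in [ ] is a bracket expression, so awk splits on any single one of the characters /, t, r, >, < and d rather than on the tag sequence as a whole. The field between the leading "<" and the following "t" is therefore empty. The sketch below compares that separator with an alternation of whole tags. The one-line sample and the names line, broken and fixed are made up for illustration; only the row layout is taken from the file above.

```shell
# Hypothetical one-row sample modelled on the table row in the question.
line='<tr><td>Total Sequences</td><td>49531132</td></tr>'

# Bracket expression: each of / t r > < d is a separator on its own,
# so $2 is the empty string between "<" and "t".
broken=$(printf '%s\n' "$line" |
  awk -F "[/tr><tr><td></td>]" '/Total Sequences/{print $2}')

# Alternation of whole tags: fields are the text between the tags.
fixed=$(printf '%s\n' "$line" |
  awk -F '</td><td>|</td></tr>' '/Total Sequences/{print $2}')

printf 'broken=[%s] fixed=[%s]\n' "$broken" "$fixed"
```

On the real file the keyword row shares its line with other rows, so the field number would differ from the $2 used here.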

# 2  
Hi
Code:
awk -F'Total Sequences[^0-9]*' '/Total Sequences/ {sub("[^0-9].*", "", $2); print $2}' file

--- Post updated at 23:03 ---

Code:
sed -n 's/.*Total Sequences[^0-9]*\([0-9]*\).*/\1/p' file

--- Post updated at 23:13 ---

Code:
awk -F'</td><td>|</td></tr><tr><td>' '/Total Sequences/ {print $8}' file

This User Gave Thanks to nezabudka For This Post:
# 3  
Try also
Code:
awk 'match ($0, /Total Sequences<\/td><td>[^<]*/) {print substr ($0, RSTART+24, RLENGTH-24)}' file
49531132
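
The 24 above is the length of the literal text "Total Sequences</td><td>". As a variation, and only a sketch, the offset can be derived with length() so the keyword can change without recounting. This swaps match() for index(), which searches for a plain string, so the "/" needs no escaping. The temp file, its contents and the names key, tmp and rest are assumptions for the demo.

```shell
# Hypothetical sample file with a row like the one in the question.
tmp=$(mktemp)
printf '%s\n' '<tr><td>Encoding</td><td>Sanger / Illumina 1.9</td></tr><tr><td>Total Sequences</td><td>49531132</td></tr>' > "$tmp"

# key is the literal text that precedes the value; its length replaces the fixed 24.
key='Total Sequences</td><td>'
ts=$(awk -v key="$key" '
  (i = index($0, key)) {                  # i is 0 (false) when key is absent
    rest = substr($0, i + length(key))    # everything after the key
    sub(/<.*/, "", rest)                  # cut at the next tag
    print rest
    exit                                  # first match only
  }' "$tmp")
rm -f "$tmp"

echo "$ts"
```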

This User Gave Thanks to RudiC For This Post:
# 4  
Try:
Code:
awk '$2=="Total Sequences"{n=-3} ++n==0{print "$ts=" $2}' RS=\< FS=\> file
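
How this one works, as far as I can tell: with RS set to "<" every record is one tag plus the text that follows it, and with FS set to ">" the tag name lands in $1 and the text in $2. When the text equals the keyword, n is set to -3; the ++n in the second pattern then reaches 0 two records later, which is the cell holding the value. Note that "$ts=" is just a literal prefix in the output. To assign the shell variable, print $2 alone inside a command substitution. A small sketch on a made-up one-row sample (the variable line is hypothetical):

```shell
# Hypothetical one-row sample modelled on the table row in the question.
line='<tr><td>Total Sequences</td><td>49531132</td></tr>'

# Show how RS="<" and FS=">" carve up the row (records with real text only).
printf '%s\n' "$line" |
  awk '$2 ~ /[A-Za-z0-9]/ {printf "tag=[%s] text=[%s]\n", $1, $2}' RS=\< FS=\>

# Same counter logic as above, printing only the value so it can be captured.
ts=$(printf '%s\n' "$line" |
  awk '$2=="Total Sequences"{n=-3} ++n==0{print $2}' RS=\< FS=\>)

echo "$ts"
```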

This User Gave Thanks to Scrutinizer For This Post:
# 5  
Thank you all.