Sponsored Content
Top Forums UNIX for Beginners Questions & Answers awk to extract value after keyword in html Post 303041511 by cmccabe on Tuesday 26th of November 2019 12:31:48 PM
Old 11-26-2019
awk to extract value after keyword in html

Using awk to extract value after a keyword in an html, and store in ts. The awk does execute but ts is empty. I use the tag as a delimiter and the keyword as a pattern, but there probably is a better way. Thank you Smilie.

file
Code:
<html><head><title>xxxxxx xxxxx</title><style type="text/css">
 @media screen {
  div.summary {
    width: 18em;
    position:fixed;
    top: 3em;
    margin:1em 0 0 1em;
  }
...
...
...
<th>Measure</th><th>Value</th></tr></thead><tbody><tr><td>Filename</td><td>xxxxx</td></tr><tr><td>File type</td><td>Conventional base calls</td></tr><tr><td>Encoding</td><td>Sanger / Illumina 1.9</td></tr><tr><td>Total Sequences</td><td>49531132</td></tr><tr><td>Sequences flagged as poor quality</td><td>0</td></tr><tr><td>Sequence length</td><td>151</td></tr><tr><td>%GC</td><td>51</td></tr></tbody></table></div><div class="module"><h2 id="M1"><img
...
...
...

awk
Code:
ts=$(awk -F "[/tr><tr><td></td>]" '/Total Sequences/{print $2}' file)
echo $ts

desired
Code:
$ts=49531132

 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How do I extract text only from html file without HTML tag

I have a html file called myfile. If I simply put "cat myfile.html" in UNIX, it shows all the html tags like <a href=r/26><img src="http://www>. But I want to extract only text part. Same problem happens in "type" command in MS-DOS. I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111
4 Replies

2. Shell Programming and Scripting

Extract lines of text based on a specific keyword

I regularly extract lines of text from files based on the presence of a particular keyword; I place the extracted lines into another text file. This takes about 2 hours to complete using the "sort" command then Kate's find & highlight facility. I've been reading the forum & googling and can find... (4 Replies)
Discussion started by: DionDeVille
4 Replies

3. Shell Programming and Scripting

extract data with awk from html files

Hello everyone, I'm new to this forum and i am new as a shell scripter. my problem is to have html files in a directory and I would like to extract from these some data that lies between two different lines Here's my situation <td align="default"> oxidizability (mg / l): data_to_extract... (6 Replies)
Discussion started by: sbobotex
6 Replies

4. Shell Programming and Scripting

Extract Lines Containg a Keyword

Hi , I have two files, say KEY_FILE and the MAIN_FILE. I am trying to read the KEY_FILE which has only one column and look for this column data in the MAIN_FILE to extract all the rows that have this key. I have written a script to do so, but somehow it is not returning all the rows ( It... (4 Replies)
Discussion started by: Sheel
4 Replies

5. Shell Programming and Scripting

extract lines from text after keyword

I have a text and I want to extract the 4 lines following a keyword! For example if I have this text and the keyword is AAA hello helloo AAA one two three four helloooo hellooo I want the output to be one two three four (7 Replies)
Discussion started by: stekanius
7 Replies

6. Shell Programming and Scripting

awk -- Extract data from html within multiple tags as reference

Hi, I'm trying to get some data from an html file, but the problem is before it can extract the information I have multiple patterns that need to be passed through. https://www.unix.com/shell-programming-scripting/150711-extract-data-awk-html-files.html Is a similar problem. The only... (5 Replies)
Discussion started by: counfhou
5 Replies

7. Shell Programming and Scripting

Substitute keyword in html address

I have data that looks like the below: PXL-A0000005 DTE3504500000005 PXL-A0000007 DTE3504500000007 PXL-A0000014 DTE3504500000014 PXL-A0000015 DTE3504500000015 PXL-A0000016 DTE3504500000016 What I am trying to do is use the value in $1 and substitute it in catno=....&storage . I do... (2 Replies)
Discussion started by: cmccabe
2 Replies

8. Shell Programming and Scripting

Need to extract the word after a particular keyword throughout the file..

Hi Everyone, Need help in extracting the hostname from the below output. Expected output: DS-TESTB-GDS-1.TEST.ABC.COM DS-TESTB-GDS-2.TEST.ABC.COM .... ... /tmp $ cat -n /tmp/patchreport 1 /usr/bin/perl /admin/bin/patch/applyPatches.pl --apply_patches... (4 Replies)
Discussion started by: thiyagoo
4 Replies

9. Shell Programming and Scripting

Awk/sed HTML extract

I'm extracting text between table tags in HTML <th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th> using this: awk -F "</*th>" '/<\/*th>/ {print $2}' auto2 > auto3 then this (text between a href): sed -e 's/\(<*>\)//g' auto3 > auto4 How to shorten this into one... (8 Replies)
Discussion started by: p1ne
8 Replies

10. Shell Programming and Scripting

Extract text from html using perl or awk

I am trying to extract text after keywords fron an html file. The keywords are reportLink":, "barcodedSamples": {", "barcodedSamples": {". Both the perl and awk run but the output is just the entire index.html not the desired output. Also for the reportLink": only the text after the second / until... (5 Replies)
Discussion started by: cmccabe
5 Replies
XGETTEXT(1)								GNU							       XGETTEXT(1)

NAME
xgettext - extract gettext strings from source SYNOPSIS
xgettext [OPTION] [INPUTFILE]... DESCRIPTION
Extract translatable strings from given input files. Mandatory arguments to long options are mandatory for short options too. Similarly for optional arguments. Input file location: INPUTFILE ... input files -f, --files-from=FILE get list of input files from FILE -D, --directory=DIRECTORY add DIRECTORY to list for input files search If input file is -, standard input is read. Output file location: -d, --default-domain=NAME use NAME.po for output (instead of messages.po) -o, --output=FILE write output to specified file -p, --output-dir=DIR output files will be placed in directory DIR If output file is -, output is written to standard output. Choice of input file language: -L, --language=NAME recognise the specified language (C, C++, ObjectiveC, PO, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Smalltalk, Java, JavaProp- erties, C#, awk, YCP, Tcl, Perl, PHP, GCC-source, NXStringTable, RST, Glade) -C, --c++ shorthand for --language=C++ By default the language is guessed depending on the input file name extension. Input file interpretation: --from-code=NAME encoding of input files (except for Python, Tcl, Glade) By default the input files are assumed to be in ASCII. Operation mode: -j, --join-existing join messages with existing file -x, --exclude-file=FILE.po entries from FILE.po are not extracted -cTAG, --add-comments=TAG place comment blocks starting with TAG and preceding keyword lines in output file -c, --add-comments place all comment blocks preceding keyword lines in output file Language specific options: -a, --extract-all extract all strings (only languages C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade) -kWORD, --keyword=WORD look for WORD as an additional keyword -k, --keyword do not to use default keywords (only languages C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade) --flag=WORD:ARG:FLAG additional flag for strings inside the argument number ARG of keyword WORD (only languages C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, C#, awk, YCP, Tcl, Perl, PHP, GCC-source) -T, --trigraphs understand ANSI C trigraphs for input (only languages C, C++, ObjectiveC) --qt recognize Qt format strings (only language C++) --kde recognize KDE 4 format strings (only language C++) --boost recognize Boost format strings (only language C++) --debug more detailed formatstring recognition result Output details: --color use colors and other text attributes always --color=WHEN use colors and other text attributes if WHEN. WHEN may be 'always', 'never', 'auto', or 'html'. --style=STYLEFILE specify CSS style rule file for --color -e, --no-escape do not use C escapes in output (default) -E, --escape use C escapes in output, no extended chars --force-po write PO file even if empty -i, --indent write the .po file using indented style --no-location do not write '#: filename:line' lines -n, --add-location generate '#: filename:line' lines (default) --strict write out strict Uniforum conforming .po file --properties-output write out a Java .properties file --stringtable-output write out a NeXTstep/GNUstep .strings file -w, --width=NUMBER set output page width --no-wrap do not break long message lines, longer than the output page width, into several lines -s, --sort-output generate sorted output -F, --sort-by-file sort output by file location --omit-header don't write header with 'msgid ""' entry --copyright-holder=STRING set copyright holder in output --foreign-user omit FSF copyright in output for foreign user --package-name=PACKAGE set package name in output --package-version=VERSION set package version in output --msgid-bugs-address=EMAIL@ADDRESS set report address for msgid bugs -m[STRING], --msgstr-prefix[=STRING] use STRING or "" as prefix for msgstr values -M[STRING], --msgstr-suffix[=STRING] use STRING or "" as suffix for msgstr values Informative output: -h, --help display this help and exit -V, --version output version information and exit AUTHOR
Written by Ulrich Drepper. REPORTING BUGS
Report bugs to <bug-gnu-gettext@gnu.org>. COPYRIGHT
Copyright (C) 1995-1998, 2000-2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. SEE ALSO
The full documentation for xgettext is maintained as a Texinfo manual. If the info and xgettext programs are properly installed at your site, the command info xgettext should give you access to the complete manual. GNU gettext-tools 0.18.2 March 2013 XGETTEXT(1)
All times are GMT -4. The time now is 05:12 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy