Extract text from html using perl or awk Post: 302981615

Post 302981615 by RudiC on Thursday, 15 September 2016, 05:53:42 PM (Shell Programming and Scripting forum)

Try this as a starting point:

Code:

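awk -F"[]\":{}, ]+" '
# FS: any run of ] " : { } , or blank, so each JSON key or value becomes its own field
BEGIN   {for (n=split ("reportLink,barcodedSamples,barcodeSampleInfo", T, ","); n>0; n--) SRCH[T[n]] = n
        }
# Whenever a field is one of the wanted keys, print the field after it (its value)
        {for (i=1; i<NF; i++) if ($i in SRCH) print $(i+1)
        }
' /tmp/6784d1473958785-extract-text-html-using-perl-awk-index-html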
MEV45
IonXpress_007
IonXpress_008
IonXpress_009
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/
/output/Home/Auto_user_S5-00580-5-Medexome_66_tn_031/
MEV42
IonXpress_004
IonXpress_005
IonXpress_006
/output/Home/Auto_user_S5-00580-4-Medexome_65_028/
/output/Home/Auto_user_S5-00580-4-Medexome_65_tn_029/
MEC1
IonXpress_001
IonXpress_002
IonXpress_003
/output/Home/medex60_8.13.16_027/
/output/Home/reanlzemedex60_023/
/output/Home/Auto_user_S5-00580-2-Medical_Exome_60_014/
/output/Home/Auto_user_S5-00580-2-Medical_Exome_60_tn_015/
MEC1
IonXpress_001
IonXpress_002
IonXpress_003
/output/Home/Medex59_8.11.2016_026/
/output/Home/MEDEX59_8.11-2016_025/
/output/Home/reanalyze59_8.10.16_024/
/output/Home/Auto_user_S5-00580-3-Medical_Exome_59_016/
chipDescription
/output/Home/Auto_user_S5-00580-1-IQOQ_RUN_Sample_2_51_012/
/output/Home/Auto_user_S5-00580-1-IQOQ_RUN_Sample_2_51_tn_013/
chipDescription
/output/Home/Auto_user_S5-00580-0-Test_Fragment_Run_49_010/
/output/Home/Auto_user_S5-00580-0-Test_Fragment_Run_49_tn_011/
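To try the approach without the original page, here is a minimal sketch that wraps the same awk logic in a shell function, `extract_keys` (a name made up for this example), and feeds it one invented line of JSON-like text. The thread's index.html is not shown, so the sample input is an assumption; the sketch uses `+` in the field separator and an explicit `","` in `split()`.

```shell
#!/bin/sh
# extract_keys: hypothetical wrapper around the awk program from the post.
# Prints the field that follows each wanted key in JSON-like input.
extract_keys() {
  awk -F'[]":{}, ]+' '
    BEGIN { n = split("reportLink,barcodedSamples,barcodeSampleInfo", T, ",")
            for (k = 1; k <= n; k++) SRCH[T[k]] = k }      # set of wanted keys
    { for (i = 1; i < NF; i++) if ($i in SRCH) print $(i + 1) }   # key -> next field
  ' "$@"
}

# Invented sample line, not taken from the thread's index.html.
printf '%s\n' '{"reportLink": "/output/Home/run_001/", "barcodedSamples": {"S1": {"barcodeSampleInfo": {"IonXpress_001": {}}}}}' | extract_keys
```

This prints `/output/Home/run_001/`, `S1` and `IonXpress_001` on separate lines. Because blanks are also separators, a value containing spaces is cut at the first space, and a key name that happens to appear as a value triggers a spurious print; a real JSON parser such as jq avoids both.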

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How do I extract text only from html file without HTML tag

I have an HTML file called myfile. If I simply run "cat myfile.html" in UNIX, it shows all the HTML tags like <a href=r/26><img src="http://www>. But I want to extract only the text part. The same problem happens with the "type" command in MS-DOS. I know you can do it by opening it in Internet Explorer,...
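For simple pages, one naive way to do this in awk is to delete everything that looks like a tag. The sketch below assumes no tag spans two lines and no `>` appears inside an attribute value, comment or script; entities such as `&amp;` are left as they are. `strip_tags` is a hypothetical helper name.

```shell
#!/bin/sh
# strip_tags: naive tag remover (hypothetical helper name).
# Assumes tags do not span lines and ">" never occurs inside a tag's attributes.
strip_tags() {
  awk '{ gsub(/<[^>]*>/, ""); print }' "$@"   # delete every <...> on the line
}

printf '%s\n' '<p>Hello <a href="r/26"><img src="x.png">world</a></p>' | strip_tags
```

This prints `Hello world`. When the assumptions do not hold, a text-mode browser dump (for example `lynx -dump`) or a proper HTML parser is more reliable.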

2. Shell Programming and Scripting

Is it possible to convert text file to html table using perl

Hi, I have a text file, say file1, with data like ABC c:/hm/new1 Dir DEF d:/ner/d sd ...... So I want to make a table from this text file. Is it possible to do this using Perl? Thanks in advance, Sarbjit

3. Shell Programming and Scripting

SED to extract HTML text data, not quite right!

I am attempting to extract weather data from the following website, but for the Victoria area only: Text Forecasts - Environment Canada. I use this: sed -n "/Greater Victoria./,/Fraser Valley./p" But that pattern sometimes does not get it all, and I think perhaps the website has more...
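A flag-based awk variant of that sed range is sketched below. It prints from the line containing the start marker up to, but not including, the line containing the end marker, and matches the markers as literal strings with `index()`, so the `.` is not a wildcard as it is in the sed regex. `between` and the sample lines are invented for illustration; the real page layout is not shown in the excerpt.

```shell
#!/bin/sh
# between: print from the line containing START up to, but not including,
# the line containing STOP. Markers are literal strings, not regexes.
between() {
  awk -v start="$1" -v stop="$2" '
    index($0, stop)  { inside = 0 }   # end marker: switch off before printing
    index($0, start) { inside = 1 }   # start marker: switch on, line is printed
    inside                            # print while inside the block
  '
}

printf '%s\n' 'header' 'Greater Victoria.' 'Sunny.' 'High 20.' 'Fraser Valley.' 'Rain.' |
  between 'Greater Victoria' 'Fraser Valley'
```

Note that `awk -v` interprets backslash escapes in its values, so markers containing backslashes need extra care.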

4. Shell Programming and Scripting

extract data with awk from html files

Hello everyone, I'm new to this forum and new to shell scripting. My problem: I have HTML files in a directory and I would like to extract from them some data that lies between two different lines. Here's my situation: <td align="default"> oxidizability (mg / l): data_to_extract...

5. Shell Programming and Scripting

awk -- Extract data from html within multiple tags as reference

Hi, I'm trying to get some data from an HTML file, but the problem is that before it can extract the information, multiple patterns need to be passed through. https://www.unix.com/shell-programming-scripting/150711-extract-data-awk-html-files.html is a similar problem. The only...

6. Shell Programming and Scripting

Perl script to extract text from image file

Hi Folks, Could you please share your ideas on extracting text from image files (JPG, PNG and GIF formats)? Regards, J

7. Shell Programming and Scripting

Retrieve information Text/Word from HTML code using awk/sed

awk/sed newbie here. I have an HTML file from which I would like to retrieve a text word. <font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version: 4.0 </li> <font face=arial size=-1><li><a...
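The excerpt is cut off before saying which word is wanted; assuming it is the link text (`abc` here), the same "field after a marker" idea from the answer above works with `<` and `>` as field separators. `link_text` is a hypothetical helper name, and the sketch assumes the link text contains no nested tags.

```shell
#!/bin/sh
# link_text: print the text of each <a ...>text</a> link (hypothetical helper).
link_text() {
  awk -F'[<>]' '{
    # with < and > as separators, the text between tags and the tag bodies
    # land in alternating fields
    for (i = 1; i < NF; i++)
      if ($i ~ /^a[ \t]/) print $(i + 1)   # field after an opening <a ...> tag
  }' "$@"
}

printf '%s\n' '<font face=arial size=-1><li><a href=/value_for_clients/Tokyo/abc_process.txt>abc</a> NDK Version: 4.0 </li>' | link_text
```

This prints `abc`.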

8. Shell Programming and Scripting

awk and HTML with conditional text colour

Hello All, I am using awk with HTML options to format output and send it to another file. The command below works fine, no issues. awk 'BEGIN{print "<table border="1" width="1000" >"} {print "<tr>";for(i=1;i<=NF;i++)print "<td>" $i"</td>";print "</tr>"} END {print "</table>"}' ${TMPLOGFILE1} >>...
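In the command quoted above, `"<table border="1" width="1000" >"` is really several awk strings concatenated with the numbers 1 and 1000, so the attribute quotes never reach the output. A minimal sketch of the same text-to-table conversion (which also covers the Perl question in item 2, done here in awk instead) escapes the inner quotes with `\"` and HTML-escapes each cell. `to_html_table` and `esc` are hypothetical names; columns are assumed to be whitespace-separated.

```shell
#!/bin/sh
# to_html_table: whitespace-separated columns -> HTML table (hypothetical helper).
to_html_table() {
  awk '
    function esc(s) {                   # escape characters special in HTML
      gsub(/&/, "\\&amp;", s)           # "\\&" yields a literal & in the replacement
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
    }
    BEGIN { print "<table border=\"1\" width=\"1000\">" }   # \" keeps the quotes
    {
      printf "<tr>"
      for (i = 1; i <= NF; i++) printf "<td>%s</td>", esc($i)
      print "</tr>"
    }
    END { print "</table>" }
  ' "$@"
}

printf 'a b\nc d\n' | to_html_table
```

The `&` substitution runs first so that the `&` characters introduced by `&lt;` and `&gt;` are not escaped a second time.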

9. Shell Programming and Scripting

Awk/sed HTML extract

I'm extracting text between table tags in HTML, <th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th>, using this: awk -F "</*th>" '/<\/*th>/ {print $2}' auto2 > auto3 then this (text between a href): sed -e 's/$<*>$//g' auto3 > auto4 How can I shorten this into one...
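Both steps can be done in a single awk pass: split on the `th` tags as the first command does, then delete any tags left inside the cell. This sketch assumes plain `<th>`/`</th>` tags without attributes and one header cell per line; `th_text` is a hypothetical helper name.

```shell
#!/bin/sh
# th_text: print the text inside <th>...</th>, with nested tags removed.
th_text() {
  awk -F'</?th>' 'NF > 1 {              # only lines that contain a th tag
    s = $2                              # text between <th> and </th>
    gsub(/<[^>]*>/, "", s)              # drop nested tags such as <a ...>
    print s
  }' "$@"
}

printf '%s\n' '<tr>' '<th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th>' | th_text
```

This prints `Buick LeSabre`.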

10. UNIX for Beginners Questions & Answers

awk to extract value after keyword in html

I am using awk to extract the value after a keyword in an HTML file and store it in ts. The awk does execute but ts is empty. I use the tag as a delimiter and the keyword as a pattern, but there probably is a better way. Thank you :). file <html><head><title>xxxxxx xxxxx</title><style type="text/css"> ...
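Since awk runs as a separate process, it cannot assign a shell variable directly; its output has to be captured with command substitution. Below is a minimal sketch that finds a literal keyword with `index()`, takes the text after it up to the next tag, and stores the result in `ts`. The keyword `Run time:`, the sample line and the helper name `value_after` are all invented, because the excerpt is cut off before the real keyword appears.

```shell
#!/bin/sh
# value_after: print the text after the first occurrence of a literal keyword
# on stdin, up to the next "<" (hypothetical helper).
value_after() {
  awk -v key="$1" '
    (p = index($0, key)) {
      v = substr($0, p + length(key))   # everything after the keyword
      sub(/<.*/, "", v)                 # cut at the next tag
      gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim surrounding blanks
      print v
      exit                              # first match only
    }'
}

# The shell captures awk output; awk itself cannot set "ts".
ts=$(printf '%s\n' '<p>Run time: 12:34:56</p>' | value_after 'Run time:')
echo "ts=$ts"
```

This prints `ts=12:34:56`.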

LEARN ABOUT DEBIAN

plan9-split

SPLIT(1)						      General Commands Manual							  SPLIT(1)

NAME

       split - split a file into pieces

SYNOPSIS

       split [ option ...  ] [ file ]

DESCRIPTION

       Split reads file (standard input by default) and writes it in pieces of 1000 lines per output file.  The names of the output files are xaa,
       xab, and so on to xzz.  The options are

       -n n   Split into n-line pieces.

       -l n   Synonym for -n n, a nod to Unix's syntax.

       -e expression
              File divisions occur at each line that matches a regular expression; see regexp(7).  Multiple -e options may appear.  If a subexpression
              of expression is contained in parentheses (...), the output file name is the portion of the line which matches the subexpression.

       -f stem
	      Use stem instead of x in output file names.

       -s suffix
	      Append suffix to names identified under -e.

       -x     Exclude the matched input line from the output file.

       -i     Ignore case in option -e; force output file names (excluding the suffix) to lower case.

SOURCE

       /src/cmd/split.c

SEE ALSO

       sed(1), awk(1), grep(1), regexp(7)

																	  SPLIT(1)
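The page above documents the Plan 9 split; GNU coreutils split, for instance, treats `-n` as a number of chunks rather than lines. As a rough illustration of the `-n n` and `-f stem` behaviour only, here is an awk sketch with a made-up name, `nsplit`; it does not implement `-e`, `-s`, `-x` or `-i`, and it simply stops after the 676th piece (`zz`).

```shell
#!/bin/sh
# nsplit N STEM [file...]: write every N input lines to STEMaa, STEMab, ...
# Hypothetical approximation of Plan 9 "split -n N -f STEM"; assumes N >= 1.
nsplit() {
  n=$1 stem=$2
  shift 2
  awk -v n="$n" -v stem="$stem" '
    BEGIN { letters = "abcdefghijklmnopqrstuvwxyz" }
    (NR - 1) % n == 0 {                    # first line of a new piece
      if (out != "") close(out)            # finish the previous piece
      k = int((NR - 1) / n)                # 0-based piece number
      if (k >= 676) exit 1                 # past "zz": give up
      out = stem substr(letters, int(k / 26) + 1, 1) substr(letters, k % 26 + 1, 1)
    }
    { print > out }
  ' "$@"
}
```

For example, `printf '%s\n' 1 2 3 4 5 | nsplit 2 part` should leave lines 1 and 2 in `partaa`, 3 and 4 in `partab` and 5 in `partac`, in the current directory.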
