Parse text file using specific tags

# 1  
Old 12-11-2014

Code:
 awk -F "[<>]" '/<href=>|<href=>|<top>|<top>/ {print $3, OFS=\t}' source.txt > output.txt

I'm not quite sure how to parse the attached file. What I am trying to do is write an output file with the link (href=), the name (after the <), and the count (<top>) in 3 separate tab-delimited columns.

My attempt is the script above. It creates output.txt, but the file is empty.

The desired output is:
Code:
http://geneticslab.emory.edu/tests/MM021     Autism Spectrum Disorders     61
http://geneticslab.emory.edu/tests/MM250     Brain Malformations     50

Thank you.
# 2  
Old 12-11-2014
Try
Code:
sed 'N; s/\n/\t/; s/href="/>/; s/<[^>]*>//g; s/">/\t/g; s/[ -]*&#[0-9]*;[ -]*//g; /^[\t]*$/d' /tmp/source.txt
http://geneticslab.emory.edu/tests/MM021    Autism Spectrum Disorders    61
http://geneticslab.emory.edu/tests/MM250    Brain Malformations    50
http://geneticslab.emory.edu/tests/MCAR1    Comprehensive Cardiovascular    106
.
.
.

I'm not sure how to keep the last five entries from coming out disordered; they carry lengthy font/line-height info that the substitutions above do not strip cleanly.
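
For comparison, the same extraction can be sketched in awk, which is what the original attempt was reaching for. This is only a sketch under assumptions: the attached source.txt is not shown in the thread, so the sample.txt below is a hypothetical stand-in in which each entry is an anchor line followed by a <top> line, and parse_tags is an illustrative name. It splits each line on < and >, remembers the link and name from the href line, and prints all three columns when it reaches the matching <top> line.

```shell
#!/bin/sh
# Hypothetical sample standing in for the attached source.txt (layout assumed).
cat > sample.txt <<'EOF'
<a href="http://geneticslab.emory.edu/tests/MM021">Autism Spectrum Disorders</a>
<top>61</top>
<a href="http://geneticslab.emory.edu/tests/MM250">Brain Malformations</a>
<top>50</top>
EOF

# parse_tags FILE: print link, name and count as tab-separated columns.
parse_tags() {
    awk -F'[<>]' -v OFS='\t' '
        /href="/ {
            # Fields are split on < and >: $2 is the opening tag, $3 the link text.
            link = $2
            sub(/^.*href="/, "", link)   # drop everything up to the URL
            sub(/".*$/, "", link)        # drop the closing quote and anything after it
            name = $3
            next
        }
        /<top>/ {
            # $3 is the text between <top> and </top>.
            print link, name, $3
        }
    ' "$1"
}

parse_tags sample.txt
```

Setting OFS with -v, outside the print statement, is what makes the columns tab-separated. Entries with extra inline markup (such as the font/line-height info mentioned above) would need further cleanup that this sketch does not attempt.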
These 2 Users Gave Thanks to RudiC For This Post:
# 3  
Old 12-11-2014
Thank you.