get text between two tags in bash (awk)


 
# 1  
Old 02-23-2011

Hi,

I have a sample text file:

Code:
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>

Newlines may or may not occur anywhere in the file.

I would like to extract only the parts of the file that lie between each opening 'category' tag and its nearest closing tag, so for this example:

Code:
<category name="Temp1">something1</category><category name="Temp2">something2</category>

I am trying to get awk to do this with:

Code:
awk -F "</?category.*>" '{ print $1 }' file.txt

But this command gives me only:

Code:
</TD></TR></TABLE></BODY></HTML>

Could anyone show me how to write the command properly?

Regards,
Robert
# 2  
Old 02-23-2011
What do you get if you replace $1 with $2?

Code:
awk -F "</?category.*>" '{ print $2 }' file.txt
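
To see why neither $1 nor $2 gives the wanted result, it helps to print how awk actually splits each line. The following is only a sketch: it recreates the sample from post #1 in a temporary file (the use of mktemp and the variable names are my own choices) and shows NF, $1 and $2 per line. Because `.*` is greedy, the separator swallows everything up to the last `>` on the line.

```shell
#!/bin/sh
# Recreate the sample file from post #1 in a temporary location.
sample=$(mktemp)
printf '%s\n' \
  '<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">' \
  '</TD></TR></TABLE></BODY></HTML>' \
  '<category name="Temp2">something2' \
  '</category>' > "$sample"

# Same field separator as in the thread; print the field count and the first two fields.
fields=$(awk -F '</?category.*>' \
  '{ printf "line %d: NF=%d $1=[%s] $2=[%s]\n", NR, NF, $1, $2 }' "$sample")
printf '%s\n' "$fields"
rm -f "$sample"
```

On line 1 the separator matches the whole line, so both fields are empty; only line 2, which contains no category tag at all, has a non-empty $1.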
# 3  
Old 02-23-2011
Try this,
Code:
awk -F">" '/category/{printf "%s", $1 FS; printf "%s", ($2 ~ /<\/category/ ? $2 FS : $2)}' infile

# 4  
Old 03-04-2011
Thanks for your answers. The last one does work with my sample file.

My mistake: I trimmed the sample too much :/

Consider another, slightly more complex text file:

Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>
<category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

I would like to get:

Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category><category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

from it. Could you tell me how to rewrite the command?
# 5  
Old 03-04-2011
Assuming your file does not contain the | character, I can use it as a placeholder for the "category" string. This trick makes sure that

/category>.*<category matches only a stretch of text that does NOT contain any other "category" string.

Code:
echo "$(tr -d '\n' <infile)" | sed 's/category/|/g;s:/|>[^|]*<|:/|><|:g;s/|/category/g'


Last edited by ctsgnb; 03-04-2011 at 06:42 AM..