get text between two tags in bash (awk) | Unix Linux Forums | Shell Programming and Scripting

#1  rkoziol7 (02-23-2011)

Hi,

I have a sample text file:


Code:
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>

Newlines may or may not occur in the file.

I would like to get only those parts of the file which are between the closest 'category' tags, so in this example:


Code:
<category name="Temp1">something1</category><category name="Temp2">something2</category>

I am trying to get awk to do this as follows:


Code:
awk -F "</?category.*>" '{ print $1 }' file.txt

But this command gives me only:


Code:
</TD></TR></TABLE></BODY></HTML>

Could anyone show me how to write the command properly?

Regards,
Robert
#2  ctsgnb (02-23-2011)
What do you get if you replace $1 with $2?


Code:
awk -F "</?category.*>" '{ print $2 }' file.txt
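
Switching fields alone may not be enough here. A likely cause of the empty output is the greedy `.*` in the field separator: it extends to the last `>` on the line, so the text between the tags becomes part of the separator itself. The sketch below compares the original separator with a bounded variant, `[^>]*`, which is an assumption of this note and not something posted in the thread. It prints the field count and the first two fields for each line of the sample from post #1.

```shell
# Sample input from post #1, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>
EOF

# Greedy separator: ".*" runs to the last ">" on the line,
# so the separator swallows the text between the tags.
greedy=$(awk -F '</?category.*>' '{ print NF ": [" $1 "] [" $2 "]" }' "$sample")

# Bounded separator: "[^>]*" stops at the first ">",
# so only the tag itself acts as the separator.
bounded=$(awk -F '</?category[^>]*>' '{ print NF ": [" $1 "] [" $2 "]" }' "$sample")

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
rm -f "$sample"
```

On the first sample line the greedy form should report two empty fields, while the bounded form yields something1 as $2. Note that splitting on the tags only recovers the text between them; the tags themselves still have to be printed back if they are wanted in the output.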
#3  pravin27 (02-23-2011)
Try this:

Code:
awk -F">" '/category/{printf "%s%s", $1 FS, ($2 ~ /<\/category/ ? $2 FS : $2)}' infile

#4  rkoziol7 (03-04-2011)
Thanks for your answers. The last one does work with my sample file.

My mistake: I trimmed my sample too much :/

Consider another, slightly more complex text file:


Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>
<category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

I would like to get:


Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category><category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

from it. Could you tell me how to rewrite the command?
#5  ctsgnb (03-04-2011)
Assuming your file does not contain the | character, I can use it as a temporary stand-in for the string "category". With that substitution, [^|]* plays the role of the .* in /category>.*<category, but it can only match text that does NOT contain another "category" string:


Code:
echo "$(tr -d '\n' <infile)" | sed 's/category/|/g;s:/|>[^|]*<|:/|><|:g;s/|/category/g'


Last edited by ctsgnb; 03-04-2011 at 05:42 AM..
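
For files that might contain the | character, a possible alternative is to scan the joined text with awk's index() and substr() instead of a placeholder substitution. This is a minimal sketch, not the method posted above: the function name extract_categories is hypothetical, and it assumes category elements are never nested inside each other and that no other tag name begins with "category". Unlike the sed command, it also drops any text before the first opening tag and after the last closing tag.

```shell
# Hypothetical helper: print every <category ...>...</category> span
# of the file given as $1, concatenated on a single line.
extract_categories() {
  # Join all lines first, since a span may cross line boundaries.
  tr -d '\n' < "$1" | awk '
  {
    rest = $0
    closetag = "</category>"
    out = ""
    # Find the next opening tag, then the nearest closing tag after it.
    while ((from = index(rest, "<category")) > 0) {
      rest = substr(rest, from)
      stop = index(rest, closetag)
      if (stop == 0) break              # unterminated element: give up
      out = out substr(rest, 1, stop + length(closetag) - 1)
      rest = substr(rest, stop + length(closetag))
    }
    print out
  }'
}

# Usage, assuming the sample is stored in a file named infile:
#   extract_categories infile
```

Because index() searches for fixed strings, nested tags such as <cat> or <blah> inside a category element do not confuse the scan, and no character has to be reserved as a placeholder.</blah></category>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>
<category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>
EOF
result=$(extract_categories "$tmp")
rm -f "$tmp"
expected='<category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category><category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>'
[ "$result" = "$expected" ] || { echo "mismatch: $result"; exit 1; }
</test>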