get text between two tags in bash (awk)

    #1  
rkoziol7, 02-23-2011

Hi,

I have a sample text file:


Code:
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>

Line breaks may or may not occur anywhere in the file.

I would like to extract only the parts of the file enclosed by each matching pair of 'category' tags, so for this example:


Code:
<category name="Temp1">something1</category><category name="Temp2">something2</category>

I am trying to get awk to do it like this:


Code:
awk -F "</?category.*>" '{ print $1 }' file.txt

But this command gives me only:


Code:
</TD></TR></TABLE></BODY></HTML>

Could anyone show me how to write the command properly?

Regards,
Robert
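
A note on why the command above prints only the `</TD>` line: the `.*` in the field separator is greedy, so on any line containing a category tag the separator runs from the tag's `<` to the last `>` on that line, and `$1` is left empty. A line with no category tag contains no separator at all, so `$1` is the whole line. The sketch below shows this on two of the sample lines; the helper name `first_field` is made up for illustration.

```shell
# Hypothetical helper: print awk's first field (in brackets) for one input
# line, using the same greedy field separator as the command in post #1.
first_field() {
  printf '%s\n' "$1" | awk -F '</?category.*>' '{ printf "[%s]\n", $1 }'
}

# Line with a category tag: the separator starts at column 1,
# so nothing is left in front of it and $1 is empty.
first_field '<category name="Temp2">something2'

# Line without a category tag: no separator, so $1 is the whole line.
first_field '</TD></TR></TABLE></BODY></HTML>'
```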
    #2  
ctsgnb, 02-23-2011
What do you get if you replace $1 with $2, like this?

Code:
awk -F "</?category.*>" '{ print $2 }' file.txt
    #3  
pravin27, 02-23-2011
Try this,

Code:
awk -F">" '/category/{printf $1FS;printf $2 ~ /<\/category/?$2FS:$2}'  infile
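
For readers following along: the one-liner splits each line on `>`, keeps only lines that mention `category`, and prints the first field (the opening or closing tag) followed by the second field (the text up to the next `>`). It looks at just the first two fields of each line, so it assumes the category content holds no nested tags. One caveat, offered as an editorial assumption rather than something raised in the thread: using the data itself as the `printf` format can misbehave if the text contains a `%`. A minimal sketch of the same logic with explicit `%s` formats follows; the function name `extract_simple` is hypothetical.

```shell
# Hypothetical wrapper around the same logic as the one-liner above,
# with explicit "%s" formats so a "%" in the data is printed literally.
extract_simple() {
  awk -F '>' '/category/ {
      # $1 is the tag up to its closing ">", which FS puts back.
      printf "%s%s", $1, FS
      # $2 is the text after the tag; restore ">" only when $2
      # itself ends in a closing category tag.
      if ($2 ~ /<\/category/) printf "%s%s", $2, FS
      else printf "%s", $2
  }' "$@"
}
```

Usage would be `extract_simple infile`. Like the original, it prints no trailing newline.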

    #4  
rkoziol7, 03-04-2011
Thanks for your answers. The last answer does work with my sample file.

My mistake: I trimmed my sample too much :/

Consider another text file, slightly more complex:


Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>
<category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

I would like to get:


Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category><category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

from it. Could you tell me how to rewrite the command?
    #5  
ctsgnb, 03-04-2011
Assuming your file does not contain the | character, I can use it as a placeholder for the "category" string. I use this trick to make sure that the text matched between /category> and <category does not itself contain another "category" string:


Code:
echo "$(tr -d '\n' <infile)" | sed 's/category/|/g;s:/|>[^|]*<|:/|><|:g;s/|/category/g'


Last edited by ctsgnb; 03-04-2011 at 05:42 AM..