The UNIX and Linux Forums > Top Forums > Shell Programming and Scripting


Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

    #1  
02-23-2011
Registered User
 
Join Date: Feb 2011
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
Get text between two tags in bash (awk)

Hi,

I have a sample text file:


Code:
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>

The file may or may not contain line breaks inside or between the elements.

I would like to keep only the 'category' elements (each opening tag, its content up to the nearest closing tag, and that closing tag) and drop everything in between, so for this example:


Code:
<category name="Temp1">something1</category><category name="Temp2">something2</category>

I tried to get awk to do it like this:


Code:
awk -F "</?category.*>" '{ print $1 }' file.txt

But this command gives me only:


Code:
</TD></TR></TABLE></BODY></HTML>
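Only that line comes out because .* in the field-separator regex is greedy: on every line that contains "category", the separator runs from <category to the LAST > on the line, so $1 is empty there and awk prints blank lines. The sketch below compares that separator with a bounded one, </?category[^>]*>, which stops at the end of each tag. It assumes a POSIX shell, awk and mktemp, and writes the first sample to a temporary file instead of file.txt; it only illustrates the regex difference and is not a full answer to the question.

```shell
#!/bin/sh
# First sample from the post, written to a temporary file.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>
EOF

# Greedy separator: on line 1, "<category.*>" runs from "<category" to the
# LAST ">" on the line, so the whole line is one separator and $1 is empty.
greedy=$(awk -F '</?category.*>' '{ print NR ": [" $1 "]" }' "$f")

# Bounded separator: "[^>]*" cannot run past the ">" that ends the tag,
# so each category tag is a separator on its own and $2 holds the text.
bounded=$(awk -F '</?category[^>]*>' '{ print NR ": [" $1 "] [" $2 "]" }' "$f")

printf '%s\n' "$greedy" "" "$bounded"
```

With the bounded separator, $2 holds something1 and something2, but the text is still split across lines, so a field-based approach alone does not produce the joined output asked for.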

Could anyone show me how to write the command properly?

Regards,
Robert
    #2  
02-23-2011
ctsgnb, Forum Advisor
Registered User
 
Join Date: Oct 2010
Location: France
Posts: 2,763
Thanks: 72
Thanked 587 Times in 561 Posts
What do you get if you replace your $1 with $2, i.e. if you run this:

Code:
awk -F "</?category.*>" '{ print $2 }' file.txt
    #3  
02-23-2011
Advisor
 
Join Date: Sep 2009
Location: ./India/Mumbai
Posts: 992
Thanks: 34
Thanked 206 Times in 199 Posts
Try this:

Code:
awk -F'>' '/category/ { printf "%s", $1 FS; printf "%s", ($2 ~ /<\/category/ ? $2 FS : $2) } END { print "" }' infile
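A small sketch that runs this one-liner on the first sample, assuming a POSIX shell, awk and mktemp. Here the field data is passed to printf through a "%s" format, so a % sign in the file cannot be taken as a format directive, and a newline is printed in END.

```shell
#!/bin/sh
# Sample from the first post, in a temporary file (the name is arbitrary).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2
</category>
EOF

# -F'>' splits at ">" and drops it, so the ">" is re-added via FS.
# Only lines containing "category" are printed at all.
# $2 keeps its ">" only when it ends a closing </category tag;
# fields after $2 (e.g. the DOCTYPE junk) are never printed.
result=$(awk -F'>' '/category/ {
    printf "%s", $1 FS
    printf "%s", ($2 ~ /<\/category/ ? $2 FS : $2)
} END { print "" }' "$f")

printf '%s\n' "$result"
```

This relies on a category element containing no other tags: with -F'>' any inner tag such as <blah> ends $2 early, which is why it does not work on the more complex sample in the next post.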

    #4  
03-04-2011
Registered User
 
Join Date: Feb 2011
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
Thanks for your answers. The last one works with my sample file.

My mistake: I cut my sample down too much :/

Here is a slightly more complex text file:


Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
</TD></TR></TABLE></BODY></HTML>
<category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>
<category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

I would like to get:


Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category><category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

from it. Could you tell me how to rewrite the command?
    #5  
03-04-2011
ctsgnb, Forum Advisor
Registered User
 
Join Date: Oct 2010
Location: France
Posts: 2,763
Thanks: 72
Thanked 587 Times in 561 Posts
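Assuming your file does not contain the | character, I can use | as a stand-in for the string "category". This trick makes sure that the pattern /category>.*<category can only match a .* that does NOT contain another "category" string, because a bracket expression like [^|] can exclude a single character but not a whole word. The tr -d '\n' joins all lines first, so tags split across lines are handled as well: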


Code:
echo "$(tr -d '\n' <infile)" | sed 's/category/|/g;s:/|>[^|]*<|:/|><|:g;s/|/category/g'


Last edited by ctsgnb; 03-04-2011 at 05:42 AM.

All times are GMT -4.