Shell Programming and Scripting: post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and other shell scripts and shell scripting languages here.
#1
get text between two tags in bash (awk)
Hi,

I have a sample text file:

Code:
<category name="Temp1">something1</category><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2 </category>

The file may or may not contain newlines. I would like to keep only the parts of the file that run from each opening 'category' tag to the nearest closing one, so in this example I want:

Code:
<category name="Temp1">something1</category><category name="Temp2">something2</category>

I tried to make awk do it like this:

Code:
awk -F "</?category.*>" '{ print $1 }' file.txt

But this command gives me only:

Code:
</TD></TR></TABLE></BODY></HTML>

Could anyone show me how to write the command properly?

Regards,
Robert
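Why the attempt fails: awk treats the -F value as a regular expression, and the `.*` in `</?category.*>` matches as much as it can, so on any line containing a category tag the separator runs from the first tag to the last `>` on that line and swallows the content. For the simple sample only, where no `<` appears between an opening tag and its closing tag, a minimal sketch can avoid `.*` altogether. It assumes `grep -o` is available (GNU and BSD grep provide it, although it is not in POSIX), and `extract_simple` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch for the simple sample only: assumes no '<' occurs between
# <category ...> and </category>, i.e. no nested tags.
# extract_simple is a hypothetical helper name used for illustration.
#
# Pipeline: join all lines (a tag pair may span a newline), let grep -o
# print each <category ...>text</category> match on its own line, then
# join the matches again and finish with a single newline.
# [^>]* stops at the end of the opening tag and [^<]* stops at the next
# '<', so neither can run past the closing tag the way .* does.
extract_simple() {
  tr -d '\n' | grep -o '<category[^>]*>[^<]*</category>' | tr -d '\n'
  echo
}

# Usage: extract_simple < file.txt
```

Whitespace inside an element, such as the space in `something2 </category>`, is kept as-is.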
#2
What do you get if you replace your $1 with $2? In other words, what does this print:

Code:
awk -F "</?category.*>" '{ print $2 }' file.txt
#3
Try this:

Code:
awk -F'>' '/category/{ printf "%s%s", $1, FS; printf "%s", ($2 ~ /<\/category/ ? $2 FS : $2) } END { print "" }' infile
#4
Thanks for your answers. The last one does work with my file. My mistake: I cut my sample down too much :/ Imagine this slightly more complex text file:

Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category> <category name="Temp1">something1<blah>some<test>aa</test></blah></category> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> </TD></TR></TABLE></BODY></HTML> <category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

I would like to get this from it:

Code:
<category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category><category name="Temp1">something1<blah>some<test>aa</test></blah></category><category name="Temp2">something2<cat><test1>aa</test1>ww</cat></category>

Could you tell me how to rewrite the command?
#5
Assuming your file does not contain the | character, I can use it to replace the "category" string. I use this trick so that [^|]* plays the role of .* in /category>.*<category but can only match text that does NOT contain any other "category" string:

Code:
echo "$(tr -d '\n' <infile)" | sed 's/category/|/g;s:/|>[^|]*<|:/|><|:g;s/|/category/g'

Last edited by ctsgnb; 03-04-2011 at 05:42 AM.