

how to get tags content by grep


 
# 1  

1) Is it possible to get a tag's content with grep -E? Take the title as an example: given the source text "<title>My page</title>", I want to print "My page".

2) Which utility can I use from bash when I want a regex in this format?
(?<=title>).*(?=</title)
# 2  
Quote:
Originally Posted by visitor123
2) Which utility can I use from bash when I want a regex in this format?
(?<=title>).*(?=</title)
Perl.
Code:
perl -nle 'print $& if /(?<=title>).*(?=<\/title)/' file
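
On question 1: grep -E (POSIX extended regexes) has no lookbehind or lookahead, so the pattern above will not work there. If your grep is GNU grep built with PCRE support, the -P option accepts the same syntax, and -o prints only the matched part. A minimal sketch, assuming GNU grep; the file name sample.html is made up for the example:

```shell
# Work in a scratch directory; sample.html is a hypothetical input file.
tmp=$(mktemp -d)
printf '%s\n' '<html><head><title>My page</title></head></html>' > "$tmp/sample.html"

# -o : print only the part of the line that matched
# -P : interpret the pattern as a Perl-compatible regex (GNU grep only)
result=$(grep -oP '(?<=<title>).*?(?=</title>)' "$tmp/sample.html")
echo "$result"
```

Like the perl one-liner, this only finds a title that starts and ends on the same line.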

# 3  
grep matches one line at a time, so tag content that spans multiple lines won't match. Other line-based tools like sed have the same problem unless you explicitly join the lines first.

For a problem like this I'd use awk. It has powerful regexes like sed's and grep's, but it is an actual programming language: you decide exactly what gets printed and when, you can remember things in variables, and so on.

Code:
$ echo -e "<title>stuff\na\nb\nc</title>" |
awk -v RS="<" '
        /^title>/ { sub(/^title>/, "", $0); P=1 }
        /^\/title>/ { P=0 }
        P'
stuff
a
b
c

$
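
To see why this works, it may help to look at the records awk reads once "<" is the record separator: the text after each "<" becomes one record, so everything between "<title>" and the next "<" arrives in a single record that starts with "title>", newlines included. A small sketch (the "|" marker is only there to make the embedded newlines visible):

```shell
# Print each record awk sees when RS is "<"; embedded newlines are shown as "|".
records=$(printf '<title>stuff\na\nb</title>\n' |
  awk -v RS='<' '{ gsub(/\n/, "|"); printf "record %d: [%s]\n", NR, $0 }')
echo "$records"
```

The record holding the title text begins with "title>", which is what the /^title>/ pattern in the script above matches and then strips with sub().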

# 4  
Nice. Do you think I could use it with gnuwin32? I just downloaded GnuWin perl and there are pcregrep.exe and pcretest.exe. I would like to run it on Win XP.
# 5  
You should run these things in a bash/ksh/zsh shell or what have you. Windows CMD has awful quoting problems -- quoting is more or less left as a problem for the utility itself, not something CMD does -- which means every utility seems to handle quoting slightly differently. Sometimes there's just no way to control when an argument gets split or passed raw.

Which makes it extremely difficult to pass a regular expression into any program inside single quotes.

If you can install awk and bash in gnuwin32, I don't see why it wouldn't work.
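
One way to sidestep most of the quoting trouble is to keep the awk program in a file and run it with -f, so the only thing left on the command line is the short RS assignment. A sketch under that assumption; title.awk and sample.html are hypothetical names, and on Windows you would create title.awk in an editor rather than with a here-document:

```shell
# Work in a scratch directory so the example leaves nothing behind in the current one.
tmp=$(mktemp -d)

# title.awk holds the same program as the awk example in post #3.
cat > "$tmp/title.awk" <<'EOF'
/^title>/   { sub(/^title>/, "", $0); P = 1 }   # start printing, minus the tag name
/^\/title>/ { P = 0 }                           # stop at the closing tag
P                                               # print records while P is set
EOF

printf '<title>stuff\na\nb</title>\n' > "$tmp/sample.html"

# No regex appears on the command line any more, only RS and the file names.
result=$(awk -v RS='<' -f "$tmp/title.awk" "$tmp/sample.html")
echo "$result"
```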
# 6  
Quote:
Originally Posted by Corona688
grep matches one line at a time, so tag content that spans multiple lines won't match. Other line-based tools like sed have the same problem unless you explicitly join the lines first.
For situations like this:
Code:
perl -ln0e '$,="\n";print /(?<=<title>).*?(?=<\/title)/sg' file

# 7  
Appears to work.

What do the commandline options actually mean? 'man perl' helpfully tells me they're not documented in 'man perl' but doesn't say where they are documented...
