How to remove all text except pattern


 
# 8  
01-04-2010
Quote:
Originally Posted by rdcwayx
Code:
echo $text| grep -o "title=\".*\""

This will fail (by matching more text than intended) if another double quote appears later on the line, because .* is greedy: it runs to the last quote on the line instead of stopping at the quote that closes the title. Using [^"]* instead of .* is best (again, assuming another quote may appear later on the line).

Regards,
alister
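A minimal illustration of the difference, using a hypothetical sample line (not from the thread) that has a second quoted attribute after the title. Like the commands above, it relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical sample line: a second quoted attribute follows the title.
line='<img title="Java spec" alt="cover">'

# Greedy: .* extends to the LAST double quote on the line.
printf '%s\n' "$line" | grep -o 'title=".*"'
#   title="Java spec" alt="cover"

# Bounded: [^"]* cannot cross a double quote, so the match
# ends at the quote that closes the title.
printf '%s\n' "$line" | grep -o 'title="[^"]*"'
#   title="Java spec"
```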
# 9  
01-04-2010
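Indeed, the match has to stop at the first closing quote. Basic and extended regular expressions have no non-greedy operator, but a bracket expression that excludes the quote does the same job: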
Code:
echo "$text" | grep -o 'title="[^"]*"'



---------- Post updated at 02:50 ---------- Previous update was at 02:41 ----------

Quote:
Originally Posted by alister
Try:

Code:
echo "$text" | sed 's/^.*\(title="[^"]*"\).*$/\1/' > titles.html

This will only match one occurrence per line (the last one, because the leading ^.* is greedy), and lines without a title are printed unchanged.
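A variant sketch, assuming the goal is to drop lines that contain no title at all: with `-n`, sed prints nothing by default, and the `p` flag prints a line only when the substitution succeeded. It still keeps just the last title on each line. The function name `last_title_per_line` is hypothetical.

```shell
# Hypothetical helper name.
# -n : suppress sed's automatic printing of every input line
# p  : print the line only if the s/// command made a replacement
last_title_per_line() {
  sed -n 's/^.*\(title="[^"]*"\).*$/\1/p'
}

# Usage (file name from the thread):
#   last_title_per_line < nasty.html > titles.txt
```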

---------- Post updated at 03:09 ---------- Previous update was at 02:50 ----------


Alternatives to grep -o
Code:
echo "$text" | awk '/title=$/{getline;print "title=\""$0"\""}' RS=\"

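The awk one-liner works because `RS=\"` makes every double quote a record separator, so a record ending in `title=` is immediately followed by a record holding the quoted value. A commented sketch of the same idea, assuming any POSIX awk; the function name `extract_titles_awk` is hypothetical, and the added check on `getline`'s return value is a small safety tweak, not part of the original.

```shell
# Hypothetical helper name. RS='"' splits the input at every double quote.
extract_titles_awk() {
  awk -v RS='"' '
    /title=$/ {                     # record ends right before an opening quote
      if ((getline value) > 0)      # next record is the text inside the quotes
        print "title=\"" value "\""
    }'
}

# Usage:
#   printf '%s\n' "$text" | extract_titles_awk
```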
Code:
echo "$text" | sed 's/title="[^"]*"/\n&\n/g' | sed '/^title="/!d'

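The `\n` in the first sed replacement works in GNU sed, but POSIX leaves it unspecified, and other implementations such as BSD/macOS sed may insert a literal `n` instead. A portable sketch of the same two-step idea, assuming a POSIX sed: a backslash followed by a real newline puts a newline into the replacement. The function name `extract_titles_sed` is hypothetical.

```shell
# Hypothetical helper name.
# Step 1: put every title="..." on a line of its own
#         (backslash + real newline = newline in the replacement;
#          the continuation lines must not be indented).
# Step 2: delete every line that does not start with title="
extract_titles_sed() {
  sed 's/title="[^"]*"/\
&\
/g' | sed '/^title="/!d'
}

# Usage:
#   printf '%s\n' "$text" | extract_titles_sed
```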

Last edited by Scrutinizer; 01-04-2010 at 10:33 PM.
# 10  
01-05-2010
Try...
Code:
perl -nle 'print for m/title=\".*?\"/g' nasty.html > titles.txt

# 11  
01-05-2010
None of those work (the output was empty). I'll give you one line from my HTML:
HTML Code:
;_OC_timingAction('search');</script><div class="scontentarea" id="scontentarea"><table style="width:100%"><tr><td id="sidebar"style="padding:24px 8px;width:190px;display:none;vertical-align:top"></td><td id="main_content"><div style="margin-bottom:6px; margin-top: 4px"><a class="link_aux" title="" href=""></a></div><div class="result_spacer"><br/></div><div class="rsiwrapper" ><table class="rsi" cellspacing=0 cellpadding=0 border=0 ><tr><td class="coverdstd" align="center"><a href="http://books.google.com/books?id=Ww1B9O_yVGsC&printsec=frontcover&dq=java&hl=sk&ie=ISO-8859-2&cd=1" ><img alt="The Java language specification" class="coverthumb" title="The Java language specification" dir=ltr src="http://bks2.books.google.com/books?id=Ww1B9O_yVGsC&printsec=frontcover&img=1&zoom=5&edge=curl&sig=ACfU3U3EBlPtT6KTuEx6mtanykCsu93qtA" border=0 height=80><script type="text/javascript">if (window['_OC_registerHover']){_OC_registerHover({"title":"The \u003cb\u003eJava\u003c/b\u003e language specification","authors":"James Gosling, Bill Joy","bib_key":"ISBN:0201310082","pub_date":"2000","snippet":"Developers will turn to this book again and again.","subject":"Computers","info_url":"http://books.google.com/books?id=Ww1B9O_yVGsC\u0026dq=java\u0026hl=sk\u0026ie=ISO-8859-2","preview_url":"http://books.google.com/books?id=Ww1B9O_yVGsC\u0026printsec=frontcover\u0026dq=java\u0026hl=sk\u0026ie=ISO-8859-2\u0026cd=1","thumbnail_url":"http://bks2.books.google.com/books?id=Ww1B9O_yVGsC\u0026printsec=frontcover\u0026img=1\u0026zoom=5\u0026edge=curl\u0026sig=ACfU3U3EBlPtT6KTuEx6mtanykCsu93qtA","num_pages":505,"viewability":2,"preview":"partial","embeddable":true})}</script></a><div class="starrating"></div></td><td valign=top><div class=resbdy><h2 class="resbdy"><a href="http://books.google.com/books?id=Ww1B9O_yVGsC&printsec=frontcover&dq=java&hl=sk&ie=ISO-8859-2&cd=1"><span dir=ltr>The <b>Java</b> language specification</span></a></h2><font size=-1><span style="line-height: 1.2em;"><span 
class=ln2><a href="http://books.google.com/books?q=+inauthor:%22James+Gosling%22&hl=sk&ie=ISO-8859-2" class="link_aux">James Gosling</a>, <a href="http://books.google.com/books?q=+inauthor:%22Bill+Joy%22&hl=sk&ie=ISO-8859-2" class="link_aux">Bill Joy</a> - 2000 - Počet stránok 505</span><br/><div class="snippet sa" dir=ltr>Developers will turn to this book again and again.</div><div><span style="color:#99522e">Obmedzený náhľad</span> - <a class="link_aux axs_about" href="http://books.google.com/books?id=Ww1B9O_yVGsC&dq=java&hl=sk&ie=ISO-8859-2"">O tejto knihe</a> - <span class="res_ann">
From this I need title="Some text" (but only when the text is not empty).

Scrutinizer, thanks, it works! Can you explain your solution? I don't know exactly what [^"]* means. Thanks.

Last edited by Lukasito; 01-05-2010 at 06:24 AM. Reason: omg it works
# 12  
01-05-2010
[^"]* means zero or more occurrences of any character that is not a double quote.
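To see what the `*` allows, here is a small sketch using a hypothetical sample line with one empty and one non-empty title. `[^"]\{1,\}` is the POSIX basic-regex spelling of "one or more", which skips empty titles; with `grep -E` the same thing is written `[^"]+`. The function names are hypothetical.

```shell
# [^"]*      : zero or more non-quote characters, so title="" matches too.
all_titles()      { grep -o 'title="[^"]*"'; }

# [^"]\{1,\} : one or more non-quote characters, so empty titles are skipped.
nonempty_titles() { grep -o 'title="[^"]\{1,\}"'; }

sample='<a title="" href=""><img title="Java spec" src="x">'
printf '%s\n' "$sample" | all_titles       # title="" then title="Java spec"
printf '%s\n' "$sample" | nonempty_titles  # title="Java spec"
```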
# 13  
01-05-2010

Quote:
Originally Posted by Scrutinizer
[^"]* means zero or more occurrences of any character that is not a double quote.
Ah, I see. Thanks!
# 14  
01-05-2010
The previous Perl code produces this output:
Code:
title=""
title="The Java language specification"

The new requirement is to exclude empty titles, so try:
Code:
perl -nle 'print for m/title=\"[^\"]+\"/g' nasty.html > titles.txt



10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk to remove pattern and lines above pattern

In the awk below I am trying to remove all lines above and including the pattern Test or Test2. Each block is separated by a newline and Test2 also appears in the lines to keep but it will always have additional text after it. The Test to remove will not. The awk executed until the || was added... (2 Replies)
Discussion started by: cmccabe

2. Shell Programming and Scripting

How to remove the text between all curly brackets from text file?

Hello experts, I have a text file with lots of curly brackets (both opening { & closing } ). I need to delete them along with the text between opening & closing brackets' pair. For ex: Input:- 59. Rh1 Qe4 {(Qf5-e4 Qd8-g8+ Kg6-f5 Qg8-h7+ Kf5-e5 Qh7-e7+ Ke5-f5 Qe7-d7+ Qe4-e6 Qd7-h7+ Qe6-g6... (6 Replies)
Discussion started by: prvnrk

3. UNIX for Advanced & Expert Users

How to remove a char before a pattern?

Hi I have a file where i want to remove a char before a specific pattern. exp: CREATE TABLE ( A, B, C, ----comma needs to be removed )AS SELECT A, B, C, ----comma needs to be removed FROM TABLE. So i want to delete the comma(,) after the C both ways.Pattern can be... (11 Replies)
Discussion started by: raju2016

4. Shell Programming and Scripting

Remove comments like pattern from text

Hi , We need to remove comment like pattern from a code text. The possible comment expressions are as follows. Input BizComment : Special/*@ Name:bzt_53_3aea640a_51783afa_5d64_0 BizHidden:true @*/ /* lookup Disease Category Therapuetic Class */ a=b;... (6 Replies)
Discussion started by: VikashKumar

5. Shell Programming and Scripting

Remove duplicate occurrences of text pattern

Hi folks! I have a file which contains a 1000 lines. On each line i have multiple occurrences ( 26 to be exact ) of pattern folder#/folder#. # is depicting the line number in the file some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text... (7 Replies)
Discussion started by: martinsmith

6. Shell Programming and Scripting

Search a pattern in a line and remove another pattern

Hi, I want to search a pattern in a text file and remove another pattern in that file. my text file look like this 0.000000 1.970000 F 303 - 1.970000 2.080000 VH VH + 2.080000 2.250000 VH VH + 2.250000 2.330000 VH L - 2.330000 2.360000 F H + 2.360000 2.410000 L VL - 2.410000 ... (6 Replies)
Discussion started by: sreejithalokkan

7. Shell Programming and Scripting

Remove last pattern

I have a file with entries below. domain1.com.http: domain2.com.49503: I need this to be sorted like below, i.e., remove the pattern after the last right-hand side . (dot). domain1.com domain2.com (7 Replies)
Discussion started by: anil510

8. Shell Programming and Scripting

Help with remove last text of a file that have specific pattern

Input file matrix-remodelling_associated_8_ aurora_interacting_1_ L20 von_factor_A_domain_1 ATP_containing_3B_ . . Output file matrix-remodelling_associated_8 aurora_interacting_1 L20 von_factor_A_domain_1 ATP_containing_3B . . (3 Replies)
Discussion started by: perl_beginner

9. Shell Programming and Scripting

sed: Find start of pattern and extract text to end of line, including the pattern

This is my first post, please be nice. I have tried to google and read different tutorials. The task at hand is: Input file input.txt (example) abc123defhij-E-1234jslo 456ujs-W-abXjklp From this file the task is to grep the -E- and -W- strings that are unique and write a new file... (5 Replies)
Discussion started by: TestTomas

10. Shell Programming and Scripting

process text between pattern and print other text

Hi All, The file has the following. =========start of file=== This is a file containing employee info START name john id 123 date 12/1/09 END START name sam id 4234 date 12/1/08 resigned END (9 Replies)
Discussion started by: vlinet