The UNIX and Linux Forums  

  #8  
Old 07-23-2008
Moderator
 

Join Date: Sep 2007
Location: Germany
Posts: 1,031
I guess I live in another timezone, and when work is over I go home and usually don't look in here anymore. That's why I didn't answer that fast. Can you post the PM question here, please? I will describe the syntax shortly as an edit to this post.

EDIT:

Code:
sed 's/.*<TAG1>\([^>]*\)<\/TAG1>.*/\1/g'
s = substitute command
/ = delimiter; starts the pattern to search for
. = any single character
* = zero or more of the preceding character, so .* is any run of characters
<TAG1> = the literal opening tag
\( = escaped opening parenthesis; starts the group I want to extract
[ = opening square bracket of a character class
^ = right after the opening square bracket means "none of the following characters", in this case anything that is not a >
] = ends the character class
* = zero or more of the preceding item, in this case as many characters as are not a >
\) = escaped closing parenthesis; ends the group definition
\/ = escapes the slash of the closing tag </TAG1>, so it is not taken as a delimiter
.* = zero or more of any character (you know that by now already)
/ = here the pattern I want to find ends and the replacement starts
\1 = insert the 1st group, the one I defined inside the escaped parentheses \( \)
/ = end of the replacement
g = global: replace every match on the line, not only the first (since this pattern spans the whole line, there is at most one match per line anyway)
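
To see the expression at work, here is a minimal sketch run against a made-up input line. The sample text and the variable name `result` are assumptions for illustration only, not part of the original question:

```shell
# Hypothetical sample line; only the text between <TAG1> and </TAG1> should survive.
line='foo <TAG1>hello world</TAG1> bar'

# The two .* parts swallow everything before <TAG1> and after </TAG1>,
# and \1 puts back only the captured group.
result=$(printf '%s\n' "$line" | sed 's/.*<TAG1>\([^>]*\)<\/TAG1>.*/\1/g')

printf '%s\n' "$result"
```

Note that lines without the tag pass through unchanged. If only the extracted values are wanted, `sed -n` together with the `p` flag on the substitution prints just the lines where a replacement happened.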


Your best bet is to search the web for sites that explain regular expressions (regexps), or get that nice sed & awk book from O'Reilly, which is worth it: the small reference book and/or the "bigger" one.

Last edited by zaxxon; 07-23-2008 at 09:48 PM.
  #9  
Old 07-29-2008
Registered User
 

Join Date: Jun 2008
Posts: 29
Thanks a lot, Zaxxon, that post really helped me. However, if the file is huge, the command takes a long time to run.
I'll tell you exactly what I am trying to do:
I need to get the value between the tags <TAG1> and </TAG1>.
Then I need to count the occurrences of some other tags. So the situation here is:
sed 's/.*<L:TAG>\([^>]*\)<\/L:TAG>.*/\1/' $hugefile
This still reads the whole file, even though I removed the g at the end!
Then I have this line to count one of the other tags, which again reads the entire file:
otherTagCount=`egrep -hc -e $searchString $line | awk '{sum+=$1};END{print sum}'`
I have the above line four times, to count four different tags.
So the script takes twice as long to execute.
Thanks in advance!
  #10  
Old 07-29-2008
era
Herder of Useless Cats
 

Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 3,650
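sed only processes the file one line at a time; it does not read across newline boundaries. The g flag only controls how many matches are replaced within each line, so removing it does not stop sed from reading every line of the file.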
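
If only the first match is needed, one way to avoid reading the rest of a huge file is to tell sed to quit once it has printed that match. A minimal sketch, assuming the value of interest is the first `<L:TAG>` in the file; the temporary sample file is made up for illustration:

```shell
# Hypothetical sample file standing in for $hugefile.
hugefile=$(mktemp)
printf '%s\n' 'header' '<L:TAG>first</L:TAG>' '<L:TAG>second</L:TAG>' 'footer' > "$hugefile"

# -n suppresses the default output; on the first line containing <L:TAG>,
# substitute, print the result (p) and quit (q) so later lines are never read.
value=$(sed -n '/<L:TAG>/{
s/.*<L:TAG>\([^>]*\)<\/L:TAG>.*/\1/p
q
}' "$hugefile")

printf '%s\n' "$value"
rm -f "$hugefile"
```

This only helps when the tag appears early in the file and a single value is enough; if every occurrence is needed, the whole file has to be read anyway.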

If the tag structure is complex then perhaps it would be wiser to use XSLT or something to get precisely what you want.

Combining the grep | awks into a single awk script sounds like the obvious thing to optimize. Something like

Code:
sed whatever $hugefile | awk '/searchstring1/ { ++count1 }
  /searchstring2/ { ++count2 }
  /searchstring3/ { ++count3 }
END { print "count1=" count1 " count2=" count2 " count3=" count3 }'
perhaps?
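
Going one step further, the tag extraction could move into the same awk script, so that the file is read only once. This is a sketch under assumptions: the tag names `A` and `B` and the sample file are made up, and each count is the number of lines containing the tag, which is what `egrep -c` reports:

```shell
# Hypothetical sample file standing in for $hugefile.
hugefile=$(mktemp)
printf '%s\n' '<L:TAG>value</L:TAG>' '<A>1</A>' '<B>2</B>' '<A>3</A>' > "$hugefile"

summary=$(awk '
  # Remember the text between the first <L:TAG> ... </L:TAG> pair seen.
  # 7 is the length of the opening tag, 15 the length of both tags together.
  !found && match($0, /<L:TAG>[^<]*<\/L:TAG>/) {
    value = substr($0, RSTART + 7, RLENGTH - 15)
    found = 1
  }
  /<A>/ { ++countA }   # lines containing <A>
  /<B>/ { ++countB }   # lines containing <B>
  END { print "value=" value " countA=" (countA + 0) " countB=" (countB + 0) }
' "$hugefile")

printf '%s\n' "$summary"
rm -f "$hugefile"
```

Adding `+ 0` in the END block makes a tag that never occurs print as 0 instead of an empty string.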
All times are GMT -7.