| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
#8
I guess I live in another time zone; when work is over, I go home and usually don't look in here anymore. That's why I didn't answer sooner.
EDIT: Code:
sed 's/.*<TAG1>\([^>]*\)<\/TAG1>.*/\1/g'

/ = start of the pattern
. = any character
* = zero or as many of the former character
\( = escaped opening parenthesis of the group I want to extract
[ = opening square bracket of a group of characters
^ = inside the square brackets means "not/none" of the following, in this case as long as no > shows up
] = ending the group
* = zero or as many of the former character, in this case as many as are not a >
\) = escaped parenthesis to end the group definition
\/ = just escaping the slash of that end tag
.* = zero or as many of any character (you know that by now already)
/ = here ends the pattern I want to find, and the text I want to substitute for it starts
\1 = insert the 1st group I defined inside the \( \) escaped parentheses
/ = end of the substitution input
g = globally: replace every match on the line of input, not just the first one

Best search the web for sites that explain regular expressions (regexp), or get that nice sed & awk book from O'Reilly, which is worth it: the small reference book and/or the "bigger" one.

Last edited by zaxxon; 07-23-2008 at 09:48 PM.
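To see the breakdown above in action, here is a minimal sketch that runs the same expression on one made-up line. The sample text and the function name `extract_tag1` are assumptions for illustration only; they are not from the thread.

```shell
# extract_tag1 is a hypothetical helper name; it wraps the expression explained above.
extract_tag1() {
    sed 's/.*<TAG1>\([^>]*\)<\/TAG1>.*/\1/'
}

# The sample line is invented for this example.
result=$(printf '%s\n' 'foo <TAG1>hello world</TAG1> bar' | extract_tag1)
echo "$result"
```

Note that lines which do not contain the tag pass through unchanged. If only the extracted values are wanted, `sed -n` together with the `p` flag on the substitution prints just the lines where the substitution happened.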
#9
Thanks a lot Zaxxon ... that post really helped me a lot ... however, if the file is huge, the command takes a long time to execute.
I'll tell you exactly what I am trying to do: I need to get the value between the tags <TAG1></TAG1>. Then I need to count the occurrences of some other tags. So the situation here is:
sed 's/.*<L:TAG>\([^>]*\)<\/L:TAG>.*/\1/' $hugefile
This reads the whole file in spite of removing the g at the end!!! Then I have this line to count the other tags, which again reads the entire file:
otherTagCount=`egrep -hc -e $searchString $line | awk '{sum+=$1};END{print sum}'`
I have the above line 4 times to count different tags, so the script takes twice as long to execute. Thanks in advance!!!
#10
sed is only processing the file one line at a time. It does not read across newline boundaries.
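Because sed works line by line, the g flag only controls how many matches are replaced within each line; it has no effect on how much of the file is read. If only the first matching line is needed (an assumption, the thread does not say so), sed can be told to quit after it. A minimal sketch, with `first_tag_value` as a hypothetical function name:

```shell
# first_tag_value FILE: print the value of the first <L:TAG> element, then stop reading.
# Assumes only the first occurrence is wanted; the function name is hypothetical.
first_tag_value() {
    # -n suppresses automatic printing. On the first line containing the tag,
    # substitute and print (p flag), then quit (q) so the rest of the file is skipped.
    sed -n '/<L:TAG>/{
        s/.*<L:TAG>\([^>]*\)<\/L:TAG>.*/\1/p
        q
    }' "$1"
}
```

Called as `first_tag_value "$hugefile"`, this stops at the first line that contains the opening tag instead of scanning to the end of the file.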
If the tag structure is complex then perhaps it would be wiser to use XSLT or something to get precisely what you want. Combining the grep | awks into a single awk script sounds like the obvious thing to optimize. Something like Code:
sed whatever "$hugefile" | awk '/searchstring1/ { ++count1 }
/searchstring2/ { ++count2 }
/searchstring3/ { ++count3 }
END { print "count1=" count1+0 " count2=" count2+0 " count3=" count3+0 }'
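A runnable variant of the idea above, as a sketch only: one awk process reads the file once and keeps all counters, instead of one grep | awk pipeline per tag. The function name `count_tags` and the variable names `p1`..`p3` are hypothetical. Like `grep -c`, it counts matching lines, not individual occurrences within a line.

```shell
# count_tags FILE PATTERN1 PATTERN2 PATTERN3
# Hypothetical helper: counts, in a single pass, the lines matching each pattern.
count_tags() {
    awk -v p1="$2" -v p2="$3" -v p3="$4" '
        $0 ~ p1 { ++count1 }   # line matches the first pattern
        $0 ~ p2 { ++count2 }   # line matches the second pattern
        $0 ~ p3 { ++count3 }   # line matches the third pattern
        # %d prints 0 for counters that were never incremented
        END { printf "count1=%d count2=%d count3=%d\n", count1, count2, count3 }
    ' "$1"
}
```

For example, `count_tags "$hugefile" '<A>' '<B>' '<C>'` would report three line counts after reading the file a single time; the tag names here are placeholders.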