#1
sed, grep, awk, regex -- extracting a matched substring from a file/string
Ok, I'm stumped and can't seem to find relevant info.
(I'm not even sure; I might have asked something similar before.)

I'm trying to use shell scripting/UNIX commands to extract URLs from a fairly large web page, with a view to ultimately wrapping this in PHP with exec() and including the URLs in a web page that I'm generating for myself.

Here's what I have so far. I'm catching the page with cURL:
Code:
# curl -s http://archive.wbai.org/
Then I grep for the lines I'm after:
Code:
# curl -s http://archive.wbai.org/ | grep -i talkback
Having gotten this far, I have a number of strings like this one (which is supposed to be one single line):
Code:
<td width="5%" valign="top" bgcolor="#767676" align="center"> \
<span class=headline3>1 \
<td width="10%" valign="top" bgcolor="#EFEFEF"><span class=archivelink> \
<a href="pls.php?mp3fil=4541"><u>Play</u></a></span> \
<span class=archivelink> \
<a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3"> \
<u>Download</u></a>
I have tried to match
Code:
href*\.MP3

NB: I am aware that if I got awk to work right, I might no longer need to grep. For now, I'm still grepping though. Also, I heard that grepping first and then awking might be a tiny bit quicker, as grep is reputed to be somewhat faster than awk. (Comments welcome.)

I am also aware that it's probably perfectly possible (and conceivably even quicker) to do all of this in PHP. I haven't tried that because (a) my PHP skills blow even harder than my scripting skills and (b) I'd really like to know how to do this kind of manipulation using "standard" UNIX shell commands. (Yes, yes, I know, a lot of people consider PHP "standard" as well, and one can even write PHP shell scripts, etc. etc... but PLEASE, have mercy on my soul.)

I'd be extremely grateful for any help on this.

Edit: PS: I should add that I want to make as few assumptions about the source page as possible, so I don't want to just extract the nth $something (say, e.g. $9) with awk, because I don't want to assume that the talkback .mp3 URL always stays in the same place.
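
A note on the pattern above, as a hedged sketch: one likely reason `href*\.MP3` finds nothing is that `*` in a regular expression repeats the preceding character rather than standing for "anything" as it does in a shell glob. The snippet below assumes the pattern was handed to grep as a basic regular expression and uses a shortened, hypothetical sample line built from the URL in the post.

```shell
#!/bin/sh
# Hypothetical one-line sample, shortened from the HTML quoted in the post.
line='<a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3">'

# As a regex, 'href*\.MP3' means: "hre", zero or more "f", a literal ".", "MP3".
# Nothing in it can span the text between "href" and ".MP3", so it does not match.
# (grep -c exits non-zero on zero matches, hence the "|| true".)
glob_style=$(printf '%s\n' "$line" | grep -c 'href*\.MP3' || true)

# '.*' (any character, repeated) is the regex counterpart of a glob '*'.
regex_style=$(printf '%s\n' "$line" | grep -c 'href.*\.MP3' || true)

echo "glob-style pattern matched $glob_style line(s)"
echo "regex pattern matched $regex_style line(s)"
```

Note that `href.*\.MP3` only selects the line; it does not yet cut the URL out of it.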
#2
How about
Code:
sed -n -e "s_.*a href=.\([^\"]*[Tt][Aa][Ll][Kk][.]*[Bb][Aa][Cc][Kk].[Mm][Pp]3\).*_\1_p"
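
To see what the sed command above does, here is a small sketch that runs it on the sample line from the first post, joined into a single line as the poster describes. The function names `extract_url` and `extract_url_grep` are made up for illustration. The second function is an alternative, not part of the suggestion above: it relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require, and it assumes the URL starts with `http`.

```shell
#!/bin/sh
# Sample input: the HTML from the first post, joined into one line.
line='<td width="10%" valign="top" bgcolor="#EFEFEF"><span class=archivelink> <a href="pls.php?mp3fil=4541"><u>Play</u></a></span> <span class=archivelink> <a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3"> <u>Download</u></a>'

# The sed command from this post, wrapped in a function (hypothetical name).
#   -n           print nothing by default
#   s_..._\1_p   "_" is the delimiter; replace the whole line with group 1
#                and print only lines where the substitution succeeded
#   \( ... \)    group 1: everything after the opening quote of href up to
#                and including "talkback.mp3", in any letter case
extract_url() {
    sed -n -e "s_.*a href=.\([^\"]*[Tt][Aa][Ll][Kk][.]*[Bb][Aa][Cc][Kk].[Mm][Pp]3\).*_\1_p"
}

# Alternative sketch (assumes a grep with -o): print only the matching part.
extract_url_grep() {
    grep -io 'http[^"]*talkback\.mp3'
}

printf '%s\n' "$line" | extract_url
printf '%s\n' "$line" | extract_url_grep
```

In the live pipeline, either function would take the place of the final stage, e.g. `curl -s http://archive.wbai.org/ | extract_url`. Because the sed pattern already filters on "talkback", the separate `grep -i talkback` step is probably no longer needed.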
#3
Thanks a bunch vino. Your kung fu is strong.
As for myself, I will actually continue to read and search for a while, until I am darn sure I fully understand everything. But now I definitely know which way to go!