sed, grep, awk, regex -- extracting a matched substring from a file/string


 
# 1  
Old 05-23-2006

Ok, I'm stumped and can't seem to find relevant info.
(I may even have asked something similar before; I'm not sure.)

I'm trying to use shell scripting/UNIX commands to extract URLs from a fairly large web page. Ultimately I want to wrap this in PHP with exec() and include the URLs in a web page that I generate for myself.

Here's what I have so far:

I'm fetching the page with cURL:
Code:
curl -s http://archive.wbai.org/

I'm then grepping all the lines that include the (case-insensitive) string "talkback":
Code:
curl -s http://archive.wbai.org/ | grep -i talkback

NB: I wanted to grep for either "talk back" or "talkback", and the docs I found said that I could use "talk*back", wherein the asterisk would signify zero or more characters, but I can't seem to get this to work.
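A note on why "talk*back" behaves this way: in grep's regular expressions (unlike shell filename globs) the asterisk means "zero or more of the preceding item", not "zero or more of any character". So "talk*back" matches "talback", "talkback" or "talkkback", but never "talk back". A minimal sketch, using made-up sample lines rather than the real page, that compares it with a pattern allowing optional spaces:

```shell
# Hypothetical sample input; the real page content will differ.
sample='Talkback with host
talk back special
talkkback typo
unrelated line'

# 'talk *back': "talk", then zero or more spaces, then "back"; -i ignores case.
matches=$(printf '%s\n' "$sample" | grep -i 'talk *back')

# 'talk*back': "tal", then zero or more "k", then "back".
star_matches=$(printf '%s\n' "$sample" | grep -i 'talk*back')

printf 'talk *back matched:\n%s\n' "$matches"
printf 'talk*back matched:\n%s\n' "$star_matches"
```

Only the first pattern picks up the "talk back" line; the second picks up the "talkkback" line instead.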

Having gotten this far, I have a number of strings like this one (which is supposed to be one single line):
Code:
<td width="5%" valign="top" bgcolor="#767676" align="center"> \
<span class=headline3>1 \
<td width="10%" valign="top" bgcolor="#EFEFEF"><span class=archivelink> \
<a href="pls.php?mp3fil=4541"><u>Play</u></a></span>&nbsp; \
<span class=archivelink> \
<a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3"> \
<u>Download</u></a>

I now want to extract just the second URL, the one with the .mp3 file.
I have tried to match
Code:
href*\.MP3

and then to somehow only get the URLs printed, but it just doesn't seem to work.
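For what it's worth, "href*\.MP3" reads as "hre", then zero or more "f", then a literal ".MP3". That requires ".MP3" to come right after "href", which never happens in these lines. Writing "href.*\.MP3" lets any characters sit in between. To print only the matched text rather than the whole line, grep has an -o option; it is an extension found in GNU and BSD grep rather than part of POSIX, so treat its availability as an assumption. A small sketch on a shortened copy of the sample line, also assuming the wanted URLs start with "http://":

```shell
# Sample line modeled on the one quoted above, joined into a single line.
line='<a href="pls.php?mp3fil=4541"><u>Play</u></a> <a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3"><u>Download</u></a>'

# Count lines matching the corrected pattern ('.*' = any run of characters).
count=$(printf '%s\n' "$line" | grep -c 'href.*\.MP3')

# -o prints only the matched text; [^"]* keeps the match inside one quoted value.
url=$(printf '%s\n' "$line" | grep -o -i 'http://[^"]*talk *back\.mp3')

printf '%s\n' "$count"
printf '%s\n' "$url"
```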

NB: I am aware that if I got awk to work right, I might no longer need to grep. For now, I'm still grepping though. Also, I've heard that grepping first and then awking might be slightly quicker, as grep is reputed to be somewhat faster than awk. (Comments welcome.)
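As a rough illustration of the awk-only route, here is a sketch that uses awk's match() function, which sets RSTART and RLENGTH to the position and length of the match, so the URL can be cut out with substr() without relying on field positions. The sample line is a shortened copy of the one above, and the "http://" prefix in the pattern is an assumption about how the wanted URLs start. Whether this is faster or slower than grep followed by awk would have to be measured.

```shell
# Sample line modeled on the one quoted above, joined into a single line.
line='<a href="pls.php?mp3fil=4541"><u>Play</u></a> <a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3"><u>Download</u></a>'

url=$(printf '%s\n' "$line" | awk '
{
    # Match against a lowercased copy so the test is case-insensitive,
    # but cut the result from the original line to keep its capitalization.
    lower = tolower($0)
    if (match(lower, /http:\/\/[^"]*talk *back\.mp3/))
        print substr($0, RSTART, RLENGTH)
}')

printf '%s\n' "$url"
```

Lines without a matching URL print nothing, so no separate grep step is needed in front of it.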

I am also aware that it's probably perfectly possible (and conceivably even quicker) to do all of this in PHP. I haven't tried that because (a) my PHP skills blow even harder than my scripting skills and (b) I'd really like to know how to do this kind of manipulation using "standard" UNIX shell commands. (Yes, yes, I know, a lot of people consider PHP "standard" as well, and one can even write php shell scripts, etc. etc... but PLEASE, have mercy on my soul.)

I'd be extremely grateful for any help on this.

Edit:
PS: I should add that I want to make as few assumptions about the source page as possible, so I don't want to just extract the nth $something (say, e.g. $9) with awk, because I don't want to assume that the talkback .mp3 URL always stays in the same place.
# 2  
Old 05-23-2006
How about

Code:
sed -n -e "s_.*a href=.\([^\"]*[Tt][Aa][Ll][Kk] *[Bb][Aa][Cc][Kk]\.[Mm][Pp]3\).*_\1_p"

It says: capture everything inside the quotes up to and including .mp3, .MP3, or any other capitalization of it. It should handle both "talk back" and "talkback", and it is case-insensitive.
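To see what such a substitution does without fetching the page, here is a sketch run against three sample lines; the second and third URLs are made up for illustration. The command is a variant of the one above that assumes a literal dot before the extension and an optional space between "talk" and "back".

```shell
# Hypothetical sample lines; only the first and third hold a talkback MP3 link.
sample='<a href="pls.php?mp3fil=4541">Play</a> <a href="http://archive.wbai.org/files/mp3/060222_150002talkback.MP3">Download</a>
<a href="http://archive.wbai.org/files/mp3/060222_160000news.MP3">Download</a>
<a href="http://archive.wbai.org/files/mp3/060223_150002TalkBack.mp3">Download</a>'

# -n suppresses normal output; the trailing p prints only lines where the
# substitution succeeded, and \1 is the text captured between \( and \).
urls=$(printf '%s\n' "$sample" |
    sed -n -e 's_.*a href="\([^"]*[Tt][Aa][Ll][Kk] *[Bb][Aa][Cc][Kk]\.[Mm][Pp]3\)".*_\1_p')

printf '%s\n' "$urls"
```

The underscore works as the delimiter of the s command here because the pattern itself contains none; underscores in the input do not matter.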
# 3  
Old 05-23-2006
Thanks a bunch, vino. Your kung fu is strong.
As for myself, I will actually continue to read and search for a while, until I am darn sure I fully understand everything. But now I definitely know which way to go! Thanks again!