Extract values from multi lined url source


 
# 1  
Old 11-26-2012

Hello,

I want to extract several values from the HTML source of multiple URLs into a CSV text file. Thank you very much for your help.

My curl command: curl "http://www.web.com/cities/[1-1000]/city.html"

Source code:
Code:
div class="clear"></div>

                    <table class="listing-details">
                        <tr>
                            <td class="icon"><img src="/theme/images/building.png" alt="address" /></td>
                                                <span class='streetaddress'>5149 West 84th Street</span>, <br />                                    </div>
                                    <span id='cityname' class='locality'>Yuma</span>, <span id='statename' class='region'>Arizona</span> <span class='postal-code'>85220</span></td>
                        </tr>
                            </td>
                            </tr>
                            
                        

                                                                                    <tr>
                                    <td class="icon"><img src="/theme/images/arrow.png" alt="website" /></td>
                                    <td>Visit <a href="http://www.web.com/" class="url" rel="nofollow">Web Optiq</a></td>
                                </tr>


Desired output:
Code:
 
"Web Optiq","http://www.web.com/","5149 West 84th Street Yuma, Arizona 85220"

# 2  
Old 11-26-2012
Try

Code:
awk -F "[<>]" '/streetaddress/{s=$3}
/cityname/{s=s" "$3""$5""$7" "$11}
/<td>Visit/{split($4,P,"\"");print "\""$5"\",\""P[2]"\",\""s"\""}' file

"Web Optiq","http://www.web.com/","5149 West 84th Street Yuma, Arizona 85220"
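
The one-liner above works by splitting each line on `<` and `>`, so the text between tags lands in the odd-numbered fields. As a minimal sketch of the same idea wrapped for reuse, assuming every page has the layout shown in post #1: the function name extract_listing is hypothetical and not from the thread, and the reset of `s` is an addition so an address from one page cannot leak into the next.

```shell
#!/bin/sh
# Sketch only: same field logic as the one-liner above.
# extract_listing is a hypothetical helper name, not part of the thread.
# It reads the named files, or stdin when no file is given.
extract_listing() {
    awk -F '[<>]' '
        # Street address: text between the opening span tag and its close
        /streetaddress/ { s = $3 }
        # City ($3), separator ($5), state ($7) and postal code ($11)
        /cityname/      { s = s " " $3 $5 $7 " " $11 }
        # Link line: $4 holds the <a> attributes, $5 the link text
        /<td>Visit/ {
            split($4, P, "\"")          # P[2] is the href value
            printf "\"%s\",\"%s\",\"%s\"\n", $5, P[2], s
            s = ""                      # do not carry the address to the next page
        }
    ' "$@"
}
```

To cover the whole URL range, one option is to let curl save each page first, for example `curl -s "http://www.web.com/cities/[1-1000]/city.html" -o "city_#1.html"` (curl replaces `#1` with the current value of the range), and then run `extract_listing city_*.html > cities.csv`. The file names city_#1.html and cities.csv are assumptions chosen for illustration. Fields that themselves contain a double quote would need extra escaping to stay valid CSV.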
