[lynx dump] Order (by name/URL)


 
# 8  
04-08-2009
Given a sample input file google.com:
Code:
[1] txt1 [2]blabla [3]Other txt
[4] some text
1. http://url_of_txt1
2. http://url_of_blabla
3. http://url_of_Other_txt
4. http://url_of_some_text

Code:
nawk -f aspire.awk google.com > foo.txt

produces foo.txt:
Code:
 txt1
http://url_of_txt1
blabla
http://url_of_blabla
Other txt
http://url_of_Other_txt
 some text
http://url_of_some_text

# 9  
04-08-2009
With my example your script works fine, thanks.

But with other web pages there are some problems.

For example, try this:

$ lynx -dump http://www.google.com > foo1.txt ; nawk -f aspire.awk foo1.txt > foo.txt


This command produces an empty foo.txt file.

I'm looking for a universal name/URL extractor that works on any dumped web page.

Last edited by aspire; 04-08-2009 at 06:18 PM..
# 10  
04-08-2009
OK, how about this? It's a bit closer, though not perfect:
Code:
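# Lines containing "[N]" markers: remember the text that follows each marker.
/[[]/ {
  while (match($0, "[[][0-9]*[]]")) {
     idx = substr($0, RSTART+1, RLENGTH-2) "."   # "[3]" becomes "3.", as in the URL list
     rem = substr($0, RSTART+RLENGTH)            # everything after the marker
     match(rem, "[^[]*([[]|$)")                  # up to the next "[" or the end of the line
     name = substr(rem, RSTART, RLENGTH-1)       # drop the trailing "["
     if (length(rem) == (length(name)+1))        # matched to end of line: nothing to drop
         name = substr(rem, RSTART)

     arr[idx] = name
     $0 = substr(rem, RSTART+RLENGTH-1)          # carry on from the next "["
  }
  next
}
# URL lines such as "1. http://...": print the saved name, then the URL.
$1 in arr { print arr[$1] ORS $2 }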
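
For what it's worth, here is a slightly more defensive variant, as a sketch only and not a universal extractor. It assumes the dump has the usual lynx layout: body text with [N] markers, then a line starting with "References", then the numbered URL list. The function name extract_links and the sample text are made up for illustration. Link text that wraps onto the next line, or a body line that happens to start with "References", will still confuse it.

```shell
#!/bin/sh
# Hypothetical sample shaped like `lynx -dump` output: body text with [N]
# markers, then a "References" heading followed by the numbered URL list.
sample='   [1]txt1 [2]blabla [3]Other txt
   [4]some text

References

   1. http://url_of_txt1
   2. http://url_of_blabla
   3. http://url_of_Other_txt
   4. http://url_of_some_text'

# Reads a dump on stdin and prints each link name followed by its URL.
extract_links() {
  awk '
    # Strip leading and trailing blanks from a link name.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    # Assumption: the URL list starts at the first line beginning "References".
    /^References/ { in_refs = 1; next }

    # Body text: remember the text that follows each [N] marker.
    !in_refs {
      line = $0
      while (match(line, /\[[0-9]+\]/)) {
        idx  = substr(line, RSTART + 1, RLENGTH - 2)
        line = substr(line, RSTART + RLENGTH)
        # The name runs up to the next marker, or to the end of the line.
        if (match(line, /\[[0-9]+\]/))
          stop = RSTART - 1
        else
          stop = length(line)
        name[idx] = trim(substr(line, 1, stop))
      }
      next
    }

    # URL list: "N. URL" lines; print the saved name, then the URL.
    $1 ~ /^[0-9]+\.$/ {
      idx = substr($1, 1, length($1) - 1)
      if (idx in name) print name[idx] ORS $2
    }
  '
}

result=$(printf '%s\n' "$sample" | extract_links)
printf '%s\n' "$result"
```

On a real page the same function would be fed from the dump directly, for example `lynx -dump http://www.google.com | extract_links > foo.txt`. Only looking for URLs after the "References" line keeps numbered lists in the page body from being mistaken for the URL list.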

# 11  
04-09-2009
Thanks very much, vgersh99.

Your script now works for google.com, but I still have problems with other pages.

I tried aspire.awk on lots of random web pages, and the result isn't always good.

The problem is that I have a lot of pages with variable content (and aspire.awk will be called from another script), so I'm looking for a universal program.

(I understand that this request is very hard... :P )
# 12  
04-09-2009
I'm not sure if it's at all possible to write a generic 'extractor', but give me a couple of URLs and I'll see what I can do.
No promises of course....
# 13  
04-10-2009
Many thanks, vgersh99.

I solved it another way, but I'm using aspire.awk in another script of mine.
Very useful for me!

Thanks again for your help and your time.