I have a text file that contains URLs.
I want to write a script that reads each URL from the text file and opens it if it is correct and exists; if it does not exist, the script should comment out that URL.
Example:
I saved a file named slist, which contains:
http://yahoomail.com/
http://yahoommail.com/
http://gmail.com/
http://gmmaaiill.com/
and so on.
The script should open each URL in this slist file and check whether it is valid and actually loads. If a URL is not valid (for example, it returns a 404 Page Not Found error or has some other problem), it should be commented out in the slist file, as shown below.
#htp:/gmail.com/
http://yahoomail.com/
#http://yahoommail.com/
http://gmail.com/
#http://gmmaaiill.com/
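One possible way to produce a commented list like this is sketched below. It is only a sketch under assumptions, not a definitive answer: it assumes that wget's --spider mode (check without downloading) is an acceptable test and that any non-zero exit status from wget means "not valid". The function names check_url and comment_dead_urls are made up for illustration, and the 10-second timeout is an arbitrary choice.

```shell
#!/bin/sh

# check_url URL: succeed if wget can reach the URL without an error.
# Assumption: --spider (no download) is a good enough validity check.
check_url() {
    wget --spider --quiet --tries=1 --timeout=10 "$1"
}

# comment_dead_urls FILE: print FILE's lines to stdout, prefixing '#'
# to every URL for which check_url fails.
comment_dead_urls() {
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in
            ''|'#'*)
                # Keep blank lines and already-commented lines unchanged.
                printf '%s\n' "$url"
                ;;
            *)
                if check_url "$url" </dev/null; then
                    printf '%s\n' "$url"
                else
                    printf '#%s\n' "$url"
                fi
                ;;
        esac
    done < "$1"
}
```

With these functions loaded, something like `comment_dead_urls slist > slist.new && mv slist.new slist` would rewrite the list. Writing to a temporary file first avoids truncating slist while it is still being read.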
I tried:
wget -i ./sitelist
and it displayed:
$ wget -i ./sitelist
------------------------------------------------------------------------
./sitelist: Invalid URL htp:/gmail.com/: Unsupported scheme
--07:04:31--  http://yahoomail.com/
           => `index.html'
Resolving yahoomail.com... 216.109.112.135, 66.94.234.13
Connecting to yahoomail.com|216.109.112.135|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://mail.yahoo.com/ [following]
--07:04:32--  http://mail.yahoo.com/
           => `index.html'
Resolving mail.yahoo.com... 209.73.168.74
Connecting to mail.yahoo.com|209.73.168.74|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://login.yahoo.com/config/login_verify2?&.src=ym [following]
--07:04:32--  https://login.yahoo.com/config/login_verify2?&.src=ym
           => `login_verify2?&.src=ym'
Resolving login.yahoo.com... 209.73.168.74
Connecting to login.yahoo.com|209.73.168.74|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 26,138 --.--K/s
07:04:32 (44.43 MB/s) - `login_verify2?&.src=ym' saved [26138]
--07:04:32--  http://gmail.com/
           => `index.html'
Resolving gmail.com... 72.14.253.83, 64.233.171.83, 64.233.161.83
Connecting to gmail.com|72.14.253.83|:80... connected.
HTTP request sent, awaiting response... 302 Found
Cookie coming from gmail.com attempted to set domain to google.com
Location: http://mail.google.com/mail/ [following]
--07:04:32--  http://mail.google.com/mail/
           => `index.html'
Resolving mail.google.com... 209.85.147.19, 209.85.147.18, 209.85.147.83
Connecting to mail.google.com|209.85.147.19|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://www.google.com/accounts/Serv...l&passive=true
&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&l
tmpl=default&ltmplcache=2 [following]
--07:04:33--  https://www.google.com/accounts/Serv...=mail&passive=
true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3
Dl&ltmpl=default&ltmplcache=2
           => `ServiceLogin?service=mail&passive=true&rm=false&continue=http:%2F
%2Fmail.google.com%2Fmail%2F?ui=html&zy=l&ltmpl=default&ltmplcache=2'
Resolving www.google.com... 72.14.253.103, 72.14.253.99, 72.14.253.147, ...
Connecting to www.google.com|72.14.253.103|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16,131 (16K) [text/html]
100%[====================================>] 16,131 --.--K/s
07:04:33 (110.83 KB/s) - `ServiceLogin?service=mail&passive=true&rm=false&contin
ue=http:%2F%2Fmail.google.com%2Fmail%2F?ui=html&zy=l&ltmpl=default&ltmplcache=2'
saved [16131/16131]
--07:04:33--  http://gmmail.com/
           => `index.html'
Resolving gmmail.com... 206.207.87.4
Connecting to gmmail.com|206.207.87.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 32,276 172.51K/s
07:04:34 (172.20 KB/s) - `index.html' saved [32276]
--07:04:34--  http://yahoommail.com/
           => `index.html.1'
Resolving yahoommail.com... 216.109.112.135, 66.94.234.13
Connecting to yahoommail.com|216.109.112.135|:80... connected.
HTTP request sent, awaiting response... 404
07:04:34 ERROR 404: (no description).
FINISHED --07:04:34--
Downloaded: 74,545 bytes in 3 files
------------------------------------------------------------------------
So please suggest how I can get the following output.