How to judge whether a URL is valid or not using awk


 
# 1  
05-10-2010
How to judge whether a URL is valid or not using awk

As the title says.
Thanks!
# 2  
05-10-2010
So how do you define a valid url?
# 3  
05-10-2010
I am using a C++ HTML parser to extract links from web pages,
but there are many abnormal URLs in the results.
For example:
http://百度:http://www.g.cn
or
http://123/a.html

# 4  
05-10-2010
In your first example, the URL contains non-ASCII characters. Is that what makes it invalid, or is it that one URL should not contain "http" twice?

In your second example, is the problem that there is no dot in the part between "//" and the first "/"?

Are those the only rules you need?
# 5  
05-10-2010
Yes. Can you give me any ideas?
Thanks.
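Taking the three rules from post #4 literally (no non-ASCII characters, no second "http", and a dot somewhere in the host), a minimal awk sketch could report which rule each line breaks. The function name `check_urls` and the exact rule wording are assumptions for illustration. The awk run is wrapped in `LC_ALL=C` so that the bracket range is byte-based and any byte outside printable ASCII (for example the UTF-8 bytes of 百度) counts as non-ASCII.

```shell
# Hypothetical helper: check each URL (one per line, from files or stdin)
# against the three rules suggested in post #4 and print the first failure.
# LC_ALL=C makes [^ -~] byte-based: anything outside space..~ is non-ASCII.
check_urls() {
    LC_ALL=C awk '
    {
        # Rule 1 (assumed): only printable ASCII characters.
        if ($0 ~ /[^ -~]/) { print $0, "invalid: non-ASCII"; next }
        # Rule 2 (assumed): "http" must not appear twice in one line.
        if ($0 ~ /http.*http/) { print $0, "invalid: two http"; next }
        # Rule 3 (assumed): the host between "//" and the next "/" has a dot.
        host = $0
        sub("^[^/]*//", "", host)   # strip the scheme, e.g. "http://"
        sub("/.*", "", host)        # strip the path, e.g. "/a.html"
        if (host !~ /\./) { print $0, "invalid: no dot in host"; next }
        print $0, "valid"
    }' "$@"
}
```

Run as `check_urls urfile` on the three sample lines from this thread, it should print `valid` for the google.com URL, `invalid: non-ASCII` for the 百度 line and `invalid: no dot in host` for http://123/a.html. Reporting the failing rule rather than just a yes/no makes it easier to see which heuristic is rejecting a link.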
# 6  
05-10-2010
Code:
$ cat urfile
http://百度:http://www.g.cn
http://www.google.com/ab.html
http://123/a.html

$ grep -E -iv "http.*http|//[0-9a-z]*/" urfile
http://www.google.com/ab.html
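Since the question asked for awk, the same two-rule filter can be written as an awk pattern. This is a sketch: `tolower()` stands in for grep's `-i`, the negated match stands in for `-v`, and `filter_urls` is a hypothetical wrapper name.

```shell
# Hypothetical helper: awk version of the grep -E -iv filter above.
# tolower() replaces -i; lines matching either rule ("http" twice, or a
# "//host/" whose host is only letters and digits) are skipped, and every
# other line is printed unchanged.
filter_urls() {
    awk 'tolower($0) !~ /http.*http|\/\/[0-9a-z]*\//' "$@"
}
```

`filter_urls urfile` should print only http://www.google.com/ab.html for the file above. Inside an awk regex literal the slashes must be escaped, unlike in the grep pattern.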

# 7  
05-10-2010
Code:
awk '{ print $0, (/^http:\/\/www\./ ? "valid" : "invalid") }' abc.txt


http://www.abc.com valid
http://abc.com invalid
http://123/a.html invalid
http://百度:http://www.g.cn invalid
https://www.abc.com invalid

HTH,
PL
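The prefix test above rejects usable URLs such as http://abc.com and https://www.abc.com. A somewhat looser sketch, assuming "valid" means an http or https scheme, a host of at least two dot-separated ASCII labels, an optional port and an optional path, could look like this. It is a heuristic, not an RFC 3986 validator, and `classify_urls` is a hypothetical name.

```shell
# Hypothetical helper: a looser heuristic than the "^http://www" prefix test.
# Assumed rules: scheme http or https, a host of two or more dot-separated
# labels made of ASCII letters, digits and hyphens, an optional :port, and an
# optional path starting with "/". This is not a full RFC 3986 check.
classify_urls() {
    LC_ALL=C awk '
    {
        ok = ($0 ~ /^https?:\/\/[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(:[0-9]+)?(\/[^ ]*)?$/)
        print $0, (ok ? "valid" : "invalid")
    }' "$@"
}
```

On the sample file above this should mark http://www.abc.com, http://abc.com and https://www.abc.com valid and the other two invalid. Because digits are allowed in labels, a dotted IPv4 host such as http://192.168.0.1/ also passes.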

9 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Regex for a valid URL

Hi guys, What is the regex to check for only valid URL from a file using grep? (2 Replies)
Discussion started by: Meeran Rizvi

2. Shell Programming and Scripting

Wget fails for a valid URL

Wget Error Codes: 0 No problems occurred. 1 Generic error code. 2 Parse error—for instance, when parsing command-line options, the .wgetrc or .netrc… 3 File I/O error. 4 Network failure. 5 SSL verification failure. 6 Username/password authentication failure. ... (3 Replies)
Discussion started by: mohtashims

3. Shell Programming and Scripting

Using awk to determine if field value is valid

Hi Forum. I tried to search the forum posts for an answer but I haven't been able to do so for what I'm trying to accomplish. I have the following source file: 11936385~TFSA|11936385|4431|3401458067|10/09/1982|25.00|IBSBONUS|3200|||||CASH| 3401458067|1005|... (3 Replies)
Discussion started by: pchang

4. UNIX for Dummies Questions & Answers

URL decoding with awk

The challenge: Decode URLs, i.e. convert %HEX to the corresponding special characters, using only UNIX base utilities, and without having to type out each special character. I have an anonymous C code snippet where the author assigns each hex digit a number from 0 to 16 and then does some... (2 Replies)
Discussion started by: uiop44

5. UNIX for Dummies Questions & Answers

Awk: print all URL addresses between iframe tags without repeating an already printed URL

Here is what I have so far: find . -name "*php*" -or -name "*htm*" | xargs grep -i iframe | awk -F'"' '/<iframe*/{gsub(/.\*iframe>/,"\"");print $2}' Here is an example content of a PHP or HTM(HTML) file: <iframe src="http://ADDRESS_1/?click=5BBB08\" width=1 height=1... (18 Replies)
Discussion started by: striker4o

6. UNIX for Advanced & Expert Users

Whether it is successful or unreachable?

Hi, all: How can I check what is happening with my own NIC driver, which reports "successful" when the local PC pings a remote Linux PC but "unreachable" when it pings a remote Windows XP PC? The driver I wrote runs on the Linux 3.0.4 kernel. Thanks! li, kunlun (1 Reply)
Discussion started by: liklstar

7. Shell Programming and Scripting

awk command - not a valid identifier message

Trying to run the following awk command : export com.mics.ara.server.tools.sch_reports.Runner.num_threads=`awk -F= '!/^#/ && /com.mics.ara.server.tools.sch_reports.Runner.num_threads/{print $2}' $BKUPDIR/env.properties` -bash: export:... (6 Replies)
Discussion started by: venhart

8. UNIX for Dummies Questions & Answers

Checking whether an input character is already in a variable

Hi, I have a variable which holds a variety of letters, e.g. var=qwertyuiop. What I want to do is determine whether an input letter is already stored inside the variable, so that I can prompt for a new one. I have been playing around with tr and grep but nothing seems to work at all. ... (2 Replies)
Discussion started by: castillo

9. UNIX for Dummies Questions & Answers

Checking whether an input uses letters of the alphabet

Afternoon, forums. I need a way to test whether an input character is part of the English alphabet. I have come up with the following code but it's not working at all. until '] do echo This is not a Letter done Any help would be beneficial to me. (1 Reply)
Discussion started by: strasner