Small script for website links and regular expressions


 
# 1  
09-06-2011

I need help with a simple (I hope) script that reads a website address from stdin, checks all the links that site contains for an arbitrary regular expression, and then saves the link names and the matches found to a file. Any help would be really helpful.

Considering I'm not that good with shell scripting, I wouldn't know where to start.
# 2  
09-06-2011
I think I'm a little good at shell scripting, but I don't know where to start either...

PS This would not be a "simple" script in any way. You either need to fetch the whole site with a special program like HTTrack and then grep through it for your regex and links, or use a special library for some programming language (like libwww for Perl) and write a non-trivial program.
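A minimal sketch of the first approach ("download the site, then grep it"). It uses GNU wget in place of HTTrack, which is a swap and not what was suggested above; `mirror_site` and `grep_mirror` are hypothetical helper names, and the depth of 2 is an arbitrary choice. It assumes a grep that supports `-r`, `-H` and `-o` (GNU and BSD grep do; POSIX grep does not).

```shell
#!/bin/sh
# Sketch of the "mirror the site, then grep it" approach (assumes GNU wget).

# mirror_site URL DIR (hypothetical helper): download the pages reachable
# from URL into DIR, following links at most 2 levels deep and never
# climbing above the starting directory. wget stays on the starting host
# unless --span-hosts is given.
mirror_site() {
  wget --quiet --recursive --level=2 --no-parent --directory-prefix="$2" "$1"
}

# grep_mirror DIR REGEX (hypothetical helper): print "file:match" for every
# extended-regex match found in the downloaded files, one match per line.
grep_mirror() {
  grep -rHEo -- "$2" "$1"
}
```

Usage, with a placeholder URL and regex: `mirror_site "$url" site; grep_mirror site 'your-regex' > results.txt`. The two steps are joined with `;` rather than `&&` because wget returns a non-zero status when some links are broken, even though the rest of the site was downloaded.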
# 3  
09-06-2011
There are only two parts of the script that are hard, I guess: how to download the linked pages, and how to search their contents. But as for how to do that, I've got no idea :S
# 4  
09-06-2011
Do you need to get and search links from one page or from the whole site?
For one page you can use:
Code:
curl -s URL | perl -0777 -nE 'say for /<a href.*?>/g'

This gets the links; then use grep on them.
# 5  
09-06-2011
Uh, from one domain, I think.
# 6  
09-06-2011
Then see my post above. (You will need to deal with internal, external, cross-site and multiple links, redirects, and so on.)
