Parsing a file which contains urls from different sites
Hi
I have a file that contains millions of URLs from different sites. It has 4,000,000 lines.
I want a command or script that gives me the count of URLs for each individual site, e.g. imdb, experts-exchange, gallery.mobile9.
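One possible approach to the per-site count, sketched in Python. It assumes the file holds one URL per line, that entries may or may not carry a scheme such as http://, and that a leading "www." should be ignored when grouping. The function names host_of and count_sites are made up for this illustration.

```python
import sys
from collections import Counter
from urllib.parse import urlsplit


def host_of(line):
    """Return the lowercased host of a URL line, or None if there is none."""
    url = line.strip()
    if not url:
        return None
    # urlsplit only recognises a host after "//", so bare entries such as
    # "gallery.mobile9.com/f/1" get "//" prepended (assumption: such lines exist).
    if "://" not in url:
        url = "//" + url
    try:
        host = urlsplit(url).hostname
    except ValueError:
        # Malformed entries (e.g. broken IPv6 brackets) are skipped.
        return None
    if not host:
        return None
    # Assumption: "www.imdb.com" and "imdb.com" count as the same site.
    return host[4:] if host.startswith("www.") else host


def count_sites(lines):
    """Count URLs per host, consuming the input one line at a time."""
    counts = Counter()
    for line in lines:
        host = host_of(line)
        if host:
            counts[host] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_sites.py urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for host, n in count_sites(fh).most_common():
            print(n, host)
```

The file is read line by line, so memory use depends on the number of distinct hosts rather than on the 4,000,000 lines. Subdomains are counted separately here; grouping them under one site would need an extra rule.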
Last edited by radoulov; 09-30-2009 at 07:33 AM..
Reason: please use code tags
To keep the forums high quality for all users, please take the time to format your posts correctly.
First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)
Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
Third, be careful when you cut-and-paste: edit out any odd characters and make sure all links are working properly.
Hi,
I am looking for a regex that will validate a URL and files accessed in a browser.
For example:
http://www.google.co.uk
http://www.google.com
https://www.google.co.uk
https://www.google.com
ftp://
file:///somefile/on/a/server/accessed/from/browser/file.txt
So far I have:
... (4 Replies)
Hi ALL,
I have a file A which contains
A=www.google.com
B=www.abcd.com
C=www.nick.com
D=567
file B Contains
A=www.google1234.com
B=www.bacd.com
C=www.mick.com
D=789
I wanted a script which can replace file A contents with B Contents (5 Replies)
Discussion started by: nikhil jain
3. Post Here to Contact Site Administrators and Moderators
Hi,
I tried to post some Perl code for discussion (wrapped in swaddling . However, a regex has an escaped backslash so the forum parser sees it as a URL?
Had the same experience with the sample data that I tried to provide for the same discussion. It contains email addresses,... (1 Reply)
I am a total newbie to Apache. I need to do this only for this weekend, during an upgrade from the old system to the new system.
We have different URLs http://domain.name/xxx (xxx varies to any length and words - it can be /home, /login, /home/daily, /daily/report, etc).
How do I redirect all those to... (0 Replies)
So, I am writing a script that will read output from Bulk Extractor (which gathers data based on regular expressions). My script then reads the column that has the URL found, hashes it with MD5, then outputs the URL and hash to a file.
Where I am stuck is that I want to read the bulk... (7 Replies)
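A minimal sketch of the hashing step described in this post, in Python. The input layout is an assumption: tab-separated lines with the URL in the second column and comment lines starting with "#". The column index and the function name hash_urls are hypothetical and would need adjusting to the real output.

```python
import hashlib


def hash_urls(lines, column=1, sep="\t"):
    """Yield (url, md5_hex) for one column of each delimited line.

    Assumptions: fields are separated by `sep`, the URL sits in `column`
    (0-based), and lines starting with '#' are comments.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) <= column or not fields[column]:
            continue  # skip short or empty rows
        url = fields[column]
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        yield url, digest
```

Each pair can then be written out, for example with print(url, digest, sep="\t") inside a loop over hash_urls(open_file).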
Hi everyone. I have an html file with lines like so:
link href="localFolder/...">
link href="htp://...">
img src="localFolder/...">
img src="htp://...">
I want to remove the links with http in their href and the imgs with http in their src. I'm having trouble removing them because there... (4 Replies)
Hey guys,
I have this file generated by me... I want to create some HTML output from it.
The problem is that I am really confused about how to go about reading the file.
The file is in the following format:
TID1 Name1 ATime=xx AResult=yyy AExpected=yyy BTime=xx BResult=yyy... (8 Replies)
I need to archive a large website onto a DVD. Many of the links and image srcs are absolute URLs. As I don't want to alter them all manually, I'm looking for a Perl or Unix command that would remove:
http://www.mydomain.com/mysubfolder/
and replace with:
./
Can anyone help me with this... (3 Replies)
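The replacement described above could be sketched in Python as a plain byte-for-byte substitution. The prefix strings come from the post; the function names relativize and rewrite_tree and the "*.html" file pattern are assumptions for illustration.

```python
from pathlib import Path

OLD_PREFIX = b"http://www.mydomain.com/mysubfolder/"
NEW_PREFIX = b"./"


def relativize(data, old=OLD_PREFIX, new=NEW_PREFIX):
    """Replace every occurrence of the absolute prefix with the relative one."""
    return data.replace(old, new)


def rewrite_tree(root, pattern="*.html"):
    """Rewrite matching files under `root` in place; return how many changed.

    Files are handled as bytes so encodings and line endings stay untouched.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        original = path.read_bytes()
        updated = relativize(original)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

Note that "./" only resolves correctly for pages sitting directly in mysubfolder. Pages in deeper directories would need a prefix such as "../", so it is worth trying this on a copy of the site first.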