Parsing a file which contains urls from different sites
Hi,
I have a file that contains millions of URLs from different sites. It has 4,000,000 lines. Sample:
Code:
http://www.chipchick.com/2009/09/usb_hand_grenade.html
http://www.engadget.com/page/5
http://www.mp3raid.com/search/download-mp3/20173/michael_jackson_fall_again_instrumental.html
http://www.myacrobatpdf.com/8713/canon-speedlite-430ex-manual.html
http://www.mobileheart.com/cell-phone-screensavers/1167-Sony-Ericsson-W200-Screensavers.aspx
http://www.india-forums.com/forum_posts.asp?TID=1256207&TPN=2
http://gallery.mobile9.com/f/923680
http://www.phoronix.com/scan.php?page=article&item=xorg_vdpau_vaapi&num=1
http://www.experts-exchange.com/Software/Photos_Graphics
http://www.jigzone.com/mpc/expired.php
http://ultimatetop200.com/
http://www.mp3raid.com/search/for/the_maine/4.html
http://gallery.mobile9.com/f/907594?view=download
http://gallery.mobile9.com/f/907594
http://www.imdb.com/title/tt0813715/board/thread/147969365
http://www.imdb.com/name/nm0002028
Last edited by radoulov; 09-30-2009 at 06:33 AM. Reason: please use code tags
With GNU AWK you can print each distinct host name once (with any leading www. removed) using something like this:
Code:
gawk -F'http://(www\\.)?|/' '!_[$2]++{print $2}' infile
Or with Perl:
Code:
perl -nle'
print $1 unless $_{(m|http://(?:www\.)?([^/]*)|)[0]}++
' infile
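If neither gawk nor perl is at hand, the same idea (take the host part after http:// and an optional www., and print it only the first time it is seen) can be sketched in Python. This is only an illustration of the approach in the one-liners above, not a replacement for them; the function name unique_hosts and the sample list are made up for the example, and lines that do not start with http:// are simply skipped, which is an assumption about how such lines should be handled.

```python
import re
from typing import Iterable, Iterator

# Optional "www." prefix, then everything up to the next "/".
HOST_RE = re.compile(r"http://(?:www\.)?([^/]*)")


def unique_hosts(lines: Iterable[str]) -> Iterator[str]:
    """Yield each host the first time it appears, in input order."""
    seen = set()
    for line in lines:
        match = HOST_RE.match(line.strip())
        if match is None:
            continue  # assumption: ignore lines that are not http:// URLs
        host = match.group(1)
        if host not in seen:
            seen.add(host)
            yield host


# Hypothetical sample taken from the URLs in the first post.
sample = [
    "http://www.engadget.com/page/5",
    "http://gallery.mobile9.com/f/923680",
    "http://gallery.mobile9.com/f/907594",
    "http://www.imdb.com/name/nm0002028",
]
print(list(unique_hosts(sample)))
# prints ['engadget.com', 'gallery.mobile9.com', 'imdb.com']
```

For the real 4,000,000-line file you would pass an open file object instead of the sample list, so the lines are read one at a time. As with the awk array, memory use grows with the number of distinct hosts rather than with the number of lines.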