Extracting the column containing URL from a text file


 
Posted 07-16-2014

I have a file like this:

Timestamp      URL                  Text
1331635241000  http://example.com   Peoples footage at www.test.com,http://example4.com
1331635231000  http://example1.net  crack the nuts http://example6.com
1331635280000  http://example2.net  Loving this

The columns are tab-separated. I need to extract only the URLs from columns 2 and 3; if a column contains no URL, it should be left empty. For example, the result should look like this:

URL                  Text
http://example.com   www.test.com,http://example4.com
http://example1.net  http://example6.com
http://example2.net

I tried this script:
Code:
awk 'BEGIN {FS="\t"} {print $2,$3}' file | grep -oP '(((http|https|ftp|gopher)|mailto)[.:][^ >"\t]*|www\.[-a-z0-9.]+)[^ .,;\t>">\):]'

This script gives me all the URLs in a single column, without the header. Any suggestions on how to fix this?
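
One possible approach, offered as a sketch rather than a definitive answer: the pipeline above loses the column layout because `grep -o` prints every match on its own line. Doing the extraction inside awk, one field at a time, keeps one output row per input row and lets the header pass through. The names `extract_urls` and `urls` are made up for this example, and the URL pattern is a simplified assumption (http, https, ftp, or a leading www.), not the full pattern from the script above. It also does not strip trailing punctuation such as a sentence-ending period.

```shell
#!/bin/sh
# extract_urls FILE (hypothetical helper name)
# Print columns 2 and 3 of a tab-separated file, keeping only the
# URL-like tokens found in each column. Columns stay tab-separated.
extract_urls() {
    awk '
    BEGIN { FS = OFS = "\t" }

    # Return every URL-like token in s, joined by commas ("" if none).
    # The pattern is a simplified assumption, not a full URL grammar.
    function urls(s,    out) {
        out = ""
        while (match(s, /((https?|ftp):\/\/|www\.)[^ \t,;<>"]+/)) {
            out = out (out == "" ? "" : ",") substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)   # continue after this match
        }
        return out
    }

    NR == 1 { print $2, $3; next }   # header row: pass names through
    { print urls($2), urls($3) }     # data rows: URLs only, may be empty
    ' "$1"
}
```

It would be called as `extract_urls file`. Multiple URLs found in the same column are joined with commas, which matches the sample output in the question; a row with no URL in column 3 ends with an empty field after the tab.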

Mojo::URL(3pm)						User Contributed Perl Documentation					    Mojo::URL(3pm)

NAME
       Mojo::URL - Uniform Resource Locator

SYNOPSIS
         use Mojo::URL;

         # Parse
         my $url = Mojo::URL->new('http://sri:foobar@kraih.com:3000/foo/bar?foo=bar#23');
         say $url->scheme;
         say $url->userinfo;
         say $url->host;
         say $url->port;
         say $url->path;
         say $url->query;
         say $url->fragment;

         # Build
         my $url = Mojo::URL->new;
         $url->scheme('http');
         $url->userinfo('sri:foobar');
         $url->host('kraih.com');
         $url->port(3000);
         $url->path('/foo/bar');
         $url->path('baz');
         $url->query->param(foo => 'bar');
         $url->fragment(23);
         say $url;

DESCRIPTION
       Mojo::URL implements a subset of RFC 3986 and RFC 3987 for Uniform Resource Locators with support for IDNA and IRIs.

ATTRIBUTES
       Mojo::URL implements the following attributes.

   "authority"
         my $authority = $url->authority;
         $url = $url->authority('root:pass%3Bw0rd@localhost:8080');

       Authority part of this URL.

   "base"
         my $base = $url->base;
         $url = $url->base(Mojo::URL->new);

       Base of this URL.

   "fragment"
         my $fragment = $url->fragment;
         $url = $url->fragment('foo');

       Fragment part of this URL.

   "host"
         my $host = $url->host;
         $url = $url->host('127.0.0.1');

       Host part of this URL.

   "port"
         my $port = $url->port;
         $url = $url->port(8080);

       Port part of this URL.

   "scheme"
         my $scheme = $url->scheme;
         $url = $url->scheme('http');

       Scheme part of this URL.

   "userinfo"
         my $userinfo = $url->userinfo;
         $url = $url->userinfo('root:pass%3Bw0rd');

       Userinfo part of this URL.

METHODS
       Mojo::URL inherits all methods from Mojo::Base and implements the following new ones.

   "new"
         my $url = Mojo::URL->new;
         my $url = Mojo::URL->new('http://127.0.0.1:3000/foo?f=b&baz=2#foo');

       Construct a new Mojo::URL object.

   "clone"
         my $url2 = $url->clone;

       Clone this URL.

   "ihost"
         my $ihost = $url->ihost;
         $url = $url->ihost('xn--bcher-kva.ch');

       Host part of this URL in punycode format.

         # "xn--da5b0n.net"
         Mojo::URL->new('http://X.net')->ihost;

   "is_abs"
         my $success = $url->is_abs;

       Check if URL is absolute.

   "parse"
         $url = $url->parse('http://127.0.0.1:3000/foo/bar?fo=o&baz=23#foo');

       Parse URL.

   "path"
         my $path = $url->path;
         $url = $url->path('/foo/bar');
         $url = $url->path('foo/bar');
         $url = $url->path(Mojo::Path->new);

       Path part of this URL, relative paths will be appended to the existing path, defaults to a Mojo::Path object.

         # "http://mojolicio.us/DOM/HTML"
         Mojo::URL->new('http://mojolicio.us/perldoc/Mojo')->path('/DOM/HTML');

         # "http://mojolicio.us/perldoc/DOM/HTML"
         Mojo::URL->new('http://mojolicio.us/perldoc/Mojo')->path('DOM/HTML');

         # "http://mojolicio.us/perldoc/Mojo/DOM/HTML"
         Mojo::URL->new('http://mojolicio.us/perldoc/Mojo/')->path('DOM/HTML');

   "query"
         my $query = $url->query;
         $url = $url->query(replace => 'with');
         $url = $url->query([merge => 'with']);
         $url = $url->query({append => 'to'});
         $url = $url->query(Mojo::Parameters->new);

       Query part of this URL, defaults to a Mojo::Parameters object.

         # "2"
         Mojo::URL->new('http://mojolicio.us?a=1&b=2')->query->param('b');

         # "http://mojolicio.us?a=2&c=3"
         Mojo::URL->new('http://mojolicio.us?a=1&b=2')->query(a => 2, c => 3);

         # "http://mojolicio.us?a=2&b=2&c=3"
         Mojo::URL->new('http://mojolicio.us?a=1&b=2')->query([a => 2, c => 3]);

         # "http://mojolicio.us?b=2"
         Mojo::URL->new('http://mojolicio.us?a=1&b=2')->query([a => undef]);

         # "http://mojolicio.us?a=1&b=2&a=2&c=3"
         Mojo::URL->new('http://mojolicio.us?a=1&b=2')->query({a => 2, c => 3});

   "to_abs"
         my $abs = $url->to_abs;
         my $abs = $url->to_abs(Mojo::URL->new('http://kraih.com/foo'));

       Clone relative URL and turn it into an absolute one.

   "to_rel"
         my $rel = $url->to_rel;
         my $rel = $url->to_rel(Mojo::URL->new('http://kraih.com/foo'));

       Clone absolute URL and turn it into a relative one.

   "to_string"
         my $string = $url->to_string;

       Turn URL into a string.

SEE ALSO
       Mojolicious, Mojolicious::Guides, <http://mojolicio.us>.

perl v5.14.2                              2012-09-05                              Mojo::URL(3pm)