Extracting the column containing URL from a text file


 
Posted 07-16-2014

I have a file like this:

Timestamp	URL	Text
1331635241000	http://example.com	Peoples footage at www.test.com,http://example4.com
1331635231000	http://example1.net	crack the nuts http://example6.com
1331635280000	http://example2.net	Loving this

Each column is tab separated. I need to extract only the URLs from column 2 and column 3. If column 3 contains no URL, it should be left empty. For example, the result should look like this:

URL	Text
http://example.com	www.test.com,http://example4.com
http://example1.net	http://example6.com
http://example2.net	

I tried this script:
Code:
awk 'BEGIN {FS="\t"} {print $2,$3}' file | grep -oP '(((http|https|ftp|gopher)|mailto)[.:][^ >"\t]*|www\.[-a-z0-9.]+)[^ .,;\t>">\):]'

This script gives me all the URLs in a single column, without the header. Any suggestions on how to resolve this?
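One possible approach, offered as a sketch: `grep -o` prints every match on its own line, so the pairing between column 2 and column 3 is lost. Doing the matching inside awk keeps one output line per input line. The function name `extract_urls` and the simplified URL pattern (http, https, ftp and bare `www.` hosts) are assumptions for illustration, not the pattern from the script above, and the first line of the file is assumed to be the header.

```shell
#!/bin/sh
# extract_urls FILE (hypothetical helper name)
# Prints column 2 and the URLs found in column 3 of a tab-separated file.
extract_urls() {
    awk -F '\t' -v OFS='\t' '
    NR == 1 { print $2, $3; next }   # pass the header of columns 2 and 3 through
    {
        rest = $3
        found = ""
        # Pull URL-looking tokens out of the text column, one at a time.
        while (match(rest, /(https?|ftp):\/\/[^ ,]+|www\.[-A-Za-z0-9.]+/)) {
            url = substr(rest, RSTART, RLENGTH)
            sub(/[.;:)]+$/, "", url)  # drop trailing punctuation, if any
            found = (found == "" ? url : found "," url)
            rest = substr(rest, RSTART + RLENGTH)
        }
        # A row without URLs in column 3 gets an empty second field.
        print $2, found
    }' "$1"
}
```

Called as `extract_urls file`, this should keep the `URL` and `Text` header and emit one tab-separated line per record, joining multiple URLs from the text column with commas. The pattern is deliberately narrow, so it would need widening for other schemes such as `mailto` or `gopher`.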