Extracting the column containing URL from a text file


 
# 1  
Old 07-16-2014

I have a file like this:

Code:
Timestamp       URL                    Text
1331635241000   http://example.com     Peoples footage at www.test.com,http://example4.com
1331635231000   http://example1.net    crack the nuts http://example6.com
1331635280000   http://example2.net    Loving this

Each column is tab-separated. I need to extract only the URLs from columns 2 and 3; where a column contains no URL, it should be left empty. For example, the result should look like this:

Code:
URL                    Text
http://example.com     www.test.com,http://example4.com
http://example1.net    http://example6.com
http://example2.net

I tried this script:
Code:
awk 'BEGIN {FS="\t"} {print $2,$3}' file | grep -oP '(((http|https|ftp|gopher)|mailto)[.:][^ >"\t]*|www\.[-a-z0-9.]+)[^ .,;\t>">\):]'

This script gives me all the URLs in a single column, without the header. Any suggestions on how to resolve this?
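
A small illustration of why the column layout is lost: `grep -o` prints every match on its own line, so it no longer matters which column a URL came from. The sketch below uses a simplified pattern and a made-up helper `demo_row` (both are assumptions for illustration, not the exact regex from the post above), and `-E` instead of `-P` so it does not depend on PCRE support.

```shell
# Hypothetical one-row input: one URL in the first field, two in the second.
demo_row() {
  printf 'http://example.com\tPeoples footage at www.test.com,http://example4.com\n'
}

# Simplified pattern (an assumption): a scheme or "www." followed by
# anything that is not whitespace or a comma.
demo_row | grep -oE '(https?://|www\.)[^[:space:],]+'
```

All three URLs come out one per line, which is the single-column result described above.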
# 2  
Old 07-16-2014
Not clear. Please rephrase carefully.

EDIT: Guessing wildly:
Code:
awk 'BEGIN {OFS=FS="\t"} {n=split ($3, T, " "); $3=""; for (i=1;i<=n;i++) if (T[i]~/www|http|ftp|mailto|Text/) $3=$3 T[i];print $2,$3}' file
URL         Text    
http://example.com    www.test.com,http://example4.com
http://example1.net    http://example6.com
http://example2.net


Last edited by RudiC; 07-16-2014 at 03:56 PM.. Reason: Added the match for "Text"
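
One thing to watch with the one-liner above: matching tokens are appended to `$3` with nothing between them, so two URLs separated by a space in the text would be glued together. That does not happen with the sample data, where the only multi-URL cell is a single comma-joined token. A minimal sketch of a variant that joins matches with a space, assuming that separator is acceptable (the function name `join_urls` and the test row are made up for illustration):

```shell
# Variant of the one-liner above; reads tab-separated rows on stdin.
join_urls() {
  awk 'BEGIN { OFS = FS = "\t" }
  {
    n = split($3, T, " ")          # split the text column on whitespace
    $3 = ""
    for (i = 1; i <= n; i++)
      if (T[i] ~ /www|http|ftp|mailto|Text/)
        # Put a space between matches instead of concatenating directly.
        $3 = ($3 == "" ? T[i] : $3 " " T[i])
    print $2, $3
  }'
}

# Hypothetical row with two space-separated URLs in the text column.
printf '1\thttp://a.example\tsee http://b.example and http://c.example\n' | join_urls
```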
# 3  
Old 07-16-2014
Or this one?
Code:
awk 'BEGIN {OFS=FS="\t"} {s=""; for (col=2; col<=3; col++) {sep=""; n=split ($col,T,"[ ,<>]+"); for (i=1;i<=n;i++) if (NR==1 || T[i]~/^(http|https|ftp|gopher|mailto):|^www\.[-a-z0-9]+\./) {s=s sep T[i]; sep=" "}; s=s OFS} print s}' file
URL     Text
http://example.com      www.test.com http://example4.com
http://example1.net     http://example6.com
http://example2.net
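
For anyone who finds the one-liner hard to read, here is the same idea spread over several lines with comments. This is a sketch rather than an exact copy: it passes the header through with an explicit `NR == 1` rule and does not append a tab after the last column. The helper names `sample_data` and `extract_urls` are made up for illustration.

```shell
# Recreate the sample from the first post (tab-separated).
sample_data() {
  printf 'Timestamp\tURL\tText\n'
  printf '1331635241000\thttp://example.com\tPeoples footage at www.test.com,http://example4.com\n'
  printf '1331635231000\thttp://example1.net\tcrack the nuts http://example6.com\n'
  printf '1331635280000\thttp://example2.net\tLoving this\n'
}

# Keep only URL-like tokens in columns 2 and 3; reads stdin.
extract_urls() {
  awk '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { print $2, $3; next }           # header: print as is
    {
      out = ""
      for (col = 2; col <= 3; col++) {
        urls = ""
        # Split the cell on spaces, commas and angle brackets.
        n = split($col, tok, "[ ,<>]+")
        for (i = 1; i <= n; i++)
          if (tok[i] ~ /^(https?|ftp|gopher|mailto):|^www\.[-a-z0-9]+\./)
            urls = urls (urls == "" ? "" : " ") tok[i]
        # Column 3 stays empty (after the tab) when no URL was found.
        out = out (col == 2 ? "" : OFS) urls
      }
      print out
    }
  '
}

sample_data | extract_urls
```

As in the one-liner, URLs found in the same cell are separated by a space, so `www.test.com,http://example4.com` comes out as two space-separated entries rather than keeping the comma.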

# 4  
Old 07-17-2014
Thank you, RudiC. It worked just as I wanted.