Sponsored Content
Top Forums Shell Programming and Scripting Extracting the column containing URL from a text file Post 302909399 by MadeInGermany on Wednesday 16th of July 2014 02:49:43 PM
Old 07-16-2014
Or this one?
Code:
awk 'BEGIN {OFS=FS="\t"} {s=""; for (col=2; col<=3; col++) {sep=""; n=split ($col,T,"[ ,<>]+"); for (i=1;i<=n;i++) if (NR==1 || T[i]~/^(http|https|ftp|gopher|mailto):|^www\.[-a-z0-9]+\./) {s=s sep T[i]; sep=" "}; s=s OFS} print s}' file
URL     Text
http://example.com      www.test.com http://example4.com
http://example1.net     http://example6.com
http://example2.net

This User Gave Thanks to MadeInGermany For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extracting anchor text and its URL from HTML files in BASH

Hi All, I have some HTML files and my requirement is to extract all the anchor text words from the HTML files along with their URLs and store the result in a separate text file separated by space. For example, <a href="/kid/stay_healthy/">Staying Healthy</a> which has /kid/stay_healthy/ as... (3 Replies)
Discussion started by: shoaibjameel123
3 Replies

2. UNIX for Dummies Questions & Answers

Extracting rows from a text file based on the first column

I have a tab delimited text file where the first column can take on three different values : 100, 150, 250. I want to extract all the rows where the first column is 100 and put them into a separate text file and so on. This is what my text file looks like now: 100 rs3794811 0.01 0.3434... (1 Reply)
Discussion started by: evelibertine
1 Replies

3. UNIX for Dummies Questions & Answers

Extracting rows from a text file based on the first column

I have a tab delimited text file where the first column can take on three different values : 100, 150, 250. I want to extract all the rows where the first column is 100 and put them into a separate text file and so on. This is what my text file looks like now: 100 rs3794811 0.01 0.3434 100... (1 Reply)
Discussion started by: evelibertine
1 Replies

4. UNIX for Dummies Questions & Answers

Extracting rows from a text file based on numerical values of a column

I have a text file where the second column is a list of numbers going from small to large. I want to extract the rows where the second column is smaller than or equal to 0.0001. My input: rs10082730 9e-08 12 46002702 rs2544081 1e-07 12 46015487 rs1425136 1e-06 7 35396742 rs2712590... (1 Reply)
Discussion started by: evelibertine
1 Replies

5. UNIX for Dummies Questions & Answers

Extracting rows from a space delimited text file based on the values of a column

I have a space delimited text file. I want to extract rows where the third column has 0 as a value and write those rows into a new space delimited text file. How do I go about doing that? Thanks! (2 Replies)
Discussion started by: evelibertine
2 Replies

6. UNIX for Dummies Questions & Answers

Extracting the last column of a text file

I would like to extract the last column of a text file but different rows of the text file have different numbers of columns. How do I go about doing that? Thanks! (1 Reply)
Discussion started by: evelibertine
1 Replies

7. Shell Programming and Scripting

Extracting the file name from the specified URL

Hello Everyone, I am trying to write a shell script(or Perl Script) that would do the following: I have a file that contains the following lines: File: https://ims-svnus.com/dev/DB/trunk/feeds/templates/shell_script.txt -r860... (5 Replies)
Discussion started by: filter
5 Replies

8. UNIX for Dummies Questions & Answers

Extracting rows from a text file if the value of a column falls between a certain range

Hi, I have a file that looks like the following: 10 100080417 rs7915867 ILMN_1343295 12 6243093 7747537 10 100190264 rs2296431 ILMN_1343295 12 6643093 6647537 10 100719451 SNP94374 ILMN_1343295 12 6688093 7599537 ... (1 Reply)
Discussion started by: evelibertine
1 Replies

9. Shell Programming and Scripting

Extracting the column containing URL from a text file

I have the file like this: Timestamp URL Text 1331635241000 http://example.com Peoples footage at www.test.com,http://example4.com 1331635231000 http://example1.net crack the nuts http://example6.com 1331635280000 http://example2.net ... (0 Replies)
Discussion started by: csim_mohan
0 Replies

10. Shell Programming and Scripting

Extracting the column containing URL from a text file

I have the file like this: Timestamp URL Text 1331635241000 http://example.com Peoples footage at www.test.com,http://example4.com 1331635231000 http://example1.net crack the nuts http://example6.com 1331635280000 http://example2.net ... (0 Replies)
Discussion started by: csim_mohan
0 Replies
COL(1)							    BSD General Commands Manual 						    COL(1)

NAME
col -- filter reverse line feeds from input SYNOPSIS
col [-bfhpx] [-l num] DESCRIPTION
The col utility filters out reverse (and half reverse) line feeds so that the output is in the correct order with only forward and half for- ward line feeds, and replaces white-space characters with tabs where possible. This can be useful in processing the output of nroff(1) and tbl(1). The col utility reads from the standard input and writes to the standard output. The options are as follows: -b Do not output any backspaces, printing only the last character written to each column position. -f Forward half line feeds are permitted (``fine'' mode). Normally characters printed on a half line boundary are printed on the fol- lowing line. -h Do not output multiple spaces instead of tabs (default). -l num Buffer at least num lines in memory. By default, 128 lines are buffered. -p Force unknown control sequences to be passed through unchanged. Normally, col will filter out any control sequences from the input other than those recognized and interpreted by itself, which are listed below. -x Output multiple spaces instead of tabs. The control sequences for carriage motion that col understands and their decimal values are listed in the following table: ESC-7 reverse line feed (escape then 7) ESC-8 half reverse line feed (escape then 8) ESC-9 half forward line feed (escape then 9) backspace moves back one column (8); ignored in the first column carriage return (13) newline forward line feed (10); also does carriage return shift in shift to normal character set (15) shift out shift to alternate character set (14) space moves forward one column (32) tab moves forward to next tab stop (9) vertical tab reverse line feed (11) All unrecognized control characters and escape sequences are discarded. The col utility keeps track of the character set as characters are read and makes sure the character set is correct when they are output. If the input attempts to back up to the last flushed line, col will display a warning message. ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of col as described in environ(7). EXIT STATUS
The col utility exits 0 on success, and >0 if an error occurs. SEE ALSO
colcrt(1), expand(1), nroff(1), tbl(1) STANDARDS
The col utility conforms to Version 2 of the Single UNIX Specification (``SUSv2''). HISTORY
A col command appeared in Version 6 AT&T UNIX. BSD
August 4, 2004 BSD
All times are GMT -4. The time now is 10:40 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy