Shell Programming and Scripting: Make copy of text file with columns removed (based on header)
Post 302932504 by Don Cragun on Thursday 22nd of January 2015 03:35:29 AM
On a recent Apple MacBook Pro, the following script:
Code:
#!/bin/ksh
if [ $# -gt 0 ]
then	rem_list="$@"
else	rem_list="dxv1 k2"
fi
awk -v del="$rem_list" '
BEGIN {	# Split del string into remhdr[] (indexed 1..nrem)...
	nrem = split(del, remhdr)
	# Create rem[] (indexed by titles of fields to be removed).
	for(i = 1; i <= nrem; i++) {
		rem[remhdr[i]]
		delete remhdr[i]
	}
	# Set OFS
	OFS = "\t"
}
NR <= 3 {
	# Read header lines into hdr[]...
	for(i = 1; i <= NF; i++)
		hdr[NR, i] = $i
	if(NR == 3) {
		# Create array of output fields to delete: od[]...
		for(i = 1; i <= NF; i++)
			if(hdr[3, i] in rem) {
				od[i]
				odc++
				delete rem[hdr[3,i]]
				nrem--
			}
		if(nrem) {
			for(i in rem)
				printf("*Field heading \"%s\" not found.*\n", i)
			printf("** Processing aborted. **\n");
			exit 1
		}
		# Print updated headers...
		for(i = 1; i <= 3; i++) {
			oc = NF - odc
			for(j = 1; j <= NF; j++)
				if(!(j in od))
					printf("%s%s", hdr[i, j],
						(--oc) ? OFS : ORS)
		}
	}
	next
}
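{	# Print data lines...
	# Empty each field to be removed; $0 is rebuilt with OFS, so the
	# removed fields become empty strings between tabs.
	for(i in od)
		$i = ""
	# Re-split with the default FS (runs of blanks/tabs), which drops the
	# empty fields, then rebuild $0 so the remaining fields are joined by OFS.
	$0 = $0
	$1 = $1
	print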
}' original_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt

when invoked with no arguments, or with the arguments dxv1 and k2 (in either order), on the files you uploaded in post #4 in this thread, produces output identical to the contents of the file intended_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt. I timed the script ten times with its output redirected to a regular file; the longest run was:
Code:
real	0m0.07s
user	0m0.07s
sys	0m0.01s

and the fastest was:
Code:
real	0m0.06s
user	0m0.06s
sys	0m0.00s
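The data-line rule in the script removes columns without shifting fields by hand: it sets each unwanted field to an empty string, $0 = $0 then re-splits the record with the default field separator so the empty fields disappear, and $1 = $1 rebuilds the record with OFS. A minimal stand-alone sketch of that trick; the helper name drop_field and the sample line are made up for illustration:

```shell
#!/bin/sh
# drop_field N (hypothetical helper): read whitespace-separated lines on
# stdin and write them tab-separated with field N removed.
drop_field() {
	awk -v n="$1" -v OFS='\t' '{
		$n = ""	# empty field n; $0 is rebuilt with OFS around it
		$0 = $0	# re-split with the default FS: the empty field vanishes
		$1 = $1	# rebuild $0 so the remaining fields are joined by OFS
		print
	}'
}

# Made-up sample line; writes "a", "c" and "d" separated by tabs.
printf 'a b c d\n' | drop_field 2
```

Because the re-split relies on the default FS, this only works when fields cannot contain blanks themselves; with FS set to a single tab, $0 = $0 would keep the empty field in place.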

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
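If you would rather not edit the script by hand on each system, one option is to pick the first of those awk implementations that exists and call it through a variable. A minimal sketch; the variable name AWK, the search order, and the /usr/bin/nawk location are my own assumptions:

```shell
#!/bin/sh
# Choose an awk: the first of the XPG awks or nawk that is executable,
# otherwise whatever "awk" is found on PATH. /usr/bin/nawk is an assumed
# location; adjust it if nawk lives elsewhere on your system.
AWK=awk
for cand in /usr/xpg4/bin/awk /usr/xpg6/bin/awk /usr/bin/nawk
do
	if [ -x "$cand" ]
	then	AWK=$cand
		break
	fi
done

# The script would then run "$AWK" wherever it now runs awk, e.g.:
#	"$AWK" -v del="$rem_list" '...' original_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt
echo "Using $AWK"
```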

If this is part of a larger script that puts the header strings to be deleted in an array as you did in your last post, change the first few lines:
Code:
if [ $# -gt 0 ]
then	rem_list="$@"
else	rem_list="dxv1 k2"
fi
awk -v del="$rem_list" '

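to the following (the [*] subscript joins the array elements into a single space-separated word; [@] would pass each element to awk as a separate argument):
Code:
awk -v del="${LIST_TO_REMOVE[*]}" '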

or modify the above awk script to read your list file and your data file.
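A minimal sketch of that last option, assuming the list file holds the header titles to remove separated by blanks or newlines, and the data file has the same three-line header layout (titles on line 3) as the file used above. The function name remove_cols is hypothetical. The awk body is the script above with the BEGIN block replaced by a rule that reads the first file, and with NR changed to FNR in the header rule:

```shell
#!/bin/sh
# remove_cols LISTFILE DATAFILE (hypothetical helper)
# LISTFILE: header titles to remove, separated by blanks or newlines.
# DATAFILE: three header lines (titles on line 3) followed by data lines.
remove_cols() {
	awk -v OFS='\t' '
	FNR == NR {	# First file: collect titles to remove into rem[].
		for(i = 1; i <= NF; i++)
			if(!($i in rem)) {
				rem[$i]
				nrem++
			}
		next
	}
	FNR <= 3 {	# Second file: save the three header lines.
		for(i = 1; i <= NF; i++)
			hdr[FNR, i] = $i
		if(FNR == 3) {
			for(i = 1; i <= NF; i++)
				if(hdr[3, i] in rem) {
					od[i]
					odc++
					delete rem[hdr[3, i]]
					nrem--
				}
			if(nrem) {
				for(i in rem)
					printf("*Field heading \"%s\" not found.*\n", i)
				printf("** Processing aborted. **\n")
				exit 1
			}
			for(i = 1; i <= 3; i++) {
				oc = NF - odc
				for(j = 1; j <= NF; j++)
					if(!(j in od))
						printf("%s%s", hdr[i, j], (--oc) ? OFS : ORS)
			}
		}
		next
	}
	{	# Data lines: empty the removed fields, re-split, rebuild with OFS.
		for(i in od)
			$i = ""
		$0 = $0
		$1 = $1
		print
	}' "$1" "$2"
}

# Example (file names are placeholders):
#	remove_cols list.txt original_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt > out.txt
```

The FNR == NR test is only true while awk reads the first file, so an empty list file would make this version treat the whole data file as a list of titles and print nothing.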

Although tested using ksh (a version of ksh93 on OS X), this will work with recent versions of both bash and ksh.
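When an array is passed to awk with -v, the subscript matters. In both bash and ksh93, "${LIST_TO_REMOVE[@]}" expands like "$@" (one word per element) and "${LIST_TO_REMOVE[*]}" expands like "$*" (a single word, with the elements joined by the first character of IFS). The sketch below uses positional parameters so it runs in any POSIX sh; the helper count_args and the values are made up for illustration:

```shell
#!/bin/sh
# count_args (hypothetical helper): print how many arguments it received.
count_args() { echo "$#"; }

# Positional parameters stand in for the array here.
set -- dxv1 k2

# "$@" (like "${LIST_TO_REMOVE[@]}") yields one word per element, so only
# the first element gets the "del=" prefix and "k2" is a separate argument.
count_args -v del="$@"	# prints 3

# "$*" (like "${LIST_TO_REMOVE[*]}") joins the elements into one word.
count_args -v del="$*"	# prints 2
```

With the [@] form, awk would take k2 as its program text and the real awk program as the name of an input file.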
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.