Make copy of text file with columns removed (based on header)


 
# 8  
Old 01-21-2015
Quote:
Originally Posted by RudiC
Due to missing samples, the assumption was every record is spread over three lines, so the relevant values had to be removed in the third lines...

Just remove the !(NR%3) to remove the columns in every line.
Yes, I didn't provide quite enough information there.

I have tested a bit and this doesn't appear to remove the column from the first two rows. From row 3 to the end, it looks fine. I am trying to remove the entire column based on the value of row 3. Removal also includes the corresponding column in the first two rows.

For this example,
Code:
awk 'NR==3          {MX=split (RM, T, " ")
                     for (i=1; i<=NF; i++)
                         for (n=1; n<=MX; n++)
                            if ($i==T[n]) T[n]=i
                    }
                    {for (n=1; n<=MX; n++) $(T[n])=""
                     $0=$0; $1=$1
                    }
     1
    ' FS="\t+" OFS="\t" RM="AtR_Ptb_L" $BASE_INPUT_FILE > $REVISED_FILE

The input file $BASE_INPUT_FILE has 58 columns. The output file $REVISED_FILE has 57 columns from row 3 to the end, but the first two rows still have 58 columns. Is the issue that no match is found until the 3rd row, so the first two rows are printed as is?

I will need to use a bash variable to pass in a value for RM. This is looping and the value of RM will be changing. It could be a single value or several. Passing a bash array there like RM="${LIST_TO_REMOVE[@]}" seems to work for a single element, but seems to be a problem with more than one element on the list. If I convert the array to a space delimited string, then it works for one variable or more than one. What do you think the best method is here?

LMHmedchem
# 9  
Old 01-22-2015
On a recent Apple MacBook Pro, the following script:
Code:
#!/bin/ksh
if [ $# -gt 0 ]
then	rem_list="$@"
else	rem_list="dxv1 k2"
fi
awk -v del="$rem_list" '
BEGIN {	# Split del string into remhdr[] (indexed 1..nrem)...
	nrem = split(del, remhdr)
	# Create rem[] (indexed by titles of fields to be removed).
	for(i = 1; i <= nrem; i++) {
		rem[remhdr[i]]
		delete remhdr[i]
	}
	# Set OFS
	OFS = "\t"
}
NR <= 3 {
	# Read header lines into hdr[]...
	for(i = 1; i <= NF; i++)
		hdr[NR, i] = $i
	if(NR == 3) {
		# Create array of output fields to delete: od[]...
		for(i = 1; i <= NF; i++)
			if(hdr[3, i] in rem) {
				od[i]
				odc++
				delete rem[hdr[3,i]]
				nrem--
			}
		if(nrem) {
			for(i in rem)
				printf("*Field heading \"%s\" not found.*\n", i)
			printf("** Processing aborted. **\n");
			exit 1
		}
		# Print updated headers...
		for(i = 1; i <= 3; i++) {
			oc = NF - odc
			for(j = 1; j <= NF; j++)
				if(!(j in od))
					printf("%s%s", hdr[i, j],
						(--oc) ? OFS : ORS)
		}
	}
	next
}
{	# Print data lines...
	for(i in od)
		$i = ""
	$0 = $0
	$1 = $1
	print
}' original_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt

when invoked with no arguments or with the arguments dxv1 and k2 (in either order) on the files you uploaded in post #4 in this thread, produces output identical to the contents of the file intended_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt. The longest runtime from ten timed runs, with the script output redirected to a regular file, was:
Code:
real	0m0.07s
user	0m0.07s
sys	0m0.01s

and the fastest was:
Code:
real	0m0.06s
user	0m0.06s
sys	0m0.00s

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

If this is part of a larger script that puts the header strings to be deleted in an array as you did in your last post, change the first few lines:
Code:
if [ $# -gt 0 ]
then	rem_list="$@"
else	rem_list="dxv1 k2"
fi
awk -v del="$rem_list" '

to:
Code:
awk -v del="${LIST_TO_REMOVE[*]}" '

or modify the above awk script to read your list file and your data file.

Although tested using ksh (a version of ksh93 on OS X), this will work with recent versions of both bash and ksh.
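On the question of passing an array to awk: a minimal sketch, assuming bash and a made-up array (the header names are borrowed from this thread). With more than one element, "${LIST_TO_REMOVE[@]}" expands to several separate words, so awk sees only the first one as the value of del and takes the next word as its program text. "${LIST_TO_REMOVE[*]}" joins the elements into one string using the first character of IFS, a space by default, which split() can then take apart again.

```shell
#!/bin/bash
# Hypothetical list of headers to remove (names borrowed from this thread).
LIST_TO_REMOVE=(dxv1 k2 THBint5)

# [*] inside double quotes joins all elements into ONE word, separated by
# the first character of IFS (a space unless IFS has been changed).
joined="${LIST_TO_REMOVE[*]}"

# awk receives a single string and splits it back into the header names.
count=$(awk -v del="$joined" 'BEGIN { print split(del, names, " ") }')

echo "joined string : $joined"
echo "names in awk  : $count"
```

This is the same thing as converting the array to a space-delimited string by hand. It assumes that no header name contains a space and that IFS has its default value.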
# 10  
Old 01-22-2015
Adapted to your revised spec:
Code:
RM="RIexp dxv1 k2 THBint5"
awk     'function prep()
                        {for (n=1; n<=MX; n++) $(T[n])=""
                         $0=$0; $1=$1
                         print
                        }
         NR<3           {TMP[NR]=$0; next}
         NR==3          {MX=split (RM, T, " ")
                         for (i=1; i<=NF; i++)
                             for (n=1; n<=MX; n++)
                                 if ($i==T[n]) T[n]=i
                         SV=$0
                         for (j=1; j<3; j++)
                                {$0=TMP[j]
                                 prep()
                                }
                         $0=SV
                        }
                        {prep()}
        ' FS="\t+" OFS="\t" RM="$RM" /tmp/o.txt

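Should work with arrays as well, as long as the array is expanded to a single space-separated string, e.g. RM="${LIST_TO_REMOVE[*]}".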
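For anyone who wants to try the buffering idea on something small, here is a rough sketch using a made-up four-line file (three header rows, one data row). It is a variation rather than the script above: it builds each output line from the columns to keep instead of blanking fields and re-splitting, and it stops if a requested name is not in row 3. That guard matters for the blanking approach, because an unmatched name leaves a non-numeric index, and a non-numeric field index evaluates to 0, i.e. the whole record. The names held, drop, found and emit are illustrative only, and FS is a single tab here rather than "\t+".

```shell
#!/bin/bash
# Made-up sample: three header rows (column names in row 3) and one data row.
sample=$(mktemp)
printf 'u1\tu2\tu3\n'   > "$sample"
printf 't1\tt2\tt3\n'  >> "$sample"
printf 'id\tk2\tval\n' >> "$sample"
printf '1\t2\t3\n'     >> "$sample"

result=$(awk -v RM="k2" '
BEGIN { FS = OFS = "\t"; n = split(RM, want, " ") }

# Print the current record without the columns flagged in drop[].
function emit(   i, out, sep) {
    out = ""; sep = ""
    for (i = 1; i <= NF; i++)
        if (!(i in drop)) { out = out sep $i; sep = OFS }
    print out
}

NR < 3  { held[NR] = $0; next }        # buffer rows 1 and 2 until row 3 is seen

NR == 3 {
    # Map each requested header name to its column number.
    for (i = 1; i <= NF; i++)
        for (k = 1; k <= n; k++)
            if ($i == want[k]) { drop[i]; found[k] }
    # Stop if any requested name is missing from row 3.
    for (k = 1; k <= n; k++)
        if (!(k in found)) { print "not found: " want[k] > "/dev/stderr"; exit 1 }
    # Replay the buffered rows, then restore row 3 for the rule below.
    row3 = $0
    for (r = 1; r < 3; r++) { $0 = held[r]; emit() }
    $0 = row3
}

{ emit() }                             # row 3 and every data row
' "$sample")

rm -f "$sample"
echo "$result"
```

With RM="k2" the middle column is dropped from all four rows, including the two that were read before the header names were known.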