Make copy of text file with columns removed (based on header)

01-21-2015


Quote:

Originally Posted by RudiC

Due to missing samples, the assumption was every record is spread over three lines, so the relevant values had to be removed in the third lines...

Just remove the !(NR%3) to remove the columns in every line.

Yes, I didn't provide quite enough information there.

I have tested a bit and this doesn't appear to remove the column from the first two rows. From row 3 to the end, it looks fine. I am trying to remove the entire column based on the value of row 3. Removal also includes the corresponding column in the first two rows.

For this example,

Code:

awk 'NR==3          {MX=split (RM, T, " ")
                     for (i=1; i<=NF; i++)
                         for (n=1; n<=MX; n++)
                            if ($i==T[n]) T[n]=i
                    }
                    {for (n=1; n<=MX; n++) $(T[n])=""
                     $0=$0; $1=$1
                    }
     1
    ' FS="\t+" OFS="\t" RM="AtR_Ptb_L" $BASE_INPUT_FILE > $REVISED_FILE

The input file $BASE_INPUT_FILE has 58 columns. The output file $REVISED_FILE has 57 columns from row 3 to the end, but the first two rows still have 58 columns. Is the issue that no match is found until the 3rd row, so the first two rows are printed as is?

I will need to use a bash variable to pass in a value for RM. This runs in a loop and the value of RM will be changing. It could be a single value or several. Passing a bash array there like RM="${LIST_TO_REMOVE[@]}" seems to work for a single element, but fails when the list has more than one element. If I convert the array to a space-delimited string, then it works for one value or for several. What do you think the best method is here?

LMHmedchem


01-22-2015


On a recent Apple MacBook Pro, the following script:

Code:

#!/bin/ksh
if [ $# -gt 0 ]
then	rem_list="$@"
else	rem_list="dxv1 k2"
fi
awk -v del="$rem_list" '
BEGIN {	# Split del string into remhdr[] (indexed 1..nrem)...
	nrem = split(del, remhdr)
	# Create rem[] (indexed by titles of fields to be removed).
	for(i = 1; i <= nrem; i++) {
		rem[remhdr[i]]
		delete remhdr[i]
	}
	# Set OFS
	OFS = "\t"
}
NR <= 3 {
	# Read header lines into hdr[]...
	for(i = 1; i <= NF; i++)
		hdr[NR, i] = $i
	if(NR == 3) {
		# Create array of output fields to delete: od[]...
		for(i = 1; i <= NF; i++)
			if(hdr[3, i] in rem) {
				od[i]
				odc++
				delete rem[hdr[3,i]]
				nrem--
			}
		if(nrem) {
			for(i in rem)
				printf("*Field heading \"%s\" not found.*\n", i)
			printf("** Processing aborted. **\n");
			exit 1
		}
		# Print updated headers...
		for(i = 1; i <= 3; i++) {
			oc = NF - odc
			for(j = 1; j <= NF; j++)
				if(!(j in od))
					printf("%s%s", hdr[i, j],
						(--oc) ? OFS : ORS)
		}
	}
	next
}
{	# Print data lines...
	for(i in od)
		$i = ""
	$0 = $0
	$1 = $1
	print
}' original_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt

when invoked with no arguments, or with the arguments dxv1 and k2 (in either order), on the files you uploaded in post #4 in this thread, produces output identical to the contents of the file intended_f0_RSV_1912_A_S1v6_RI7_1916_15-01-10.txt. Timing the script ten times with its output redirected to a regular file, the longest runtime was:

Code:

real	0m0.07s
user	0m0.07s
sys	0m0.01s

and the fastest was:

Code:

real	0m0.06s
user	0m0.06s
sys	0m0.00s

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

If this is part of a larger script that puts the header strings to be deleted in an array as you did in your last post, change the first few lines:

Code:

if [ $# -gt 0 ]
then	rem_list="$@"
else	rem_list="dxv1 k2"
fi
awk -v del="$rem_list" '

to:

Code:

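awk -v del="${LIST_TO_REMOVE[*]}" '

Use [*] rather than [@] here: inside double quotes, [*] joins the elements into one space-separated word, whereas [@] passes each element as a separate argument, which breaks the awk command line as soon as the list has more than one element. Alternatively, modify the above awk script to read your list file and your data file.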

Although tested using ksh (a version of ksh93 on OS X), this will work with recent versions of both bash and ksh.
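To make the array question concrete, here is a small sketch of how the two quoted expansions differ. The helper count_args and the sample list contents are made up for illustration; only the name LIST_TO_REMOVE comes from this thread.

```shell
#!/bin/bash
# Hypothetical sample list; the array name is the one used in this thread.
LIST_TO_REMOVE=(dxv1 k2)

# count_args is an illustrative helper: it prints how many words it received.
count_args() { echo "$#"; }

count_args "${LIST_TO_REMOVE[@]}"    # prints 2: one word per element
count_args "${LIST_TO_REMOVE[*]}"    # prints 1: elements joined into one word

# Join once, then hand awk a single string that it can split itself.
joined="${LIST_TO_REMOVE[*]}"
awk -v del="$joined" 'BEGIN { n = split(del, a); print n, a[1], a[2] }'
# prints: 2 dxv1 k2
```

This is probably why converting the array to a space-delimited string first worked: awk then receives the whole list as a single value. The join uses the first character of IFS, so this assumes IFS is at its default and that no heading contains whitespace.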

Don Cragun


01-22-2015


Adapted to your revised spec:

Code:

RM="RIexp dxv1 k2 THBint5"
awk     'function prep()
                        {for (n=1; n<=MX; n++) $(T[n])=""
                         $0=$0; $1=$1
                         print
                        }
         NR<3           {TMP[NR]=$0; next}
         NR==3          {MX=split (RM, T, " ")
                         for (i=1; i<=NF; i++)
                             for (n=1; n<=MX; n++)
                                 if ($i==T[n]) T[n]=i
                         SV=$0
                         for (j=1; j<3; j++)
                                {$0=TMP[j]
                                 prep()
                                }
                         $0=SV
                        }
                        {prep()}
        ' FS="\t+" OFS="\t" RM="$RM" /tmp/o.txt

Should work with arrays as well, as long as the array is joined into a single string first, e.g. RM="${LIST_TO_REMOVE[*]}".
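Both scripts in this thread delete a column with the same awk idiom: blank the field, re-split the record, then rebuild it. The sketch below isolates that idiom on a made-up three-column sample. The helper name drop_col is hypothetical, and the header is taken from row 1 rather than row 3 to keep it short.

```shell
#!/bin/bash
# drop_col: hypothetical helper; $1 is the header name to drop, data on stdin.
drop_col() {
  awk -v FS='\t+' -v OFS='\t' -v name="$1" '
    # Find the column number whose heading matches (header in row 1 here).
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i }
    col     { $col = ""   # blank the field; this leaves two adjacent tabs
              $0 = $0     # re-split: FS "\t+" treats the tab run as one separator
              $1 = $1     # rebuild the record so fields are joined by OFS
            }
    1'
}

# Made-up sample data: three tab-separated columns, drop the one headed "b".
printf 'a\tb\tc\n1\t2\t3\n' | drop_col b
```

Because FS="\t+" collapses any run of tabs, this approach assumes the input has no empty fields of its own; an empty cell in the data would be dropped along with the targeted column.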

RudiC
