Need to remove columns from file at run time

10-16-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by Amit Joshi

Already given answer of Don Cragun's questions.

Make it generic means same code should work for any file where as file structure may differ and column number which need to drop may differ.

Let's be clear about how you answered my questions...

I asked what operating system and shell you're using. If someone asked me that question, I would say Apple OS X El Capitan Version 10.11 and ksh version 93u+ 2012-08-01. Your answer: "I am using Unix as OS." tells us that you may be using one of hundreds of releases of AIX, HP/UX, OS X, Solaris, or of several older operating systems and no indication at all of what shell nor version of the shell that you're using. (But, at least we do know that you are not using a Linux distribution and you are not using a BSD-based system.)

And, I asked what have you tried to solve this problem? Where are you stuck? Your answer: "I just thought of making it generic so that it should work for any file just we need to pass column number which we need to drop. so I do not have idea on how to make it generic thats why I have posted here as experts are here." I interpret that to mean you haven't tried anything and you want us to do the work for you.

You said you used awk as suggested by sam05121988, but sam05121988 suggested using cut; not awk. Do you have awk code that does what you want to do for a non-generic list of fields to be removed from a non-generic file? If so show us that and we can help you modify it to work with any CSV file named at run-time to remove a list of fields specified at run time. Please just show us that you have made some attempt to produce some of the output you say you want from the sample input you have shown us.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

10-16-2015

Registered User

18, 0

Join Date: Oct 2012

Last Activity: 10 November 2015, 6:44 AM EST

Posts: 18

Thanks Given: 6

Thanked 0 Times in 0 Posts

Apologies for this, Will take care in future.

Please find the details

Operating System

Code:

uname -a
Linux XXXXXXXX 2.6.32-220.25.1.el6.x86_64 #1 SMP Tue Aug 14 13:14:38 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Shell

Code:

echo $SHELL
/bin/ksh
echo $KSH_VERSION
Version JM 93t+ 2010-06-21

Code which I use to drop the column

Code:

cut -d, -f1 test1
awk -F, '{ print $1 }' test1
1
2
3
 
cut -d, -f1,3 test2
awk -F, '{ print $1 "," $3 }' test2
11,22
12,23

I am not able to make it generic because I am not sure about total number of column in file as it may vary so Just wanted to know is there any way or option in above command to mention specific column number in command to drop instead of giving column number to be printed.

I hope now I am clear with my words.

Amit Joshi

View Public Profile for Amit Joshi

Find all posts by Amit Joshi

10-16-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Perhaps something like this would be a good starting point:

Code:

#!/bin/ksh
IAm=${0##*/}
if [ $# -lt 2 ]
then	printf '%s: Not enough operands\n' "$IAm" >&2
	printf 'Usage: %s file field_to_delete...\n' "$IAm" >&2
	exit 1
fi
file="$1"
shift
printf '%s\n' "$@" | awk '
BEGIN {	FS = OFS = ","
}
FNR == NR {
	d[$1]
	next
}
{	ofs = ""
	for(i = 1; i <= NF; i++)
		if(!(i in d)) {
			printf("%s%s", ofs, $i)
			ofs = OFS
		}
	print ""
}' - "$file"

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

10-16-2015

Registered User

85, 7

Join Date: Jul 2011

Last Activity: 26 May 2016, 12:05 AM EDT

Location: Pune, India

Posts: 85

Thanks Given: 12

Thanked 7 Times in 7 Posts

perl solution

Code:

$ cat test2
11,aa,22,ss
12,ss,23,ff
11,aa,22,ss
12,ss,23,ff

Script and usage

Code:

$ perl drop_columns.pl
Usage: drop_columns.pl -d:<delim> -f:fields -file:filename

Script Code

Code:

#!/usr/bin/perl -w

use strict;

unless(@ARGV){
print "Usage: $0 -d:<delim> -f:fields -file:filename\n";
exit 1;
}

my ($file) = grep { $_ =~ /-file/ } @ARGV;
($file) = (split(/:/,$file))[1];
$file =~ s/\"\'//g;

my ($delim) = grep { $_ =~ /-d/ } @ARGV;
($delim) = (split(/:/,$delim))[1];
$delim =~ s/\"\'//g;

my ($fields) = grep { $_ =~ /-f/ } @ARGV;
($fields) = (split(/:/,$fields))[1];
$fields =~ s/\"\'//g;
$fields =~ s/\W/|/g;

open( my $rfh, '<', "$file" ) || die "Can't read $file, $!";
while(<$rfh>){
        chomp;
        my @line = split(/$delim/,$_,-1);
        my $out = '';
        for ( my $i=0; $i<@line; $i++ ) {
                my $x = $i + 1;
                $out .= $line[$i].$delim        unless ( $x =~ /^($fields)$/ )
        }
        $out =~ s/$delim$/\n/;
        print "$out";
}

Output

Code:

$ perl drop_columns.pl -d:',' -f:2,4 -file:test2
11,22
12,23
11,22
12,23

njoy

sam05121988

View Public Profile for sam05121988

Find all posts by sam05121988

10-16-2015

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

For the fun of it, a bashism:

Code:

OIFS=$IFS
IFS=","
FD=(2 4)
read -a ARR < test2
ARR[${#ARR[@]}]=XX 
for i in 0 ${FD[@]}; do unset ARR[$i]; done
cut -d, -f"${!ARR[*]}" test2
IFS=$OIFS

RudiC

View Public Profile for RudiC

Find all posts by RudiC

10-16-2015

Registered User

18, 0

Join Date: Oct 2012

Last Activity: 10 November 2015, 6:44 AM EST

Posts: 18

Thanks Given: 6

Thanked 0 Times in 0 Posts

Thanks Don

, Solution is working fine but Need clarification for below

IAm=${0##*/} - I have understand that it is something like $0 but here we are removing leading string if it matches with pattern */

printf '%s\n' "$@" - It is converting horizontal argument to verital(like Column to Rows) but It would be helpful if you can explain how awk works after passing these values to awk.

I have tried to understand with manual in filesystem but could not understand.

Amit Joshi

View Public Profile for Amit Joshi

Find all posts by Amit Joshi

10-16-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by Amit Joshi

Thanks Don Smilie

It looks like you're doing well.

The script sets IAm to the basename of the name of the script. The shell expands $0 to the pathname used to invoke your script. Let's assume your script is named removecols and is located in your $HOME/bin directory. If you invoke your command with the command line:

Code:

$HOME/bin/removecols myfile.csv 4 2

the script sets IAm by throwing away everything from the beginning of the string through the last / character leaving just removecols as the string to be substituted in the usage message. And, if $HOME/bin is in your PATH variable and you invoke your script using the command:

Code:

removecols file1 2

The shell sets $0 to removecols when it starts and IAm gets the same string since there are no / characters in the expansion of $0.

The shift command throws away the 1st positional parameter after file=$1 saves the name of the file to be processed that you passed to your script as the 1st positional parameter. So, after the shift, all of the remaining positional parameters are the numbers of the fields you want to delete. And "$@" expands to a list of each positional parameter as a quoted string with each parameter treated as a separate argument. So, for the 1st command-line shown above, the shell executes the command:
printf '%s\n' "4" "2"
which produces the output:

Code:

4
2

which is fed as standard input through the pipeline into the awk script. And the - as the 2nd operand passed to the awk script tells it to read its standard input as its 1st input file.
The awk code:

Code:

BEGIN {	FS = OFS = ","
}

tells awk to set the input field separator (FS) and the output field separator (OFS) to a comma before reading any lines from any input files.
The awk code:

Code:

FNR == NR {
	d[$1]
	next
}

says that for each line where the line # in the current input file (FNR) is the same as the line # of all lines read (NR) (which is true only for lines read from the 1st input file) create an array element with the value in the 1st field on that line as a subscript (d[$1]) and the awk next command tells the script to read the next input line and start again.

For lines read from the second input file (where FNR is not the same value as NR) we execute the remaining commands in the awk script:

Code:

{	ofs = ""
	for(i = 1; i <= NF; i++)
		if(!(i in d)) {
			printf("%s%s", ofs, $i)
			ofs = OFS
		}
	print ""
}

which sets ofs to an empty string (note that awk variable names are case sensitive). Then for each field in the current input line starting with the first field (i = 1) and continuing while i is less than the number of fields in that line (NF), and incrementing i by one (i++) each time through the loop, it checks to see if the field number is a subscript in the array containing being used as the list of fields to delete ((i in d)) and if it is not (!), then it prints the current value of ofs followed by the current field ($i) and then sets ofs to the current output field separator (OFS). (So the first field printed will not have a field separator printed before it and all fields printed after that will print a comma as a field separator. After the loop completes the awk command:

Code:

print ""

prints a trailing <newline> character.

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

Shell Programming and Scripting

Need to remove columns from file at run time

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove blank columns from a tab delimited text file

Discussion started by: LMHmedchem

2. Shell Programming and Scripting

Remove Duplicates on multiple Key Columns and get the Latest Record from Date/Time Column

Discussion started by: vijaykodukula

3. Shell Programming and Scripting

Remove few columns from pipe delimited file

Discussion started by: greenworld123

4. Shell Programming and Scripting

How to take the file name in run time using shell.?

Discussion started by: praveen265

5. Shell Programming and Scripting

remove white space from specific columns in text file

Discussion started by: shelladdict

6. Shell Programming and Scripting

Remove duplicate columns in input file

Discussion started by: linux_usr

7. Shell Programming and Scripting

Search based on 1,2,4,5 columns and remove duplicates in the same file.

Discussion started by: onesuri

8. Shell Programming and Scripting

Single command for add 2 columns and remove 2 columns in unix/performance tuning

Discussion started by: onesuri

9. Shell Programming and Scripting

How to run this program with only one input file at a time

Discussion started by: Prat007

10. Shell Programming and Scripting

Remove lines, Sorted with Time based columns using AWK & SORT

Discussion started by: karthikn7974