Filtering first file columns based on second file column

10-15-2012

Hi friends,

I have a CSV file like the one below.

Code:

SNo,data1,data2
1,1,2
2,2,3
3,3,2

and another file like below.

Code:

Exclude
data1

Here, Exclude is the column header of the second file, and each line under it names a column to remove from the first file.
I want the output shown below.

Code:

SNo,data2
1,2
2,3
3,2

The data1 column has been removed from the first file because it is listed in the second file.
In reality I have thousands of columns in file 1 and want to remove some of them by updating the list in the second file.
I have a one-line R solution, but loading the first file and writing the result to another file takes a lot of time and memory.

Regards
Sidda

ks_reddy

10-15-2012

Not sure what you have tried...

The following will help find the column to be excluded:

Code:

$ echo sn,data1,data2 | tr "," "\n" | cat
sn
data1
data2

$ echo sn,data1,data2 | tr "," "\n" | cat -n
     1  sn
     2  data1
     3  data2

$ echo sn,data1,data2 | tr "," "\n" | cat -n | grep "data1"
     2  data1

$ echo sn,data1,data2 | tr "," "\n" | cat -n | grep "data1" | cut -f1
     2
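
One caveat with the pipeline above: a plain grep "data1" would also match headers such as data10. A minimal sketch of the same lookup against the real header line, using an exact match; the function name colnum is hypothetical and not from this thread:

```shell
# Hypothetical helper: print the 1-based position of an exact header name
# in the first line of a comma-separated file.
colnum() {
    # $1 = header name, $2 = CSV file
    # -x matches the whole line, -F treats the name as a fixed string,
    # -n prefixes the match with its line number (here: the column number).
    head -n 1 "$2" | tr ',' '\n' | grep -n -x -F -- "$1" | cut -d: -f1
}
```

For example, colnum data1 file1.csv (file1.csv being an assumed name for the first file) should print 2 for the sample data in the question. It prints nothing when the header is not present.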

Now, to exclude a column, you can see if your 'cut' command recognizes the --complement option. Something like:

Code:

cut -d, -f2 --complement sample1.txt

Or...

Code:

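# Blanks field 2 but keeps its separator, so a line such as 1,1,2 becomes 1,,2
awk 'BEGIN{FS = OFS = ","} {$2 = ""; print}' sample1.txt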
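
The two steps above (look up the column numbers, then cut them out) can be combined. The following is only a sketch under some assumptions: the function name drop_columns is made up for illustration, cut must support the GNU --complement option, and fields must not contain quoted commas. It also expects the exclude file to have at least its heading line.

```shell
# Hypothetical helper: drop_columns DATA EXCLUDE
# Prints DATA without the columns whose header names are listed in EXCLUDE
# (one name per line, after a heading line that is ignored).
drop_columns() {
    data=$1 excl=$2
    # Build a comma-separated list of the field numbers to remove.
    cols=$(awk -F, '
        NR == FNR { if (FNR > 1) skip[$1]; next }  # exclude file, minus its heading
        {                                          # first line of the data file
            for (i = 1; i <= NF; i++)
                if ($i in skip) list = list (list == "" ? "" : ",") i
            print list
            exit
        }' "$excl" "$data")
    if [ -n "$cols" ]; then
        cut -d, --complement -f"$cols" "$data"     # --complement is a GNU extension
    else
        cat "$data"                                # nothing to remove
    fi
}
```

Only the header line goes through awk here; the bulk of the data is handled by cut, which may help with files that have thousands of columns. Usage would be something like drop_columns file1.csv file2.txt, where both file names are assumptions.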

joeyg

10-15-2012

If you save the following in a file named dropheaders and make it executable, have a file named input that contains the data, and a file named exclude that contains the list of headers to skip:

Code:

#!/bin/ksh
# Usage: dropheaders [data [exclude]]
awk 'BEGIN{FS = OFS = ","}
dbg{    printf("FILENAME=%s, FNR=%d, NR=%d, NF=%d, $0=\"%s\"\n",
                FILENAME, FNR, NR, NF, $0)
}
FNR==1{ if(dbg) printf("%s file header with %d fields: %s\n", FILENAME, NF, $0)
        if(FNR==NR) {
                efn = FILENAME # Save filename of exclude file for diagnostics.
                next
        }
        # Determine which fields to skip from headers in the data file.
        for(i = 1; i <= NF; i++) if($i in skiphdr) {
                sf[i]
                delete skiphdr[$i]
                if(dbg) printf("Field %d added to sf[] for header %s.\n", i, $i)
        }
        first = 1
        for(i in skiphdr) {
                if(first) {
                        first = 0
                        printf("File %s will not be processed because:\n",
                                FILENAME)
                }
                printf("\theader \"%s\" in exclude file (%s) was not found\n",
                        i, efn)
        }
        if(first == 0) exit 1
}
FNR==NR{# gather names of columns to be skipped from exclude (1st) file
        skiphdr[$1]
        if(dbg) printf("%s added to skiphdr\n", $1)
        next
}
{       sep = ""
        for(i = 1; i <= NF; i++)
                if(!(i in sf)) {
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' "${2:-exclude}" "${1:-input}"

should do what you want just by entering the command:

Code:

dropheaders

If your data and exclude files have different names, use:

Code:

dropheaders data_file_name exclude_file_name

Don Cragun
