Filtering first file columns based on second file column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Filtering first file columns based on second file column
# 1  
Old 10-15-2012
Filtering first file columns based on second file column

Hi friends,

I have one file like below. (.csv type)
Code:
SNo,data1,data2
1,1,2
2,2,3
3,3,2

and another file like below.
Code:
Exclude
data1

where Exclude should be treated as column name in file2.
I want the output shown below.
Code:
SNo,data2
1,2
2,3
3,2

Where my data1 column got removed from my first file as it is mentioned in next file.
In reality i have 1000's of columns in my file 1 and want to remove some of them by keep updating in second file.
I got a one liner R code, but to load the first file and writing the result into another file is taking lot of time as well as more memory commit.

Regards
Sidda
# 2  
Old 10-15-2012
Not sure what you have tried...

The following will help find the column to be excluded:

Code:
$ echo sn,data1,data2 | tr "," "\n" | cat
sn
data1
data2

$ echo sn,data1,data2 | tr "," "\n" | cat -n
     1  sn
     2  data1
     3  data2

$ echo sn,data1,data2 | tr "," "\n" | cat -n | grep "data1"
     2  data1

$ echo sn,data1,data2 | tr "," "\n" | cat -n | grep "data1" | cut -f1
     2

Now, to exclude a column, you can see if your 'cut' command recognizes the --complement option. Something like:
Code:
cut -f2 -- complement sample1.txt

Or...
Code:
 awk '{$2=""; print}' sample1.txt

# 3  
Old 10-15-2012
If you save the following in a file named dropheaders and make it executable, have a file named input that contains the data, and a file named exclude that contains the list of headers to skip:
Code:
#!/bin/ksh
# Usage: drophdrs [data [exclude]]
awk 'BEGIN{FS = OFS = ","}
dbg{    printf("FILENAME=%s, FNR=%d, NR=%d, NF=%d, $0=\"%s\"\n",
                FILENAME, FNR, NR, NF, $0)
}
FNR==1{ if(dbg) printf("%s file header with %d fields: %s\n", FILENAME, NF, $0)
        if(FNR==NR) {
                efn = FILENAME # Save filename of exclude file for diagnostics.
                next
        }
        # Determine which fields to skip from headers in the data file.
        for(i = 1; i <= NF; i++) if($i in skiphdr) {
                sf[i]
                delete skiphdr[$i]
                if(dbg) printf("Field %d added to sf[] for header %s.\n", i, $i)
        }
        first = 1
        for(i in skiphdr) {
                if(first) {
                        first = 0
                        printf("File %s will not be processed because:\n",
                                FILENAME)
                }
                printf("\theader \"%s\" in exclude file (%s) was not found\n",
                        i, efn, FILENAME)
        }
        if(first == 0) exit 1
}
FNR==NR{# gather names of columns to be skipped from exclude (1st) file
        skiphdr[$1]
        if(dbg) printf("%s added to skiphdr\n", $1)
        next
}
{       sep = ""
        for(i = 1; i <= NF; i++)
                if(!(i in sf)) {
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' ${2:-exclude} ${1:-input}

should do what you want just by entering the command:
Code:
dropheaders

If your data and exclude files have different names, use:
Code:
dropheaders data_file_name exclude_file_name

This User Gave Thanks to Don Cragun For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Filtering records of a csv file based on a value of a column

Hi, I tried filtering the records in a csv file using "awk" command listed below. awk -F"~" '$4 ~ /Active/{print }' inputfile > outputfile The output always has all the entries. The same command worked for different users from one of the forum links. content of file I was... (3 Replies)
Discussion started by: sunilmudikonda
3 Replies

2. UNIX for Beginners Questions & Answers

Filtering based on column values

Hi there, I am trying to filter a big file with several columns using values on a column with values like (AC=5;AN=10;SF=341,377,517,643,662;VRT=1). I wont to filter the data based on SF= values that are (bigger than 400) ... (25 Replies)
Discussion started by: daashti
25 Replies

3. Shell Programming and Scripting

Replacing 12 columns of one file by second file based on mapping in third file

i have a real data prod file with 80+ fields containing 1k -2k records. i have to extract say 12 columns out of this which are sensitive fields along with one primary key say SEQ_ID (like DOB,account no, name, SEQ_ID, govtid etc) in a lookup file. i have to replace these sensitive fields in... (11 Replies)
Discussion started by: megh12
11 Replies

4. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv ID,Name,no 1,AAA,1 2,BBB,1 3,AAA,1 4,BBB,1 cut -d',' -f2 sample.csv | sort | uniq this gives only the 2nd column values Name AAA BBB How to I get all the columns of CSV along with this? (1 Reply)
Discussion started by: sanvel
1 Replies

5. Shell Programming and Scripting

Filtering lines for column elements based on corresponding counts in another column

Hi, I have a file like this ACC 2 2 21 aaa AC 443 3 22 aaa GCT 76 1 33 xxx TCG 34 2 33 aaa ACGT 33 1 22 ggg TTC 99 3 44 wee CCA 33 2 33 ggg AAC 1 3 55 ddd TTG 10 1 22 ddd TTGC 98 3 22 ddd GCT 23 1 21 sds GTC 23 4 32 sds ACGT 32 2 33 vvv CGT 11 2 33 eee CCC 87 2 44... (1 Reply)
Discussion started by: polsum
1 Replies

6. UNIX for Dummies Questions & Answers

Filtering records from 1 file based on some manipulation doen on second file

Hi, I am looking for an awk script which should help me to meet the following requirement: File1 has records in following format INF: FAILEd RECORD AB1234 INF: FAILEd RECORD PQ1145 INF: FAILEd RECORD AB3215 INF: FAILEd RECORD AB6114 ............................ (2 Replies)
Discussion started by: mintu41
2 Replies

7. Shell Programming and Scripting

Filtering issues with multiple columns in a single file

Hi, I am new to unix and would greatly appreciate some help. I have a file containing multiple colums containing different sets of data e.g. File 1: John Ireland 27_December_69 Mary England 13_March_55 Mike France 02_June_80 I am currently using the awk... (10 Replies)
Discussion started by: crunchie
10 Replies

8. Shell Programming and Scripting

filtering one file based on results from other- AGAIN

I have asked this question here before and got the answer too. Unfortunately I used only one record as an example and the script works fine for one record but not for more than one record. Can anybody help me with writing a script for the data that I want to use from one file based on the... (13 Replies)
Discussion started by: digipak
13 Replies

9. Shell Programming and Scripting

filtering one file based on results from other

Can anybody help me with writing a script for the data that I want to use from one file based on the data from another file. I have file1 in this form; (the first field represents a well name and the second field represents the depth of interest) FILE1 -------- DATA_35_0 ... (2 Replies)
Discussion started by: digipak
2 Replies

10. UNIX for Dummies Questions & Answers

Filtering records of a file based on a value of a column

Hi all, I would like to extract records of a file based on a condition. The file contains 47 fields, and I would like to extract only those records that match a certain value in one of the columns, e.g. COL1 COL2 COL3 ............... COL47 1 XX 45 ... (4 Replies)
Discussion started by: risk_sly
4 Replies
Login or Register to Ask a Question