Filtering first file columns based on second file column Post: 302715853

Post 302715853 by Don Cragun on Monday 15th of October 2012 01:55:13 PM

If you save the following in a file named dropheaders, make it executable, and have a file named input that contains the data and a file named exclude that contains the list of headers to skip:

Code:

#!/bin/ksh
# Usage: dropheaders [data [exclude]]
awk 'BEGIN{FS = OFS = ","}
dbg{    printf("FILENAME=%s, FNR=%d, NR=%d, NF=%d, $0=\"%s\"\n",
                FILENAME, FNR, NR, NF, $0)
}
FNR==1{ if(dbg) printf("%s file header with %d fields: %s\n", FILENAME, NF, $0)
        if(FNR==NR) {
                # The first line of the exclude (1st) file is skipped as a header line.
                efn = FILENAME # Save filename of exclude file for diagnostics.
                next
        }
        # Determine which fields to skip from headers in the data file.
        for(i = 1; i <= NF; i++) if($i in skiphdr) {
                sf[i]
                delete skiphdr[$i]
                if(dbg) printf("Field %d added to sf[] for header %s.\n", i, $i)
        }
        first = 1
        for(i in skiphdr) {
                if(first) {
                        first = 0
                        printf("File %s will not be processed because:\n",
                                FILENAME)
                }
                printf("\theader \"%s\" in exclude file (%s) was not found\n",
                        i, efn)
        }
        if(first == 0) exit 1
}
FNR==NR{# gather names of columns to be skipped from exclude (1st) file
        skiphdr[$1]
        if(dbg) printf("%s added to skiphdr\n", $1)
        next
}
{       sep = ""
        for(i = 1; i <= NF; i++)
                if(!(i in sf)) {
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' "${2:-exclude}" "${1:-input}"

then the script should do what you want when you enter the command:

Code:

dropheaders
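
The script tells the exclude file from the data file with the common awk test FNR==NR. The following is only a small sketch of that test, not part of the script above; the file names first and second and their contents are made up for illustration:

```shell
#!/bin/sh
# Hypothetical two-line files, created only to show how NR and FNR differ.
dir=$(mktemp -d) || exit 1
printf '%s\n' 'a' 'b' > "$dir/first"
printf '%s\n' 'c' 'd' > "$dir/second"

# NR counts records across all input files; FNR restarts at 1 for each file,
# so FNR == NR holds only while awk is reading the first file.
trace=$(cd "$dir" && awk '{
        printf("%s FNR=%d NR=%d %s\n", FILENAME, FNR, NR,
                (FNR == NR) ? "first-file" : "later-file")
}' first second)
rm -rf "$dir"

printf '%s\n' "$trace"
```

Note that the test assumes the first file is not empty: with an empty first file, FNR==NR would also be true while the second file is being read.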

If your data and exclude files have different names, use:

Code:

dropheaders data_file_name exclude_file_name
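
To see what the filtering does, here is a condensed sketch of the same technique run against made-up sample files. The column names, the data, and the word "header" on the first line of the exclude file are assumptions for illustration only; the debugging output and the check for headers that are missing from the data file are left out:

```shell
#!/bin/sh
# Hypothetical sample data; the file contents and column names are made up.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Data file: a header line followed by records.
printf '%s\n' 'id,name,city,age' '1,Ann,Oslo,31' '2,Bob,Rome,45' > "$tmp/input"

# Exclude file: line 1 is a header line that is skipped, then one column name per line.
printf '%s\n' 'header' 'city' 'age' > "$tmp/exclude"

# Exclude file is read first, data file second.
filtered=$(awk 'BEGIN { FS = OFS = "," }
FNR == NR { if (FNR > 1) skiphdr[$1]; next }    # 1st file: collect column names
FNR == 1  { for (i = 1; i <= NF; i++)           # data header: note the field
                if ($i in skiphdr) sf[i] }      # numbers of the named columns
{       sep = ""
        for (i = 1; i <= NF; i++)
                if (!(i in sf)) {               # print only fields not in sf[]
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' "$tmp/exclude" "$tmp/input")

printf '%s\n' "$filtered"
```

With this sample data the city and age columns are dropped, leaving the lines id,name then 1,Ann then 2,Bob.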
