Filtering first file columns based on second file column
Post 302715853 by Don Cragun on Monday 15th of October 2012 01:55:13 PM
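If you save the following in a file named dropheaders and make it executable, have a comma-separated file named input that contains the data (with the column headers on its first line), and have a file named exclude that lists the headers to skip (one per line, below a heading line of its own that the script ignores):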
Code:
#!/bin/ksh
# Usage: dropheaders [data [exclude]]
# To trace processing, set dbg to a non-zero value (e.g., add -v dbg=1 to the awk command).
awk 'BEGIN{FS = OFS = ","}
dbg{    printf("FILENAME=%s, FNR=%d, NR=%d, NF=%d, $0=\"%s\"\n",
                FILENAME, FNR, NR, NF, $0)
}
FNR==1{ if(dbg) printf("%s file header with %d fields: %s\n", FILENAME, NF, $0)
        if(FNR==NR) {
                efn = FILENAME # Save filename of exclude file for diagnostics.
                next
        }
        # Determine which fields to skip from headers in the data file.
        for(i = 1; i <= NF; i++) if($i in skiphdr) {
                sf[i]
                delete skiphdr[$i]
                if(dbg) printf("Field %d added to sf[] for header %s.\n", i, $i)
        }
        first = 1
        for(i in skiphdr) {
                if(first) {
                        first = 0
                        printf("File %s will not be processed because:\n",
                                FILENAME)
                }
                printf("\theader \"%s\" in exclude file (%s) was not found in %s\n",
                        i, efn, FILENAME)
        }
        if(first == 0) exit 1
}
FNR==NR{# gather names of columns to be skipped from exclude (1st) file
        skiphdr[$1]
        if(dbg) printf("%s added to skiphdr\n", $1)
        next
}
{       sep = ""
        for(i = 1; i <= NF; i++)
                if(!(i in sf)) {
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' "${2:-exclude}" "${1:-input}"

then it should do what you want when you enter the command:
Code:
dropheaders
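
To make the expected behaviour concrete, here is a minimal, self-contained sketch of the same filtering idea run on made-up sample files. The file contents, the scratch directory, and the condensed awk program are assumptions for illustration only; this is not the dropheaders script itself and it leaves out the debug output and the error reporting.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'id,name,age,city' '1,Ann,30,Oslo' '2,Bob,25,Rome' > "$tmp/input"
# The first line of the exclude file is a heading and is ignored.
printf '%s\n' 'header' 'age' 'city' > "$tmp/exclude"

# Condensed version of the filtering logic (no debug or error reporting).
out=$(awk 'BEGIN { FS = OFS = "," }
FNR == NR {                      # first file: names of the columns to drop
        if (FNR > 1) skiphdr[$1]
        next
}
FNR == 1 {                       # data header: record which field numbers to drop
        for (i = 1; i <= NF; i++) if ($i in skiphdr) sf[i]
}
{       sep = ""
        for (i = 1; i <= NF; i++)
                if (!(i in sf)) {
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' "$tmp/exclude" "$tmp/input")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With these sample files the age and city columns are dropped, so the three lines printed are id,name then 1,Ann then 2,Bob. Note that the exclude file is given to awk first, so the column names are known before the data header is read.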

If your data and exclude files have different names, use:
Code:
dropheaders data_file_name exclude_file_name
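
The script also refuses to process the data file when a name in the exclude file matches none of the data file's headers, and exits with status 1. The sketch below shows that check in isolation, under stated assumptions: the file names data.csv and drop.txt, their contents, and the shortened message text are hypothetical, and the awk program is a condensed stand-in rather than the script above.

```shell
# Hypothetical files: "salary" is listed for removal but is not a column in the data.
tmp=$(mktemp -d)
printf '%s\n' 'id,name,age' '1,Ann,30' > "$tmp/data.csv"
printf '%s\n' 'header' 'age' 'salary' > "$tmp/drop.txt"

# Capture both the diagnostic text and the exit status of awk.
if msg=$(awk 'BEGIN { FS = OFS = "," }
FNR == NR {                      # first file: names of the columns to drop
        if (FNR > 1) want[$1]
        next
}
FNR == 1 {                       # data header: tick off every name that is found
        for (i = 1; i <= NF; i++) if ($i in want) {
                sf[i]
                delete want[$i]
        }
        bad = 0
        for (h in want) {        # anything left was never seen in the header
                printf("header \"%s\" was not found\n", h)
                bad = 1
        }
        if (bad) exit 1
}
{       sep = ""
        for (i = 1; i <= NF; i++)
                if (!(i in sf)) {
                        printf("%s%s", sep, $i)
                        sep = OFS
                }
        printf("\n")
}' "$tmp/drop.txt" "$tmp/data.csv")
then
        status=0
else
        status=$?
fi

printf 'status=%s\n%s\n' "$status" "$msg"
rm -rf "$tmp"
```

Because the check runs while the data header is being read, no data lines are printed before the failure is reported, and a calling script can test the exit status before using the output.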
