Reading columns, making a new file using another as template
Post 302606188 by Corona688 on Friday 9th of March 2012 07:56:26 PM
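This is probably overkill, but I wrote it when I had 500 megabytes of extremely messy flatfiles to merge and sort. It ought to be reliable, if not fast, and tolerant of things like missing columns. The first file's header line sets the column order; every later file's columns are matched against it by name.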

Code:
$ cat col.awk

# Set up input and output separators
BEGIN { FS=","  ;       OFS="," }

# First line in a file?  Figure out what our columns are.
FNR == 1 {
        if(NR==1) # Very first line in very first file
        {
                COLMAX=NF
                # Mark down the contents of all the columns
                for(N=1; N<=NF; N++)
                {
                        ORDER[N]=$N     # Name of output column N
                        REORDER[N]=N    # First file is already in order
                }
                print # Print columns
                next # Go to next line
        }

        # First line in the second/third/fourth file?  Find out how we need to reorder.

        # Delete old columns
        for(X in REORDER)       delete REORDER[X];

        # Match field M against column N.
        for(N=1; N<=COLMAX; N++)
        {
                for(M=1; M<=NF; M++)
                if($M == ORDER[N])
                {
                        REORDER[N]=M;
                        break;
                }

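                if(!REORDER[N])
                {
                        # Column N is missing from this file: warn, then point it
                        # past the end of the header so it normally prints as empty.
                        print "Couldn't find " ORDER[N] " in " FILENAME >"/dev/stderr";
                        REORDER[N]=NF+10;
                }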
        }

        # Header lines of later files are never printed
        next;
}

# Reorder all input
{
        split($0, ZZT, FS);

        PFIX=""
        STR=""
        for(N=1; N<=COLMAX; N++)
        {
                STR=STR PFIX ZZT[REORDER[N]];
                PFIX=OFS # Separator goes before every field except the first
        }

        $0=STR
}

1 # Print all other lines

$ awk -f col.awk columnfile data

interacao,AspAsp,AspCys,CysAsp,CysCys,classe
beta_alfa,DA,DD,CA,CD,ppi

$
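To try the idea without real data, here is a minimal self-contained sketch. The contents of columnfile and data are not shown above, so the sample files below are made up to reproduce that output, and partial is an extra hypothetical file that lacks one column. The reorder function is a condensed variant of the same header-matching idea (it maps column names to positions), not the col.awk script itself.

```shell
#!/bin/sh
# Sketch only: file names and contents below are made-up sample data.
tmp=$(mktemp -d)

# Template file: a header line that fixes the output column order.
printf '%s\n' 'interacao,AspAsp,AspCys,CysAsp,CysCys,classe' > "$tmp/columnfile"

# Data file with the same columns in a different order.
printf '%s\n' 'classe,CysCys,interacao,AspCys,AspAsp,CysAsp' \
              'ppi,CD,beta_alfa,DD,DA,CA' > "$tmp/data"

# Data file that lacks the CysAsp column entirely.
printf '%s\n' 'interacao,classe,AspAsp,AspCys,CysCys' \
              'alfa_beta,ppi,DA,DD,CD' > "$tmp/partial"

# reorder FILE...: condensed variant of the header-matching idea.
reorder() {
    awk '
    BEGIN { FS = OFS = "," }

    # Every header line: remember where each column name sits in this file.
    FNR == 1 {
        split("", pos)                          # forget the previous header
        for (i = 1; i <= NF; i++) pos[$i] = i
    }

    # Header of the first file: it defines the target order; print it once.
    NR == 1 {
        ncol = NF
        for (i = 1; i <= NF; i++) name[i] = $i
        print
        next
    }

    # Header of a later file: report missing columns, print nothing.
    FNR == 1 {
        for (i = 1; i <= ncol; i++)
            if (!(name[i] in pos))
                print "missing column " name[i] " in " FILENAME > "/dev/stderr"
        next
    }

    # Data line: emit fields in target order, empty where a column is missing.
    {
        split($0, f, FS)
        line = ""
        for (i = 1; i <= ncol; i++)
            line = line (i > 1 ? OFS : "") ((name[i] in pos) ? f[pos[name[i]]] : "")
        print line
    }
    ' "$@"
}

reorder "$tmp/columnfile" "$tmp/data" "$tmp/partial"
```

With these sample files the first two output lines match the output shown above, and the row from partial comes out as alfa_beta,DA,DD,,CD,ppi with an empty CysAsp field, while a "missing column CysAsp" warning goes to stderr. Looking names up in an array avoids the nested loop, at the cost of assuming header names are unique within a file.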

 
