awk script to (un)/concatenate fields in file


 
# 1  
Old 03-22-2010

Hi everyone,

I'm trying to use the "join" command on more than one field. Since join only accepts a single join field per file, I want to take my input files and concatenate the joining fields into one field (separated by "|"). I wrote two awk scripts to do and undo this (see below). However, I'm new to awk and I'm sure it could be done much more efficiently.

I found various topics around the question but often the syntax proposed is a bit of a mystery to me. For instance someone posted this:

BEGIN{FS=OFS="\t"}NR==FNR{a[$1$2]=$4;b[$1$2]=$5;c[$1$2]=$6;next}{$4=$4-a[$1$2];$5=$5-b[$1$2];$6=$6-c[$1$2]}1

What does the trailing '1' mean? Why are there two separate {} blocks, and what distinguishes them? Finally, where can I find documentation on this kind of question (googling "awk trailing digit" didn't help me much!!)?
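For readers with the same question, here is a minimal unrolled sketch of the quoted one-liner. An awk program is a list of `pattern { action }` rules: a rule without a pattern runs on every line, and a rule without an action prints the line. The wrapper name `subtract_cols` and its two file arguments are assumptions made for illustration, and plain `awk` is assumed in place of `nawk`.

```shell
# subtract_cols is a hypothetical name; it takes two tab-separated files.
subtract_cols() {
  awk '
    BEGIN { FS = OFS = "\t" }

    # Rule 1: the pattern NR==FNR holds only while the first file is read.
    # Remember columns 4-6 under the key $1$2, then skip the other rules.
    NR == FNR { a[$1 $2] = $4; b[$1 $2] = $5; c[$1 $2] = $6; next }

    # Rule 2: no pattern, so it runs on every line that gets this far,
    # which means the lines of the second file.
    { $4 = $4 - a[$1 $2]; $5 = $5 - b[$1 $2]; $6 = $6 - c[$1 $2] }

    # Rule 3: the trailing 1 is a pattern that is always true and has no
    # action, so the default action (print the line) is applied.
    1
  ' "$1" "$2"
}
# usage: subtract_cols first.tsv second.tsv
```

The "Patterns and Actions" chapter of the gawk manual and the POSIX awk specification both describe these rules.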

Here are my scripts. I don't care much about syntax shortcuts; I only care about speed of execution!

Any help would be greatly appreciated.

to concatenate:

Code:
#!/bin/sh
#
# usage:
#     nawk -F$'\t' -v JF=3,5 -f concatene.awk ~/tmp/tmp15
#     nawk -F$'\t' -v JF=15,16,17,18 -f concatene.awk split/snp_j > concat
#
# JF stands for "join fields"
BEGIN { FS="\t";OFS="\t" }
{ 
    if (NR==1) {    # to do it only once (NR starts at 1)
        N=split(JF,JFS,",");
        for (i=1;i<=N;i++) {    # reverse it
            RJFS[JFS[i]] = i;
        }
    }

    LINE="";
    for (FIELD_INDEX=1 ; FIELD_INDEX<=N ; FIELD_INDEX++ ) {
        LINE=(FIELD_INDEX==1 ? "" : LINE"|")$JFS[FIELD_INDEX];
    }
    for (FIELD_INDEX=1 ; FIELD_INDEX<=NF ; FIELD_INDEX++ ) {
        if (!RJFS[FIELD_INDEX]) {
            LINE=LINE"\t"$FIELD_INDEX;
        }
    }
    print LINE;
}

example:
input: a b c d e f
output: c|e a b d f

to "un"concatenate:

Code:
#!/bin/sh
# nawk -F$'\t' -v JF=3,5 -f unconcatene.awk test
BEGIN { FS="\t";OFS="\t" }
{ 
    if (NR==1) {    # to do it only once (NR starts at 1)
        N=split(JF,JFS,",");
        for (i=1;i<=N;i++) {    # reverse it
            RJFS[JFS[i]] = i;
        }
    }

    N2=split($1,JFS2,"|");    # N2 is expected to equal N
    for (i=1;i<=N;i++) {    # map each join field number to its value on this line
        RJFS[JFS[i]] = JFS2[i];
    }

    SIZE=NF-1+N;
    FIELD_INDEX=2;
    LINE="";
    for (NEW_FIELD_INDEX=1 ; NEW_FIELD_INDEX<=SIZE ; NEW_FIELD_INDEX++ ) {
        LINE=LINE(NEW_FIELD_INDEX==1 ? "" : "\t");
        if (RJFS[NEW_FIELD_INDEX]) {
            LINE=(LINE)RJFS[NEW_FIELD_INDEX];
        } else {
            LINE=(LINE)$FIELD_INDEX;        
            FIELD_INDEX++;
        }
    }
    print LINE;
}

Thanks!!

example:
input: c|e a b d f
output: a b c d e f

Anthony
# 2  
Old 03-23-2010
Can you explain a little bit more on how you want to get this..

Code:
input: a b c d e f
output: c|e a b d f

# 3  
Old 03-23-2010
Hi,

First thanks for responding!
I'm not sure what you mean by "how" I want to get this but I'll give you a more thorough example:

I have this file for instance (TSV):

Code:
a    b    c    d    e    f
g    h    i    j    k    l
m    n    o    p    q    r

Say I want to join fields 2, 3 and 6 with 3 columns of another file. Because join uses only one field, I want to put fields 2, 3 and 6 together, separated only by a pipe (as opposed to my other fields, which are separated by tabs). The concatene.awk script will then give me the following:

Code:
b|c|f    a    d    e
h|i|l    g    j    k
n|o|r    m    p    q

To do so in the current script, I pass "2,3,6" as a parameter and build two arrays from it:
JFS[1] = 2, JFS[2] = 3, JFS[3] = 6 (the join field numbers, in the order given)
RJFS[2] = 1, RJFS[3] = 2, RJFS[6] = 3 (the reverse lookup, indexed by field number)
From there I rebuild each line by first going through JFS and appending those fields with a pipe separator, then going through all NF fields and appending, with a tab separator, every field for which RJFS[field] is not set.

Hope this makes more sense! I bet there is a way to do it in a much more optimized way though..!
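The overall plan described in this post (build a combined key, then hand the result to join) might be sketched as follows. The function names `make_key` and `join_on_fields`, the use of plain `awk` instead of `nawk`, and the temporary directory are all assumptions made for illustration; this is not the script from the thread.

```shell
# Hypothetical helper: $1 = comma-separated field numbers, $2 = TSV file.
# Prints each line with the listed fields glued by "|" in front.
make_key() {
  awk -v jf="$1" '
    BEGIN {
      FS = OFS = "\t"
      n = split(jf, f, ",")
      for (i = 1; i <= n; i++) iskey[f[i]] = 1
    }
    {
      key = ""
      for (i = 1; i <= n; i++) key = key (i > 1 ? "|" : "") $(f[i])
      rest = ""
      for (i = 1; i <= NF; i++) if (!(i in iskey)) rest = rest OFS $i
      print key rest
    }
  ' "$2"
}

# Hypothetical driver: $1/$2 = fields and file A, $3/$4 = fields and file B.
join_on_fields() {
  tab=$(printf '\t')
  dir=$(mktemp -d) || return 1
  # join needs both inputs sorted on the join field, in the same collation.
  make_key "$1" "$2" | LC_ALL=C sort -t "$tab" -k1,1 > "$dir/a"
  make_key "$3" "$4" | LC_ALL=C sort -t "$tab" -k1,1 > "$dir/b"
  LC_ALL=C join -t "$tab" "$dir/a" "$dir/b"
  rm -rf "$dir"
}
# usage: join_on_fields 2,3,6 fileA.tsv 1,2,3 fileB.tsv
```

This only works if "|" never appears inside the data itself; otherwise a different key separator would be needed.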
# 4  
Old 03-23-2010
Can you post some lines of your input files and the desired output?
# 5  
Old 03-23-2010
Code:
nawk -f anthony.awk myFile
OR
nawk -v jf='2,3,6' -f anthony.awk myFile

anthony.awk:
Code:
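BEGIN {
  FS=OFS="\t"

  SEP_jf="|"               # separator used inside the combined key
  if (!jf) jf="3,5"        # default join fields when -v jf=... is not given

  # jfO is used as a set: its indices are the join field numbers
  n=split(jf, jfA, ",")
  for(i=1;i<=n;i++)
    jfO[jfA[i]]
}
{
  line=jfS=""
  nk=nl=0                  # key / non-key fields seen so far on this line
  for(i=1;i<=NF;i++)
    if (i in jfO)
       jfS=(nk++)?jfS SEP_jf $i: $i   # counting avoids misfires on "0" or empty fields
    else
      line=(nl++)?line OFS $i:$i
  print jfS, line
}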
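The script above covers only the concatenation. A sketch of the inverse in the same style could look like the following; the function name `split_key` and the variable names are assumptions made for illustration, and plain `awk` is assumed in place of `nawk`. It expects the combined key in field 1 and puts part k of the key back at the k-th field number listed in `jf`.

```shell
# Hypothetical helper: $1 = comma-separated field numbers, $2 = file whose
# first field is the "|"-joined key produced by the concatenation step.
split_key() {
  awk -v jf="$1" '
    BEGIN {
      FS = OFS = "\t"
      n = split(jf, f, ",")
      for (i = 1; i <= n; i++) pos[f[i]] = i   # field number -> key part
    }
    {
      split($1, part, "|")
      total = NF - 1 + n     # number of fields in the restored line
      src = 2                # next non-key field to copy from the input
      out = ""
      for (i = 1; i <= total; i++) {
        val = (i in pos) ? part[pos[i]] : $(src++)
        out = (i > 1 ? out OFS : "") val
      }
      print out
    }
  ' "$2"
}
# usage: split_key 3,5 concat.tsv
```

Because the script above emits the key parts in file order rather than in the order given in `jf`, the two only round-trip when `jf` lists the fields in ascending order.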

# 6  
Old 03-23-2010
Thanks a lot for your response,

I see that using BEGIN is cleaner than my "if (NR==1)", and it's good to know that "if (i in jfO)" exists!!

Franklin, my post from 9:55 describes it pretty well, what info are you missing?
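On the `in` operator mentioned above: a small sketch of how it differs from testing the array value directly, as the scripts in the first post do with `if (!RJFS[FIELD_INDEX])`. The function name `in_demo` and the array name `set` are assumptions made for illustration.

```shell
# Hypothetical demo of value test versus membership test on an awk array.
in_demo() {
  awk 'BEGIN {
    set[3] = 0     # index 3 exists, but its value is 0
    print "value test on 3:", (set[3] ? "true" : "false")
    print "membership of 3:", ((3 in set) ? "true" : "false")
    print "7 before value test:", ((7 in set) ? "present" : "absent")
    if (set[7]) { }   # merely referencing set[7] creates the element
    print "7 after value test:", ((7 in set) ? "present" : "absent")
  }'
}
in_demo
```

The value test both misreads stored values such as 0 or the empty string and grows the array as a side effect, which is why a membership test is usually the safer way to use an array as a set.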