I'm trying to use the "join" command on more than one field. Since join only accepts a single join field per file, I want to take my input files and concatenate the joining fields into one field (separated by "|"). I wrote two awk scripts to do and undo this (see below). However, I'm new to awk and I'm certain it could be done much more efficiently.
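For context, here is a rough sketch of the join step this is all leading up to, assuming both files already carry the concatenated key in column 1. The file names and contents are made up for illustration; the only real requirement is that join needs both inputs sorted on the key in the same collation.

```shell
tab=$(printf '\t')
workdir=$(mktemp -d)

# Hypothetical two-line files whose first column is already the composite key.
printf 'k2|y\tleft2\nk1|x\tleft1\n'   > "$workdir/left"
printf 'k1|x\tright1\nk2|y\tright2\n' > "$workdir/right"

# Sort both inputs on the key column using the same (C) collation as join.
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/left"  > "$workdir/left.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/right" > "$workdir/right.sorted"

# Join on field 1 (the default), with tab as the field separator.
joined=$(LC_ALL=C join -t "$tab" "$workdir/left.sorted" "$workdir/right.sorted")

rm -r "$workdir"
printf '%s\n' "$joined"
```

Each output line holds the key followed by the remaining columns of the left file and then of the right file.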
I found various threads on the subject, but the syntax proposed is often a bit of a mystery to me. For instance, someone posted this:
What does the trailing '1' mean? Why are there two separate {} blocks, and what distinguishes them? Finally, where can I find documentation on this kind of question? (Googling "awk trailing digit" didn't help me much!)
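The snippet quoted above is not reproduced here, so the following is only a sketch of the general idiom on a made-up one-liner and made-up input. An awk program is a list of "pattern { action }" rules, each tried in order on every record. A rule with no pattern runs on every record, and a rule with no action (such as a bare 1, which is always true) falls back to the default action, print. The gawk manual covers this under patterns and actions.

```shell
# Hypothetical input, used only for illustration.
input='a b c
d e f'

# First rule: an action with no pattern, so it runs on every record.
# Second rule: the bare pattern 1 (always true) with no action,
# so awk applies the default action, which is print.
with_one=$(printf '%s\n' "$input" | awk '{ $2 = "X" } 1')

# The same program with the second rule written out explicitly.
explicit=$(printf '%s\n' "$input" | awk '{ $2 = "X" } { print }')

printf '%s\n' "$with_one"
```

Both forms produce the same output; the trailing 1 is just shorthand for "print the (possibly modified) record".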
Here are my scripts. I don't care much about syntax shortcuts; I only care about speed of execution!
Any help would be greatly appreciated.
to concatenate:
Code:
#!/bin/sh
#
# usage:
# nawk -F$'\t' -v JF=3,5 -f concatene.awk ~/tmp/tmp15
# nawk -F$'\t' -v JF=15,16,17,18 -f concatene.awk split/snp_j > concat
#
# JF stands for "join fields"
BEGIN { FS = "\t"; OFS = "\t" }
{
    if (NR == 1) {                       # do this only once (NR starts at 1)
        N = split(JF, JFS, ",");
        for (i = 1; i <= N; i++) {       # reverse map: field number -> position in JF
            RJFS[JFS[i]] = i;
        }
    }
    LINE = "";
    for (FIELD_INDEX = 1; FIELD_INDEX <= N; FIELD_INDEX++) {
        LINE = (FIELD_INDEX == 1 ? "" : LINE "|") $JFS[FIELD_INDEX];
    }
    for (FIELD_INDEX = 1; FIELD_INDEX <= NF; FIELD_INDEX++) {
        if (!RJFS[FIELD_INDEX]) {
            LINE = LINE "\t" $FIELD_INDEX;
        }
    }
    print LINE;
}
example:
input: a b c d e f
output: c|e a b d f
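A possible tighter variant, offered as a sketch rather than a measured improvement: the one-time setup moves into BEGIN (variables set with -v are already available there), which removes the NR==1 test from the per-line path. The names concat, jf and is_key are made up for this example, and whether it is actually faster would need timing on the real files.

```shell
# concat JF: read tab-separated lines on stdin, write the joined key first.
concat() {
    awk -v JF="$1" '
    BEGIN {
        FS = OFS = "\t"
        n = split(JF, jf, ",")
        for (i = 1; i <= n; i++) is_key[jf[i]] = 1   # set of join field numbers
    }
    {
        key = $(jf[1])
        for (i = 2; i <= n; i++) key = key "|" $(jf[i])
        line = key
        # append every field that is not part of the key, in original order
        for (i = 1; i <= NF; i++)
            if (!(i in is_key)) line = line OFS $i
        print line
    }'
}

out=$(printf 'a\tb\tc\td\te\tf\n' | concat 3,5)
printf '%s\n' "$out"
```

On the example input above this prints c|e followed by a, b, d and f, tab-separated.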
to "un"concatenate:
Code:
#!/bin/sh
# nawk -F$'\t' -v JF=3,5 -f unconcatene.awk test
BEGIN { FS = "\t"; OFS = "\t" }
{
    if (NR == 1) {                       # do this only once (NR starts at 1)
        N = split(JF, JFS, ",");
    }
    N2 = split($1, JFS2, "|");           # N2 should equal N
    for (i = 1; i <= N; i++) {           # map original field number -> value taken from the key
        RJFS[JFS[i]] = JFS2[i];
    }
    SIZE = NF - 1 + N;
    FIELD_INDEX = 2;
    LINE = "";
    for (NEW_FIELD_INDEX = 1; NEW_FIELD_INDEX <= SIZE; NEW_FIELD_INDEX++) {
        LINE = LINE (NEW_FIELD_INDEX == 1 ? "" : "\t");
        # test membership with "in": a key part that is empty or "0" is false as a value
        if (NEW_FIELD_INDEX in RJFS) {
            LINE = LINE RJFS[NEW_FIELD_INDEX];
        } else {
            LINE = LINE $FIELD_INDEX;
            FIELD_INDEX++;
        }
    }
    print LINE;
}
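A sketch of the reverse step with the setup in BEGIN, under the same caveats as before (the names unconcat, pos and part are made up, and no timing is claimed). It checks array membership with "in" rather than testing the stored value, so a key part that is empty or "0" is still put back in its original position.

```shell
# unconcat JF: read lines whose first field is the "|"-joined key,
# write the fields back in their original positions.
unconcat() {
    awk -v JF="$1" '
    BEGIN {
        FS = OFS = "\t"
        n = split(JF, jf, ",")
        for (i = 1; i <= n; i++) pos[jf[i]] = i   # original position -> index within the key
    }
    {
        split($1, part, "|")
        total = NF - 1 + n      # field count of the original line
        src = 2                 # next non-key field to copy from the input
        line = ""
        for (dst = 1; dst <= total; dst++) {
            val = (dst in pos) ? part[pos[dst]] : $(src++)
            line = (dst == 1 ? val : line OFS val)
        }
        print line
    }'
}

restored=$(printf 'c|e\ta\tb\td\tf\n' | unconcat 3,5)
with_zero=$(printf '0|e\ta\tb\n' | unconcat 1,3)
printf '%s\n%s\n' "$restored" "$with_zero"
```

The second call is there only to exercise the "0" case.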