Match in awk skipping header rows

# 1
02-01-2016

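I am trying to match the key fields between the two files (the name, Chr, Start, End, Ref and Alt columns: $1-$6 in file1, and $1 and $3-$7 in file2, which has an extra 0 column) and, if a match is found, copy the contents of $8 in file2 onto the end of the matching file1 line. The awk I tried is below. There are also header rows in file1 (Chr Start End Ref Alt ...) that do not need to be searched. Thank you :)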

awk
Code:
awk 'FNR==NR>1{a[$1,$2,$3,$4,$5,$6,$7]=$0;next}{if(b=a[$1,$2,$3,$4,$5,$6,$7]){print b}}' file1.txt file2.txt > output.txt 
-bash: $'\342\200\213awk': command not found
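
The `$'\342\200\213awk'` in that error means bash never got as far as running awk: the command name it received started with three extra bytes. Octal 342 200 213 is the UTF-8 encoding of U+200B ZERO WIDTH SPACE, an invisible character that is easy to pick up when copying a command from a web page. A minimal sketch for locating and removing it, assuming the pasted command has been saved to a file (`pasted.sh` and `cleaned.sh` are hypothetical names used only for this illustration):

```shell
#!/bin/sh
# U+200B ZERO WIDTH SPACE is the byte sequence \342\200\213 (octal) in UTF-8.
zwsp=$(printf '\342\200\213')

# Hypothetical pasted command with the invisible character in front of "awk".
printf '%s%s\n' "$zwsp" "awk 'NR > 1' file1.txt" > pasted.sh

# grep -n prints each line containing the character, with its line number.
# LC_ALL=C makes grep and sed compare raw bytes, whatever the locale.
LC_ALL=C grep -n "$zwsp" pasted.sh

# Write a cleaned copy with every zero-width space removed.
LC_ALL=C sed "s/$zwsp//g" pasted.sh > cleaned.sh
cat cleaned.sh
```

For a one-off command, deleting and retyping the start of the line removes the hidden character as well.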

file1
Code:
yyyyy,yyy	Chr	Start	End	Ref	Alt	Func.refGene
yyyyy,yyy	22	40757228	40757228	A	C	intronic
yyyyy,yyy	5	125887715	125887715	T	G	exonic
yyyyy,yyy	5	125880589	125880589	T	C	UTR3
xxxxxxx,xxx	1111111	Chr	Start	End	Ref	Alt	Func.refGene
xxxxxxx,xxx	1111111	1	43394661	43394661	A	G	exonic
xxxxxxx,xxx	1111111	2	166870221	166870221	A	C	intronic

file2
Code:
yyyyy,yyy 	0	22	40757228	40757228	A	C	likely benign
yyyyy,yyy  0	5	125887715	125887715	T	G	likely benign
yyyyy,yyy	0	5	125880589	125880589	T	C	likely benign



desired output
Code:
yyyyy,yyy	Chr	Start	End	Ref	Alt	Func.refGene
yyyyy,yyy	22	40757228	40757228	A	C	intronic likely benign
yyyyy,yyy	5	125887715	125887715	T	G	exonic likely benign
yyyyy,yyy	5	125880589	125880589	T	C	UTR3 likely benign


Last edited by cmccabe; 02-02-2016 at 11:54 AM.. Reason: added details
# 2  
02-01-2016
Code:
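awk '
        # Print the header once, tab-separated like the data
        BEGIN {
                print "yyyyy,yyy\tChr\tStart\tEnd\tRef\tAlt\tFunc.refGene"
        }
        # First file (file2): key on name, Chr, Start, End, Ref, Alt ($2 is the extra 0 column)
        # and keep the last two fields, i.e. the two-word classification
        NR == FNR && !/Chr/ {
                A[$1 FS $3 FS $4 FS $5 FS $6 FS $7] = $(NF-1) FS $NF
                next
        }
        # Second file (file1): skip header rows and print the rows whose key was seen in file2
        !/Chr/ && ( ( $1 FS $2 FS $3 FS $4 FS $5 FS $6 ) in A ) {
                print $0, A[$1 FS $2 FS $3 FS $4 FS $5 FS $6]
        }
' file2 file1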

This User Gave Thanks to Yoda For This Post:
# 3  
02-02-2016
Could it be that your file2 data sample is not entirely accurate? There is a space and a tab between fields 1 and 2 on the first line, and only spaces on the second line. Is the actual file tab-separated, as I suspect? If that is the case, then you could try this:

Code:
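awk '
  # First file (file2): index the classification in $8 by
  # name, Chr, Start, End, Ref, Alt ($2 is the extra 0 column)
  NR==FNR {
    A[$1,$3,$4,$5,$6,$7]=$8
    next
  }
  # Second file (file1): print rows whose six key fields were seen in file2
  ($1,$2,$3,$4,$5,$6) in A {
    print $0, A[$1,$2,$3,$4,$5,$6]
  }
' FS='\t' file2 file1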
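
Because this version splits on tabs only, a line with a space next to the tab, or spaces instead of it, gets a different `$1` or a different number of fields and silently never matches. A quick check for such lines is sketched below. It assumes 8 tab-separated fields per file2 line, as in the sample, and writes a hypothetical `file2` reproducing the three sample lines as posted, so run it in a scratch directory (or skip the printf lines and point awk at the real file).

```shell
#!/bin/sh
# Hypothetical copy of file2 as posted: line 1 has a space before the first tab,
# line 2 has only spaces between the name and the 0 column.
printf 'yyyyy,yyy \t0\t22\t40757228\t40757228\tA\tC\tlikely benign\n' > file2
printf 'yyyyy,yyy  0\t5\t125887715\t125887715\tT\tG\tlikely benign\n' >> file2
printf 'yyyyy,yyy\t0\t5\t125880589\t125880589\tT\tC\tlikely benign\n' >> file2

# Split on tabs only and report lines that do not have 8 fields or whose
# first field starts or ends with a space; such lines can never match a key.
awk -F'\t' '
  NF != 8 || $1 ~ /^ | $/ {
    printf "%s line %d: %d fields, $1=[%s]\n", FILENAME, FNR, NF, $1
  }
' file2
```

On that sample it reports line 1 (8 fields, but `$1` ends in a space) and line 2 (7 fields), and passes line 3.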

If there are also header lines in file1, then you could use a conditional statement to filter them out.

If you need the headers from file1 printed when one or more of the rows under them match, then additional provisions would need to be made.

Last edited by Scrutinizer; 02-02-2016 at 12:25 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 4  
02-02-2016
Yes, the files are tab-separated as you suspected, and there are header rows in file1 but not in file2. These header rows only need to appear once, if a match is found. Thank you very much :)
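
The remaining requirement, printing each file1 header row once and only when at least one row beneath it matches, can be met by holding each header back until its first match. The sketch below is one way to do that, not code from the thread. It assumes strictly tab-separated files, that file1 header rows are the ones with `Chr` in field 2 or 3 (as in the sample), and that file2 has no header. It writes `file1`, `file2` and `output.txt`, so run it in a scratch directory.

```shell
#!/bin/sh
# Recreate the sample data, converting | to tabs (the real files are tab-separated).
tr '|' '\t' > file1 <<'EOF'
yyyyy,yyy|Chr|Start|End|Ref|Alt|Func.refGene
yyyyy,yyy|22|40757228|40757228|A|C|intronic
yyyyy,yyy|5|125887715|125887715|T|G|exonic
yyyyy,yyy|5|125880589|125880589|T|C|UTR3
xxxxxxx,xxx|1111111|Chr|Start|End|Ref|Alt|Func.refGene
xxxxxxx,xxx|1111111|1|43394661|43394661|A|G|exonic
xxxxxxx,xxx|1111111|2|166870221|166870221|A|C|intronic
EOF
tr '|' '\t' > file2 <<'EOF'
yyyyy,yyy|0|22|40757228|40757228|A|C|likely benign
yyyyy,yyy|0|5|125887715|125887715|T|G|likely benign
yyyyy,yyy|0|5|125880589|125880589|T|C|likely benign
EOF

awk -F'\t' '
  # Pass 1 (file2): store the classification in $8, keyed on
  # name, Chr, Start, End, Ref, Alt ($2 is the extra 0 column).
  NR == FNR { label[$1 FS $3 FS $4 FS $5 FS $6 FS $7] = $8; next }

  # Pass 2 (file1): treat rows with Chr in field 2 or 3 as headers.
  # Hold the header back and mark it as not yet printed.
  $2 == "Chr" || $3 == "Chr" { header = $0; shown = 0; next }

  # Data row with a known key: print the held header first (once),
  # then the row with the classification appended.
  ($1 FS $2 FS $3 FS $4 FS $5 FS $6) in label {
    if (!shown && header != "") { print header; shown = 1 }
    print $0, label[$1 FS $2 FS $3 FS $4 FS $5 FS $6]
  }
' file2 file1 > output.txt

cat output.txt
```

With the sample data, `output.txt` holds exactly the four lines of the desired output; the `xxxxxxx,xxx` header is never printed because none of its rows match. Printing a header only when its first match arrives, rather than as soon as it is read, is what keeps unmatched headers out of the output.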