awk output line by line


 
# 1  
Old 06-09-2014

Hi all,

I'm new to the forum and somewhat of a Bash newbie, so go easy.

I have some data in a TSV file, a sample of which looks like this (ignore what look like duplicates; in reality there are many more columns of data):

Code:
SAMD11    NM_152486.2    intronic
SAMD11    NM_152486.2    intronic
SAMD11    NM_152486.2    intronic
SAMD11|NOC2L    NM_152486.2|NM_015658.3    downstream|intronic
SAMD11|NOC2L    NM_152486.2|NM_015658.3    downstream|intronic
PLOD1    NM_000302.3    exonic
PLOD1    NM_000302.3    exonic
PLOD1    NM_000302.3    exonic
PLOD1    NM_000302.3    intronic

I would like to take the value in the first column and search a second file for it: if that value is present in any line of the second file, that line should be appended to the matching line of the first file as a new column. A sample line from the second file looks like this (| delimited):

Code:
PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)

As you can see, the first file can have more than one value in the first column, so I need an OR-based search. I know egrep can do this with |. I can also print the lines of the first file using awk, but I have no idea how to process them line by line or how to feed them to egrep as patterns to search the second file. I'm also unsure how to add the data from the second file to the first.

I hope that's clear; here's my desired output based on the two samples above:

Code:
SAMD11    NM_152486.2    intronic
SAMD11    NM_152486.2    intronic
SAMD11    NM_152486.2    intronic
SAMD11|NOC2L    NM_152486.2|NM_015658.3    downstream|intronic
SAMD11|NOC2L    NM_152486.2|NM_015658.3    downstream|intronic
PLOD1    NM_000302.3    exonic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
PLOD1    NM_000302.3    exonic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
PLOD1    NM_000302.3    exonic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
PLOD1    NM_000302.3    intronic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)

Thanks in advance,

Matt

Last edited by bartus11; 06-09-2014 at 02:16 PM.. Reason: Please use [code][/code] tags.
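
For reference, the line-by-line egrep idea described in the question can be sketched in plain shell. This is only an illustration under a few assumptions: the inputs are named file1 and file2 (names borrowed from the replies), a few sample rows are recreated in a temporary directory, and the first column is safe to hand to grep -E as a pattern. It starts grep once per row, so it would likely be slow on large files; the awk replies below avoid that.

```shell
# Recreate a few sample rows from the question (tab-separated) in a scratch dir.
tmp=$(mktemp -d)
printf 'SAMD11\tNM_152486.2\tintronic\n' > "$tmp/file1"
printf 'SAMD11|NOC2L\tNM_152486.2|NM_015658.3\tdownstream|intronic\n' >> "$tmp/file1"
printf 'PLOD1\tNM_000302.3\texonic\n' >> "$tmp/file1"
printf '%s\n' 'PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)' > "$tmp/file2"

annotated=$(
  while IFS= read -r row; do
    # First whitespace-separated column, e.g. "SAMD11|NOC2L".
    key=$(printf '%s\n' "$row" | awk '{ print $1 }')
    if [ -n "$key" ]; then
      # grep -E treats "|" in the key as alternation (the OR search);
      # keep only the first matching line of file2.
      hit=$(grep -E -- "$key" "$tmp/file2" | head -n 1) || true
    else
      hit=""
    fi
    if [ -n "$hit" ]; then
      printf '%s\t%s\n' "$row" "$hit"   # append the match as a new column
    else
      printf '%s\n' "$row"              # no match: print the row unchanged
    fi
  done < "$tmp/file1"
)
printf '%s\n' "$annotated"
```

Because the key is used as a regular expression, a symbol such as PLOD1 would also match a longer symbol that merely contains it, so treat the result as a starting point rather than an exact join.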
# 2  
Old 06-09-2014
Welcome to the forums. You may try this:

Code:
$ awk 'FNR==NR{A[$0];next}NF{ for(i in A){ if(i ~ $1){ $0 = $0 OFS i; break } }}1' file2  file1

OR
Code:
$ awk 'FNR==NR{A[$0];next}NF{ for(i in A){ if(match(i, $1)){ $0 = $0 OFS i; break } }}1' file2 file1

---------- Post updated at 11:55 PM ---------- Previous update was at 11:50 PM ----------

Code:
$ cat file2
PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)

Code:
$ cat file1
SAMD11 NM_152486.2 intronic
SAMD11 NM_152486.2 intronic
SAMD11 NM_152486.2 intronic
SAMD11|NOC2L NM_152486.2|NM_015658.3 downstream|intronic
SAMD11|NOC2L NM_152486.2|NM_015658.3 downstream|intronic
PLOD1 NM_000302.3 exonic
PLOD1 NM_000302.3 exonic
PLOD1 NM_000302.3 exonic
PLOD1 NM_000302.3 intronic

---------- Post updated Jun 10th, 2014 at 12:06 AM ---------- Previous update was Jun 9th, 2014 at 11:55 PM ----------

Result:
Code:
SAMD11 NM_152486.2 intronic
SAMD11 NM_152486.2 intronic
SAMD11 NM_152486.2 intronic
SAMD11|NOC2L NM_152486.2|NM_015658.3 downstream|intronic
SAMD11|NOC2L NM_152486.2|NM_015658.3 downstream|intronic
PLOD1 NM_000302.3 exonic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
PLOD1 NM_000302.3 exonic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
PLOD1 NM_000302.3 exonic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
PLOD1 NM_000302.3 intronic PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)
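
For anyone unpacking the first one-liner above, the same logic can be written out with comments. This is a sketch only: it recreates three of the sample rows in a temporary directory, keeps the file1 and file2 names from the post, and assumes the first column of file1 is safe to use as a regular expression.

```shell
# Recreate a few sample rows from the thread in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
  'SAMD11 NM_152486.2 intronic' \
  'SAMD11|NOC2L NM_152486.2|NM_015658.3 downstream|intronic' \
  'PLOD1 NM_000302.3 exonic' > "$tmp/file1"
printf '%s\n' 'PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)' > "$tmp/file2"

result=$(awk '
  # First file on the command line (file2): store every line as an array key.
  FNR == NR { A[$0]; next }

  # Second file (file1): for non-empty lines, test each stored line against $1.
  NF {
    for (line in A) {
      # $1 is used as a dynamic regex, so "SAMD11|NOC2L" behaves as an OR.
      if (line ~ $1) {
        $0 = $0 OFS line   # append the first matching file2 line
        break
      }
    }
  }

  # Print every file1 line, whether or not something was appended.
  { print }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
```

Two things to keep in mind with this approach: the match is a substring match anywhere in the file2 line, and every file1 row loops over all stored file2 lines, so it may be slow when both files are large.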

This User Gave Thanks to Akshay Hegde For This Post:
# 3  
Old 06-09-2014
Also try:
Code:
awk 'NR==FNR {T[$1]=$0; next} {print $0, T[$1]}' FS="," file2 FS=" " file1
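
Note that the lookup above is an exact match on the whole first column, so a combined key such as SAMD11|NOC2L will not find an entry stored under NOC2L alone. A possible variant, offered as a sketch rather than a tested solution for the real data, splits the first column on | and tries each part in turn. The NOC2L line in file2 below is a made-up placeholder added purely to exercise that case; the variable names out, part, n and k are likewise just illustrative.

```shell
# Recreate a few sample rows in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
  'SAMD11 NM_152486.2 intronic' \
  'SAMD11|NOC2L NM_152486.2|NM_015658.3 downstream|intronic' \
  'PLOD1 NM_000302.3 exonic' > "$tmp/file1"
# The second line is a hypothetical placeholder, not real annotation data.
printf '%s\n' \
  'PLOD1, LH1, LLH, EDS6|Ehlers-Danlos syndrome, type VI, 225400 (3)' \
  'NOC2L, ALIAS_X|placeholder entry for illustration' > "$tmp/file2"

joined=$(awk '
  # file2 (read with FS=","): index each whole line by its first symbol.
  NR == FNR { T[$1] = $0; next }

  # file1 (read with FS=" "): split column 1 on "|" and look up each part.
  {
    out = $0
    n = split($1, part, "[|]")   # bracket expression keeps "|" literal
    for (k = 1; k <= n; k++)
      if (part[k] in T) { out = out OFS T[part[k]]; break }
    print out
  }
' FS="," "$tmp/file2" FS=" " "$tmp/file1")

printf '%s\n' "$joined"
```

Since this is a hash lookup rather than a regex scan, it matches whole symbols only and reads each file once. It still indexes only the first symbol on each file2 line (PLOD1, not the aliases LH1, LLH or EDS6), which is the same behaviour as the one-liner above.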

 