awk merging files based on 2 complex conditions


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk merging files based on 2 complex conditions
# 1  
Old 03-25-2010
awk merging files based on 2 complex conditions

1. if the 1st row IDs of input1 (ID1/ID2.....) is equal to any IDNames of input2
print all relevant values together as defined in the output.

2. A bit tricky part is IDno in the output. All we need to do is numbering same kind of
letters as 1 (aa of ID1) and different letters as 2 (ab of ID1). And 3 is second frequent same kind of letter (bb of ID1)



Hope I made every thing clear.


input1
Code:
Column1	ID1	ID2	ID3
a1	a/a	b/b	c/c
a2	a/a	b/b	c/c
a3	a/b	b/b	c/d
a6	a/b	b/b	c/e
a8	a/a	b/c	c/e
a9	b/b	b/d	c/e

input2
Code:
IDName	Column1	name	value
ID1	a1	ppp	10
ID1	a2	ppp	20
ID1	a3	ppp	30
ID1	a4	ppp	40
ID1	a6	ppp	79
ID1	a8	ppp	62
ID2	a9	ppp	12
ID2	a1	qqq	23
ID2	a2	qqq	26
ID2	a3	qqq	28
ID2	a4	qqq	29
ID2	a6	qqq	36
ID2	a8	qqq	46
ID2	a9	qqq	22
ID4	a1     fff	67

output
Code:
IDName	name	Column1	IDs	IDNo	Value
ID1	ppp	a1	a/a	1	10
ID1	ppp	a2	a/a	1	20
ID1	ppp	a8	a/a	1	62
ID1	ppp	a3	a/b	2	30
ID1	ppp	a6	a/b	2	79
ID1	ppp	a9	b/b	3	12
ID2	qqq	a1	b/b	1	23
ID2	qqq	a2	b/b	1	26
ID2	qqq	a3	b/b	1	28
ID2	qqq	a6	b/b	1	36
ID2	qqq	a8	b/c	2	46
ID2	qqq	a9	b/d	3	22



---------- Post updated at 08:20 PM ---------- Previous update was at 03:49 AM ----------

Code:
if IDNames of input1 = IDs of IDNames of input2

    if Column1 of input1 = Column1 of input2

        print matched-IDName, and it's corresponding name, matched Column1value, and values of IDnames based on Column1value, and Value as last column

    if IDs based on column1 values are same letters
        
        print highest frequent letters as 1 in IDNo
        print second highest letters as 3 in IDNo

        else 
            print letters as 2 in IDNo


Last edited by ruby_sgp; 03-25-2010 at 09:29 PM.. Reason: adding pseudocode
# 2  
Old 03-29-2010
Try the below I hope you like it.

Code:
nawk 'NR==1{split($0,a," ");next}
NR>1&&FNR==NR{for (i=2;i<=NF;i++) { b[a[i]$1]=$i };next}
(($1$2) in b){print $1,$3,$2,b[$1$2],$4}
' OFS="\t\t" input1 input2 | sort -k4 | nawk '
BEGIN{print "IDName\t\tname\t\tColumn1\tIDs\tValue\tIDNo" }
{
split($4,_,"/") ;
if (_[1]==_[2]&&$1==p1){c=1;print $0,c;p1=$1 ;next} ; if(_[1]==_[2]&&$1!=p1){c=1;p1=$1;print $0,c;next} ;
if($4!=p){c++;p=$4; print $0,c }else{print $0,c} ;
}

' OFS="\t"

SmilieSmilieSmilie
# 3  
Old 03-30-2010
hey thanx alot

I really liked it very much.
thanx alot. i thought It will become unreply post. thanx for your time.
actually I divided every thing into smaller tasks and wrote multiple shell scripts. But your single code looks finer than mine.


Code:
IDName          name            Column1 IDs     Value   IDNo
ID1             ppp             a1              a/a             10      1
ID1             ppp             a2              a/a             20      1
ID1             ppp             a8              a/a             62      1
ID1             ppp             a3              a/b             30      2
ID1             ppp             a6              a/b             79      2
ID2             qqq             a1              b/b             23      1
ID2             qqq             a2              b/b             26      1
ID2             qqq             a3              b/b             28      1
ID2             qqq             a6              b/b             36      1
ID2             qqq             a8              b/c             46      2
ID2             ppp             a9              b/d             12      3
ID2             qqq             a9              b/d            22      3

# 4  
Old 04-06-2010
small edition

Could you please take a look at small requirements.
1. if the letters are same b/b or d/d it has to be 1 or 3 not 2[Please check the black-bold letters]
2. And if any n/n in input1 read it as 4 instead of 1 or 3.[Please check the red-bold letters]

Thanx for your help
ruby

input1
Code:
Column1	ID1	ID2	ID3
a1	a/a	b/b	c/c
a2	a/a	b/b	c/c
a3	a/b	b/b	c/d
a6	a/b	b/b	c/e
a8	a/a	b/c	c/e
a9	b/b	d/d	c/e
a10	n/n	n/n	n/n

input2
Code:
IDName	Column1	name	value
ID1	a1	ppp	10
ID1	a2	ppp	20
ID1	a3	ppp	30
ID1	a4	ppp	40
ID1	a6	ppp	79
ID1	a8	ppp	62
ID1	a10	ppp	99
ID2	a9	ppp	12
ID2	a1	qqq	23
ID2	a2	qqq	26
ID2	a3	qqq	28
ID2	a4	qqq	29
ID2	a6	qqq	36
ID2	a8	qqq	46
ID2	a9	qqq	22
ID4	a1     fff	67

output
Code:
IDName          name            Column1 IDs     Value   IDNo
ID1             ppp             a1              a/a             10      1
ID1             ppp             a2              a/a             20      1
ID1             ppp             a8              a/a             62      1
ID1             ppp             a3              a/b             30      2
ID1             ppp             a6              a/b             79      2
ID1             ppp             a10              n/n             99      4
ID2             qqq             a1              b/b             23      1
ID2             qqq             a2              b/b             26      1
ID2             qqq             a3              b/b             28      1
ID2             qqq             a6              b/b             36      1
ID2             qqq             a8              b/c             46      2
ID2             ppp             a9              d/d             12      3
ID2             qqq             a9              d/d            22      3


Last edited by ruby_sgp; 04-06-2010 at 12:34 AM.. Reason: formatting problem
# 5  
Old 04-06-2010
after modification use below:-

Code:
nawk '
NR==1{split($0,a," ");next}
NR>1&&FNR==NR{for (i=2;i<=NF;i++) { b[a[i]$1]=$i };next}
(($1$2) in b){print $1,$3,$2,b[$1$2],$4}
' OFS="\t\t" input1 input2 | sort -k1,1 -k4,4  | nawk '
BEGIN{print "IDName\t\tname\t\tColumn1\tIDs\tValue\tIDNo" }
{
split($4,_,"/") ;
if (_[1]==_[2]&&$1==p1){if ($4!=p){c++} ;print $0,c;p1=$1;p=$4 ;next} ; if(_[1]==_[2]&&$1!=p1){c=1;p1=$1;p=$4;print $0,c;next} ;
if($4!=p){c++;p=$4; print $0,c }else{print $0,c} ;
} ' OFS="\t"

SmilieSmilieSmilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to update file based on 5 conditions

I am trying to use awk to update the below tab-delimited file based on 5 different rules/conditions. The final output is also tab-delimited and each line in the file will meet one of the conditions. My attemp is below as well though I am not very confident in it. Thank you :). Condition 1: The... (10 Replies)
Discussion started by: cmccabe
10 Replies

2. Shell Programming and Scripting

awk to filter file based on seperate conditions

The below awk will filter a list of 30,000 lines in the tab-delimited file. What I am having trouble with is adding a condition to SVTYPE=CNV that will only print that line if CI= must be >.05 . The other condition to add is if SVTYPE=Fusion, then in order to print that line READ_COUNT must... (3 Replies)
Discussion started by: cmccabe
3 Replies

3. Shell Programming and Scripting

Merge input from two files into one based on conditions

Using Linux (bash), I have two files which contain information about berries. Example: file1.txt: Blueberry blue 14 Raspberry red 12 Blackberry dark 4 file2.txt Blackberry sour 4 3 Blueberry tasty 12 78 Strawberry yummy 33 88 I want to merge these two files into one. The desired... (5 Replies)
Discussion started by: Zooma
5 Replies

4. Shell Programming and Scripting

Need script for making files based on some conditions.

Hi All, I have a text file (code_data.txt) with the followig data. AMAR AB123456 XYZ KIRAN CB789 ABC RAJ CS78890 XYZ KAMESH A33535335 ABC KUMAR MD678894 MAT RITESH SR3535355... (26 Replies)
Discussion started by: ROCK_PLSQL
26 Replies

5. Shell Programming and Scripting

merging two files based on first column

I had two files file1 and file2. I want a o/p file(file3) like below using first column as ref. Pls give suggestion ass join is not working as the number of lines in each file is nealry 5 C? file1 --------------------- 404000324810001 Y 404000324810004 N 404000324810008 Y 404000324810009 N... (1 Reply)
Discussion started by: p_sai_ias
1 Replies

6. UNIX for Dummies Questions & Answers

any script for joining files based on simple conditions

Condition1; If NPID and IndID of both input1 and input2 are same take all the vaues relevant to them and print together as output Condition2; IDNo in output: Take the highly repeated same letter of similar NPID-IndID as *1* Second highly repeated same letter... (0 Replies)
Discussion started by: stateperl
0 Replies

7. Shell Programming and Scripting

Merging of all files based on a condition

Hi Friends, I am new to UNIX. I need to merge all the files(to FINAL.txt) in single directory based one condition. Out of all the files one of file will have specific value like :GF01: at any where in the file. so the file which is having :GF01: should be appended at the last. EX:... (5 Replies)
Discussion started by: arund_01
5 Replies

8. Shell Programming and Scripting

Merging files based on the contents

Hi, I have a file f1 having the contents as below select (<condn>) from dual I have another file f2 having the contents as below 1, 2, 3 I want to replace <condn> in f1 with the contents of f2 I tried using sed like this sed "s:<condn>:`cat f2`:g" f1 The above command resulted in sed:... (3 Replies)
Discussion started by: mr_manii
3 Replies

9. Shell Programming and Scripting

using awk to count no of records based on conditions

Hi I am having files with date and time stamp as the folder names like 200906051400,200906051500,200906051600 .....hence everyday 24 files will be generated i need to do certain things on this 24 files daily file contains the data like 200906050016370 0 1244141195225298lessrv3 ... (13 Replies)
Discussion started by: aemunathan
13 Replies

10. Shell Programming and Scripting

merging two files based on some key

I have to merge two files: The files are having the same format like A0this is first line TOlast line silmilarly other lines. I have to search for A0 line in the second file also and then put the data in the third file under A0 heading ,then for A1 and so on. A0 portion will be treminated... (1 Reply)
Discussion started by: Vandana Yadav
1 Replies
Login or Register to Ask a Question