Complex comparison of 3 files

# 1
09-05-2013

Hello to all in forum,

Could you experts please help me with this complex comparison?

I need to look up each number from file1 in column 3 of file2. For each line of file2 where it is found,
compare columns 4 to NF of that line with the lines in file3. If columns 4 to NF match one of the lines
in file3, print columns 3 to NF of file2 and append ",C". If they don't match any line in file3,
print columns 3 to NF of file2 and append ",D".

file1
Code:
547680
575210
804270
123989
623989
221209

file2
Code:
1|4501892|547680|1|2|30|73491|12|34|1
2|4788930|575210|1|2|30|73472|12|34|1
3|6793773|804270|1|2|30|73420
4|6673724|123989|1|2|30|73001|12|34|1
5|8099821|333722|1|30|73473|10|34|1
6|7889200|623989|1|2|30|73001|12|45|1
7|8882662|221209|1|2|30|83002|12|34|1

file3
Code:
1|2|30|73472|12|34|1
1|2|30|73001|12|34|1
1|2|30|83002|12|34|1

Desired output:
Code:
547680|1|2|30|73491|12|34|1,D
575210|1|2|30|73472|12|34|1,C
804270|1|2|30|73420,D
123989|1|2|30|73001|12|34|1,C
623989|1|2|30|73001|12|45|1,D
221209|1|2|30|83002|12|34|1,C

Thanks in advance for any help.

Last edited by Ophiuchus; 09-05-2013 at 02:43 AM.
# 2
09-05-2013
Try

Code:
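awk -F '|' '
f==1 { A[$1]++ }                       # file1: keys (stored, but not used later)
f==2 { $1=$2=""; B[$7]=$0 }            # file2: blank cols 1-2, index the line by column 7
f==3 { if (B[$4]) { sub("\\|\\|", "", B[$4]); print B[$4] ",C"; delete B[$4] } }   # file3: look up its column 4
END  { for (i in B) if (B[i]) { sub("\\|\\|", "", B[i]); print B[i] ",D" } }      # leftovers get D
' OFS='|' f=1 file1 f=2 file2 f=3 file3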

575210|1|2|30|73472|12|34|1,C
623989|1|2|30|73001|12|45|1,C
221209|1|2|30|83002|12|34|1,C
804270|1|2|30|73420,D
333722|1|30|73473|10|34|1,D
547680|1|2|30|73491|12|34|1,D
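
The output above does not match the desired output: 623989 is marked C instead of D, 333722 (which is not in file1) is printed, and 123989 is missing. The script indexes file2 by column 7 alone, so 123989 and 623989 (both have 73001 in column 7) overwrite each other, and the keys read from file1 are never checked. A minimal sketch of a fix in the same f=1/f=2/f=3 style, assuming the files are named file1, file2 and file3 as in the question: it compares the whole of columns 4 to NF with the file3 lines and only prints rows whose column 3 is in file1. The sample data from the question is recreated in a temporary directory so the block runs on its own.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 547680 575210 804270 123989 623989 221209 > file1
cat > file2 <<'EOF'
1|4501892|547680|1|2|30|73491|12|34|1
2|4788930|575210|1|2|30|73472|12|34|1
3|6793773|804270|1|2|30|73420
4|6673724|123989|1|2|30|73001|12|34|1
5|8099821|333722|1|30|73473|10|34|1
6|7889200|623989|1|2|30|73001|12|45|1
7|8882662|221209|1|2|30|83002|12|34|1
EOF
cat > file3 <<'EOF'
1|2|30|73472|12|34|1
1|2|30|73001|12|34|1
1|2|30|83002|12|34|1
EOF

# file3 is read before file2, so each file2 row can be classified as soon as it is read.
result=$(awk -F'|' -v OFS='|' '
f == 1 { A[$1]; next }            # file1: wanted keys
f == 3 { B[$0]; next }            # file3: complete reference lines
f == 2 && ($3 in A) {             # file2: only rows whose column 3 is a key
    key = $3
    $1 = $2 = $3 = ""             # blank columns 1-3 (rebuilds $0 with OFS)
    sub(/^\|\|\|/, "")            # remove the three leading separators
    print key, $0 "," (($0 in B) ? "C" : "D")
}' f=1 file1 f=3 file3 f=2 file2)
printf '%s\n' "$result"
```

Because every row is decided while file2 is being read, the output keeps the order of file2 instead of the unspecified order of a `for (i in B)` loop.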

# 3
09-05-2013
Another version:
Code:
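awk '
!f     { A[$0]; next }                          # file1 keys and file3 lines (f is still unset)
       { h=$3; $1=$2=$3=x; sub(/\|\|\|/,x) }    # file2: save column 3, blank cols 1-3, drop the leading "|||"
h in A { print h, $0 "," ($0 in A?"C":"D") }    # known key: C if the rest is a file3 line, else D
' FS=\| OFS=\| file1 file3 f=1 file2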


Last edited by Scrutinizer; 09-05-2013 at 03:39 AM.
# 4
09-05-2013
Another one

Code:
bash-3.2$ cat f1
547680
575210
804270
123989
623989
221209
bash-3.2$ cat f2
1|4501892|547680|1|2|30|73491|12|34|1
2|4788930|575210|1|2|30|73472|12|34|1
3|6793773|804270|1|2|30|73420
4|6673724|123989|1|2|30|73001|12|34|1
5|8099821|333722|1|30|73473|10|34|1
6|7889200|623989|1|2|30|73001|12|45|1
7|8882662|221209|1|2|30|83002|12|34|1
bash-3.2$ 
bash-3.2$ cat f3
1|2|30|73472|12|34|1
1|2|30|73001|12|34|1
1|2|30|83002|12|34|1
bash-3.2$ 
bash-3.2$ awk 'NR==FNR {  x[$0]++; next; } { m=0; for(i in x) { regexp=gensub(/\|/, "\\\\\\|", "g", i); if(match($0, regexp)) { m++; }} printf "%s", $0; print m ? ",C" : ",D" } ' f3 <(join -t'|' -2 3 f1 f2 | cut -d'|' -f 1,4-)
547680|1|2|30|73491|12|34|1,D
575210|1|2|30|73472|12|34|1,C
804270|1|2|30|73420,D
123989|1|2|30|73001|12|34|1,C
623989|1|2|30|73001|12|45|1,D
221209|1|2|30|83002|12|34|1,C


Last edited by MR.bean; 09-05-2013 at 04:11 AM.
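
Two details of the join pipeline above are worth knowing. `join` expects both inputs to be sorted on the join field; with f1 and f2 as given, both files list the shared keys in the same order, so the merge happens to pair every line on this sample. And `match()` with a regular expression built from each f3 line is a substring test, so, for example, a row ending in `|12|34|10` would still match an f3 line ending in `|12|34|1`. The sketch below sorts both inputs first (under `LC_ALL=C`, so `sort` and `join` agree on the order) and compares columns 4 to NF with an exact array lookup. It avoids `gensub()` (GNU awk only) and process substitution (bash), so it should run under a plain POSIX shell with any awk. The temporary file names keys.sorted and data.sorted are only illustrative.

```shell
#!/bin/sh
# Recreate the sample data from the thread in a scratch directory.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 547680 575210 804270 123989 623989 221209 > f1
cat > f2 <<'EOF'
1|4501892|547680|1|2|30|73491|12|34|1
2|4788930|575210|1|2|30|73472|12|34|1
3|6793773|804270|1|2|30|73420
4|6673724|123989|1|2|30|73001|12|34|1
5|8099821|333722|1|30|73473|10|34|1
6|7889200|623989|1|2|30|73001|12|45|1
7|8882662|221209|1|2|30|83002|12|34|1
EOF
cat > f3 <<'EOF'
1|2|30|73472|12|34|1
1|2|30|73001|12|34|1
1|2|30|83002|12|34|1
EOF

# sort and join must use the same collation order.
LC_ALL=C
export LC_ALL

sort f1 > keys.sorted                 # keys: a single field each
sort -t'|' -k3,3 f2 > data.sorted     # f2: sorted on its join field, column 3

# join prints the key first, then the f2 fields other than column 3;
# cut drops f2 columns 1-2, leaving key|col4|...|colNF.
result=$(join -t'|' -2 3 keys.sorted data.sorted |
    cut -d'|' -f1,4- |
    awk 'NR == FNR { ref[$0]; next }          # f3: exact reference lines
         {
             tail = $0
             sub(/^[^|]*\|/, "", tail)        # strip the leading key to get columns 4..NF
             print $0 "," ((tail in ref) ? "C" : "D")
         }' f3 -)
printf '%s\n' "$result"
```

Because the inputs are sorted, the result comes out in key order rather than in the order of f1 or f2.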
# 5
09-05-2013
Hello Pamu, Scrutinizer, Mr.Bean

Many thanks! All 3 solutions work correctly.
# 6
09-05-2013
awk

Code:
awk -F"[|]" '{
    if (FILENAME == "file1")
        key[$1] = 1
    else if (FILENAME == "file3") {
        tail[$0] = 1
    }
    else {
        if (key[$3] == 1) {
            str = $4
            for (i = 5; i <= NF; i++) {
                str = sprintf("%s|%s", str, $i)
            }
            if (tail[str] == 1)
                print $3"|"str",C"
            else
                print $3"|"str",D"
        }
    }
}' file1 file3 file2
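
The script above chooses what to do with each line by comparing FILENAME with the literal names file1 and file3, so it has to be edited whenever the files are named differently. A sketch of the same logic that instead counts input files as they are opened (`FNR == 1` is true on the first line of each file) and takes the file names from the command line; keys.txt, ref.txt and data.txt are hypothetical names. It assumes none of the files is empty, because an empty file never produces an `FNR == 1` line and would shift the count.

```shell
#!/bin/sh
# Recreate the sample data from the question under hypothetical file names.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 547680 575210 804270 123989 623989 221209 > keys.txt
cat > data.txt <<'EOF'
1|4501892|547680|1|2|30|73491|12|34|1
2|4788930|575210|1|2|30|73472|12|34|1
3|6793773|804270|1|2|30|73420
4|6673724|123989|1|2|30|73001|12|34|1
5|8099821|333722|1|30|73473|10|34|1
6|7889200|623989|1|2|30|73001|12|45|1
7|8882662|221209|1|2|30|83002|12|34|1
EOF
cat > ref.txt <<'EOF'
1|2|30|73472|12|34|1
1|2|30|73001|12|34|1
1|2|30|83002|12|34|1
EOF

# Arguments, in order: key file, reference file, data file.
result=$(awk -F'|' '
FNR == 1 { n++ }                    # n becomes 1, 2, 3 as each file starts
n == 1   { key[$1]; next }          # key file: wanted column-3 values
n == 2   { tail[$0]; next }         # reference file: complete lines
($3 in key) {                       # data file: rebuild columns 4..NF
    str = $4
    for (i = 5; i <= NF; i++)
        str = str "|" $i
    print $3 "|" str "," ((str in tail) ? "C" : "D")
}' keys.txt ref.txt data.txt)
printf '%s\n' "$result"
```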

