Help on writing data from 2 different files to one based on a common factor

# 1  
Old 02-07-2013

Hello all,

I have 2 text files.
For example, File1.txt contains this data:
Code:
A
B
C
D
****NEXT****
X
Y
Z
****NEXT****
L
M
N

and File2.txt contains this data:
Code:
P
Q
X
****NEXT****
E
F
G
B
****NEXT**** 
J
K
L

As you can see, the data is grouped, and each group in one file shares one item with a group in the other file: X, B, and L appear in both files. Based on these common items, I want to merge the matching groups and write them to a third file in this format:
Code:
B A B C D E F G
X X Y Z P Q
L L M N J K

Is this insane request even possible in a shell script? Please help.
# 2  
Old 02-07-2013
I'm lost. What goes where based on which criteria?
# 3  
Old 02-07-2013
Both files list the data one item per line, as shown in the examples above. Each group in one file shares an item with a group in the other file, and a ****NEXT**** line marks the end of a group.

So I need to merge the groups from both files that share a common item and write the result to a third file.

Example once again: 'B' appears in both files, in the groups A B C D and E F G B. These two groups need to be merged on the common item B and written to the third file as B A B C D E F G, with the common item first and the remaining data after it, all on one line. HTH
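
To make the "common item" idea concrete, here is a minimal sketch that only lists the items present in both files. It assumes a POSIX grep with -F, -x and -f, and the function name common_items is made up for illustration. The sample data is copied from the first post.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the second file that also occur,
# as whole lines, in the first file, leaving out the group separator.
common_items() {
    grep -Fx -f "$1" "$2" | grep -Fv '****NEXT****'
}

# Sample data from the first post, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' A B C D '****NEXT****' X Y Z '****NEXT****' L M N > "$tmp/File1.txt"
printf '%s\n' P Q X '****NEXT****' E F G B '****NEXT****' J K L > "$tmp/File2.txt"

# Prints X, B and L, one per line, in the order they appear in File2.txt.
common_items "$tmp/File1.txt" "$tmp/File2.txt"
```

This only identifies the keys. Merging the groups around each key still needs something like awk.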
# 4  
Old 02-07-2013
It's a little bit lengthy :)

Try:

Code:
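$ awk 'FNR == 1 { g = 1 }                                  # restart the group counter for each file
index($0, "****NEXT****") { g++; next }                    # separator line: move on to the next group
NR == FNR { G[$0] = g; X[g] = X[g] " " $0; nx = g; next }  # file1: record each item and its group
($0 in G) { K[g] = $0; ny = g; next }                      # file2: an item also in file1 is the group key
{ Y[g] = Y[g] " " $0; ny = g }                             # file2: every other item
END {                                                      # assumes one shared item per pair of groups
  for (j = 1; j <= ny; j++) if (j in K) { i = G[K[j]]; out[i] = K[j] X[i] Y[j] }
  for (i = 1; i <= nx; i++) if (i in out) print out[i]     # print in file1 group order
}' file1 file2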
 
B A B C D E F G
X X Y Z P Q
L L M N J K

shortened

Last edited by pamu; 02-07-2013 at 02:40 AM.. Reason: bit shortened
# 5  
Old 02-07-2013
Thanks, Pamu. But I get this error:

Code:
couldn't set locale correctly
couldn't set locale correctly
awk: syntax error near line 1
awk: bailing out near line 1

# 6  
Old 02-07-2013
Quote:
Originally Posted by vat1kor
. . .
Example once again: 'B' appears in both files, in the groups A B C D and E F G B. These two groups need to be merged on the common item B and written to the third file as B A B C D E F G, with the common item first and the remaining data after it, all on one line.
Why then is it B A B C D E F G and not B B A C D E F G? Why move the second group's B up front, but not the first one's?
# 7  
Old 02-07-2013
Quote:
Originally Posted by vat1kor
Thanks, Pamu. But I get this error:

Code:
couldn't set locale correctly
couldn't set locale correctly
awk: syntax error near line 1
awk: bailing out near line 1


Use /usr/xpg4/bin/awk or nawk on Solaris.
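
A minimal sketch of how a script could choose the interpreter automatically, following the advice above. The function name pick_awk and the variable AWK are made up for illustration, and the order of preference is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: prefer the XPG4 awk where it exists (Solaris),
# then nawk, and fall back to whatever "awk" is on the PATH.
pick_awk() {
    if [ -x /usr/xpg4/bin/awk ]; then
        echo /usr/xpg4/bin/awk
    elif command -v nawk >/dev/null 2>&1; then
        echo nawk
    else
        echo awk
    fi
}

AWK=$(pick_awk)

# Quick smoke test that the chosen interpreter runs at all.
"$AWK" 'BEGIN { print "ok" }'
```

The awk command from post 4 can then be run as "$AWK" '...' file1 file2 instead of calling awk directly.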