merge two column multiple files into one


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users merge two column multiple files into one
# 1  
Old 01-10-2012
Question merge two column multiple files into one

Hi
I have multiple files each with two columns and I need to combine all those file into a tab delimited file. (multiple entry with same name separated by a comma)

The content of the files are as follows:
---
file1.txt:

name var1
aaa xx
aaa gg
bbb yy
ddd zz
---
file2.txt

name var2
bbb mm
ccc nn
ddd pp
ddd qq
----

file3.txt

name var3
aaa uu
ddd vv
eee ww
fff tt
fff uu
---------

I need to combine all these file into a tab delimited file in such a way that the output looks like follows:
 
name \t var1 \t var2 \t var3
aaa \t xx, gg \t - \t uu
bbb \t yy \t mm \t -
ccc \t zz \t nn \t -
ddd \t - \t pp,qq \t vv
eee \t - \t - \t ww
fff \t - \t - \t tt,uu

* \t shows tab delimited space.

I was wondering how to do this in unix. I can do the same in other language but because of efficiency issue I was looking to unix as a potential rescuer.

Thanks for ur help

Last edited by mary271; 01-10-2012 at 06:50 PM..
# 2  
Old 01-10-2012
look into 'man paste'
# 3  
Old 01-10-2012
I have different number of lines in each file and the output I need is not as straight forward.

Last edited by mary271; 01-10-2012 at 07:06 PM..
# 4  
Old 01-10-2012
Code:
$ cat agg.awk
BEGIN { OFS = "\t" }
{
  if ($1 != last) {
    if (NR > 1) print last, a
    last = $1
    a = $2
  } else a = a ", " $2
}
END { print last, a }

$ join -t$'\t' -a1 -a2 -e- -j1 -o0,1.2,2.2 <(awk -f agg.awk f1) <(awk -f agg.awk f2) |
join -t$'\t' -a1 -a2 -e- -j1 -o0,1.2,1.3,2.2 - <(awk -f agg.awk f3)

This User Gave Thanks to binlib For This Post:
# 5  
Old 01-11-2012
Try this out

Code:
 
#!/bin/ksh
cat f1 f2 f3 > tmp
_list=$(cat tmp | awk '{print $1}' | sort | uniq)
for i in ${_list}
do
        _inf1=$(cat f1 | grep $i | awk '{print $2}' | awk 'BEGIN{ORS="\t"}{print $0}')
        _inf2=$(cat f2 | grep $i | awk '{print $2}' | awk 'BEGIN{ORS="\t"}{print $0}')
        _inf3=$(cat f3 | grep $i | awk '{print $2}' | awk 'BEGIN{ORS="\t"}{print $0}')
        print "${i}\t${_inf1}${_inf2}${_inf3}"
done

This User Gave Thanks to dmaharana For This Post:
# 6  
Old 01-11-2012
Quote:
Originally Posted by binlib
Code:
$ cat agg.awk
BEGIN { OFS = "\t" }
{
  if ($1 != last) {
    if (NR > 1) print last, a
    last = $1
    a = $2
  } else a = a ", " $2
}
END { print last, a }

$ join -t$'\t' -a1 -a2 -e- -j1 -o0,1.2,2.2 <(awk -f agg.awk f1) <(awk -f agg.awk f2) |
join -t$'\t' -a1 -a2 -e- -j1 -o0,1.2,1.3,2.2 - <(awk -f agg.awk f3)

This code is incredible.
This works exactly what i wanted. Awesome.
Can you please explain the switches you used here especially join. Also if I have variable number of input files then how can I extend this.

* I never did awk programming but this works for me. I was using matlab to do this and is damn slow.
Thanks again.

---------- Post updated at 12:30 AM ---------- Previous update was at 12:28 AM ----------

Quote:
Originally Posted by dmaharana
Try this out

Code:
 
#!/bin/ksh
cat f1 f2 f3 > tmp
_list=$(cat tmp | awk '{print $1}' | sort | uniq)
for i in ${_list}
do
        _inf1=$(cat f1 | grep $i | awk '{print $2}' | awk 'BEGIN{ORS="\t"}{print $0}')
        _inf2=$(cat f2 | grep $i | awk '{print $2}' | awk 'BEGIN{ORS="\t"}{print $0}')
        _inf3=$(cat f3 | grep $i | awk '{print $2}' | awk 'BEGIN{ORS="\t"}{print $0}')
        print "${i}\t${_inf1}${_inf2}${_inf3}"
done

This code is just concatenating one file after another. Did not work for me.
Thanks again
# 7  
Old 01-11-2012
Options to join used here are:
-t$'\t' use tab as input and output field separator
-a1 -a2 include unmatched lines from the first and second file
-e- use a "-" for missing fields
-j1 use the first field as key in joining
-o0,1.2,2.2 output line consists of the join field (0), second field of the first file (1.2), second field of the second file (2.2)
I wish Unix has a built in command to do the task of agg.awk.
Let me know how the code performs (time wise).
This User Gave Thanks to binlib For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Merge two csv files using column name

Hi all, I have two separate csv files(comma delimited) file 1 and file 2. File 1 contains PAN,NAME,Salary AAAAA5467D,Raj,50000 AAFAC5467D,Ram,60000 BDCFA5677D,Kumar,90000 File 2 contains PAN,NAME,Dept,Salary ASDFG6756T,Karthik,ABC,450000 QWERT8765Y,JAX,CDR,780000... (5 Replies)
Discussion started by: Nivas
5 Replies

2. Shell Programming and Scripting

How to merge two or more fields from two different files where there is non matching column?

Hi, Please excuse for often requesting queries and making R&D, I am trying to work out a possibility where i have two files field separated by pipe and another file containing only one field where there is no matching columns, Could you please advise how to merge two files. $more... (3 Replies)
Discussion started by: karthikram
3 Replies

3. Shell Programming and Scripting

Multiple file merge by column

Hello all, I am quite new in linux shell scripting and I have this issue. I ve got some files including measurements taken every 10minutes for a whole day. File name format is: 00.00, 00.10, 00.20,....23.50 File structure is: x | y | temperature x and y is the same in all files (same... (12 Replies)
Discussion started by: atzounis
12 Replies

4. Shell Programming and Scripting

Merge files based on the column value

Hi Friends, I have a file file1.txt 1|ABC|3|jul|dhj 2|NHU|4|kil|eu 3|hjd|34|hfd|43 file2.txt 1||3|KING|dhj 2|NHU||k| 3|hjd|34|hd|43 i want to merge file1.txt file2.txt based on the column null values in file2.txif there are any nulls in column values , (5 Replies)
Discussion started by: i150371485
5 Replies

5. UNIX for Dummies Questions & Answers

How do I merge multiple columns into one column?

Hi all, I'm looking for a way to merge multiple columns (from one file) into a single column in an output file. The file I have looks somewhat like this: @HWI-ST212 1:N:0 AGTCCTACCGGGAGT + @@@DDDDDHHHHHII @HWI-ST212 1:N:0 CGTTTAAAAATTTCT + @;@B;DDDDH?:F;F... (4 Replies)
Discussion started by: Vnguyen
4 Replies

6. Shell Programming and Scripting

Merge two columns from two files into one if another column matches

I have two text files that look something like this: A:B:C 123 D:E:F 234 G:H:I 345 J:K:L 123 M:N:O 456 P:Q:R 567 A:B:C 456 D:E:F 567 G:H:I 678 J:K:L 456 M:N:O 789 P:Q:R 890 I want to find the line where the first column matches and then combine the second columns into a single... (8 Replies)
Discussion started by: pbluescript
8 Replies

7. Shell Programming and Scripting

How to merge multiple rows into single row if first column matches ?

Hi, Can anyone suggest quick way to get desired output? Sample input file content: A 12 9 A -0.3 2.3 B 1.0 -4 C 34 1000 C -111 900 C 99 0.09 Output required: A 12 9 -0.3 2.3 B 1.0 -4 C 34 1000 -111 900 99 0.09 Thanks (3 Replies)
Discussion started by: cbm_000
3 Replies

8. Shell Programming and Scripting

Merge two files with similar column entries

Hi , I have few files which contains user name and data transfer rate in MBs and this data is collected for year and for each month report is saved in 12 different files I have to merge all the files to prepare the final report Files are as below Similarly I have 10 more files ... (5 Replies)
Discussion started by: pratapsingh
5 Replies

9. UNIX for Dummies Questions & Answers

Writing a loop to merge multiple files by common column

I have 100 data files labelled 250.1.txt through 250.100.txt. The second column of the data files partially match (there is about %90 overlap). Each data file has 4 columns. I want the merge all these text files by the matching values in the second column. In the output, the first column should... (1 Reply)
Discussion started by: evelibertine
1 Replies

10. Shell Programming and Scripting

Merge Two Files based on First column

Hi, I need to join two files based on first column of both files.If first column of first file matches with the first column of second file, then the lines should be merged together and go for next line to check. It is something like: File one: 110001 abc efd 110002 fgh dfg 110003 ... (10 Replies)
Discussion started by: apjneeraj
10 Replies
Login or Register to Ask a Question