Merge 70 files into one data matrix


 
# 1  
Old 11-13-2008

Hi,
I have a list of 70 files in a directory and I need to merge the content of each file into one big matrix file (71 columns x 3060 rows).

Each file has the following format, with only two columns per file:
Code:
unique    identifier1
randomtext1    randomtext1
a    5
b    3
c    6
d    3
e    2
f    9

Things to consider:
1) The first column, 'unique', is the key field shared by every file. The second column is what differs between files and is what I need merged into one data matrix.

2) I would like to leave the randomtext line (the second line of each file) out of the final file.

3) Not all files are sorted like the example above.

Final merged file, should be
Code:
unique    identifier1    identifier2 . . . . . 
a    5    4 . . . . . .
b    3    2 . . . . . .
c    6    6 . . . . . .
d    3    7 . . . . . .
e    2    9 . . . . . .
f    9    4 . . . . . .

Obviously, I need 71 columns in the final file, with the first column as the 'unique' field and the other 70 columns to be the associated value from the 70 files in the common directory.

Can someone help me? Thanks in advance.
# 2  
Old 11-13-2008
If f1 contains:
Code:
a 1
b 2
c 3

and f2 contains:
Code:
a 10
b 20
c 30

then run:
Code:
join f1 f2

which gives:
Code:
a 1 10
b 2 20
c 3 30
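
One caveat for the files in this thread: join expects both inputs to be sorted on the join field, and by default it separates output fields with a space. A minimal sketch for two unsorted, tab-delimited files follows. The scratch directory and its contents are made-up sample data, not the original poster's files.

```shell
# Hypothetical sample data: two unsorted, tab-delimited files in a scratch directory.
tmp=$(mktemp -d)
printf 'b\t2\na\t1\nc\t3\n' > "$tmp/f1"
printf 'c\t30\na\t10\nb\t20\n' > "$tmp/f2"

tab=$(printf '\t')

# join needs both inputs sorted on the join field, so sort copies first.
# LC_ALL=C keeps sort and join in agreement about the collation order.
LC_ALL=C sort -t "$tab" -k1,1 "$tmp/f1" > "$tmp/f1.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$tmp/f2" > "$tmp/f2.sorted"

# -t sets the field separator for both input and output (a literal tab here).
joined=$(LC_ALL=C join -t "$tab" "$tmp/f1.sorted" "$tmp/f2.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

This still joins only two files per call, so 70 files would need a loop that joins each file into a running result.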

# 3  
Old 11-13-2008
Thank you for the reply, but that way would only join 2 files. How can I do it for all 70 files in one directory?
# 4  
Old 11-13-2008
I just tried that, and I forgot to mention that I would like the output to be tab-delimited.
# 5  
Old 11-14-2008
I think you've been around here long enough to have had a go at this yourself, labrazil. :-)

Try using awk to build an array indexed by the first field. As you process each file you can just append the new values to the existing values stored in the array. Then you can just print out the entire array in your END { } clause. I'm presuming that the same set of unique indices appears in all of the files; otherwise it would become more complicated because you'd have to cater for the ones that were missing.
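
A minimal sketch of the array idea, assuming every key appears in every file and ignoring the header and randomtext lines for now. The three small files and the array name row are invented for illustration.

```shell
# Hypothetical sample data: three tab-delimited files with the same keys in different orders.
tmp=$(mktemp -d)
printf 'a\t1\nb\t2\nc\t3\n' > "$tmp/f1"
printf 'c\t30\na\t10\nb\t20\n' > "$tmp/f2"
printf 'b\t200\nc\t300\na\t100\n' > "$tmp/f3"

merged=$(awk -F '\t' '
    # row[key] accumulates the values seen so far for that key, tab-separated.
    {
        if ($1 in row) row[$1] = row[$1] "\t" $2
        else           row[$1] = $2
    }
    # "for (k in row)" visits keys in no defined order, so the output is sorted afterwards.
    END { for (k in row) print k "\t" row[k] }
' "$tmp/f1" "$tmp/f2" "$tmp/f3" | LC_ALL=C sort)
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Because the values are appended in the order the files are read, each output column corresponds to one input file only as long as no file is missing a key.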
# 6  
Old 11-14-2008
Quote:
Originally Posted by Annihilannic
I think you've been around here long enough to have had a go at this yourself, labrazil. :-)

Try using awk to build an array indexed by the first field. As you process each file you can just append the new values to the existing values stored in the array. Then you can just print out the entire array in your END { } clause. I'm presuming that the same set of unique indices appears in all of the files; otherwise it would become more complicated because you'd have to cater for the ones that were missing.
Thanks Anni,

I came across this code
Code:
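awk -v OFS='\t' '
# First file (1.txt): remember column 2, keyed by column 1.
NR == FNR {
    _[$1] = $2
    next
}
# Second file (2.txt): append the remembered value as a new last field.
$1 in _ {
    $(NF+1) = _[$1]
}
# Print every line of the second file.
1' 1.txt 2.txt > 3.txt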

But there are two problems with this:
1) How can I make it run through all 70 files at once instead of one by one?
2) How can I remove the second line from each file as I merge them?

Any ideas?
# 7  
Old 11-14-2008
That solution will be no good for your problem (unless you ran it 70 times!) because the additional field is "forgotten" as each line is processed.

Try implementing what I described rather than just searching for code.

You can skip the second line of each file using FNR != 2 { ... }.
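
Putting the hints in this thread together, one possible sketch is shown below. It is not code from any poster. It assumes tab-delimited, non-empty files, takes the column names from the first line of each file, and pads any missing key with the placeholder NA. The file names, the array names (seen, order, val) and the NA placeholder are all assumptions made for illustration.

```shell
# Hypothetical sample data in the format from post #1; file2.txt is unsorted and lacks key b.
tmp=$(mktemp -d)
printf 'unique\tidentifier1\nrandomtext1\trandomtext1\na\t5\nb\t3\nc\t6\n' > "$tmp/file1.txt"
printf 'unique\tidentifier2\nrandomtext2\trandomtext2\nc\t6\na\t4\n' > "$tmp/file2.txt"

matrix=$(awk -F '\t' '
    # Line 1 of each file: count the file and add its column name to the header.
    FNR == 1 { nfiles++; header = (nfiles == 1 ? $1 : header) "\t" $2; next }
    # Line 2 of each file is the randomtext line: skip it.
    FNR == 2 { next }
    # Remember each key in order of first appearance.
    !($1 in seen) { seen[$1] = 1; order[++nkeys] = $1 }
    # Store the value under (key, file number) so columns stay aligned.
    { val[$1, nfiles] = $2 }
    END {
        print header
        for (i = 1; i <= nkeys; i++) {
            line = order[i]
            for (f = 1; f <= nfiles; f++)
                line = line "\t" (((order[i], f) in val) ? val[order[i], f] : "NA")
            print line
        }
    }
' "$tmp"/*.txt)
printf '%s\n' "$matrix"

rm -rf "$tmp"
```

For the real data, the same awk program could be pointed at all 70 files with a glob and redirected to an output file kept outside that directory. The columns follow the order in which the shell expands the glob, and the rows follow the order in which keys first appear.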