Creating a matrix from files.


 
# 1  
Old 01-31-2011

I need to build a large matrix to feed to MATLAB for processing. The problem is building it, because my data is scattered across many files.

1. I have one big dictionary file with one word per line, like:
Code:
apple
orange
pineapple

2. I have some 400 files (all with a *.txt extension) that contain one word per line, like:
Code:
apple
computer
orange
glass

3. I have another 400 files (all with a *.dat extension) that contain numbers. These numbers are the values of the words stored in the matching *.txt file: a file named 1.txt has a corresponding file 1.dat, and the first word in 1.txt has its value on the first line of 1.dat. So if 1.txt has
Code:
apple
orange

and 1.dat has
Code:
2
3

The value of apple is 2 and orange is 3.

The Task: For each pair of files, I need to look up each word in the dictionary file and put its value on that word's row: 2 on the row for apple, 3 on the row for orange, and 0 on every other row. The result is stored in the matrix file. Each pair of files produces one column this way.

So the first column holds the values for the words in 1.txt (numbers from 1.dat), the second column holds the values for 2.txt and 2.dat, and so on up to the 400th file, which forms the last column. The matrix therefore has 400 columns and as many rows as there are words in the dictionary.
The matrix file should look like this (a small example with 3 files and 6 words in the dictionary):

Code:
2 4 3
5 5 0
0 6 0
0 7 0
0 0 0
1 0 1

Frankly, I have no idea how to do it with awk or sed.
# 2  
Old 01-31-2011
Try something like this:

First, read the dictionary file into a one-dimensional array that maps each word to its position in the dictionary. That position is the row number in the final two-dimensional array. Then, for every line of every .txt file, use the word on that line to look up its row in the one-dimensional array. The column number is the name of the file without the .txt extension. Read the numerical value from the same line of the corresponding .dat file and store it in a two-dimensional array at that row and column.

After the array is filled, iterate over every row and column and print the stored value, or 0 if no value was set.
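
The steps above can be sketched as follows. This is only a minimal Python illustration, not a tested solution for the full data set. The names `build_matrix` and `load_columns` are hypothetical, and the sketch assumes one whitespace-free word or number per line, no blank lines, and files named 1.txt/1.dat through N.txt/N.dat.

```python
from pathlib import Path


def build_matrix(dictionary, columns):
    """Return one row per dictionary word and one column per (words, values) pair."""
    # Step 1: map each dictionary word to its row number.
    row_of = {word: row for row, word in enumerate(dictionary)}
    # Every cell starts at 0, so words a file never mentions stay 0.
    matrix = [[0] * len(columns) for _ in dictionary]
    # Step 2: line N of a word list pairs with line N of its value list.
    for col, (words, values) in enumerate(columns):
        for word, value in zip(words, values):
            if word in row_of:  # words missing from the dictionary are skipped
                matrix[row_of[word]][col] = value
    return matrix


def load_columns(directory, num_files):
    """Read 1.txt/1.dat .. num_files.txt/num_files.dat (assumed naming scheme)."""
    columns = []
    for n in range(1, num_files + 1):
        # Assumption: one token per line, so splitting on whitespace keeps
        # the words and the values aligned line by line.
        words = Path(directory, f"{n}.txt").read_text().split()
        values = Path(directory, f"{n}.dat").read_text().split()
        columns.append((words, values))
    return columns


if __name__ == "__main__":
    # In-memory example using the 1.txt / 1.dat values from the question.
    dictionary = ["apple", "orange", "pineapple"]
    columns = [(["apple", "orange"], [2, 3])]
    for row in build_matrix(dictionary, columns):
        print(" ".join(str(cell) for cell in row))
```

For the real data, something like `build_matrix(Path("dictionary").read_text().split(), load_columns(".", 400))` would produce the rows, which can then be written out space-separated for MATLAB. Values are kept as the strings read from the .dat files, so they are written back unchanged.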
This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 01-31-2011
If Perl is ok you can try:
Code:
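#!/usr/bin/perl
use strict;
use warnings;

my $numFiles = 400;    # files are named 1.txt/1.dat .. 400.txt/400.dat

# Read the dictionary, skipping blank lines; word order defines row order.
my $dicFile = shift or die "Usage: $0 dicFile\n";
open(my $dic_fh, '<', $dicFile) or die "Cannot open $dicFile: $!";
my @dic;
while (my $word = <$dic_fh>) {
    chomp $word;
    push @dic, $word if $word =~ /\S/;
}
close($dic_fh);

# %txt maps word -> array of values indexed by file number.
my %txt;
for my $fic (1 .. $numFiles) {
    open(my $dat_fh, '<', "$fic.dat") or die "Cannot open $fic.dat: $!";
    chomp(my @dat = <$dat_fh>);
    close($dat_fh);

    open(my $txt_fh, '<', "$fic.txt") or die "Cannot open $fic.txt: $!";
    my $line = 0;
    while (my $word = <$txt_fh>) {
        chomp $word;
        # Line N of the .txt file pairs with line N of the .dat file.
        $txt{$word}[$fic] = $dat[$line] if length $word;
        $line++;
    }
    close($txt_fh);
}

# One row per dictionary word, one column per file; missing entries are 0.
for my $word (@dic) {
    my @row;
    for my $fic (1 .. $numFiles) {
        my $v = $txt{$word}[$fic];
        push @row, defined $v ? $v + 0 : 0;
    }
    print "@row\n";
}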

Usage:
Code:
script dicFile

This User Gave Thanks to Klashxx For This Post:
# 4  
Old 01-31-2011
Or awk:
Code:
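awk 'NR==FNR{
       A[$1]=NR            # dictionary word -> row number
       n=NR                # number of rows = number of dictionary lines
       next
     }
     FNR==1{               # first line of each .txt file
       ++m                 # one more column
       close(f)            # close the previous .dat file
       f=FILENAME
       sub(/\.txt$/,x,f)
       k=f                 # column number = file name without .txt
       f=f".dat"
     }
     {
       getline v<f         # same line number in the matching .dat file
       if($1 in A)B[A[$1],k]=v
     }
     END{
       for(i=1;i<=n;i++){
         for(j=1;j<=m;j++)printf "%s ",B[i,j]?B[i,j]:0
         print x
       }
     }' dictionary *.txt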

This User Gave Thanks to Scrutinizer For This Post: