Putting together all values from different files in one file


 
# 1  
Old 08-16-2011

Hi All,

Here is what I am trying to achieve, so far without success.

I have three sets of files which are:

1. One big dictionary file which looks like this:

Code:
apple
orange
computer
pear
country

2. Several thousand text files named 1.dat, 2.dat, 3.dat, etc.

The text files look like this (assume this to be 1.dat):

Code:
apple
computer
country

3. Another set of files with the extension .num, one for each .dat file, containing numbers instead of words. Each number is the value for the word on the same line of the matching .dat file. So for the 1.dat above, my 1.num would look like this:

Code:
0.33
2.3
0.84

The same goes for 2.dat, 3.dat, and the rest: 2.dat has a 2.num, 3.dat has a 3.num, and so on.

Now, I want to bring everything together in one file so that I may see the values all at once rather than opening several files.

This is what I wish to achieve:
Code:
1 1 0.33
3 1 2.3
5 1 0.84

The above output says that apple is at position 1 in the dictionary file (first column), that it comes from 1.dat (second column, with the .dat extension removed), and that its value is 0.33 (taken from 1.num).
Similarly, computer is at position 3 in the dictionary file, comes from 1.dat (the first document), and has the value 2.3 in 1.num. country is the 5th word in the dictionary file, comes from 1.dat, and has the value 0.84 in 1.num. The same goes for 2.dat and 2.num, 3.dat and 3.num, and so on.

I have code for this, but it is not able to do what I want. It builds a huge matrix from the files and then converts the matrix to my format. While the matrix is being built, memory runs out and my computer hangs, so I want to bypass the matrix creation step.

Code to create matrix:
Code:
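awk 'NR==FNR{
       A[$1]=NR          # dictionary: word -> line number
       next
     }
     !n{n=NR-1}          # number of dictionary words
     FNR==1{             # first line of each .dat file
       ++m
       close(f)
       f=FILENAME
       sub(/\.dat$/,x,f)
       k=f               # document number, e.g. 1 for 1.dat
       f=f".num"
     }
     {
       getline v<f       # value on the same line of the .num file
       B[A[$1],k]=v
     }
     END{
       for(i=1;i<=n;i++){
         for(j=1;j<=m;j++)printf "%s ",B[i,j]?B[i,j]:0
         print x
       }
     }' dictionary *.dat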

and then I convert it to my format like this:

Code:
awk -v nc=6000 -v nr=60000 '
# assuming nr = number of rows (dictionary words), nc = number of columns (documents)
{ for (col=1; col<=NF; col++) matrix[NR,col] = $col }
END {
    for (col=1; col<=nc; col++)
      for (row=1; row<=nr; row++)
        if (matrix[row,col])
          print row, col, matrix[row, col]
}
' matrix.mtx

where matrix.mtx is my huge matrix file.

I am using BASH on Linux.
# 2  
Old 08-16-2011
You can avoid loading "X.dat" and "X.num" into arrays with the code below:
Code:
#!/bin/ksh
typeset -i mCnt=1
# Number_Files must be set to the number of .dat/.num pairs.
while [[ ${mCnt} -le ${Number_Files} ]]; do
  echo "Now working on file pair <${mCnt}>:"
  paste -d' ' ${mCnt}.dat ${mCnt}.num > Tmp_Dat_Num

  # <insert the dictionary look up code here>

  mCnt=${mCnt}+1
done

This will make it easier for you.
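
One possible way to fill in the dictionary lookup step is sketched below in bash, since that is the shell the original poster uses. It is only a sketch under some assumptions: the function name combine_pairs and the output file name are made up for illustration, every word is a single token without spaces, and words missing from the dictionary are skipped. It pipes the pasted pair straight into awk instead of writing Tmp_Dat_Num.

```shell
#!/bin/bash
# combine_pairs DICT COUNT
# Hypothetical helper: prints "position document value" for 1.dat/1.num up to
# COUNT.dat/COUNT.num, handling one pair of files at a time.
combine_pairs() {
  local dict=$1 count=$2 n=1
  while [ "$n" -le "$count" ]; do
    paste -d' ' "$n.dat" "$n.num" |
      awk -v doc="$n" '
        NR == FNR { pos[$1] = NR; next }        # dictionary: word -> line number
        ($1 in pos) { print pos[$1], doc, $2 }  # pasted line: word, value
      ' "$dict" -
    n=$((n + 1))
  done
}

# Example call (file names are assumptions):
#   combine_pairs dictionary "$Number_Files" > combined.txt
```

Only the dictionary is ever held in memory, so no matrix is built. The trade-off is that the dictionary is read again for every pair, which may be slow with thousands of files.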
# 3  
Old 08-16-2011
This is very easy with perl. It should also be easy with awk if you first combine each pair into a single file (1.common, 2.common, 3.common, etc.) in a format like the one below. Putting the number in the first field allows words that contain spaces:
Code:
0.33 apple
2.3 computer
0.84 country

It is not hard to paste all your files:
Code:
# max is the number of the last .dat/.num pair
for num in $(seq 1 "$max"); do
  paste "$num.num" "$num.dat" > "$num.common"
done

Then:
Code:
awk -F'\t' '
  NR == FNR { a[$1] = NR; next }   # dictionary: word -> line number
  {
    doc = FILENAME
    sub(/\.common$/, "", doc)      # 1.common -> 1
    print a[$2], doc, $1
  }
' dictionary *.common

Output:
1 1 0.33
3 1 2.3
5 1 0.84
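
If you would rather not create the intermediate files, the same idea can be sketched as a single awk pass that reads each .num file with getline while it walks the matching .dat file. This is only a sketch: the function name merge_values is made up, it assumes one single-token word per line, and it assumes it is run from the directory that holds the files, so that FILENAME is just 1.dat, 2.dat, and so on.

```shell
#!/bin/bash
# merge_values DICT FILE.dat...
# Hypothetical helper: prints "position document value" without building a
# matrix and without writing intermediate files.
merge_values() {
  awk '
    NR == FNR { pos[$1] = NR; next }   # dictionary: word -> line number
    FNR == 1 {                         # first line of each .dat file
      if (numfile != "") close(numfile)
      doc = FILENAME
      sub(/\.dat$/, "", doc)           # 1.dat -> 1
      numfile = doc ".num"
    }
    {
      # Read the matching line of the .num file; skip if it has run out.
      if ((getline val < numfile) <= 0) next
      if ($1 in pos) print pos[$1], doc, val
    }
  ' "$@"
}

# Example call (output file name is an assumption):
#   merge_values dictionary *.dat > combined.txt
```

Only the dictionary is kept in memory, and each line is printed as soon as it is read. Note that *.dat expands in lexical order (1.dat, 10.dat, 11.dat, ...), so the output is grouped by document but the documents are not in numeric order.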
