Combine a datafile with Master datafile, urgent!


 
# 1  
Old 05-10-2007

Hi guys, my supervisor has asked me to solve this problem within 7 days. I've spent 3 days thinking about it but haven't come up with anything.
Please give me some ideas on the following problem.

I have index.database, which contains only the index dates:
1994
1995
1996
1997
1998
1999

I have small.database.csv that contains data for some of the indexed dates but not all of them:

1995, california, A3,B6
1999, vermont, A4,B9

I want to merge small.database.csv into index.database to produce combined.database.csv, which would look like this:

1994,,,
1995, california, A3,B6
1996,,,
1997,,,
1998,,,
1999, vermont, A4,B9

Shell scripts or Perl would both be fine.

Thanks a lot.
My supervisor is after me on this one.
# 2  
Old 05-10-2007
Try...
Code:
$ head file?
==> file1 <==
1994
1995
1996
1997
1998
1999

==> file2 <==
1995, california, A3,B6
1999, vermont, A4,B9
$ join -t , -a 1 -o 1.1,2.2,2.3,2.4 file1 file2
1994,,,
1995, california, A3,B6
1996,,,
1997,,,
1998,,,
1999, vermont, A4,B9
$

# 3  
Old 05-11-2007
It didn't work!
The join command requires both files to be sorted on the join field.
My real index field is a date, for example:
07/08/1998
join can't handle that on its own; all it sees is 07.

Please help.
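
One possible way around the sorting problem is to rewrite the date key in a form whose plain text order matches chronological order, so that sort and join agree on it. The sketch below assumes the dates are MM/DD/YYYY (the thread does not say which number is the month), and the names sortable_key and rekey are made up for illustration.

```python
from datetime import datetime


def sortable_key(date_str, fmt="%m/%d/%Y"):
    """Convert a date such as 07/08/1998 to ISO form (1998-07-08).

    The MM/DD/YYYY default is an assumption; pass another fmt if needed.
    """
    return datetime.strptime(date_str.strip(), fmt).strftime("%Y-%m-%d")


def rekey(line, fmt="%m/%d/%Y"):
    """Rewrite the first comma-separated field of a line as an ISO date."""
    key, sep, rest = line.rstrip("\n").partition(",")
    # Lines without a comma (index entries) come back as the bare ISO date.
    return sortable_key(key, fmt) + sep + rest


if __name__ == "__main__":
    print(rekey("07/08/1998, vermont, A4,B9"))
```

Running every line of both files through rekey and then sorting them should give join keys in the same order in each file.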
# 4  
Old 05-11-2007
If you can use Python, here's an alternative:
Code:
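#!/usr/bin/python
# For each key in file1, scan file2 for rows whose first field matches.
# Note: file2 is re-read for every key, so the run time is O(n*m).
for line in open("file1"):
    key = line.strip()
    found = False
    for line2 in open("file2"):
        if line2.split(",")[0].strip() == key:
            print(line2.strip())
            found = True
    if not found:
        # No data for this key: print the key with empty fields.
        print("%s,,," % key)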

output:
Code:
# ./test.py
1994,,,
1995, california, A3,B6
1996,,,
1997,,,
1998,,,
1999, vermont, A4,B9

# 5  
Old 05-11-2007
Thanks a lot ghostdog74. It works!
But it's really slow on large data files.
join is surprisingly much faster on large files, but it can't be used in this case.
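
The slowness most likely comes from re-reading the data file once per index line. A minimal sketch of a single-pass alternative follows: load the data rows into a dictionary keyed on the first field, then look each index key up directly. It needs no sorting, so date keys such as 07/08/1998 are treated as plain strings. The function name combine and the empty_fields parameter are hypothetical, and the sketch assumes the data file fits in memory.

```python
def combine(index_lines, data_lines, empty_fields=3):
    """Left-join data rows onto index keys, preserving the index order."""
    # Map each key (first comma-separated field) to its full row.
    rows = {}
    for row in data_lines:
        row = row.strip()
        if row:
            rows[row.split(",", 1)[0].strip()] = row

    padding = "," * empty_fields
    out = []
    for key in index_lines:
        key = key.strip()
        if key:
            # Use the stored row if there is one, else the key plus empty fields.
            out.append(rows.get(key, key + padding))
    return out


if __name__ == "__main__":
    index = ["1994", "1995", "1996", "1997", "1998", "1999"]
    data = ["1995, california, A3,B6", "1999, vermont, A4,B9"]
    print("\n".join(combine(index, data)))
```

With real files, open file handles can be passed in place of the lists, for example combine(open("file1"), open("file2")).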
# 6  
Old 05-12-2007
Quote:
Originally Posted by onthetopo
Thanks a lot ghostdog74. It works!
But it's really slow on large data files.
join is surprisingly much faster on large files, but it can't be used in this case.
Code:
#! /opt/third-party/bin/perl
use strict;
use warnings;

# Load the data file into a hash keyed on the first comma-separated field.
my %rows;
open(my $small, "<", "small") || die "Unable to open file small <$!>\n";
while (<$small>) {
  chomp;
  next unless length;
  my ($key) = split /,/;
  $rows{$key} = $_;
}
close($small);

# Walk the index file in order; print the matching row or the padded key.
open(my $index, "<", "index") || die "Unable to open file index <$!>\n";
while (<$index>) {
  chomp;
  print exists $rows{$_} ? "$rows{$_}\n" : "$_,,,\n";
}
close($index);

exit 0;

This should be fast!
# 7  
Old 05-12-2007
how about this:
Code:
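awk -F "," 'NR == FNR {
        # First file (file2): save everything after the key, with its leading comma.
        rest = ""
        for (i = 2; i <= NF; i++) rest = rest "," $i
        arr[$1] = rest
        next
    }
    {
        # Second file (file1): print the saved fields, or pad with empty ones.
        s = ($1 in arr) ? $1 arr[$1] : $1 ",,,"
        print s
    }' file2 file1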

output:
Code:
./test.sh
1994,,,
1995, california, A3,B6
1996,,,
1997,,,
1998,,,
1999, vermont, A4,B9
