Sponsored Content
Top Forums Shell Programming and Scripting script to merge two files on an index Post 302639917 by LMHmedchem on Sunday 13th of May 2012 04:33:58 PM
Old 05-13-2012
Thanks for the post. One of the issues I am looking at here is that I use things like this fairly often, but the formats of the files can be very different. They could have hundreds of columns and the header name and location of the index column can vary quire a bit. In most cases, it is easier to use a header name to specify the index column and it's harder to use a simple tokenizer where the number of columns and location of the index is not constant. I am trying to get away from scripts with significant hard coding that needs to be changed.

In c++, I might create a hash table using the index value of the second file ask the key and the remainder of the row as the value. I would read in the second file and store it on the index value and then loop through the first file looking up the file 2 data that goes with each file 1 row. This would also mean that the files wouldn't have to be in the same order. That wouldn't be super fast, but processing files with 150,000+ rows will take some time no matter what the method. I don't really know how to do that sort of thing in a scripting language.

I don't think that something of that sophistication is really necessary, if the rows are not in the same order, another script could be used to sort them. I think that all that is really necessary is to check that the number of rows is the same and then test each row for matching key values, but I would like to have a script that will just take arguments for the filenames and index header name and have the rest work for pretty much any pair of files.

LMHmedchem
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

shell script to merge files

Can anybody help me out with this problem " a shell program that takes one or any number of file names as input; sorts the lines of each file in ascending order and displays the non blank lines of each sorted file and merge them as one combined sorted file. The program generates an error... (1 Reply)
Discussion started by: arya
1 Replies

2. Shell Programming and Scripting

Merge two files in windows using perl script

Hi I want to merge two or more files using perl in windows only(Just like Paste command in Unix script) . How can i do this.Is ther any single command to do this? Thanks Kunal (1 Reply)
Discussion started by: kunal_dixit
1 Replies

3. Filesystems, Disks and Memory

why the inode index of file system starts from 1 unlike array index(0)

why do inode indices starts from 1 unlike array indexes which starts from 0 its a question from "the design of unix operating system" of maurice j bach id be glad if i get to know the answer quickly :) (0 Replies)
Discussion started by: sairamdevotee
0 Replies

4. Shell Programming and Scripting

script needed to merge two files and report differences

Hello, I have two txt files that look like this: db.0.0.0.0: Total number of NS records = 1 db.127.0.0.0: Total number of NS records = 1 Total number of PTR records = 1 db.172.19.0.0: Total number of NS records = 1 Total number of PTR records = 3 db.172.19.59.0: Total... (8 Replies)
Discussion started by: richsark
8 Replies

5. Shell Programming and Scripting

script to merge xml files with options

Hi, I have a very basic knowledge of shell scripting & would like some help with a little problem I have. I sometimes use a program calle phronix & sometimes like to compare its results which are *.xml files. Which is easy enough but a friend wants to avoid typing the path to the files.... (2 Replies)
Discussion started by: ptrbee
2 Replies

6. Shell Programming and Scripting

merge two files via looping script

Hi all, I hope you can help me. I got a file a and a file b File a contains a b c d e f g h File b contains 1 2 3 (8 Replies)
Discussion started by: stinkefisch
8 Replies

7. Shell Programming and Scripting

Sort from start index and end index in line

Hi All, I have a file (FileNames.txt) which contains the following data in it. $ cat FileNames.txt MYFILE17XXX208Sep191307.csv MYFILE19XXX208Sep192124.csv MYFILE20XXX208Sep192418.csv MYFILE22XXX208Sep193234.csv MYFILE21XXX208Sep193018.csv MYFILE24XXX208Sep194053.csv... (5 Replies)
Discussion started by: krish_indus
5 Replies

8. Shell Programming and Scripting

merge two text files of different size on common index

I have two text files. text file 1: ID filePath col1 col2 col3 1 10584588.mol 269.126 190.958 23.237 2 10584549.mol 281.001 200.889 27.7414 3 10584511.mol 408.824 158.316 29.8561 4 10584499.mol 245.632 153.241 25.2815 5 10584459.mol ... (8 Replies)
Discussion started by: LMHmedchem
8 Replies

9. Programming

Merge sort when array starts from zero(0) index???

Hi friends, I have implemented the merge sort algorith in c, before I put forward my question, you please have a look at my code. // The array is sorted, as 1234 #include <stdio.h> #include <stdlib.h> #include <math.h> int A = {4, 3, 2, 1}; void Merge_Sort(int , int, int); void... (0 Replies)
Discussion started by: gabam
0 Replies

10. Shell Programming and Scripting

Merge multiple tab delimited files with index checking

Hello, I have 40 data files where the first three columns are the same (in theory) and the 4th column is different. Here is an example of three files, file 2: A_f0_r179_pred.txt Id Group Name E0 1 V N(,)'1 0.2904 2 V N(,)'2 0.3180 3 V N(,)'3 0.3277 4 V N(,)'4 0.3675 5 V N(,)'5 0.3456 ... (8 Replies)
Discussion started by: LMHmedchem
8 Replies
funtablerowput(3)						SAORD Documentation						 funtablerowput(3)

NAME
FunTableRowPut - put Funtools rows SYNOPSIS
int FunTableRowPut(Fun fun, void *rows, int nev, int idx, char *plist) DESCRIPTION
The FunTableRowPut() routine writes rows to a FITS binary table, taking its input from an array of user structs that contain column values selected by a previous call to FunColumnSelect(). Selected column values are automatically converted from native data format to FITS data format as necessary. The first argument is the Fun handle associated with this row data. The second rows argument is the array of user structs to output. The third nrow argument specifies the number number of rows to write. The routine will write nrow records, starting from the location speci- fied by rows. The fourth idx argument is the index of the first raw input row to write, in the case where rows from the user buffer are being merged with their raw input row counterparts (see below). Note that this idx value is has nothing to do with the row buffer specified in argument 1. It merely matches the row being written with its corresponding (hidden) raw row. Thus, if you read a number of rows, process them, and then write them out all at once starting from the first user row, the value of idx should be 0: Ev ebuf, ev; /* get rows -- let routine allocate the row array */ while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* process all rows */ for(i=0; i<got; i++){ /* point to the i'th row */ ev = ebuf+i; ... } /* write out this batch of rows, starting with the first */ FunTableRowPut(fun2, (char *)ebuf, got, 0, NULL); /* free row data */ if( ebuf ) free(ebuf); } On the other hand, if you write out the rows one at a time (possibly skipping rows), then, when writing the i'th row from the input array of rows, set idx to the value of i: Ev ebuf, ev; /* get rows -- let routine allocate the row array */ while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* process all rows */ for(i=0; i<got; i++){ /* point to the i'th row */ ev = ebuf+i; ... /* write out the current (i.e., i'th) row */ FunTableRowPut(fun2, (char *)ev, 1, i, NULL); } /* free row data */ if( ebuf ) free(ebuf); } The final argument is a param list string that is not currently used. The routine returns the number of rows output. This should be equal to the value passed in the third nrow</B argument. When FunTableRowPut() is first called for a given binary table, Funtools checks to see of the primary header has already been written (either by writing a previous row table or by writing an image.) If not, a dummy primary header is written to the file specifying that an extension should be expected. After this, a binary table header is automatically written containing information about the columns that will populate this table. In addition, if a Funtools reference handle was specified when this table was opened, the parameters from this Funtools reference handle are merged into the new binary table header. In a typical Funtools row loop, you read rows using FunTableRowGet()() and write rows using FunTableRowPut(). The columns written by FunT- ableRowPut()() are those defined as writable by a previous call to FunColumnSelect(). If that call to FunColumnSelect also specified merge=[update|replace|append], then the entire corresponding raw input row record will be merged with the output row according to the merge specification (see FunColumnSelect() above). A call to write rows can either be done once, after all rows in the input batch have been processed, or it can be done (slightly less effi- ciently) one row at a time (or anything in between). We do recommend that you write all rows associated with a given batch of input rows before reading new rows. This is required if you are merging the output rows with the raw input rows (since the raw rows are destroyed with each successive call to get new rows). For example: Ev buf, ev; ... /* get rows -- let routine allocate the row array */ while( (buf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* point to the i'th row */ ev = buf + i; .... process } /* write out this batch of rows */ FunTableRowPut(fun2, buf, got, 0, NULL); /* free row data */ if( buf ) free(buf); } or Ev buf, ev; ... /* get rows -- let routine allocate the row array */ while( (buf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){ /* process all rows */ for(i=0; i<got; i++){ /* point to the i'th row */ ev = buf + i; ... process /* write out this batch of rows with the new column */ if( dowrite ) FunTableRowPut(fun2, buf, 1, i, NULL); } /* free row data */ if( buf ) free(buf); } Note that the difference between these calls is that the first one outputs got rows all at once and therefore passes idx=0 in argument four, so that merging starts at the first raw input row. In the second case, a check it made on each row to see if it needs to be output. If so, the value of idx is passed as the value of the i variable which points to the current row being processed in the batch of input rows. As shown above, successive calls to FunTableRowPut() will write rows sequentially. When you are finished writing all rows in a table, you should call FunFlush() to write out the FITS binary table padding. However, this is not necessary if you subsequently call FunClose() with- out doing any other I/O to the FITS file. Note that FunTableRowPut() also can be called as FunEventsPut(), for backward compatibility. SEE ALSO
See funtools(7) for a list of Funtools help pages version 1.4.2 January 2, 2008 funtablerowput(3)
All times are GMT -4. The time now is 10:45 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy