10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
Hi All,
I have a.dat file with content
1,338,30253395122015103,2015103,UB0085000,STMT151117055527002,,,
1,338,30253395122015103,2015103,UB0085000,STMT151117055527001,,,
and b.dat having content
1,STMT151117055527001,a1.txt,b1.txt,c1.txt
1,STMT151117055527002,a2.txt,b2.txt,c2.txt
... (13 Replies)
Discussion started by: PRAMOD 96
13 Replies
2. UNIX for Dummies Questions & Answers
Hi,
Below is my requirement
file1
id|cnt
1|1
2|2
3|3
file2
id_1|cnt_1
1|1
2|1
3|1
I want to compare cnt and cnt_1 columns, if they are differ then give the details
Am using below awk command, but the output is not as expected. (2 Replies)
Discussion started by: grandhirahuletl
2 Replies
3. Shell Programming and Scripting
Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns.
I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'... (5 Replies)
Discussion started by: bkane3
5 Replies
4. Shell Programming and Scripting
Hi,
I have two files like this:
8 1.3
10 1.3
12 1.3
15 1.3
21 1.3
and
1
2
3
4
10
11
15
16
21
22 (3 Replies)
Discussion started by: jamie_123
3 Replies
5. Shell Programming and Scripting
Hi
I have file 1 like this
and file 2 like this
I need to compare column 3 of both files and delete lines in file1 with same column 3 values in two files. So the output is
I tried with perl but didnt work. A perl code will be good as I am learning the language, but any other code would... (1 Reply)
Discussion started by: polsum
1 Replies
6. UNIX for Dummies Questions & Answers
Hi all,
I would like to compare a column in one file to a column in another file and when there is a match it prints the first column and the corresponding second column. Example
File1
ABA
ABC
ABE
ABF
File 2
ABA 123
ABB 124
ABD 125
ABC 126
So what I would like printed to a file... (0 Replies)
Discussion started by: pcg
0 Replies
7. Shell Programming and Scripting
I have a control file which tells me which are the fields in the files I need to compare and based on the values I need to print the exact value if key =Y and output is Y , or if output is Y/N then I need to print only Y if it matches or N if it does not match and if output =N , then skip the feild... (7 Replies)
Discussion started by: newtoawk
7 Replies
8. Shell Programming and Scripting
Hiiiii friends
I have 2 files which contains huge data & few lines of it are as shown below
File1: b.dat(which has 21 columns)
SSR 1976 8 12 13 10 44.00 39.0700 70.7800 7.0 0 0.00 0 2.78 0.00 0.00 0 0.00 2.78 0 NULL
ISC 1976 8 12 22 32 37.39 36.2942 70.7338... (6 Replies)
Discussion started by: reva
6 Replies
9. Shell Programming and Scripting
Hello all,
Could someone please let me know shell script or awk solution to compare two columns in two files? Here is the sample -
file1.txt
abc/xyz,M1234
ddd/lyg,M2345
cnn/tnt,G0123
file2.txt
A,abc/xyz,kk,dd,zz,DCT,G0123,1
A,ddd/lyg,kk,dd,zz,DCT,M1234,1... (17 Replies)
Discussion started by: sncoupons
17 Replies
10. Shell Programming and Scripting
My Friends,
Need your help to find the difference between few columns from two comma delimited files. For example, File1 and File2 has 22 columns, and I want to find the difference in first 12 columns.
I have list of file names in MyListOfFiles2Compare.txt. Data is separated with commas.... (5 Replies)
Discussion started by: manish44
5 Replies
funjoin(1) SAORD Documentation funjoin(1)
NAME
funjoin - join two or more FITS binary tables on specified columns
SYNOPSIS
funjoin [switches] <ifile1> <ifile2> ... <ifilen> <ofile>
OPTIONS
-a cols # columns to activate in all files
-a1 cols ... an cols # columns to activate in each file
-b 'c1:bvl,c2:bv2' # blank values for common columns in all files
-bn 'c1:bv1,c2:bv2' # blank values for columns in specific files
-j col # column to join in all files
-j1 col ... jn col # column to join in each file
-m min # min matches to output a row
-M max # max matches to output a row
-s # add 'jfiles' status column
-S col # add col as status column
-t tol # tolerance for joining numeric cols [2 files only]
DESCRIPTION
funjoin joins rows from two or more (up to 32) FITS Binary Table files, based on the values of specified join columns in each file. NB: the
join columns must have an index file associated with it. These files are generated using the funindex program.
The first argument to the program specifies the first input FITS table or raw event file. If "stdin" is specified, data are read from the
standard input. Subsequent arguments specify additional event files and tables to join. The last argument is the output FITS file.
NB: Do not use Funtools Bracket Notation to specify FITS extensions and row filters when running funjoin or you will get wrong results.
Rows are accessed and joined using the index files directly, and this bypasses all filtering.
The join columns are specified using the -j col switch (which specifies a column name to use for all files) or with -j1 col1, -j2 col2,
... -jn coln switches (which specify a column name to use for each file). A join column must be specified for each file. If both -j col
and -jn coln are specified for a given file, then the latter is used. Join columns must either be of type string or type numeric; it is
illegal to mix numeric and string columns in a given join. For example, to join three files using the same key column for each file, use:
funjoin -j key in1.fits in2.fits in3.fits out.fits
A different key can be specified for the third file in this way:
funjoin -j key -j3 otherkey in1.fits in2.fits in3.fits out.fits
The -a "cols" switch (and -a1 "col1", -a2 "cols2" counterparts) can be used to specify columns to activate (i.e. write to the output
file) for each input file. By default, all columns are output.
If two or more columns from separate files have the same name, the second (and subsequent) columns are renamed to have an underscore and a
numeric value appended.
The -m min and -M max switches specify the minimum and maximum number of joins required to write out a row. The default minimum is 0
joins (i.e. all rows are written out) and the default maximum is 63 (the maximum number of possible joins with a limit of 32 input files).
For example, to write out only those rows in which exactly two files have columns that match (i.e. one join):
funjoin -j key -m 1 -M 1 in1.fits in2.fits in3.fits ... out.fits
A given row can have the requisite number of joins without all of the files being joined (e.g. three files are being joined but only two
have a given join key value). In this case, all of the columns of the non-joined file are written out, by default, using blanks (zeros or
NULLs). The -b c1:bv1,c2:bv2 and -b1 'c1:bv1,c2:bv2' -b2 'c1:bv1,c2 - bv2' ... switches can be used to set the blank value for columns
common to all files and/or columns in a specified file, respectively. Each blank value string contains a comma-separated list of col-
umn:blank_val specifiers. For floating point values (single or double), a case-insensitive string value of "nan" means that the IEEE NaN
(not-a-number) should be used. Thus, for example:
funjoin -b "AKEY:???" -b1 "A:-1" -b3 "G:NaN,E:-1,F:-100" ...
means that a non-joined AKEY column in any file will contain the string "???", the non-joined A column of file 1 will contain a value of
-1, the non-joined G column of file 3 will contain IEEE NaNs, while the non-joined E and F columns of the same file will contain values
-1 and -100, respectively. Of course, where common and specific blank values are specified for the same column, the specific blank value
is used.
To distinguish which files are non-blank components of a given row, the -s (status) switch can be used to add a bitmask column named
"JFILES" to the output file. In this column, a bit is set for each non-blank file composing the given row, with bit 0 corresponds to the
first file, bit 1 to the second file, and so on. The file names themselves are stored in the FITS header as parameters named JFILE1,
JFILE2, etc. The -S col switch allows you to change the name of the status column from the default "JFILES".
A join between rows is the Cartesian product of all rows in one file having a given join column value with all rows in a second file having
the same value for its join column and so on. Thus, if file1 has 2 rows with join column value 100, file2 has 3 rows with the same value,
and file3 has 4 rows, then the join results in 2*3*4=24 rows being output.
The join algorithm directly processes the index file associated with the join column of each file. The smallest value of all the current
columns is selected as a base, and this value is used to join equal-valued columns in the other files. In this way, the index files are
traversed exactly once.
The -t tol switch specifies a tolerance value for numeric columns. At present, a tolerance value can join only two files at a time. (A
completely different algorithm is required to join more than two files using a tolerance, somethng we might consider implementing in the
future.)
The following example shows many of the features of funjoin. The input files t1.fits, t2.fits, and t3.fits contain the following columns:
[sh] fundisp t1.fits
AKEY KEY A B
----------- ------ ------ ------
aaa 0 0 1
bbb 1 3 4
ccc 2 6 7
ddd 3 9 10
eee 4 12 13
fff 5 15 16
ggg 6 18 19
hhh 7 21 22
fundisp t2.fits
AKEY KEY C D
----------- ------ ------ ------
iii 8 24 25
ggg 6 18 19
eee 4 12 13
ccc 2 6 7
aaa 0 0 1
fundisp t3.fits
AKEY KEY E F G ------------ ------ -------- -------- -----------
ggg 6 18 19 100.10
jjj 9 27 28 200.20
aaa 0 0 1 300.30
ddd 3 9 10 400.40
Given these input files, the following funjoin command:
funjoin -s -a1 "-B" -a2 "-D" -a3 "-E" -b
"AKEY:???" -b1 "AKEY:XXX,A:255" -b3 "G:NaN,E:-1,F:-100"
-j key t1.fits t2.fits t3.fits foo.fits
will join the files on the KEY column, outputting all columns except B (in t1.fits), D (in t2.fits) and E (in t3.fits), and setting blank
values for AKEY (globally, but overridden for t1.fits) and A (in file 1) and G, E, and F (in file 3). A JFILES column will be output to
flag which files were used in each row:
AKEY KEY A AKEY_2 KEY_2 C AKEY_3 KEY_3 F G JFILES
------------ ------ ------ ------------ ------ ------ ------------ ------ -------- ----------- --------
aaa 0 0 aaa 0 0 aaa 0 1 300.30 7
bbb 1 3 ??? 0 0 ??? 0 -100 nan 1
ccc 2 6 ccc 2 6 ??? 0 -100 nan 3
ddd 3 9 ??? 0 0 ddd 3 10 400.40 5
eee 4 12 eee 4 12 ??? 0 -100 nan 3
fff 5 15 ??? 0 0 ??? 0 -100 nan 1
ggg 6 18 ggg 6 18 ggg 6 19 100.10 7
hhh 7 21 ??? 0 0 ??? 0 -100 nan 1
XXX 0 255 iii 8 24 ??? 0 -100 nan 2
XXX 0 255 ??? 0 0 jjj 9 28 200.20 4
SEE ALSO
See funtools(7) for a list of Funtools help pages
version 1.4.2 January 2, 2008 funjoin(1)