Compare Fields from two text files using key columns
Hi All,
I have two files to compare. Each has 10 columns; the first 4 columns together form the key, and the remaining 6 columns hold monetary values.
Using Perl, I want to read one file into a hash, check whether each key exists in file 2, compare the values in the remaining 6 columns, and report any differences found.
The files are comma-separated and do not have a header row.
Here is the sample file:
File A:
File B:
Output:
Any help would be appreciated. I am able to come up with the script in Bash, but I am not very comfortable with the concept of hashes in Perl or with setting up the key index columns.
Thanks!
Last edited by vgersh99; 05-12-2010 at 01:10 PM..
Reason: code tags, please!
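The approach described in the question (load one file into a hash keyed on the first 4 columns, then walk the second file) can be sketched as follows. This is only an illustration in Python, where a dict plays the role of a Perl hash; it is not the poster's script. The sample rows are hypothetical because the original sample data did not survive, and values are compared as plain strings, so `10.5` and `10.50` would count as different.

```python
import csv
import io

KEY_COLS = 4  # the first four columns together form the key


def load(handle):
    """Read headerless CSV rows into a dict: key tuple -> list of value columns."""
    return {tuple(row[:KEY_COLS]): row[KEY_COLS:] for row in csv.reader(handle) if row}


def compare(handle_a, handle_b):
    """Return tuples describing missing keys and differing value columns."""
    a = load(handle_a)
    seen = set()
    report = []
    for row in csv.reader(handle_b):
        if not row:  # skip blank lines
            continue
        key = tuple(row[:KEY_COLS])
        seen.add(key)
        if key not in a:
            report.append(("only_in_b", key))
            continue
        for offset, (val_a, val_b) in enumerate(zip(a[key], row[KEY_COLS:])):
            if val_a != val_b:  # plain string comparison (assumption)
                # Report the 1-based column number within the file.
                report.append(("diff", key, KEY_COLS + offset + 1, val_a, val_b))
    # Keys that were loaded from file A but never seen in file B.
    report.extend(("only_in_a", key) for key in a if key not in seen)
    return report


# Hypothetical sample data; real files would be opened with open(path, newline="").
file_a = io.StringIO(
    "k1,k2,k3,k4,10.00,20.00,30.00,40.00,50.00,60.00\n"
    "k1,k2,k3,k5,1.00,2.00,3.00,4.00,5.00,6.00\n"
)
file_b = io.StringIO(
    "k1,k2,k3,k4,10.00,25.00,30.00,40.00,50.00,60.00\n"
    "k9,k2,k3,k4,1.00,2.00,3.00,4.00,5.00,6.00\n"
)
result = compare(file_a, file_b)
for entry in result:
    print(entry)
```

Only file A is held in memory; file B is streamed row by row. If the monetary values can differ in formatting, converting both sides with `decimal.Decimal` before comparing may be a better fit.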
I am trying to join/paste columns from two files for the rows with matching first field. Any help will be appreciated.
The files cannot be sorted, and some rows may be missing from either file.
Thanks.
File1
aaa 111
bbb 222
ccc 333
File2
aaa sss mmmm
ccc kkkk llll
ddd xxx yyy
Want to... (1 Reply)
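A rough sketch of joining unsorted files on the first field, using the File1 and File2 rows shown above. The desired output is cut off in the post, so this assumes that only rows present in both files are wanted; the function name `join_on_first_field` is made up for illustration.

```python
def join_on_first_field(lines1, lines2):
    """Append File2's extra columns to File1 rows whose first field matches."""
    extra = {}
    for line in lines2:
        fields = line.split()
        if fields:
            extra[fields[0]] = fields[1:]  # first field -> remaining columns
    joined = []
    for line in lines1:
        fields = line.split()
        # Assumption: rows without a match in File2 are dropped.
        if fields and fields[0] in extra:
            joined.append(" ".join(fields + extra[fields[0]]))
    return joined


file1 = ["aaa 111", "bbb 222", "ccc 333"]
file2 = ["aaa sss mmmm", "ccc kkkk llll", "ddd xxx yyy"]
joined = join_on_first_field(file1, file2)
print("\n".join(joined))
```

Because the lookup goes through a dict, neither file needs to be sorted, and the output keeps File1's row order.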
Hi Folks,
I need to compare two very large files (i.e., each file would contain a minimum of 70k records) using awk or sed. The comparison needs to be done with respect to a 'key'. For example:
File1
**********
1234|TONY|Y75634|20/07/2008
1235|TINA|XCVB56|30/07/2009... (13 Replies)
I have a file with the following contents:
,-0.3000 ,-0.3000 ,-0.3000
,-0.9000 ,-0.9000 ,-0.9000
I would like to get this:
-0.3-0.9-0.3-0.9-0.3-0.9
So far I am trying:
awk '{for(i=1; i<=NF; i++) {printf("%f\n",$i)}}' test1 > test2
any help... (4 Replies)
Hi,
I'm having some trouble comparing and permuting different fields indexed by another field, as in the following example:
file1
1 1
2 2
3 3
file2
7 1
9 2
10 3
result------------------- (6 Replies)
Hi,
I need the most efficient way of comparing the following and arriving at the result
I have a file which has entries like,
File1:
1|2|5|7|8|2|3|6|3|1
File2:
1|2|3|1|2|7|9|2
I need to compare the entries in these two files with those of a general file,
1|2|3|5|2|5|6|9|3|1... (7 Replies)
Hi,
I have two text files. I want to compare column one in both files and, where it matches, output the ID from column one, the number, and the description.
Both files are sorted. Is there a one-liner to get this done? Kindly help. Thank you
File 1:
NC_000964 92.33 ... (2 Replies)
I am trying to compare two pipe-separated files using 2 fields (fields 1 and 3 from fileA, fields 1 and 2 from fileB). If the records match, I want the whole record from fileA with the remaining extra fields from fileB appended.
1. A.txt
cat|floffy|12|anything|anythings
cat|kitty|15|lala|lalala... (6 Replies)
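A possible sketch of matching on a two-field key taken from different positions in each file. The fileA rows are the two shown above; the fileB rows are hypothetical, since the post is truncated before fileB appears, and the function name `merge_on_keys` is invented for this example. Rows are assumed to be well-formed (enough fields on every line).

```python
def merge_on_keys(a_lines, b_lines, a_key=(0, 2), b_key=(0, 1)):
    """Append fileB's non-key fields to fileA records with a matching key.

    a_key and b_key are 0-based field positions: fields 1 and 3 of fileA,
    fields 1 and 2 of fileB.
    """
    b_extra = {}
    for line in b_lines:
        fields = line.rstrip("\n").split("|")
        key = tuple(fields[i] for i in b_key)
        # Keep only the fields of fileB that are not part of the key.
        b_extra[key] = [v for i, v in enumerate(fields) if i not in b_key]
    merged = []
    for line in a_lines:
        fields = line.rstrip("\n").split("|")
        key = tuple(fields[i] for i in a_key)
        if key in b_extra:
            merged.append("|".join(fields + b_extra[key]))
    return merged


a_rows = ["cat|floffy|12|anything|anythings", "cat|kitty|15|lala|lalala"]
b_rows = ["cat|12|extra1", "dog|15|extra2"]  # hypothetical fileB content
merged = merge_on_keys(a_rows, b_rows)
print("\n".join(merged))
```

Here only the first fileA record matches, because `("cat", "15")` has no counterpart in the made-up fileB rows.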
Hi,
I am trying to check two files based on certain string and field.
cat f1
source=\GREP\"
hi this \\
source=\SED\"
skdmsmd
dnksdns
source=\PERL\"
cat f2
source=\SED\"
source=\GREP\"
vlamskds
amdksk m
source=\AWK\"
awk \here\" (3 Replies)
Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX code to compare two large CSV files (greater than 10 GB in size) and write a file with the matching columns.
I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'... (5 Replies)
Hi,
Below are the sample files. x.txt is from an Excel file that is a list of users from Windows, and y.txt is a list of database accounts.
$ head -500 x.txt y.txt
==> x.txt <==
TEST01 APP_USER_PROFILE
USER03 APP_USER_PROFILE
TEST02 APP_USER_EXP_PROFILE
TEST04 APP_USER_PROFILE
USER01 ... (3 Replies)
Discussion started by: newbie_01
COMPALIGN(1) General Commands Manual COMPALIGN(1)
NAME
compalign - compare two multiple alignments
SYNOPSIS
compalign [-options] <trusted-alignment> <test-alignment>
DESCRIPTION
compalign calculates the fractional "identity" between the trusted alignment and the test alignment. The two files must contain exactly the
same sequences, in exactly the same order.
The identity of the multiple sequence alignments is defined as the averaged identity over all N(N-1)/2 pairwise alignments.
The fractional identity of two sets of pairwise alignments is in turn defined as follows (for aligned known sequences k1 and k2, and
aligned test sequences t1 and t2):
    matched columns / total columns
where
    total columns = the total number of columns in which there is a valid (nongap) symbol in k1 or k2;
    matched columns = the number of columns in which one of the following is true:
        k1 and k2 both have valid symbols at a given column; t1 and t2 have the same symbols aligned in a column of the t1/t2 alignment;
        k1 has a symbol aligned to a gap in k2; that symbol in t1 is also aligned to a gap;
        k2 has a symbol aligned to a gap in k1; that symbol in t2 is also aligned to a gap.
Because scores for all possible pairs are calculated, the algorithm is of order (N^2)L for N sequences of length L; large sequence sets
will take a while.
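The definition above can be illustrated with a small Python sketch. It is a reading of this manual page's wording, not compalign's actual source code. The set of gap characters and the function names `partners`, `pair_identity` and `alignment_identity` are assumptions made for the example.

```python
from itertools import combinations

GAPS = set("-._~")  # assumed gap characters


def partners(a, b):
    """For each residue index in a and in b, record the index it is aligned
    to in the other sequence, or None if it is aligned to a gap."""
    pa, pb = {}, {}
    i = j = 0
    for ca, cb in zip(a, b):
        ra, rb = ca not in GAPS, cb not in GAPS
        if ra and rb:
            pa[i], pb[j] = j, i
        elif ra:
            pa[i] = None
        elif rb:
            pb[j] = None
        i += ra
        j += rb
    return pa, pb


def pair_identity(k1, k2, t1, t2):
    """matched columns / total columns for one pair of sequences."""
    kp1, kp2 = partners(k1, k2)
    tp1, tp2 = partners(t1, t2)
    # Columns where k1 or k2 has a nongap symbol.
    total = len(kp1) + sum(1 for v in kp2.values() if v is None)
    # k1 residues: same partner in the test alignment, or gapped in both.
    matched = sum(1 for i, j in kp1.items() if tp1.get(i, "absent") == j)
    # k2 residues aligned to a gap in k1 that are also gapped in the test.
    matched += sum(1 for j, i in kp2.items()
                   if i is None and tp2.get(j, "absent") is None)
    return matched / total if total else 0.0


def alignment_identity(known, test):
    """Average pair_identity over all N(N-1)/2 sequence pairs."""
    pairs = list(combinations(range(len(known)), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pair_identity(known[a], known[b], test[a], test[b])
               for a, b in pairs) / len(pairs)


known = ["ACGT", "A-GT"]  # toy trusted alignment
test = ["ACGT", "-AGT"]   # toy test alignment of the same sequences
score = alignment_identity(known, test)
print(score)
```

In the toy pair, two of the four trusted columns are reproduced by the test alignment, so the score is 0.5. The loop over all pairs is what gives the (N^2)L cost mentioned above.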
OPTIONS
Available options:
-h Print short help and usage info.
-c Only compare under marked #=CS consensus structure.
--informat <s>
Specify that both alignments are in format <s> (MSF, for instance).
--quiet
Suppress verbose header (used in regression testing).
SEE ALSO
afetch(1), alistat(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1), sfetch(1), shuffle(1), sindex(1), sreformat(1), stranslate(1),
weight(1).
AUTHOR
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX : 1-314-362-2157
Email: eddy@genetics.wustl.edu
This manual page was written by Nelson A. de Oliveira <naoliv@gmail.com>,
for the Debian project (but may be used by others).
Mon, 01 Aug 2005 15:28:08 -0300 COMPALIGN(1)