

joining files based on key column


 
# 1  
07-22-2009

Hi

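I have to join two files on the 1st column, using only the a2.txt rows whose 4th column is "at". For each match I take the 2nd column of a1.txt and the 3rd column of a2.txt, build the source file name from them (file1.txt and 20090809 give file1_20090809.txt), and check it against the source files in my directory; if the file exists, its name should be listed.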
Code:
a1.txt

a1|20090809|20090810
a2|20090907|20090908

a2.txt

a1|d|file1.txt|at
a1|d|file2.txt|at
a1|d|file3.txt|st
a2|d|file4.txt|st
a2|d|file5.txt|st

I have these source files in my directory:
file1_20090809.txt
file2_20090809.txt
file3_20090809.txt


I expect output like this:

file1_20090809.txt
file2_20090809.txt


Thanks in advance
Akil
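Since the task is a key-based join, the standard `join` utility can do the matching step. A minimal sketch, assuming the '|'-separated layout of a1.txt and a2.txt shown above, source files in the current directory, and a single '.' before each extension; the function name `list_at_files` is hypothetical. `join` requires both inputs sorted on the key, hence the `sort` steps, and the awk stage rebuilds each name as name_date.ext before the shell checks that it exists.

```shell
# Sketch: print the source files that exist for a2.txt rows marked "at".
# Assumes '|'-separated input laid out like a1.txt/a2.txt, source files
# in the current directory, and one '.' before the extension.
list_at_files() {
    keys=$(mktemp) || return 1
    # join expects both inputs sorted on the join field (field 1 here).
    LC_ALL=C sort -t '|' -k1,1 "$1" > "$keys"
    # Each joined line is: key|a1 col 2|a1 col 3|a2 col 2|a2 col 3|a2 col 4
    LC_ALL=C sort -t '|' -k1,1 "$2" |
        LC_ALL=C join -t '|' "$keys" - |
        awk -F'|' '$6 == "at" {
            dot = index($5, ".")   # position of the first dot, e.g. in file1.txt
            print substr($5, 1, dot - 1) "_" $2 substr($5, dot)
        }' |
        while IFS= read -r name; do
            # keep only names that exist as regular files
            [ -f "$name" ] && printf '%s\n' "$name"
        done
    rm -f "$keys"
}
```

With the sample files above, `list_at_files a1.txt a2.txt` should print file1_20090809.txt and file2_20090809.txt. The trade-off against the awk hash used in the replies is the extra sort step, which `join` needs and a hash lookup does not.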
# 2  
07-22-2009
For starters:
nawk -f akil.awk a1.txt a2.txt

akil.awk:
Code:
BEGIN {
  FS="|"
}
FNR==NR { f1[$1]=$2; next }
$1 in f1 && $NF == "at" { dot=index($3,"."); print substr($3,1, dot-1) "_" f1[$1] substr($3, dot) }

# 3  
07-22-2009
Use gawk, nawk or /usr/xpg4/bin/awk on Solaris:

Code:
awk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 { 
  n = split($NF, t, "."); ext = t[n]
  sub("." t[n], "", $NF)
  fn = $NF sep f1[$1] "." ext
  if (!system("[ -e " fn " ]")) print fn
}' sep="_" a1.txt a2.txt

# 4  
07-22-2009
Hi,
Thanks for your prompt reply.

Thanks & Regards,
Akil
# 5  
07-22-2009
Quote:
Originally Posted by radoulov
Use gawk, nawk or /usr/xpg4/bin/awk on Solaris:

Code:
awk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 { 
  n = split($NF, t, "."); ext = t[n]
  sub("." t[n], "", $NF)
  fn = $NF sep f1[$1] "." ext
  if (!system("[ -e " fn " ]")) print fn
}' sep="_" a1.txt a2.txt

With my nawk on Solaris 'system' seems to return the status of the 'spawning' of the command, rather than the return status of the command (although the 'man nawk' says otherwise).
I used to check for the file existence with the 'getline':
Code:
if ((getline dummy < fn ) >0) { print fn; close(fn) }


Last edited by vgersh99; 07-22-2009 at 12:30 PM..
# 6  
07-22-2009
Code:
$
$ perl -ne 'BEGIN{open(F1,"a1.txt"); while(<F1>){@_=split/\|/; $x{$_[0]}=$_[1]} close F1}
>           chomp; @_=split/\|/;
>           if ($_[3] eq "at") {$_[2] =~ s/(.*)\.(.*)/$1_$x{$_[0]}.$2/; push @f,$_[2]}
>           END {foreach $i (@f) {system "ls $i 2>/dev/null"}}' a2.txt
file1_20090809.txt
file2_20090809.txt
$
$

tyler_durden
# 7  
07-22-2009
With Perl:

Code:
perl -le'
    open F1, $ARGV[0] or die "$ARGV[0]: $!";
    %f1 = map { ( split /\|/ )[ 0, 1 ] } <F1>;
    open F2, $ARGV[1] or die "$ARGV[1]: $!";
    while (<F2>) {
        @f2 = split /\|/;
        if ( exists $f1{ $f2[0] } ) {
            $f2[-1] =~ /(.+)(\.[^.\n]+)$/;
            $fn = $1 . "_" . $f1{ $f2[0] } . $2;
            -e $fn and print $fn;
        }
    }' a1.txt a2.txt



---------- Post updated at 05:19 PM ---------- Previous update was at 05:16 PM ----------

I missed the "at" part in both solutions.
Adding it is left as an exercise.

---------- Post updated at 05:24 PM ---------- Previous update was at 05:19 PM ----------

Quote:
Originally Posted by vgersh99
With my nawk on Solaris 'system' seems to return the status of the 'spawning' of the command, rather than the return status of the command (although the 'man nawk' says otherwise).
I used to check for file existence with the 'getline':
Code:
if ((getline dummy < fn ) >0) { print fn; close(fn) }

Yep,
it does not work as expected on Solaris (I'll check later).

---------- Post updated at 05:28 PM ---------- Previous update was at 05:24 PM ----------

I think I should post a correct answer later (both my solutions are wrong). Got to go now...

---------- Post updated at 07:35 PM ---------- Previous update was at 05:28 PM ----------

Actually, unless I'm missing something, this seems to work on my Solaris machine:

[ some old shells like bsh do not support the -e test option, so I changed it to -f ]

Code:
% head a*
==> a1.txt <==
a1|20090809|20090810
a2|20090907|20090908

==> a2.txt <==
a1|d|file1.txt|at
a1|d|file6.txt|at
a1|d|file2.txt|at
a1|d|file3.txt|st
a2|d|file4.txt|st
a2|d|file5.txt|st

% uname -rs
SunOS 5.8

% ls -l
total 4
-rw-r--r--   1 drado    sysdba        42 Jul 22 17:08 a1.txt
-rw-r--r--   1 drado    sysdba       108 Jul 22 19:10 a2.txt
-rw-r--r--   1 drado    sysdba         0 Jul 22 17:10 file1_20090809.txt
-rw-r--r--   1 drado    sysdba         0 Jul 22 17:10 file2_20090809.txt
-rw-r--r--   1 drado    sysdba         0 Jul 22 17:10 file3_20090809.txt

% /usr/xpg4/bin/awk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 && /at$/ {
  n = split($(NF - 1), t, "."); ext = t[n]
  sub("." t[n], "", $(NF - 1))
  fn = $(NF - 1) sep f1[$1] "." ext
  if (!system("[ -f " fn " ]")) print fn
}' sep="_" a1.txt a2.txt
file1_20090809.txt
file2_20090809.txt

% nawk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 && /at$/ {
  n = split($(NF - 1), t, "."); ext = t[n]
  sub("." t[n], "", $(NF - 1))
  fn = $(NF - 1) sep f1[$1] "." ext
  if (!system("[ -f " fn " ]")) print fn
}' sep="_" a1.txt a2.txt
file1_20090809.txt
file2_20090809.txt

Code:
% nawk 'BEGIN {
  while (++i < ARGC)
    print ARGV[i], system("[ -f " ARGV[i] " ]")
        }' file* inexistent
file1_20090809.txt 0
file2_20090809.txt 0
file3_20090809.txt 0
inexistent 1
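In the awk versions above, `fn` is pasted into the `[ -f ... ]` command unquoted, so `/bin/sh` would split a name containing spaces and expand characters such as `*` or `$`. A minimal sketch of one way to quote it, assuming a POSIX awk that provides `ENVIRON`; the names `file_exists`, `shquote` and `FN` are hypothetical and not part of the thread's code.

```shell
# Hypothetical helper: succeed if $1 names an existing regular file,
# testing it through awk system() as above but with the name quoted.
file_exists() {
    # The name travels through the environment (FN) rather than -v,
    # because -v would interpret backslash escapes in the value.
    FN=$1 awk -v q="'" '
    # Wrap s in single quotes for /bin/sh. Each embedded single quote
    # is replaced by: close quote, double-quoted quote, reopen quote.
    function shquote(s) {
        gsub(q, q "\"" q "\"" q, s)
        return q s q
    }
    BEGIN { exit (system("[ -f " shquote(ENVIRON["FN"]) " ]") != 0) }'
}
```

For example, `file_exists "my file.txt" && echo found` tests the whole name as one word instead of handing `my` and `file.txt` to `[` as separate arguments.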


Modified Perl version:

Code:
perl -le'
    open F1, $ARGV[0] or die "$ARGV[0]: $!";
    %f1 = map { ( split /\|/ )[ 0, 1 ] } <F1>;
    open F2, $ARGV[1] or die "$ARGV[1]: $!";
    while (<F2>) {
        @f2 = split /\|/;
        if ( exists $f1{ $f2[0] } && /at$/ ) {
            $f2[-2] =~ /(.+)(\.[^.]+)$/;
            $fn = $1 . "_" . $f1{ $f2[0] } . $2;
            -e $fn and print $fn;
        }
    }' a1.txt a2.txt

I agree with vgersh99 that it's better to do the test inside awk without spawning an external command,
but I'd like to point out that the getline approach assumes the files are readable.

Last edited by radoulov; 07-22-2009 at 02:43 PM..
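To see what the getline test actually reports, the sketch below prints getline's return value for each file named on the command line, in the same BEGIN/ARGV style as the `system()` check above; the function name `getline_status` is hypothetical. getline returns 1 when it reads a record, 0 at end of file and -1 when the file cannot be opened, so an existing but empty file gives 0. The sample source files in the `ls -l` listing above are 0 bytes long, so a `> 0` test would skip them while `>= 0` would not.

```shell
# Hypothetical helper: print "name status" for each argument, where status
# is what awk getline returns: 1 = read a line, 0 = empty file (EOF),
# -1 = could not open (missing or unreadable).
getline_status() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) {
            rc = (getline line < ARGV[i])
            if (rc >= 0) close(ARGV[i])   # only close files that were opened
            print ARGV[i], rc
        }
    }' "$@"
}
```

Even with `>= 0`, a missing file and an existing but unreadable one both give -1, which is the readability caveat above.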


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Matching 2 files based on key

Hi all I have two files I need to match record from first file and second file on column 1,8 and and output only match records on file1 File1: 020059801803180116130926800002090000800231000245204003160000000002000461OUNCE000000350000100152500BM01007W0000 ... (5 Replies)
Discussion started by: arunkumar_mca

2. Shell Programming and Scripting

Joining the files with comparing the first column

Hi, I have two files in the following format. I am trying to compare the first column of both the files and if the values match the rows in file tst6 should be replaced in tst1. File tst1 S00823295|MIDDL|0|MR|019221521A||RL|STD|0|0||E S00862481|ESSEX|0|MR|018163650A||R|STD|0|0||E... (1 Reply)
Discussion started by: nua7

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a

4. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Hi, I have nine files looking similar to file1 & file2 below. File1: 1 ABCA1 1 ABCC8 1 ABR:N 1 ACACB 1 ACAP2 1 ACOT1 1 ACSBG 1 ACTR1 1 ACTRT 1 ADAMT 1 AEN:N 1 AKAP1File2: 1 A4GAL 1 ACTBL 1 ACTL7 (4 Replies)
Discussion started by: seqbiologist

5. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic

6. UNIX for Dummies Questions & Answers

any script for joining files based on simple conditions

Condition1; If NPID and IndID of both input1 and input2 are same take all the vaues relevant to them and print together as output Condition2; IDNo in output: Take the highly repeated same letter of similar NPID-IndID as *1* Second highly repeated same letter... (0 Replies)
Discussion started by: stateperl

7. UNIX for Dummies Questions & Answers

Joining files based on multiple keys

I need a script (perl or awk..anything is fine) to join 3 files based on three key columns. The no of non-key columns can vary in each file. The columns are delimited by semicolon. For example, File1 Dim1;Dim2;Dim3;Fact1;Fact2;Fact3;Fact4;Fact5 ---- data delimited by semicolon --- ... (1 Reply)
Discussion started by: Sebben

8. Shell Programming and Scripting

Joining two files based on columns/fields

I've got two files, File1 and File2 File 1 has got combination of col1, col2 and col3 which comes on file2 as well, file2 does not get col4. Now based on col1, col2 and col3, I would like to get col4 from file1 and all the columns from file2 in a new file Any ideas? File1 ------ Col1 col2... (11 Replies)
Discussion started by: rudoraj

9. Shell Programming and Scripting

Joining columns from two files, if the key matches

I am trying to join/paste columns from two files for the rows with matching first field. Any help will be appreciated. Files can not be sorted and may not have all rows in both files. Thanks. File1 aaa 111 bbb 222 ccc 333 File2 aaa sss mmmm ccc kkkk llll ddd xxx yyy Want to... (1 Reply)
Discussion started by: sk_sd

10. Shell Programming and Scripting

merging two files based on some key

I have to merge two files: The files are having the same format like A0this is first line TOlast line silmilarly other lines. I have to search for A0 line in the second file also and then put the data in the third file under A0 heading ,then for A1 and so on. A0 portion will be treminated... (1 Reply)
Discussion started by: Vandana Yadav
