awk program to join 2 fields of different files


 
# 1  
Old 10-04-2012

Hello Friends,
I need a little help: I am looking for an awk program that joins two files on their common field and writes the combined fields to a single file.

File 1
FileName~Size

File 2
FileName~Date

I need the output file in the following format:
Output file
FileName~Date~Size

Do the files have to be sorted for this?
The files are as large as 10 million lines.
I would appreciate your help.

Thanks in Advance
Regards,
Abhishek S.
# 2  
Old 10-05-2012
This will work if the files are sorted by the first column (file name):
Code:
$ cat t
a~12345
b~54321
c~47789

$ cat t2
a~10/01/2012
b~10/02/2012
c~10/03/2012

$ join -t'~' t t2
a~12345~10/01/2012
b~54321~10/02/2012
c~47789~10/03/2012
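
The output above lists the size before the date, whereas the original request was for FileName~Date~Size. A minimal sketch of one way to get that order, assuming a POSIX join with the -o option (where 0 stands for the join field). The scratch directory and the recreated sample files are only for illustration:

```shell
# Recreate the sample files t and t2 in a scratch directory (illustrative only).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'a~12345\nb~54321\nc~47789\n' > "$tmp/t"
printf 'a~10/01/2012\nb~10/02/2012\nc~10/03/2012\n' > "$tmp/t2"

# -o selects the output fields: 0 = join field, 2.2 = field 2 of the
# second file (date), 1.2 = field 2 of the first file (size).
join -t'~' -o 0,2.2,1.2 "$tmp/t" "$tmp/t2" > "$tmp/out"
cat "$tmp/out"
```

For two-column data like this, swapping the file arguments (join -t'~' t2 t) should give the same column order.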

# 3  
Old 10-05-2012
Hi,
Thanks a lot for the reply. I am aware of the join command, but it requires sorted input. When I try to sort a 10-million-line file, sort fails partway through, reporting that there is not enough space left.

Is this possible with awk?
# 4  
Old 10-05-2012
The logic is simple in awk (whether the input files are sorted or not):
Code:
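awk 'BEGIN {FS = OFS = "~"}
FNR == NR {s[$1] = $2; next}    # 1st file (in1, name~size): remember size by name
        {print $1, $2, s[$1]}   # 2nd file (in2, name~date): print name, date, size
' in1 in2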

but I make no guarantee that awk won't run out of memory for files this large.
If a line in the second file doesn't have a match in the first file, a record will be printed with the 3rd field empty. It would also be possible to add a couple of statements to print any lines that appear in the 1st input file that don't contain a matching line in the 2nd input file, but I didn't bother since you have implied that there are always matching lines in the two input files.
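
The "couple of statements" mentioned above could look something like the following. This is only a sketch, not the poster's code: the seen array, the variable names, and the sample data (where b has no date and d has no size) are assumptions made for illustration.

```shell
# Hypothetical sample data: b has no date line, d has no size line.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'a~12345\nb~54321\nc~47789\n' > "$tmp/in1"                  # name~size
printf 'a~10/01/2012\nc~10/03/2012\nd~10/04/2012\n' > "$tmp/in2"   # name~date

awk 'BEGIN { FS = OFS = "~" }
FNR == NR { s[$1] = $2; next }        # 1st file: remember each size by name
{
    size = ($1 in s) ? s[$1] : ""     # empty size when the name is unknown
    print $1, $2, size
    seen[$1] = 1                      # this name had a line in the 2nd file
}
END {
    for (name in s)                   # names that only the 1st file contains
        if (!(name in seen))
            print name, "", s[name]   # empty date field
}' "$tmp/in1" "$tmp/in2" > "$tmp/out"
cat "$tmp/out"
```

The seen array costs memory on top of s, which matters for inputs of this size, and the order of the lines printed from END is unspecified.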
# 5  
Old 10-05-2012
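Try this. It sorts the two files together so that lines sharing a file name end up adjacent, and tells dates from sizes by the "/" in the date: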
Code:
sort -u dates_file sizes_file | awk -F"~" '
{
  if($1==ln){
    fe=0;
    if ($2 ~ "/") {
     cd=$2; cs=$3;
    } else {
     cd=$3; cs=$2;
    }
    if (cd !~ /./) cd=ld;
    if (cs !~ /./) cs=ls;
    if (ln ~ /./) {
      print ln "~" cd "~" cs;
    }
  } else {
    if (ln ~ /./) {
      if (fe==1) {
        print ll;
      }
    }
    fe=1;
  }
}
{
  ll=$0; ln=$1;
  if ($2 ~ "/") {
   ld=$2; ls=$3;
  } else {
   ld=$3; ls=$2;
  }
}
END{
  if (fe==1) print ll;   # print the final line only if it had no partner
}
' > new_file
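
Because this approach depends on sort, the out-of-space failure reported earlier in the thread may come up again. A hedged sketch of one workaround, assuming a sort that accepts the -T option for choosing where its temporary files go (GNU and BSD sort do; POSIX does not specify it). The scratch directory and small sample files below are hypothetical stand-ins for the real data, and LC_ALL=C is used so that sort and join agree on the collation order:

```shell
# Scratch directory; in practice this would sit on a file system with enough room.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'c~47789\na~12345\nb~54321\n' > "$work/sizes_file"
printf 'b~10/02/2012\nc~10/03/2012\na~10/01/2012\n' > "$work/dates_file"

# Sort each file on the first ~-separated field, keeping temp files in $work.
LC_ALL=C sort -T "$work" -t'~' -k1,1 "$work/sizes_file" > "$work/sizes.sorted"
LC_ALL=C sort -T "$work" -t'~' -k1,1 "$work/dates_file" > "$work/dates.sorted"

# Join the sorted files and print name, date, size.
LC_ALL=C join -t'~' -o 0,2.2,1.2 "$work/sizes.sorted" "$work/dates.sorted" > "$work/new_file"
cat "$work/new_file"
```

Sorting each file separately and streaming the result through join keeps memory use low, at the cost of the disk space needed for the sorted copies.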
