10-17-2006
Join Files
Hi Gurus,
I have to join two flat files on two key columns. I concatenated the two key fields into a single key and ran the join command on the result, and it works
fine. But can I avoid temporary files by doing something like this?
join -t ':' `awk -F ":" '{ printf("%s%s:%s\n", $1,$2, $0) }' file1` `awk -F ":" '{ printf("%s%s:%s\n", $1,$2, $0) }' file2`
Each file is nearly 2 GB. I also need to run 27 such jobs in parallel (scripts that use join),
so I need to join approximately 54 files. If I use temporary files, an additional 54 * 2 = 108 GB of space will be needed.
Is there another approach that avoids temporary files?
Thanks in advance
Srinivas Choppa
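The backtick form in the question cannot work, because command substitution pastes the awk output into the join command line as arguments, while join expects two file names. One possible approach, sketched here under assumptions, is to stream each keyed file through a named pipe so that the keyed copies never land on disk. The sample data, the `keyed` helper, and the pipe names `k1`/`k2` are hypothetical; the sketch also assumes the inputs are not already sorted on the combined key, so it pipes them through sort, which may itself spill to temporary space on very large inputs.

```shell
#!/bin/sh
# Sketch: join two colon-delimited files on fields 1+2 via named pipes,
# without writing the keyed copies to disk.
set -eu
LC_ALL=C; export LC_ALL   # sort and join must agree on collating order

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Tiny hypothetical stand-ins for the real 2 GB files.
printf 'a:1:x\nb:2:y\nc:3:z\n' > "$workdir/file1"
printf 'a:1:p\nc:3:q\nd:4:r\n' > "$workdir/file2"

# Prefix each line with the combined key ($1 followed by $2), as in the
# question, then sort on that key so join sees ordered input.
keyed() {
    awk -F: '{ printf("%s%s:%s\n", $1, $2, $0) }' "$1" | sort -t: -k1,1
}

# Named pipes hold no data on disk; each writer blocks until join reads.
mkfifo "$workdir/k1" "$workdir/k2"
keyed "$workdir/file1" > "$workdir/k1" &
keyed "$workdir/file2" > "$workdir/k2" &

result=$(join -t: "$workdir/k1" "$workdir/k2")
wait
printf '%s\n' "$result"
```

In bash, process substitution (`join -t: <(keyed file1) <(keyed file2)`) expresses the same idea more compactly. Note that plain concatenation of the two key fields can make distinct key pairs collide (for example `ab`,`c` and `a`,`bc`), so a separator character inside the combined key may be safer.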
JOIN(1) General Commands Manual JOIN(1)
NAME
join - relational database operator
SYNOPSIS
join [ options ] file1 file2
DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard
input is used.
File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in
each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, then the rest of the line from file1, then the rest of the line from file2.
Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are discarded.
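As a small illustration of the description above, the following sketch sorts two unsorted inputs on the first field and then joins them with the default blank separator. The file names and sample rows are hypothetical, and the C locale is assumed so that sort and join use the same ASCII collating sequence.

```shell
#!/bin/sh
# Sketch: sort both inputs on the join field, then join on field 1.
set -eu
LC_ALL=C; export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample data, deliberately unsorted.
printf 'carol 30\nalice 10\nbob 20\n' > "$workdir/ages"
printf 'bob paris\nalice oslo\ndave rome\n' > "$workdir/cities"

# join requires both inputs ordered on the join field.
sort -k1,1 "$workdir/ages"   > "$workdir/ages.sorted"
sort -k1,1 "$workdir/cities" > "$workdir/cities.sorted"

# Each output line: common field, rest of file1 line, rest of file2 line.
joined=$(join "$workdir/ages.sorted" "$workdir/cities.sorted")
printf '%s\n' "$joined"
```

Lines whose join field appears in only one input (`carol` and `dave` here) produce no output unless `-a` is given.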
These options are recognized:
-an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-e s Replace empty output fields by string s.
-jn m Join on the mth field of file n. If n is missing, use the mth field in each file.
-o list
Each output line comprises the fields specified in list, each element of which has the form n.m, where n is a file number and m is a
field number.
-tc Use character c as a separator (tab character). Every appearance of c in a line is significant.
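The options can be combined, as in the sketch below. It assumes a current POSIX-style join, where the option argument is written separately (`-a 1` rather than the `-an` form shown in this page) and where the join field of a given file is selected with `-1 m` or `-2 m`. The file names and sample rows are hypothetical.

```shell
#!/bin/sh
# Sketch: keep unpaired lines from file 1, fill missing fields, pick columns.
set -eu
LC_ALL=C; export LC_ALL

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical colon-delimited inputs, already sorted on field 1.
printf 'alice:10\nbob:20\ncarol:30\n' > "$workdir/f1"
printf 'alice:oslo\nbob:paris\n'      > "$workdir/f2"

# -t:    use ':' as the field separator
# -a 1   also print lines from file 1 that have no match in file 2
# -e     string substituted for fields that are missing in the output
# -o     output list in n.m form (file number, field number)
report=$(join -t: -a 1 -e NONE -o 1.1,1.2,2.2 "$workdir/f1" "$workdir/f2")
printf '%s\n' "$report"
```

Here `carol` has no partner in the second file, so the `2.2` column for that line is filled with the `-e` string.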
SEE ALSO
sort(1), comm(1), awk(1)
BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort.
The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous.
7th Edition April 29, 1985 JOIN(1)