Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Joining files based on multiple keys Post 302308619 by Sebben on Sunday 19th of April 2009 06:45:05 PM
Old 04-19-2009
Joining files based on multiple keys

I need a script (perl or awk..anything is fine) to join 3 files based on three key columns. The no of non-key columns can vary in each file. The columns are delimited by semicolon.

For example,

File1

Dim1;Dim2;Dim3;Fact1;Fact2;Fact3;Fact4;Fact5
---- data delimited by semicolon ---

File2

Dim1;Dim2;Dim3;Fact6;Fact7;Fact8
---- data delimited by semicolon ---

File3
Dim1;Dim2;Dim3;Fact9;Fact10
---- data delimited by semicolon ---

So I have to join based on the keys (Dimensions 1,2 and 3) and output the data in one file

Output File
Dim1;Dim2;Dim3;Fact1;Fact2;Fact3;Fact4;Fact5;Fact6;Fact7;Fact8;Fact9;Fact10
---- data delimited by semicolon ---

It needs to be a full outer join. Say if a match is not found in one of the files, the fact values need to be filled as zeroes. Each file has header as well as data.

Thanks for your help.
 

10 More Discussions You Might Find Interesting

1. Programming

marge tow files based on keys

how can i marge two files depend som key for example: the first file include many records of information for X person and the second file have one record of information for each X person shortly i want to mak first :match between the two files then insert data from the second to the first... (2 Replies)
Discussion started by: Ehab
2 Replies

2. Shell Programming and Scripting

Joining two files based on columns/fields

I've got two files, File1 and File2 File 1 has got combination of col1, col2 and col3 which comes on file2 as well, file2 does not get col4. Now based on col1, col2 and col3, I would like to get col4 from file1 and all the columns from file2 in a new file Any ideas? File1 ------ Col1 col2... (11 Replies)
Discussion started by: rudoraj
11 Replies

3. Shell Programming and Scripting

joining files based on key column

Hi I have to join two files based on 1st column where 4th column of a2.txt=at and take 2nd column of a1.txt and 3rd column of a2.txt and check against source files ,if matches list those source file names. a1.txt a1|20090809|20090810 a2|20090907|20090908 a2.txt a1|d|file1.txt|at... (9 Replies)
Discussion started by: akil
9 Replies

4. Shell Programming and Scripting

Sum a column value based on multiple keys

Hi, I have below as i/p file: 5ABC 36488989 K 000010000ASB BYTRES 5PQR 45757754 K 000200005KPC HGTRET 5ABC 36488989 K 000045000ASB HGTRET 5GTH 36488989 K 000200200ASB BYTRES 5FTU ... (2 Replies)
Discussion started by: nirnkv
2 Replies

5. UNIX for Dummies Questions & Answers

any script for joining files based on simple conditions

Condition1; If NPID and IndID of both input1 and input2 are same take all the vaues relevant to them and print together as output Condition2; IDNo in output: Take the highly repeated same letter of similar NPID-IndID as *1* Second highly repeated same letter... (0 Replies)
Discussion started by: stateperl
0 Replies

6. UNIX for Dummies Questions & Answers

Joining string on multiple files

Hi guys, I am a forum (and a bit of a unix) newbie, and I currently have a tricky problem lying ahead of me. I have multiple files, and I am looking to join the files on the first column. Example: File 1 andy b 100 amy c 200 amy d 300 File 2 andy c 200 amy c 100 clyde o 50 ... (3 Replies)
Discussion started by: jdr0317
3 Replies

7. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Hi, I have nine files looking similar to file1 & file2 below. File1: 1 ABCA1 1 ABCC8 1 ABR:N 1 ACACB 1 ACAP2 1 ACOT1 1 ACSBG 1 ACTR1 1 ACTRT 1 ADAMT 1 AEN:N 1 AKAP1File2: 1 A4GAL 1 ACTBL 1 ACTL7 (4 Replies)
Discussion started by: seqbiologist
4 Replies

8. Shell Programming and Scripting

Find All duplicates based on multiple keys

Hi All, Input.txt 123,ABC,XYZ1,A01,IND,I68,IND,NN 123,ABC,XYZ1,A01,IND,I67,IND,NN 998,SGR,St,R834,scot,R834,scot,NN 985,SGR0399,St,R180,T15,R180,T1,YY 985,SGR0399,St,R180,T15,R180,T1,NN 985,SGR0399,St,R180,T15,R180,T1,NN 2943,SGR?99,St,R68,Scot,R77,Scot,YY... (2 Replies)
Discussion started by: unme
2 Replies

9. Shell Programming and Scripting

Combine multiple rows based on selected column keys

Hello I want to collapse a file with multiple rows into consolidated lines of entries based on selected columns as the 'key'. Example: 1 2 3 Abc def ghi 1 2 3 jkl mno p qrts 6 9 0 mno def Abc 7 8 4 Abc mno mno abc 7 8 9 mno mno abc 7 8 9 mno j k So if columns 1, 2 and 3 are... (6 Replies)
Discussion started by: linuxlearner123
6 Replies

10. Shell Programming and Scripting

awk joining multiple lines based on field count

Hi Folks, I have a file with fields as follows which has last field in multiple lines. I would like to combine a line which has three fields with single field line for as shown in expected output. Please help. INPUT hname01 windows appnamec1eda_p1, ... (5 Replies)
Discussion started by: shunya
5 Replies
COLRM(1)						    BSD General Commands Manual 						  COLRM(1)

NAME
colrm -- remove columns from a file SYNOPSIS
colrm [start [stop]] DESCRIPTION
The colrm utility removes selected columns from the lines of a file. A column is defined as a single character in a line. Input is read from the standard input. Output is written to the standard output. If only the start column is specified, columns numbered less than the start column will be written. If both start and stop columns are spec- ified, columns numbered less than the start column or greater than the stop column will be written. Column numbering starts with one, not zero. Tab characters increment the column count to the next multiple of eight. Backspace characters decrement the column count by one. ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of colrm as described in environ(7). EXIT STATUS
The colrm utility exits 0 on success, and >0 if an error occurs. SEE ALSO
awk(1), column(1), cut(1), paste(1) HISTORY
The colrm command appeared in 3.0BSD. BSD
August 4, 2004 BSD
All times are GMT -4. The time now is 01:15 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy