How to join files with different numbers of columns and rows?


 
Forum: UNIX for Dummies Questions & Answers
# 1  
09-13-2011

I am a new user of Unix/Linux, so this question might be a bit simple!
I am trying to join two very large files that have different numbers of columns and rows.
I want to keep all rows and all columns from both files in the joint file; each row is identified by a key in its first field.
Rows whose key exists in both files should be matched up and joined. Rows that exist in only one of the files should also be kept, with their data intact. Basically, the joint file should include as much of the data as possible.
Hope this makes sense!

A small example:
file 1:
Code:
A 1 2 3 4 
B 1 2
C 1 2 3 4 5

file 2:
Code:
A 1 2 3 4 5
B 1 2 3
C 1 2 3
D 1 2 3 4 5 6
E 1

The joint file should contain:
Code:
A 1 2 3 4 5
B 1 2 3
C 1 2 3 4 5
D 1 2 3 4 5 6
E 1


Last edited by Franklin52; 09-14-2011 at 04:06 AM. Reason: Please use code tags for code and data samples, thank you.
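For comparison, taken literally, "keep all rows and all columns from both files" describes a full outer join on the first field. Below is a minimal sketch using the POSIX `join` utility, assuming both files are already sorted on that field (for example with `LC_ALL=C sort -k1,1`); `outer_join` is a hypothetical helper name. Because `join` appends file 2's columns after file 1's instead of choosing one version of each row, on the sample data it prints `A 1 2 3 4 1 2 3 4 5` rather than the `A 1 2 3 4 5` wanted in the joint file, so it only fits when the columns of the two files hold different data.

```shell
#!/bin/sh
# outer_join FILE1 FILE2 (hypothetical helper name)
# Full outer join on field 1: "-a 1" and "-a 2" also print lines whose key
# appears in only one of the files. Both inputs must be sorted on field 1
# in the collation order join uses, so LC_ALL=C is set here and should
# also be used when sorting the inputs.
outer_join() {
    LC_ALL=C join -a 1 -a 2 "$1" "$2"
}
```

Matched keys get both rows' columns on one line; unmatched keys are printed unchanged.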
# 2  
09-14-2011
Can you count on them being in order?


If not, load file1 into an array keyed on the first field and merge file2 against it:

Code:
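$ awk -v FILE1="file1" '
# Load file1 into memory, keyed on the first field; stop at EOF or on a read error.
BEGIN { while ((getline < FILE1) > 0) { A[$1] = $0; N[$1] = NF } }

# For each file2 row, print whichever version of the row has more fields (ties go to file2).
{
        if (($1 in A) && N[$1] > NF)  print A[$1];
        else                          print $0;

        delete A[$1];   # row handled, so END will not print it again
}

# Rows found only in file1; for-in order is unspecified, so pipe through sort if order matters.
END {   for (k in A) print A[k];         }' < file2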
A 1 2 3 4 5
B 1 2 3
C 1 2 3 4 5
D 1 2 3 4 5 6
E 1
$
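The awk answer keeps all of file1 in memory, which may matter for very large files. If both files can be counted on to be sorted on the first field, one alternative is a streaming merge that reads the two files in step. This is a sketch under stated assumptions, not part of the original answer: `merge_sorted` is a hypothetical helper name, it assumes each key appears at most once per file and that both files were sorted with `LC_ALL=C sort -k1,1`, and it keeps the same rule as the answer (the row with more fields wins, ties go to file2).

```shell
#!/bin/sh
# merge_sorted FILE1 FILE2 (hypothetical helper name)
# Assumes both files are sorted on field 1 with LC_ALL=C sort -k1,1
# and that no key repeats within a single file.
merge_sorted() {
    LC_ALL=C awk -v FILE1="$1" '
    # Read the next file1 row into r1 (line), k1 (key), n1 (field count);
    # have1 becomes 0 at end of file or on a read error.
    function next1(    f) {
        if ((getline r1 < FILE1) > 0) {
            n1 = split(r1, f, " ")
            k1 = f[1] ""          # force string comparison of keys
            have1 = 1
        } else {
            have1 = 0
        }
    }
    BEGIN { next1() }
    {
        key = $1 ""
        # file1 rows whose key sorts before this file2 key exist only in file1.
        while (have1 && k1 < key) { print r1; next1() }
        if (have1 && k1 == key) {
            # Key in both files: keep the row with more fields (ties go to file2).
            if (n1 > NF) print r1
            else print $0
            next1()
        } else {
            print $0
        }
    }
    # Any remaining file1 rows sort after every file2 key.
    END { while (have1) { print r1; next1() } }
    ' "$2"
}
```

A run might look like sorting each input with `LC_ALL=C sort -k1,1` and then calling `merge_sorted file1.sorted file2.sorted > joint`. Only one row from each file is held at a time, and the output comes out in key order, unlike the `for (k in A)` loop in the in-memory version.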

 


10 More Discussions You Might Find Interesting

1. Emergency UNIX and Linux Support

Read values in each col starting 3rd row. Print occurrence value.

Hello Friends, Hope all are doing fine. Here is a tricky issue. my input file is like this 07 10 14 20 21 03 15 27 30 32 01 10 11 19 30 02 06 14 15 17 01 06 20 25 29 Logic: 1. Please print another column as "0-0-0-0-0" for the first and second rows. 2. Read the first column... (4 Replies)
Discussion started by: jacobs.smith

2. Shell Programming and Scripting

How to mark the row based on col value?

Hi Gurus, I have requirement to identify the records based on one column value. the sample file as below: ID AMT, AMT1 100,10, 2 100,20, 3 200,30, 0 200, 40, 0 300, 20, 2 300, 50, 2 400, 20, 1 400, 60, 0 for each ID, there 2 records, if any one record amt1 is 0, the in 4th col add... (5 Replies)
Discussion started by: ken6503

3. Shell Programming and Scripting

Identify max value in diff columns for same row

Hi, I have a file with 1M records ABC 200 400 2.4 5.6 ABC 410 299 12 1.5 XYZ 4 5 6 7 MNO 22 40 30 70 MNO 47 55 80 150 What I want is for all the rows it should take the max value where there are duplicates output ABC 410 400 12 5.6 XYZ 4 5 6 7 MNO 47 55 80 150 How can i... (6 Replies)
Discussion started by: Diya123

4. Shell Programming and Scripting

Printing from col x to end of line, except last col

Hello, I have some tab delimited data and I need to move the last col. I could hard code it, awk '{ print $1,$NF,$2,$3,$4,etc }' infile > outfile but it would be nice to know the syntax to print a range cols. I know in cut you can do, cut -f 1,4-8,11- to print fields 1,... (8 Replies)
Discussion started by: LMHmedchem

5. Shell Programming and Scripting

Change col to row using shell script..Very Complex

Hi guys I have file A with Below Data ABC123 X1 X2 X3 ABC123 Y1 Y33 Y4 ABC123 Z1 ZS2 ZL3 ABC234 P1 PP3 PP9 ABC234 Q1 ABC234 R1 P09 PO332 PO331 OKI12 .. .. .. Now I want file B as below ABC123 X1 X2 X3;Y1 Y33 Y4;Z1 ZS2 ZL3 ABC234 P1 PP3 PP9;Q1;R1 P09 PO332 PO331 OKI12... (1 Reply)
Discussion started by: asavaliya

6. UNIX for Dummies Questions & Answers

How to use the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine

7. Shell Programming and Scripting

Join txt files with diff cols and rows

I am a new user of Unix/Linux, so this question might be a bit simple! I am trying to join two (very large) files that both have different # of cols and rows in each file. I want to keep 'all' rows and 'all' cols from both files in the joint file, and the primary key variables are in the rows.... (1 Reply)
Discussion started by: BNasir

8. UNIX for Advanced & Expert Users

Print line based on highest value of col (B) and repetition of values in col (A)

Hello everyone, I am writing a script to process data from the ATP world tour. I have a file which contains: t=540 y=2011 r=1 p=N409 t=540 y=2011 r=2 p=N409 t=540 y=2011 r=3 p=N409 t=540 y=2011 r=4 p=N409 t=520 y=2011 r=1 p=N409 t=520 y=2011 r=2 p=N409 t=520 y=2011 r=3 p=N409 The... (4 Replies)
Discussion started by: imahmoud

9. UNIX for Dummies Questions & Answers

Join 2 files with multiple columns: awk/grep/join?

Hello, My apologies if this has been posted elsewhere, I have had a look at several threads but I am still confused how to use these functions. I have two files, each with 5 columns: File A: (tab-delimited) PDB CHAIN Start End Fragment 1avq A 171 176 awyfan 1avq A 172 177 wyfany 1c7k A 2 7... (3 Replies)
Discussion started by: InfoSeeker

10. Shell Programming and Scripting

diff 2 files; output diff's to 3rd file

Hello, I want to compare two files. All records in file 2 that are not in file 1 should be output to file 3. For example: file 1 123 1234 123456 file 2 123 2345 23456 file 3 should have 2345 23456 I have looked at diff, bdiff, cmp, comm, diff3 without any luck! (2 Replies)
Discussion started by: blt123