Merging two tables including multiple ocurrence of column identifiers and unique lines


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Merging two tables including multiple ocurrence of column identifiers and unique lines
# 1  
Old 09-16-2014
Merging two tables including multiple ocurrence of column identifiers and unique lines

I would like to merge two tables based on column 1:

File 1:

Code:
  1    today  
  1    green  
  2    tomorrow  
  3    red

File 2:

Code:
  1    a lot  
  1    sometimes  
  2    at work  
  2    at home  
  2    sometimes  
  3    new  
  4    a lot  
  5    sometimes  
  6    at work

Desired output (file 3):

Code:
  1        today    a lot  
  1        today    sometimes  
  1        green    a lot  
  1        green    sometimes  
  2        tomorrow    at work  
  2        tomorrow    at home  
  2        tomorrow    sometimes  
  3        red    new

I came up with the following:
Code:
    awk -F '[\t]' -v OFS='\t' '{i=$1;$1=x} NR==FNR{A[i]=$0;next} A[i]{print i,$0A[i]}' file2 file1 > file3

However, it gives me only:

Code:
  1        today    sometimes  
  2        tomorrow    sometimes  
  3        red    new


Please note that I would like to have only the lines of file 1 (column 1 as the identifier) but report all matching occurrences in file 2.

Moderator's Comments:
Mod Comment Please use code tags next time for your code and data

Last edited by Don Cragun; 09-16-2014 at 09:05 PM.. Reason: Add more CODE tags.
# 2  
Old 09-16-2014
[user@host tmp]# join -t" " -1 1 -2 1 file1 file2
1 today a lot
1 today sometimes
1 green a lot
1 green sometimes
2 tomorrow at work
2 tomorrow at home
2 tomorrow sometimes
3 red new
# 3  
Old 09-16-2014
Thanks, your answer worked on my example. However, the real files I deal with have reocurrence of specified identifiers from column 1, eg.

File 1:
Code:
1 today 
1 green 
2 tomorrow 
3 red
1 today 
2 tomorrow

File 2: as above

Desired output:
Code:
1 today a lot
1 today sometimes
1 green a lot
1 green sometimes
2 tomorrow at work
2 tomorrow at home
2 tomorrow sometimes
3 red new
1 today a lot
1 today sometimes
2 tomorrow at work
2 tomorrow at home
2 tomorrow sometimes

Any idea?

---------- Post updated at 05:17 PM ---------- Previous update was at 02:03 PM ----------

no one an idea? Smilie

Moderator's Comments:
Mod Comment edit by bakunin: first, you were asked to use CODE-tags for your code AND data. Please use them! Second: this is a forum, not a helpdesk! We are neither obliged to answer at all nor have we made any promises. Attempts at speeding the process up might slow it down (because people do not like being urged to help voluntarily) but it won't make it happen any quicker.


First, I am sorry. As you might have noticed I am new to this forum and need time to learn such things. Writing in BOLD and caps might be your attempt to speed this process up but it clearly won't. Second, ..well this one is obvious (The UNIX and Linux Forums), right? The description 'Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.' speaks for itself. Something else? Smilie

Last edited by BSP; 09-16-2014 at 02:57 PM..
# 4  
Old 09-16-2014
You were fairly close - you just need append subsequent records to the array and then split when printing:

Not sure what are spaces and what are tabs in the infiles, so I coded for worst case (ie any white space could be a tab)

Code:
awk -F '\t' '
  {key=$1;$1=x;$0=substr($0,2)}
  FNR==NR{A[key]=A[key]"|"$0;next}
  (key in A) {
     c=split(A[key],V,"|")
     for(i=1;i<c;) print key,$0,V[++i]
}' OFS="\t" file2 file1

# 5  
Old 09-17-2014
Thank you!

For others who might run into a similar problem I gathered some more options:

Perl solution:
Code:
$ perl -lane 'BEGIN{open(A,"file1"); while(<A>){chomp; @F=split(/\t/);                      push @{$k{$F[0]}},@F[1..$#F];}  }                $k{$F[0]} && print "$F[0]\t@{$k{$F[0]}}\t@F[1..$#F]"' file2

Array solution:
Code:
awk 'FNR==NR{a[$0]=$1;next}{for(i in a)if(a[i]==$1)print i,substr($0,index($0," ")+1)}' file file2

Cheers, BSP Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Merging multiple lines into single line based on one column

I Want to merge multiple lines based on the 1st field and keep into single record. SRC File: AAA_POC_DB.TAB1 AAA_POC_DB.TAB2 AAA_POC_DB.TAB3 AAA_POC_DB.TAB4 BBB_POC_DB.TAB1 BBB_POC_DB.TAB2 CCC_POC_DB.TAB6 OUTPUT ----------------- 'AAA_POC_DB','TAB1','TAB2','TAB3','TAB4'... (10 Replies)
Discussion started by: raju2016
10 Replies

2. Shell Programming and Scripting

Merging multiple lines to columns with awk, while inserting commas for missing lines

Hello all, I have a large csv file where there are four types of rows I need to merge into one row per person, where there is a column for each possible code / type of row, even if that code/row isn't there for that person. In the csv, a person may be listed from one to four times... (9 Replies)
Discussion started by: RalphNY
9 Replies

3. Shell Programming and Scripting

Merging multiple lines

I do have a text file with multiple lines on it. I want to put the lines of text into a single line where ever there is ";" for example ert, ryt, yvig, fgr; rtyu, hjk, uio, hyu, hjo; ghj, tyu, gho, hjp, jklo, kol; The resultant file I would like to have is ert, ryt, yvig, fgr;... (2 Replies)
Discussion started by: Kanja
2 Replies

4. Shell Programming and Scripting

Reading multiple values from multiple lines and columns and setting them to unique variables.

Hello, I would like to ask for help with csh script. An example of an input in .txt file is below, the number of lines varies from file to file and I have 2 or 3 columns with values. I would like to read all the values (probably one by one) and set them to independent unique variables that... (7 Replies)
Discussion started by: FMMOLA
7 Replies

5. UNIX for Dummies Questions & Answers

Merging lines based on one column

Hi, I have a file which I'd like to merge lines based on duplicates in one column while keeping the info for other columns. Let me simplify it by an example: File ESR1 ANASTROZOLE NA FDA_approved ESR1 CISPLATIN NA FDA_approved ESR1 DANAZOL agonist NA ESR1 EXEMESTANE NA FDA_approved... (3 Replies)
Discussion started by: JJ001
3 Replies

6. UNIX for Dummies Questions & Answers

Merging tables: identifiying common and unique elements

Hi all, I know how to merge two tables and to remove the duplicated lines based on a field (Column 2) . My next challenge is to be able to identify in a new column those common elements between table A & B, those elements in table A not present in table B and vice versa. A simple count would be... (6 Replies)
Discussion started by: lsantome
6 Replies

7. Shell Programming and Scripting

Including EOL in egrep pattern for multiple lines

Hi all I need your help to get a high-performance solution. I am working on a extensive script to automate file restores using the bprestore tool on a Solaris 5.10 server (bash 3.00). I will only paste the needed parts of the script to avoid any confusion. To use the script the user has to... (2 Replies)
Discussion started by: Anonym
2 Replies

8. Shell Programming and Scripting

Extracting lines based on identifiers into multiple files respectively

consider the following is the contents of the file cat 11.sql drop procedure if exists hoop1 ; Delimiter $$ CREATE PROCEDURE hoop1(id int) BEGIN END $$ Delimiter ; . . . . drop procedure if exists hoop2; Delimiter $$ CREATE PROCEDURE hoop2(id int) BEGIN END $$ (8 Replies)
Discussion started by: vivek d r
8 Replies

9. UNIX for Dummies Questions & Answers

converting unique identifiers in a column using conversion file

Hello, I often have this problem: I have a file with a column of unique identifiers e.g. file1 below has an id column and data column/columns with p rows: cat data1 dog data2 cow data3 . . . elephant datap-1 horse datap and I have a conversion file,file2, with n<p rows... (4 Replies)
Discussion started by: peanuts48
4 Replies

10. UNIX for Dummies Questions & Answers

Merging Tables by a column

Dear Friends, I really do not know Linux and I really would like to understand it because it does help to work with large data. I am reading this forum for 1 week to try a solution for my problem. I think that, using others post informations, I was almost there... I have 2 big tables... (4 Replies)
Discussion started by: lColli
4 Replies
Login or Register to Ask a Question