Merging two tables including multiple occurrences of column identifiers and unique lines

09-16-2014

I would like to merge two tables based on column 1:

File 1:

Code:

  1    today  
  1    green  
  2    tomorrow  
  3    red

File 2:

Code:

  1    a lot  
  1    sometimes  
  2    at work  
  2    at home  
  2    sometimes  
  3    new  
  4    a lot  
  5    sometimes  
  6    at work

Desired output (file 3):

Code:

  1        today    a lot  
  1        today    sometimes  
  1        green    a lot  
  1        green    sometimes  
  2        tomorrow    at work  
  2        tomorrow    at home  
  2        tomorrow    sometimes  
  3        red    new

I came up with the following:

Code:

    awk -F '[\t]' -v OFS='\t' '{i=$1;$1=x} NR==FNR{A[i]=$0;next} A[i]{print i,$0A[i]}' file2 file1 > file3

However, it gives me only:

Code:

  1        today    sometimes  
  2        tomorrow    sometimes  
  3        red    new

Please note that I would like to have only the lines of file 1 (column 1 as the identifier) but report all matching occurrences in file 2.

Moderator's Comments:

Please use code tags next time for your code and data

Last edited by Don Cragun; 09-16-2014 at 09:05 PM.. Reason: Add more CODE tags.

BSP

09-16-2014

Code:

[user@host tmp]# join -t" " -1 1 -2 1 file1 file2
1 today a lot
1 today sometimes
1 green a lot
1 green sometimes
2 tomorrow at work
2 tomorrow at home
2 tomorrow sometimes
3 red new
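
A note on the join approach: join expects both inputs to be sorted on the join field, and for tab-separated data the delimiter passed to -t has to be a literal tab. The following is only a minimal sketch, assuming tab-separated files and a sort that supports the stable -s option; the temporary directory and the variable names (tmp, tab, joined) are illustrative and not part of the original question.

```shell
# Assumption: both files are tab-separated; sample data copied from the question.
tmp=$(mktemp -d)
printf '1\ttoday\n1\tgreen\n2\ttomorrow\n3\tred\n' > "$tmp/file1"
printf '1\ta lot\n1\tsometimes\n2\tat work\n2\tat home\n2\tsometimes\n3\tnew\n4\ta lot\n5\tsometimes\n6\tat work\n' > "$tmp/file2"

tab=$(printf '\t')   # literal tab character for sort -t and join -t

# join needs input sorted on the key; -s keeps the original order within each key.
sort -s -t "$tab" -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -s -t "$tab" -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# Pair every file1 line with every file2 line that has the same first column.
joined=$(join -t "$tab" -1 1 -2 1 "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Because the inputs must be sorted, the output follows key order rather than the original line order of file1.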

manuswami

09-16-2014

Thanks, your answer worked on my example. However, in the real files I deal with, identifiers in column 1 can recur later in the file, e.g.

File 1:

Code:

1 today 
1 green 
2 tomorrow 
3 red
1 today 
2 tomorrow

File 2: as above

Desired output:

Code:

1 today a lot
1 today sometimes
1 green a lot
1 green sometimes
2 tomorrow at work
2 tomorrow at home
2 tomorrow sometimes
3 red new
1 today a lot
1 today sometimes
2 tomorrow at work
2 tomorrow at home
2 tomorrow sometimes

Any idea?

---------- Post updated at 05:17 PM ---------- Previous update was at 02:03 PM ----------

Does no one have an idea?

Moderator's Comments:

edit by bakunin: first, you were asked to use CODE-tags for your code AND data. Please use them! Second: this is a forum, not a helpdesk! We are neither obliged to answer at all nor have we made any promises. Attempts at speeding the process up might slow it down (because people do not like being urged to help voluntarily) but it won't make it happen any quicker.

First, I am sorry. As you might have noticed, I am new to this forum and need time to learn such things. Writing in BOLD and caps might be your attempt to speed this process up, but it clearly won't. Second, well, this one is obvious (The UNIX and Linux Forums), right? The description 'Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.' speaks for itself. Something else?

Last edited by BSP; 09-16-2014 at 02:57 PM..

BSP

09-16-2014

You were fairly close. You just need to append subsequent records to the array entry instead of overwriting it, and then split the entry when printing.

I am not sure which whitespace in the input files is spaces and which is tabs, so I coded for the worst case (i.e. any whitespace could be a tab):

Code:

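awk -F '\t' '
  # Save the key, then remove it and the leading separator from the record.
  {key=$1;$1=x;$0=substr($0,2)}
  # First file (file2): append each value to a "|"-separated list per key.
  FNR==NR{A[key]=A[key]"|"$0;next}
  # Second file (file1): print one output line per stored value.
  (key in A) {
     c=split(A[key],V,"|")
     for(i=1;i<c;) print key,$0,V[++i]
}' OFS="\t" file2 file1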
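
The difference between the original attempt and this fix comes down to one assignment: `A[i]=$0` replaces the stored value every time a key repeats, while `A[key]=A[key]"|"$0` accumulates the values. A small sketch to illustrate this, using two sample records from file2 (the variable names data, last and all are only for illustration):

```shell
# Two file2 records sharing key 1, tab-separated (sample data from the question).
data=$(printf '1\ta lot\n1\tsometimes\n')

# Plain assignment: each repeat of the key overwrites the previous value.
last=$(printf '%s\n' "$data" | awk -F '\t' '{A[$1]=$2} END{print A[1]}')

# Concatenation: each repeat is appended behind a "|" separator.
all=$(printf '%s\n' "$data" | awk -F '\t' '{A[$1]=A[$1]"|"$2} END{print A[1]}')

printf 'overwrite: %s\nappend:    %s\n' "$last" "$all"
```

The accumulated string starts with a "|", so split() returns an empty first element; that is why the printing loop starts at i=1 and pre-increments, which skips V[1]. One caveat of this approach is that a value which itself contains "|" would be split into several pieces.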

Chubler_XL

09-17-2014

Thank you!

For others who might run into a similar problem I gathered some more options:

Perl solution:

Code:

$ perl -F'\t' -lane 'BEGIN{open(A,"<","file2") or die; while(<A>){chomp; ($k,$v)=split(/\t/,$_,2); push @{$h{$k}},$v}} print join("\t",@F,$_) for @{$h{$F[0]}||[]}' file1

Array solution:

Code:

awk -F '\t' -v OFS='\t' 'FNR==NR{n[$1]++;a[$1,n[$1]]=substr($0,index($0,FS)+1);next}{for(i=1;i<=n[$1];i++)print $0,a[$1,i]}' file2 file1

Cheers, BSP

BSP
