Concatenate column values when header is Matching from multiple files

09-24-2016

Registered User

21, 0

Join Date: Sep 2016

Last Activity: 6 October 2016, 10:52 AM EDT

Posts: 21

Thanks Given: 23

Thanked 0 Times in 0 Posts

Concatenate column values when header is Matching from multiple files

there can be n number of columns but the number of columns and header name will remain same in all 3 files. Files are tab Delimited.
a.txt

Code:

Name 9/1 9/2
X 1 7
y 2 8
z 3 9
a 4 10
b 5 11
c 6 12

b.xt

Code:

Name  9/1 9/2
X 13 19
y 14 20
z 15 21
a 16 22
b 17 23
c 18 24

c.txt

Code:

Name  9/1 9/2
X 25 31
y 26 32
z 27 33
a 28 34
b 29 35
c 30 36

required output

Code:

Name  9/1 9/2
X 1/13/25 7/19/31
y 2/14/26 8/20/32
z 3/15/27 9/21/33
a 4/16/28 10/22/34
b 5/17/29 11/23/35
c 6/16/30 12/24/36

I Want to merge all three file on the basis of matching header and if values are same in the row then print one value and if different then concatenate values from all three files under that column. like in file a.txt,b.txt,c.txt, row 2 has same value under column name "NAME" so print only X but column "9/1" has different values so print one header " 9/1" and under it print all values from three files e.g. 1/13/25.

Code:

join <(sort a.txt) <(sort b.txt) <(sort c.txt) >out.txt

Code:

 awk 'FNR==NR{a[$1]=$2;next} ($1 in a){ print $0, a[$1]}' a.txt b.txt c.txt

the above commands are not concatenating the values of same headers but it is printing all the rows from file a then all rows from file b then c

Last edited by Nina2910; 09-24-2016 at 01:35 AM..

Nina2910

View Public Profile for Nina2910

Find all posts by Nina2910

09-24-2016

Moderator

3,105, 1,603

Join Date: May 2013

Last Activity: 31 August 2020, 1:46 AM EDT

Location: Chennai

Posts: 3,105

Thanks Given: 1,269

Thanked 1,603 Times in 1,369 Posts

Hello Nina2910,

Welcome to forums, hope you will enjoy learning/sharing knowledge here. Could you please try following and let me know how it goes.

Code:

awk 'function get(a,b,array1){array1[a]=a in array1?array1[a]"/"b:b} FNR==NR{A[$1]=$2;B[$1]=$3;S[++i]=$1;next} FNR!=NR && FNR>1{get($1,$2,A);get($1,$3,B)} END{for(j=1;j<=i;j++){print S[j], A[S[j]],B[S[j]]}}' *.txt

If you have more txt files than a.txt,b.txtandc.txt then you could use following.

Code:

awk 'function get(a,b,array1){array1[a]=a in array1?array1[a]"/"b:b} FNR==NR{A[$1]=$2;B[$1]=$3;S[++i]=$1;next} FNR!=NR && FNR>1{get($1,$2,A);get($1,$3,B)} END{for(j=1;j<=i;j++){print S[j], A[S[j]],B[S[j]]}}'  a.txt  b.txt  c.txt

Output will be as follows.

Code:

Name 9/1 9/2
X 1/13/25 7/19/31
y 2/14/26 8/20/32
z 3/15/27 9/21/33
a 4/16/28 10/22/34
b 5/17/29 11/23/35
c 6/18/30 12/24/36

EDIT: Adding a non-one liner form of solution.

Code:

awk 'function get(a,b,array1){
				array1[a]=a in array1?array1[a]"/"b:b
		             } 
     FNR==NR                 {
				A[$1]=$2;
				B[$1]=$3;
				S[++i]=$1;
				next
                             } 
     FNR!=NR && FNR>1        {
				get($1,$2,A);
				get($1,$3,B)
			     } 
     END                     {
				for(j=1;j<=i;j++){
							print S[j], A[S[j]],B[S[j]]
						 }
                             }
    ' *.txt

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-24-2016 at 02:17 AM.. Reason: Adding a non-one liner form of solution successfully too now.

This User Gave Thanks to RavinderSingh13 For This Post:

RavinderSingh13

View Public Profile for RavinderSingh13

Find all posts by RavinderSingh13

09-24-2016

Registered User

21, 0

Join Date: Sep 2016

Last Activity: 6 October 2016, 10:52 AM EDT

Posts: 21

Thanks Given: 23

Thanked 0 Times in 0 Posts

Thank you so much Ravinder for welcoming me and giving the solution. It worked as expected

Two requests
1) could you please let me know if each file has n number of columns for example 32 columns in files a 32 columns in file b and 32 columns in file c. column header and column number will remain same in each file.how can I get the desired result.

FIles are Tab delimited
a.txt

Code:

Name 9/1 9/2 9/3.................9/30

b.txt

Code:

 Name 9/1 9/2 9/3....................9/30

c.txt

Code:

Name 9/1 9/2 9/3......................9/30

2) can you please explain the code sorry still learning

Last edited by Nina2910; 09-24-2016 at 02:35 AM..

Nina2910

View Public Profile for Nina2910

Find all posts by Nina2910

09-24-2016

Moderator

3,105, 1,603

Join Date: May 2013

Last Activity: 31 August 2020, 1:46 AM EDT

Location: Chennai

Posts: 3,105

Thanks Given: 1,269

Thanked 1,603 Times in 1,369 Posts

Hello Nina,

So number of columns were not mentioned before so solution was according to it, so following may help you with dynamic number of columns but condition is Input_files should have same number of columns(based on row vice), if this condition is TRUE(which seems to be as per your post) then following may help you in same. Let's say following are the Input_files.

Code:

cat a.txt
Name 9/1 9/2
X 1 7 7 8 9 10 12 12 12 12 12 13
y 2 8 3 4 5 6 7 8 8 9 1 212 12 12
z 3 9 1 1 2
a 4 10 1
b 5 11
c 6 12 12 13 14

cat b.txt
Name 9/1 9/2
X 1 7 7 8 9 10 12 12 12 12 12 13
y 22 4 13 2 53 26 17 18 82 19 111 12 13 4
z 3 9 1 1 2
a 4 10 1
b 5 11
c 6 12 12 13 14

cat c.txt
Name 9/1 9/2
X 1 7 7 8 9 10 12 12 12 12 12 13
y 2 8 3 4 5 6 7 8 8 9 1 212 12 12
z 3 9 1 1 2
a 4 10 1
b 5 11
c 6 12 12 13 14

Then following is the code for same.

Code:

paste *.txt | awk 'function get(field){q=NF/3;for(i=field;i<=NF;i+=q){$field=i>field?$field "/" $i:$field;}} NR>1{for(j=2;j<=NF/3;j++){get(j)};for(j=NF/3+1;j<=NF;j++){$j=""};sub(/[[:space:]]+$/,X,$0);print}'

Output will be as follows.

Code:

X 1/1/1 7/7/7 7/7/7 8/8/8 9/9/9 10/10/10 12/12/12 12/12/12 12/12/12 12/12/12 12/12/12 13/13/13                          
y 2/22/2 8/4/8 3/13/3 4/2/4 5/53/5 6/26/6 7/17/7 8/18/8 8/82/8 9/19/9 1/111/1 212/12/212 12/13/12 12/4/12                              
z 3/3/3 9/9/9 1/1/1 1/1/1 2/2/2            
a 4/4/4 10/10/10 1/1/1        
b 5/5/5 11/11/11      
c 6/6/6 12/12/12 12/12/12 13/13/13 14/14/14

Also in above code wherever I have used NF/3 so it is because of 3 Input_files are there if they vary in numbers then you could change according to their number. Kindly do let me know if this helps you or if you have any issues on same.
EDIT: Adding a non-one liner form of solution too here.

Code:

paste *.txt | awk 'function get(field){
					q=NF/3;
					for(i=field;i<=NF;i+=q){
								$field=i>field?$field "/" $i:$field;
							       }
                                      } 
                   NR>1               {
					for(j=2;j<=NF/3;j++)   {
								get(j)
                                                               };
					for(j=NF/3+1;j<=NF;j++){
								$j=""
							       };
                                        sub(/[[:space:]]+$/,X,$0);
					print
 				      }
                  '

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-24-2016 at 04:06 AM.. Reason: Adding a non-one liner form of solution on same too succcessfully now.

This User Gave Thanks to RavinderSingh13 For This Post:

RavinderSingh13

View Public Profile for RavinderSingh13

Find all posts by RavinderSingh13

09-24-2016

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

You could also try the following...

Note that you say that your files are <tab> delimited, but all of the sample data you have shown us uses one or two <space> characters to separate fields; not a <tab> character.

The following will work with input files with fields separated by one or more blanks (where a blank is a <space> or a <tab>). This will not work if you have input files that really do use a <tab> as a field separator and some of your field data contains a <space>. It will work with any number of input files. It will work with any number of fields in a line (as long as all files have the same number of fields). The output header is taken from the 1st line of the 1st input file. The first line of all other input files are ignored. If the name on a data line is the same as the name on the header line in the 1st input file, that data will be merged into the output header line (i.e., it is assumed that the name used in the header in the 1st input file is not used as a name on any non-header line in any of the input files). Output fields will be separated by a <tab> character. The names of the files to be processed are not built into this script, they must be supplied as command-line arguments to the following script:

Code:

#!/bin/ksh
awk '	# Use the awk utility to interpret the following script...
BEGIN {	# Set output field separator.
	OFS = "\t"
}
NR == 1 || FNR > 1 {
	# Gather data from the 1st line in the 1st file (the header is supposed
	# to be the same in all input files) and from the 2nd line on in every
	# input file...
	# If we have not seen the name found in the first field before...
	if(!($1 in name)) {
		# Add the 1st first to the list of known names, increment the
		# number of names we have seen, and note the output line number
		# where this name should appear in the output....
		name[order[++nc] = $1]
		# and initialize the data for each output field for this name
		# from the corresponding input fields on this line.
		for(i = 2; i <= NF; i++)
			d[$1, i] = $i
	} else	# And if we have seen this name before, add data to be output
		# for this name to the accumalated data we have seen before for
		# this name.
		for(i = 2; i <= NF; i++)
			d[$1, i] = d[$1, i] "/" $i
}
END {	# Now that we have hit EOF on the last input file, print the accumulated
	# output.  For each name seen...
	for(i = 1; i <= nc; i++) {
		# Print the name...
		printf("%s", order[i])
		# and for the remaining fields...
		for(j = 2; j <= NF; j++)
			# print the output field separator followed by the
			# accumulated data for this name and field number.
			printf("%s%s", OFS, d[order[i], j])
		# and after the last field has been printed, add an aoutput
		# record separator.
		print ""
	}
}' "$@"	# Terminate the awk script and use the command line arguments as the
	# list of files to be processed.

This was written and tested using a Korn shell, but will work with any shell that uses Bourne shell syntax. If you save this script in a file named merger and make it executable:

Code:

chmod +x merger

and execute it with the pathnames of your sample input files:

Code:

./merger a.txt b.txt c.txt

it produces the output:

Code:

Name	9/1	9/2
X	1/13/25	7/19/31
y	2/14/26	8/20/32
z	3/15/27	9/21/33
a	4/16/28	10/22/34
b	5/17/29	11/23/35
c	6/18/30	12/24/36

Note that the output you said you wanted on the last line of the output was:

Code:

c 6/16/30 12/24/36

which, in addition to using <space> as a field separator instead of <tab>, also has 16 as the data from the 2nd column of the last line in b.txt instead of the value 18 that was contained in that field in your sample input file.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

09-24-2016

Moderator

3,105, 1,603

Join Date: May 2013

Last Activity: 31 August 2020, 1:46 AM EDT

Location: Chennai

Posts: 3,105

Thanks Given: 1,269

Thanked 1,603 Times in 1,369 Posts

Hello Nina2910,

So in previous code of mine(POST#4)we need to manually provide number of .txt file into code(where we are using NF/3). So converting code into a script and skipping pain of manually putting number of .txt files into following script, which is successfully run and tested in BASH. One more thing my code not checking about hearders of any of the Input_files.

Code:

cat script.ksh
COUNT=$(ls *.txt | wc -l)
paste *.txt | awk -vcount="$COUNT" 'function get(field){
							q=NF/count;
							for(i=field;i<=NF;i+=q){
										$field=i>field?$field "/" $i:$field;
							       		       }
                                                       } 
                   NR>1               {
					for(j=2;j<=NF/count;j++)   {
								get(j)
                                                               };
					for(j=NF/count+1;j<=NF;j++){
								$j=""
							       };
                                        sub(/[[:space:]]+$/,X,$0);
					print
 				      }
                  '

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-24-2016 at 04:45 AM..

This User Gave Thanks to RavinderSingh13 For This Post:

RavinderSingh13

View Public Profile for RavinderSingh13

Find all posts by RavinderSingh13

09-24-2016

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

For any number or order of fields, any order of lines, missing fields across files, try

Code:

awk '
FNR == 1        {for (n=split($0, T); n>1; n--) if (!(T[n] in HDK)) HDK[T[n]] = ++HDC
                 F1 = T[1]
                 next
                }

!($1 in NM)     {NM[$1]
                 NMS[++SQ] = $1
                }

                {for (i=2; i<=NF; i++) NC[$1,T[i]] = NC[$1,T[i]] (NC[$1,T[i]]?"/":"") $i
                }

END             {printf "%s", F1
                 for (h in HDK) printf "\t%s", h
                 printf RS

                 for (i=1; i<=SQ; i++)  {printf "%s", NMS[i]
                                         for (h in HDK) printf "\t%s", NC[NMS[i],h]
                                         printf RS
                                        } 
                }
'  file[123]
Name	9/1	9/2	9/3
X	1/25	7/19/31	13
y	2/26	8/20/32	14
z	3/27	9/33	
a	4/28	10/22/34	16
b	5/29	11/23/35	17
c	6/30	12/24/36	18
Z		21	15

The second input file has an upper case "Z" as a name, and "9/3" as a new and unique column header. Output fields are separated by <TAB> chars, the misleading output picture is due to the varying length of Col 2...

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

UNIX for Beginners Questions & Answers

Concatenate column values when header is Matching from multiple files

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing same column from two files, printing whole row with matching values

Discussion started by: mitabrev83

2. Shell Programming and Scripting

Concatenate values in the first column based on the second column.

Discussion started by: shoaibjameel123

3. Shell Programming and Scripting

Extracting values based on line-column numbers from multiple text files

Discussion started by: Bastami

4. Shell Programming and Scripting

Sum column values matching other field

Discussion started by: alpha_1

5. Shell Programming and Scripting

Sum values of specific column in multiple files, considering ranges defined in another file

Discussion started by: Bastami

6. Shell Programming and Scripting

Compare values in two files. For matching rows print corresponding values from File 1 in File2.

Discussion started by: Santoshbn

7. UNIX for Dummies Questions & Answers

shift values in one column as header for values in another column

Discussion started by: Unilearn

8. UNIX for Dummies Questions & Answers

Rename a header column by adding another column entry to the header column name

Discussion started by: Vavad

9. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Discussion started by: Vavad

10. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Discussion started by: seqbiologist