Row bind multiple csv files having different column headers


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Row bind multiple csv files having different column headers
# 1  
Old 11-04-2016
Row bind multiple csv files having different column headers

All,
I guess by this time someone asked this kind of question, but sorry I am unable to find after a deep search.
Here is my request
I have many files out of which 2 sample files provided below.
File-1 (with A,B as column headers)
Code:
A,B
1,2

File-2 (with C, D as column headers)
Code:
C,D
4,5

I need the below output
Code:
A,B,C,D
1,2, ,
 , ,4,5

I actually got a perl script to concatenate the files the way I want but it is shuffling the column order randomly.

I am looking for awk based script (preferably)

Thanks in advance
ks_reddy
# 2  
Old 11-04-2016
Why do some of your output fields have <space> as a replacement for data that is not present (row 2, column C; row 3, column A; and row 3, column B) and other output fields have nothing as a replacement for data that is not present (row 2, column D)?

Why isn't the desired output just:
Code:
A,B,C,D
1,2,3,4

?

Are the column headers between the two files always distinct? If not, how do common column headers affect the desired output?

Are there always just one header line and one data line in each input file? If not, how is the output supposed to be ordered? All lines from File-1 followed by all lines from File2? Alternating lines from File-1 and File-2? Does something different happen if the number of lines in the two files differs?
# 3  
Old 11-04-2016
I'm guessing that you might have common column headings somewhere that you want to merge in, but in that case, what rule would you want to apply:-
  • file1 always?
  • file2 always?
  • file2 if value is bigger/smaller?
  • value only if matched?
  • the sequence for records to be added?
  • anything else?

Is this something that could be covered with the paste command perhaps? You might need to prefix your file2 data with ,, to make the column counts match, perhaps with sed -i '2,$ s/^/,,/' file2 or some variation?


Does that help? It still doesn't get over the question of matching columns either and might not do what you want at all. It would help if we had more test data and the various options explained.




Kind regards,
Robin
# 4  
Old 11-04-2016
I suppose you could:-
Code:
{
  printf "%s,%s\n" "$(head -1 file1)" "$(head -1 file2)"
  sed '2,$ s/$/,,/' file1
  sed '2,$ s/^/,,/' file2
} > output_file

It is rather unwieldy and has no logical control options in it, but I think it fits the brief. There must be more to it than this though,......?



Robin
# 5  
Old 11-07-2016
Hi Don,
Thanks for going through my request. I am very sorry in my late reply.
What I provided earlier is a simple case.
Here is the desired output for my previous query.
Code:
A,B,C,D
1,2,,
,,4,5

.
My actual files contain lot of columns (100+) with some of the columns have same names.
Here are modified sample inputs from 2 files.
In actual case I have many files.
Code:
File-1
A,B,C
1,2,3
4,5,6

File-2
Code:
A,C,D
2,3,4
0,1,2

Code:
Desired output
A,B,C,D
1,2,3,
4,5,6,
2,,3,4
0,,1,2

---------- Post updated at 11:57 AM ---------- Previous update was at 11:55 AM ----------

Hello rbattle1,
Yes I do have column headers sometimes between different files that I want to row bind.
Please check my updated post on my requirements.

Thanks
ks_reddy
# 6  
Old 11-07-2016
The following script seems to do what you want:
Code:
#!/bin/ksh
IAm=${0##*/}
if [ $# -lt 2 ]
then	printf 'Usage: %s file1 ... fileN\n' "$IAm" >&2
	exit 1
fi
awk '
BEGIN {	FS = OFS = ","
	# Gather field names from the 1st line of each input file...
	for(i = 1; i < ARGC; i++) {
		if((getline < ARGV[i]) != 1) {
			printf("*** Cannott read header from file \"%s\".\n",
			    ARGV[i])
			continue;
		}
		close(ARGV[i])
		for(j = 1; j <= NF; j++)
			if($j in H)
				continue;
			else {	H[$j] = ++Hc
				OH[Hc] = $j
				# Add to output file header.
				printf("%s%s", HOFS, $j)
				HOFS = OFS
			}
	}
	print ""
}
FNR == 1 {
	# Clear output order list and headers from from previous file.
	for(i in O)
		delete O[i]
	for(i in FH)
		delete FH[i]
	# Determine order in which input fields should be output for this file...
	for(i = 1; i <= NF; i++)
		FH[$i] = i
	for(i = 1; i <= Hc; i++)
		O[i] = (OH[i] in FH) ? FH[OH[i]] : NF + 1
	next
}
{	# Print the data lines for the current input file.
	for(i = 1; i <= Hc; i++)
		printf("%s%s", $O[i], (i == Hc) ? ORS : OFS)
}' "$@"

This was written and tested using a Korn shell, but should work with any shell that process POSIX standard shell variable expansions.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk. (I don't remember whether or not nawk on Solaris systems includes support for the awk ARGV[] array that is required for this script to work correctly.)
# 7  
Old 11-07-2016
Getting Syntax error.

Hi Don,
Thanks for your code.
When I try to run like this
Code:
awk -f script.awk input1.csv input2.csv

I am getting the below error.
Code:
awk ' BEGIN { FS =OFS=","
    ^ invalid char ''' in expression

I tested this is
Code:
default bash

as well as
Code:
ksh

in Cygwin.

Any clues on debugging this ?

Last edited by ks_reddy; 11-07-2016 at 05:54 AM.. Reason: Added tags for the code part.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How do I extract specific column in multiple csv files?

file1: Name,Threshold,Curr Samples,Curr Error%,Curr ART GETHome,100,21601,0.00%,47 GETregistry,100,21592,0.00%,13 GEThomeLayout,100,30466,0.00%,17 file2: Name,Threshold,Curr Samples,Curr Error%,Curr ART GETHome,100,21601,0.00%,33 GETregistry,100,21592,0.00%,22... (6 Replies)
Discussion started by: Raghuram717
6 Replies

2. UNIX for Beginners Questions & Answers

Select and copy .csv files based on row and column number

Dear UNIX experts, I'm a command line novice working on a Macintosh computer (Bash shell) and have neither found advice that is pertinent to my problem on the internet nor in this forum. I have hundreds of .csv files in a directory. Now I would like to copy the subset of files that contains... (8 Replies)
Discussion started by: rcsapo
8 Replies

3. UNIX for Beginners Questions & Answers

Keep only the closet match of timestamped row (include headers) from file1 to precede file2 row/s

This is a question that is related to one I had last August when I was trying to sort/merge two files by millsecond time column (in this case column 6). The script (below) that helped me last august by RudiC solved the puzzle of sorting/merging two files by time, except it gets lost when the... (0 Replies)
Discussion started by: aachave1
0 Replies

4. Shell Programming and Scripting

How to row bind two text files?

Hi, I have two text files with approximately 6000 rows each. I wish the bind these two files into a single column in a new text file like this: File 1: EQTN AFAF SPACA8 equatorin ... File 2: DA3 DA5 FAM38B2 HsT748 ... (2 Replies)
Discussion started by: bronzyroo
2 Replies

5. Shell Programming and Scripting

Insterting column in csv from multiple files

Hello, I have a spec file that contains a lot of strings that looks like this: PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x),... (9 Replies)
Discussion started by: g9100
9 Replies

6. Shell Programming and Scripting

Sar -u generates multiple column headers in csv file

Hi All, The below sar -u command generates multiple column headers in csv file Expected output should print column headers only once in the csv file shell script: $cat sar_cpu_EBS.sh #!/bin/bash while ; do sar -u 15 1 | awk '/^/ {print $1,$2,$4,$6,$7}' | tr -s ' ' ',' >>... (6 Replies)
Discussion started by: a1_win
6 Replies

7. Shell Programming and Scripting

Combining multiple column files into one with file name as first row

Hello All, I have several column files like this $cat a_b_s1.xls 1wert 2tg 3asd 4asdf 5asdf $cat c_d_s2.xls 1wert 2tg 3asd 4asdf 5asdf desired put put $cat combined.txt s1 s2 (2 Replies)
Discussion started by: avatar_007
2 Replies

8. Shell Programming and Scripting

Help converting column to row for multiple files

Hi all, I am pretty new at this so be gentle. Also, if there is any chance you could explain what the code you use is actually doing, that would really help me out, Im learning after all :) So I am trying to convert a selected column of numbers from input file1 into a row in output file2 ... (3 Replies)
Discussion started by: StudentServitor
3 Replies

9. Shell Programming and Scripting

Combine Multiple text or csv files column-wise

Hi All I am trying to combine columns from multiple text files into a single file using paste command but the record length being unequal in the different files the data is running over to the closest empty cell on the left. Please see below. What can i do to resolve this ? File 1 File... (15 Replies)
Discussion started by: venky_ibm
15 Replies

10. UNIX for Dummies Questions & Answers

Option in sql script to include column headers when spooling file to .csv format

Can anyone help me how to include COLUMN HEADER when spooling file to .CSV format through SQL statement. Thanks, Akbar (4 Replies)
Discussion started by: s1a2m3
4 Replies
Login or Register to Ask a Question