Combine columns from many files but keep them aligned in columns-shorter left column issue


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Combine columns from many files but keep them aligned in columns-shorter left column issue
# 1  
Old 11-30-2012
Combine columns from many files but keep them aligned in columns-shorter left column issue

Hello everyone,
I searched the forum looking for answers to this but I could not pinpoint exactly what I need as I keep having trouble.
I have many files each having two columns and hundreds of rows.
first column is a string (can have many words) and the second column is a number.The files are tab separated.
Example of some rows of three files:

file1.txt
Code:
Column1 Column2
S1	31.37
S1	31.49
S1	31.74
S10	31.13
S10	31.10
S10	31.29
S11	29.49
S11	29.54
S11	29.25
S12	31.24
S12	31.05
S12	31.50
S13	32.48
S13	31.07
...

file2.txt
Code:
column 1   Column 2
1 CVS IN	23.14
1 CVS IN	23.24
1 CVS IN	23.24
1 CVS IP	21.53
1 CVS IP	21.40
1 CVS IP	21.36
1 WB IN	22.89
1 WB IN	23.02
1 WB IN	23.23
1 WB IP	26.70
1 WB IP	26.50
1 WB IP	26.67
2 CVS IN	23.34
2 CVS IN	23.34
2 CVS IN	23.14
2 CVS IP	21.38
2 CVS IP	21.49
2 CVS IP	21.54

file3.txt
Code:
Column 1 Column2
1 WB IN	23.45
1 WB IN	23.53
1 WB IP	24.55
1 CVS IN	23.62
1 CVS IN	23.46
I CVS IP	21.74
1 WB IN	23.33
1 WB IN	23.42
1 WB IP	26.24
1 CVS IN	23.71
1 CVS IN	23.44
I CVS IP	22.35
1 WB IN	24.75
1 WB IN	24.88
1 WB IP	25.50

I want to to combine the files into one.
I use the following code for the time being:

Code:
paste  $folder/*.txt  | column -s $'\t' -t > folder/output_biorad.gff

The problem is that the first file in the file is smaller than the other ones and the output, when the rows of the first file end gets misaligned.
The columns of the second file shift to the left, and all the others follow etc.
I have many files and this is getting very confusing for the users, who are biologists and do not want to manipulate datasets. When there is a shorter left column the right columns become missaligned.

This is an example output that shows the problem: the problem starts at the line that i show as <----problem. the column I CVS IP should be the third column but it shifts to second.
.......
Code:
8 WB IP	25.53	8 WB IN	22.98	7 WB IN	25.7	3 CVS IN
8 CVS IN	23.44	8 WB IP	28.75	7 WB IP	26.59	3 CVS IP
8 CVS IP	22.25	8 WB IP	28.83	7 WB IP	26.16	3 CVS IP
8 CVS IP	22.37	8 WB IP	29.05	7 WB IP	26.28	3 CVS IP
8 WB IN	24.47	I CVS IP	 	8 CVS IN	25.7	3 WB IN     
8 WB IP	26.05	I CVS IP	 	8 CVS IN	25.31	3 WB IN
8 WB IP	26	I CVS IP	 	8 CVS IN	26.11	3 WB IN
8 CVS IN	25.11	I CVS IP	 	8 CVS IP	25.95	3 WB IN
8 CVS IP	23.19	I CVS IP	 	8 CVS IP	25.48	3 WB IN
8 CVS IP	23.19	I CVS IP	 	8 WB IN	25.83	3 WB IN
8 WB IN	24.18	I CVS IP	 	8 WB IN	26.1	3 WB IP
8 WB IP	33.73	I CVS IP	 	8 WB IN	25.81	3 WB IP
8 WB IP	33.75	I CVS IP	 	8 WB IP	26.74	3 WB IP
8 CVS IN	24.24	I CVS IP	 	8 WB IP	26.73	3 CVS IN
8 CVS IP	23.22	I CVS IP	 	8 WB IP	26.75	3 CVS IN
8 CVS IP	23.83	I CVS IP	 	I CVS IP	 	3 CVS IN
	I CVS IP	 	I CVS IP	 	3 CVS IN	              <--problem
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IP	 
	I CVS IP	 	I CVS IP	 	3 WB IP	 
	I CVS IP	 	I CVS IP	 	3 WB IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	4V CVS IN	 	I CVS IP	 	3 WB IN	 
	4V CVS IN	 	I CVS IP	 	3 WB IN	 
	4V CVS IN	 	I CVS IP	 	3 WB IN	 
	4V CVS IN	 	I CVS IP	 	3 WB IP

When the lines from the firs file end all utput is shifted to the left. How can I keep them aligned?
i looked everywhere and I have not been able to find out how to solve it.
I am not entirely familiar with awk, i use perl for my other scripting and i am a beginner in shell scripting.
Many thanks for your help in advance

Last edited by vbe; 11-30-2012 at 09:49 AM.. Reason: use also code tags for your data, it keeps the format...
# 2  
Old 11-30-2012
try:
Code:
awk '
{
(FNR>m)? m=FNR:0;
f[FILENAME]++ ? 0:fc++;
a[fc-1,FNR]=$0;
}
END {
  for (j=1;j<=m;j++) {
     for (i=0;i<fc;i++) {
       printf (a[i,j])? (a[i,j] "\t"):("\t\t");
     }
     print "";
  }
}
' file*.txt > out.xls

This User Gave Thanks to rdrtx1 For This Post:
# 3  
Old 12-03-2012
Thank you so much. This is working and aligning the columns correctly!
If you have time, can youvery very briefly explain how it works, why your for statements manage to align the columns, especially what happens at the printf?
for (j=1;j<=m;j++) {
for (i=0;i<fc;i++) {
printf (a[i,j])? (a[i,j] "\t"):("\t\t");


awk seems to be a very powerful language for file manipulation. How did you learn it? Textbook or website?
Again, thank you so much.
# 4  
Old 12-03-2012
Code:
awk '{
(FNR>m)? m=FNR:0;                                # use file record number as max record counter
f[FILENAME]++ ? 0:fc++;                          # set new file counter (++ fail indicates new file)
a[fc-1,FNR]=$0;                                  # store data in two column,row array (file, record)}
END {  
  for (j=1;j<=m;j++) {                           # record counter loop     
    for (i=0;i<fc;i++) {                         # file counter loop       
       printf (a[i,j])? (a[i,j] "\t"):("\t\t");  # print tab separated data or just tabs if empty     
    }
    print "";                                    # print new line character  
  }
}' file*.txt > out.xls

Learned by reading awk book and samples on the web. This site has most excellent solutions all over. Search forums for examples.

Last edited by rdrtx1; 12-03-2012 at 12:52 PM..
# 5  
Old 12-04-2012
this code not works properly...

Code:
awk: cmd. line:15: fatal: cannot open file `file*.txt' for reading (No such file or directory)

if you try 

a.txt b.txt > out.xls

problem is you need to align again manually...
# 6  
Old 01-23-2013
Right now my input to this code file*.txt > out.xls
outputs the files pasted together but the files are pasted randomly. I want to read the files from my folder alphabetical lexicographical order and paste them together.
is there a way to sort my input files alphabetically before I run this code?
I tried sort file*.txt but it doesnt work, nor does ls-1|folder.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Join, merge, fill NULL the void columns of multiples files like sql "LEFT JOIN" by using awk

Hello, This post is already here but want to do this with another way Merge multiples files with multiples duplicates keys by filling "NULL" the void columns for anothers joinning files file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: 1|123|jojo 1|NULL|bibi... (2 Replies)
Discussion started by: yjacknewton
2 Replies

2. Shell Programming and Scripting

Adding columns from 2 files with variable number of columns

I have two files, file1 and file2 who have identical number of rows and columns. However, the script is supposed to be used for for different files and I cannot know the format in advance. Also, the number of columns changes within the file, some rows have more and some less columns (they are... (13 Replies)
Discussion started by: maya3
13 Replies

3. Shell Programming and Scripting

Compare 2 csv files by columns, then extract certain columns of matcing rows

Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns. I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'... (5 Replies)
Discussion started by: bkane3
5 Replies

4. Shell Programming and Scripting

Merge two columns from two files into one if another column matches

I have two text files that look something like this: A:B:C 123 D:E:F 234 G:H:I 345 J:K:L 123 M:N:O 456 P:Q:R 567 A:B:C 456 D:E:F 567 G:H:I 678 J:K:L 456 M:N:O 789 P:Q:R 890 I want to find the line where the first column matches and then combine the second columns into a single... (8 Replies)
Discussion started by: pbluescript
8 Replies

5. UNIX for Dummies Questions & Answers

Combine columns from 100 files with same structure

Hi, I have a bunch of files with the following format. PUR.1.9 30910 0.024 0.926 0.050 36587 0.024 0.927 0.049 91857 0.023 0.928 0.049 105797 0.024 0.927 0.049 146659 0.024 0.927 0.049 152695 0.024 0.927 0.049 192118 0.022 0.930 0.048 193310 0.018 0.936 0.046 PUR.2.9 30910 0.028... (6 Replies)
Discussion started by: genehunter
6 Replies

6. Shell Programming and Scripting

Merging columns based on one or more column in two files

I have two files. FileA.txt 30910 rs7468327 36587 rs10814410 91857 rs9408752 105797 rs1133715 146659 rs2262038 152695 rs2810979 181843 rs3008128 182129 rs3008131 192118 rs3008170 FileB.txt 30910 1.9415219673 0 36431 1.3351312477 0.0107191428 36587 1.3169171182... (2 Replies)
Discussion started by: genehunter
2 Replies

7. Shell Programming and Scripting

Combine columns from multiple files

Can anybody help on the script to combine/concatenate columns from multiple files input1 4 135 5 185 6 85 11 30 16 72 17 30 21 52 22 76 input2 2 50 4 50 6 33 8 62 10 25 12 46 14 42 15 46output (2 Replies)
Discussion started by: sdf
2 Replies

8. UNIX for Dummies Questions & Answers

How to combine 2 files with 6 columns?

This may seem obvious but I am having problems doing this as columns get converted to rows when i try to write a script. I have 2 files text1.txt and text2.txt each of which have 6 columns of numbers separated by a space. I need to combine the 2 files so that the output file text3.txt maintains... (2 Replies)
Discussion started by: tgoldstone
2 Replies

9. Shell Programming and Scripting

How to combine 2 files into 1 file with 2 columns

Hi Guys, I want to combine 2 files and and put together in 1 file and make two columns out it. See below desired output. Any help will be much appreciated. inputfile1.txt 12345 67890 24580 inputfile2.txt AAAAA BBBBB CCCCC (11 Replies)
Discussion started by: pinpe
11 Replies

10. Shell Programming and Scripting

Output columns needs to be aligned

Hi All I'm running a shell script and the output is something like: Col1 Col2 Col3 aaaa aaaaaaa aaaaa bbbbb bbbbb bbbbbb ccc cccccc ccccccc But I require the output to printed as given below: Col1 Col2 Col3 aaaa ... (4 Replies)
Discussion started by: nkamalkishore
4 Replies
Login or Register to Ask a Question