Combine columns from many files but keep them aligned in columns-shorter left column issue | Unix Linux Forums | Shell Programming and Scripting

Tags
awk, columns combine, perl

    #1  
11-30-2012
isildur1234

Hello everyone,
I searched the forum for an answer to this but could not pinpoint exactly what I need, and I keep running into trouble.
I have many files, each with two columns and hundreds of rows.
The first column is a string (which can contain several words) and the second column is a number. The files are tab-separated.
Here are some rows from three of the files:

file1.txt
Code:
Column1 Column2
S1	31.37
S1	31.49
S1	31.74
S10	31.13
S10	31.10
S10	31.29
S11	29.49
S11	29.54
S11	29.25
S12	31.24
S12	31.05
S12	31.50
S13	32.48
S13	31.07
...

file2.txt
Code:
column 1   Column 2
1 CVS IN	23.14
1 CVS IN	23.24
1 CVS IN	23.24
1 CVS IP	21.53
1 CVS IP	21.40
1 CVS IP	21.36
1 WB IN	22.89
1 WB IN	23.02
1 WB IN	23.23
1 WB IP	26.70
1 WB IP	26.50
1 WB IP	26.67
2 CVS IN	23.34
2 CVS IN	23.34
2 CVS IN	23.14
2 CVS IP	21.38
2 CVS IP	21.49
2 CVS IP	21.54

file3.txt
Code:
Column 1 Column2
1 WB IN	23.45
1 WB IN	23.53
1 WB IP	24.55
1 CVS IN	23.62
1 CVS IN	23.46
I CVS IP	21.74
1 WB IN	23.33
1 WB IN	23.42
1 WB IP	26.24
1 CVS IN	23.71
1 CVS IN	23.44
I CVS IP	22.35
1 WB IN	24.75
1 WB IN	24.88
1 WB IP	25.50

I want to combine the files into one.
For the time being I use the following code:


Code:
paste "$folder"/*.txt | column -s $'\t' -t > "$folder"/output_biorad.gff

The problem is that the first file is shorter than the others, and once its rows run out the output becomes misaligned.
The columns of the second file shift to the left, all the others follow, and so on.
I have many files, and this is very confusing for the users, who are biologists and do not want to manipulate datasets. Whenever a column on the left is shorter, the columns to its right become misaligned.

This example output shows the problem. It starts at the line I marked with <--problem: the column containing I CVS IP should be the third column, but it shifts to the second.
.......

Code:
8 WB IP	25.53	8 WB IN	22.98	7 WB IN	25.7	3 CVS IN
8 CVS IN	23.44	8 WB IP	28.75	7 WB IP	26.59	3 CVS IP
8 CVS IP	22.25	8 WB IP	28.83	7 WB IP	26.16	3 CVS IP
8 CVS IP	22.37	8 WB IP	29.05	7 WB IP	26.28	3 CVS IP
8 WB IN	24.47	I CVS IP	 	8 CVS IN	25.7	3 WB IN     
8 WB IP	26.05	I CVS IP	 	8 CVS IN	25.31	3 WB IN
8 WB IP	26	I CVS IP	 	8 CVS IN	26.11	3 WB IN
8 CVS IN	25.11	I CVS IP	 	8 CVS IP	25.95	3 WB IN
8 CVS IP	23.19	I CVS IP	 	8 CVS IP	25.48	3 WB IN
8 CVS IP	23.19	I CVS IP	 	8 WB IN	25.83	3 WB IN
8 WB IN	24.18	I CVS IP	 	8 WB IN	26.1	3 WB IP
8 WB IP	33.73	I CVS IP	 	8 WB IN	25.81	3 WB IP
8 WB IP	33.75	I CVS IP	 	8 WB IP	26.74	3 WB IP
8 CVS IN	24.24	I CVS IP	 	8 WB IP	26.73	3 CVS IN
8 CVS IP	23.22	I CVS IP	 	8 WB IP	26.75	3 CVS IN
8 CVS IP	23.83	I CVS IP	 	I CVS IP	 	3 CVS IN
	I CVS IP	 	I CVS IP	 	3 CVS IN	              <--problem
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IN	 
	I CVS IP	 	I CVS IP	 	3 WB IP	 
	I CVS IP	 	I CVS IP	 	3 WB IP	 
	I CVS IP	 	I CVS IP	 	3 WB IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IN	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	I CVS IP	 	I CVS IP	 	3 CVS IP	 
	4V CVS IN	 	I CVS IP	 	3 WB IN	 
	4V CVS IN	 	I CVS IP	 	3 WB IN	 
	4V CVS IN	 	I CVS IP	 	3 WB IN	 
	4V CVS IN	 	I CVS IP	 	3 WB IP

When the lines from the first file end, all of the output is shifted to the left. How can I keep the columns aligned?
I looked everywhere and have not been able to find a solution.
I am not very familiar with awk; I use Perl for my other scripting and I am a beginner in shell scripting.
Many thanks in advance for your help.
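
A minimal sketch of why the rows shift, using two made-up files (short.txt and long.txt are hypothetical names, not files from this thread). Once a file is exhausted, paste emits a single empty field for it, although each file contributes two columns, so every later row is one tab short. Counting the tab-separated fields per row makes this visible; how column -t then treats empty fields may add to the effect, depending on its version.

```shell
# Hypothetical two-column, tab-separated inputs; short.txt has fewer rows.
tmp=$(mktemp -d)
printf 'A\t1\n'         > "$tmp/short.txt"
printf 'X\t10\nY\t20\n' > "$tmp/long.txt"

# Count the tab-separated fields on every pasted row.
counts=$(paste "$tmp/short.txt" "$tmp/long.txt" | awk -F'\t' '{ print NF }')
echo "$counts"    # prints 4 for the first row, 3 for the second
rm -rf "$tmp"
```

The second row has only three fields because the exhausted file is represented by one empty field instead of two, which is the leftward shift described above.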

Last edited by vbe; 11-30-2012 at 08:49 AM.. Reason: use also code tags for your data, it keeps the format...
    #2  
11-30-2012
rdrtx1
try:

Code:
awk '
{
(FNR>m)? m=FNR:0;
f[FILENAME]++ ? 0:fc++;
a[fc-1,FNR]=$0;
}
END {
  for (j=1;j<=m;j++) {
     for (i=0;i<fc;i++) {
       printf (a[i,j])? (a[i,j] "\t"):("\t\t");
     }
     print "";
  }
}
' file*.txt > out.xls

    #3  
12-03-2012
isildur1234
Thank you so much. This works and aligns the columns correctly!
If you have time, can you very briefly explain how it works and why your for loops manage to align the columns, especially what happens at the printf?
for (j=1;j<=m;j++) {
for (i=0;i<fc;i++) {
printf (a[i,j])? (a[i,j] "\t"):("\t\t");


awk seems to be a very powerful language for file manipulation. How did you learn it? Textbook or website?
Again, thank you so much.
    #4  
12-03-2012
rdrtx1

Code:
awk '{
(FNR>m)? m=FNR:0;                                # track the largest record number seen, i.e. the row count of the longest file
f[FILENAME]++ ? 0:fc++;                          # count a new file the first time its name is seen (f[FILENAME] is still 0 then)
a[fc-1,FNR]=$0;                                  # store the line in a two-dimensional array indexed by (file, record)
}
END {
  for (j=1;j<=m;j++) {                           # loop over record (row) numbers
    for (i=0;i<fc;i++) {                         # loop over files
       printf (a[i,j])? (a[i,j] "\t"):("\t\t");  # print the stored line plus a tab, or two tabs if this file has no such row
    }
    print "";                                    # end the output row with a newline
  }
}' file*.txt > out.xls

I learned it by reading an awk book and samples on the web. This site has excellent solutions all over; search the forums for examples.
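
The script above can also be written a little more defensively. The following is a sketch under stated assumptions, not the poster's code: merge_columns is a hypothetical helper name, and every input is assumed to have exactly two tab-separated columns. It tests array membership with `in` rather than truthiness (an input line consisting only of 0 would otherwise be treated as missing), and it emits rows with print, because a % in the data would be interpreted as a format specifier when the data is used as the printf format.

```shell
# merge_columns is a hypothetical helper name.
# Assumption: every input file has exactly two tab-separated columns.
merge_columns() {
  awk '
    FNR == 1 { fc++ }                  # the first record of each file starts a new column pair
    {
      a[fc, FNR] = $0                  # remember the line by (file index, row number)
      if (FNR > m) m = FNR             # m ends up as the row count of the longest file
    }
    END {
      for (j = 1; j <= m; j++) {
        line = ""
        for (i = 1; i <= fc; i++) {
          cell = ((i, j) in a) ? a[i, j] : "\t"   # a lone tab stands for two empty cells
          line = line (i > 1 ? "\t" : "") cell
        }
        print line
      }
    }
  ' "$@"
}

# Usage: merge_columns file*.txt > out.xls
```

Unlike the script above, this variant does not leave a trailing tab at the end of each row; every row still has the same number of tab-separated fields.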

Last edited by rdrtx1; 12-03-2012 at 11:52 AM..
    #5  
12-03-2012
nex_asp
This code does not work properly...


Code:
awk: cmd. line:15: fatal: cannot open file `file*.txt' for reading (No such file or directory)

If you run it with explicit file names instead:

a.txt b.txt > out.xls

the problem is that you need to align the output again manually...
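
The error above is what awk reports when the shell finds no file matching file*.txt in the current directory: in sh and bash (with default settings) an unmatched pattern is passed on literally. A small guard, sketched here with a hypothetical have_inputs helper, makes that case explicit before awk runs.

```shell
# have_inputs is a hypothetical helper: it succeeds only if the expanded
# pattern produced at least one existing file.
# Assumption: sh or bash with default globbing (no nullglob or failglob).
have_inputs() {
  [ "$#" -gt 0 ] && [ -e "$1" ]
}

if have_inputs file*.txt; then
  echo "input files found, safe to run the awk script"
else
  echo "no files match file*.txt in $(pwd)" >&2
fi
```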
    #6  
01-23-2013
isildur1234
Right now, running this code with file*.txt > out.xls as its input
outputs the files pasted together, but they are pasted in what looks like random order. I want to read the files from my folder in alphabetical (lexicographical) order and paste them together.
Is there a way to sort my input files alphabetically before I run this code?
I tried sort file*.txt but it does not work, and neither does ls-1|folder.
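
Two observations may help here. The shell already expands file*.txt in sorted order (the collation order of the current locale), and the awk script pastes files in the order they appear on its command line, so an order that looks random is likely plain lexicographic order, where file10.txt sorts before file2.txt. Also, sort file*.txt sorts the lines inside the files, not the file names. If number-aware ordering is what is wanted, one option is the -V (version sort) flag of GNU sort, sketched below; natural_order is a hypothetical helper name.

```shell
# natural_order is a hypothetical helper: it prints its arguments one per
# line in version order, so file2.txt comes before file10.txt.
# Assumption: GNU sort (the -V flag is an extension, not POSIX).
natural_order() {
  printf '%s\n' "$@" | sort -V
}

natural_order file10.txt file2.txt file1.txt
# prints:
# file1.txt
# file2.txt
# file10.txt
```

The sorted names could then be handed to the awk script, for example as awk '...' $(natural_order "$folder"/*.txt) > out.xls, assuming the file names contain no whitespace.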