How convert space separated list to matched columns?


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers How convert space separated list to matched columns?
# 1  
Old 07-09-2013
How convert space separated list to matched columns?

Hi
I have been racking my (limited) brains to get this to work without success

I have a file output which is a list of lists - ie a single column of data that is separated by space into sub lists below - I need to both split this so that each list is in a separate column (eg tab or semicolon separated) but then also I need to match up the rows in each column. The data looks like this but with about 2000 sub-lists:

AF10_identified_SNPs.vcf:
Code:
CADAFUAG00000594,53,240575,G,A
CADAFUAG00000594,53,242762,A,G
CADAFUAG00000594,53,243450,T,G
CADAFUAG00003722,94,896971,T,C
CADAFUAG00003722,94,898802,A,C
CADAFUAG00003979,102,144910,A,C
CADAFUAG00003979,102,145332,C,T
CADAFUAG00003979,102,145623,T,C
CADAFUAG00004144,93,1548479,A,G
CADAFUAG00004144,93,1548862,T,C
CADAFUAG00004144,93,1548899,T,A
CADAFUAG00004144,93,1549237,A,G
CADAFUAG00004144,93,1549256,G,A
CADAFUAG00004225,101,81335,T,C

AF1163_identified_SNPs.vcf:
Code:
CADAFUAG00000594,53,240575,G,A
CADAFUAG00000594,53,242762,A,G
CADAFUAG00000594,53,243450,T,G
CADAFUAG00003722,94,898802,A,C
CADAFUAG00003979,102,145628,C,T
CADAFUAG00004144,93,1548335,G,A
CADAFUAG00004144,93,1548479,A,G
CADAFUAG00004144,93,1548862,T,C
CADAFUAG00004144,93,1548899,T,A
CADAFUAG00004144,93,1549237,A,G
CADAFUAG00004144,93,1549256,G,A
CADAFUAG00004225,101,81335,T,C

AF210_identified_SNPs.vcf:
Code:
CADAFUAG00000594,53,239935,A,T
CADAFUAG00000594,53,240575,G,A
CADAFUAG00000594,53,241319,C,T
CADAFUAG00000594,53,242762,A,G
CADAFUAG00000594,53,243450,T,G
CADAFUAG00003722,94,897525,G,A
CADAFUAG00003979,102,145623,T,C
CADAFUAG00004144,93,1548479,A,G
CADAFUAG00004144,93,1548658,T,G
CADAFUAG00004144,93,1548862,T,C
CADAFUAG00004144,93,1548899,T,A
CADAFUAG00004144,93,1549256,G,A
CADAFUAG00004144,93,1549271,C,A
CADAFUAG00005594,100,552059,T,C

any help would be much appreciated
# 2  
Old 07-09-2013
Could you give us an example of what the output should look like?
# 3  
Old 07-09-2013
sorry

if the input is this:
Code:
AF10_identified_SNPs.vcf:
CADAFUAG00000594,53,240575,G,A
CADAFUAG00000594,53,242762,A,G
CADAFUAG00000594,53,243450,T,G
CADAFUAG00003722,94,896971,T,C
CADAFUAG00003722,94,898802,A,C
CADAFUAG00003979,102,144910,A,C
CADAFUAG00003979,102,145332,C,T
CADAFUAG00003979,102,145623,T,C
CADAFUAG00004144,93,1548479,A,G
CADAFUAG00004144,93,1548862,T,C
CADAFUAG00004144,93,1548899,T,A
CADAFUAG00004144,93,1549237,A,G
CADAFUAG00004144,93,1549256,G,A
CADAFUAG00004225,101,81335,T,C
CADAFUAG00005594,100,551772,G,A
CADAFUAG00005594,100,552108,A,G
CADAFUAG00005594,100,552269,A,C
CADAFUAG00005594,100,552368,C,T

AF1163_identified_SNPs.vcf:
CADAFUAG00000594,53,240575,G,A
CADAFUAG00000594,53,242762,A,G
CADAFUAG00000594,53,243450,T,G
CADAFUAG00003722,94,898802,A,C
CADAFUAG00003979,102,145628,C,T
CADAFUAG00004144,93,1548335,G,A
CADAFUAG00004144,93,1548479,A,G
CADAFUAG00004144,93,1548862,T,C
CADAFUAG00004144,93,1548899,T,A
CADAFUAG00004144,93,1549237,A,G
CADAFUAG00004144,93,1549256,G,A
CADAFUAG00004225,101,81335,T,C
CADAFUAG00005594,100,551772,G,T

then I want the output to be this:

Code:
AF10_identified_SNPs.vcf:		AF1163_identified_SNPs.vcf:		AF210_identified_SNPs.vcf:
"CADAFUAG00000594,53,239935,A,T"
"CADAFUAG00000594,53,240575,G,A"	"CADAFUAG00000594,53,240575,G,A"	"CADAFUAG00000594,53,240575,G,A"
"CADAFUAG00000594,53,241319,C,T"
"CADAFUAG00000594,53,242762,A,G"	"CADAFUAG00000594,53,242762,A,G"	"CADAFUAG00000594,53,242762,A,G"
"CADAFUAG00000594,53,243450,T,G"	"CADAFUAG00000594,53,243450,T,G"	"CADAFUAG00000594,53,243450,T,G"
"CADAFUAG00003722,94,897525,G,A"
"CADAFUAG00003722,94,896971,T,C"		
"CADAFUAG00003722,94,898802,A,C"	"CADAFUAG00003722,94,898802,A,C"	
"CADAFUAG00003979,102,144910,A,C"		
"CADAFUAG00003979,102,145332,C,T"		
"CADAFUAG00003979,102,145623,T,C"	"CADAFUAG00003979,102,145623,T,C"
"CADAFUAG00003979,102,145628,C,T"	
"CADAFUAG00004144,93,1548335,G,A"	
"CADAFUAG00004144,93,1548479,A,G"	"CADAFUAG00004144,93,1548479,A,G"	"CADAFUAG00004144,93,1548479,A,G"
"CADAFUAG00004144,93,1548658,T,G"
"CADAFUAG00004144,93,1548862,T,C"	"CADAFUAG00004144,93,1548862,T,C"	"CADAFUAG00004144,93,1548862,T,C"
"CADAFUAG00004144,93,1548899,T,A"	"CADAFUAG00004144,93,1548899,T,A"	"CADAFUAG00004144,93,1548899,T,A"
"CADAFUAG00004144,93,1549237,A,G"	"CADAFUAG00004144,93,1549237,A,G"	
"CADAFUAG00004144,93,1549256,G,A"	"CADAFUAG00004144,93,1549256,G,A"	"CADAFUAG00004144,93,1549256,G,A"
"CADAFUAG00004144,93,1549271,C,A"
"CADAFUAG00004225,101,81335,T,C"	"CADAFUAG00004225,101,81335,T,C"	
"CADAFUAG00005594,100,551772,G,A"	"CADAFUAG00005594,100,551772,G,A"	
"CADAFUAG00005594,100,552108,A,G"		
"CADAFUAG00005594,100,552269,A,C"		
"CADAFUAG00005594,100,552368,C,T"		
"CADAFUAG00005594,100,552059,T,C"
"CADAFUAG00005594,100,552368,C,T"
"CADAFUAG00005594,100,553789,A,G"
"CADAFUAG00005863,98,1433432,C,A"
"CADAFUAG00006029,92,898380,C,T"
"CADAFUAG00006248,99,1119691,T,C"
"CADAFUAG00006248,99,1119692,T,C"
"CADAFUAG00006248,99,1119994,A,G"

many thanks
# 4  
Old 07-09-2013
This is not the prettiest solution, but it should do the trick:

Code:
awk '
    {
        a[$1]=a[$1]?a[$1]","FILENAME:FILENAME
    }
    END{
        for(i in a)
            printf "%-35s %-35s %-35s\n",a[i]~ARGV[1]?i:"",a[i]~ARGV[2]?i:"",a[i]~ARGV[3]?i:""
    }
    ' file1 file2 file3

# 5  
Old 07-09-2013
many thanks Subbeh

this doesnt work for me - I get slightly garbled output in 3 columns and it only shows the rows which have matches in all 3 sub-lists. Am I right thinking that this works for 3 sub- lists but not more? I have hundreds...

I will play around with it and see what I can manage
# 6  
Old 07-09-2013
When I run it using the three files from your first post I get this as the output:

Code:
CADAFUAG00004225,101,81335,T,C      CADAFUAG00004225,101,81335,T,C
CADAFUAG00003979,102,145332,C,T
CADAFUAG00000594,53,240575,G,A      CADAFUAG00000594,53,240575,G,A      CADAFUAG00000594,53,240575,G,A
CADAFUAG00003722,94,896971,T,C
CADAFUAG00004144,93,1549256,G,A     CADAFUAG00004144,93,1549256,G,A     CADAFUAG00004144,93,1549256,G,A
CADAFUAG00000594,53,242762,A,G      CADAFUAG00000594,53,242762,A,G      CADAFUAG00000594,53,242762,A,G
                                                                        CADAFUAG00004144,93,1549271,C,A
                                    CADAFUAG00003979,102,145628,C,T
CADAFUAG00004144,93,1549237,A,G     CADAFUAG00004144,93,1549237,A,G
CADAFUAG00003979,102,144910,A,C
CADAFUAG00000594,53,243450,T,G      CADAFUAG00000594,53,243450,T,G      CADAFUAG00000594,53,243450,T,G
                                                                        CADAFUAG00000594,53,239935,A,T
CADAFUAG00004144,93,1548899,T,A     CADAFUAG00004144,93,1548899,T,A     CADAFUAG00004144,93,1548899,T,A
                                                                        CADAFUAG00000594,53,241319,C,T
CADAFUAG00004144,93,1548862,T,C     CADAFUAG00004144,93,1548862,T,C     CADAFUAG00004144,93,1548862,T,C
                                    CADAFUAG00004144,93,1548335,G,A
CADAFUAG00003979,102,145623,T,C                                         CADAFUAG00003979,102,145623,T,C
                                                                        CADAFUAG00005594,100,552059,T,C
                                                                        CADAFUAG00003722,94,897525,G,A
CADAFUAG00004144,93,1548479,A,G     CADAFUAG00004144,93,1548479,A,G     CADAFUAG00004144,93,1548479,A,G
                                                                        CADAFUAG00004144,93,1548658,T,G
CADAFUAG00003722,94,898802,A,C      CADAFUAG00003722,94,898802,A,C

Three columns for each file. Are you saying that the 2000 sub-lists each need their own column?
# 7  
Old 07-09-2013
that looks right

I get this (Linux):

Code:
CADAFUAG00004144,93,1548862,T,C     CADAFUAG00004144,93,1548862,T,C     CADAFUAG00004144,93,1548862,T,C    
CADAFUAG00000594,53,240575,G,A      CADAFUAG00000594,53,240575,G,A      CADAFUAG00000594,53,240575,G,A     
AF10_identified_SNPs.vcf:           AF10_identified_SNPs.vcf:           AF10_identified_SNPs.vcf:          
CADAFUAG00006248,99,1119692,T,C     CADAFUAG00006248,99,1119692,T,C     CADAFUAG00006248,99,1119692,T,C    
CADAFUAG00005863,98,1433432,C,A     CADAFUAG00005863,98,1433432,C,A     CADAFUAG00005863,98,1433432,C,A    
CADAFUAG00005594,100,552059,T,C     CADAFUAG00005594,100,552059,T,C     CADAFUAG00005594,100,552059,T,C    
CADAFUAG00004225,101,81335,T,C      CADAFUAG00004225,101,81335,T,C      CADAFUAG00004225,101,81335,T,C     
CADAFUAG00006029,92,898380,C,T      CADAFUAG00006029,92,898380,C,T      CADAFUAG00006029,92,898380,C,T     
CADAFUAG00000594,53,239935,A,T      CADAFUAG00000594,53,239935,A,T      CADAFUAG00000594,53,239935,A,T     
CADAFUAG00004144,93,1548335,G,A     CADAFUAG00004144,93,1548335,G,A     CADAFUAG00004144,93,1548335,G,A    
CADAFUAG00005594,100,552108,A,G     CADAFUAG00005594,100,552108,A,G     CADAFUAG00005594,100,552108,A,G    
CADAFUAG00004144,93,1548899,T,A     CADAFUAG00004144,93,1548899,T,A     CADAFUAG00004144,93,1548899,T,A    
AF210_identified_SNPs.vcf:          AF210_identified_SNPs.vcf:          AF210_identified_SNPs.vcf:         
CADAFUAG00005594,100,553789,A,G     CADAFUAG00005594,100,553789,A,G     CADAFUAG00005594,100,553789,A,G    
CADAFUAG00005594,100,552368,C,T     CADAFUAG00005594,100,552368,C,T     CADAFUAG00005594,100,552368,C,T    
CADAFUAG00004144,93,1548479,A,G     CADAFUAG00004144,93,1548479,A,G     CADAFUAG00004144,93,1548479,A,G    
CADAFUAG00006248,99,1119994,A,G     CADAFUAG00006248,99,1119994,A,G     CADAFUAG00006248,99,1119994,A,G    
CADAFUAG00004144,93,1549237,A,G     CADAFUAG00004144,93,1549237,A,G     CADAFUAG00004144,93,1549237,A,G    
CADAFUAG00003722,94,898802,A,C      CADAFUAG00003722,94,898802,A,C      CADAFUAG00003722,94,898802,A,C     
CADAFUAG00004144,93,1548658,T,G     CADAFUAG00004144,93,1548658,T,G     CADAFUAG00004144,93,1548658,T,G    
AF1163_identified_SNPs.vcf:         AF1163_identified_SNPs.vcf:         AF1163_identified_SNPs.vcf:        
CADAFUAG00004144,93,1549256,G,A     CADAFUAG00004144,93,1549256,G,A     CADAFUAG00004144,93,1549256,G,A    
CADAFUAG00003979,102,144910,A,C     CADAFUAG00003979,102,144910,A,C     CADAFUAG00003979,102,144910,A,C    
CADAFUAG00006248,99,1119691,T,C     CADAFUAG00006248,99,1119691,T,C     CADAFUAG00006248,99,1119691,T,C    
CADAFUAG00003979,102,145628,C,T     CADAFUAG00003979,102,145628,C,T     CADAFUAG00003979,102,145628,C,T    
CADAFUAG00000594,53,241319,C,T      CADAFUAG00000594,53,241319,C,T      CADAFUAG00000594,53,241319,C,T     
CADAFUAG00003722,94,896971,T,C      CADAFUAG00003722,94,896971,T,C      CADAFUAG00003722,94,896971,T,C     
CADAFUAG00000594,53,243450,T,G      CADAFUAG00000594,53,243450,T,G      CADAFUAG00000594,53,243450,T,G     
CADAFUAG00004144,93,1549271,C,A     CADAFUAG00004144,93,1549271,C,A     CADAFUAG00004144,93,1549271,C,A    
CADAFUAG00003722,94,897525,G,A      CADAFUAG00003722,94,897525,G,A      CADAFUAG00003722,94,897525,G,A     
CADAFUAG00005594,100,551772,G,A     CADAFUAG00005594,100,551772,G,A     CADAFUAG00005594,100,551772,G,A    
CADAFUAG00005594,100,552269,A,C     CADAFUAG00005594,100,552269,A,C     CADAFUAG00005594,100,552269,A,C    
CADAFUAG00003979,102,145623,T,C     CADAFUAG00003979,102,145623,T,C     CADAFUAG00003979,102,145623,T,C    
CADAFUAG00003979,102,145332,C,T     CADAFUAG00003979,102,145332,C,T     CADAFUAG00003979,102,145332,C,T    
CADAFUAG00000594,53,242762,A,G      CADAFUAG00000594,53,242762,A,G      CADAFUAG00000594,53,242762,A,G

and yes - unfortunately there need to be as many columns as sub-lists - its for a genotyping project where I need to ID an dcompare the hits for each individual

best

Paul

Last edited by Scrutinizer; 07-09-2013 at 11:40 AM.. Reason: code tags
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Convert fixed value fields to comma separated values

Hi All, Hope you are doing Great!!!. Today i have came up with a problem to say exactly it was for performance improvement. I have written code in perl as a solution for this to cut in specific range, but it is taking time to run for files thousands of lines so i am expecting a sed... (9 Replies)
Discussion started by: mad man
9 Replies

2. Shell Programming and Scripting

Convert column to quote and comma separated row

Hi, I have a list of tables in a file.txt C_CLAIM C_HLD C_PROVIDER I want the output to be 'C_CLAIM','C_HLD','C_PROVIDER' Currently I'm usin awk and getting output which is almost correct but still has minor defects awk -vORS="','" '{ print $1 }' file.txt The output of... (4 Replies)
Discussion started by: wahi80
4 Replies

3. Shell Programming and Scripting

Need Help on For Loop to pass space separated value as one value

Hi, I am having a file say list1 with a output like below jun 12 18:23 may 20 18:23 Now i want to pass the above two values into for loop,I have written a script like this. #!/bin/bash a=`cat list1` for i in $a do echo "HI $i" done expected output: HI jun 12 18:23 (3 Replies)
Discussion started by: sumanthupar
3 Replies

4. Shell Programming and Scripting

Matched a pattern from multiple columns

Hi, I need to extract an info in $1 based on a matched pattern in $2,$3,$4, and $5. The sample input file as follows:- ID Pat1 Pat2 Pro1 use1 add41 M M M add87 M M M M add32 ... (16 Replies)
Discussion started by: redse171
16 Replies

5. Shell Programming and Scripting

Convert comma separated file to fix length

Hi, I am converting a comma separated file to fixed field lenght and I am using that: COLUMNS="25 24 67 26 39 63 20 34 35 14 397" ( cat $indir/input_file.dat | \ $AWK -v columns="$COLUMNS" ' BEGIN { FS=","; OFS=""; split(columns, arr, " "); } { for(i=1; i<=NF;... (5 Replies)
Discussion started by: apenkov
5 Replies

6. UNIX for Dummies Questions & Answers

[solved] Comma separated values to space separated

Hi, I have a large number of files which are written as csv (comma-separated values). Does anyone know of simple sed/awk command do achieve this? Thanks! ---------- Post updated at 10:59 AM ---------- Previous update was at 10:54 AM ---------- Guess I asked this too soon. Found the... (0 Replies)
Discussion started by: lost.identity
0 Replies

7. Shell Programming and Scripting

How to loop through space separated values?

How do I loop thru space separated values in a variable? I hate to use very complicated counter increment logic for this kind of simple problem. Expected result(using ksh) $>echo "aaa bbbb cccc" | <looping code here> var=aaa var=bbbb var=cccc $>echo "aaa bbbb cccc" | while IFS=" "... (12 Replies)
Discussion started by: kchinnam
12 Replies

8. Shell Programming and Scripting

Convert a tab separated file using bash

Dear all, I have a file in this format (like a matrix) - A B C .. X A 1 4 2 .. 2 B 2 6 4 .. 8 C 3 5 5 .. 4 . . . ... . X . . ... . and want to convert it into a file with this format: A A = 1 A B = 4 A C = 2 ... A X = 2 B A = 2 B B = 6 etc (2 Replies)
Discussion started by: TheTransporter
2 Replies

9. Shell Programming and Scripting

how to convert a line to columns, separated by | (pipe)

Hi, Plz help. input line 1;20100403;400|2;20100403;4|3;20290903;400|4;20290903;0|5;20290903;0|9;20100304;0|10;20100304;0|11;20100402;0|18;20100304;0 expected output 1;20100403;400 2;20100403;4 3;20290903;400 4;20290903;0 5;20290903;0 9;20100304;0 10;20100304;0 11;20100402;0... (4 Replies)
Discussion started by: suresh3566
4 Replies

10. Shell Programming and Scripting

Convert from CSV to space padded columns (.ksh)

Hello, Could someone please help me to convert a string(s) of comma separated values into space padded columns in .ksh? ex. 10-21-2008,someword,blah,127.0.0.1,8,3 10-21-2008,randomword,ick,128.0.111.128,1,0 converted to 10-21-2008 someword blah 127.0.0.1 8... (6 Replies)
Discussion started by: WhotheWhat
6 Replies
Login or Register to Ask a Question