merge multiple tables with perl


 
# 1  
09-14-2011
merge multiple tables with perl

Hi everyone,

I once again got stuck merging tables and was wondering if someone could help me out with this problem.

I have a number of tab-delimited tables that I need to merge into one big one. All tables have the same header but a different number of rows (this could be changed if that would make it easier). I would like to merge them on the first 3 columns ("chromo", "pos", "ref"); all the following columns ("alleles", "refAllele", "refCount", "refFreq", "altAllele", "altCount", "altFreq") should be placed side by side, one block per sample. Preferably, the final file would also indicate which sample each column belongs to.

Here are four short examples of what the tables look like:

SampleA

chromo pos ref alleles refAllele refCount refFreq altAllele altCount altFreq
chr1 30146 A A A 31 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA
chr1 62138 T T T 114 100 NA 0 NA
chr1 62233 A A A 110 100 NA 0 NA
chr1 64310 A A A 64 100 NA 0 NA
chr1 64321 A A A 17 100 NA 0 NA
chr1 64377 A A A 56 98 NA 1 NA

SampleB
chromo pos ref alleles refAllele refCount refFreq altAllele altCount altFreq
chr1 30146 A A A 10 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55987 C C C 8 100 NA 0 NA
chr1 62138 T C 0 0 C 10 100
chr1 62233 A A A 34 100 NA 0 NA
chr1 64310 A A A 37 100 NA 0 NA
chr1 64321 A A A 9 100 NA 0 NA
chr1 64377 A A A 27 100 NA 0 NA
chr1 65570 A C 0 0 C 2 100

SampleC
chromo pos ref alleles refAllele refCount refFreq altAllele altCount altFreq
chr1 30146 A A A 54 100 NA 0 NA
chr1 55217 G A/G 0 0 A 5 55
chr1 55223 C T/C 0 0 T 4 57
chr1 55987 C C C 17 100 NA 0 NA
chr1 56065 T T T 18 90 NA 2 NA
chr1 62138 T T/C T 19 70 C 8 29
chr1 62233 A G/A 0 0 G 16 66
chr1 64310 A A A 28 100 NA 0 NA
chr1 64321 A C 0 0 C 4 100

SampleD
chromo pos ref alleles refAllele refCount refFreq altAllele altCount altFreq
chr1 30146 A A A 23 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA
chr1 62138 T C 0 0 C 38 100
chr1 62233 A A A 108 100 NA 0 NA
chr1 64377 A A A 2 100 NA 0 NA
chr1 65570 A A A 3 100 NA 0 NA
chr1 66577 T T T 45 100 NA 0 NA
This is how the header of the merged table should look:
 
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
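The header above follows a simple pattern: the three key columns once, then every remaining column name suffixed with "." and the sample name, repeated for each sample. A minimal Perl sketch of that pattern; the sub name merged_header and the sample names passed to it are illustrative assumptions, not part of any script in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build the merged header from one input header
# (a list of column names) and a list of sample names.
# The first three columns (the join key) appear once; every other
# column is repeated per sample as "<column>.<sample>".
sub merged_header {
    my ($cols, $samples) = @_;
    my @out  = @{$cols}[0 .. 2];
    my @data = @{$cols}[3 .. $#$cols];
    for my $s (@$samples) {
        push @out, map { "$_.$s" } @data;
    }
    return @out;
}

my @cols = qw(chromo pos ref alleles refAllele refCount refFreq
              altAllele altCount altFreq);

# Tab-separated, like the input tables
print join("\t", merged_header(\@cols, [qw(SampleA SampleB SampleC SampleD)])), "\n";
```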

I tried it in R, but couldn't get it to work.

R
Code:
DF1<-read.table(file="/home/maja/Desktop/R/2836_SNPtable_CLC_stringent.txt", sep = "\t", header=TRUE, fill=TRUE) 
DF2<-read.table(file="/home/maja/Desktop/R/2838_SNPtable_CLC_stringent.txt", sep = "\t",header=TRUE, fill=TRUE) 
DF3<-read.table(file="/home/maja/Desktop/R/2840_SNPtable_CLC_stringent.txt", sep = "\t",header=TRUE, fill=TRUE)
DF4<-read.table(file="/home/maja/Desktop/R/5039_SNPtable_CLC_stringent.txt", sep = "\t",header=TRUE, fill=TRUE)

my.list <- list(DF1, DF2, DF3, DF4)
 
DF <- DF1
for ( .df in my.list ) {
  DF <-merge(DF,.df,by=c("chromo","pos","ref"), all=T)
 }

Error in match.names(clabs, names(xi)) :
names do not match previous names


Any help would be greatly appreciated!!
# 2  
09-14-2011
Hi,

Try the following Perl solution:
Code:
$ ls -1 *SNPtable*
2836_SNPtable_CLC_stringent.txt
2838_SNPtable_CLC_stringent.txt
2840_SNPtable_CLC_stringent.txt
5039_SNPtable_CLC_stringent.txt
$ cat script.pl
use warnings;
use strict;

@ARGV >= 1 or die qq[Usage: perl $0 file1 [file2] [file3] ...\n];

my $suffix_word = qq[Sample];
my $suffix_letter = qq[A];

my $argc = @ARGV;
my @header_in = split /\s+/, scalar <>;

my $header_out = "@header_in[0..2]" . qq[ ];
for ( 1 .. $argc ) {
        $header_out .= join( qq[ ], map { $_ . qq[.] . $suffix_word . $suffix_letter } @header_in[3..$#header_in] ) . qq[ ];
        ++$suffix_letter;
}

printf "%s\n", $header_out;

while ( <> ) {
        next if $. == 1;
        print;
} continue {
        close ARGV if eof;
}
$ perl script.pl
Usage: perl script.pl file1 [file2] [file3] ...
$ perl script.pl 2836_SNPtable_CLC_stringent.txt 2838_SNPtable_CLC_stringent.txt 2840_SNPtable_CLC_stringent.txt  5039_SNPtable_CLC_stringent.txt 
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD 
chr1 30146 A A A 31 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA
chr1 62138 T T T 114 100 NA 0 NA
chr1 62233 A A A 110 100 NA 0 NA
chr1 64310 A A A 64 100 NA 0 NA
chr1 64321 A A A 17 100 NA 0 NA
chr1 64377 A A A 56 98 NA 1 NA
chr1 30146 A A A 10 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55987 C C C 8 100 NA 0 NA
chr1 62138 T C 0 0 C 10 100
chr1 62233 A A A 34 100 NA 0 NA
chr1 64310 A A A 37 100 NA 0 NA
chr1 64321 A A A 9 100 NA 0 NA
chr1 64377 A A A 27 100 NA 0 NA
chr1 65570 A C 0 0 C 2 100
chr1 30146 A A A 54 100 NA 0 NA
chr1 55217 G A/G 0 0 A 5 55
chr1 55223 C T/C 0 0 T 4 57
chr1 55987 C C C 17 100 NA 0 NA
chr1 56065 T T T 18 90 NA 2 NA
chr1 62138 T T/C T 19 70 C 8 29
chr1 62233 A G/A 0 0 G 16 66
chr1 64310 A A A 28 100 NA 0 NA
chr1 64321 A C 0 0 C 4 100
chr1 30146 A A A 23 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA
chr1 62138 T C 0 0 C 38 100
chr1 62233 A A A 108 100 NA 0 NA
chr1 64377 A A A 2 100 NA 0 NA
chr1 65570 A A A 3 100 NA 0 NA
chr1 66577 T T T 45 100 NA 0 NA

Regards,
Birei
# 3  
10-12-2011
Merging sideways

Hi birei,
currently your script merges the files vertically:

chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
file 1
file 1
.
.
.
file 2
file 2
and so on...

I was wondering: is there an easy way to list the fields from file 2 next to those of file 1?
If there is an easy fix for this, that would be super wonderful. Thanks.
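Putting each file's fields next to the previous file's is a keyed (full outer) join rather than a concatenation: store every row in a hash keyed on the first three columns, remembering which sample it came from, then print one line per key and fill in NA wherever a sample has no row for that position, so the sample blocks stay aligned. A minimal Perl sketch under these assumptions: tab-delimited input, one header line per file, and each file name used as its sample label. read_table, merge_tables, the NA filler and the numeric ordering on pos are illustrative choices, not taken from the scripts in this thread.

```perl
use strict;
use warnings;

# Read one tab-delimited table from an open filehandle.
# Returns (arrayref of header columns, hashref key => arrayref of data columns),
# where the key is the first three columns (chromo, pos, ref) joined by tabs.
sub read_table {
    my ($fh) = @_;
    chomp(my $hdr = <$fh>);
    my @cols = split /\t/, $hdr;
    my %rows;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;          # skip blank lines
        # /\t/ keeps empty inner fields (e.g. a blank refAllele);
        # the -1 limit also keeps empty trailing fields
        my @f = split /\t/, $line, -1;
        $rows{ join "\t", @f[0 .. 2] } = [ @f[3 .. $#f] ];
    }
    return (\@cols, \%rows);
}

# Full outer join on the key: one output row per key found in any sample,
# with 'NA' in every column of a sample that has no row for that key.
sub merge_tables {
    my ($cols, $samples, $tables) = @_;    # $tables: sample => {key => [...]}
    my @data_cols = @{$cols}[3 .. $#$cols];
    my @out = join "\t", @{$cols}[0 .. 2],
        map { my $s = $_; map { "$_.$s" } @data_cols } @$samples;

    # Union of keys over all samples
    my %seen;
    $seen{$_} = 1 for map { keys %{ $tables->{$_} } } @$samples;

    # chromo as a string, pos numerically, then ref
    my @sorted = sort {
        my @x = split /\t/, $a;
        my @y = split /\t/, $b;
        $x[0] cmp $y[0] || $x[1] <=> $y[1] || $x[2] cmp $y[2];
    } keys %seen;

    for my $key (@sorted) {
        my @row = ($key);
        for my $s (@$samples) {
            my $d = $tables->{$s}{$key};
            push @row, $d ? @$d : ('NA') x @data_cols;
        }
        push @out, join "\t", @row;
    }
    return @out;
}

# Usage (hypothetical script name): perl merge_samples.pl SampleA SampleB SampleC SampleD
# Each file name doubles as the sample label in the header.
if (@ARGV) {
    my ($cols, %tables);
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        ($cols, $tables{$file}) = read_table($fh);
        close $fh;
    }
    print "$_\n" for merge_tables($cols, \@ARGV, \%tables);
}
```

With the four sample tables, a position missing from a sample shows up as seven NA columns in that sample's block instead of shifting the later samples to the left.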
# 4  
10-12-2011
Maybe something like this?

Code:
$ perl -F"\t" -lane 'BEGIN {@hdr = qw(chromo pos ref
                                    alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA
                                    altAllele.SampleA altCount.SampleA altFreq.SampleA
                                    alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB
                                    altAllele.SampleB altCount.SampleB altFreq.SampleB
                                    alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC
                                    altAllele.SampleC altCount.SampleC altFreq.SampleC
                                    alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD
                                    altAllele.SampleD altCount.SampleD altFreq.SampleD)
                           }
                     $x{join"\t",@F[0..2]}.="\t".join"\t",@F[3..9];
                     END
                     {
                       print join " ", @hdr;
                       foreach $k (sort keys %x) {print "$k\t$x{$k}" if $k !~ /chromo/}
                     }' samplea sampleb samplec sampled
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
chr1    30146   A               A       A       31      100     NA      0       NA      A       A       10      100     NA      0       NA      A    NA0
chr1    55217   G               G       G       2       100     NA      0       NA      G       G       2       100     NA      0       NA      A/G  NA0
chr1    55223   C               C       C       2       100     NA      0       NA      T/C     0       0       T       4       57              C    NA0
chr1    55987   C               C       C       19      100     NA      0       NA      C       C       8       100     NA      0       NA      C    NA0
chr1    56065   T               T       T       18      90      NA      2       NA
chr1    62138   T               T       T       114     100     NA      0       NA      C       0       0       C       10      100             T/C  100
chr1    62233   A               A       A       110     100     NA      0       NA      A       A       34      100     NA      0       NA      G/A  NA0
chr1    64310   A               A       A       64      100     NA      0       NA      A       A       37      100     NA      0       NA      A    NA0
chr1    64321   A               A       A       17      100     NA      0       NA      A       A       9       100     NA      0       NA      C    100
chr1    64377   A               A       A       56      98      NA      1       NA      A       A       27      100     NA      0       NA      A    NA0
chr1    65570   A               C       0       0       C       2       100             A       A       3       100     NA      0       NA
chr1    66577   T               T       T       45      100     NA      0       NA
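Two details in the output above are worth checking. chr1 56065 exists only in SampleC, yet its values appear under the SampleA columns, because appending with .= shifts a sample's block left whenever an earlier sample lacks that position. Separately, sort keys %x compares the keys as strings; that happens to work here because every position has five digits, but with real data it would put chr1 100000 before chr1 99999. A minimal sketch of a position-aware comparator, assuming keys of the form "chromo\tpos\tref" as built by the one-liner; by_position and pos_list are hypothetical names, and chromosome names are still compared as strings (so chr10 would sort before chr2).

```perl
use strict;
use warnings;

# Hypothetical comparator for keys of the form "chromo\tpos\tref":
# chromosome names compare as strings, positions numerically, then ref.
sub by_position {
    my ($ca, $pa, $ra) = split /\t/, $a;
    my ($cb, $pb, $rb) = split /\t/, $b;
    return $ca cmp $cb || $pa <=> $pb || $ra cmp $rb;
}

# Helper for display: the pos field of each key, comma-separated
sub pos_list { return join ", ", map { my @f = split /\t/; $f[1] } @_ }

my @keys = ("chr1\t99999\tA", "chr1\t100000\tG", "chr1\t30146\tA");

my @by_string = sort @keys;              # what "sort keys %x" does
my @by_pos    = sort by_position @keys;  # numeric on pos

print "string sort:   ", pos_list(@by_string), "\n";
print "position sort: ", pos_list(@by_pos), "\n";
```

In the one-liner, the loop in the END block would then read foreach $k (sort by_position keys %x), with the sub defined in the BEGIN block or in a script file.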

tyler_durden
# 5  
10-13-2011
Hi tyler,
I used your instructions as follows:
C:\Perl>perl -F"\t" -lane 'BEGIN {@hdr = qw(chromo pos refalleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD)} $x{join"\t",@F[0..2]}.="\t".join"\t",@F[3..9]; { print join " ", @hdr; foreach $k (sort keys %x) {print "$k\t$x{$k}" if $k !~ /chromo/}}' samplea sampleb samplec sampled

But I get the following error message:

Can't find string terminator "'" anywhere before EOF at -e line 1.

C:\Perl>
# 6  
10-14-2011
Quote:
Originally Posted by birkhe
...
Your code is unreadable; please reformat it and use code tags.
# 7  
10-14-2011
Quote:
Originally Posted by birkhe
...
But I get the following error message:
Can't find string terminator "'" anywhere before EOF at -e line 1.
C:\Perl>
I ran the script on Unix, whereas you appear to be using Windows (judging by the "C:\Perl>" prompt).

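Quoting rules for scripts passed to the perl interpreter on the command line differ between Unix and Windows: cmd.exe does not treat single quotes as quoting characters, so the script has to be wrapped in double quotes, with any double quotes inside it escaped as \". You may want to try the following in Windows: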

Code:
perl -F'\t' -lane "BEGIN {@hdr = qw(chromo pos ref
                                    alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA
                                    altAllele.SampleA altCount.SampleA altFreq.SampleA
                                    alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB
                                    altAllele.SampleB altCount.SampleB altFreq.SampleB
                                    alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC
                                    altAllele.SampleC altCount.SampleC altFreq.SampleC
                                    alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD
                                    altAllele.SampleD altCount.SampleD altFreq.SampleD)
                         }
                   $x{join \"\t\",@F[0..2]} .= \"\t\".join \"\t\",@F[3..9];
                   END
                   {
                     print join \" \", @hdr;
                     foreach $k (sort keys %x) {print \"$k\t$x{$k}\" if $k !~ /chromo/}
                   }" samplea sampleb samplec sampled

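The script has not been tested, though, as I do not have Perl on Windows. Note also that cmd.exe ends a command at the first line break, so the command has to be entered on a single line.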

tyler_durden

---------- Post updated at 10:44 PM ---------- Previous update was at 09:01 AM ----------

On Windows, you could create and run a Perl program like the following:

Code:
C:\>type join_data.pl
#!perl -w
use strict;

my %merged;
my @hdr = qw(chromo pos ref
             alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA
             alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB
             alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC
             alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
            );

# loop through the sample files, appending each row's values to the entry
# in %merged keyed on its first three columns (created the first time a key is seen)
while (defined (my $file = glob ("sample*"))) {
  open (my $fh, "<", $file) or die "Can't open $file for reading: $!";
  while (<$fh>) {
    chomp (my @chro = split /\t/);
    $merged{join("\t", @chro[0..2])} .= "\t".join("\t", @chro[3..$#chro]);
  }
  close ($fh) or die "Can't close $file: $!";
}

# we are done processing all files
# now print the header and then the hash
print join(" ", @hdr), "\n";
foreach my $k (sort keys %merged) {
  print $k,"\t",$merged{$k},"\n" if $k !~ /chromo/
}

C:\>perl join_data.pl
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
chr1    30146   A               A       A       31      100     NA      0       NA      A       A   10  100     NA      0       NA      A       A       54      100     NA      0       NA      A       A       23      100     NA      0       NA
chr1    55217   G               G       G       2       100     NA      0       NA      G       G   2   100     NA      0       NA      A/G     0       0       A       5       55      G       G       2       100     NA      0       NA
chr1    55223   C               C       C       2       100     NA      0       NA      T/C     0   0   T       4       57      C       C       2       100     NA      0       NA
chr1    55987   C               C       C       19      100     NA      0       NA      C       C   8   100     NA      0       NA      C       C       17      100     NA      0       NA      C       C       19      100     NA      0       NA
chr1    56065   T               T       T       18      90      NA      2       NA
chr1    62138   T               T       T       114     100     NA      0       NA      C       0   0   C       10      100     T/C     T       19      70      C       8       29      C       0       0       C       38      100
chr1    62233   A               A       A       110     100     NA      0       NA      A       A   34  100     NA      0       NA      G/A     0       0       G       16      66      A       A       108     100     NA      0       NA
chr1    64310   A               A       A       64      100     NA      0       NA      A       A   37  100     NA      0       NA      A       A       28      100     NA      0       NA
chr1    64321   A               A       A       17      100     NA      0       NA      A       A   9   100     NA      0       NA      C       0       0       C       4       100
chr1    64377   A               A       A       56      98      NA      1       NA      A       A   27  100     NA      0       NA      A       A       2       100     NA      0       NA
chr1    65570   A               C       0       0       C       2       100     A       A       3   100 NA      0       NA
chr1    66577   T               T       T       45      100     NA      0       NA


tyler_durden
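Splitting on a tab (-F"\t" in the one-liner, split /\t/ in join_data.pl) rather than on whitespace matters for this data: rows such as SampleB's chr1 62138 T C 0 0 C 10 100 have an empty refAllele field, which in the file is just two consecutive tabs. Splitting on /\t/ keeps that empty field, so later values stay in their columns, whereas split ' ' (or /\s+/) collapses it and shifts every later value one column left. split also drops trailing empty fields unless it is given a negative limit. A small demonstration; the row is rebuilt from the posted sample under the assumption that the blank field is refAllele.

```perl
use strict;
use warnings;

# One SampleB row as it would appear in the tab-delimited file
# (assumption: the blank field between "C" and "0" is refAllele).
my $row = join "\t", qw(chr1 62138 T C), '', qw(0 0 C 10 100);

my @by_tab   = split /\t/, $row;   # keeps the empty inner field
my @by_space = split ' ',  $row;   # collapses runs of whitespace

printf "tab split:   %d fields, refAllele='%s'\n", scalar @by_tab,   $by_tab[4];
printf "space split: %d fields, refAllele='%s'\n", scalar @by_space, $by_space[4];

# Trailing empty fields are dropped unless a negative LIMIT is given.
my $trailing = "chr1\t1\tA\tx\t\t";
my @default  = split /\t/, $trailing;
my @keep     = split /\t/, $trailing, -1;
printf "default: %d fields, limit -1: %d fields\n", scalar @default, scalar @keep;
```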