I once again got stuck with merging tables and was wondering if someone could help me out on that problem.
I have a number of tab-delimited tables that I need to merge into one big table. All tables have the same header but a different number of rows (this could be changed if it would make things easier). I would like to merge them on the first three columns ("chromo", "pos", "ref"); all the remaining columns ("alleles", "refAllele", "refCount", "refFreq", "altAllele", "altCount", "altFreq") should be placed side by side, one block per file. Preferably, the final file would also indicate which sample each column belongs to.
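The merge described above is essentially an outer join on the composite key (chromo, pos, ref). A first step, sketched here in Python rather than Perl, is to read one tab-delimited table into a dictionary keyed on those three columns, with the remaining seven fields as the value. This assumes each file has exactly one header line and at most one row per key (a duplicate key would silently overwrite the earlier row); the function name `read_snp_table` is hypothetical.

```python
import csv
import io

KEY_COLS = 3  # chromo, pos, ref

def read_snp_table(handle):
    """Return (value_header, rows); rows maps (chromo, pos, ref) -> remaining fields."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # assumes a single header line
    rows = {}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        key = tuple(fields[:KEY_COLS])
        rows[key] = fields[KEY_COLS:]  # a duplicate key would overwrite here
    return header[KEY_COLS:], rows

# Usage with an in-memory table; for a real file use open(path, newline="")
sample = io.StringIO(
    "chromo\tpos\tref\talleles\trefAllele\trefCount\trefFreq\taltAllele\taltCount\taltFreq\n"
    "chr1\t30146\tA\tA\tA\t31\t100\tNA\t0\tNA\n"
)
value_header, rows = read_snp_table(sample)
print(value_header)
print(rows[("chr1", "30146", "A")])
```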
$ ls -1 *SNPtable*
2836_SNPtable_CLC_stringent.txt
2838_SNPtable_CLC_stringent.txt
2840_SNPtable_CLC_stringent.txt
5039_SNPtable_CLC_stringent.txt
$ cat script.pl
use warnings;
use strict;
@ARGV >= 1 or die qq[Usage: perl $0 file1 [file2] [file3] ...\n];
my $suffix_word = qq[Sample];
my $suffix_letter = qq[A];
my $argc = @ARGV;
my @header_in = split /\s+/, scalar <>;
my $header_out = "@header_in[0..2]" . qq[ ];
for ( 1 .. $argc ) {
    $header_out .= join( qq[ ], map { $_ . qq[.] . $suffix_word . $suffix_letter } @header_in[3..$#header_in] ) . qq[ ];
    ++$suffix_letter;
}
printf "%s\n", $header_out;
while ( <> ) {
    next if $. == 1;
    print;
} continue {
    close ARGV if eof;
}
$ perl script.pl
Usage: perl script.pl file1 [file2] [file3] ...
$ perl script.pl 2836_SNPtable_CLC_stringent.txt 2838_SNPtable_CLC_stringent.txt 2840_SNPtable_CLC_stringent.txt 5039_SNPtable_CLC_stringent.txt
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
chr1 30146 A A A 31 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA
chr1 62138 T T T 114 100 NA 0 NA
chr1 62233 A A A 110 100 NA 0 NA
chr1 64310 A A A 64 100 NA 0 NA
chr1 64321 A A A 17 100 NA 0 NA
chr1 64377 A A A 56 98 NA 1 NA
chr1 30146 A A A 10 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55987 C C C 8 100 NA 0 NA
chr1 62138 T C 0 0 C 10 100
chr1 62233 A A A 34 100 NA 0 NA
chr1 64310 A A A 37 100 NA 0 NA
chr1 64321 A A A 9 100 NA 0 NA
chr1 64377 A A A 27 100 NA 0 NA
chr1 65570 A C 0 0 C 2 100
chr1 30146 A A A 54 100 NA 0 NA
chr1 55217 G A/G 0 0 A 5 55
chr1 55223 C T/C 0 0 T 4 57
chr1 55987 C C C 17 100 NA 0 NA
chr1 56065 T T T 18 90 NA 2 NA
chr1 62138 T T/C T 19 70 C 8 29
chr1 62233 A G/A 0 0 G 16 66
chr1 64310 A A A 28 100 NA 0 NA
chr1 64321 A C 0 0 C 4 100
chr1 30146 A A A 23 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA
chr1 62138 T C 0 0 C 38 100
chr1 62233 A A A 108 100 NA 0 NA
chr1 64377 A A A 2 100 NA 0 NA
chr1 65570 A A A 3 100 NA 0 NA
chr1 66577 T T T 45 100 NA 0 NA
I was wondering: is there an easy way to list the fields from file 2 next to those from file 1?
If there is an easy fix for this, that would be super wonderful. Thanks.
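Instead of the SampleA, SampleB, ... letters, the sample ID could be taken from the file name prefix so that each column states which file it came from. A minimal sketch, assuming every file name starts with the sample ID followed by an underscore (as in 2836_SNPtable_CLC_stringent.txt); `sample_label` and `merged_header` are hypothetical helper names.

```python
import os

VALUE_COLS = ["alleles", "refAllele", "refCount", "refFreq",
              "altAllele", "altCount", "altFreq"]

def sample_label(path):
    """Return the part of the file name before the first underscore, e.g. '2836'."""
    return os.path.basename(path).split("_", 1)[0]

def merged_header(paths, key_cols=("chromo", "pos", "ref")):
    """Key columns first, then one block of value columns per file, suffixed with its label."""
    header = list(key_cols)
    for path in paths:
        label = sample_label(path)
        header.extend(f"{col}.{label}" for col in VALUE_COLS)
    return header

files = ["2836_SNPtable_CLC_stringent.txt", "2838_SNPtable_CLC_stringent.txt"]
print("\t".join(merged_header(files)))
```

The header order follows the order of the file list, so the data blocks must be written in that same order.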
$
$
$ perl -F"\t" -lane 'BEGIN {@hdr = qw(chromo pos ref
alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA
altAllele.SampleA altCount.SampleA altFreq.SampleA
alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB
altAllele.SampleB altCount.SampleB altFreq.SampleB
alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC
altAllele.SampleC altCount.SampleC altFreq.SampleC
alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD
altAllele.SampleD altCount.SampleD altFreq.SampleD)
}
$x{join"\t",@F[0..2]}.="\t".join"\t",@F[3..9];
END
{
print join " ", @hdr;
foreach $k (sort keys %x) {print "$k\t$x{$k}" if $k !~ /chromo/}
}' samplea sampleb samplec sampled
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
chr1 30146 A A A 31 100 NA 0 NA A A 10 100 NA 0 NA A NA0
chr1 55217 G G G 2 100 NA 0 NA G G 2 100 NA 0 NA A/G NA0
chr1 55223 C C C 2 100 NA 0 NA T/C 0 0 T 4 57 C NA0
chr1 55987 C C C 19 100 NA 0 NA C C 8 100 NA 0 NA C NA0
chr1 56065 T T T 18 90 NA 2 NA
chr1 62138 T T T 114 100 NA 0 NA C 0 0 C 10 100 T/C 100
chr1 62233 A A A 110 100 NA 0 NA A A 34 100 NA 0 NA G/A NA0
chr1 64310 A A A 64 100 NA 0 NA A A 37 100 NA 0 NA A NA0
chr1 64321 A A A 17 100 NA 0 NA A A 9 100 NA 0 NA C 100
chr1 64377 A A A 56 98 NA 1 NA A A 27 100 NA 0 NA A NA0
chr1 65570 A C 0 0 C 2 100 A A 3 100 NA 0 NA
chr1 66577 T T T 45 100 NA 0 NA
$
$
$
$
The script has not been tested though, as I do not have Perl on Windows.
tyler_durden
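`sort keys %x` compares the joined keys as plain strings. That works for this chr1 excerpt only because every position has five digits; a position such as 10000 would sort before 9999, and chr10 would sort before chr2. A sketch of a genomic sort key in Python, assuming chromosome names of the form chrN plus non-numeric names such as chrX that should come after the numbered ones; `genomic_sort_key` is a hypothetical name. In Perl the same idea needs a custom `sort { ... }` block using `<=>` on the position.

```python
import re

def genomic_sort_key(key):
    """Sort (chromo, pos, ref) keys by chromosome number, then numeric position."""
    chromo, pos, ref = key
    match = re.fullmatch(r"chr(\d+)", chromo)
    # numbered chromosomes first in numeric order; others (chrX, chrM, ...) after, alphabetically
    chrom_rank = (0, int(match.group(1)), "") if match else (1, 0, chromo)
    return (chrom_rank, int(pos), ref)

keys = [("chr10", "500", "A"), ("chr2", "10000", "C"), ("chr2", "9999", "G"), ("chrX", "1", "T")]
print(sorted(keys, key=genomic_sort_key))

# Plain string sorting, as with the tab-joined hash keys, puts 10000 before 9999:
print(sorted(["chr1\t9999", "chr1\t10000"]))
```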
---------- Post updated at 10:44 PM ---------- Previous update was at 09:01 AM ----------
In Windows, you could create and run a Perl program like the following:
C:\>
C:\>
C:\>type join_data.pl
#!perl -w
use strict;
my %merged;
my @hdr = qw(chromo pos ref
alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA
alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB
alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC
alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
);
# loop through the sample files, appending each line's values to the
# entry for its (chromo, pos, ref) key in the hash called "merged";
# the entry is created the first time a key is seen
while (defined (my $file = glob ("sample*"))) {
    open (my $fh, "<", $file) or die "Can't open $file for reading: $!";
    while (<$fh>) {
        chomp (my @chro = split /\t/);
        $merged{join("\t", @chro[0..2])} .= "\t".join("\t", @chro[3..$#chro]);
    }
    close ($fh) or die "Can't close $file: $!";
}
# we are done processing all files
# now print the header and then the hash, skipping the input header rows
print join(" ", @hdr), "\n";
foreach my $k (sort keys %merged) {
    print $k, "\t", $merged{$k}, "\n" if $k !~ /chromo/;
}
C:\>
C:\>perl join_data.pl
chromo pos ref alleles.SampleA refAllele.SampleA refCount.SampleA refFreq.SampleA altAllele.SampleA altCount.SampleA altFreq.SampleA alleles.SampleB refAllele.SampleB refCount.SampleB refFreq.SampleB altAllele.SampleB altCount.SampleB altFreq.SampleB alleles.SampleC refAllele.SampleC refCount.SampleC refFreq.SampleC altAllele.SampleC altCount.SampleC altFreq.SampleC alleles.SampleD refAllele.SampleD refCount.SampleD refFreq.SampleD altAllele.SampleD altCount.SampleD altFreq.SampleD
chr1 30146 A A A 31 100 NA 0 NA A A 10 100 NA 0 NA A A 54 100 NA 0 NA A A 23 100 NA 0 NA
chr1 55217 G G G 2 100 NA 0 NA G G 2 100 NA 0 NA A/G 0 0 A 5 55 G G 2 100 NA 0 NA
chr1 55223 C C C 2 100 NA 0 NA T/C 0 0 T 4 57 C C 2 100 NA 0 NA
chr1 55987 C C C 19 100 NA 0 NA C C 8 100 NA 0 NA C C 17 100 NA 0 NA C C 19 100 NA 0 NA
chr1 56065 T T T 18 90 NA 2 NA
chr1 62138 T T T 114 100 NA 0 NA C 0 0 C 10 100 T/C T 19 70 C 8 29 C 0 0 C 38 100
chr1 62233 A A A 110 100 NA 0 NA A A 34 100 NA 0 NA G/A 0 0 G 16 66 A A 108 100 NA 0 NA
chr1 64310 A A A 64 100 NA 0 NA A A 37 100 NA 0 NA A A 28 100 NA 0 NA
chr1 64321 A A A 17 100 NA 0 NA A A 9 100 NA 0 NA C 0 0 C 4 100
chr1 64377 A A A 56 98 NA 1 NA A A 27 100 NA 0 NA A A 2 100 NA 0 NA
chr1 65570 A C 0 0 C 2 100 A A 3 100 NA 0 NA
chr1 66577 T T T 45 100 NA 0 NA
C:\>
C:\>
tyler_durden
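Both Perl versions append each file's fields to a single string per key, so when a position is missing from one file, the next file's fields slide into its columns. In the output above, chr1 55223 has no row in the second file, yet the third file's T/C values appear under the SampleB headers. A minimal Python sketch of a true outer join that keeps one slot per sample and fills absent samples with NA, assuming each table has already been loaded as a dict from (chromo, pos, ref) to its seven value fields; `outer_join` is a hypothetical name, and NA is used only because the tables already use it for missing values.

```python
VALUE_COUNT = 7  # alleles .. altFreq

def outer_join(samples):
    """samples: list of dicts mapping (chromo, pos, ref) -> list of 7 fields.
    Returns merged rows, with an 'NA' block for every sample that lacks a key."""
    all_keys = []
    seen = set()
    for table in samples:
        for key in table:
            if key not in seen:  # keep the order in which keys are first seen
                seen.add(key)
                all_keys.append(key)
    merged = []
    for key in all_keys:
        row = list(key)
        for table in samples:
            row.extend(table.get(key, ["NA"] * VALUE_COUNT))
        merged.append(row)
    return merged

# Rows copied from the 2836 and 2838 tables above; each file lacks the other's key
s2836 = {("chr1", "55223", "C"): ["C", "C", "2", "100", "NA", "0", "NA"]}
s2838 = {("chr1", "30146", "A"): ["A", "A", "10", "100", "NA", "0", "NA"]}
for row in outer_join([s2836, s2838]):
    print("\t".join(row))
```

Keeping the per-sample values in separate slots, rather than one concatenated string, is what makes the padding possible; the same idea in Perl would store one array reference per file index under each key.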