create table file from different files with index


 
# 1  
Old 12-18-2011

Hi,
I have several files with two columns, where the first column can be used as an index.

filename1
Quote:
CATG 10
AATG 20
AAAG 1
AAAA 5
...
and filename2
Quote:
CATG 100
AATG 20
AAAG 2
CCCC 1000
...
How can I create a file like the following?
Quote:
ID filename1 filename2
CATG 10 100
AATG 20 20
AAAG 1 2
AAAA 5 0
CCCC 0 1000
...
Should I start by concatenating all the files and extracting the first column to create an index?
# 2  
Old 12-18-2011
This snippet does the job (though there may already be a function in the BioPerl modules that addresses the wider problem you are solving; it is always worth looking through CPAN before coding).

Code:
#!/usr/bin/perl

use strict;
use warnings;

open (my $index_1 , '<', $ARGV[0]);
my %index_1;
while (<$index_1>){
        chomp;
        if (/^([CAGT]+)\s(\d+)$/){
                $index_1{$1}=$2;
        }
        else {
                print "Invalid input \"$_\" in $ARGV[0], line $.\n";
        }
}
close $index_1;

open (my $index_2 , '<', $ARGV[1]);
my %index_2;
while (<$index_2>){
        chomp;
        if (/^([CAGT]+)\s(\d+)$/){
                $index_2{$1}=$2;
        }
        else {
                print "Invalid input \"$_\" in $ARGV[1], line $.\n";

        }
}
close $index_2;
my %indices;
@indices{(keys %index_1),(keys %index_2)}++;
for my $index (sort (keys %indices)){
        print "$index\t ",$index_1{$index}||"0","\t", $index_2{$index}||"0","\n";
}

This yields the following when called with the named files. The script could be "genericised" to deal with any number of files on the command line.
Code:
~/src/Perl/tmp$ perl test.pl file_1.dat file_2.dat
AAAA     5      0
AAAG     1      2
AATG     20     20
CATG     10     100
CCCC     0      1000
~/src/Perl/tmp$


Last edited by Skrynesaver; 12-18-2011 at 08:22 AM.. Reason: Added example output
# 3  
Old 12-18-2011
Thanks for the reply Skrynesaver,
I tried your script (I named it "index_counts.pl") after making it executable, with the following command line:
Quote:
perl index_counts.pl <a.count <b.count >matrix
but I got the following error
Quote:
Modification of a read-only value attempted at index_counts.pl line 33.
which corresponds to:
Code:
@indices{(keys %index_1),(keys %index_2)}++;

Any idea what causes the error?
Does this script accept more than two input files?
Sorry for what might be basic questions, but I'm a real dummy in Perl.
Cheers!

---------- Post updated at 12:42 PM ---------- Previous update was at 12:37 PM ----------

Sorry, I only now saw your command line,
but even so I get the following errors when trying the examples I gave:
Quote:
perl index_counts.pl a.count b.count

Invalid input "CATG 10" in a.count, line 1
Invalid input "AATG 20" in a.count, line 2
Invalid input "AAAG 1" in a.count, line 3
Invalid input "AAAA 5" in a.count, line 4
Invalid input "CATG 100" in b.count, line 1
Invalid input "AATG 20" in b.count, line 2
Invalid input "AAAG 2" in b.count, line 3
Invalid input "CCCC 1000" in b.count, line 4
Modification of a read-only value attempted at index_counts.pl line 33.
---------- Post updated at 12:44 PM ---------- Previous update was at 12:42 PM ----------

I figured it out: there was an extra space between the columns!
Thanks!
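
A note on the "Modification of a read-only value attempted" message: the statement at line 33 applies ++ to a hash slice. As far as I can tell, this creates the keys as a side effect when the key list is non-empty, but when both hashes are empty (every input line was rejected, or the files were passed with < redirection so that $ARGV[0] was undefined) there is nothing to increment and Perl raises the error. A minimal sketch of a more robust way to collect the union of keys follows. It assigns an empty list to the slice instead of incrementing it; the sub name union_keys is made up for this example and does not appear in the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sorted union of the keys of any number
# of hash references. Assigning () to a hash slice creates the keys with
# undef values, and is also valid when the key list is empty.
sub union_keys {
    my @tables = @_;
    my %seen;
    for my $table (@tables) {
        @seen{ keys %$table } = ();
    }
    return sort keys %seen;
}

# Sample data based on the first post.
my %index_1 = (CATG => 10,  AAAA => 5);
my %index_2 = (CATG => 100, CCCC => 1000);

print join(' ', union_keys(\%index_1, \%index_2)), "\n";   # AAAA CATG CCCC
print scalar(() = union_keys({}, {})), "\n";               # 0 (no keys, no error)
```

With this idiom an input file that yields no valid lines simply contributes no keys.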

---------- Post updated at 01:16 PM ---------- Previous update was at 12:44 PM ----------

I tried to "genericise" the script for 3 input files:
Code:
#!/usr/bin/perl

use strict;
use warnings;

open (my $index_1 , '<', $ARGV[0]);
my %index_1;
while (<$index_1>){
        chomp;
        if (/^([CAGT]+)\s(\d+)$/){
                $index_1{$1}=$2;
        }
        else {
                print "Invalid input \"$_\" in $ARGV[0], line $.\n";
        }
}
close $index_1;

open (my $index_2 , '<', $ARGV[1]);
my %index_2;
while (<$index_2>){
        chomp;
        if (/^([CAGT]+)\s(\d+)$/){
                $index_2{$1}=$2;
        }
        else {
                print "Invalid input \"$_\" in $ARGV[1], line $.\n";

        }
}
close $index_2;

open (my $index_3 , '<', $ARGV[2]);
my %index_3;
while (<$index_3>){
        chomp;
        if (/^([CAGT]+)\s(\d+)$/){
                $index_3{$1}=$2;
        }
        else {
                print "Invalid input \"$_\" in $ARGV[2], line $.\n";

        }
}
close $index_3;



my %indices;
@indices{(keys %index_1),(keys %index_2),(keys %index_3)}++;
for my $index (sort (keys %indices)){
        print "$index\t ",$index_1{$index}||"0","\t", $index_2{$index}||"0","\n", $index_3{$index}||"0","\n";
}

The results are OK, but the printed file is somehow misformatted:
Code:
AAAAAAACGTCAGGGAAGCTTGTCATG	 3	0
0
AAAAAACCGCCGATGCCTCCGGCCATG	 2	0
0
AAAAAAGTTTTAATCATTATCGACATG	 1	0
0
AAAAACTGTATTCAGTGACCAATCATG	 4	0
0
AAAAAGACAGCTGCTCCAGATAACATG	 1	0
0
AAAAATCTGTAAGGCTAATGGTGCATG	 1	0
0
...

Is there also a simpler way to increase the number of input files? I have 50 plus...
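
A note on the broken layout above: in the three-file print statement the value from %index_2 is followed by "\n" instead of "\t", so the third count ends up on a line of its own. One way to avoid this kind of slip is to build each row as a list and join it once. The sketch below is only an illustration under that assumption; the helper name format_row is hypothetical and not part of the scripts in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated row for $key from any
# number of hash references, substituting 0 where a key is missing.
sub format_row {
    my ($key, @tables) = @_;
    return join("\t", $key, map { $_->{$key} || 0 } @tables) . "\n";
}

# Sample data taken from the first post.
my %index_1 = (CATG => 10,  AATG => 20, AAAG => 1, AAAA => 5);
my %index_2 = (CATG => 100, AATG => 20, AAAG => 2, CCCC => 1000);

print format_row('AAAA', \%index_1, \%index_2);   # AAAA, 5 and 0 separated by tabs
print format_row('CCCC', \%index_1, \%index_2);   # CCCC, 0 and 1000 separated by tabs
```

Because the separator is written only once, adding a third or a fiftieth hash reference does not require touching the print logic.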
# 4  
Old 12-19-2011
This would be my approach to accepting an arbitrary number of files as arguments:

Code:
#!/usr/bin/perl

use strict;
use warnings;

my @aoh; # array of hashes, one per input file
for my $file (@ARGV){
        push (@aoh,include_data($file));
}
report(@aoh);
exit 0;

# Read "key count" pairs from one file and return a reference to the hash
sub include_data{
        my $file=shift;
        open (my $index , '<', $file) or die "Cannot open $file: $!\n";
        my %index;
        while (<$index>){
                chomp;
                if (/^([CAGT]+)\s+(\d+)$/){
                        $index{$1}=$2;
                }
                else {
                        # report on STDERR so the table on STDOUT stays clean
                        print STDERR "Invalid input \"$_\" in $file, line $.\n";
                }
        }
        close($index);
        return \%index;
}

# Print one row per key, with 0 for files that lack the key
sub report{
        my @aoh=@_;
        my %indices;
        for my $index_ref (@aoh){
                @indices{keys %{$index_ref}}=(); # collect the union of all keys
        }
        for my $index (sort keys %indices){
                print "$index\t";
                for my $index_ref (@aoh){
                        print $index_ref->{$index}||0,"\t";
                }
                print "\n";
        }
}


Last edited by Skrynesaver; 12-19-2011 at 05:54 AM.. Reason: removed debugging statement
# 5  
Old 01-02-2012
Skrynesaver,
just two specific questions:

1. How can I make @aoh accept numbers in scientific format, or even other characters as text?

2. How can I print headers with the input file names?

Now a general question: what would you suggest as the best way to learn Perl? Is there a nice online tutorial?

Thanks for all your help!

---------- Post updated at 03:17 PM ---------- Previous update was at 02:12 PM ----------

I just replaced "\d" with "\S" in the regular expression, so that it matches any non-whitespace character instead of a digit (0-9). That solves my first question.

I'm getting there; the biggest problem is the semantics.
cheers!
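
For the second question (a header row with the input file names), one possible approach is sketched below. It is a variation on the script in post #4, not something posted in the thread. The sub names read_counts and build_table are hypothetical, the header label ID is taken from the desired output in the first post, and values are matched with \S+ as in the "\d" to "\S" change, so that numbers in scientific notation or plain text are accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read "key value" pairs from $file into a hash.
# \s+ tolerates several spaces or tabs between the columns and \S+
# accepts any non-whitespace value (for example 1.5e-3).
sub read_counts {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(\S+)\s+(\S+)\s*$/) {
            $count{$1} = $2;
        }
        else {
            warn "Invalid input \"$line\" in $file, line $.\n";
        }
    }
    close($fh);
    return \%count;
}

# Hypothetical helper: return the table as a list of lines, starting
# with a header row made of "ID" and the file names.
sub build_table {
    my @files  = @_;
    my @tables = map { read_counts($_) } @files;
    my %seen;
    @seen{ keys %$_ } = () for @tables;
    my @lines = (join("\t", 'ID', @files) . "\n");
    for my $key (sort keys %seen) {
        push @lines,
            join("\t", $key, map { exists $_->{$key} ? $_->{$key} : 0 } @tables) . "\n";
    }
    return @lines;
}

print build_table(@ARGV) if @ARGV;
```

Saved under a name of your choice (table.pl here is just a placeholder), it could be called as "perl table.pl a.count b.count > matrix"; with 50 or more inputs a shell glob such as *.count avoids listing every file by hand.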
 