TFBS::Matrix::PWM(3pm) User Contributed Perl Documentation TFBS::Matrix::PWM(3pm)
NAME
TFBS::Matrix::PWM - class for position weight matrices of nucleotide patterns
SYNOPSIS
o creating a TFBS::Matrix::PWM object manually:
my $matrixref = [ [ 0.61, -3.16, 1.83, -3.16, 1.21, -0.06],
[-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],
[-1.57, 1.85, -2.57, -1.34, -1.57, 1.14],
[ 0.31, -3.16, -2.57, 1.76, 0.24, -0.83]
];
my $pwm = TFBS::Matrix::PWM->new(-matrix => $matrixref,
-name => "MyProfile",
-ID => "M0001"
);
# or
my $matrixstring = <<ENDMATRIX;
0.61 -3.16 1.83 -3.16 1.21 -0.06
-0.15 -2.57 -3.16 -3.16 -2.57 -1.83
-1.57 1.85 -2.57 -1.34 -1.57 1.14
0.31 -3.16 -2.57 1.76 0.24 -0.83
ENDMATRIX
my $pwm = TFBS::Matrix::PWM->new(-matrixstring => $matrixstring,
-name => "MyProfile",
-ID => "M0001"
);
o retrieving a TFBS::Matrix::PWM object from a database:
(See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve
TFBS::Matrix::* objects from them.)
my $db_obj = TFBS::DB::JASPAR2->new
(-connect => ["dbi:mysql:JASPAR2:myhost",
"myusername", "mypassword"]);
my $pwm = $db_obj->get_Matrix_by_ID("M0001", "PWM");
# or
my $pwm = $db_obj->get_Matrix_by_name("MyProfile", "PWM");
o retrieving a list of individual TFBS::Matrix::PWM objects from a TFBS::MatrixSet object
(see documentation of TFBS::MatrixSet to learn how to create objects for storage and manipulation of multiple matrices)
my @pwm_list = $matrixset->all_patterns(-sort_by=>"name");
o scanning a nucleotide sequence with a matrix
my $siteset = $pwm->search_seq(-file =>"myseq.fa",
-threshold => "80%");
o scanning a pairwise alignment with a matrix
my $site_pair_set = $pwm->search_aln(-file =>"myalign.aln",
-threshold => "80%",
-cutoff => "70%",
-window => 50);
DESCRIPTION
TFBS::Matrix::PWM is a class whose instances are objects representing position weight matrices (PWMs). A PWM is normally calculated from a
raw position frequency matrix (see TFBS::Matrix::PFM for the explanation of position frequency matrices). For example, given the following
position frequency matrix:
A:[ 12 3 0 0 4 0 ]
C:[ 0 0 0 11 7 0 ]
G:[ 0 9 12 0 0 0 ]
T:[ 0 0 0 1 1 12 ]
The standard computational procedure is applied to convert it into the following position weight matrix:
A:[ 0.61 -3.16 1.83 -3.16 1.21 -0.06]
C:[-0.15 -2.57 -3.16 -3.16 -2.57 -1.83]
G:[-1.57 1.85 -2.57 -1.34 -1.57 1.14]
T:[ 0.31 -3.16 -2.57 1.76 0.24 -0.83]
which contains the "weights" associated with the occurrence of each nucleotide at the given position in a pattern.
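The conversion from counts to weights can be sketched in plain Perl. This is a minimal illustration, not the code used by the TFBS modules: it assumes a uniform background (0.25 per nucleotide), a log2-odds score, and a pseudocount of sqrt(N) per column distributed according to the background. The function name pfm_to_pwm is hypothetical. The numbers it prints depend on those assumptions and are not meant to reproduce the weight matrix shown above.

```perl
use strict;
use warnings;

# Position frequency matrix from the text: rows are A, C, G, T.
my @pfm = (
    [ 12, 3,  0,  0, 4,  0 ],    # A
    [  0, 0,  0, 11, 7,  0 ],    # C
    [  0, 9, 12,  0, 0,  0 ],    # G
    [  0, 0,  0,  1, 1, 12 ],    # T
);

# Hypothetical helper: convert counts to log2-odds weights.
# ASSUMPTIONS: uniform background unless one is passed in, and a total
# pseudocount of sqrt(N) per column, split according to the background.
sub pfm_to_pwm {
    my ($pfm, $bg) = @_;
    $bg ||= [ 0.25, 0.25, 0.25, 0.25 ];
    my $ncol = scalar @{ $pfm->[0] };
    my @pwm;
    for my $col (0 .. $ncol - 1) {
        # N = number of sequences that contributed to this column
        my $n = 0;
        $n += $pfm->[$_][$col] for 0 .. 3;
        my $pseudo = sqrt($n);
        for my $row (0 .. 3) {
            # Smoothed probability of this nucleotide at this position
            my $p = ($pfm->[$row][$col] + $bg->[$row] * $pseudo) / ($n + $pseudo);
            # Weight = log2 of the ratio between observed and background
            $pwm[$row][$col] = log($p / $bg->[$row]) / log(2);
        }
    }
    return \@pwm;
}

my $pwm   = pfm_to_pwm(\@pfm);
my @bases = qw(A C G T);
for my $row (0 .. 3) {
    printf "%s:[%s ]\n", $bases[$row],
        join '', map { sprintf '%6.2f', $_ } @{ $pwm->[$row] };
}
```

Nucleotides seen more often than the background expects get positive weights, and rare or absent ones get negative weights. The pseudocount keeps zero counts from producing a logarithm of zero.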
A TFBS::Matrix::PWM object provides methods to search nucleotide sequences, and pairwise alignments of nucleotide sequences, with the
pattern it represents. These methods return the set of sites found (a TFBS::SiteSet object for a single-sequence search, and a
TFBS::SitePairSet object for an alignment search).
FEEDBACK
Please send bug reports and other comments to the author.
AUTHOR - Boris Lenhard
Boris Lenhard <Boris.Lenhard@cgb.ki.se>
APPENDIX
The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
new
Title : new
Usage : my $pwm = TFBS::Matrix::PWM->new(%args)
Function: constructor for the TFBS::Matrix::PWM object
Returns : a new TFBS::Matrix::PWM object
Args : # you must specify exactly one of the following three:
-matrix, # reference to an array of arrays of numbers (weights)
#or
-matrixstring,# a string containing four lines
# of tab- or space-delimited numbers
#or
-matrixfile, # the name of a file containing four lines
# of tab- or space-delimited numbers
#######
-name, # string, OPTIONAL
-ID, # string, OPTIONAL
-class, # string, OPTIONAL
-tags # an array reference, OPTIONAL
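The -matrix and -matrixstring forms carry the same information in different shapes. The sketch below shows one way to turn the string form into the array-of-arrays form. The helper parse_matrixstring is hypothetical and is not part of TFBS; the constructor does its own parsing, and this only illustrates the expected layout of four rows of whitespace-delimited numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: parse four lines of tab- or space-delimited numbers
# into a reference to an array of arrays (one inner array per line).
sub parse_matrixstring {
    my ($string) = @_;
    # Skip blank lines; split ' ' ignores leading whitespace and
    # treats any run of spaces or tabs as a single delimiter.
    my @rows = map { [ split ' ', $_ ] } grep { /\S/ } split /\n/, $string;
    die "expected 4 rows, got " . scalar(@rows) . "\n" unless @rows == 4;
    my $width = scalar @{ $rows[0] };
    for my $row (@rows) {
        die "rows differ in length\n" unless @$row == $width;
    }
    return \@rows;
}

my $matrixstring = <<'ENDMATRIX';
 0.61 -3.16  1.83 -3.16  1.21 -0.06
-0.15 -2.57 -3.16 -3.16 -2.57 -1.83
-1.57  1.85 -2.57 -1.34 -1.57  1.14
 0.31 -3.16 -2.57  1.76  0.24 -0.83
ENDMATRIX

my $matrix = parse_matrixstring($matrixstring);
printf "%d rows x %d columns\n", scalar @$matrix, scalar @{ $matrix->[0] };
# prints "4 rows x 6 columns"
```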
search_seq
Title : search_seq
Usage : my $siteset = $pwm->search_seq(%args)
Function: scans a nucleotide sequence with the pattern represented
by the PWM
Returns : a TFBS::SiteSet object
Args : # you must specify exactly one of the following three:
-file, # the name of a FASTA file (single sequence)
#or
-seqobj # a Bio::Seq object
# (more accurately, a Bio::PrimarySeq object or a
# subclass thereof)
#or
-seqstring # a string containing the sequence
-threshold, # minimum score for the hit, either absolute
# (e.g. 11.2) or relative (e.g. "75%")
# OPTIONAL: default "80%"
-subpart # subpart of the sequence to search, given as
# -subpart => { start => 140,
# end => 180 }
# where start and end are coordinates in the
# sequence; the coordinate range is interpreted
# in the BioPerl tradition (1-based, inclusive)
# OPTIONAL: by default searches entire sequence
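To make the -threshold and -subpart arguments concrete, here is a small pure-Perl sketch of scanning a sequence with a weight matrix. It is an illustration under stated assumptions, not the search_seq implementation: it scans the forward strand only, and it assumes that a relative threshold such as "80%" is measured on the scale between the lowest and highest score the matrix can produce. The helpers score_range, absolute_threshold and scan_forward are hypothetical names.

```perl
use strict;
use warnings;

# PWM from the SYNOPSIS: rows are A, C, G, T; columns are pattern positions.
my @pwm = (
    [  0.61, -3.16,  1.83, -3.16,  1.21, -0.06 ],
    [ -0.15, -2.57, -3.16, -3.16, -2.57, -1.83 ],
    [ -1.57,  1.85, -2.57, -1.34, -1.57,  1.14 ],
    [  0.31, -3.16, -2.57,  1.76,  0.24, -0.83 ],
);
my %row_of = (A => 0, C => 1, G => 2, T => 3);

# Lowest and highest score any site can reach with this matrix.
sub score_range {
    my ($pwm) = @_;
    my ($min, $max) = (0, 0);
    for my $col (0 .. $#{ $pwm->[0] }) {
        my @w = sort { $a <=> $b } map { $pwm->[$_][$col] } 0 .. 3;
        $min += $w[0];
        $max += $w[-1];
    }
    return ($min, $max);
}

# Turn "80%" into an absolute score; plain numbers pass through unchanged.
# ASSUMPTION: percentages are relative to the (min, max) score range.
sub absolute_threshold {
    my ($pwm, $threshold) = @_;
    my ($pct) = $threshold =~ /^(\d+(?:\.\d+)?)%$/;
    return $threshold unless defined $pct;
    my ($min, $max) = score_range($pwm);
    return $min + ($pct / 100) * ($max - $min);
}

# Scan the forward strand; coordinates are 1-based and inclusive.
sub scan_forward {
    my ($pwm, $seq, $threshold, $subpart) = @_;
    my $len   = scalar @{ $pwm->[0] };
    my $start = $subpart ? $subpart->{start} : 1;
    my $end   = $subpart ? $subpart->{end}   : length($seq);
    my $abs   = absolute_threshold($pwm, $threshold);
    my @hits;
    for my $pos ($start .. $end - $len + 1) {
        my $word = uc substr($seq, $pos - 1, $len);
        next if $word =~ /[^ACGT]/;    # skip ambiguous nucleotides
        my ($score, $col) = (0, 0);
        for my $base (split //, $word) {
            $score += $pwm->[ $row_of{$base} ][ $col++ ];
        }
        push @hits, { start => $pos, end => $pos + $len - 1,
                      site  => $word, score => $score }
            if $score >= $abs;
    }
    return @hits;
}

my @hits = scan_forward(\@pwm, 'CCAGATAGCC', '80%');
printf "%d-%d %s %.2f\n", @{$_}{qw(start end site score)} for @hits;
# prints "3-8 AGATAG 8.40"
```

Passing { start => 4, end => 10 } as the fourth argument restricts the scan to that range, so the site at 3-8 is no longer reported.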
search_aln
Title : search_aln
Usage : my $site_pair_set = $pwm->search_aln(%args)
Function: Scans a pairwise alignment of nucleotide sequences
with the pattern represented by the PWM: it reports only
those hits that are present in equivalent positions of both
sequences and exceed a specified threshold score in both, AND
are found in regions of the alignment above the specified
conservation cutoff value.
Returns : a TFBS::SitePairSet object
Args : # you must specify exactly one of the following three:
-file, # the name of the alignment file in Clustal
# format
#or
-alignobj # a Bio::SimpleAlign object
# (more accurately, a Bio::PrimarySeq object or a
# subclass thereof)
#or
-alignstring # a multi-line string containing the alignment
# in Clustal format
#############
-threshold, # minimum score for the hit, either absolute
# (e.g. 11.2) or relative (e.g. "75%")
# OPTIONAL: default "80%"
-window, # size of the sliding window (in nucleotides)
# for calculating local conservation in the
# alignment
# OPTIONAL: default 50
-cutoff # conservation cutoff (%) for including the
# region in the results of the pattern search
# OPTIONAL: default "70%"
-subpart # subpart of the alignment to search, given as e.g.
# -subpart => { relative_to => 1,
# start => 140,
# end => 180 }
# where start and end are coordinates in the
# sequence indicated by relative_to (1 for the
# 1st sequence in the alignment, 2 for the 2nd)
# OPTIONAL: by default searches entire alignment
-conservation
# conservation profile, a TFBS::ConservationProfile
# OPTIONAL: by default the conservation profile is
# computed internally on the fly (less efficient)
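The -window, -cutoff and -subpart arguments can be pictured with a short pure-Perl sketch. It is a rough illustration only: the way TFBS actually computes conservation is not described in this document, so the sketch assumes that conservation is the percentage of identical, non-gap columns in a window centred on each alignment column. The helpers conservation_profile and column_of_position are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: percent identity in a window centred on each
# alignment column. Both arguments are gapped rows of a pairwise alignment.
sub conservation_profile {
    my ($aln1, $aln2, $window) = @_;
    die "aligned rows must have equal length\n"
        unless length($aln1) == length($aln2);
    my @a = split //, uc $aln1;
    my @b = split //, uc $aln2;
    # 1 where both rows carry the same nucleotide; gaps never count.
    my @same = map { ($a[$_] ne '-' && $a[$_] eq $b[$_]) ? 1 : 0 } 0 .. $#a;
    my $half = int($window / 2);
    my @profile;
    for my $i (0 .. $#same) {
        # Clip the window at both ends of the alignment.
        my $lo = $i - $half;  $lo = 0      if $lo < 0;
        my $hi = $i + $half;  $hi = $#same if $hi > $#same;
        my $sum = 0;
        $sum += $same[$_] for $lo .. $hi;
        push @profile, 100 * $sum / ($hi - $lo + 1);
    }
    return \@profile;
}

# Hypothetical helper: alignment column (1-based) holding the $pos-th
# nucleotide of a gapped row, as needed to apply a -subpart range that is
# given in the coordinates of one sequence.
sub column_of_position {
    my ($aligned, $pos) = @_;
    my $seen = 0;
    for my $col (1 .. length($aligned)) {
        $seen++ if substr($aligned, $col - 1, 1) ne '-';
        return $col if $seen == $pos;
    }
    return;    # position lies beyond the end of the sequence
}

my $profile = conservation_profile('ACGTACGTAC', 'ACGTTTTTAC', 3);
my $cutoff  = 70;
for my $i (0 .. $#$profile) {
    printf "column %2d: %5.1f%% %s\n", $i + 1, $profile->[$i],
        $profile->[$i] >= $cutoff ? 'kept' : 'below cutoff';
}
```

Under these assumptions, a site pair would be reported only if both sites pass -threshold and the alignment columns they occupy are kept by the cutoff test.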
name
ID
class
matrix
length
revcom
rawprint
prettyprint
The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them.
perl v5.14.2 2008-01-24 TFBS::Matrix::PWM(3pm)