06-11-2009
Making an information matrix using Java
Hello all,
I'm doing some research in Biostatistics this summer studying chloroplast genomes. I have 19 text files that look exactly like this:
Name: Marchantia polymorpha
FileName: NC_001319
Bases: 121024
Genes: rps12 <85..842
rps7 (892..1359)
ndhB (1514..3555)
psbM (4001..4105)
rpoB (5859..9056)
rpoC1 (9087..11737)
rpoC2 (11811..15971)
rps2 (16055..16762)
atpI (16890..17636)
atpH (18014..18259)
atpF (18468..19609)
atpA (19654..21177)
ycf12 -(22162..22263)
psbI -(22997..23107)
psbK -(23438..23605)
chlB (24053..25594)
psbA (28368..29429)
mbpX (37012..38124)
psbD (38855..39916)
psbC (39864..41285)
psbZ (41647..41835)
rps14 -(42333..42635)
psaB -(42724..44928)
psaA -(44955..47207)
rps4 -(49425..50033)
ndhJ -(51233..51709)
ndhK -(51793..52524)
ndhC -(52515..52877)
atpE -(53955..54362)
atpB -(54368..55846)
rbcL (56355..57782)
accD (58065..59015)
psaI (59193..59303)
ycf4 (59525..60079)
cemA (60151..61455)
petA (61641..62603)
psbJ -(62794..62916)
psbL -(63036..63152)
psbF -(63174..63293)
psbE -(63303..63554)
petG (64370..64483)
psaJ (65027..65155)
rpl33 (65273..65470)
rps18 (65498..65725)
rpl20 -(65807..66157)
clpP -(67130..68640)
psbB (69026..70552)
psbT (70669..70776)
psbN -(70863..70994)
psbH (71092..71316)
petB (71424..72566)
petD (72715..73690)
rpoA -(73802..74824)
rps11 -(74857..75249)
rpl36 -(75300..75413)
infA -(75450..75686)
rps8 -(75773..76171)
rpl14 -(76253..76621)
rpl16 -(76719..77685)
rps3 -(77743..78396)
rpl22 -(78445..78804)
rps19 -(78822..79100)
rpl2 -(79137..80514)
rpl23 -(80550..80825)
ndhF -(91101..93179)
rpl21 (93469..93819)
rpl32 (93886..94095)
cysT (94183..95049)
ccsA (95482..96444)
ndhD -(96665..98164)
psaC -(98289..98534)
ndhE -(98757..99059)
ndhG -(99113..99688)
ndhI -(99779..100330)
ndhA -(100382..102200)
ndhH -(102202..103380)
rps15 -(103433..103699)
chlL -(110104..110973)
Of course, each of the 19 files has a different Name, NC_ number, number of bases, and set of genes at different numerical positions. I have a slight knowledge of Java, and wish to take these files and make an information matrix. The NC_ numbers of the 19 different files would be listed across the top, and every gene that appears in any file would be listed down the side. Each cell would then hold a 1 if the file for that NC_ number contains the gene on the left, and a 0 otherwise. If I just pass the 19 text files as command-line args, is it possible to do this somehow? Maybe with a TreeMap or some other data structure?
If it would make it easier, I could also trim the files down to just the gene names, with the headings of the numerical positions next to each. I don't really know what would be best, but a program that could build this matrix would really help my research. Thanks to anyone who takes the time to read this; any ideas would help!
-akreibich07
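For what it's worth, here is a minimal sketch of one way the matrix described above could be built. It assumes every file follows the layout shown in the sample: a "FileName:" header line, then gene names as the first token of each line from "Genes:" onward. The class name GeneMatrix and its helper methods are made up for illustration. A TreeSet keeps the gene rows sorted, and a LinkedHashMap keeps the columns in the order the files were given.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GeneMatrix {

    /** Returns the value of the "FileName:" header, or null if it is missing. */
    static String parseFileName(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("FileName:")) {
                return line.substring("FileName:".length()).trim();
            }
        }
        return null;
    }

    /** Collects gene names: the first token of each line from "Genes:" onward. */
    static Set<String> parseGenes(List<String> lines) {
        Set<String> genes = new TreeSet<>();
        boolean inGenes = false;
        for (String line : lines) {
            String text = line.trim();
            if (text.startsWith("Genes:")) {
                inGenes = true;
                text = text.substring("Genes:".length()).trim();
            }
            // Assumption: nothing but gene lines follows the "Genes:" header.
            if (inGenes && !text.isEmpty()) {
                genes.add(text.split("\\s+")[0]);
            }
        }
        return genes;
    }

    /** Builds a tab-separated 0/1 matrix: one column per file, one row per gene. */
    static String buildMatrix(Map<String, Set<String>> genesByFile) {
        // Union of all gene names, sorted, gives the row labels.
        Set<String> allGenes = new TreeSet<>();
        for (Set<String> genes : genesByFile.values()) {
            allGenes.addAll(genes);
        }
        StringBuilder out = new StringBuilder("gene");
        for (String file : genesByFile.keySet()) {
            out.append('\t').append(file);
        }
        out.append('\n');
        for (String gene : allGenes) {
            out.append(gene);
            for (Set<String> genes : genesByFile.values()) {
                out.append('\t').append(genes.contains(gene) ? 1 : 0);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is the path to one genome text file.
        Map<String, Set<String>> genesByFile = new LinkedHashMap<>();
        for (String path : args) {
            List<String> lines = Files.readAllLines(Paths.get(path));
            String name = parseFileName(lines);
            genesByFile.put(name != null ? name : path, parseGenes(lines));
        }
        System.out.print(buildMatrix(genesByFile));
    }
}
```

After compiling with javac GeneMatrix.java, it could be run as java GeneMatrix followed by the 19 file paths, with the output redirected to a file that opens in a spreadsheet. A gene that occurs more than once in a file still gets a single row, since only presence is recorded.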