Making an information matrix using Java


 
06-11-2009

Hello all,

I'm doing some research in biostatistics this summer, studying chloroplast genomes. I have 19 text files that are all formatted exactly like this:

Name: Marchantia polymorpha
FileName: NC_001319
Bases: 121024
Genes: rps12 <85..842
rps7 (892..1359)
ndhB (1514..3555)
psbM (4001..4105)
rpoB (5859..9056)
rpoC1 (9087..11737)
rpoC2 (11811..15971)
rps2 (16055..16762)
atpI (16890..17636)
atpH (18014..18259)
atpF (18468..19609)
atpA (19654..21177)
ycf12 -(22162..22263)
psbI -(22997..23107)
psbK -(23438..23605)
chlB (24053..25594)
psbA (28368..29429)
mbpX (37012..38124)
psbD (38855..39916)
psbC (39864..41285)
psbZ (41647..41835)
rps14 -(42333..42635)
psaB -(42724..44928)
psaA -(44955..47207)
rps4 -(49425..50033)
ndhJ -(51233..51709)
ndhK -(51793..52524)
ndhC -(52515..52877)
atpE -(53955..54362)
atpB -(54368..55846)
rbcL (56355..57782)
accD (58065..59015)
psaI (59193..59303)
ycf4 (59525..60079)
cemA (60151..61455)
petA (61641..62603)
psbJ -(62794..62916)
psbL -(63036..63152)
psbF -(63174..63293)
psbE -(63303..63554)
petG (64370..64483)
psaJ (65027..65155)
rpl33 (65273..65470)
rps18 (65498..65725)
rpl20 -(65807..66157)
clpP -(67130..68640)
psbB (69026..70552)
psbT (70669..70776)
psbN -(70863..70994)
psbH (71092..71316)
petB (71424..72566)
petD (72715..73690)
rpoA -(73802..74824)
rps11 -(74857..75249)
rpl36 -(75300..75413)
infA -(75450..75686)
rps8 -(75773..76171)
rpl14 -(76253..76621)
rpl16 -(76719..77685)
rps3 -(77743..78396)
rpl22 -(78445..78804)
rps19 -(78822..79100)
rpl2 -(79137..80514)
rpl23 -(80550..80825)
ndhF -(91101..93179)
rpl21 (93469..93819)
rpl32 (93886..94095)
cysT (94183..95049)
ccsA (95482..96444)
ndhD -(96665..98164)
psaC -(98289..98534)
ndhE -(98757..99059)
ndhG -(99113..99688)
ndhI -(99779..100330)
ndhA -(100382..102200)
ndhH -(102202..103380)
rps15 -(103433..103699)
chlL -(110104..110973)

Of course, each of the 19 files has a different Name, NC_ number, and number of bases, and contains different genes at different positions. I have some basic knowledge of Java and would like to take these files and build an information matrix. The NC_ numbers of the 19 files would be listed across the top, and each gene would be listed down the side. In each cell, if the file with that NC_ number contains the gene on the left, the matrix gets a 1; otherwise it gets a 0. If I pass the 19 text files as command-line arguments, is it possible to do this somehow? Maybe with a TreeMap or some other data structure?
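One possible approach, sketched under some assumptions: every file follows the layout above (a `FileName:` line holding the NC_ number, then one gene per line in the form `name (start..end)` or `name -(start..end)`, with `Genes: ` in front of the first one). A `TreeMap` keyed by NC_ number holds a `TreeSet` of gene names for each file, so both the columns and the rows come out sorted. The class and method names (`GeneMatrix`, `addFile`, `toMatrix`) are hypothetical, and this is a starting point rather than a tested tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeneMatrix {
    // Assumed gene-line format: optional "Genes:" prefix, gene name, optional "-",
    // optional "(" and "<", start..end, optional ">" and ")".
    private static final Pattern GENE = Pattern.compile(
        "^(?:Genes:\\s*)?(\\S+)\\s+-?\\(?<?\\d+\\.\\.>?\\d+\\)?\\s*$");

    // Reads one file's lines and records its NC_ number and the set of its gene names.
    static void addFile(List<String> lines, Map<String, Set<String>> genesByFile) {
        String id = null;
        Set<String> genes = new TreeSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("FileName:")) {
                id = line.substring("FileName:".length()).trim();
                continue;
            }
            Matcher m = GENE.matcher(line);
            if (m.matches()) {
                genes.add(m.group(1)); // Name:, Bases: and blank lines never match
            }
        }
        if (id == null) {
            throw new IllegalArgumentException("no FileName: line found");
        }
        genesByFile.put(id, genes);
    }

    // Builds a tab-separated 0/1 matrix: NC_ numbers across the top, genes down the side.
    static String toMatrix(Map<String, Set<String>> genesByFile) {
        Set<String> allGenes = new TreeSet<>();
        genesByFile.values().forEach(allGenes::addAll);

        StringBuilder sb = new StringBuilder("gene");
        for (String id : genesByFile.keySet()) {
            sb.append('\t').append(id);
        }
        sb.append('\n');
        for (String gene : allGenes) {
            sb.append(gene);
            // values() iterates in the same key order as keySet() above
            for (Set<String> genes : genesByFile.values()) {
                sb.append('\t').append(genes.contains(gene) ? 1 : 0);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Set<String>> genesByFile = new TreeMap<>();
        for (String arg : args) {
            addFile(Files.readAllLines(Path.of(arg)), genesByFile);
        }
        System.out.print(toMatrix(genesByFile));
    }
}
```

With Java 11 or later, something like `java GeneMatrix file1.txt file2.txt ... > matrix.tsv` would write a tab-separated table that opens in a spreadsheet or R. Because each file's genes go into a set, a gene listed twice in one file still gets a single 1.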

If it would make things easier, I could also trim the files down to just the gene names, with their numerical positions next to each one. I don't really know what would be best, but a program that could build this matrix would really help my research. Thanks to anyone who takes the time to read this; any ideas would help!
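Trimming the files by hand is probably unnecessary, since a program can pick the gene name and positions out of each line and skip the header lines. A minimal sketch, assuming Java 16 or later (for `record`) and that every gene line takes one of the three forms seen above (`rps12 <85..842`, `rps7 (892..1359)`, `ycf12 -(22162..22263)`). The `GeneLine` and `Gene` names are hypothetical, the `<` marker is simply dropped, and reading a leading `-` as the complementary strand is an assumption about what the notation means.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeneLine {
    // Illustrative record for one gene entry; the field names are hypothetical.
    record Gene(String name, int start, int end, boolean complement) {}

    // Optional "Genes:" prefix, gene name, optional "-" (assumed complement strand),
    // optional "(" and "<", start..end, optional ">" and ")".
    private static final Pattern GENE = Pattern.compile(
        "^(?:Genes:\\s*)?(\\S+)\\s+(-?)\\(?<?(\\d+)\\.\\.>?(\\d+)\\)?$");

    // Returns null for lines that are not gene entries (Name:, FileName:, Bases:, blanks).
    static Gene parse(String line) {
        Matcher m = GENE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new Gene(m.group(1),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(2).equals("-"));
    }

    // Prints a trimmed, tab-separated version of each file named on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            for (String line : Files.readAllLines(Path.of(arg))) {
                Gene g = parse(line);
                if (g != null) {
                    System.out.println(g.name() + "\t" + g.start() + "\t" + g.end()
                            + "\t" + (g.complement() ? "-" : "+"));
                }
            }
        }
    }
}
```

Running `java GeneLine file1.txt` would print one tab-separated line per gene (name, start, end, strand), which is the trimmed form described above without editing the original files.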

-akreibich07