Posted 06-11-2009
Making an information matrix using Java

Hello all,

I'm doing some research in Biostatistics this summer studying chloroplast genomes. I have 19 text files that look exactly like this:

Name: Marchantia polymorpha
FileName: NC_001319
Bases: 121024
Genes: rps12 <85..842
rps7 (892..1359)
ndhB (1514..3555)
psbM (4001..4105)
rpoB (5859..9056)
rpoC1 (9087..11737)
rpoC2 (11811..15971)
rps2 (16055..16762)
atpI (16890..17636)
atpH (18014..18259)
atpF (18468..19609)
atpA (19654..21177)
ycf12 -(22162..22263)
psbI -(22997..23107)
psbK -(23438..23605)
chlB (24053..25594)
psbA (28368..29429)
mbpX (37012..38124)
psbD (38855..39916)
psbC (39864..41285)
psbZ (41647..41835)
rps14 -(42333..42635)
psaB -(42724..44928)
psaA -(44955..47207)
rps4 -(49425..50033)
ndhJ -(51233..51709)
ndhK -(51793..52524)
ndhC -(52515..52877)
atpE -(53955..54362)
atpB -(54368..55846)
rbcL (56355..57782)
accD (58065..59015)
psaI (59193..59303)
ycf4 (59525..60079)
cemA (60151..61455)
petA (61641..62603)
psbJ -(62794..62916)
psbL -(63036..63152)
psbF -(63174..63293)
psbE -(63303..63554)
petG (64370..64483)
psaJ (65027..65155)
rpl33 (65273..65470)
rps18 (65498..65725)
rpl20 -(65807..66157)
clpP -(67130..68640)
psbB (69026..70552)
psbT (70669..70776)
psbN -(70863..70994)
psbH (71092..71316)
petB (71424..72566)
petD (72715..73690)
rpoA -(73802..74824)
rps11 -(74857..75249)
rpl36 -(75300..75413)
infA -(75450..75686)
rps8 -(75773..76171)
rpl14 -(76253..76621)
rpl16 -(76719..77685)
rps3 -(77743..78396)
rpl22 -(78445..78804)
rps19 -(78822..79100)
rpl2 -(79137..80514)
rpl23 -(80550..80825)
ndhF -(91101..93179)
rpl21 (93469..93819)
rpl32 (93886..94095)
cysT (94183..95049)
ccsA (95482..96444)
ndhD -(96665..98164)
psaC -(98289..98534)
ndhE -(98757..99059)
ndhG -(99113..99688)
ndhI -(99779..100330)
ndhA -(100382..102200)
ndhH -(102202..103380)
rps15 -(103433..103699)
chlL -(110104..110973)

Of course, each of the 19 files has a different Name, NC_ number, number of bases, and different genes in different numerical positions. I have a slight knowledge of Java, and wish to take these files and make an information matrix. The NC_ numbers of the 19 different genomes would be listed across the top, and each gene would be listed down the side. Then, in the matrix, if the file for that NC_ number contains the gene on the left, place a 1 in that cell, otherwise a 0. If I just pass the 19 text files as command-line args, is it possible to do this somehow? Maybe with a TreeMap or other data structure?
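For illustration, here is a minimal sketch of the matrix step, assuming the gene names of each file have already been collected into a Set keyed by NC_ number. The class name GeneMatrix, the method buildMatrix, and the second column label NC_EXAMPLE are made up for this example and are not from any library or real file. It needs Java 9 or later because of List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class GeneMatrix {

    /**
     * Builds the presence/absence table as tab-separated lines.
     * Key of genesByFile: NC_ number; value: gene names found in that file.
     */
    static List<String> buildMatrix(Map<String, Set<String>> genesByFile) {
        // TreeMap and TreeSet keep the columns and rows in sorted order.
        Map<String, Set<String>> columns = new TreeMap<>(genesByFile);
        Set<String> allGenes = new TreeSet<>();
        for (Set<String> genes : columns.values()) {
            allGenes.addAll(genes);
        }

        List<String> lines = new ArrayList<>();
        // Header row: the word "gene", then one column per NC_ number.
        lines.add("gene\t" + String.join("\t", columns.keySet()));

        // One row per gene: 1 if the file contains it, otherwise 0.
        for (String gene : allGenes) {
            StringBuilder row = new StringBuilder(gene);
            for (Set<String> genes : columns.values()) {
                row.append('\t').append(genes.contains(gene) ? '1' : '0');
            }
            lines.add(row.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> genesByFile = new TreeMap<>();
        genesByFile.put("NC_001319", new TreeSet<>(List.of("rps12", "rps7", "ndhB")));
        // NC_EXAMPLE is a placeholder column with invented contents.
        genesByFile.put("NC_EXAMPLE", new TreeSet<>(List.of("rps7", "psbA")));
        buildMatrix(genesByFile).forEach(System.out::println);
    }
}
```

Because each file's genes are held in a Set, a gene that appears more than once in a file still produces a single 1. The tab-separated output should open in a spreadsheet or load into a statistics package.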

If it would make it easier, I could also trim the files down to just the gene names, with the headings of the numerical positions next to each. I don't really know what would be best, but a program that could make this matrix would really help my research. Thanks to anyone taking the time to read this; any ideas would help!
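Trimming by hand may not be needed. As a rough sketch, assuming every file follows the layout shown above (Name:, FileName:, and Bases: header lines, a Genes: label on the first gene line, and gene names that contain no spaces), the names can be pulled out of the untrimmed files. The names GeneFileParser, Genome, and parse are invented for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class GeneFileParser {

    /** Holds the NC_ number and the gene names of one file. */
    static final class Genome {
        String fileName = "";
        final Set<String> genes = new TreeSet<>();
    }

    /** Extracts the FileName value and the gene names from the lines of one file. */
    static Genome parse(List<String> lines) {
        Genome genome = new Genome();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("FileName:")) {
                genome.fileName = line.substring("FileName:".length()).trim();
                continue;
            }
            // Skip the other header lines and any blank lines.
            if (line.startsWith("Name:") || line.startsWith("Bases:") || line.isEmpty()) {
                continue;
            }
            // The first gene shares its line with the "Genes:" label.
            if (line.startsWith("Genes:")) {
                line = line.substring("Genes:".length()).trim();
            }
            if (!line.isEmpty()) {
                // The gene name is the first whitespace-delimited token;
                // the position, e.g. "-(22162..22263)", is ignored.
                genome.genes.add(line.split("\\s+")[0]);
            }
        }
        return genome;
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is the path to one text file.
        for (String path : args) {
            Genome g = parse(Files.readAllLines(Paths.get(path)));
            System.out.println(g.fileName + " " + g.genes);
        }
    }
}
```

Run with the 19 file paths as arguments, this prints one line per file with its NC_ number and sorted gene names, which makes it easy to spot-check the parsing before building anything on top of it.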

-akreibich07