Greetings!
Please help me produce the following solution. I need
to produce one big matrix file from several files at different directory levels.
If it helps, the index folder provides the chromosome index and
the data folder provides the values for the chromosomes.
There are 2 folders at the same level, index and data.
The index folder has multiple files named chr1, chr2 etc.
The data folder has many subfolders. Each subfolder has multiple files named chr1, chr2 etc., with the same names
as the files in the index folder. A particular file and its namesake will have the same number of rows.
So if chr1 in index has 5 rows, chr1 in all subfolders within data will also have 5 rows.
The output should be a big matrix with a nested format, where the row names (first column), starting at row 2, should be the file names
and the column names (first row), starting at column 3, should be the names of the corresponding subfolders in the data folder.
All files have 1 column and multiple rows with only integer numbers.
This works like a charm with the sample data, but with the actual data it is taking forever: 40 mins and it hasn't written a single output row. I guess I will have to wait it out. Thanks again.
If it isn't too much trouble, is there a more efficient way?
---------- Post updated at 09:29 AM ---------- Previous update was at 02:38 AM ----------
Update: 6 hours in, still no output lines. The data and the code are fine,
it's just that the data is too big, 22 GB to be precise.
Wow, 22 GB is a lot of data. I'm assuming you're using GNU awk, or it would probably have fallen over by now.
The solution does need to read in all the data files before any output starts. I would assume the final output phase will be quite quick, so don't get too worried that no output has appeared yet.
What is the total number of index files and the total number of subdirectories?
I can think of another method to solve this problem if those file/subdir counts aren't too massive, but I have some other stuff to do for the next 3 hours or so. I'll start working on it for you then.
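To illustrate why nothing is printed until the very end, here is a minimal sketch of the read-everything-first pattern. It is not the script posted earlier in the thread (that one is not reproduced here); the function name `build_matrix_in_memory`, the `file` header label and the tab-separated layout are assumptions made for illustration. It also assumes the two directories are passed without a trailing slash and that no input file is empty.

```shell
# Hypothetical helper: build_matrix_in_memory INDEX_DIR DATA_DIR
# Reads every file into awk arrays first, then prints the whole matrix in END.
build_matrix_in_memory() {
    awk -v OFS='\t' '
        # On the first line of each file, work out which file and column it is.
        FNR == 1 {
            n = split(FILENAME, part, "/")
            file = part[n]          # e.g. chr1
            col  = part[n - 1]      # parent directory: the index dir or a subfolder
            if (!(file in rows)) files[++nfiles] = file
            if (!(col in seen)) { seen[col] = 1; cols[++ncols] = col }
        }
        # Store every value; nothing is printed while reading.
        { val[file, FNR, col] = $1; rows[file] = FNR }
        END {
            line = "file"           # placeholder label for column 1 (assumption)
            for (c = 1; c <= ncols; c++) line = line OFS cols[c]
            print line
            for (f = 1; f <= nfiles; f++)
                for (r = 1; r <= rows[files[f]]; r++) {
                    line = files[f]
                    for (c = 1; c <= ncols; c++)
                        line = line OFS val[files[f], r, cols[c]]
                    print line
                }
        }
    ' "$1"/* "$2"/*/*
}
```

Called as `build_matrix_in_memory index data > matrix.tsv`, every value sits in the `val` array until `END` runs, so memory use grows with the total size of the input and the first output line only appears after the last file has been read.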
This User Gave Thanks to Chubler_XL For This Post:
There are 13 files in index and 83 subfolders in data with 13 files each.
The size of the files is what is creating this hang time.
Please take your time; it's not a matter of life and death to get this done in the next few hours.
And I can't thank you enough for your help.
As you only have 83 subfolders, gawk should be able to keep all the files open as it works (ulimit -n controls how many files gawk can have open). This will reduce the memory requirements from tens of GB to a few KB, and you should get output pretty much straight away.
Many traditional awks have a limit of 15 open files, and if that is the only awk you have, you're out of luck with this solution.
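A minimal sketch of that idea, assuming the index/data layout described at the top of the thread. This is an illustration, not the script originally posted: the function name `build_matrix_streaming` and the header labels `file` and `index` are made up, and it assumes every subfolder contains a namesake of each index file, so that the header glob and the per-file glob list the subfolders in the same order.

```shell
# Hypothetical helper: build_matrix_streaming INDEX_DIR DATA_DIR
# Keeps the index file and all its namesakes open and reads them in lockstep.
build_matrix_streaming() {
    index_dir=$1
    data_dir=$2

    # Header row: the labels for columns 1 and 2 are placeholders (assumption);
    # the subfolder names start at column 3.
    printf 'file\tindex'
    for sub in "$data_dir"/*/; do
        sub=${sub%/}
        printf '\t%s' "${sub##*/}"
    done
    printf '\n'

    for idx in "$index_dir"/*; do
        name=${idx##*/}
        awk -v name="$name" -v OFS='\t' '
            BEGIN {
                # ARGV[1] is the index file, ARGV[2..ARGC-1] are its namesakes.
                while ((getline v < ARGV[1]) > 0) {
                    line = name OFS v
                    for (i = 2; i < ARGC; i++) {
                        # One line from each open data file per index line.
                        if ((getline d < ARGV[i]) <= 0) d = ""
                        line = line OFS d
                    }
                    print line
                }
            }
        ' "$idx" "$data_dir"/*/"$name"
    done
}
```

Because only one line per open file is held in memory at a time, rows start to appear as soon as the first index file is read. `ulimit -n` needs to be comfortably above the number of subfolders plus one, and the shell glob sorts names lexically, so chr10 is processed before chr2.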
Last edited by Chubler_XL; 12-18-2012 at 05:09 PM..
Reason: Fix indents + remove debug line
The code runs fine with the sample, but with the actual data it just prints the first row, the subfolder names. Let me play around with the code a little bit and try to find out what's happening.