Help making simple perl or bash script to create a simple matrix

# 8  
Old 04-25-2012
Hi neutronscott,
I'm trying to run your script but it seems to be giving me an endless loop.
I tried running:
Code:
$ awk -f code.awk test.txt >output.txt

and
Code:
$ awk code.awk test.txt >output.txt

I've attached the example test.txt file and how I want the output to look.

Thanks again for your time.
# 9  
Old 04-25-2012
Ok. Three things I missed then.

1. I forgot to skip the header, which gave a non-numerical value and really messed up the column loop.
2. My output used spaces rather than tabs
3. Didn't expect quotes around field 2

new code:
Code:
#!/usr/bin/awk -f

BEGIN { FS="\t" }
NR==1 {next}    # skip header

# keep list of unique genes, in order
!($1 in genes_uniq) { genes_uniq[$1]; genes[gene_idx++]=$1; }

{
        # unquote
        gsub(/(^"|"$)/,"",$2)
        split($2, cols, /,/)
        for (col in cols) {
                if (cols[col] > max_col) max_col=cols[col]
                matrix[$1,cols[col]] = matrix[$1,cols[col]] "," $3
        }
}

END {
        # print header
        printf("gene\t")
        for (col = 1; col <= max_col; col++)
                printf("%d%c", col, (col==max_col)?"\n":"\t");

        for (i = 0; i < gene_idx; i++) {
                printf("%s\t", genes[i]);
                for (col = 1; col <= max_col; col++)
                        printf("%s%c", substr(matrix[genes[i],col],2),
                                (col==max_col)?"\n":"\t");
        }
}

You can also run it as awk -f script.awk input >output, or chmod a+x script.awk and simply run ./script.awk input >output

Last edited by neutronscott; 04-25-2012 at 03:43 PM.. Reason: how to invoke
# 10  
Old 04-25-2012
It's very close. It seems to work when there is a single identifier for each gene or sample, but there is a problem with gene c and d. For gene c, two samples (3 and 4) share the same identifier, but it is printing sample 4 incorrectly.

For gene d and sample 5, I was hoping % and * would be printed in the same cell, separated by a comma. It breaks on the second identifier for gene d.

I've attached the output your script gave.

Thanks again.

Output
Code:
 gene 1 2 3 4 5
a     @
 
b       #
 
c         @
      @
 
d             %
,*

# 11  
Old 04-25-2012
Ah. The Windows-formatted input file is adding \r to those identifiers. Add this line somewhere near the top (for example, right after the NR==1 rule):

Code:
{sub(/\r$/,"")}
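
A small sketch of what the stray \r does, assuming a made-up one-row input (crlf_row is a hypothetical helper, not part of the script above). With a CRLF line ending, the last field keeps the carriage return, so the identifier is two characters long instead of one, and the terminal jumps back to the start of the line when it is printed:

```shell
# Hypothetical helper: emit one tab-separated row with a Windows (CRLF) ending.
crlf_row() { printf 'a\t1\t@\r\n'; }

# Without the fix, field 3 is "@" plus a trailing carriage return.
raw_len=$(crlf_row | awk -F'\t' '{ print length($3) }')

# With the fix, the carriage return is stripped before the field is used.
fixed_len=$(crlf_row | awk -F'\t' '{ sub(/\r$/, "") } { print length($3) }')

echo "raw=$raw_len fixed=$fixed_len"
```

Because sub() here modifies $0, awk re-splits the record, so every later rule sees the cleaned fields.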

# 12  
Old 04-25-2012
It works perfectly with the test input I had, but it's reproducing the endless loop with my data file. In my data file, all three of gene, sample, and identifier are strings, not integers. I wonder if that is the issue.

Code:
gene sample identifier
C10orf107 NJATRTSP228B,ATRTSP228,ATRT34 pL132F
C10orf11 ATRT5B,ATRT5 
C10orf111 ATRT2B,ATRT2,ATRT4B,ATRT4,ATRT16B,ATRT16 pR62W
C10orf113 ATRT15B,ATRT15 
C10orf113 ATRT63 pA312T
C10orf12 ATRT33 pL314P
C10orf12 ATRT63 pE396G
C10orf12 ATRT45B,ATRT45 pP988A
C10orf12 ATRT46B,ATRT46 pR191C

# 13  
Old 04-25-2012
Yes. I almost changed that, but give me an hour or so. I left work :)
# 14  
Old 04-25-2012
Hey man, I am in no position to complain about time. You've been an extremely big help, and hopefully I'll be able to pay you back. Take your time; this is a favor!
Cheers