Normalization using awk


 
# 1  
Old 05-31-2011
Normalization using awk

Hi

I have a file with the following contents:



Code:
chr22_190_200    XXY    0     0
chr22_201_210    XXY    0     30
chr22_211_220    XXY    3     0
chr22_221_230    XXY    0     0
chr22_231_240    XXY    5     0
chr22_241_250    ABC    0     0
chr22_251_260    ABC    22    11
chr22_261_270    ABC    20    0
chr22_271_280    ABC    0     0

I want to normalize the counts per gene by computing a constant. For each gene (for instance XXY), I want to sum the counts in column 3 and in column 4 separately, then divide the larger sum by the smaller one to get the constant.

For example, here are only the rows for gene XXY from the file above:

Code:
chr22_190_200    XXY    0    0
chr22_201_210    XXY    0    30
chr22_211_220    XXY    3    0
chr22_221_230    XXY    0    0
chr22_231_240    XXY    5    0

The sum of column 3 is 8 and the sum of column 4 is 30.

Since the column 4 sum is higher than the column 3 sum, the constant (c) is 30/8 = 3.75 (roughly 3.7).

I can do this in Excel for each gene, but my file has 348000 genes, so I want to script it.

The output should contain all the columns above, with the constant added as column 5.

Expected output:
Code:
chr22_190_200    XXY    0    0     3.7
chr22_201_210    XXY    0    30    3.7

Thanks,

Diya
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by Diya123; 05-31-2011 at 04:29 PM.. Reason: once again - please use code tags!
# 2  
Old 05-31-2011
Code:
nawk 'BEGIN{ ARGV[ARGC++] = ARGV[1] } FNR==NR {f3[$2]+=$3; f4[$2]+=$4;next}{print $0, (f3[$2]>f4[$2])?f3[$2]/f4[$2]:f4[$2]/f3[$2]}' myFile

or a bit shorter:
Code:
nawk 'BEGIN{ ARGV[ARGC++] = ARGV[1] } FNR==NR {f3[$2]+=$3; f4[$2]+=$4;next}{div=f4[$2]/f3[$2];print $0, (f3[$2]>f4[$2])?1/div:div}' myFile


Last edited by vgersh99; 05-31-2011 at 04:40 PM..
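A note on the one-liners above: `ARGV[ARGC++] = ARGV[1]` appends the file name to the argument list a second time, so awk reads the file twice (the first pass builds the per-gene sums, the second pass prints each row with the ratio). Both versions divide by a per-gene sum, so a gene whose smaller column sum is 0 will typically trigger a division-by-zero error. Below is a minimal sketch of a guarded variant. It assumes plain `awk` is available and that printing `NA` for such genes is acceptable; the wrapper name `normalize` is hypothetical and not from the thread.

```shell
# Hypothetical wrapper: two-pass normalization over a single file.
# Assumption: "NA" is an acceptable placeholder when the smaller sum is 0.
normalize() {
  awk '
    # Pass 1 (first read of the file): per-gene sums of columns 3 and 4.
    NR == FNR { s3[$2] += $3; s4[$2] += $4; next }

    # Pass 2 (second read): append larger sum / smaller sum to each row.
    {
      hi = (s3[$2] > s4[$2]) ? s3[$2] : s4[$2]
      lo = (s3[$2] > s4[$2]) ? s4[$2] : s3[$2]
      print $0, (lo == 0 ? "NA" : hi / lo)
    }
  ' "$1" "$1"   # same file passed twice instead of editing ARGV
}

# Small demo using the XXY rows from the first post.
demo=$(mktemp)
printf '%s\n' \
  'chr22_190_200 XXY 0 0' \
  'chr22_201_210 XXY 0 30' \
  'chr22_211_220 XXY 3 0' \
  'chr22_221_230 XXY 0 0' \
  'chr22_231_240 XXY 5 0' > "$demo"
normalize "$demo"
rm -f "$demo"
```

Passing the file name twice on the command line does the same job as the `ARGV` trick; which one to use is a matter of taste.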
# 3  
Old 05-31-2011
Thanks a lot for the quick response.

When I tried it with my original file it didn't work, although it worked with the example file I posted.

The only difference is that column 2 has names with hyphens and underscores. Do you think that makes a difference?

Thanks,

Diya
# 4  
Old 05-31-2011
In what way did it "not work"?
This User Gave Thanks to Corona688 For This Post:
# 5  
Old 05-31-2011
Quote:
Originally Posted by Diya123
Thanks a lot for the quick response.

When I tried it with my original file it didn't work, although it worked with the example file I posted.

The only difference is that column 2 has names with hyphens and underscores. Do you think that makes a difference?

Thanks,

Diya
Repost the portion of the real file that "didn't work", and please use code tags when doing so.
# 6  
Old 05-31-2011
Thank you so much.

It worked. I had some issues on my end.
# 7  
Old 06-02-2011
normalization using awk

In my example above, some of the symbol names in column 2 look like XXY_abc. When I run the code, awk treats XXY, XXY_abc and XXY_abc_XXY_bcd as different genes, but they are the same gene (they all start with XXY).

How can I tell awk to group each gene by the first part of its name? For instance, if it sees XXY or XXY_abc, it should treat both as the same gene and normalize the counts together.

Thanks,

Diya
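One possible approach to the grouping question, sketched under assumptions the thread does not confirm: the gene is taken to be everything in column 2 before the first underscore (so XXY, XXY_abc and XXY_abc_XXY_bcd all map to XXY), and `NA` is printed when the smaller sum is 0. The names `normalize_by_prefix` and `gene` are hypothetical. If hyphens also separate the gene from a suffix in the real data, the separator would need adjusting.

```shell
# Hypothetical wrapper: like the two-pass approach in post #2, but the sums
# are keyed on the part of column 2 before the first underscore.
normalize_by_prefix() {
  awk '
    # Assumption: the gene name is the text before the first "_".
    # "parts" is declared as an extra parameter to keep it local.
    function gene(name,   parts) {
      split(name, parts, "_")
      return parts[1]
    }

    # Pass 1: per-gene sums of columns 3 and 4, keyed on the prefix.
    NR == FNR { k = gene($2); s3[k] += $3; s4[k] += $4; next }

    # Pass 2: print each original row plus the ratio for its gene prefix.
    {
      k  = gene($2)
      hi = (s3[k] > s4[k]) ? s3[k] : s4[k]
      lo = (s3[k] > s4[k]) ? s4[k] : s3[k]
      print $0, (lo == 0 ? "NA" : hi / lo)
    }
  ' "$1" "$1"
}

# Usage (myFile is the input name used earlier in the thread):
#   normalize_by_prefix myFile
```

Column 2 is printed unchanged; only the lookup key is shortened, so the original symbol names are preserved in the output.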