Normalize a dataset with AWK


 
# 1  
Old 02-24-2009

Hello everyone,
I have to normalize this dataset (with 20,000 rows):

2,4,4,3,2,7,8,2,9,11,7,7,1,8,5,6
4,7,5,5,5,5,9,6,4,8,7,9,2,9,7,10
7,10,8,7,4,8,8,5,10,11,2,8,2,5,5,10
4,9,5,7,4,7,7,13,1,7,6,8,3,8,0,8,8
6,7,8,5,4,7,6,3,7,10,7,9,3,8,3,7,8

into this form: value=($1*mean)/standard_deviation, but I can't figure out how to do it.

I wrote this script to calculate the mean and standard deviation.
Code:
BEGIN{
FS=","
}
{
for(i=1;i<NF;i++)
    {
        total[i]+=$i;
        totalSquared[i]+=$i^2;
    }
numberColumn=NF;
}
END{
for (i=1;i<numberColumn;i++)
    {
        media=total[i]/NR;
        printf("%.2f|%.2f\n",media,sqrt((totalSquared[i]/NR)-media^2));
    }
}

Can anyone help me figure this out?
# 2  
Old 03-05-2009
I don't know what you mean by "normalize". Do you mean to put every column in terms of number of deviations from the mean? Would you have one output row, or one row for each row of input?

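Your code is pretty good, but has a few bugs: both loops test i<NF (and i<numberColumn), so they stop one column short and never process the last field. They should use <= instead.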
Code:
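BEGIN{ FS="," }
{
    # Accumulate per-column sums; <= NF includes the last field.
    for(i=1;i <=  NF;i++)
    {
        total[i]+=$i;
        totalSquared[i]+=$i^2;
    }
    numberColumn=NF;   # column count of the most recent row
}
END{
    # One output line per column: mean|population standard deviation
    for (i=1;i <= numberColumn;i++)
    {
        media=total[i]/NR;
        printf("%.2f|%.2f\n",media,sqrt((totalSquared[i]/NR)-media^2));
    }
}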
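
If by "normalize" you mean one output row per input row, with every value expressed as a number of standard deviations from its column mean, something like the following might work. It is only a sketch, and it rests on a few assumptions: that the intended formula is (value - mean)/standard_deviation rather than the ($1*mean)/standard_deviation from the first post, that reading the file twice is acceptable, and that columns with zero deviation should print 0. The function name normalize_csv and the file names are made up for illustration.

```shell
# Sketch: z-score each column, assuming "normalize" means (value - mean) / sd.
# normalize_csv is a hypothetical helper name, not part of any tool.
normalize_csv() {
    # The same file is passed twice: pass 1 gathers sums, pass 2 prints z-scores.
    awk -F, '
    NR == FNR {                          # pass 1: first read of the file
        for (i = 1; i <= NF; i++) {
            n[i]++                       # per-column count (rows may differ in width)
            total[i] += $i
            totalSquared[i] += $i * $i
        }
        next
    }
    FNR == 1 {                           # start of pass 2: finalize mean and sd
        for (i in n) {
            mean[i] = total[i] / n[i]
            variance = totalSquared[i] / n[i] - mean[i] * mean[i]
            sd[i] = (variance > 0) ? sqrt(variance) : 0
        }
    }
    {                                    # pass 2: emit one normalized row per input row
        for (i = 1; i <= NF; i++) {
            z = (sd[i] > 0) ? (($i - mean[i]) / sd[i]) : 0
            printf("%s%.2f", (i > 1 ? "," : ""), z)
        }
        printf("\n")
    }' "$1" "$1"
}
```

You would call it as normalize_csv data.csv > normalized.csv, where both file names are placeholders. Counting rows per column (n[i]) instead of dividing by NR is a deliberate choice here, since some of the sample rows have 17 fields and others 16.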

