AWK Data Cleaning


 
# 8  
Old 10-03-2010
Your code is a bit confusing to me (leading curly braces before the if()s for instance). I'd write it this way based on your sample data and what I think you want:

Code:
#!/usr/bin/env ksh

awk '
        NR == 1 {
                printf( "%5s %4s %10s\n", "TRIAL", "MS", "LPUPSIZE" );  # generate our header
                next;
        }

        $9 == 1 || $10 == 1 { next; }   # skip the record when either the blink or the saccade flag is 1

        {
                printf( "%5d %4d %10s\n",  $4, NR-1, $8 );      # NR-1 is the record count after the header, i.e. milliseconds since start
        }
' <data-file

Using the bit of data you provided, the output is:

Code:
TRIAL   MS   LPUPSIZE
    1    1    1872.00
    1    2    1874.00
    1    3    1873.00
    1    4    1873.00
    1    5    1873.00
    1   30    1877.00
    1   31    1878.00
    1   32    1879.00
    1   33    1881.00

I've used the record number as the 'counter' for milliseconds, adjusted by one to account for the header record. The count stays consistent even when records with a 1 in either of the last two columns are skipped, and since awk maintains NR itself there is no need for us to keep a counter of our own.
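To make the counter behaviour concrete, here is a minimal sketch on made-up data. The three-column layout (trial, pupil size, blink flag) and the function names are assumptions for illustration only; the real file has more columns.

```shell
#!/bin/sh
# Hypothetical sample: header plus five samples, two of them flagged as blinks.
sample_data() {
    printf '%s\n' \
        'TRIAL SIZE BLINK' \
        '1 1872.00 0' \
        '1 1874.00 0' \
        '1 0.00 1' \
        '1 0.00 1' \
        '1 1877.00 0'
}

# NR counts every record awk reads, so NR-1 is the sample's position after
# the header. Skipping a record with "next" does not hold the count back.
ms_with_gaps() {
    sample_data | awk '
        NR == 1 { next }        # header line
        $3 == 1 { next }        # blink sample: dropped, but NR has already advanced
        { print NR - 1, $2 }    # millisecond index, pupil size
    '
}

ms_with_gaps
```

The two flagged samples are dropped, and the last row is still numbered 5 rather than 3, which is the gap in the millisecond column that the output above shows between 5 and 30.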

I've also used the regular field separator (whitespace). I personally have never liked depending on the tab character for separation. This does shift the values of the last two fields a bit, but shouldn't be a problem.
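One way the field numbers can differ between the two separators is a tab-delimited field that itself contains a space. Whether the real data has such a field is an assumption on my part; the record below is invented purely to show the effect.

```shell
#!/bin/sh
# Hypothetical tab-separated record whose second field contains a space.
tab_line() {
    printf 'S01\tleft eye\t1872.00\t0\n'
}

# Split on tabs only: "left eye" stays one field.
count_fields_tab() {
    tab_line | awk -F'\t' '{ print NF }'
}

# Default separator (any run of blanks or tabs): "left eye" becomes two fields,
# so every later field moves up by one position.
count_fields_space() {
    tab_line | awk '{ print NF }'
}

count_fields_tab
count_fields_space
```

With tab splitting the record has 4 fields and the pupil size is $3; with the default separator it has 5 fields and the same value is $4. If the field numbers in a script look off by one, printing NF for a few records is a quick check.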
# 9  
Old 10-04-2010
Thank you for your help! I adjusted the code slightly, but for some reason I am not getting a lapse in the counter due to blinks and saccades.
# 10  
Old 10-04-2010
Quote:
Originally Posted by carmar87
Thank you for your help! I adjusted the code slightly, but for some reason I am not getting a lapse in the counter due to blinks and saccades.
Can you post what you are using -- might be able to see what is wrong.
# 11  
Old 10-04-2010
This is the way I did it:

Code:
BEGIN {
FS="\t";RS="\n";
}
{if ( NR == 1) { printf "%s\t%s\t%s\n", "Trial", "MS", "LPUPILSIZE"}}
{if ($8 == 1 ||  $9  == 1) {not; } { printf "%s\t%s\t%s\n", $3,NR, $7;
}
}


Last edited by radoulov; 10-05-2010 at 05:19 AM.. Reason: Code tags, please!
# 12  
Old 10-05-2010
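I shortened your code a bit. This piece of code {not; } does not skip anything: awk reads not as an uninitialized variable, so the statement is a no-op, and the printf block after it runs for every record. I interpreted it to mean that the line should not be printed.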
Code:
awk -F"\t" 'NR==1{printf "%s\t\t%s\t%s\n", "Trial", "MS", "LPUPILSIZE";next} 
            NF{if($8!=1 && $9!=1)printf "%s\t%s\t%s\n",$3,++i,$7}' infile

Perhaps this is more what you are trying to achieve? Are the ms correct?
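For reference, here is a small sketch of why the original script printed every row. The two-column data (pupil size, blink flag) and the function names are made up for the demonstration. As far as I can tell, awk accepts not; as a reference to an unset variable, and the second brace group is a separate statement rather than an else branch.

```shell
#!/bin/sh
# Hypothetical rows: pupil size, blink flag (tab-separated).
demo_rows() {
    printf '1872\t0\n0\t1\n1877\t0\n'
}

# Same shape as the posted script: the block after the if is not an "else",
# so it runs for every record, flagged or not.
as_posted() {
    demo_rows | awk -F'\t' '{ if ($2 == 1) { not; } { print $1 } }'
}

# Skipping flagged records explicitly with "next".
with_next() {
    demo_rows | awk -F'\t' '$2 == 1 { next } { print $1 }'
}

as_posted
with_next
```

The first function prints all three sizes, including the flagged 0; the second prints only 1872 and 1877.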

Last edited by Scrutinizer; 10-05-2010 at 04:31 AM..
# 13  
Old 10-05-2010
For some reason it is printing every millisecond of the data set, including the blinks and saccades. The code listed should remove the rows with 1's from infile, but it does not.
# 14  
Old 10-05-2010
Oops, I posted the wrong solution. Here is the correct one:
Code:
awk -F"\t" 'NR==1{printf "%s\t\t%s\t%s\n", "Trial", "MS", "LPUPILSIZE";next} 
            NF{i++; if($8!=1 && $9!=1)printf "%s\t%s\t%s\n",$3,i,$7}' infile
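The only difference between the two versions is where the counter is incremented. A minimal side-by-side sketch, using invented two-column data (pupil size, blink flag) and hypothetical function names:

```shell
#!/bin/sh
# Hypothetical input: header, then four samples, the third flagged as a blink.
flag_data() {
    printf 'size\tblink\n1872\t0\n1874\t0\n0\t1\n1877\t0\n'
}

# Counter bumped only when a row is printed: the numbering stays contiguous.
count_printed() {
    flag_data | awk -F'\t' 'NR==1{next} NF{ if ($2 != 1) print ++i, $1 }'
}

# Counter bumped for every non-empty row: the dropped row leaves a gap.
count_all() {
    flag_data | awk -F'\t' 'NR==1{next} NF{ i++; if ($2 != 1) print i, $1 }'
}

count_printed
count_all
```

The first variant numbers the surviving rows 1, 2, 3; the second numbers them 1, 2, 4, so the millisecond column keeps the lapse where the blink was removed.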

 