Data processing using awk


 
# 8  
01-13-2013
Awesome, it works, although I now feel like I know absolutely nothing!

Any comments to explain what you've done would be fab. Otherwise, thank you again.
# 9  
01-13-2013
Quote:
Originally Posted by shadyuk
Both very clever indeed. The output is correct if there's only one column. However, I need a little more assistance, please. If the CSV is as follows, how do I get it to convert the bitrate columns (data1, data2, data3) without having to parse the file first?

Don - Your method only outputs if the data is in the first column.
Chubler - Your method outputs the wrong data.

Additionally, I'd like to be able to call the script and give it an input file from the shell prompt rather than hard-coding the file name in the script. For example:

Thanks again.
Sorry. I must have misunderstood what you wanted when you said:
Quote:
Sorry, maybe I didn't make it clear. The file only has the Data column.
in message #3 in this thread.

Anyway, try this:
Code:
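#!/bin/ksh
# Require exactly one argument that names an existing file.
if [ $# -ne 1 ] || [ ! -f "$1" ]
then    echo "Usage: $0 file" >&2
        exit 1
fi
awk 'BEGIN {
        FS = OFS = ","
        # Weight of the digits after K or M, by how many there are:
        # 1 digit = hundreds, 2 digits = tens, 3 digits = units.
        f[1] = 100
        f[2] = 10
        f[3] = 1
}
# Skip the header line. For fields 2-4, split() returns 2 only when
# the field contains exactly one K or M.
FNR>1 { for(i = 2; i <= 4; i++)
                if(split($i, a, /[KM]/) == 2) {
                        m = $i ~ /M/
                        # Whole part times 1000, plus the scaled digits after the letter.
                        $i = a[1] * 1000 + a[2] * f[length(a[2])]
                        # M is a further factor of 1000.
                        if(m) $i *= 1000
                }
}
# Print every line, converted or not.
1' "$1"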

This applies the requested transformations to fields 2, 3, and 4. However, I have no idea why:
Code:
timestamp,data1,data2,data3

in your input file changed to:
Code:
timestamp,wan1,wan2,wan3

I don't know whether you were saying the other transformations were the incorrect values produced by Chubler_XL's script or if those were the values you want to get from the input you gave.

Assuming the data:
Code:
timestamp,data1,data2,data3
08/01/13 16:31:00,1M07,786,896K
08/01/13 16:31:10,1K23,2K44,4M34
08/01/13 16:31:20,345,2M84,437K
13/01/13 18:51:20,1M,1M2,1M34
13/01/13 18:51:21,1M567,9K,9K8
13/01/13 18:51:22,9K76,9K543,1

is in a file named infile, that you save my script above in a file named awkscript, that you adjust the /bin/ksh in the first line of my script to be the absolute pathname of the Korn shell on your system, and that you make awkscript executable by running the command:
Code:
chmod +x awkscript

then the command:
Code:
./awkscript infile

will produce the output:
Code:
timestamp,data1,data2,data3
08/01/13 16:31:00,1070000,786,896000
08/01/13 16:31:10,1230,2440,4340000
08/01/13 16:31:20,345,2840000,437000
13/01/13 18:51:20,1000000,1200000,1340000
13/01/13 18:51:21,1567000,9000,9800
13/01/13 18:51:22,9760,9543,1

Is this what you wanted?

PS Note that for the last three lines of this input file, this script and the script provided by Chubler_XL in message #7 in this thread produce different results. If I understand your input formats, I think this script does what you want.

Last edited by Don Cragun; 01-14-2013 at 07:49 AM.. Reason: Fix typo "imestamp" -> "timestamp"
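To see what the split() call in the script above does with each value, the same arithmetic can be traced one value at a time. This is only a sketch: the shell function name trace is hypothetical, and its awk body copies the conversion from the script rather than reading a CSV file.

```sh
# Hypothetical helper: trace. Applies the conversion from the script above
# to each argument and shows the pieces that split() produced.
trace() {
        printf '%s\n' "$@" | awk 'BEGIN {
                f[1] = 100; f[2] = 10; f[3] = 1   # same scale table as the script
        }
        {
                v = $0
                n = split($0, a, /[KM]/)          # 2 pieces only if K or M is present
                if (n == 2) {
                        v = a[1] * 1000 + a[2] * f[length(a[2])]
                        if ($0 ~ /M/) v *= 1000
                }
                printf "%s n=%d a[1]=%s a[2]=%s -> %d\n", $0, n, a[1], a[2], v
        }'
}
trace 1M07 896K 2K44 9K8 786
```

Running it should print lines such as `1M07 n=2 a[1]=1 a[2]=07 -> 1070000` and `786 n=1 a[1]=786 a[2]= -> 786`, which shows that values without K or M pass through unchanged.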
# 10  
01-14-2013
The argument could also be made that "1M567" should produce an output of "6670000" (1M + 567 * 10,000). If it's even a valid input.
# 11  
01-14-2013
Quote:
Originally Posted by Chubler_XL
The argument could also be made that "1M567" should produce an output of "6670000" (1M + 567 * 10,000). If it's even a valid input.
I will agree that none of the sample input used xMxxx or xKxxx, but we also produce different output for the input 2K44: my script produces 2440 and yours produces 24400. Both of our scripts produce 4340000 for the input 4M34, and I can't believe the intent was for xMxx and xKxx to differ by a factor of 100 instead of 1000 in the way they handle two digits after the K and M multiplier codes. Both of these appear in the lines provided in the sample input (not just in my extended test cases).

We'll have to let shadyuk tell us which one of us made the right assumption for the desired behavior given that the specification didn't cover any of these cases explicitly.
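The f[] table in Don's script (f[k] = 10^(3-k) for k digits after the letter, with k at most 3) is equivalent to reading the letter as a decimal point, which is why xKxx and xMxx differ by exactly 1000 under his reading. A short derivation, where a, d, k and m are labels introduced here for illustration (a = digits before the letter, d = digits after it, m = 1000 for K or 1000000 for M):

```latex
\begin{aligned}
v &= \bigl(a\cdot 10^{3} + d\cdot 10^{3-k}\bigr)\cdot\frac{m}{10^{3}}
   = \Bigl(a + \frac{d}{10^{k}}\Bigr)\, m,
   \qquad m = 10^{3}\ (\mathrm{K}),\ 10^{6}\ (\mathrm{M}) \\
\mathrm{2K44} &= (2 + 0.44)\cdot 10^{3} = 2440 \\
\mathrm{4M34} &= (4 + 0.34)\cdot 10^{6} = 4\,340\,000 \\
\mathrm{1M567} &= (1 + 0.567)\cdot 10^{6} = 1\,567\,000
   \quad\text{vs.}\quad 10^{6} + 567\cdot 10^{4} = 6\,670\,000
\end{aligned}
```

The last line contrasts this reading with the alternative for 1M567 suggested in #10.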
# 12  
01-14-2013
Good spot, Don. The output of Chubler's script is not right. It was very late last night, so I didn't pick this up. The output of your script is correct.

I now have a new problem. The bit rates are combined as follows:

Code:
timestamp,data1,data2,data3
08/01/13 16:31:00,1M07/54K3,786/2K1,896K/1M54
08/01/13 16:31:10,1K23/432,2K44/76K,4M34/29K1

I need to split each field at the '/' and then 'translate' each part individually:

Code:
timestamp,data1_dl,data1_ul,data2_dl,data2_ul,data3_dl,data3_ul
08/01/13 16:31:00,1070000,5430,786,2100,896000,1540000
08/01/13 16:31:10,1230,432,2440,7600,4340000,2910

Thank you!!!

Moderator's comment: Please use code tags instead of quote tags for code and data.

Last edited by Scrutinizer; 01-14-2013 at 08:53 AM.. Reason: quote tags -> code tags
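For the combined dl/ul fields, one approach is to split each field on "/" and run both halves through the same conversion as Don's script. This is a sketch under assumptions, not a posted solution: the function name splitrates is hypothetical, every field after the timestamp is assumed to hold one dl/ul pair, and the header is rebuilt by appending _dl and _ul to each column name.

```sh
# Hypothetical helper: splitrates. Reads CSV from the named file(s) or stdin.
splitrates() {
        awk 'BEGIN {
                FS = OFS = ","
                f[1] = 100; f[2] = 10; f[3] = 1   # same scale table as the script in #9
        }
        # Convert one value such as 1M07, 896K or 786 to a plain number.
        function conv(s,    a, v) {
                if (split(s, a, /[KM]/) != 2) return s
                v = a[1] * 1000 + a[2] * f[length(a[2])]
                return (s ~ /M/) ? v * 1000 : v
        }
        FNR == 1 {                      # header: dataN -> dataN_dl,dataN_ul
                out = $1
                for (i = 2; i <= NF; i++) out = out OFS $i "_dl" OFS $i "_ul"
                print out
                next
        }
        {
                out = $1
                for (i = 2; i <= NF; i++) {
                        split($i, p, "/")       # p[1] = download, p[2] = upload
                        out = out OFS conv(p[1]) OFS conv(p[2])
                }
                print out
        }' "$@"
}
```

Usage: `splitrates infile`. Under this rule, 08/01/13 16:31:00,1M07/54K3,786/2K1,896K/1M54 becomes 08/01/13 16:31:00,1070000,54300,786,2100,896000,1540000. Note that 54K3, 76K and 29K1 come out as 54300, 76000 and 29100 rather than the 5430, 7600 and 2910 shown in the expected output above; if those figures are intended, the conversion rule itself would need to change.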
# 13  
01-14-2013
Why not do this in the spreadsheet itself? Using Gnumeric, I came up with this (I know, Mantissa is not the correct term...):
Code:
Data    Mantissa  Factor   Result   Result
968     968       1        968      968
31K7    31.7      1000     31700    31700
69K5    69.5      1000     69500    69500
1M07    1.07      1000000  1070000  1070000
842K    842.      1000     842000   842000
5M99    5.99      1000000  5990000  5990000
                                    ^--- =if(iserr(find("K",A8)),if(iserr(find("M",A8)),1,1000000),1000)*substitute(substitute(A8,"K","."),"M",".")
                           ^--- =if(iserr(find("K",A8)),if(iserr(find("M",A8)),1,1000000),1000)*B8
                  ^--- =if(iserr(find("K",A8)),if(iserr(find("M",A8)),1,1000000),1000)
        ^--- =substitute(substitute(A8,"K","."),"M",".")

The functions should be Excel-compatible, so give it a shot...

Last edited by RudiC; 01-14-2013 at 07:59 AM..
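The spreadsheet formulas above amount to turning the K or M into a decimal point and multiplying by 1000 or 1000000. The same idea can be sketched in awk; the helper name dotconv is hypothetical, and it assumes at most three digits follow the letter. For such inputs it should agree with the script in #9 (for example, 1M07 becomes 1070000 and 9K543 becomes 9543).

```sh
# Hypothetical helper: dotconv. Mirrors the spreadsheet formulas:
# replace K or M with a decimal point, then scale by 1000 or 1000000.
# Assumes at most three digits follow the letter. Reads a file or stdin.
dotconv() {
        awk 'BEGIN { FS = OFS = "," }
        FNR > 1 {
                for (i = 2; i <= NF; i++) {
                        if ($i !~ /[KM]/) continue      # plain numbers stay as they are
                        mult = ($i ~ /M/) ? 1000000 : 1000
                        v = $i
                        sub(/[KM]/, ".", v)             # 1M07 -> 1.07, 842K -> 842.
                        # Round to the nearest integer so binary floating-point
                        # error from the decimal step cannot reach the output.
                        $i = sprintf("%.0f", v * mult)
                }
        }
        1' "$@"
}
```

Usage: `dotconv infile`. The rounding step is the main design choice here: unlike the integer arithmetic in #9, the decimal-point trick goes through fractional values such as 1.07, so the result is formatted explicitly instead of relying on awk's default number-to-string conversion.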
# 14  
01-14-2013
I was originally using Excel to do this, but it constantly freezes when my worksheet contains tens of thousands of rows, and it becomes impossible to work with. Awk is far more efficient. I can then graph what I need from the script output.

Thanks for the advice, though.