Data processing using awk


 
# 1  
01-13-2013

Hello,

I have some bitrate data in a CSV file whose odd format makes it difficult to process in Excel when there are thousands of rows. So I was thinking of doing this in bash, with awk doing the main work, but because the conversion is fiddly I'm a little stuck.

Code:
Data    Output     Calculation
968     968        no K or M, so leave unchanged
31K7    31700      31*K + 7*100
69K5    69500      69*K + 5*100
1M07    1070000    1*M + 7*10,000
842K    842000     842*K
5M99    5990000    5*M + 99*10,000

Note: K = 1000, M = 1,000,000

As you can see, it's a bit of a nuisance. The file only has the Data column; what I need the script to do is perform the calculations and print the Output column. Any help would be much appreciated.
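Reading the table this way, the K or M behaves like a decimal point combined with a scale factor (31K7 is 31.7 K, 1M07 is 1.07 M). As a sketch of the general rule, assuming every value is a whole part I, then K or M, then an optional run of d digits F (true of every row above):

$$
\text{value} = \left(I + \frac{F}{10^{d}}\right) S = I\,S + \frac{F\,S}{10^{d}},
\qquad
S = \begin{cases} 10^{3} & \text{for K} \\ 10^{6} & \text{for M} \end{cases}
$$

For example, 1M07 has I = 1, F = 7, d = 2, giving 1 × 10^6 + 7 × 10^6 / 10^2 = 1,070,000, and 842K has no trailing digits (F = 0), giving 842 × 10^3 = 842,000.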

Thanks

# 2  
01-13-2013
How about something like this:

Code:
sed -E 's/_+/\t/g' infile > outfile

The result would be a tab-delimited file.
# 3  
01-13-2013
Sorry, maybe I didn't make it clear. The file only has the Data column. What I need the script to do is perform the calculations and print the Output column.

Thanks.
# 4  
01-13-2013
This should work for values with 1, 2, or 3 digits following the K or M:
Code:
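awk 'BEGIN {
        # f[n]: weight of the digits after K/M when there are n of them
        f[1] = 100
        f[2] = 10
        f[3] = 1
}
$1 ~ /[KM]/ {
        split($1, a, /[KM]/)            # a[1] = digits before the letter, a[2] = digits after
        m = $1 ~ /M/                    # remember whether the value is in M
        $1 = a[1] * 1000 + a[2] * f[length(a[2])]   # value as if the letter were K
        if(m) $1 *= 1000                # an M value is 1000 times a K value
}
1' data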

If you are using a Solaris system, use /usr/xpg4/bin/awk or nawk instead of awk.
Posted by Don Cragun.
# 5  
01-13-2013
Try this:

Code:
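awk '
  # each rule edits the whole line, so this expects one value per line
  /[KM]$/{gsub(/$/,"0")}               # 842K -> 842K0: supply a digit when none follows K/M
  /K/{gsub(/K/,"");gsub(/$/,"00")}     # drop K, append 00 (assumes at most 1 digit after K)
  /M/{gsub(/M/,"");gsub(/$/,"0000")}   # drop M, append 0000 (assumes exactly 2 digits after M)
  1' infile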

Posted by Chubler_XL.
# 6  
01-13-2013
Both very clever indeed. The output is correct if there's only one column. However, I need a little more help. If the CSV is as follows, how do I convert the bitrate columns (data1, data2, data3) without having to parse the file first?

Code:
timestamp,data1,data2,data3
08/01/13 16:31:00,1M07,786,896K
08/01/13 16:31:10,1K23,2K44,4M34
08/01/13 16:31:20,345,2M84,437K

Don - Your method only converts values in the first column.
Chubler - Your method produces the wrong values:

Code:
timestamp,wan1,wan2,wan3
08/01/13 16:31:00,107,786,8960000000
08/01/13 16:31:10,123,244,434000000
08/01/13 16:31:20,345,284,4370000000

Additionally, I'd like to be able to call the script with an input file from the shell prompt rather than hard-coding the file name in the script. For example:

Code:
$ awkscript infile

Thanks again.

# 7  
01-13-2013
Try this:
Code:
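#!/bin/awk -f
# expkm: expand one value; the digits after K/M are appended as text,
# so it is only correct for at most one digit after K and exactly two after M
# (31K7, 842K, 1M07)
function expkm(a) {
  if(split(a,v,"K")>1) return v[1] (v[2]?v[2]:"0") "00"
  if(split(a,v,"M")>1) return v[1] (v[2]?v[2]:"0") "0000"
  return a
}
BEGIN {FS=OFS=","}                   # comma-separated input and output
{for(i=2;i<=NF;i++) $i=expkm($i)}    # leave field 1 (the timestamp) alone
1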

Output:
Code:
$ ./awkscript infile
timestamp,wan1,wan2,wan3
08/01/13 16:31:00,1070000,786,896000
08/01/13 16:31:10,12300,24400,4340000
08/01/13 16:31:20,345,2840000,437000
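Note that expkm above appends the trailing digits as text, so it is only right when K is followed by at most one digit and M by exactly two: in the output, 1K23 and 2K44 come out as 12300 and 24400 rather than 1230 and 2440 (1.23 × 1000 and 2.44 × 1000), and a bare 5M would become 50000. A minimal sketch of a fix, assuming (as the table in the first post suggests) that K or M simply marks the decimal point, scales the digits after the letter by 10^(number of digits) instead of concatenating them. It is wrapped in a shell function with the hypothetical name expand_rates so it can be given a file name from the prompt:

```shell
# expand_rates: hypothetical name; converts K/M bitrates in fields 2..NF of a CSV.
# Assumption: K or M marks the decimal point (1K23 = 1.23 * 1000 = 1230).
expand_rates() {
    awk '
    function expkm(s,    parts, scale) {
        if (s ~ /^[0-9]+K[0-9]*$/)      scale = 1000
        else if (s ~ /^[0-9]+M[0-9]*$/) scale = 1000000
        else return s                   # plain number or header text: unchanged
        split(s, parts, "[KM]")         # parts[1] = before the letter, parts[2] = after
        # whole part times scale, plus trailing digits divided by 10^(digit count);
        # %.0f keeps the result a plain whole number (no exponent form, fractions rounded)
        return sprintf("%.0f", parts[1] * scale + parts[2] * scale / 10 ^ length(parts[2]))
    }
    BEGIN { FS = OFS = "," }
    { for (i = 2; i <= NF; i++) $i = expkm($i) }   # field 1 is the timestamp
    1' "$@"
}
```

With the function defined, `expand_rates infile > outfile` converts the file (with no file name it reads standard input). Because the loop starts at field 2, a single-column file like the one in the first post would need the loop to start at 1.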

Posted by Chubler_XL.