Data processing using awk


 
# 15  
01-14-2013
Try:
Code:
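awk '
  NR>1{
    # change the first slash in each field to a comma
    for(i=2; i<=NF; i++) sub("/",",",$i)
    $0=$0                  # re-split the line so each value is its own field
    for(i=2; i<=NF; i++){
      p=0                  # power of ten for the unit suffix
      if($i~/G/) p=9
      if($i~/M/) p=6
      if($i~/K/) p=3
      sub(/[GMK]/,".",$i)  # the suffix becomes the decimal point: 1M07 -> 1.07
      $i*=10^p             # 1.07 * 10^6 = 1070000
    }
  }
  1                        # print every line; the header is left unchanged
' FS=, OFS=, infile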
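The unit rule in the loop above (the K, M or G suffix becomes the decimal point, then the value is scaled by 10^3, 10^6 or 10^9) can be pulled out into an awk function so the same conversion can be reused. This is a minimal sketch, not the poster's code; rate and rate_table are hypothetical names, and like the original it assumes at most one K, M or G per value:

```shell
#!/bin/sh
# Sketch only: the K/M/G rule from the script above as a reusable awk function.
# "rate" and "rate_table" are hypothetical names, not taken from the thread.
rate_table() {
  awk '
    function rate(s,    p) {      # p is a local variable
      p = 0
      if (s ~ /G/) p = 9
      if (s ~ /M/) p = 6
      if (s ~ /K/) p = 3
      sub(/[GMK]/, ".", s)        # suffix becomes the decimal point: 1M07 -> 1.07
      return s * 10^p             # numeric context: "896." -> 896, then scale
    }
    { print $1, rate($1) }        # original value, converted value
  '
}

# Demo with values taken from the sample data in this thread
printf '%s\n' 1M07 54K3 786 2K1 896K 1M54 | rate_table
```

With these inputs it should print 1070000, 54300, 786, 2100, 896000 and 1540000 beside the original values.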

# 16  
01-14-2013
Thank you. How do I make it so that I can call the script from shell and give it a file as per Don's original code?

Code:
./awkscript infile

# 17  
01-14-2013
Quote:
Originally Posted by shadyuk
Thank you. How do I make it so that I can call the script from shell and give it a file as per Don's original code?

Code:
./awkscript infile

Do the same thing I did in my original code. I also added a line to change the headers for you:
Code:
#!/bin/ksh
if [ $# -ne 1 ] || [ ! -f "$1" ]
then    echo "Usage: $0 file" >&2
        exit 1
fi
awk '
  NR==1{
    $0="timestamp,data1_dl,data1_ul,data2_dl,data2_ul,data3_dl,data3_ul"
  }
  NR>1{
    for(i=2; i<=NF; i++) sub("/",",",$i)
    $0=$0
    for(i=2; i<=NF; i++){
      p=0
      if($i~/G/) p=9
      if($i~/M/) p=6
      if($i~/K/) p=3
      sub(/[GMK]/,".",$i)
      $i*=10^p
    }
  }
  1
' FS=, OFS=, "$1"

# 18  
01-14-2013
Hello again and thanks.

Works just as I'd like it to. I amended one part so that I can define which columns it should translate, as my CSV may have additional columns before and after the bit rate data. Please let me know if there's a more efficient way of doing this.

Code:
#!/bin/ksh

if [ $# -ne 1 ] || [ ! -f "$1" ]
then    echo "Usage: $0 file" >&2
        exit 1
fi
awk '
  NR==1{
    $0="timestamp,data1_dl,data1_ul,data2_dl,data2_ul,data3_dl,data3_ul"
  }
  NR>1{
    for(i = 5; i <= 10; i++) sub("/",",",$i)
    $0=$0
    for(i = 5; i <= 10; i++){
      p=0
      if($i~/G/) p=9
      if($i~/M/) p=6
      if($i~/K/) p=3
      sub(/[GMK]/,".",$i)
      $i*=10^p
    }
  }
  1
' FS=, OFS=, "$1"

Thanks again everyone.
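In the two-loop approach, the first loop splits fields, which renumbers everything after them, so the second loop's range no longer lines up with the original columns. One way to avoid that is to build the output line field by field in a single pass, so the original field numbers never move. This is a minimal sketch, not the thread's code; convert_range, first and last are hypothetical names, column 1 is assumed to be the timestamp, and every field in the range is assumed to hold slash-separated rates:

```shell
#!/bin/sh
# Sketch only: convert the rate fields in columns first..last in one pass.
# convert_range, first and last are hypothetical names. Reads stdin.
convert_range() {
  awk -v first="$1" -v last="$2" -F, -v OFS=, '
    function rate(s,    p) {
      p = 0
      if (s ~ /G/) p = 9
      if (s ~ /M/) p = 6
      if (s ~ /K/) p = 3
      sub(/[GMK]/, ".", s)              # 1M07 -> 1.07
      return s * 10^p                   # 1.07 -> 1070000
    }
    BEGIN { first += 0; last += 0 }     # force numeric comparisons
    NR > 1 {
      out = $1                          # column 1 (timestamp) is copied as-is
      for (i = 2; i <= NF; i++) {
        if (i >= first && i <= last) {
          n = split($i, part, "/")      # 1M07/54K3 -> 1M07 and 54K3
          for (j = 1; j <= n; j++) out = out OFS rate(part[j])
        } else
          out = out OFS $i              # column outside the range: unchanged
      }
      $0 = out
    }
    1                                   # print header and converted lines
  '
}

# Demo: convert columns 3 and 4 only; column 6 is left untouched
printf '%s\n' 'timestamp,test1,data1,data2,test3,data3' \
  '08/01/13 16:31:00,test,1M07/54K3,786/2K1,test,896K/1M54' | convert_range 3 4
```

For the columns in the script above, the call would be convert_range 5 10 < infile. Because the field numbers are never renumbered, the range stays the same no matter how many values each field splits into.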

---------- Post updated at 10:34 AM ---------- Previous update was at 09:48 AM ----------

One more thing...

What if the bit rate data is not in sequential columns? How can I amend the script so that I tell it which columns to translate? For example:

Code:
timestamp,test1,data1,test2,data2,test3,data3
08/01/13 16:31:00,test,1M07/54K3,test,786/2K1,test,896K/1M54
08/01/13 16:31:10,test,1K23/432,test,2K44/76K,test,4M34/29K1

Note that the positions of the bit rate columns don't always follow a pattern, as in the following example:

Code:
timestamp,test1,data1,data2,test3,data3
08/01/13 16:31:00,test,1M07/54K3,786/2K1,test,896K/1M54
08/01/13 16:31:10,test,1K23/432,2K44/76K,test,4M34/29K1

I understand that I'll have to modify the script before use but it will at least be flexible.

Thanks.
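For non-sequential columns, one option is to pass the column numbers as a comma-separated list with awk -v and turn them into a lookup table in a BEGIN block, so only the argument changes between files. This is a minimal sketch; convert_cols and cols are hypothetical names, and the column numbers refer to the input file before any splitting:

```shell
#!/bin/sh
# Sketch only: convert the slash-separated rates in the listed columns.
# convert_cols and cols are hypothetical names. Reads stdin.
convert_cols() {
  awk -v cols="$1" -F, -v OFS=, '
    function rate(s,    p) {
      p = 0
      if (s ~ /G/) p = 9
      if (s ~ /M/) p = 6
      if (s ~ /K/) p = 3
      sub(/[GMK]/, ".", s)
      return s * 10^p
    }
    BEGIN {
      n = split(cols, c, ",")
      for (k = 1; k <= n; k++) want[c[k] + 0] = 1   # "3,5,7" -> want[3], want[5], want[7]
    }
    NR > 1 {
      out = $1
      for (i = 2; i <= NF; i++) {
        if (i in want) {                 # listed column: split and convert
          m = split($i, part, "/")
          for (j = 1; j <= m; j++) out = out OFS rate(part[j])
        } else
          out = out OFS $i
      }
      $0 = out
    }
    1
  '
}

# Demo with the first layout from the post above
printf '%s\n' 'timestamp,test1,data1,test2,data2,test3,data3' \
  '08/01/13 16:31:00,test,1M07/54K3,test,786/2K1,test,896K/1M54' \
  '08/01/13 16:31:10,test,1K23/432,test,2K44/76K,test,4M34/29K1' | convert_cols 3,5,7
```

For the second layout in the post above, the call would be convert_cols 3,4,6.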
# 19  
01-14-2013
Quote:
Originally Posted by shadyuk
Hello again and thanks.

Works just as I'd like it to. I amended one part so that I can define which columns it should translate, as my CSV may have additional columns before and after the bit rate data. Please let me know if there's a more efficient way of doing this.

Thanks again everyone.

Obviously, the output heading I set up is now incorrect, since the sample input you gave us didn't match your actual input.

Assuming that, instead of fields 2, 3, and 4 containing slash-separated values that you want to convert, you now have fields 5, 6, 7, 8, 9, and 10 containing slash-separated values you want to convert, and that each of these fields contains a single slash character, the second for loop needs to be:
Code:
    for(i = 5; i <= 16; i++){

instead of:
Code:
    for(i = 5; i <= 10; i++){

since the first loop adds 6 new fields to the line. If some of these fields don't contain a slash, or some contain more than one, the end point of the loop is 10 plus the total number of slashes in fields 5 through 10 (and the first loop then needs gsub() instead of sub(), because sub() only replaces the first slash in each field). If the number of slashes in fields 5 through 10 varies from line to line, additional logic is needed to determine the end point.
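That additional logic can be a simple count: gsub() returns the number of substitutions it made, so summing its return value over the original columns gives exactly how many fields the re-split adds. This is a minimal sketch of the idea, not code from the thread; convert_span, first and last are hypothetical names, and it uses gsub() rather than sub() so that every slash becomes a field boundary:

```shell
#!/bin/sh
# Sketch only: the two-loop approach with the end point computed per line.
# convert_span, first and last are hypothetical names. Reads stdin.
convert_span() {
  awk -v first="$1" -v last="$2" -F, -v OFS=, '
    NR > 1 {
      stop = (last + 0 < NF) ? last + 0 : NF   # never touch fields past NF
      added = 0
      # fields are not renumbered until $0 is re-split, so i still
      # refers to the original columns inside this loop
      for (i = first; i <= stop; i++) added += gsub("/", ",", $i)
      $0 = $0                                  # re-split on the new commas
      upto = stop + added                      # last field holding a rate
      for (i = first; i <= upto; i++) {
        p = 0
        if ($i ~ /G/) p = 9
        if ($i ~ /M/) p = 6
        if ($i ~ /K/) p = 3
        sub(/[GMK]/, ".", $i)
        $i *= 10^p
      }
    }
    1
  '
}

# Demo: field 3 has one or two slashes, field 4 has none
printf '%s\n' 'timestamp,test1,data1,data2,test3' \
  'ts,test,1M07/54K3,786,test' \
  'ts,test,1K23/432/2K1,76K,test' | convert_span 3 4
```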

If you could set up your input so the fields that need to be modified are all at the end of the line, going back to using NF as the end point will be easier.

This would have been easier on all of us if you had given us a representative sample of what you wanted done originally instead of changing requirements every time we give you something that does what you requested!
# 20  
01-14-2013
Thanks, Don, and noted for next time. It was an evolving thing and I only noticed the limitations in my requests as we trudged along.

Apologies for wasting anyone's time.
# 21  
01-14-2013
One way might be:

Code:
#!/bin/ksh
if [ $# -ne 1 ] || [ ! -f "$1" ]
then    echo "Usage: $0 file" >&2
        exit 1
fi
awk '
  NR==1{
    $0="Some header"
  }
  NR>1{
    for(i=2; i<=NF; i++) if($i~/^[KMG0-9]+\/[KMG0-9]+$/) sub("/",",",$i)
    $0=$0
    for(i=2; i<=NF; i++) if($i~/^[KMG0-9]+$/){
      p=0
      if($i~/G/) p=9
      if($i~/M/) p=6
      if($i~/K/) p=3
      sub(/[GMK]/,".",$i)
      $i*=10^p
    }
  }
  1
' FS=, OFS=, "$1"


Last edited by Scrutinizer; 01-14-2013 at 11:54 AM..
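Both the pattern test above and a hard-coded header assume something about the layout. If the bit rate columns are always named data1, data2, ... in the input header (as in the samples earlier in the thread), the header itself can select the columns and produce matching _dl/_ul names. This is a minimal sketch under that naming assumption; convert_by_header is a hypothetical name:

```shell
#!/bin/sh
# Sketch only: choose the rate columns from the header line.
# Assumes rate columns are named data1, data2, ... (hypothetical convention).
# convert_by_header is a hypothetical name. Reads stdin.
convert_by_header() {
  awk -F, -v OFS=, '
    function rate(s,    p) {
      p = 0
      if (s ~ /G/) p = 9
      if (s ~ /M/) p = 6
      if (s ~ /K/) p = 3
      sub(/[GMK]/, ".", s)
      return s * 10^p
    }
    NR == 1 {                            # header: remember and rename rate columns
      out = $1
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^data[0-9]+$/) {
          want[i] = 1
          out = out OFS $i "_dl" OFS $i "_ul"
        } else
          out = out OFS $i
      }
      $0 = out
    }
    NR > 1 {                             # data: convert only the remembered columns
      out = $1
      for (i = 2; i <= NF; i++) {
        if (i in want) {
          n = split($i, part, "/")
          for (j = 1; j <= n; j++) out = out OFS rate(part[j])
        } else
          out = out OFS $i
      }
      $0 = out
    }
    1
  '
}

# Demo with the second layout posted earlier
printf '%s\n' 'timestamp,test1,data1,data2,test3,data3' \
  '08/01/13 16:31:00,test,1M07/54K3,786/2K1,test,896K/1M54' | convert_by_header
```

Because the header and the data rows use the same column selection, the output header always lines up with the converted fields.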