Sed or awk script to remove text / or perform calculations from large CSV files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sed or awk script to remove text / or perform calculations from large CSV files
# 1  
Old 06-17-2009
Question Sed or awk script to remove text / or perform calculations from large CSV files

I have a large CSV files (e.g. 2 million records) and am hoping to do one of two things. I have been trying to use awk and sed but am a newbie and can't figure out how to get it to work. Any help you could offer would be greatly appreciated - I'm stuck trying to remove the colon and wildcards in sed, and the average sample I've found using awk is giving me values of around 4e08.

The CSV file looks like this:

Code:
Date,AIRCOMPRESSOR\FLARE_FLOW,AIRCOMPRESSOR\FLARE_TEMP
3/1/2008,1044.83215332,1090.88208008
3/1/2008 12:00:10 AM,1044.83215332,1090.88208008
3/1/2008 12:00:21 AM,1046.71142578,1090.88208008
3/1/2008 12:00:31 AM,1044.83215332,1090.88208008
3/1/2008 12:00:41 AM,1048.59057617,1083.96069336
3/1/2008 12:00:51 AM,1044.83215332,1083.96069336

I am hoping to either use sed or another script to remove the seconds portion of the data lines (i.e. remove ":10 AM" and all similar occurrences, or preferably to use awk to average the flow rates for each minute or each 15 minutes (i.e. the column right after the time).

Thanks in advance for any help you can offer.
# 2  
Old 06-17-2009
something to start with:
Code:
nawk -F, '$1~":" {match($1,"\:[^:]*$"); $1=substr($1,1,RSTART-1)}1' OFS=, myFile

# 3  
Old 06-17-2009
Thanks for your prompt response, but it looks like I don't have nawk (I'm running Mac OS X). I'll see if I can get it through MacPorts and try again, but if there's any help that can be offered using awk, sed, or tr I know that I have those at my disposal.

EDIT: Installed nawk, and it worked like a charm. Thank you very much.

Last edited by metronomadic; 06-17-2009 at 03:07 PM..
# 4  
Old 06-17-2009
try 'awk' instead of 'nawk'.
# 5  
Old 06-17-2009
To remove the second 12:21:10 use the below sed:

Code:
sed 's/.*:\([^,*]*\) AM/\1/g' file.txt

to get the to total use:-


Code:
awk ' BEGIN{c=0} {a[$1]+=$2;b[$1]+=$3;c++} END{for (i in a) {print "Total", a[i]/c,b[i]/c} ' file.txt

BR

Last edited by ahmad.diab; 06-17-2009 at 03:48 PM..
# 6  
Old 06-17-2009
Quote:
Originally Posted by ahmad.diab
To remove the second 12:21:10 use the below sed:

Code:
sed 's/.*:\([^,*]*\) AM/\1/g' file.txt

to get the to total use:-


Code:
awk ' BEGIN{c=0} {a[$1]+=$2;b[$1]+=$3;c++} END{for (i in a) {print "Total", a[i]/c,b[i]/c} ' file.txt

BR

Thanks Ahmad. I tried the awk code (which I think needs an extra } to close out the for loop?), but I think that might be calculating something else. I am trying to get the average flow (column three) for each minute (or each 15 minute span) of each day. I am not sure I understand the code, but from the output it looks like it is gathering each days worth of records, and dividing them by the number of days?

I don't mean to be a bother, but can you tell me if this is what is going on?
# 7  
Old 06-17-2009
Quote:
Originally Posted by metronomadic
Thanks Ahmad. I tried the awk code (which I think needs an extra } to close out the for loop?), but I think that might be calculating something else. I am trying to get the average flow (column three) for each minute (or each 15 minute span) of each day. I am not sure I understand the code, but from the output it looks like it is gathering each days worth of records, and dividing them by the number of days?

I don't mean to be a bother, but can you tell me if this is what is going on?
sorry Smilie kindly add the bold string below:-

awk -F"," ' BEGIN{c=0} {a[$1]+=$2;b[$1]+=$3;c++} END{for (i in a) {print "Total", a[i]/c,b[i]/c} ' file.txt
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Filter on one column and then perform conditional calculations on another column with a Linux script

Hi, I have a file (stats.txt) with columns like in the example below. Destination IP address, timestamp, TCP packet sequence number and packet length. destIP time seqNo packetLength 1.2.3.4 0.01 123 500 1.2.3.5 0.03 44 1500 1.3.2.5 0.08 44 1500 1.2.3.4 0.44... (12 Replies)
Discussion started by: Zooma
12 Replies

2. Shell Programming and Scripting

awk script to perform an action similar to vlookup between two csv files in UNIX

Hi, I am new to awk/unix and am trying to put together an awk script to perform an action similar to vlookup between the two csv files. Here are the contents of the two files: File 1: Date,ParentID,Number,Area,Volume,Dimensions 2014-01-01,ABC,247,83430.33,857.84,8110.76... (9 Replies)
Discussion started by: Prit Siv
9 Replies

3. Emergency UNIX and Linux Support

Help to make awk script more efficient for large files

Hello, Error awk: Internal software error in the tostring function on TS1101?05044400?.0085498227?0?.0011041461?.0034752266?.00397045?0?0?0?0?0?0?11/02/10?09/23/10???10?no??0??no?sct_det3_10_20110516_143936.txt What it is It is a unix shell script that contains an awk program as well as... (4 Replies)
Discussion started by: script_op2a
4 Replies

4. Shell Programming and Scripting

shell script to take input from a text file and perform check on each servers and copy files

HI all, I want to script where all the server names will be in a text file like server1 server2 server3 . and the script should take servernames from a text file and perform copy of files if the files are not present on those servers.after which it should take next servername till the end of... (0 Replies)
Discussion started by: joseph.dmello
0 Replies

5. Programming

awk script to convert a text file into csv format

hi...... thanks for allowing me to start a discussion i am collecting usb usage details of all users and convert it into csv files so that i can export it into some database.. the input text file is as follows:- USB History Dump by nabiy (c)2008 (1) --- Kingston DataTraveler 130 USB... (2 Replies)
Discussion started by: certteam
2 Replies

6. Shell Programming and Scripting

Need help with a script to process a CSV file using SED and AWK

I get a CSV file every day with 2 columns and multiple rows ex: date1,date2 ( both the fields are varchar fields) This data has to be updated in a table which is being done manually and i want to automate that. 1. I have to select all the data from the prod table( 2 columns { date1,date2}) into... (4 Replies)
Discussion started by: kkb
4 Replies

7. Shell Programming and Scripting

Updating a line in a large csv file, with sed/awk?

I have an extremely large csv file that I need to search the second field, and upon matches update the last field... I can pull the line with awk.. but apparently you cant use awk to directly update the file? So im curious if I can use sed to do this... The good news is the field I want to... (5 Replies)
Discussion started by: trey85stang
5 Replies

8. Shell Programming and Scripting

Remove text from a csv file using sed

I am trying to remove the ita from this file: "1234ita","john","smith" "56789ita","jim","thomas" the command i am using is: sed '/^ita/d' infile.csv > outfile.csv it is running but doing nothing, very rarely use sed so trying to learn it any help would be appreciated (2 Replies)
Discussion started by: Pablo_beezo
2 Replies

9. Shell Programming and Scripting

calculations in awk/sed

Hi friends, I am in a little confusion. I have a comma separated file contains hugh records. say, 88562848,21-JAN-08,2741079, -1188,-7433,TESTING 88558314,21-JAN-08,2741189, -1273,-7976,TESTING I want 4th and 5th field to be multiplied by 100. like, 88562848,21-JAN-08,2741079,... (3 Replies)
Discussion started by: clx
3 Replies

10. Shell Programming and Scripting

How to perform calculations using numbers greater than 2150000000.

Could someone tell me how to perform calculations using numbers greater than 2150000000 in Korn Shell? When I tried to do it it gave me the wrong answer. e.g. I have a ksh file with the contents below: --------------------------------- #!/bin/ksh SUM=`expr 2150000000 + 2` PRODUCT=`expr... (3 Replies)
Discussion started by: stevefox
3 Replies
Login or Register to Ask a Question