File processing is very slow with cut command


 
# 1  
Old 01-26-2010

Dear All,

I am using the following script to find and replace the date format in a file. Field 18 in the file has the format "01/26/2010 11:55:14 GMT+04:00", which I want to convert to "20100126115514". For this purpose I am using the following lines of code:
Code:
while [ 1 ]
do
  read fileLine || break
  newDate=`echo $fileLine | cut -f18 -d "|" | cut -c1-19`
  month=`echo $newDate | cut -c1-2`
  day=`echo $newDate | cut -c4-5`
  year=`echo $newDate | cut -c7-10`
  hour=`echo $newDate | cut -c12-13`
  min=`echo $newDate | cut -c15-16`
  sec=`echo $newDate | cut -c18-19`
  new_Format=`echo "$year$month$day$hour$min$sec"`
  `echo $fileLine | sed -e "s/$month\\/$day\\/$year $hour:$min:$sec GMT+04:00/$new_Format/"  >> $newFileName`
  let recordCount=$recordCount+1
  move=1
done < $fileName

The main problem is that this way of converting takes too much time. From the logs:

Record Count : 6147 Start at Tue Jan 26 18:43:23 GST 2010 End Time: Tue Jan 26 18:51:04 GST 2010

That is almost 8 minutes for only 6147 records, which is not acceptable.

Is there any way of increasing the speed, or any other solution? Any suggestion is appreciated. Thanks

Regards,

Last edited by Scott; 01-26-2010 at 11:07 AM.. Reason: Please use code tags
# 2  
Old 01-26-2010
No wonder it's slow -- you are running eighteen separate processes per loop! (stick this in the eye of anyone who thinks 'cat foo | bar' isn't a bad habit!) Working on a solution with sed...

---------- Post updated at 09:29 AM ---------- Previous update was at 09:21 AM ----------

This sed command can reorder field 18 in one process instead of eighteen. It does need backreferences and the -r option (extended regular expressions) though, and not every sed supports -r.

Code:
sed -r 's#([0-9]+)/([0-9]+)/([0-9]+) ([0-9]+):([0-9]+):([0-9]+) GMT([+-])([0-9]+):([0-9]+)#\3\1\2\4\5\6#g'

I'd also suggest only breaking if fileLine is an empty string (and setting fileLine to an empty string before you read). Some shells return a read error on the very last line, for example when it has no trailing newline.
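
A minimal sketch of that idea, using a common variation rather than the exact check described above: keep looping while read succeeds or while it left some text in the variable. The function name count_records is hypothetical and only there to make the loop testable.

```shell
#!/bin/bash
# Hypothetical helper: count the records in a file, including a final
# line that has no trailing newline.
count_records() {
  local line count=0
  # read returns non-zero at end of input but still fills "line" when the
  # last line is unterminated, so also test for leftover text.
  while IFS= read -r line || [ -n "$line" ]; do
    count=$((count + 1))
  done < "$1"
  echo "$count"
}
```

Unlike breaking on an empty string, this form does not stop early at a blank line in the middle of the file.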
# 3  
Old 01-26-2010
Corona, thanks for your reply. I am very new to this, which is why I made such a mistake. Thanks

---------- Post updated at 09:51 PM ---------- Previous update was at 09:19 PM ----------

It is giving me an error:

"sed: illegal option -- r"

Can you please help?
# 4  
Old 01-26-2010
Oh great, you're running on an ugly system whose sed doesn't even support -r (extended regular expressions). Try 'gsed'. What system are you running?

If I have to build something else from scratch, seeing what your input actually looks like would be handy.
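
If neither -r nor gsed is available, the same reordering can probably be written with basic regular expressions, which use \( \) for groups and have no + operator. This is only a sketch under that assumption; the wrapper name convert_dates is hypothetical, and it assumes two-digit month, day, hour, minute and second fields as in the sample.

```shell
# Hypothetical wrapper: reads records on stdin, writes them to stdout with
# every "MM/DD/YYYY hh:mm:ss GMT+hh:mm" timestamp rewritten as YYYYMMDDhhmmss.
# Uses only basic regular expressions, so no -r option is needed.
convert_dates() {
  sed 's#\([0-9][0-9]\)/\([0-9][0-9]\)/\([0-9][0-9][0-9][0-9]\) \([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\) GMT[+-][0-9][0-9]:[0-9][0-9]#\3\1\2\4\5\6#g'
}
```

It would be called as `convert_dates < "$fileName" > "$newFileName"`. Note that it rewrites every matching timestamp on a line, not only the one in field 18.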

---------- Post updated at 01:47 PM ---------- Previous update was at 01:01 PM ----------

It occurred to me that this could be done with substrings if everything is fixed width.

---------- Post updated at 02:13 PM ---------- Previous update was at 01:47 PM ----------

With pure substring matching I can do:
Code:
STR="01/26/2010 11:55:14 GMT+04:00"
echo "${STR:6:4}${STR:0:2}${STR:3:2}${STR:11:2}${STR:14:2}${STR:17:2}"
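
Applied to a whole file, the substring approach might look like the sketch below. It assumes bash, assumes the timestamp is the first 29 characters of field 18, and assumes the same timestamp text does not also appear earlier on the line (only the first occurrence is replaced). The function name convert_records is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the timestamp in field 18 of each
# pipe-delimited record on stdin, using only bash builtins.
convert_records() {
  local line old new
  local -a f
  while IFS= read -r line || [ -n "$line" ]; do
    IFS='|' read -r -a f <<< "$line"     # split the record into fields
    old=${f[17]:0:29}                    # field 18, e.g. "01/26/2010 11:55:14 GMT+04:00"
    if [ "${#old}" -eq 29 ]; then
      # Reorder MM/DD/YYYY hh:mm:ss into YYYYMMDDhhmmss with substrings.
      new=${old:6:4}${old:0:2}${old:3:2}${old:11:2}${old:14:2}${old:17:2}
      line=${line/"$old"/$new}           # replace the first literal occurrence
    fi
    printf '%s\n' "$line"
  done
}
```

It would be called as `convert_records < "$fileName" > "$newFileName"`, so no external command is started per record.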


Last edited by Corona688; 01-26-2010 at 03:08 PM..
# 5  
Old 01-26-2010
Quote:
Originally Posted by bilalghazi
Dear All,

I am using the following script to find and replace the date format in a file. Field 18 in the file has the following format: "01/26/2010 11:55:14 GMT+04:00", which I want to convert into the following format: "20100126115514".
The main problem is that this way of conversion is taking too much time.

This should be a lot faster:
Code:
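awk -F"|" 'BEGIN { OFS = FS } {
  t = substr($18, 1, 29)          # "MM/DD/YYYY hh:mm:ss GMT+04:00"
  split(t, a, "[/ :]")            # a[1]=MM a[2]=DD a[3]=YYYY a[4]=hh a[5]=mm a[6]=ss
  $18 = a[3] a[1] a[2] a[4] a[5] a[6] substr($18, 30)   # keep anything after the timestamp
  print                           # whole record, with field 18 rewritten
}' "$fileName" > "$newFileName"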


Last edited by Franklin52; 01-26-2010 at 05:34 PM.. Reason: Correcting substr function
# 6  
Old 01-27-2010
Dear Corona, while implementing your code I am getting the following error:

Suffix too large - 512 max: s/'01/26/2010

My input is a date from the file in the format "01/26/2010 12:04:52 GMT+04:00", and each line has its own time, such as:

Line 1: 01/26/2010 12:04:52 GMT+04:00
Line 2: 01/26/2010 12:14:13 GMT+04:00

---------- Post updated at 12:35 PM ---------- Previous update was at 09:59 AM ----------

Bingo... I used @ in my string matching and it works without error, and the time is now around 3 minutes (which is not very good, but compared to the previous run it is great). Thanks a lot for your help, Corona and Franklin, I really appreciate your time and effort.
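
For anyone hitting the same error: the exact command was not posted, so the following is only a guess at what the fix looked like. The slashes inside the date collide with sed's default / delimiter, and switching the delimiter to @ avoids that. The function name replace_stamp is hypothetical.

```shell
# Hypothetical helper: replace one timestamp in a record read from stdin.
# $1 = old timestamp text, $2 = replacement.
# "@" is used as the s command delimiter so the "/" characters in the
# date do not need escaping.
replace_stamp() {
  sed "s@$1@$2@"
}
```

For example, `echo "$fileLine" | replace_stamp "01/26/2010 12:04:52 GMT+04:00" "20100126120452"`. This still starts one sed process per record, which is probably why the run time stays in minutes.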
