Performance issue in UNIX while generating .dat file from large text file


 
# 15  
Old 05-01-2009
Quote:
Originally Posted by durden_tyler
Try using perl. It was designed for fast text processing.

So was awk, and it is much easier to learn, and awk scripts are much easier to understand.
# 16  
Old 05-01-2009
As far as I can see this section of code reads all the output data files to find out if they contain a '*/' and then appends a '*/' if there isn't one present.

Quote:
for m_file_name in `echo ${m_a_d92_files[*]}`
do
if [[ `grep "*/" ${m_file_name} | wc -l` = 0 ]]
then
echo "*/" >> ${m_file_name}
fi
done
Earlier in the script we apparently ignored the last line '*/' in the input stream (not proven that that bit of code works).

Provided that '*/' was properly ignored in the input stream (an area of the script which could be improved by using grep -v \^'*/' instead of the very first cat), it is impossible for a '*/' to appear in any of the output files. We can therefore roughly halve the run time of this section by not re-reading the output data before appending the '*/'.

Code:
for m_file_name in `echo ${m_a_d92_files[*]}`
do
    echo "*/" >> ${m_file_name}
done

Untested.
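
For reference, the grep -v idea mentioned above could look something like the sketch below. The sample input and the file names (input.txt, filtered.txt) are made up for illustration, since the real input layout has not been posted. The asterisk is escaped here only to make the intent explicit; directly after '^' it is taken literally in a basic regular expression anyway.

```shell
# Hypothetical sample input; the real file layout is not shown in the thread.
tmpdir=$(mktemp -d)
printf '%s\n' 'record one' 'record two' '*/' > "$tmpdir/input.txt"

# Drop every line that starts with "*/" in a single pass,
# so the marker can never reach the output files.
grep -v '^\*/' "$tmpdir/input.txt" > "$tmpdir/filtered.txt"

cat "$tmpdir/filtered.txt"
```

With the marker filtered out at the start, the unconditional append in the loop above cannot produce a duplicate '*/'.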
# 17  
Old 05-02-2009
Quote:
Originally Posted by methyl
Code:
for m_file_name in `echo ${m_a_d92_files[*]}`


Using echo is unnecessary and will break the script if any member of m_a_d92_files[*] contains whitespace.
Code:
for m_file_name in "${m_a_d92_files[@]}"
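
A small self-contained sketch of the quoted form, assuming bash (or another shell with compatible array syntax). The two file names are hypothetical and exist only to show that a name containing a space survives the loop as one word.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)

# Hypothetical file names, one of them containing a space.
m_a_d92_files=("$tmpdir/plain.dat" "$tmpdir/with space.dat")

# "${array[@]}" expands to one word per element, whitespace intact.
for m_file_name in "${m_a_d92_files[@]}"
do
    echo "*/" >> "$m_file_name"
done

ls "$tmpdir"
```

Note that "$m_file_name" is quoted inside the loop as well; otherwise the name would be split again at the redirection.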

# 18  
Old 05-03-2009
cfajohnson,
Agreed. The script has many areas which could be improved. Apparently the script works with the files provided, but takes too long.
I looked at whether the "card dealing" method for splitting the data could be improved without using a high-level language, but there is insufficient information about the data type distribution and no rules stated about the processing order of the data. As far as I can see the core script is slow because it appends to multiple output files.
# 19  
Old 05-03-2009
Quote:
Originally Posted by methyl
cfajohnson,
As far as I can see the core script is slow because it appends to multiple output files.

The script is slow because you are using the shell on a very large file; that is exacerbated by a number of inefficient constructs and poorly written code.

If I knew exactly what you are trying to do, I could suggest an awk script.
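
Purely as an illustration of the general shape such a script might take, and not a solution for the actual data (which has not been posted): the sketch below assumes, hypothetically, that the first whitespace-separated field of each record names the output file it belongs to. The sample records and the .dat naming are invented.

```shell
tmpdir=$(mktemp -d)

# Hypothetical input: the first field is a record type that selects the output file.
printf '%s\n' 'A 100' 'B 200' 'A 300' '*/' > "$tmpdir/input.txt"

awk -v outdir="$tmpdir" '
    /^\*\// { next }                      # skip the trailing "*/" marker line
    {
        out = outdir "/" $1 ".dat"        # one output file per record type
        print > out                       # awk keeps the file open between writes
        seen[out] = 1
    }
    END {
        for (out in seen) print "*/" > out    # terminate each file exactly once
    }
' "$tmpdir/input.txt"

cat "$tmpdir/A.dat"
```

The input is read once and each output file is opened once, instead of being reopened for every appended line. Some awk implementations limit the number of files that can be open at the same time, so this only suits a modest number of distinct record types.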

Last edited by cfajohnson; 05-03-2009 at 02:54 PM..
# 20  
Old 05-20-2009
Please post a few records of the input file so that we can help you.