Shell Programming and Scripting: What is the most effective way to process a large logfile? Post 302096241 by rajeeb_d on Tuesday 14th of November 2006 02:18:55 PM
Just an idea

Why don't you split the file into smaller files of 1 GB each? Then use Pederarbo's awk script to go through each of the split files. Because awk reads its input as a stream, one record at a time, it is well suited to working on data this large.
After you are done cleansing the files, you can append them back into a single file.

Splitting the file is just an idea, but it might help, because you would be handling smaller sets of data, each flowing in one continuous stream, rather than one large 10 GB file.
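
A minimal sketch of the split, process, and append approach in POSIX-style shell, assuming the pieces are processed one after another. The awk program is a hypothetical stand-in for Pederarbo's script, which is not shown in this thread (here it only drops blank lines). The function name `process_in_pieces` is made up for illustration, and `mktemp -d` is assumed to be available (it is on Linux and the BSDs, but POSIX does not require it). It splits with `-l` (a line count) rather than `-b`, because a byte-based split can cut a log line in half across two pieces; choose a line count that gives roughly 1 GB pieces for your data.

```shell
#!/bin/sh
# Sketch: split a large log into line-aligned pieces, run an awk filter
# over each piece, and append the results to a single output file.
# ASSUMPTION: the awk program is a hypothetical stand-in for the real
# cleansing script; here it only drops blank lines.

# Usage: process_in_pieces INPUT OUTPUT LINES_PER_PIECE
process_in_pieces() {
    input=$1
    output=$2
    lines=$3

    # Private scratch directory for the pieces (mktemp -d is an assumption:
    # widely available, but not specified by POSIX).
    workdir=$(mktemp -d) || return 1

    # Split on line boundaries (-l), not bytes (-b), so no record is cut in
    # two. Pieces are named piece.aa, piece.ab, ... in lexical order.
    if ! split -l "$lines" "$input" "$workdir/piece."; then
        rm -rf "$workdir"
        return 1
    fi

    : > "$output"    # create or truncate the output file

    # The glob expands in sorted order, so the pieces are appended in
    # their original order.
    for piece in "$workdir"/piece.*; do
        [ -e "$piece" ] || continue    # no pieces at all (e.g. empty input)
        # Hypothetical cleansing step: keep only lines with at least one field.
        awk 'NF > 0' "$piece" >> "$output"
    done

    rm -rf "$workdir"
}
```

For example, `process_in_pieces big.log cleaned.log 10000000` (the file names and line count are placeholders). Since awk already streams its input, splitting mainly pays off when the pieces are processed in parallel (for example, one background awk per piece, each writing its own output file, concatenated in order afterwards); for a single sequential pass, running the awk script directly on the whole file avoids writing an extra copy of the data.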

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

most effective search?

What's the most efficient and effective way to search for a file in a directory? I see many people use # find -print or something like that, and sometimes pipe it to something else. Is there a better way of using "grep" in all of this? Thanks, simon2000 (3 Replies)
Discussion started by: simon2000

2. Shell Programming and Scripting

get last section from large logfile

Hi all, I need to pull the last section, from its header to the end of the file. This logfile will have many sections appended, but I only need to capture the last one and email it to someone. Any ideas? Thanks, Kathy Example Logfile: ============= = BOX : SAPCFI = JOB : SAPCFI =... (9 Replies)
Discussion started by: kburrows

3. Shell Programming and Scripting

Scripting the process to edit a large file

Hi, I need to write a script to edit a file. The file is large and has the format below: Version: 2008120101 ;$INCLUDE ./abc/xyz/Delhi ;$INCLUDE ./abc/xyz/London $INCLUDE ./abc/xyz/New York The first line in the file is the version number, in year, month, date and serial-number format. Each... (5 Replies)
Discussion started by: makkar4u

4. Red Hat

Fork wait in background process - large delay

Hi all, we are trying to run a process in the background, and in that process we call fork and wait for the child process to finish. We find that died = wait(&status); randomly returns only after 10 seconds, and sometimes completes in time (within 1 sec). This behavior is seen only when the... (0 Replies)
Discussion started by: vishnu.priya

5. Red Hat

Fork wait in background process - large delay

Hi all, we are trying to run a process in the background, and in that process we call fork and wait for the child process to finish. We find that died = wait(&status); randomly returns only after 10 seconds, and sometimes completes in time (within 1 sec). This behavior is seen only when the... (1 Reply)
Discussion started by: vishnu.priya

6. Shell Programming and Scripting

how to track a log file of one process and mail that logfile to a group?

Hi friends, I have 14 DB load scripts, say 1, 2, 3 ... 14. I want a script that calls each script automatically and, after every script completes, tracks its logfile, mails that log file to a group, and then runs the next script in the sequence 1, 2, 3 ... 14. Please help me, need... (1 Reply)
Discussion started by: shirdi

7. UNIX for Dummies Questions & Answers

Real and Effective IDs

Can anyone explain Real and Effective IDs to me in detail? (6 Replies)
Discussion started by: kkalyan

8. Cybersecurity

Password rules not effective

I was looking for a good list of words to stop people from using as passwords, i.e. those that could be guessed easily. I'm working through a whole bunch of suggestions from skullsecurity.org, but I managed to find this page that seems to suggest I have more options than I thought. I... (1 Reply)
Discussion started by: rbatte1

9. Shell Programming and Scripting

Logfile monitoring with logfile replacement

Hello, I've written a script to monitor a logfile in real time. It works almost perfectly except for two things. The script uses the following technique: tail -fn0 $logfile | while read line ; do ... some stuff; done First, I'd like a way to end the monitoring script if a... (3 Replies)
Discussion started by: Warluck

10. Shell Programming and Scripting

Process multiple large files with awk

Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk. I've got a dataset folder with 10 files (16 million rows each, about 600 MB), and a sorted file with all the keys inside. For example: a sample_1 200 a.b sample_2 10 a sample_3 10 a sample_1 10 a... (4 Replies)
Discussion started by: camor
SPLIT(1)						    BSD General Commands Manual 						  SPLIT(1)

NAME
split -- split a file into pieces

SYNOPSIS
split [-a suffix_length] [-b byte_count[k|m] | -l line_count | -n chunk_count] [file [name]]

DESCRIPTION
The split utility reads the given file and breaks it up into files of 1000 lines each. If file is a single dash or absent, split reads from the standard input. file itself is not altered.

The options are as follows:

-a    Use suffix_length letters to form the suffix of the file name.

-b    Create smaller files byte_count bytes in length. If 'k' is appended to the number, the file is split into byte_count kilobyte pieces. If 'm' is appended to the number, the file is split into byte_count megabyte pieces.

-l    Create smaller files line_count lines in length.

-n    Split file into chunk_count smaller files.

If additional arguments are specified, the first is used as the name of the input file which is to be split. If a second additional argument is specified, it is used as a prefix for the names of the files into which the file is split. In this case, each file into which the file is split is named by the prefix followed by a lexically ordered suffix using suffix_length characters in the range ``a-z''. If -a is not specified, two letters are used as the suffix.

If the name argument is not specified, 'x' is used.

STANDARDS
The split utility conforms to IEEE Std 1003.1-2001 (``POSIX.1'').

HISTORY
A split command appeared in Version 6 AT&T UNIX. The -a option was introduced in NetBSD 2.0. Before that, if name was not specified, split would vary the first letter of the filename to increase the number of possible output files. The -a option makes this unnecessary.

BSD                              May 28, 2007                              BSD
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.