How to make awk command faster for large amount of data? Post 303024246 by brenoasrm on Thursday 4th of October 2018 11:34:49 AM
Quote:
Originally Posted by Corona688
How big are your files, how long do they take, and how fast is your disk? If you're hitting throughput limits optimizing your programs won't help one iota. If you're not, however, you can process several files at once for large gains.

File sizes vary: some files are ~300MB, others ~1GB or ~2GB.
My disk is not operating at full capacity while awk runs, so I think your suggestion of reading multiple files at once will benefit me.
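Processing several files at once, as suggested in the quote, can be done with background jobs and the shell's wait builtin. Below is a minimal sketch, assuming a hypothetical process_file function that stands in for the real gunzip | awk pipeline (its awk body just counts lines) and a batch size you would tune to your CPU count and disk throughput. It starts up to MAXJOBS pipelines, waits for the whole batch to finish, then starts the next batch.

```shell
#!/bin/sh
# Minimal sketch: process several compressed logs at once using background
# jobs and the shell's wait builtin. process_file and run_batched are
# hypothetical names; the awk body is a placeholder for the real processing.

# process_file FILE.gz: decompress one log and write its result to FILE.out
# (a separate output file per job, so parallel jobs never interleave output).
process_file() {
    gunzip < "$1" | awk '{ n++ } END { print n + 0 }' > "${1%.gz}.out"
}

# run_batched MAXJOBS FILE...: start up to MAXJOBS jobs, wait for that whole
# batch to finish, then start the next batch.
run_batched() {
    maxjobs=$1
    shift
    running=0
    for f in "$@"; do
        process_file "$f" &            # run this file's pipeline in the background
        running=$((running + 1))
        if [ "$running" -ge "$maxjobs" ]; then
            wait                       # block until every background job exits
            running=0
        fi
    done
    wait                               # wait for the last, possibly partial, batch
}
```

Usage would be something like run_batched 4 nginx*.gz. Batching is the simplest use of wait, but each batch waits for its slowest file; GNU and BSD xargs offer -P to keep a fixed number of jobs running continuously instead.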

Although I can't tell the datetime from the file name alone, I can use the fact that the log files are ordered to choose which files to read, as was suggested here. I'll give an example to illustrate.

I have 832 files in one directory totaling 100GB; let's say nginx1.gz, nginx2.gz, ..., nginx832.gz.
The first line of nginx1.gz has [11/Jul/2018:18:00:01 and the last line has [11/Jul/2018:21:00:01
The first line of nginx2.gz also has [11/Jul/2018:18:00:01 and the last line has [11/Jul/2018:21:00:01

Naturally, nginx2.gz would start at the same time as nginx1.gz or later.
I could do what you've suggested with this code:

Code:
        gunzip < "$FILE" | awk '$3 > "[20/Jun/2018:22:00:00" { exit } ; {...}'

, but that would only spare me from reading about 2 hours of logs. So I thought of reading the first and last lines of each file and deciding whether to read the file at all. For instance:
the first line's datetime: zcat file1.gz | head -n 1
the last line's datetime: zcat file1.gz | tail -n 1
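The first/last-line check could be wrapped in a small helper. Below is a minimal sketch, assuming nginx's default [dd/Mon/yyyy:HH:MM:SS timestamp (the first field starting with "[") and that all files share one timezone, since the offset field is ignored. It converts the timestamp into a yyyymmddHHMMSS key because the raw string does not sort chronologically: "[11/Jul/2018" compares as less than "[20/Jun/2018". The same caveat applies to the early-exit test $3 > "[20/Jun/2018:22:00:00" in the code above, which is a string comparison in awk and therefore only orders correctly within a single month. The names ts_key and first_last_keys are hypothetical.

```shell
#!/bin/sh
# Sketch: print sortable keys for the first and last lines of a .gz log.
# Assumption: the timestamp is the first field starting with "[" and looks
# like [11/Jul/2018:18:00:01 (nginx $time_local); the timezone is ignored.

# ts_key: read one log line on stdin, print its timestamp as yyyymmddHHMMSS.
ts_key() {
    awk '
    BEGIN {
        # Month name -> two-digit month number.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
        for (i = 1; i <= 12; i++) mon[m[i]] = sprintf("%02d", i)
    }
    {
        for (i = 1; i <= NF; i++) {
            if (substr($i, 1, 1) == "[") {
                # "[11/Jul/2018:18:00:01" -> p = 11, Jul, 2018:18:00:01
                split(substr($i, 2), p, "/")
                split(p[3], t, ":")
                print t[1] mon[p[2]] p[1] t[2] t[3] t[4]
                exit
            }
        }
    }'
}

# first_last_keys FILE.gz: print "FIRST_KEY LAST_KEY".
first_last_keys() {
    first=$(gunzip < "$1" | head -n 1 | ts_key)
    # tail must read its input to the end, so this still decompresses
    # the whole file; only the awk processing of the file is avoided.
    last=$(gunzip < "$1" | tail -n 1 | ts_key)
    printf '%s %s\n' "$first" "$last"
}
```

Because the last-line check decompresses each file once, the saving comes from not running the full awk processing on skipped files, not from skipping decompression.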

So there are two cases in which I could skip a file:
1 - When both the first and last lines are before the time I want
2 - When the first line is after the time I want (as you suggested)
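The two skip rules reduce to two comparisons against a time window. Below is a minimal sketch, assuming the first/last timestamps and the window bounds have already been converted to sortable 14-digit yyyymmddHHMMSS numbers (e.g. 20180711180001 for [11/Jul/2018:18:00:01); should_read is a hypothetical name, and the window is treated as inclusive.

```shell
#!/bin/sh
# Sketch of the two skip rules. All arguments are assumed to be sortable
# yyyymmddHHMMSS numbers, e.g. 20180711180001 for [11/Jul/2018:18:00:01.
#
# should_read FIRST LAST START END
#   exit status 0 -> the file may contain lines in [START, END]; read it
#   exit status 1 -> skip the file
should_read() {
    first=$1 last=$2 start=$3 end=$4
    # Case 1: the last line (and therefore every line) is before the window.
    if [ "$last" -lt "$start" ]; then
        return 1
    fi
    # Case 2: the first line (and therefore every line) is after the window.
    if [ "$first" -gt "$end" ]; then
        return 1
    fi
    return 0
}
```

In a loop over the .gz files, the expensive gunzip | awk pipeline would then run only when should_read succeeds for that file's first and last keys.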

This way, I think it will be much faster. I've finished this modification and I'm testing it right now.
If everything is all right, I'll test your suggestion of reading multiple files at once and report back with the results.

Thank you very much for your help. I didn't know about wait, and it will probably help me.
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.