Sponsored Content
Top Forums Shell Programming and Scripting How to make awk command faster for large amount of data? Post 303024250 by Don Cragun on Thursday 4th of October 2018 02:06:13 PM
Old 10-04-2018
Expanding on what Corona688 said, your two commands:
Code:
zcat file1.gz | head -n 1
zcat file1.gz | tail -n 1

decompress the file twice (maybe not completing the first decompression), and if you find that there is some data in that file that you need, you'll then decompress it again for your awk script to process.

I would strongly suggest creating a separate text file that contains the timestamp of the first record in each compressed file and the name of that compressed file. (And, add a new entry to the end of that file each time you create a new compress log file.) Then you can look at that (uncompressed) text file to quickly determine which compressed file(s) you need to uncompress and feed to your awk script to get the records you want for any particular timestamp range.
This User Gave Thanks to Don Cragun For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk help to make my work faster

hii everyone , i have a file in which i have line numbers.. file name is file1.txt aa bb cc "12" qw xx yy zz "23" we bb qw we "123249" jh here 12,23,123249. is the line number now according to this line numbers we have to print lines from other file named... (11 Replies)
Discussion started by: kumar_amit
11 Replies

2. Programming

Read/Write a fairly large amount of data to a file as fast as possible

Hi, I'm trying to figure out the best solution to the following problem, and I'm not yet that much experienced like you. :-) Basically I have to read a fairly large file, composed of "messages" , in order to display all of them through an user interface (made with QT). The messages that... (3 Replies)
Discussion started by: emitrax
3 Replies

3. AIX

amount of memory allocated to large page

We just set up a system to use large pages. I want to know if there is a command to see how much of the memory is being used for large pages. For example if we have a system with 8GB of RAm assigned and it has been set to use 4GB for large pages is there a command to show that 4GB of the *GB is... (1 Reply)
Discussion started by: daveisme
1 Replies

4. Shell Programming and Scripting

How to tar large amount of files?

Hello I have the following files VOICE_hhhh SUBSCR_llll DEL_kkkk Consider that there are 1000 VOICE files+1000 SUBSCR files+1000DEL files When i try to tar these files using tar -cvf backup.tar VOICE* SUBSCR* DEL* i get the error: ksh: /usr/bin/tar: arg list too long How can i... (9 Replies)
Discussion started by: chriss_58
9 Replies

5. Emergency UNIX and Linux Support

Help to make awk script more efficient for large files

Hello, Error awk: Internal software error in the tostring function on TS1101?05044400?.0085498227?0?.0011041461?.0034752266?.00397045?0?0?0?0?0?0?11/02/10?09/23/10???10?no??0??no?sct_det3_10_20110516_143936.txt What it is It is a unix shell script that contains an awk program as well as... (4 Replies)
Discussion started by: script_op2a
4 Replies

6. Shell Programming and Scripting

Running rename command on large files and make it faster

Hi All, I have some 80,000 files in a directory which I need to rename. Below is the command which I am currently running and it seems, it is taking fore ever to run this command. This command seems too slow. Is there any way to speed up the command. I have have GNU Parallel installed on my... (6 Replies)
Discussion started by: shoaibjameel123
6 Replies

7. Shell Programming and Scripting

Faster way to use this awk command

awk "/May 23, 2012 /,0" /var/tmp/datafile the above command pulls out information in the datafile. the information it pulls is from the date specified to the end of the file. now, how can i make this faster if the datafile is huge? even if it wasn't huge, i feel there's a better/faster way to... (8 Replies)
Discussion started by: SkySmart
8 Replies

8. Shell Programming and Scripting

awk changes to make it faster

I have script like below, who is picking number from one file and and searching in another file, and printing output. Bu is is very slow to be run on huge file.can we modify it with awk #! /bin/ksh while read line1 do echo "$line1" a=`echo $line1` if then echo "$num" cat file1|nawk... (6 Replies)
Discussion started by: mirwasim
6 Replies

9. Shell Programming and Scripting

Perl : Large amount of data put into an array

This basic code works. I have a very long list, almost 10000 lines that I am building into the array. Each line has either 2 or 3 fields as shown in the code snippit. The array elements are static (for a few reasons that out of scope of this question) the list has to be "built in". It... (5 Replies)
Discussion started by: sumguy
5 Replies

10. Shell Programming and Scripting

How to make awk command faster?

I have the below command which is referring a large file and it is taking 3 hours to run. Can something be done to make this command faster. awk -F ',' '{OFS=","}{ if ($13 == "9999") print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out|sort -T ${NLAP_TEMP} |uniq>... (13 Replies)
Discussion started by: Peu Mukherjee
13 Replies
GZIP(1) 						    BSD General Commands Manual 						   GZIP(1)

NAME
gzip -- compression/decompression tool using Lempel-Ziv coding (LZ77) SYNOPSIS
gzip [-cdfhkLlNnqrtVv] [-S suffix] file [file [...]] gunzip [-cfhkLNqrtVv] [-S suffix] file [file [...]] zcat [-fhV] file [file [...]] DESCRIPTION
The gzip program compresses and decompresses files using Lempel-Ziv coding (LZ77). If no files are specified, gzip will compress from stan- dard input, or decompress to standard output. When in compression mode, each file will be replaced with another file with the suffix, set by the -S suffix option, added, if possible. In decompression mode, each file will be checked for existence, as will the file with the suffix added. Each file argument must contain a separate complete archive; when multiple files are indicated, each is decompressed in turn. In the case of gzcat the resulting data is then concatenated in the manner of cat(1). If invoked as gunzip then the -d option is enabled. If invoked as zcat or gzcat then both the -c and -d options are enabled. When invoked as zcat, ``.Z'' will be appended to all filenames that do not have that suffix. This version of gzip is also capable of decompressing files compressed using compress(1) or bzip2(1). OPTIONS
The following options are available: -1, --fast -2, -3, -4, -5, -6, -7, -8 -9, --best These options change the compression level used, with the -1 option being the fastest, with less compression, and the -9 option being the slowest, with optimal compression. The default compression level is 6. -c, --stdout, --to-stdout This option specifies that output will go to the standard output stream, leaving files intact. -d, --decompress, --uncompress This option selects decompression rather than compression. -f, --force This option turns on force mode. This allows files with multiple links, symbolic links to regular files, overwriting of pre-existing files, reading from or writing to a terminal, and when combined with the -c option, allowing non-compressed data to pass through unchanged. -h, --help This option prints a usage summary and exits. -k, --keep Keep (don't delete) input files during compression or decompression. -L, --license This option prints gzip license. -l, --list This option displays information about the file's compressed and uncompressed size, ratio, uncompressed name. With the -v option, it also displays the compression method, CRC, date and time embedded in the file. -N, --name This option causes the stored filename in the input file to be used as the output file. -n, --no-name This option stops the filename and timestamp from being stored in the output file. -q, --quiet With this option, no warnings or errors are printed. -r, --recursive This option is used to gzip the files in a directory tree individually, using the fts(3) library. -S suffix, --suffix suffix This option changes the default suffix from .gz to suffix. -t, --test This option will test compressed files for integrity. -V, --version This option prints the version of the gzip program. -v, --verbose This option turns on verbose mode, which prints the compression ratio for each file compressed. ENVIRONMENT
If the environment variable GZIP is set, it is parsed as a white-space separated list of options handled before any options on the command line. Options on the command line will override anything in GZIP. SEE ALSO
bzip2(1), compress(1), xz(1), fts(3), zlib(3), compat(5) HISTORY
The gzip program was originally written by Jean-loup Gailly, licensed under the GNU Public Licence. Matthew R. Green wrote a simple front end for NetBSD 1.3 distribution media, based on the freely re-distributable zlib library. It was enhanced to be mostly feature-compatible with the original GNU gzip program for NetBSD 2.0. This implementation of gzip was ported based on the NetBSD gzip, and first appeared in FreeBSD 7.0. AUTHORS
This implementation of gzip was written by Matthew R. Green <mrg@eterna.com.au> with unpack support written by Xin LI <delphij@FreeBSD.org>. BUGS
According to RFC 1952, the recorded file size is stored in a 32-bit integer, therefore, it can not represent files larger than 4GB. This limitation also applies to -l option of gzip utility. BSD
October 9, 2011 BSD
All times are GMT -4. The time now is 06:43 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy