How to make awk command faster for large amount of data?
Post 303024186 by Corona688, 10-02-2018
How big are your files, how long do they take, and how fast is your disk? If you're hitting throughput limits, optimizing your programs won't help one iota. If you're not, however, you can process several files at once for large gains.
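For example, when the same filter has to run over several independent files, a plain shell loop with background jobs keeps multiple cores busy. A minimal sketch, where the glob and the awk filter are placeholders rather than anyone's actual job:

    # one awk per input file, run in the background; wait collects them all
    for f in data*.csv; do
        awk -F',' '$13 == "9999"' "$f" > "$f.out" &
    done
    wait    # block until every background job has finished

If the disk is already saturated, fanning out like this won't help and may even slow things down.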
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk help to make my work faster

Hi everyone, I have a file in which I have line numbers. The file name is file1.txt: aa bb cc "12" qw xx yy zz "23" we bb qw we "123249" jh. Here 12, 23, 123249 are the line numbers. Now, according to these line numbers, we have to print lines from another file named... (11 Replies)
Discussion started by: kumar_amit
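A minimal two-pass awk sketch for this kind of lookup, assuming the double-quoted fields in file1.txt are the line numbers and data.txt stands in for the other file (the real name is truncated in the post):

    # pass 1 (NR==FNR): remember every quoted number found in file1.txt
    # pass 2: print exactly those line numbers from data.txt
    awk 'NR == FNR {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^"[0-9]+"$/) { gsub(/"/, "", $i); want[$i] }
             next
         }
         FNR in want' file1.txt data.txt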

2. Programming

Read/Write a fairly large amount of data to a file as fast as possible

Hi, I'm trying to figure out the best solution to the following problem, and I'm not yet as experienced as you. :-) Basically I have to read a fairly large file, composed of "messages", in order to display all of them through a user interface (made with QT). The messages that... (3 Replies)
Discussion started by: emitrax
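Before tuning any reader code, it helps to know what the disk can actually deliver. A rough sequential-read benchmark, with the file name and sizes purely illustrative:

    # read ~1 GB of the file sequentially and discard it; dd reports the
    # elapsed time (and, with GNU dd, the effective throughput)
    dd if=messages.dat of=/dev/null bs=1M count=1024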

3. AIX

amount of memory allocated to large page

We just set up a system to use large pages. I want to know if there is a command to see how much of the memory is being used for large pages. For example, if we have a system with 8 GB of RAM and it has been set to use 4 GB for large pages, is there a command to show that 4 GB of the 8 GB is... (1 Reply)
Discussion started by: daveisme
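On AIX, a couple of commands should show this; treat the exact flags and column names as an assumption rather than gospel. vmstat -l appends large-page columns (alp = active large pages, flp = free large pages), and the pool itself is sized through the vmo tunables lgpg_regions and lgpg_size:

    # memory stats every 2 seconds, 5 samples, with large-page columns
    vmstat -l 2 5
    # display how the large-page pool is currently configured
    vmo -o lgpg_regions
    vmo -o lgpg_size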

4. Shell Programming and Scripting

How to tar large amount of files?

Hello, I have the following files: VOICE_hhhh, SUBSCR_llll, DEL_kkkk. Consider that there are 1000 VOICE files + 1000 SUBSCR files + 1000 DEL files. When I try to tar these files using tar -cvf backup.tar VOICE* SUBSCR* DEL*, I get the error: ksh: /usr/bin/tar: arg list too long. How can I... (9 Replies)
Discussion started by: chriss_58
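The usual fix is to feed tar the file list on stdin instead of on the command line, which sidesteps the kernel's ARG_MAX limit. A sketch assuming GNU find and GNU tar (-T - reads the name list from stdin):

    # find streams the names one by one, so no single command line ever
    # holds 3000 of them, and tar reads the list from stdin
    find . -maxdepth 1 \( -name 'VOICE_*' -o -name 'SUBSCR_*' -o -name 'DEL_*' \) \
        -print | tar -cvf backup.tar -T -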

5. Emergency UNIX and Linux Support

Help to make awk script more efficient for large files

Hello. The error: awk: Internal software error in the tostring function on TS1101?05044400?.0085498227?0?.0011041461?.0034752266?.00397045?0?0?0?0?0?0?11/02/10?09/23/10???10?no??0??no?sct_det3_10_20110516_143936.txt. What it is: a unix shell script that contains an awk program as well as... (4 Replies)
Discussion started by: script_op2a
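Internal errors like that come out of the specific awk implementation, so a cheap first experiment is to run the unchanged program under a different interpreter (gawk, nawk, or mawk) and compare both correctness and speed; the script and data names below are placeholders:

    # same awk program, different interpreters; time each one
    time gawk -f report.awk big_input.txt > /dev/null
    time nawk -f report.awk big_input.txt > /dev/null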

6. Shell Programming and Scripting

Running rename command on large files and making it faster

Hi All, I have some 80,000 files in a directory which I need to rename. Below is the command which I am currently running, and it seems to be taking forever. Is there any way to speed up the command? I have GNU Parallel installed on my... (6 Replies)
Discussion started by: shoaibjameel123
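With GNU Parallel already installed, the renames can be fanned out across several mv processes. The transformation below (appending .bak) is purely illustrative, since the post's actual command is truncated:

    # printf is a shell builtin, so 80,000 names never hit ARG_MAX;
    # parallel runs 8 mv processes at a time (assumes no newlines in names)
    printf '%s\n' * | parallel -j 8 mv {} {}.bak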

7. Shell Programming and Scripting

Faster way to use this awk command

awk "/May 23, 2012 /,0" /var/tmp/datafile the above command pulls out information in the datafile. the information it pulls is from the date specified to the end of the file. now, how can i make this faster if the datafile is huge? even if it wasn't huge, i feel there's a better/faster way to... (8 Replies)
Discussion started by: SkySmart
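One way to avoid pattern-matching every line is to locate the first match once and let tail stream the remainder as a plain copy. A sketch assuming GNU grep (for -m1, which stops scanning at the first hit):

    # find the line number of the first match, then copy from there to EOF
    n=$(grep -n -m1 'May 23, 2012 ' /var/tmp/datafile | cut -d: -f1)
    [ -n "$n" ] && tail -n +"$n" /var/tmp/datafile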

8. Shell Programming and Scripting

awk changes to make it faster

I have a script like the one below, which picks a number from one file, searches for it in another file, and prints the output. But it is very slow when run on a huge file. Can we modify it with awk?

    #!/bin/ksh
    while read line1
    do
        echo "$line1"
        a=`echo $line1`
        if then
            echo "$num"
            cat file1|nawk... (6 Replies)
Discussion started by: mirwasim
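The per-line nawk invocations are what make that loop slow: every iteration forks a new process and rescans file1. A single two-file pass is usually orders of magnitude faster; keyfile below stands in for whatever feeds the while loop, since the original script is truncated:

    # pass 1: remember every key; pass 2: scan file1 once, print the matches
    nawk 'NR == FNR { keys[$1]; next }
          $1 in keys' keyfile file1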

9. Shell Programming and Scripting

Perl : Large amount of data put into an array

This basic code works. I have a very long list, almost 10000 lines, that I am building into the array. Each line has either 2 or 3 fields, as shown in the code snippet. The array elements are static (for a few reasons that are out of scope of this question); the list has to be "built in". It... (5 Replies)
Discussion started by: sumguy

10. Shell Programming and Scripting

How to make awk command faster?

I have the below command, which reads a large file and takes 3 hours to run. Can something be done to make this command faster?

    awk -F ',' '{OFS=","}{ if ($13 == "9999") print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out|sort -T ${NLAP_TEMP} |uniq>... (13 Replies)
Discussion started by: Peu Mukherjee
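If the output order doesn't matter and the distinct rows fit in memory, the external sort|uniq stage can be dropped by de-duplicating inside awk itself, which removes the most expensive part of that 3-hour pipeline. A sketch (the output name is a placeholder; note that in gawk/nawk, assigning NF=12 truncates the record and rebuilds $0 with OFS):

    # filter on column 13, keep the first 12 fields, de-duplicate in one pass
    awk -F',' -v OFS=',' '$13 == "9999" { NF = 12; if (!seen[$0]++) print }' \
        "${NLAP_TEMP}/hist1.out" > "${NLAP_TEMP}/hist1_dedup.out"

If the result must end up sorted, piping this already much smaller output through sort is still cheaper than sorting every matching row first.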
bup-memtest(1)						      General Commands Manual						    bup-memtest(1)

NAME
       bup-memtest - test bup memory usage statistics

SYNOPSIS
       bup memtest [options...]

DESCRIPTION
       bup memtest opens the list of pack indexes in your bup repository, then
       searches the list for a series of nonexistent objects, printing memory
       usage statistics after each cycle.

       Because of the way Unix systems work, the output will usually show a
       large (and unchanging) value in the VmSize column, because mapping the
       index files in the first place takes a certain amount of virtual
       address space. However, this virtual memory usage is entirely virtual;
       it doesn't take any of your RAM. Over time, bup uses parts of the
       indexes, which need to be loaded from disk, and this is what causes an
       increase in the VmRSS column.

OPTIONS
       -n, --number=number
              set the number of objects to search for during each cycle (ie.
              before printing a line of output)

       -c, --cycles=cycles
              set the number of cycles (ie. the number of lines of output
              after the first). The first line of output is always 0 (ie. the
              baseline before searching for any objects).

       --ignore-midx
              ignore any .midx files created by bup midx. This allows you to
              compare memory performance with and without using midx.

       --existing
              search for existing objects instead of searching for random
              nonexistent ones. This can greatly affect memory usage and
              performance. Note that most of the time, bup save spends most
              of its time searching for nonexistent objects, since existing
              ones are probably in unmodified files that we won't be trying
              to back up anyway. So the default behaviour reflects real bup
              performance more accurately. But you might want this option
              anyway just to make sure you haven't made searching for
              existing objects much worse than before.

EXAMPLE
       $ bup memtest -n300 -c5
       PackIdxList: using 1 index.
                  VmSize      VmRSS     VmData    VmStk
            0   20824 kB    4528 kB    1980 kB    84 kB
          300   20828 kB    5828 kB    1984 kB    84 kB
          600   20828 kB    6844 kB    1984 kB    84 kB
          900   20828 kB    7836 kB    1984 kB    84 kB
         1200   20828 kB    8736 kB    1984 kB    84 kB
         1500   20828 kB    9452 kB    1984 kB    84 kB

       $ bup memtest -n300 -c5 --ignore-midx
       PackIdxList: using 361 indexes.
                  VmSize      VmRSS     VmData    VmStk
            0   27444 kB    6552 kB    2516 kB    84 kB
          300   27448 kB   15832 kB    2520 kB    84 kB
          600   27448 kB   17220 kB    2520 kB    84 kB
          900   27448 kB   18012 kB    2520 kB    84 kB
         1200   27448 kB   18388 kB    2520 kB    84 kB
         1500   27448 kB   18556 kB    2520 kB    84 kB

DISCUSSION
       When optimizing bup indexing, the first goal is to keep the VmRSS
       reasonably low. However, it might eventually be necessary to swap in
       all the indexes, simply because you're searching for a lot of objects,
       and this will cause your RSS to grow as large as VmSize eventually.

       The key word here is eventually. As long as VmRSS grows reasonably
       slowly, the amount of disk activity caused by accessing pack indexes
       is reasonably small. If it grows quickly, bup will probably spend most
       of its time swapping index data from disk instead of actually running
       your backup, so backups will run very slowly.

       The purpose of bup memtest is to give you an idea of how fast your
       memory usage is growing, and to help in optimizing bup for better
       memory use. If you have memory problems you might be asked to send the
       output of bup memtest to help diagnose the problems.

       Tip: try using bup midx -a or bup midx -f to see if it helps reduce
       your memory usage.

       Trivia: index memory usage in bup (or git) is only really a problem
       when adding a large number of previously unseen objects. This is
       because for each object, we need to absolutely confirm that it isn't
       already in the database, which requires us to search through all the
       existing pack indexes to ensure that none of them contain the object
       in question. In the more obvious case of searching for objects that do
       exist, the objects being searched for are typically related in some
       way, which means they probably all exist in a small number of
       packfiles, so memory usage will be constrained to just those packfile
       indexes.

       Since git users typically don't add a lot of files in a single run,
       git doesn't really need a program like bup midx. bup, on the other
       hand, spends most of its time backing up files it hasn't seen before,
       so its memory usage patterns are different.

SEE ALSO
       bup-midx(1)

BUP
       Part of the bup(1) suite.

AUTHORS
       Avery Pennarun <apenwarr@gmail.com>.

Bup unknown-                                                      bup-memtest(1)