awk is designed to process big flat files, so it should do well here. Thousands of files of thousands of lines each comes to millions of lines, which is not infeasible.
There are always ways to make things faster, of course. Will you be doing lots of things like
If so, you could do awk '{...}' file1 file2 file3 file4 ... > output
...which would save a lot of time since file1 wouldn't need to be read thousands of times, just once.
If you need a different output file for each one, that's still possible, but it would need changes to the code.
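A rough sketch of what the one-pass, one-output-per-input version might look like. The file contents, the ".out" suffix and the lookup logic are all made up for illustration; the real '{...}' body would go where the last rule is.

```shell
# Sketch only: file names, the ".out" suffix and the lookup logic are assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a 1\nb 2\n' > file1      # shared file, read only once
printf 'a\nb\n'     > file2
printf 'b\nc\n'     > file3

awk '
    # First file on the command line: load it into memory.
    NR == FNR { value[$1] = $2; next }

    # First line of each later file: close the previous output, pick a new one.
    FNR == 1 { if (out != "") close(out); out = FILENAME ".out" }

    # Every line of the later files: look the key up and write the result.
    { print $1, (($1 in value) ? value[$1] : "missing") > out }
' file1 file2 file3
```

This leaves file2.out and file3.out next to the inputs. Closing each output before moving on keeps awk from running out of open file handles when there are thousands of inputs.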
Last edited by Corona688; 07-17-2013 at 12:37 PM.
How can I use awk or sed to write a conditional statement, so that for
HH:MM
if MM is not greater than 30, then MM=00
else MM=30
i.e.:
10:34 will display 10:30
10:29 will display 10:00
a=$(echo 10:34 | awk ......)
Thanks in advance (10 Replies)
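One possible way to do this in awk, following the rule exactly as the question states it. The function name round_half_hour is just a label picked for this sketch.

```shell
# Rule as quoted above: MM greater than 30 becomes 30, anything else becomes 00.
round_half_hour() {
    echo "$1" | awk -F: '{ printf "%s:%s\n", $1, ($2 > 30 ? "30" : "00") }'
}

a=$(round_half_hour 10:34)
echo "$a"                    # 10:30
round_half_hour 10:29        # 10:00
```

Taken literally, the rule sends 10:30 to 10:00. If 30 should stay 30, change `>` to `>=`.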
Hello guys,
I want to write a conditional clause for the following file using awk:
awk '{ if ($2 != 0) print $1, $2, $3}' test.csv > test2.csv
FILE EXAMPLE = test.csv
string,number,date
abc,0,20050101
def,1,20060101
ghi,2,20040101
jkl,12,20090101
mno,123,20020101 ... (2 Replies)
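A hedged sketch of how that filter might be written for comma-separated input. By default awk splits on whitespace, so each whole line ends up in $1 unless a field separator is given. Keeping the header row is an assumption, and only the rows visible in the preview are used.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows copied from the question (the preview is truncated after mno).
cat > test.csv <<'EOF'
string,number,date
abc,0,20050101
def,1,20060101
ghi,2,20040101
jkl,12,20090101
mno,123,20020101
EOF

# -F, splits input on commas; OFS=, writes commas back out.
# NR == 1 keeps the header; other rows pass only when column 2 is non-zero.
awk -F, -v OFS=, 'NR == 1 || $2 != 0 { print $1, $2, $3 }' test.csv > test2.csv

cat test2.csv
```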
Hi Guys,
I have these files:
xyz20080716.log
opqrs20080716.log
abcdef20080716.log
xyz20080717.log
oprs20080717.log
abcde20080717.log
currentdate: 20080717.log
I want to write a script to zip the files from the past day. Can anyone help with this? I've just started learning awk scripting and am still confused with... (3 Replies)
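A sketch of the selection step only, assuming every log name ends in an eight-digit YYYYMMDD stamp followed by ".log". The function name logs_before is made up for this example.

```shell
# Print the names of logs whose embedded YYYYMMDD stamp is older than $1.
logs_before() {
    awk -v today="$1" '
        match($0, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\.log$/) {
            stamp = substr($0, RSTART, 8)    # the eight digits before ".log"
            if (stamp < today) print
        }
    '
}

printf '%s\n' xyz20080716.log opqrs20080716.log abcdef20080716.log \
              xyz20080717.log oprs20080717.log abcde20080717.log |
    logs_before 20080717
```

This prints the three 20080716 names. They could then be handed to a compressor, for example by piping into `xargs gzip`.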
Dear all,
I want to use awk to read the three columns in a file called "test" and change them to ( 1.5 1.5 1.5) if any element is found to be greater than three. A part of the file is shown below:
(0.478318 0.391032 -0.14054)
(0.45562 0.392523 -0.121685)
(0.437204 0.392811 -0.106158)... (3 Replies)
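One way this might be approached, as a sketch: strip the parentheses from a copy of the line, test each number, and print either the replacement or the untouched line. The function name clamp_triples and the second sample row are made up to exercise the rule.

```shell
clamp_triples() {
    awk '{
        line = $0
        gsub(/[()]/, "", line)              # drop the parentheses
        n = split(line, v, " ")             # split on whitespace
        big = 0
        for (i = 1; i <= n; i++)
            if (v[i] + 0 > 3) big = 1       # +0 forces a numeric comparison
        if (big) { print "(1.5 1.5 1.5)" } else { print }
    }' "$@"
}

# First row is from the question; the second is invented to trigger the rule.
printf '(0.478318 0.391032 -0.14054)\n(4.2 0.1 0.2)\n' | clamp_triples
```

With real data the call would be `clamp_triples test`.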
I have a column of numbers, $2, and I would like to add 360 to every number that is negative. This method seems a bit convoluted and does not work (it outputs 0):
BEGIN {
A=sprintf("%d", $2);
if(A<0) A=A+360;
BIN++;
}
END { for(A in BIN) print... (5 Replies)
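A BEGIN block runs before any input is read, so $2 is still empty there, which would explain the 0. A minimal sketch that does the work in a per-line rule instead; the function name wrap_negative and the sample rows are made up.

```shell
wrap_negative() {
    # Per-line rule: add 360 to column 2 when it is negative, then print the line.
    awk '$2 < 0 { $2 += 360 } { print }' "$@"
}

printf 'a -90\nb 45\nc -0.5\n' | wrap_negative
```

This prints "a 270", "b 45" and "c 359.5". Assigning to $2 makes awk rebuild the line with single spaces between fields.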
Hi all,
I have a file containing the values that would be used as the basis for printing the lines of another set of files using awk. What I want to do is something like the one below:
stdev.txt
0.21
0.42
0.32
0.25
0.15
file1.txt file2.txt file3.txt ..filen.txt
0.45 0.23 ... (4 Replies)
Hi,
I have a file in the following format:
aabbba 25.31806899
baaabb 38.21808852
cccccu 1.31819523
552258121.31818253
ffddybb 5.41815555
almcamc87561812689
223aqas5.661828345
adacaaaaaaa1821285
adacaaaaaaa1821286
smckaa 3.81828756
ada2512510c1821287
ada2522511c1821328... (4 Replies)
Hello,
How can I use a conditional to produce an output file that varies with respect to the contents of column #4 in the data file:
Data file:
9780020080954 9.95 0.49 AS 23.3729
9780020130857 9.95 0.49 AS 23.3729
9780023001406 22.20 0.25 AOD ... (12 Replies)
Heya
I'm trying to get to know awk a bit better.
So I'm trying to get used to calls that save me a grep invocation just to get a specific part of a single line.
That said, I want to get the current screen resolution according to xrandr's output.
Screen 0: minimum 8 x 8, current 1920 x 1080,... (1 Reply)
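A sketch of doing the match and the extraction in one awk call, with no grep. It assumes the line keeps the "current W x H," layout shown in the preview; the function name current_resolution is made up.

```shell
current_resolution() {
    awk '/^Screen/ {
        for (i = 1; i <= NF; i++)
            if ($i == "current") {
                width  = $(i + 1)
                height = $(i + 3)
                sub(/,$/, "", height)    # drop the trailing comma
                print width "x" height
                exit
            }
    }' "$@"
}

# Only the part of the line visible in the preview is used here.
echo 'Screen 0: minimum 8 x 8, current 1920 x 1080,' | current_resolution
```

In real use the input would come from `xrandr | current_resolution`.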
I am having a difficult time getting an awk one-liner to work correctly that runs a mathematical operation on the values in a field when they match a given criterion.
I would like to subtract 1 from every value in field $6 that is greater than 12. In this particular case it is only a constant of... (3 Replies)
Discussion started by: jvoot
3 Replies
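A sketch of the subtract-when-greater one-liner from the last preview, with the threshold and the amount passed in as variables so they are not hard-coded. The names dec_field6, limit and step and the sample rows are made up.

```shell
dec_field6() {
    # limit and step are supplied with -v; the trailing 1 prints every line.
    awk -v limit=12 -v step=1 '$6 > limit { $6 -= step } 1' "$@"
}

printf 'a b c d e 13\na b c d e 12\na b c d e 20\n' | dec_field6
```

This prints 12, 12 and 19 in the sixth column. The 12 in the second row is left alone because it is not greater than the limit.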
bup-margin
bup-margin(1) General Commands Manual bup-margin(1)
NAME
bup-margin - figure out your deduplication safety margin
SYNOPSIS
bup margin [options...]
DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two
entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids.
For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit
hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by
its first 46 bits.
The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits
with far fewer objects.
If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.
OPTIONS
--predict
Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer
from the guess. This is potentially useful for tuning an interpolation search algorithm.
--ignore-midx
don't use .midx files, use only .idx files. This is only really useful when used with --predict.
EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
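The arithmetic behind the first example can be sketched in awk. The formulas here are inferred from the numbers in the sample output (bits per doubling as matching bits divided by log2 of the object count), not taken from bup's source, and the function name margin_estimate is made up.

```shell
margin_estimate() {
    # $1 = number of objects, $2 = matching prefix bits reported by bup margin
    awk -v objects="$1" -v matched="$2" 'BEGIN {
        doublings    = log(objects) / log(2)    # log2 of the object count
        per_doubling = matched / doublings
        remaining    = 160 - matched            # SHA-1 hashes have 160 bits
        printf "%.2f bits per doubling\n", per_doubling
        printf "%d bits (%.2f doublings) remaining\n", remaining, remaining / per_doubling
    }'
}

margin_estimate 1612581 40
```

For the 1612581 objects and 40 matching bits shown above, this reproduces the "1.94 bits per doubling" and "120 bits (61.86 doublings) remaining" lines.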
SEE ALSO
bup-midx(1), bup-save(1)
BUP
Part of the bup(1) suite.
AUTHORS
Avery Pennarun <apenwarr@gmail.com>.
Bup unknown-    bup-margin(1)