Based on your sample file, your key is NOT in the first field, but rather in the SECOND.
This will create 2 files: myInput_dup and myInput_uniq
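A minimal sketch of that two-file split, assuming the key really is in the second field and the input file is called myInput (names taken from the post above); it should run the same in awk, nawk or gawk:

    awk '{
        if (seen[$2]++)
            print > "myInput_dup"      # key in field 2 already seen: duplicate
        else
            print > "myInput_uniq"     # first occurrence of this key
    }' myInput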
I was numbering the columns with $0 as the first column; is that not right? Now that I think about it, $0 is the whole line, if I remember right.
Will this work with awk, or do I need nawk?
Quote:
Originally Posted by ctsgnb
Make all lines uniq (make duplicate consecutive lines appear only once):
By the way, do you really care about the first field (line number) or can we get rid of it ?
I probably need an index field, but I probably don't need to preserve the values from the input file. I could just do another line of awk to add a new index.
LMHmedchem
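If the original first-field values can indeed be dropped, a quick follow-up along the lines LMHmedchem describes, regenerating a fresh index with one more awk pass (the file name myInput_uniq is assumed from the split above):

    # Overwrite field 1 with a fresh sequential index, then print the rebuilt line
    awk '{ $1 = NR; print }' myInput_uniq > myInput_uniq.reindexed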
Hi All,
I have an input file like the one below.
As long as "x= 1" appears, I would want to capture 2 lines using sed or awk.
For example:
0001 x= 1 $---------------------------------..-.--..
0001 tt= 137 171 423 1682 2826 0
Pls help. Thanks in advance.
Note that the number of lines in each block do... (37 Replies)
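The excerpt above is cut off, but going by the two sample lines, a hedged sketch that prints each "x= 1" line plus the line that follows it (the two-line block size is an assumption based on the visible sample):

    awk '
        /x= 1/ { n = 2 }   # a new block starts: print this line and the next one
        n-- > 0            # true (and prints) while the counter is positive
    ' inputfile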
Hi,
I am processing a file and would like to delete duplicate records as indicated by one of its columns, e.g.
COL1 COL2 COL3
A 1234 1234
B 3k32 2322
C Xk32 TTT
A NEW XX22
B 3k32 ... (7 Replies)
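Assuming the duplicates here are identified purely by COL1 and the first record for each value should be kept, a minimal awk sketch:

    # Print a line only the first time its column-1 value is seen
    awk '!seen[$1]++' infile > outfile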
I have a long file with more than one ns, www and mx record in each line.
I need the first ns record, the first www record and the first mx record from each line.
The records are separated by ';'. I am trying awk scripting but not getting the solution.
... (4 Replies)
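The input format is not shown, so this is only a hedged guess: assuming each line holds several ';'-separated records and a record's type can be spotted by the substrings ns, www and mx, a sketch that keeps the first record of each type per line:

    awk -F';' '{
        ns = www = mx = ""
        for (i = 1; i <= NF; i++) {
            if (ns  == "" && $i ~ /ns/ ) ns  = $i   # first ns record in the line
            if (www == "" && $i ~ /www/) www = $i   # first www record
            if (mx  == "" && $i ~ /mx/ ) mx  = $i   # first mx record
        }
        print ns, www, mx
    }' infile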
Hi,
I have a log file of about 48 MB in size.
For such a large log file, I want to get the messages in a particular format that includes only unique error and exception messages.
The following things need to be done:
1) To remove all the date and time from the log file
2) To remove all the... (1 Reply)
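The timestamp format is not shown in the excerpt, so this is only a sketch under assumptions: a leading "YYYY-MM-DD HH:MM:SS"-style stamp, messages marked by the words error or exception, and the names app.log and unique_messages.txt standing in for the real files:

    # Strip a leading date/time stamp, keep one copy of each error/exception message
    sed 's/^[0-9][0-9-]* [0-9:.,]*//' app.log | grep -iE 'error|exception' | sort -u > unique_messages.txt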
I need help with a script that will remove all HTML tags from an HTML document and remove any consecutive duplicate lines, and save it as a text document. The user should have the option of including the name of an html file as an argument for the script, but if none is provided, then the script... (7 Replies)
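A hedged sketch of that kind of script (the output name clean.txt and the simple tag regex are assumptions; the regex will not handle tags that span lines or '>' inside attributes):

    #!/bin/sh
    # strip_html.sh -- drop HTML tags, then collapse consecutive duplicate lines
    # Usage: strip_html.sh [page.html]   (reads standard input if no file is given)
    infile=${1:-/dev/stdin}
    sed 's/<[^>]*>//g' "$infile" | uniq > clean.txt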
Hello, I'm trying to delete duplicates when there are more than 10 duplicates, based on the value of the first column.
e.g.
a 1
a 2
a 3
b 1
c 1
gives
b 1
c 1
but requires 11 duplicates before it deletes.
Thanks for the help
Video tutorial on how to use code tags in The UNIX... (11 Replies)
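For the "more than 10 duplicates" question above, a hedged two-pass awk sketch (max=10 matches the stated threshold and can be lowered to reproduce the small example):

    # Pass 1 counts column-1 values; pass 2 prints lines whose value
    # occurs no more than max times (the file is read twice)
    awk -v max=10 'NR == FNR { cnt[$1]++; next } cnt[$1] <= max' infile infile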
Dear members, I need to filter a file based on the 8th column (the id); the other columns do not matter. I want just one line per id, removing the duplicate lines based on this id (8th column), and it does not matter which duplicate is removed.
example of my file... (3 Replies)
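Assuming whitespace-separated columns and that keeping the first line seen for each id is acceptable, a one-line awk sketch:

    # Keep one line per value of column 8 (the id); later duplicates are dropped
    awk '!seen[$8]++' infile > filtered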
Hello all, I am working on a file like below:
site Date time value1 value2
0023 2014-01-01 00:00 32.0 23.7
0023 2014-01-01 01:00 38.0 29.9
0023 2014-01-01 02:00 85.0 26.6
0023 2014-01-01 03:00 34.0 25.3
0023 2014-01-01 04:00 37.0 23.8
0023 2014-01-01 05:00 80.0 20.3
0023 2014-01-01 06:00... (16 Replies)
Hi,
In an ideal scenario, I will have a listing of the db transaction logs that get copied to a DR site, and if I have them all, they will be numbered consecutively like below.
1_79811_01234567.arc
1_79812_01234567.arc
1_79813_01234567.arc
1_79814_01234567.arc
1_79815_01234567.arc... (3 Replies)
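The question is truncated here, but since the point is that a complete copy is numbered consecutively, a hedged sketch that reports gaps in the sequence number (the middle, underscore-separated part of names like 1_79813_01234567.arc; it assumes the glob expands in sequence order):

    # Report missing sequence numbers in the .arc listing
    ls *.arc | awk -F'_' '
        prev != "" && $2 != prev + 1 {
            printf "missing: %d to %d\n", prev + 1, $2 - 1
        }
        { prev = $2 }
    '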
Hello,
I'm trying to remove duplicate consecutive lines containing the specific string "WARNING".
File.txt
abc;
WARNING 2345
WARNING 2345
WARNING 2345
WARNING 2345
WARNING 2345
bcd;
abc;
123
123
123
WARNING 1234
WARNING 2345
WARNING 2345
efgh; (6 Replies)
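A hedged sketch for the WARNING case: it collapses runs of identical consecutive lines, but only when the repeated line contains WARNING, so the repeated "123" lines above would be left alone:

    awk '{
        # skip a line only if it repeats the previous line AND mentions WARNING
        if (!(/WARNING/ && $0 == prev)) print
        prev = $0
    }' File.txt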
Discussion started by: Mannu2525
LEARN ABOUT DEBIAN
bup-margin
bup-margin(1) General Commands Manual bup-margin(1)
NAME
bup-margin - figure out your deduplication safety margin
SYNOPSIS
bup margin [options...]
DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two
entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids.
For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit
hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by
its first 46 bits.
The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits
with far fewer objects.
If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.
OPTIONS
--predict
Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer
from the guess. This is potentially useful for tuning an interpolation search algorithm.
--ignore-midx
don't use .midx files, use only .idx files. This is only really useful when used with --predict.
EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
SEE ALSO
bup-midx(1), bup-save(1)
BUP
Part of the bup(1) suite.
AUTHORS
Avery Pennarun <apenwarr@gmail.com>.