11-19-2013
Which of the duplicate values do you want to retain? The first? The smallest?
Does the output order matter, especially for the median calculation?
bup-margin(1) General Commands Manual bup-margin(1)
NAME
bup-margin - figure out your deduplication safety margin
SYNOPSIS
bup margin [options...]
DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two
entries. This number, n, is the longest SHA-1 prefix (in bits) that would still produce a collision between your object ids.
For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit
hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by
its first 46 bits.
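The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not bup's own code: the function names are hypothetical, and the "repository" is a stand-in list of SHA-1 digests of small strings. It relies on the fact that, once the digests are sorted, the pair sharing the longest prefix is always adjacent.

```python
import hashlib

def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Count the leading bits that two equal-length digests have in common."""
    bits = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            bits += 8
            continue
        # The leading zero bits of the XOR are the bits that still match.
        bits += 8 - diff.bit_length()
        break
    return bits

def max_shared_prefix_bits(digests) -> int:
    """Largest shared prefix between any two distinct digests.

    After sorting, the pair with the longest common prefix is adjacent,
    so one pass over neighbouring entries is enough.
    """
    ordered = sorted(set(digests))
    return max(
        (shared_prefix_bits(p, q) for p, q in zip(ordered, ordered[1:])),
        default=0,
    )

# Hypothetical stand-in for a repository: SHA-1 of 10,000 small strings.
ids = [hashlib.sha1(str(i).encode()).digest() for i in range(10_000)]
n = max_shared_prefix_bits(ids)
print(n, "matching prefix bits;", n + 1, "bits would identify every object")
```

With n matching bits, n + 1 bits are enough to tell every object in the set apart, which is the 45 versus 46 relationship in the example above.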
The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits
with far fewer objects.
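The margin arithmetic can be checked by hand. The sketch below plugs in the object count and matching-bit value from the EXAMPLE section of this page. The formulas are an assumption about how the reported figures relate to each other, not a copy of bup's implementation, and the variable names are made up for illustration.

```python
import math

objects = 1_612_581      # object count shown in the EXAMPLE run
matching_bits = 40       # value printed by `bup margin` in that run
hash_bits = 160          # SHA-1 digest size in bits

# How many times the repository has "doubled" starting from one object.
doublings_so_far = math.log2(objects)

# Average number of prefix bits consumed per doubling.
bits_per_doubling = matching_bits / doublings_so_far

# Bits still unused, and how many more doublings they would allow.
remaining_bits = hash_bits - matching_bits
remaining_doublings = remaining_bits / bits_per_doubling
growth_factor = 2 ** remaining_doublings

print(f"{bits_per_doubling:.2f} bits per doubling")
print(f"{remaining_bits} bits ({remaining_doublings:.2f} doublings) remaining")
print(f"{growth_factor:g} times larger is possible")
```

Under these assumptions the results agree with the EXAMPLE output to the precision shown there.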
If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.
OPTIONS
--predict
Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer
from the guess. This is potentially useful for tuning an interpolation search algorithm.
--ignore-midx
Don't use .midx files; use only .idx files. This is only really useful when used with --predict.
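To make the idea behind --predict concrete, here is a rough sketch of guessing an object's position in a sorted index by interpolation and measuring how far off the guess is. It is an illustration only: the function name, the use of the first 4 bytes as the interpolation key, and the generated test data are all assumptions, and bup's actual index handling may differ.

```python
import hashlib

def max_guess_deviation(sorted_digests):
    """Largest gap between an interpolated position and the real position."""
    count = len(sorted_digests)
    worst = 0
    for actual, digest in enumerate(sorted_digests):
        # Treat the first 4 bytes as a fraction of the 32-bit key space.
        prefix = int.from_bytes(digest[:4], "big")
        guess = prefix * count // 2**32
        worst = max(worst, abs(guess - actual))
    return worst

# Hypothetical stand-in for one index file: sorted SHA-1 digests.
ids = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(10_000))
worst = max_guess_deviation(ids)
print(f"{worst} of {len(ids)} ({100 * worst / len(ids):.3f}%)")
```

Because SHA-1 output is close to uniformly distributed, the guess tends to land near the true position, which is why a small maximum deviation makes interpolation search attractive.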
EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
SEE ALSO
bup-midx(1), bup-save(1)
BUP
Part of the bup(1) suite.
AUTHORS
Avery Pennarun <apenwarr@gmail.com>.
Bup unknown- bup-margin(1)