07-11-2012
Yes, the data is in the same order as I provided. I am not getting the data from a database; it is in a text file only.
bup-margin(1) General Commands Manual bup-margin(1)
NAME
bup-margin - figure out your deduplication safety margin
SYNOPSIS
bup margin [options...]
DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two
entries. This number, n, identifies the longest SHA-1 prefix you could use and still encounter a collision between your object ids.
For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit
hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by
its first 46 bits.
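
The calculation described above can be illustrated with a minimal Python sketch. It is not taken from bup's source; the function names shared_prefix_bits and margin are hypothetical, and the sample digests are synthetic. It relies on the fact that, once the digests are sorted, the pair sharing the longest prefix is always adjacent, so only neighbouring pairs need to be compared.

```python
import hashlib


def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Number of leading bits two equal-length digests have in common."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    # The highest set bit of the XOR marks the first position where they differ.
    return len(a) * 8 - diff.bit_length()


def margin(digests) -> int:
    """Largest shared prefix length, in bits, between any two distinct digests."""
    ordered = sorted(set(digests))
    # After sorting, the closest pair is always a pair of neighbours.
    return max(
        (shared_prefix_bits(x, y) for x, y in zip(ordered, ordered[1:])),
        default=0,
    )


if __name__ == "__main__":
    # Synthetic object ids (an assumption for illustration, not a real repository).
    objects = [hashlib.sha1(str(i).encode()).digest() for i in range(10000)]
    n = margin(objects)
    print(n, "matching prefix bits;", n + 1, "bits would identify every object")
```

If margin() returns n, then n + 1 bits are enough to tell every object in the set apart, which is the 45 versus 46 relationship in the example above.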
The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits
with far fewer objects.
If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.
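
The arithmetic behind the remaining margin can be sketched as follows. This assumes that "bits per doubling" is the number of matching prefix bits divided by log2 of the object count; that formula is an assumption, chosen because it reproduces the figures shown in the EXAMPLE section, and the function name margin_estimate is hypothetical.

```python
import math

HASH_BITS = 160  # length of a SHA-1 digest in bits


def margin_estimate(matching_bits: int, object_count: int) -> dict:
    """Estimate how much a repository could grow before exhausting SHA-1.

    Assumption: bits consumed per doubling = matching_bits / log2(object_count).
    Requires object_count > 1 and matching_bits > 0.
    """
    doublings_so_far = math.log2(object_count)
    bits_per_doubling = matching_bits / doublings_so_far
    remaining_bits = HASH_BITS - matching_bits
    doublings_left = remaining_bits / bits_per_doubling
    return {
        "bits_per_doubling": bits_per_doubling,
        "remaining_bits": remaining_bits,
        "doublings_left": doublings_left,
        "growth_factor": 2.0 ** doublings_left,
    }


if __name__ == "__main__":
    # Figures from the EXAMPLE section: 40 matching bits, 1612581 objects.
    for key, value in margin_estimate(40, 1612581).items():
        print(f"{key}: {value:.6g}")
```

With the figures from the EXAMPLE section (40 matching bits across 1612581 objects), this gives 120 remaining bits at roughly 1.94 bits per doubling.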
OPTIONS
--predict
Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer
from the guess. This is potentially useful for tuning an interpolation search algorithm.
--ignore-midx
Don't use .midx files; use only .idx files. This is only really useful when used with --predict.
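
To illustrate the idea behind --predict, here is a minimal Python sketch of an interpolation guess over a sorted list of digests. It is an assumption about the general technique, not bup's implementation: the function name max_guess_deviation is hypothetical, and it treats the first 8 bytes of each digest as a fraction of the key space to guess the object's position.

```python
import hashlib


def max_guess_deviation(digests) -> int:
    """Worst-case distance between the interpolated guess and the true index."""
    ordered = sorted(digests)
    n = len(ordered)
    worst = 0
    for actual, digest in enumerate(ordered):
        # Treat the leading 64 bits as a fraction in [0, 1] of the key space.
        fraction = int.from_bytes(digest[:8], "big") / 2**64
        # Interpolated position, clamped to the last valid index.
        guess = min(int(fraction * n), n - 1)
        worst = max(worst, abs(guess - actual))
    return worst


if __name__ == "__main__":
    # Synthetic object ids (an assumption for illustration, not a real index).
    objects = [hashlib.sha1(str(i).encode()).digest() for i in range(10000)]
    worst = max_guess_deviation(objects)
    print(f"{worst} of {len(objects)} ({100.0 * worst / len(objects):.3f}%)")
```

A small maximum deviation means a search can start at the guessed position and only needs to scan a narrow window around it.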
EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
SEE ALSO
bup-midx(1), bup-save(1)
BUP
Part of the bup(1) suite.
AUTHORS
Avery Pennarun <apenwarr@gmail.com>.
Bup unknown- bup-margin(1)