Full Discussion: problem in binning the data
Post 302688035 by RudiC on Friday 17th of August 2012 12:46:41 PM
If you would like each bin written to its own file, named bin<n>_fruits (where n is the bin number), try this:
Code:
awk '{f=sprintf("%d", 1+$4/1000000); fn[f]="bin"f"_fruits"; print $2,$3,$4 >fn[f]}'

yielding
Code:
ls -1 bi*
bin11_fruits
bin1_fruits
bin2_fruits
bin3_fruits
bin6_fruits

Code:
cat bin2_fruits
grapes bananas 1000000
apples bananas 1999999

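Note that awk keeps every output file open until it is closed, and the number of simultaneously open files may be limited. With many bins, the script will need some error checking, or it should close each file after writing to it.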
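One possible way around the open-file limit, sketched here rather than taken from the thread, is to close each bin file right after writing to it and reopen it in append mode on the next write. The sample input below is hypothetical: the thread does not show the real data, only that fields 2 to 4 are printed and field 4 decides the bin. The `seen` array and the `input.txt` name are likewise assumptions made for illustration.

Code:
```shell
# Work in a scratch directory so the bin files do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input (the first column is a made-up record id).
cat > input.txt <<'EOF'
r1 grapes bananas 1000000
r2 kiwis plums 500
r3 apples bananas 1999999
r4 pears limes 5000000
EOF

awk '{
    f  = int(1 + $4 / 1000000)          # same bin number as the one-liner above
    fn = "bin" f "_fruits"
    if (fn in seen) {
        print $2, $3, $4 >> fn          # later writes: append to the existing file
    } else {
        print $2, $3, $4 > fn           # first write in this run: truncate old contents
        seen[fn] = 1
    }
    close(fn)                           # release the file so only one is open at a time
}' input.txt

cat bin2_fruits
```

With this sample, bin2_fruits ends up with the same two lines as in the listing above. Closing after every record costs an open/close per line, so on large inputs it is noticeably slower than the one-liner; it is only worth it when the number of bins can exceed the limit.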
 
