problem in binning the data


 
# 1  
Old 08-17-2012

Hi,

I have some data like this.
Input:
Code:
1 apples oranges 234
2 oranges apples 2345
3 grapes bananas 1000000
4 melons banans 10000000
5 bananas apples 5000000
6 mangoes banans 2000000
7 apples bananas 1999999

I want to put all rows whose last-column value falls between 1 and 999999 into one bin; the second bin would cover 1000000 to 1999999, and so on.
My output files should look like this:
Code:
bin1_fruits
apples oranges 234
oranges apples 2345

Code:
bin2_fruits
grapes bananas 1000000
apples bananas 1999999

How can I do this? I am trying with Perl, and if needed I can post that code here.
# 2  
Old 08-17-2012
modify this code

Hi,
This is not the exact output you were expecting, but see if this code helps.


Code:
awk '{
 if ( $4 < 1000000) print "bin1 :" $0
 else if ($4 >= 1000000 && $4 < 2000000) print "bin2 :" $0
 else if ($4 >= 2000000 && $4 < 3000000) print "bin3 :" $0
 else if ($4 >= 3000000 && $4 < 4000000) print "bin4 :" $0
 else print "bin5 :" $0
 }' test3

It gives output like this:
Code:
bin1 :1 apples oranges 234
bin1 :2 oranges apples 2345
bin2 :3 grapes bananas 1000000
bin5 :4 melons banans 10000000
bin5 :5 bananas apples 5000000
bin3 :6 mangoes banans 2000000
bin2 :7 apples bananas 1999999

Sorry, I couldn't give you the exact code.



---------- Post updated at 08:43 PM ---------- Previous update was at 08:25 PM ----------

We can pipe the output of the above code through sort, so that the lines come out grouped by bin, like below.

Code:
awk '{
 if ( $4 < 1000000) print "bin1 :" $0
 else if ($4 >= 1000000 && $4 < 2000000) print "bin2 :" $0
 else if ($4 >= 2000000 && $4 < 3000000) print "bin3 :" $0
 else if ($4 >= 3000000 && $4 < 4000000) print "bin4 :" $0
 else print "bin5 :" $0
 }' test3 | sort

The output will be:
Code:
bin1 :1 apples oranges 234
bin1 :2 oranges apples 2345
bin2 :3 grapes bananas 1000000
bin2 :7 apples bananas 1999999
bin3 :6 mangoes banans 2000000
bin5 :4 melons banans 10000000
bin5 :5 bananas apples 5000000

Now the only remaining problem is that each bin name should be printed only once.
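One possible way to print each bin name only once is sketched below. It is an untested-in-your-environment sketch, not a drop-in answer: it assumes the sample data is stored in a file called fruits.txt (a made-up name), and it computes the bin number arithmetically as int($4 / 1000000) + 1 instead of using the if/else chain, so values of 4000000 and above get their own bins rather than all landing in bin5. The output name binned.txt is also just an assumption.

```shell
# Sample input from the first post, saved under a hypothetical file name.
cat > fruits.txt <<'EOF'
1 apples oranges 234
2 oranges apples 2345
3 grapes bananas 1000000
4 melons banans 10000000
5 bananas apples 5000000
6 mangoes banans 2000000
7 apples bananas 1999999
EOF

# Step 1: prefix every line with its bin number.
# Step 2: sort numerically by bin, then by the original row number.
# Step 3: print a header line whenever the bin number changes,
#         followed by the row without its leading row number.
awk '{ print int($4 / 1000000) + 1, $0 }' fruits.txt |
sort -k1,1n -k2,2n |
awk '{
    bin = $1
    if (NR == 1 || bin != last) {
        print "bin" bin "_fruits"
        last = bin
    }
    print $3, $4, $5
}' > binned.txt

cat binned.txt
```

The numeric sort matters here: a plain sort would place bin11 before bin2, because it compares the text character by character.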
# 3  
Old 08-17-2012
If you would like the bins to be separate files called bin"n"_fruits, try this:
Code:
awk '{f=sprintf("%d", 1+$4/1000000); fn[f]="bin"f"_fruits"; print $2,$3,$4 >fn[f]}'

yielding
Code:
ls -1 bi*
bin11_fruits
bin1_fruits
bin2_fruits
bin3_fruits
bin6_fruits

Code:
 cat bin2_fruits 
grapes bananas 1000000
apples bananas 1999999

The number of simultaneously open files may be limited, though, so some error checking would be needed in that case.
# 4  
Old 08-17-2012
Quote:
Originally Posted by RudiC
The number of simultaneously open files may be limited though; needs some error checking then.
Call close(fn[f]) after each write.
# 5  
Old 08-17-2012
And if you close the files after each write, use >> instead of > (otherwise each reopen truncates the file and only the last line survives), and make sure that the files are removed or truncated before running the program.
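Putting the last two suggestions together might look like the sketch below. It is only an illustration under a few assumptions: the input file name fruits.txt is made up, and the bin number is computed with int($4 / 1000000) + 1, which gives the same result as the sprintf("%d", ...) form for this data.

```shell
# Sample input from the first post, saved under a hypothetical file name.
cat > fruits.txt <<'EOF'
1 apples oranges 234
2 oranges apples 2345
3 grapes bananas 1000000
4 melons banans 10000000
5 bananas apples 5000000
6 mangoes banans 2000000
7 apples bananas 1999999
EOF

# Remove bin files left over from earlier runs, because >> appends.
rm -f bin*_fruits

awk '{
    # Build the output file name from the bin number.
    f = "bin" (int($4 / 1000000) + 1) "_fruits"
    # Append the row (without its row number), then close the file so
    # that at most one output file is open at any time.
    print $2, $3, $4 >> f
    close(f)
}' fruits.txt

cat bin2_fruits
```

Closing after every write keeps the program within any open-file limit, at the cost of reopening a file for each row; for large inputs it may be cheaper to sort by bin first and close a file only when the bin changes.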
# 6  
Old 08-17-2012
I thank each one of you, but I am really wondering why it is taking so much time even for a file with only 8 lines of data.
# 7  
Old 08-17-2012
Within code tags, post the code, the data, and the command that you are using.

Regards,
Alister