Help with sort and keep data record to calculate N50 in c

# 1
07-18-2011

Input_file_1
Rules:
1. Use a C program to calculate the content length of each "#" record. The results obtained from Input_file_1 above are 1,2,3,4,2,7;
2. Sort the lengths in reverse (descending) order: 7, 4, 3, 2, 2, 1, 1;
3. The program should store the sorted record (7, 4, 3, 2, 2, 1, 1) temporarily for downstream analysis;
4. Sum all the lengths in Input_file_1: 7 + 4 + 3 + 2 + 2 + 1 + 1 = 20;
5. Take 50% of the total sum of Input_file_1 as the threshold value: 20/2 = 10;
6. N50 is the length at which the running sum of the sorted lengths first becomes equal to or greater than 50% of the total sum in Input_file_1 (10);
7. 7 + 4 = 11 (greater than 10);
Desired output result after running the C program:
 cpp_beginner
# 2
07-18-2011
I don't understand step 7. Why is the output 4? I googled N50; it typically means how many of the largest integers need to be added together to reach 50% of the total, so 7+4=11 requires 2 integers and the output is 2? I guess you want the smallest member of that set.

I think this is the proper solution, finally. Using your example file:

Last edited by neutronscott; 07-18-2011 at 02:28 AM. Reason: had it totally wrong at first
 neutronscott
# 3
07-18-2011
Hi, friend.
This is one of the threads that explains the N50 calculation well: Calculating an N50 from Velvet output | (R news & tutorials)
The N50 of my example should be 4 instead of 2.
I'm trying your approach now with a test file.
Hopefully we end up with the same approach.
 cpp_beginner
# 4
07-18-2011
As per your PM, the program now handles content that contains newlines. I also added DEBUG statements so you can see what the program is doing...

 neutronscott
# 5
07-19-2011
Many thanks, neutronscott.
Your program works very fast on huge data.
It is amazing.
Do you have any idea how to edit the program so that it prints only the N50 number instead of the whole analysis detail?
I tried to edit it,
but it didn't work.
 cpp_beginner
# 6
07-19-2011
#undef DEBUG
 neutronscott
