Help with sorting and keeping data records to calculate N50 in C
Input_file_1
Rules:
1. Use a C program to calculate the content length of each "#". The results from Input_file_1 above are 1,2,3,4,2,7;
2. Sort the lengths in reverse (descending) order: 7, 4, 3, 2, 2, 1, 1;
3. The program should be able to store the sorted record (7, 4, 3, 2, 2, 1, 1) temporarily for downstream analysis;
4. Sum all the lengths in Input_file_1: 7 + 4 + 3 + 2 + 2 + 1 + 1 = 20;
5. Take 50% of the total sum of Input_file_1 as a threshold value: 20/2 = 10;
6. The running sum at the N50 must be equal to or greater than 50% of the total sum of Input_file_1 (10);
7. 7 + 4 = 11 (greater than 10);
Desired output after running the C program:
Many thanks for any advice.
I don't understand step 7. Why is the output 4? I googled N50; it typically means how many of the largest integers need to be added together to reach 50%, so 7+4=11 requires 2 integers and the output is 2? I guess you want the smallest member.
I think this is the proper solution, finally. Using your example file:
Last edited by neutronscott; 07-18-2011 at 02:28 AM..
Reason: had it totally wrong at first
Hi, friend.
This is one of the threads that explains the N50 calculation well: Calculating an N50 from Velvet output | (R news & tutorials)
The N50 of my example should be 4 instead of 2.
I'm trying your approach now with a test file.
Hopefully we end up with the same approach.
Many thanks, neutronscott.
Your program works very fast on huge data.
It is amazing.
Do you have any idea how to edit the program so that it prints only the N50 number instead of the whole analysis detail?
I tried to edit it, but it doesn't work.
Thanks for your assistance.