Quote: Originally Posted by iling14
Hi Don,
Thanks for your response. I'm currently using Red Hat. The file is at most about 100k lines, and the number of fields is 100.
Yup, my desired output is what you have corrected. I managed to do the count column by column, but I wonder how I could perform a for loop in one shot for all 100 columns.
You didn't answer most of my questions. But, now that I know that you're using a Linux system, I know that your version of awk will handle pretty much unlimited line lengths.
I'm still trying to get a feel for what the input and output data are going to look like. I'm assuming that the 100k figure you gave is the number of lines in your input file (not the number of bytes in the longest line in your input file).

Assuming that the numbers in the 1st column of your input are unique and that there are 6 distinct values in the other columns (A, G, C, T, D, and E), that means we are converting 100k input rows, with a maximum line length of about 210 bytes each, into six output data lines (plus one header line) with a maximum line length approaching 69.3 million bytes (3 * 99 spaces between fields, plus 1 * 99 single letters for the col2 through col100 data, plus 99 * (6 digits + 1 comma) * 100k col1 values, plus 99 * 5 digits for the count values) and an average line length approaching 11.5 million bytes.
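If you want to check my arithmetic, the pieces add up like this (a quick sanity check using the same assumed digit counts as above):

    awk 'BEGIN { print 3*99 + 1*99 + 99*(6+1)*100000 + 99*5 }'    # prints 69300891

And since each col1 value presumably lands in exactly one of the six lines for a given column, the six lines together total roughly that same 69.3 million bytes, which is where the roughly 11.5 million byte (69,300,891 / 6) average comes from.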
Am I in the right ballpark here, or are my assumptions off? If my assumptions are off, where am I guessing wrong? Are there values other than A, G, C, T, D, and E that will appear in col2 through col100 in the input file?
Once you have created this file, do you have something that will actually be able to use this data?
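In the meantime, here is a minimal, untested sketch of the kind of single loop you asked about, just to show the shape of counting every column in one pass instead of one column at a time. It assumes whitespace-separated fields and that only the six values above appear in col2 through col100, and it only prints counts (not the comma-separated lists of col1 values); the name file is a placeholder for your input file:

    awk '
    {
        # One pass per input line: tally how often each value appears
        # in each of the columns 2 through NF.
        for (col = 2; col <= NF; col++)
            count[col, $col]++
        nf = NF
    }
    END {
        # One output line per value: the value followed by its
        # per-column counts for col2 through col100.
        n = split("A G C T D E", vals)
        for (v = 1; v <= n; v++) {
            printf "%s", vals[v]
            for (col = 2; col <= nf; col++)
                printf " %d", count[col, vals[v]] + 0
            print ""
        }
    }' file

Extending this so that each output field also carries the comma-separated list of col1 values is mostly a matter of appending $1 to a second array inside the same loop.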