Is it possible to extract rows with the same first column and then calculate its percentage?


 
# 1  
Old 06-27-2010

A short excerpt of my .txt file looks like:

Code:
CXRA3Z2J9MQKR    B
CXRA3Z2J9MQKR    A
CXRA3Z2J9MQKR    C
CXRA3Z2J9MQKR    B
A162JX4ML69UIC    C
A162JX4ML69UIC    A
FZ9Z19TI2XOA5     A
FZ9Z19TI2XOA5     C
FZ9Z19TI2XOA5     B
FZ9Z19TI2XOA5     B
BRNTTJUB8GXE9     A
BRNTTJUB8GXE9     A

This is a 2-part question:

1) Is there a way to first extract each unique ID from the first column, together with all of its rows (all the rows that start with 'CXRA3Z2J9MQKR', 'A162JX4ML69UIC', etc.), into a new .txt file?

2) After that, I ultimately need to calculate the percentage of A's, B's, and C's for each ID (e.g. 'CXRA3Z2J9MQKR') in my data. So user 'BRNTTJUB8GXE9' would have: A=100%, B=0%, C=0%.

Is there a way to do math such as calculating a percentage (not of numbers, but of letters, as in this case) in UNIX?

Thanks in advance for any help or feedback. I'm new to UNIX, and I'm being forced to learn it 'on the job' at my new workplace.

Last edited by Scott; 06-27-2010 at 06:29 AM.. Reason: Code tags, please...
# 2  
Old 06-27-2010
This generates the extract you want. For the analysis, put it into Excel, or download OpenOffice and use its spreadsheet:
Code:
awk ' {arr[substr($1,1,1)  $2]++}
        END {for (i in arr) { print i, "\t", arr[i]} } ' inputfile > extractfile.csv

Analysis is possible in UNIX. I think it would just be easier for you in Excel.
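
A minimal sketch of the same counting idea, keyed on the whole first column instead of `substr($1,1,1)` (which keeps only the first character of the ID). The function name `count_pairs` and the temporary `sample.txt` are placeholders made up for this illustration, and only a few rows of the excerpt are used:

```shell
# Hypothetical sample file holding a few rows from the excerpt in post #1.
tmp=$(mktemp -d)
printf '%s\n' \
  'CXRA3Z2J9MQKR    B' \
  'CXRA3Z2J9MQKR    A' \
  'CXRA3Z2J9MQKR    B' \
  'BRNTTJUB8GXE9     A' \
  'BRNTTJUB8GXE9     A' > "$tmp/sample.txt"

# count_pairs FILE: print "ID<TAB>letter<TAB>count" for every ID/letter pair.
count_pairs() {
  awk '{ n[$1 "\t" $2]++ }   # key on the full ID plus the letter
       END { for (key in n) print key "\t" n[key] }' "$1" | LC_ALL=C sort
}

count_pairs "$tmp/sample.txt"
```

The `sort` is only there because awk's `for (key in n)` visits keys in no particular order.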
# 3  
Old 06-27-2010
Hi Jim,

Thanks for your help. But that awk script tallies up all the occurrences of A, B, and C across the whole file; it does not count the A's, B's, and C's separately for each unique ID in the 1st column. So basically the script gave me the result of 5 A's, 4 B's, and 3 C's in the excerpt I provided.

Is there a way to only count those based on the unique content in the first column per row? That's why I had thought maybe it's better to first extract all the rows by what their ID is. Is there a quick way to do that as well?

Thanks everyone!
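
For the extraction step asked about here, one possible approach is to let awk write every row to a file named after its first column. This is only a sketch: `split_by_id`, the `by_id` directory, and the temporary `sample.txt` are hypothetical names, and it assumes the IDs contain no characters that are unsafe in file names.

```shell
# Recreate the excerpt from post #1 in a scratch directory.
work=$(mktemp -d)
cat > "$work/sample.txt" <<'EOF'
CXRA3Z2J9MQKR    B
CXRA3Z2J9MQKR    A
CXRA3Z2J9MQKR    C
CXRA3Z2J9MQKR    B
A162JX4ML69UIC    C
A162JX4ML69UIC    A
FZ9Z19TI2XOA5     A
FZ9Z19TI2XOA5     C
FZ9Z19TI2XOA5     B
FZ9Z19TI2XOA5     B
BRNTTJUB8GXE9     A
BRNTTJUB8GXE9     A
EOF

# split_by_id FILE DIR: append each row to DIR/<ID>.txt
split_by_id() {
  awk -v dir="$2" '{
    out = dir "/" $1 ".txt"   # one output file per ID
    print >> out              # append the whole row
    close(out)                # keep the number of open files small
  }' "$1"
}

mkdir -p "$work/by_id"
split_by_id "$work/sample.txt" "$work/by_id"
ls "$work/by_id"
```

Because the rows are appended, running it twice on the same directory would duplicate them, so start from an empty directory.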
# 4  
Old 06-27-2010
nawk -f px.awk myFile

px.awk:
Code:
{
  kv[$1,$2]++
  k[$1]++
}
END {
  for( kI in k) {
    print kI
    for(kvI in kv) {
      split(kvI,kvA,SUBSEP)
      if (kvA[1] == kI)
        printf("\t[%s]=%.2f%\n", kvA[2], (kv[kvI]/k[kI])*100)
    }
  }
}

# 5  
Old 06-27-2010
Hi vgersh99,

Thanks for your help. I tried your code, and got this in return:

awk: weird printf conversion %
awk: not enough args in printf( [%s]=%.2f%)

Do you know what the problem may be? Also, just wondering: can you explain what the line of code printf("\t[%s]=%.2f%\n" does?

Thanks, I'm starting to see just how powerful UNIX scripting is.
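
On the format string itself: `\t` is a tab, `%s` prints the next argument as a string, and `%.2f` prints the next argument as a number with two decimal places. The error most likely comes from the lone `%` before `\n`, which awk tries to read as the start of another conversion. The usual way to print a literal percent sign in a printf format is `%%`. A small sketch, where `fmt_pct` is a made-up helper name:

```shell
# fmt_pct LETTER COUNT TOTAL: print one indented line such as "[B]=50.00%".
fmt_pct() {
  awk -v letter="$1" -v count="$2" -v total="$3" \
    'BEGIN { printf("\t[%s]=%.2f%%\n", letter, (count / total) * 100) }'
}

fmt_pct B 2 4
```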
# 6  
Old 06-27-2010
try this instead:
Code:
BEGIN {
  pct=sprintf("%c", 37)  # 37 is the decimal character code of "%"; gawk would read 037 as octal (31)
}
{
  # associative array (kv) indexed by the values of the FIRST and SECOND columns ($1,$2)
  # each cell holds the number of times that combination of indices occurs -
  # incremented by 1 (++) for every occurrence
  kv[$1,$2]++

  # associative array (k) indexed by the value of the FIRST column ($1)
  # each cell holds the number of rows seen for that value of the FIRST column -
  # incremented by 1 (++) for every occurrence
  k[$1]++
}
END {
  for( kI in k) {
    print kI
    for(kvI in kv) {
      split(kvI,kvA,SUBSEP)
      if (kvA[1] == kI)
        printf("\t[%s]=%.2f%c\n", kvA[2], (kv[kvI]/k[kI])*100, pct)
    }
  }
}

output given your sample input:
Code:
BRNTTJUB8GXE9
        [A]=100.00%
A162JX4ML69UIC
        [A]=50.00%
        [C]=50.00%
FZ9Z19TI2XOA5
        [A]=25.00%
        [B]=50.00%
        [C]=25.00%
CXRA3Z2J9MQKR
        [A]=25.00%
        [B]=50.00%
        [C]=25.00%


Last edited by vgersh99; 06-28-2010 at 09:36 AM..
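
The output above lists only the letters that actually occur for an ID, whereas the original question also wanted the zero cases (B=0%, C=0% for 'BRNTTJUB8GXE9'). A possible variant, sketched under the assumption that the set of letters is whatever appears anywhere in the second column; `pct_by_id` and the temporary file are hypothetical names:

```shell
# Recreate the excerpt from post #1 in a scratch directory.
work=$(mktemp -d)
cat > "$work/sample.txt" <<'EOF'
CXRA3Z2J9MQKR    B
CXRA3Z2J9MQKR    A
CXRA3Z2J9MQKR    C
CXRA3Z2J9MQKR    B
A162JX4ML69UIC    C
A162JX4ML69UIC    A
FZ9Z19TI2XOA5     A
FZ9Z19TI2XOA5     C
FZ9Z19TI2XOA5     B
FZ9Z19TI2XOA5     B
BRNTTJUB8GXE9     A
BRNTTJUB8GXE9     A
EOF

# pct_by_id FILE: print "ID<TAB>letter<TAB>percentage" for every ID and every letter.
pct_by_id() {
  awk '
    { n[$1, $2]++; total[$1]++; seen[$2] = 1 }   # pair count, rows per ID, set of letters
    END {
      for (id in total)
        for (v in seen)                          # a missing pair counts as 0
          printf("%s\t%s\t%.2f%%\n", id, v, 100 * n[id, v] / total[id])
    }' "$1" | LC_ALL=C sort
}

pct_by_id "$work/sample.txt"
```

Piping through `sort` gives a stable order, since the order of `for (... in ...)` in awk is unspecified.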
# 7  
Old 06-28-2010
Wow thanks vgersh99, that worked flawlessly.

Can you please explain how you came up with it? You don't have to do every line, but just the general gist of the code, for learning purposes?

---------- Post updated at 09:13 PM ---------- Previous update was at 09:01 PM ----------

I'm wondering specifically what you did so that it counts per unique item in the first column while still including the responses from the 2nd column.
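
The piece that ties the two columns together in the script from post #6 appears to be the two-part subscript `kv[$1,$2]`: awk joins the parts with the built-in `SUBSEP` character into a single key, and `split(key, parts, SUBSEP)` takes it apart again in the END block. A small sketch of just that mechanism, where `show_composite_key` is a made-up name:

```shell
# Store one count under a two-part key, then recover both parts from the key.
show_composite_key() {
  awk 'BEGIN {
    n["CXRA3Z2J9MQKR", "B"]++          # stored under "CXRA3Z2J9MQKR" SUBSEP "B"
    for (key in n) {
      split(key, part, SUBSEP)         # part[1] is the ID, part[2] is the letter
      print part[1], part[2], n[key]
    }
  }'
}

show_composite_key
```

The second array, `k[$1]`, is keyed on the ID alone, so it holds the number of rows per ID, which is the denominator of each percentage.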
 