Problem with getting awk to multiply a field by a value set based on condition of another field
Hi,
So awk is driving me crazy on this one. I have searched everywhere and read man, docs and every related post Google can find and still no luck. The actual files I need to run this on are sensitive in nature, but it is the same thing as if I needed to calculate weighted grades for multiple students using all assignments as input. (Sample below)
The input file is a csv. I took all of the assignment names from $3 and calculated the highest grade, lowest grade and average grade for each one. the code I used to do this is:
Code:
awk -F , 'NR>1 { if(!($3 in course)) { low[$3] = high[$3] = $4 }
if ($4 < low[$3]) low[$3] = $4;
if ($4 > high[$3]) high[$3] = $4;
sum[$3] += $4;
++course[$3] }
END { OFS="\t"; print "Name", "Low", "High", "Avg";
for (k in course)
printf "%s\t%d\t%d\t%.2f\n", k, low[k], high[k], sum[k]/course[k] }' data.csv
That gave me the desired out put. Using the same input file, I want to now group by the Student name and then give weight to each assignment grade.
so the logic is this: If $2 = "Homework" then $4 = $4*0.10 and so on for each assignment Category. The all I need to do is sum all of the $4 for each student individually. On top of this, I also need to assign a letter grade based on the value of each student's total class grade, so I am not sure even how to proceed there at all.
However, I cannot make this happen and return all kinds of goofy results, including somehow printing the output headers several times with nothing else and they are set before anything else and are no where near a loop. So, I am totally confused. Here is the most recent failure:
Code:
awk -F, '{print "Name\tPercent\tGrade\n"}
NR>1{for(i=1;i<=NR;i++)
{ if ( $2 == "Quiz" ) w=0.4 ;
if ( $2 == "Lab" ) w=0.3 ;
if ( $2 == "Homework" ) w=0.1 ;
if ( $2 == "Final" ) w=0.15 ;
if ( $2 == "Survey" ) w=0.15 ;
} }
END {
a[$1]=$2;
b[$4]=$4*$w
for (k in a) printf "%s\t%d\n", k, a[k] ;
}' data.csv
So, if anyone can help me out here, I would appreciate it.
The Desired output is:
Name Percent Letter Grade
INPUT FILE:
data.csv
Code:
Student Category Assignment Score Possible
Chelsey Final FINAL 82 100
Sam Final FINAL 58 100
Andrew Final FINAL 99 100
Ava Final FINAL 99 100
Shane Final FINAL 90 100
Chelsey Homework H01 90 100
Chelsey Homework H02 89 100
Chelsey Homework H03 77 100
Chelsey Homework H04 80 100
Chelsey Homework H05 82 100
Chelsey Homework H06 84 100
Chelsey Homework H07 86 100
Sam Homework H01 19 100
Sam Homework H02 82 100
Sam Homework H03 95 100
Sam Homework H04 46 100
Sam Homework H05 82 100
Sam Homework H06 97 100
Sam Homework H07 52 100
Andrew Homework H01 25 100
Andrew Homework H02 47 100
Andrew Homework H03 85 100
Andrew Homework H04 65 100
Andrew Homework H05 54 100
Andrew Homework H06 58 100
Andrew Homework H07 52 100
Ava Homework H01 55 100
Ava Homework H02 95 100
Ava Homework H03 84 100
Ava Homework H04 74 100
Ava Homework H05 95 100
Ava Homework H06 84 100
Ava Homework H07 55 100
Shane Homework H01 50 100
Shane Homework H02 60 100
Shane Homework H03 70 100
Shane Homework H04 60 100
Shane Homework H05 70 100
Shane Homework H06 80 100
Shane Homework H07 90 100
Chelsey Lab L01 91 100
Chelsey Lab L02 100 100
Chelsey Lab L03 100 100
Chelsey Lab L04 100 100
Chelsey Lab L05 96 100
Chelsey Lab L06 80 100
Chelsey Lab L07 81 100
Sam Lab L01 41 100
Sam Lab L02 85 100
Sam Lab L03 99 100
Sam Lab L04 99 100
Sam Lab L05 0 100
Sam Lab L06 0 100
Sam Lab L07 0 100
Andrew Lab L01 87 100
Andrew Lab L02 45 100
Andrew Lab L03 92 100
Andrew Lab L04 48 100
Andrew Lab L05 42 100
Andrew Lab L06 99 100
Andrew Lab L07 86 100
Ava Lab L01 66 100
Ava Lab L02 77 100
Ava Lab L03 88 100
Ava Lab L04 99 100
Ava Lab L05 55 100
Ava Lab L06 66 100
Ava Lab L07 77 100
Shane Lab L01 90 100
Shane Lab L02 0 100
Shane Lab L03 100 100
Shane Lab L04 50 100
Shane Lab L05 40 100
Shane Lab L06 60 100
Shane Lab L07 80 100
Chelsey Quiz Q01 100 100
Chelsey Quiz Q02 100 100
Chelsey Quiz Q03 98 100
Chelsey Quiz Q04 93 100
Chelsey Quiz Q05 99 100
Chelsey Quiz Q06 88 100
Chelsey Quiz Q07 100 100
Sam Quiz Q01 91 100
Sam Quiz Q02 85 100
Sam Quiz Q03 33 100
Sam Quiz Q04 64 100
Sam Quiz Q05 54 100
Sam Quiz Q06 95 100
Sam Quiz Q07 68 100
Andrew Quiz Q01 25 100
Andrew Quiz Q02 84 100
Andrew Quiz Q03 59 100
Andrew Quiz Q04 93 100
Andrew Quiz Q05 85 100
Andrew Quiz Q06 94 100
Andrew Quiz Q07 58 100
Ava Quiz Q01 88 100
Ava Quiz Q02 99 100
Ava Quiz Q03 44 100
Ava Quiz Q04 55 100
Ava Quiz Q05 66 100
Ava Quiz Q06 77 100
Ava Quiz Q07 88 100
Shane Quiz Q01 70 100
Shane Quiz Q02 90 100
Shane Quiz Q03 100 100
Shane Quiz Q04 100 100
Shane Quiz Q05 80 100
Shane Quiz Q06 80 100
Shane Quiz Q07 80 100
Chelsey Survey WS 5 5
Sam Survey WS 5 5
Andrew Survey WS 5 5
Ava Survey WS 5 5
Shane Survey WS 5 5
Hi some quick thought on the last code snippet:
In awk the middle section is processed per line, so you should leave out: for(i=1;i<=NR;i++)
The results should be stored in arrays so they can be used in the END section.
The END section contains code after all lines have been read in the middle section,
so the following has no business there:
Ahh!! Thanks for that info. It explains why I get the repeated output headers then. Also, I did not realize that about the END statement. I understood it as you do not perform anything in BEGIN, but never knew that calcs and stuff should be done before END.
My biggest issue is that I am not sure of how to take an associative array and have another array stored within it. Basically I need to have it be Student_Name[Assignment_Category{Assignment Scores] where it would look like this:
That actually helps alot, as I had tried something like that. What I tried to do was:
Code:
SUM[$1] += $4*0.1
That did not work, but by storing the weights in in an array, I see how it could be workable. My only question is how to store/use multiple values.
If $2 = Homework then $4 needs to be multiplied by 0.1
if $2 = Quiz then $4 needs to be multiplied by 0.4
if $2 = Lab then $4 needs to be multiplied by 0.3
if $2 = Final then $4 needs to be multiplied by 0.15
if $2 = Survey then $4 needs to be multiplied by 0.05
Then, all I would need to do is some $4 with the new values and have the overall percent for each student.
That actually helps alot, as I had tried something like that. What I tried to do was:
Code:
SUM[$1] += $4*0.1
That did not work, but by storing the weights in in an array, I see how it could be workable. My only question is how to store/use multiple values.
If $2 = Homework then $4 needs to be multiplied by 0.1
if $2 = Quiz then $4 needs to be multiplied by 0.4
if $2 = Lab then $4 needs to be multiplied by 0.3
if $2 = Final then $4 needs to be multiplied by 0.15
if $2 = Survey then $4 needs to be multiplied by 0.05
Then, all I would need to do is some $4 with the new values and have the overall percent for each student.
The weight per category is delivered in file1, in the shape you showed post #1:
Request your help to change the field color based on condition , if it is otherthan 0. using html in unix.
Here is my condition
for(i=1;i<=NF;i++)
{
print "<td> "$i"</td>
}
Please use CODE tags when displaying sample input, output, and code segments. (17 Replies)
In the awk below I am trying to copy the entire contents of $6 there may be multiple values seperated by a ;, to $8, if $8 is . (lines 1 and 3 are examples). If that condition $8 is not . (line2 is an example) then that line is skipped and printed as is. The awk does execute but prints the output... (3 Replies)
In the perl below, which does execute, I am having trouble with the else in Rule 3. The digit in f{8} is extracted and used to update f accordinly along with the value in f.
There can be either - * or + before the number that is extracted but the same logic applies, that is if the value is greater... (5 Replies)
I am trying to output a tab-delimited result that uses the data from a tab-delimited file to combine and subtract specific lines.
If $4 matches in each line then the first matching sequential $6 value is added to $2, unless the value is 1, then the original $2 is used (like in the case of line... (3 Replies)
In the tab-delimeted input file below I am trying to use awk to update the value in $2 if TYPE=ins in bold, by adding the value of
HRUN= in italics. In the below since in line 1 TYPE=ins the 117282541 value in $2 has 6 added because that is the value of HRUN=.
Hopefully the awk is a start but I... (2 Replies)
I am trying to confirm the counts from another code and tried the below awk, but the syntax is incorrect. Basically, outputting the counts of each condition in $8. Thank you :)
awk '$8==/TYPE=snp/ /TYPE=ins/ /TYPE=del/ {count++} END{print count}'... (6 Replies)
Hi everybody,
I'm trying to replace the $98 field with "T" if the last field (108th) is T
I've tried
awk 'BEGIN{OFS=FS="|"} {if ($108=="T")sub($98,"T"); print}' test.txt
but that doesn't do anything
also tried
awk 'BEGIN{OFS=FS="|"}{ /*T.$/ sub($98,"T")} { print}' test.txt
but... (2 Replies)
Hi
i am new to scripting. i have a file file.dat with content as :
CONTENT_STORAGE PERCENTAGE FLAG:
/storage_01 64% 0
/storage_02 17% 1
I need to update the value of FLAG for a particular CONTENT_STORAGE value
I have written the following code
#!/bin/sh
threshold=20... (1 Reply)
So, I need to do some summing. I have an Apache log file with the following as a typical line:
127.0.0.1 - frank "GET /apache_pb.gif HTTP/1.0" 200 2326
Now, what I'd like to do is a per-minute sum. So, I can have awk tell me the individual minutes, preserving the dates(since this is a... (7 Replies)
I want to find the top N entries for a certain field based on the values of another field.
For example if N=3, we want the 3 best values for each entry:
Entry1 ||| 100
Entry1 ||| 95
Entry1 ||| 30
Entry1 ||| 80
Entry1 ||| 50
Entry2 ||| 40
Entry2 ||| 20
Entry2 ||| 10
Entry2 ||| 50... (1 Reply)