11-10-2008
Calculate the Median, first quartile and third quartile using AWK
Hi all,
I have the following data set:
28
33
42
12
9
68
81
55
6
47
Since I want to create a box-and-whisker plot, I need to calculate the median, first quartile, and third quartile of the above data using AWK. (So far I have only managed to write AWK code that finds the smallest and largest values.)
Can anyone help me, please?
Thanks
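One possible approach, offered as a sketch rather than the definitive answer: sort the values numerically first (POSIX awk has no built-in sort), then let awk pick the median and take Q1 and Q3 as the medians of the lower and upper halves. Note that several quartile definitions exist; this sketch assumes the "median of halves" convention, leaving the middle value out of both halves when the count is odd, so other tools may report slightly different Q1/Q3. The function name `quartiles` is made up for illustration.

```shell
# quartiles: read one number per line on stdin and print
# min, Q1, median, Q3 and max (hypothetical helper name).
# Assumption: Q1/Q3 are the medians of the lower/upper halves.
quartiles() {
    sort -n | awk '
        # Median of the already sorted values v[lo..hi]
        function median(lo, hi,    n, mid) {
            n = hi - lo + 1
            mid = lo + int(n / 2)
            if (n % 2) return v[mid]           # odd count: middle value
            return (v[mid - 1] + v[mid]) / 2   # even count: mean of the two middle values
        }
        NF { v[++count] = $1 }                 # skip blank lines
        END {
            if (count < 2) exit 1              # quartiles need at least two values
            half = int(count / 2)              # size of each half
            printf "min=%s Q1=%s median=%s Q3=%s max=%s\n",
                v[1], median(1, half), median(1, count),
                median(count - half + 1, count), v[count]
        }'
}

# The ten values from the post
result=$(printf '%s\n' 28 33 42 12 9 68 81 55 6 47 | quartiles)
echo "$result"
# prints: min=6 Q1=12 median=37.5 Q3=55 max=81
```

With the data in a file, `quartiles < data.txt` gives the same five numbers, which are the ones a box-and-whisker plot needs.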
fastx_quality_stats
FASTX_QUALITY_STATS(1) User Commands FASTX_QUALITY_STATS(1)
NAME
fastx_quality_stats - FASTX Statistics
DESCRIPTION
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13.2 by A. Gordon (gordon@cshl.edu)
[-h] = This helpful help screen.
[-i INFILE] = FASTQ input file. default is STDIN.
[-o OUTFILE] = TEXT output file. default is STDOUT.
[-N] = New output format (with more information per nucleotide/cycle).
The *OLD* output TEXT file will have the following fields (one row per column):
column = column number (1 to 36 for a 36-cycles read solexa file)
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
A_Count = Count of 'A' nucleotides found in this column.
C_Count = Count of 'C' nucleotides found in this column.
G_Count = Count of 'G' nucleotides found in this column.
T_Count = Count of 'T' nucleotides found in this column.
N_Count = Count of 'N' nucleotides found in this column.
max-count = max. number of bases (in all cycles)
The *NEW* output format:
cycle (previously called 'column') = cycle number
max-count
For each nucleotide in the cycle (ALL/A/C/G/T/N):
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
to get a better layout as well as an overview about connected FASTX tools.
fastx_quality_stats 0.0.13.2 May 2012 FASTX_QUALITY_STATS(1)