Simplifying awk/sed short pipeline


 
# 1  
Old 11-20-2017

I have a file like this:
Code:
FileName,Well,Sample Description,Size [bp],Calibrated Conc. [ng/µl],Assigned Conc. [ng/µl],Peak Molarity [nmol/l],Area,% Integrated Area,Peak Comment,Observations
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,25,5.22,,321,0.803,,,Lower Marker
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,50,2.25,,69.3,0.347,11.11,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,100,2.37,,36.5,0.365,11.71,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,200,2.47,,19.0,0.380,12.20,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,300,2.55,,13.1,0.392,12.56,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,400,2.57,,9.87,0.395,12.66,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,500,2.71,,8.33,0.416,13.36,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,700,2.46,,5.41,0.379,12.15,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,1000,2.89,,4.44,0.444,14.25,,
2017-11-15 - 13.49.50.D1000,EL1,Electronic Ladder,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,A1,,25,8.64,,531,1.329,,,Lower Marker
2017-11-15 - 13.49.50.D1000,A1,,78,1.62,,31.9,0.249,59.03,,
2017-11-15 - 13.49.50.D1000,A1,,99,1.13,,17.5,0.173,40.97,,
2017-11-15 - 13.49.50.D1000,A1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,B1,,25,9.27,,570,1.426,,,Lower Marker
2017-11-15 - 13.49.50.D1000,B1,,43,1.40,,50.4,0.215,19.94,,
2017-11-15 - 13.49.50.D1000,B1,,66,1.24,,29.1,0.191,17.72,,
2017-11-15 - 13.49.50.D1000,B1,,85,0.866,,15.7,0.133,12.37,,
2017-11-15 - 13.49.50.D1000,B1,,111,1.11,,15.3,0.170,15.81,,
2017-11-15 - 13.49.50.D1000,B1,,189,0.525,,4.27,0.081,7.50,,
2017-11-15 - 13.49.50.D1000,B1,,226,0.395,,2.70,0.061,5.65,,
2017-11-15 - 13.49.50.D1000,B1,,270,0.333,,1.90,0.051,4.76,,
2017-11-15 - 13.49.50.D1000,B1,,717,0.380,,0.816,0.058,5.43,,
2017-11-15 - 13.49.50.D1000,B1,,969,0.758,,1.20,0.117,10.83,,
2017-11-15 - 13.49.50.D1000,B1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,C1,,25,8.39,,516,1.291,,,Lower Marker
2017-11-15 - 13.49.50.D1000,C1,,44,0.597,,20.8,0.092,3.40,,
2017-11-15 - 13.49.50.D1000,C1,,95,1.10,,17.7,0.169,6.26,,
2017-11-15 - 13.49.50.D1000,C1,,116,0.856,,11.3,0.132,4.87,,
2017-11-15 - 13.49.50.D1000,C1,,388,15.0,,59.5,2.309,85.47,,
2017-11-15 - 13.49.50.D1000,C1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,D1,,25,6.10,,375,0.939,,,Lower Marker
2017-11-15 - 13.49.50.D1000,D1,,221,2.83,,19.7,0.435,7.41,,
2017-11-15 - 13.49.50.D1000,D1,,554,28.7,,79.8,4.418,75.22,,
2017-11-15 - 13.49.50.D1000,D1,,808,5.37,,10.2,0.826,14.06,,
2017-11-15 - 13.49.50.D1000,D1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,D1,,2645,1.27,,0.736,0.195,3.31,,
2017-11-15 - 13.49.50.D1000,E1,,25,8.31,,511,1.279,,,Lower Marker
2017-11-15 - 13.49.50.D1000,E1,,399,48.7,,188,7.494,66.32,,
2017-11-15 - 13.49.50.D1000,E1,,564,13.6,,37.1,2.092,18.51,,
2017-11-15 - 13.49.50.D1000,E1,,793,11.1,,21.6,1.715,15.17,,
2017-11-15 - 13.49.50.D1000,E1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,F1,,25,9.27,,570,1.425,,,Lower Marker
2017-11-15 - 13.49.50.D1000,F1,,403,53.3,,203,8.193,65.36,,
2017-11-15 - 13.49.50.D1000,F1,,570,16.2,,43.8,2.493,19.89,,
2017-11-15 - 13.49.50.D1000,F1,,796,12.0,,23.2,1.849,14.75,,
2017-11-15 - 13.49.50.D1000,F1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,G1,,25,13.3,,816,2.041,,,Lower Marker
2017-11-15 - 13.49.50.D1000,G1,,106,0.740,,10.8,0.114,0.66,,
2017-11-15 - 13.49.50.D1000,G1,,413,66.7,,249,10.257,59.46,,
2017-11-15 - 13.49.50.D1000,G1,,586,18.9,,49.7,2.913,16.89,,
2017-11-15 - 13.49.50.D1000,G1,,813,16.8,,31.7,2.577,14.94,,
2017-11-15 - 13.49.50.D1000,G1,,937,9.03,,14.8,1.390,8.06,,
2017-11-15 - 13.49.50.D1000,G1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker
2017-11-15 - 13.49.50.D1000,H1,,25,28.6,,1760,4.407,,,edited Lower Marker
2017-11-15 - 13.49.50.D1000,H1,,41,0.781,,29.3,0.120,1.35,,
2017-11-15 - 13.49.50.D1000,H1,,55,4.88,,136,0.751,8.43,,
2017-11-15 - 13.49.50.D1000,H1,,86,4.55,,81.4,0.701,7.86,,
2017-11-15 - 13.49.50.D1000,H1,,377,21.5,,87.8,3.305,37.09,,
2017-11-15 - 13.49.50.D1000,H1,,525,8.07,,23.7,1.241,13.93,,
2017-11-15 - 13.49.50.D1000,H1,,749,18.2,,37.3,2.793,31.34,,
2017-11-15 - 13.49.50.D1000,H1,,1500,6.50,6.50,6.67,1.000,,,Upper Marker

I am using the following script to:
1) remove the first line;
2) identify rows where column 4 is between 410 and 570 and column 5 is greater than 0.5;
3) if more than one row for the same well (column 2) meets the requirements, combine those rows by adding up their column 5 values, then sort the entries.
Code:
sed "1d" input.txt | gawk -F "," "!/EL/{ if ( $4 > 410 && $4 < 570 && $5 > 0.5 ) print $2, $5; else print $2, 0 }" | gawk "{a[$1]+=$2}END{for(i in a){print i, a[i]}}" | sed "s/\([A-z]\)\([1-9]\) /\10\2 /" | sort > output.txt

I then find the minimum value in the second column, excluding zeros:
Code:
gawk "NR == 1 || $2 < min && $2 > 0 {line = $2; min = $2}END{print line}" output.txt > minimum.txt

Then I multiply that value by 15 and divide the result by each value listed in the first output file:
Code:
gawk "FNR == NR{est = $1 * 15; next}{if ($2 > 0) print $1, $2, est / $2; else print $0, 0}" minimum.txt output.txt > final.txt

I would really like to improve this small pipeline. I don't want to keep using | to "stitch" together different awk and sed scripts, and I cannot use anything but gawk/sed. I also don't want to generate multiple temporary files (output.txt and minimum.txt), which I am pretty sure are not needed; I just could not think of a better way to do it.
I am using GnuWin32/cmd to run the pipeline from a .bat file; I am not allowed to use Cygwin on this box.
Any help will be greatly appreciated.
# 2  
Old 11-20-2017
First of all, you must put in 'ticks' not "quotes" around the embedded g/awk code, because in the latter $1 $2 ... are substituted with positional shell parameters. The shell does this before it passes the code string to g/awk.
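To illustrate the point about quoting, here is a minimal sketch that assumes a POSIX shell (sh, bash, ksh); cmd.exe follows different rules. The positional parameter value SHELL_ARG is made up for the demonstration.

```shell
# Simulate a script that was called with one positional parameter.
set -- "SHELL_ARG"

# Double quotes: the shell replaces $1 before awk ever sees the program.
double=$(printf 'a b\n' | awk "{ print \"$1\" }")

# Single quotes: awk receives $1 untouched and prints the first field.
single=$(printf 'a b\n' | awk '{ print $1 }')

echo "double-quoted: $double"
echo "single-quoted: $single"
```

In the double-quoted call awk prints the shell's value, while the single-quoted call prints the first input field.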
This User Gave Thanks to MadeInGermany For This Post:
# 3  
Old 11-20-2017
Probability is high that ALL your code can be combined into one single awk script. Unfortunately your min works only if the first line in output.txt has non-null value, so with your sample data the rest of the algorithm (which I don't really understand) doesn't work out, always printing the input line followed by a zero.
Any corrections to the algorithm?
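One possible correction, sketched under the assumption that "minimum" means the smallest value greater than zero in column 2: track whether a qualifying value has been seen instead of relying on the first line. The variable names min and seen and the four sample lines are illustrative only, and the single quotes assume a POSIX shell.

```shell
# Smallest non-zero value in column 2, regardless of what the first line holds.
min=$(printf 'A01 0\nD01 28.7\nE01 13.6\nH01 8.07\n' |
  awk '$2 > 0 && (!seen || $2 < min) { min = $2; seen = 1 }
       END { print min }')
echo "$min"
```

The first line (A01 0) is skipped because it fails the $2 > 0 test, so it can no longer poison the result.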
# 4  
Old 11-20-2017
Quote:
First of all, you must put in 'ticks' not "quotes" around the embedded g/awk code, because in the latter $1 $2 ... are substituted with positional shell parameters. The shell does this before it passes the code string to g/awk.
I get the following message if I use single quotes:
Code:
gawk: '!/EL/{
gawk: ^ invalid char ''' in expression
sed: couldn't write 72 items to stdout: Invalid argument

It runs ok using double quotes
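Since cmd.exe does not treat single quotes as quoting characters, one way to sidestep the quoting question entirely is to keep the awk program in its own file and load it with the -f option. The sketch below is written for a POSIX shell so that it is self-contained; the file names filter.awk and sample.csv and the three data rows are assumptions made up for the demonstration.

```shell
# Write the awk program to its own file (hypothetical name: filter.awk).
cat > filter.awk <<'EOF'
# Fields are split on commas.
BEGIN { FS = "," }
NR == 1 || /EL/ { next }    # skip the header and the ladder rows
{ print $2, (($4 > 410 && $4 < 570 && $5 > 0.5) ? $5 : 0) }
EOF

# Tiny stand-in for input.txt.
printf '%s\n' 'FileName,Well,Desc,Size,Conc' \
              'x,EL1,Ladder,500,2.71' \
              'x,D1,,554,28.7' \
              'x,D1,,221,2.83' > sample.csv

# No awk code on the command line, so nothing for the shell to mangle.
result=$(awk -f filter.awk sample.csv)
echo "$result"
```

In a .bat file the equivalent call would be something like gawk -f filter.awk input.txt > final.txt, with no quotes around any awk code.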

Quote:
Unfortunately your min works only if the first line in output.txt has non-null value, so with your sample data the rest of the algorithm (which I don't really understand) doesn't work out, always printing the input line followed by a zero.
You're right! Let me work a bit more on that. Any suggestions to improve the first script so I don't keep using | to stitch it all together?
Code:
sed "1d" input.txt | gawk -F "," "!/EL/{ if ( $4 > 410 && $4 < 570 && $5 > 0.5 ) print $2, $5; else print $2, 0 }" | gawk "{a[$1]+=$2}END{for(i in a){print i, a[i]}}" | sed "s/\([A-z]\)\([1-9]\) /\10\2 /" | sort > output.txt

# 5  
Old 11-20-2017
How about
Code:
awk -F "," '

NR==1 ||                                                        # get rid of first line (replace first sed command)
/EL/    {next                                                   # get rid of lines containing "EL"
        }

        {if ($4 <= 410 || $4 >= 570 || $5 <= 0.5) $5 = 0        # implement condition for $4 and $5 values
         a[$2] += $5                                            # sum up $5 values (replace second awk script)
        }

END     {for (i in a)                                           # determine minimum non-zero sum (replace third awk script)
                if (a[i] > 0 && (MIN == "" || a[i] < MIN)) MIN = a[i]
         MIN *= 15                                              # implement minimum multiplication
         for (i in a)   {T = a[i]                               # run through all collected index (= $2) values
                         IX = substr(i,1,1) "0" substr (i,2)    # modify $2 string (replace second sed script)
                         print  IX, T, T?MIN/T:0                # print them all
                        }
        }
' file
A01 0 0
B01 0 0
C01 0 0
D01 28.7 4.21777
E01 13.6 8.90074
F01 0 0
G01 66.7 1.81484
H01 8.07 15

Should a sorting step be necessary, pipe the output through sort, or use gawk's sorting algorithms (which my mawk doesn't provide).
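If the sort has to happen without a pipe on the command line, one portable option is to let awk start sort itself by redirecting print to a command. This is only a sketch: it assumes a sort program is on the PATH, uses made-up sample lines, and is quoted for a POSIX shell.

```shell
# awk opens one "sort" process, feeds every line to it, and closes it at the end.
sorted=$(printf 'G1 66.7\nD1 28.7\nH1 8.07\n' |
  awk '{ print $1, $2 | "sort" }
       END { close("sort") }')
echo "$sorted"
```

The same print ... | "sort" form can be used inside an END block, so a single awk script can both compute and sort its results.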
# 6  
Old 12-04-2017
Thanks Rudy!
Worked like a charm
# 7  
Old 12-04-2017
You might want to give this version a try:
Code:
rev=""
if [ $# -gt 0 ]
  then
  if [ $1 = '-r' ]
  then
    rev="r"
#   echo $rev
    shift
  fi
fi
ls -l $* |grep -v "total " |sort -k5,5n$rev -k9 |more -e

It lists all files (or star name convention) in increasing size or, with "-r" as the first parm, in decreasing size. Ties are alphabetical by file name.



Moderator's Comments:
Obviously belongs to a different thread: UNIX commands to display the biggest file by size in a directory

Last edited by RudiC; 12-05-2017 at 06:41 AM..