Fixing a shell script


 
# 8  
Old 10-10-2015
I have added the other missing bases and amino acids to your awk code. Here is what I have now:
Code:
awk  '
BEGIN           {split ("UU[CU] UA[UC] GC[ACGU] GG[ACGU] CC[ACGU] AC[ACGU] GU[ACGU] (CG[ACGU]|AG[AG]) (CU[ACGU]|UU[AG]) (UC[ACGU]|AG[CU]) AU[ACU] AUG (UA[AG]|UGA) CA[CU] CA[AG] AA[UC] AA[AG] GA[CU] GA[AG] UG[CU] UGG", TMP1)
    for (i=split ("Phe Tyr Ala Gly Pro Thr Val Arg Leu Ser Ile Met STOP His Gln Asn Lys Asp Glu Cys Trp", TMP2); i > 0; i--)      {AACID[TMP2[i]]
                                                                                                 BASES[TMP2[i]]=TMP1[i]
    }
    if (DEBUG) {for (t in TMP1) print TMP2[t], TMP1[t]}
}


/^[     ]*$/    {EMP++
                 next
}
/^>/            {$0=""
}

                {print > "DNA.OUT"
                    gsub (/A/, "U")
                    gsub (/C/, "c")
                    gsub (/G/, "C")
                    gsub (/T/, "A")
                    gsub (/c/, "G")
                 print > "RNA.OUT"
                 gsub (/.../, "& ")
                 for (a in AACID) ACNT[a] += gsub (BASES[a], a)
                 print > "AminoAcids"
                }

                END             {print "lines: ", NR-EMP
                    print "empty: ", EMP
                    for (a in ACNT) print a, ACNT[a]
                }
' file

Please pardon my question, but how do I run this from the command line? My input file, which contains the DNA sequences, is called "input.txt".

This is how I tried to run it:
Code:
faizlo@faizlo $ awk -f rudic.awk input.txt 
awk: rudic.awk:1: awk  '
awk: rudic.awk:1:      ^ invalid char ''' in expression
awk: rudic.awk:1: awk  '
awk: rudic.awk:1:      ^ syntax error

I also tried to add:
Code:
#!/usr/bin/awk -f

at the beginning of the code, but I got the same error(s).

** rudic.awk is the file that contains your awk script.
# 9  
Old 10-10-2015
Change the last line of rudic.awk from:
Code:
' file

to:
Code:
' input.txt

Then execute the script with:
Code:
sh rudic.awk

or make rudic.awk executable and execute it directly:
Code:
chmod +x rudic.awk
./rudic.awk
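
For anyone hitting the same error: awk -f expects a file that contains only the awk program, whereas rudic.awk as posted is a shell command line (it starts with awk ' and ends with ' file), so it has to be run by the shell. A minimal sketch of both variants follows. The file names wrapper.sh and prog.awk, the tiny line-counting program, and passing the input file as "$1" are all illustrative assumptions, not part of the thread's script.

```shell
#!/bin/sh
# Illustrative sketch only: file names and the tiny program are made up.
tmp=$(mktemp -d)
printf 'ACGT\n\nGGCC\n' > "$tmp/input.txt"

# Variant 1: a shell script that wraps the awk program in single quotes.
# Run it with sh (or chmod +x and ./wrapper.sh), not with awk -f.
cat > "$tmp/wrapper.sh" <<'EOF'
awk '
NF == 0 {next}
        {n++}
END     {print "lines:", n}
' "$1"
EOF

# Variant 2: a file holding only the awk program text.
# This is the form that awk -f (or a "#!/usr/bin/awk -f" first line) expects.
cat > "$tmp/prog.awk" <<'EOF'
NF == 0 {next}
        {n++}
END     {print "lines:", n}
EOF

wrapper_out=$(sh "$tmp/wrapper.sh" "$tmp/input.txt")
pure_out=$(awk -f "$tmp/prog.awk" "$tmp/input.txt")
echo "$wrapper_out"
echo "$pure_out"

rm -rf "$tmp"
```

Both commands should report the same count of non-empty lines. Feeding the wrapper file to awk -f instead reproduces the "invalid char ''' in expression" error, because awk then tries to parse the shell's quoting as awk code.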

This User Gave Thanks to Don Cragun For This Post:
# 10  
Old 10-11-2015
Interesting. I had tried the executable step before, but it did not work!
It does now.
Thank you both for your help. I appreciate it.

---------- Post updated at 10:09 PM ---------- Previous update was at 10:03 PM ----------

I have one more question!

What should I do if I want the frequency for each line (sequence), and not the total frequency over all sequences in the input file?
# 11  
Old 10-11-2015
Would this do:
Code:
awk  -vDEBUG=1 '
BEGIN           {C1 = split ("UU[CU] UA[UC] GC[ACGU] GG[ACGU] CC[ACGU] AC[ACGU] GU[ACGU] CG[ACGU]|AG[AG] CU[ACGU]|UU[AG] "\
                                "UC[ACGU]|AG[CU] AU[ACU] AUG UA[AG]|UGA CA[CU] CA[AG] AA[UC] AA[AG] GA[CU] GA[AG] UG[CU] UGG", TMP1)
                 for (C2=i=split ("Phe Tyr Ala Gly Pro Thr Val Arg Leu Ser Ile Met STOP His Gln Asn Lys Asp Glu Cys Trp", TMP2); i > 0; i--)    {AACID[TMP2[i]]
                                                                                                                                                 BASES[TMP2[i]]=TMP1[i]
                                                                                                                                                }
                 if (DEBUG) {print C1, C2; for (t in TMP1) print TMP2[t], TMP1[t]}
                }


/^[     ]*$/    {EMP++
                 next
                }
/^>/            {$0=""
                }

                {print > "DNA.OUT"
                 gsub (/A/, "U")
                 gsub (/C/, "c")
                 gsub (/G/, "C")
                 gsub (/T/, "A")
                 gsub (/c/, "G")
                 print > "RNA.OUT"
                 gsub (/.../, "& ")
                 for (a in AACID)       {TMP = gsub (BASES[a], a)
                                         print NR, a, TMP                
                                         ACNT[a] += TMP
                                        }
                 print > "AminoAcids"
                }

END             {print "lines: ", NR-EMP
                 print "empty: ", EMP
                 for (a in ACNT) print a, ACNT[a]
                }
' file

?
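
If one output line per sequence is easier to read than one line per amino acid, the same gsub-based counting can be collected into a per-record summary. This is only a minimal sketch: the function name count_per_line is made up, the codon table is deliberately cut down to three entries, and the input is assumed to be RNA already split into space-separated codons.

```shell
#!/bin/sh
# Sketch only: count_per_line and the three-entry table are illustrative,
# not part of the scripts in this thread. Input: RNA codons separated by spaces.
count_per_line() {
    awk '
    BEGIN   { n = split("Phe Met Trp", NAME)        # amino acid names
              split("UU[CU] AUG UGG", PAT)          # codon pattern for each name
            }
    NF == 0 { next }                                # skip empty lines
            { seq++
              out = "seq " seq ":"
              for (i = 1; i <= n; i++) {
                  cnt = gsub(PAT[i], NAME[i])       # gsub returns the number of replacements
                  out = out " " NAME[i] "=" cnt
                  TOTAL[NAME[i]] += cnt             # keep the overall totals as well
              }
              print out
            }
    END     { for (i = 1; i <= n; i++) print "total", NAME[i], TOTAL[NAME[i]] + 0 }
    '
}

printf 'AUG UUU UUC UGG\n\nAUG AUG UGG\n' | count_per_line
```

Iterating over a numeric index instead of "for (a in AACID)" keeps the columns in a fixed order, which awk does not guarantee for "in" loops.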

---------- Post updated at 10:44 ---------- Previous update was at 10:25 ----------

A bit more structured approach:
Code:
awk  -vDEBUG=1 '
BEGIN           {Str1 = "UU[CU] UA[UC] GC[ACGU] GG[ACGU] CC[ACGU] AC[ACGU] GU[ACGU] CG[ACGU]|AG[AG] CU[ACGU]|UU[AG] "\
                        "UC[ACGU]|AG[CU] AU[ACU] AUG UA[AG]|UGA CA[CU] CA[AG] AA[UC] AA[AG] GA[CU] GA[AG] UG[CU] UGG"
                 Str2 = "Phe Tyr Ala Gly Pro Thr Val Arg Leu Ser Ile Met STOP His Gln Asn Lys Asp Glu Cys Trp"
                 C1 = split (Str1, TMP1)
                 for (C2=i=split (Str2, TMP2); i > 0; i--)      {AACID[TMP2[i]]
                                                                 BASES[TMP2[i]]=TMP1[i]
                                                                }
                 if (DEBUG) {print C1, C2; for (t in TMP1) print TMP2[t], TMP1[t]}

                 C1 = split ("ACGTc", PAT, "")
                 C2 = split ("UcCAG", REP, "")
                 if (DEBUG) {print C1, C2; for (p in PAT) print PAT[p], REP[p]}
                }


/^[     ]*$/    {EMP++
                 next 
                }
/^>/            {$0=""
                }

                {print > "DNA.OUT" 
                 for (i=1; i<=C1; i++)  gsub (PAT[i], REP[i])
                 print > "RNA.OUT" 
                 gsub (/.../, "& ")
                 for (a in AACID)       {TMP = gsub (BASES[a], a)
                                         print NR, a, TMP
                                         ACNT[a] += TMP  
                                        }
                 print > "AminoAcids"
                }

END             {print "lines: ", NR-EMP
                 print "empty: ", EMP   
                 for (a in ACNT) print a, ACNT[a]
                }
' file

(The parentheses in Str1 were left over from an earlier version and are no longer needed.)
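
A quick way to check this: when the whole string is handed to gsub as a dynamic regex, the top-level alternation already covers both alternatives, so wrapping it in parentheses changes nothing. A small sketch (the helper name count_arg is made up for illustration):

```shell
#!/bin/sh
# Sketch: count_arg is a hypothetical helper. It replaces every match of the
# pattern given as its first argument with "Arg" and prints count and result.
count_arg() {
    awk -v pat="$1" '{ n = gsub(pat, "Arg"); print n, $0 }'
}

echo "CGU AGA AGC" | count_arg 'CG[ACGU]|AG[AG]'      # without parentheses
echo "CGU AGA AGC" | count_arg '(CG[ACGU]|AG[AG])'    # with parentheses
```

Both calls print "2 Arg Arg AGC": CGU and AGA are replaced, while AGC (a Ser codon) is left alone.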

Last edited by RudiC; 10-11-2015 at 06:18 AM..
# 12  
Old 10-11-2015
@RudiC:
Thank you so much for your help. I can't appreciate it more.
It will take me some time to understand the script as awk seems to be huge indeed.
Thank you so much again.