Fixing a shell script

10-10-2015


I have added the other missing bases and amino acids to your awk code. Here is what I have now:

Code:

awk  '
# Codon patterns (RNA) and the amino acids they encode, listed in the same order.
# STOP codons are UAA, UAG and UGA; His is CAU or CAC.
BEGIN           {split ("UU[CU] UA[UC] GC[ACGU] GG[ACGU] CC[ACGU] AC[ACGU] GU[ACGU] (CG[ACGU]|AG[AG]) (CU[ACGU]|UU[AG]) (UC[ACGU]|AG[CU]) AU[ACU] AUG (UA[AG]|UGA) CA[CU] CA[AG] AA[UC] AA[AG] GA[CU] GA[AG] UG[CU] UGG", TMP1)
                 for (i=split ("Phe Tyr Ala Gly Pro Thr Val Arg Leu Ser Ile Met STOP His Gln Asn Lys Asp Glu Cys Trp", TMP2); i > 0; i--)      {AACID[TMP2[i]]
                                                                                                                                                 BASES[TMP2[i]]=TMP1[i]
                                                                                                                                                }
                 if (DEBUG) {for (t in TMP1) print TMP2[t], TMP1[t]}
                }


/^[     ]*$/    {EMP++                          # count and skip empty lines
                 next
                }
/^>/            {$0=""                          # blank out header lines starting with >
                }

                {print > "DNA.OUT"
                 gsub (/A/, "U")                # complement each base; the lowercase c is a
                 gsub (/C/, "c")                # temporary placeholder so that C -> G and
                 gsub (/G/, "C")                # G -> C do not undo each other
                 gsub (/T/, "A")
                 gsub (/c/, "G")
                 print > "RNA.OUT"
                 gsub (/.../, "& ")             # separate the codons with spaces
                 for (a in AACID) ACNT[a] += gsub (BASES[a], a)
                 print > "AminoAcids"
                }

END             {print "lines: ", NR-EMP
                 print "empty: ", EMP
                 for (a in ACNT) print a, ACNT[a]
                }
' file

Please pardon my question, but how do I run this from the command line? My input file with the DNA sequences is called "input.txt".

This is how I tried to run it:

Code:

faizlo@faizlo $ awk -f rudic.awk input.txt 
awk: rudic.awk:1: awk  '
awk: rudic.awk:1:      ^ invalid char ''' in expression
awk: rudic.awk:1: awk  '
awk: rudic.awk:1:      ^ syntax error

I also tried to add:

Code:

#!/usr/bin/awk -f

at the beginning of the code but got the same error(s).

** rudic.awk is the file that contains your awk script.

faizlo


10-10-2015


Change the last line of rudic.awk from:

Code:

' file

to:

Code:

' input.txt

Then execute the script with:

Code:

sh rudic.awk

or make rudic.awk executable and execute it directly:

Code:

chmod +x rudic.awk
./rudic.awk
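
For anyone wondering why the `awk -f rudic.awk` attempt failed: the file starts with `awk  '`, so it is a shell script that calls awk, and `awk -f` tries to parse that shell command line as awk code. A minimal sketch of the two forms follows. The file names `wrapper.sh` and `pure.awk`, the one-line awk program, and the sample input are all made up for illustration, and the wrapper takes the input file as `"$1"` instead of hard-coding it.

```shell
#!/bin/sh
# Sketch: the same tiny awk program stored two ways (file names are hypothetical).
dir=$(mktemp -d)
printf 'ACGT\nTTGA\n' > "$dir/input.txt"

# 1) Shell wrapper: the file is a shell script that invokes awk itself.
#    Run it with "sh wrapper.sh input.txt" (or chmod +x), not with "awk -f".
cat > "$dir/wrapper.sh" <<'EOF'
awk '
{ print NR, length($0) }
' "$1"
EOF

# 2) Pure awk program: no leading "awk '" line and no trailing "' file" line.
#    This is the form that "awk -f" and a "#!/usr/bin/awk -f" first line expect.
cat > "$dir/pure.awk" <<'EOF'
{ print NR, length($0) }
EOF

out_wrapper=$(sh "$dir/wrapper.sh" "$dir/input.txt")
out_pure=$(awk -f "$dir/pure.awk" "$dir/input.txt")

printf '%s\n' "$out_wrapper"
printf '%s\n' "$out_pure"
rm -rf "$dir"
```

Both invocations print the same two lines. Either form works for rudic.awk as long as the way it is run matches the way the file is written.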

Don Cragun


10-11-2015


Interesting. I had tried the executable step before, but it did not work!
It does now.
Thank you both for your help. I appreciate it.

---------- Post updated at 10:09 PM ---------- Previous update was at 10:03 PM ----------

I have one more question!

What should I do if I want the frequency for each line (sequence), and not the total frequency over all sequences in the input file?

faizlo


10-11-2015


Would this do:

Code:

awk  -vDEBUG=1 '
# STOP codons are UAA, UAG and UGA; His is CAU or CAC.
BEGIN           {C1 = split ("UU[CU] UA[UC] GC[ACGU] GG[ACGU] CC[ACGU] AC[ACGU] GU[ACGU] CG[ACGU]|AG[AG] CU[ACGU]|UU[AG] "\
                                "UC[ACGU]|AG[CU] AU[ACU] AUG UA[AG]|UGA CA[CU] CA[AG] AA[UC] AA[AG] GA[CU] GA[AG] UG[CU] UGG", TMP1)
                 for (C2=i=split ("Phe Tyr Ala Gly Pro Thr Val Arg Leu Ser Ile Met STOP His Gln Asn Lys Asp Glu Cys Trp", TMP2); i > 0; i--)    {AACID[TMP2[i]]
                                                                                                                                                 BASES[TMP2[i]]=TMP1[i]
                                                                                                                                                }
                 if (DEBUG) {print C1, C2; for (t in TMP1) print TMP2[t], TMP1[t]}
                }


/^[     ]*$/    {EMP++
                 next
                }
/^>/            {$0=""
                }

                {print > "DNA.OUT"
                 gsub (/A/, "U")
                 gsub (/C/, "c")
                 gsub (/G/, "C")
                 gsub (/T/, "A")
                 gsub (/c/, "G")
                 print > "RNA.OUT"
                 gsub (/.../, "& ")
                 for (a in AACID)       {TMP = gsub (BASES[a], a)
                                         print NR, a, TMP                
                                         ACNT[a] += TMP
                                        }
                 print > "AminoAcids"
                }

END             {print "lines: ", NR-EMP
                 print "empty: ", EMP
                 for (a in ACNT) print a, ACNT[a]
                }
' file

?
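
To show what a per-line count can look like, here is a cut-down sketch of the same gsub-counting idea that prints one row per sequence. It lists only three codon patterns and assumes the input is already RNA, so it is an illustration rather than a replacement for the full script. The names `NAME`, `PATT` and `row` are made up for this example.

```shell
#!/bin/sh
# Sketch: per-sequence amino acid counts, one output row per input line.
# Only three codon patterns are listed to keep the example short.
per_line=$(printf 'AUGUUUUUC\nUUUAUGAUGUGG\n' | awk '
BEGIN { n = split("Met Phe Trp", NAME)
        split("AUG UU[CU] UGG", PATT) }
{ gsub(/.../, "& ")                      # separate the codons with spaces
  row = "seq " NR ":"
  for (i = 1; i <= n; i++) {
      cnt = gsub(PATT[i], NAME[i])       # gsub returns the number of replacements
      row = row " " NAME[i] "=" cnt
  }
  print row
}')
printf '%s\n' "$per_line"
```

This prints `seq 1: Met=1 Phe=2 Trp=0` and `seq 2: Met=2 Phe=1 Trp=1`. Looping over a numeric index instead of `for (a in ...)` keeps the columns in a fixed order on every row.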

---------- Post updated at 10:44 ---------- Previous update was at 10:25 ----------

A bit more structured approach:

Code:

awk  -vDEBUG=1 '
# STOP codons are UAA, UAG and UGA; His is CAU or CAC.
BEGIN           {Str1 = "UU[CU] UA[UC] GC[ACGU] GG[ACGU] CC[ACGU] AC[ACGU] GU[ACGU] CG[ACGU]|AG[AG] CU[ACGU]|UU[AG] "\
                        "UC[ACGU]|AG[CU] AU[ACU] AUG UA[AG]|UGA CA[CU] CA[AG] AA[UC] AA[AG] GA[CU] GA[AG] UG[CU] UGG" 
                 Str2 = "Phe Tyr Ala Gly Pro Thr Val Arg Leu Ser Ile Met STOP His Gln Asn Lys Asp Glu Cys Trp"
                 C1 = split (Str1, TMP1)
                 for (C2=i=split (Str2, TMP2); i > 0; i--)      {AACID[TMP2[i]]
                                                                 BASES[TMP2[i]]=TMP1[i]
                                                                }
                 if (DEBUG) {print C1, C2; for (t in TMP1) print TMP2[t], TMP1[t]}

                 C1 = split ("ACGTc", PAT, "")
                 C2 = split ("UcCAG", REP, "")
                 if (DEBUG) {print C1, C2; for (p in PAT) print PAT[p], REP[p]}
                }


/^[     ]*$/    {EMP++
                 next 
                }
/^>/            {$0=""
                }

                {print > "DNA.OUT" 
                 for (i=1; i<=C1; i++)  gsub (PAT[i], REP[i])
                 print > "RNA.OUT" 
                 gsub (/.../, "& ")
                 for (a in AACID)       {TMP = gsub (BASES[a], a)
                                         print NR, a, TMP
                                         ACNT[a] += TMP  
                                        }
                 print > "AminoAcids"
                }

END             {print "lines: ", NR-EMP
                 print "empty: ", EMP   
                 for (a in ACNT) print a, ACNT[a]
                }
' file

(The parentheses in Str1 were left over from a former version; they are not needed anymore.)
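
One portability note on the PAT/REP tables: calling split with an empty separator to get single characters is a common extension (gawk documents it as such) rather than something POSIX guarantees. If the script has to run on an awk that does not support it, the tables can be filled with substr instead. The following is a minimal sketch under that assumption. The variable names `from` and `to` and the sample sequences are made up for illustration.

```shell
#!/bin/sh
# Sketch: table-driven DNA -> RNA complement, built with substr() so it does
# not rely on split() with an empty separator.
rna=$(printf 'ACGT\nGATTACA\n' | awk '
BEGIN { from = "ACGTc"; to = "UcCAG"; n = length(from)
        for (i = 1; i <= n; i++) { PAT[i] = substr(from, i, 1)
                                   REP[i] = substr(to, i, 1) } }
{ # the lowercase c is a temporary placeholder: C -> c first, then c -> G last,
  # so that the G -> C step cannot touch the bases that started out as C
  for (i = 1; i <= n; i++) gsub(PAT[i], REP[i])
  print
}')
printf '%s\n' "$rna"
```

This prints `UGCA` and `CUAAUGU`. The order of the table entries matters, because the placeholder step only works if C is set aside before G is rewritten.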

Last edited by RudiC; 10-11-2015 at 06:18 AM..

RudiC


10-11-2015


@RudiC:
Thank you so much for your help. I can't appreciate it more.
It will take me some time to understand the script, as awk seems to be huge indeed.
Thank you so much again.

faizlo

