Full Discussion: Help with a bash loop script
Post 303037497 by dre on Saturday 3rd of August 2019 07:33:38 AM
Quote:
Originally Posted by bakunin
I will continue with this requirement: if i understand you correctly you want to count all the occurrences of the sequence in each gene, like this (the numbers are made up):

Code:
gene1: 37
gene2: 21
...

for all the genes in your bacterial genome file. Is that correct?

If so, here is a shell algorithm which will do that:

read two lines (?) from the genome file, the first one holds the name of the gene:

Code:
> gene1

the second one holds the gene sequence itself:

Code:
GAAACTCGGTGTTGGCTTACCGGTCATTCCGAGCGTC[....]
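
The reading step described above can be tried on its own before any counting is added. The following is only a sketch: `read_pairs` is a made-up name, the sample records are invented, and it assumes every record is exactly one header line followed by one sequence line.

```shell
#!/bin/bash
# read_pairs is a hypothetical helper; it expects header/sequence pairs on stdin.
read_pairs ()
{
local name seq

while read -r name ; do
     read -r seq || break               # stop if the sequence line is missing
     name="${name#> }"                  # cut off the "> " from the name
     printf '%s has %u bases\n' "$name" "${#seq}"
done
}

# made-up sample data, two records
read_pairs <<'EOF'
> gene1
GAAACTCGG
> gene2
TTGGC
EOF
```

Feeding the function from a here-document keeps the experiment away from the real genome file; later the same loop can read from `< "$fInput"` instead.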

we will try to "subtract" (that is: cut out from the string) one occurrence of the pattern we look for from the gene: if the string changes we have found the pattern, so we increase the counter and repeat. Once the string remains unchanged there are no more occurrences of the pattern, so we stop and output the final result. You need "parameter expansion" for this and i suggest you read up on it, because it is a versatile addition to your toolkit. I put the code in the form of a function which you can call:

Code:
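pGetNumber ()
{
local chGene="$1"                         # content of the gene, first parameter
local chMotif="$2"                        # pattern we look for, second parameter
local iCnt=0                              # counter

# replace one match with a blank instead of deleting it outright, so the
# bases left and right of the match cannot fuse into a new, false match
while [ "${chGene/${chMotif}/ }" != "${chGene}" ] ; do
     (( iCnt++ ))
     chGene="${chGene/${chMotif}/ }"
done

printf "%u\n" "$iCnt"

return 0
}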
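
For comparison, here is a different way to count with parameter expansion, offered as a sketch rather than as part of the script: remove every occurrence in one pass and divide the length difference by the motif length. `count_by_length` is a hypothetical name, and the function assumes the motif is a plain string without glob characters. Like the loop, it counts non-overlapping occurrences from left to right.

```shell
#!/bin/bash
# Hypothetical helper: count non-overlapping occurrences of a motif
# by comparing the string length before and after removing all matches.
count_by_length ()
{
local seq="$1"                       # sequence to search
local motif="$2"                     # literal motif to look for
local stripped

[ -n "$motif" ] || { echo 0 ; return 1 ; }    # an empty motif cannot match
stripped="${seq//"$motif"/}"         # delete every occurrence in a single pass
echo $(( (${#seq} - ${#stripped}) / ${#motif} ))
}

# sample call with the sequence fragment shown earlier
count_by_length "GAAACTCGGTGTTGGCTTACCGGTCATTCCGAGCGTC" "GG"
```

The three expansions used are `${var//pattern/}` (delete all matches), `${#var}` (string length) and the arithmetic `$(( ... ))`. For long sequences this avoids rescanning the string once per match.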

Notice that to read the genome file correctly we need a few additional bits of information: 1) is the name of the gene always on a line starting with a ">"? 2) the gene's content is on one line in your sample. Is that always so or could it be broken into several lines?

I assume for the moment that 1) is the case and that the answer to 2) is that the content is always on one line. Note that the script will break if this is not the case, but it could easily be adapted.
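
One possible adaptation for sequences that span several lines, given here as a sketch under assumptions and not as the final script: collect all lines up to the next ">" header into one string before counting. `join_records` is a made-up name; it reads the genome text on stdin and prints one line per gene in the form name, tab, joined sequence.

```shell
#!/bin/bash
# Hypothetical helper: join the lines of each record into one sequence string.
join_records ()
{
local line name="" seq=""

while IFS= read -r line || [ -n "$line" ] ; do
     if [ "${line#>}" != "$line" ] ; then          # header lines start with ">"
          [ -n "$name" ] && printf '%s\t%s\n' "$name" "$seq"
          name="${line#>}"                         # drop the ">"
          name="${name# }"                         # and one blank after it, if any
          seq=""
     else
          seq="${seq}${line}"                      # append a continuation line
     fi
done
[ -n "$name" ] && printf '%s\t%s\n' "$name" "$seq"   # flush the last record
return 0
}

# made-up sample: gene1 is split over two lines
join_records <<'EOF'
> gene1
GAAACT
CGGTG
> gene2
TTGGC
EOF
```

The output can then be read back with `while IFS=$'\t' read -r chGeneName chGene`, so the counting function itself does not need to change.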

Now let us include that into the script start i showed you already:

Code:
#!/bin/bash

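pGetNumber ()
{
local chGene="$1"                         # content of the gene, first parameter
local chMotif="$2"                        # pattern we look for, second parameter
local iCnt=0                              # counter

# replace one match with a blank instead of deleting it outright, so the
# bases left and right of the match cannot fuse into a new, false match
while [ "${chGene/${chMotif}/ }" != "${chGene}" ] ; do
     (( iCnt++ ))
     chGene="${chGene/${chMotif}/ }"
done

printf "%u\n" "$iCnt"

return 0
}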

# ------------------------ main ()
declare targetdir="/home/youruser/mywork/motifs"         # directory for motifs
declare countfile="motif_count.txt"                      # base name of the count files
declare chMotif=""                                       # buffer
declare chGene=""                                        # buffer
declare chGeneName=""                                    # buffer
declare chAllmotifs="ATG GGGGG ATTTT"                    # list of motifs to process
declare fInput="/path/to/your/genome.file"               # the input file with your genome

mkdir -p "$targetdir"
for chMotif in $chAllmotifs ; do                         # empty the result file of each motif
     > "${targetdir}/${countfile}.${chMotif}"
done

while read -r chGeneName ; do
     read -r chGene
     chGeneName="${chGeneName#> }"                     # cut off the "> " from the name
     for chMotif in $chAllmotifs ; do
          printf "%20s: %u\n" "$chGeneName" "$(pGetNumber "$chGene" "$chMotif")" >> "${targetdir}/${countfile}.${chMotif}"
     done
done < "$fInput"

exit 0

More to come, but you should answer the questions i asked.

I hope this helps.

bakunin

I appreciate very much the guidance and assistance you are giving me on the code. The following are the answers to your questions:
a.
Quote:
So, let us start: you want a directory to put your results there and you want to create it. Question: what should happen if the directory already exists, i.e. from a former run of the script? Use it again?
Create a new one? Overwrite the files there? Number the files so that results from different runs can exist alongside?
If the directory exists I would like the files present to be overwritten.
b.
Quote:
Finally, a bit more information about your environment: OS, version, .... - might also help because some systems have special provisions others do lack.
The current OS I am using is Ubuntu 16.04 LTS
c.
Quote:
1) is the name of the gene always on a line starting with a ">"? 2) the gene's content is on one line in your sample. Is that always so or could it be broken into several lines?
The name of the gene is always on a line with a ">" and the gene contents can be broken into several lines.
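
For answer a. (overwrite what is already there), one way this could be handled is sketched below. `prepare_outputs` and its arguments are hypothetical names chosen for illustration: the first argument is the result directory, the rest are the file names to reset.

```shell
#!/bin/bash
# Hypothetical helper: make sure the result directory exists and that
# every named result file starts out empty.
prepare_outputs ()
{
local dir="$1" ; shift               # target directory, remaining args: file names
local f

mkdir -p "$dir" || return 1          # no error if the directory already exists
for f in "$@" ; do
     : > "${dir}/${f}"               # create the file or truncate an old one
done
}

# sample call in a throwaway directory
demo_dir="$(mktemp -d)"
prepare_outputs "$demo_dir" motif_count.txt.ATG motif_count.txt.GGGGG
ls "$demo_dir"
rm -rf "$demo_dir"
```

Because the files are emptied once before the main loop, the loop itself can keep appending with `>>` and a second run will not pile up on top of the first.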
 
