Create a single bash script that does the following:
a. Print out the number of occurrences for each motif that is found in the bacterial genome and output to a file called motif_count.txt
b. Create a fasta file for each motif (so 3 in total) which contains all of the genes and their corresponding sequences that have that motif. Each file should be named after the motif (i.e. ATG.txt) and written to a new directory called motifs
The motif file is a txt file with these motifs: ATG, GGGGG, ATTTT
The bacterial genome file is a fasta file with the following lines
Moderator's Comments:
edit by bakunin: please use CODE-tags to let data, code and terminal output stand out. Thank you.
Here we go again analysing FASTA-files. I wonder if we are going to be mentioned in the dozens of biological research papers we have helped to create over time.
Quote:
Originally Posted by dre
Create a single bash script that does the following:
Sorry, but this is not the way that works.
First: you come across like a teacher posing the homework for us. Actually we are all professionals here and have done our homework meticulously which is why we don't have to do more homework any more. In case it was homework given to you by your teacher: there is a special forum for this with special rules in place. Please re-create the thread there and provide the necessary information.
Second, in case this is not homework but actually your work: we do not insist on a lot of social conventions here (after all, we are sysadmins - being somewhat autistic is part of the requirement for this job), but still a well-placed "please" here and there, along with some of the common niceties called "good manners" oils the social machinery. If you ever wonder who writes the answers to your questions: it is not some clever machine and a fat server - it is actually living, breathing persons.
Third and foremost: we are here to help you help yourself. If you want a shell-script - WRITE IT! Attempts, however failing, will count. If you tried and it didn't work, post what you did and the error you got trying to run it. We will, regardless of how long it will take, explain to you what was wrong and how to do it better (actually quite like the way i explain to you right now why you need to change your problem statement) until you understand. We will also point out ways to better solve the problem, different tools, suggest sources to find valuable information and so on and so on. But we will NOT write your code for you. We are a help forum, not your unpaid programming staff.
I hope this helps.
bakunin
2. Please understand that forum members may not have working knowledge of biology. So when you say, "Print out the number of occurrences for each motif that is found in the bacterial genome", this makes no sense to me (and may not to a lot of others either).
3. A good post requesting assistance should in my humble opinion have the following information:
Clearly state the problem without any ambiguity. Break down your problem into the smallest part where you need help. Posting a question 100 lines long will yield no result. People will yawn and go back to doing their day job.
Show your attempt at solving the problem. This will help members focus on the exact place where you need help; and not have to dig about and assume where you might be facing the issue.
One line about your OS, your shell and your preferred scripting language.
Sorry didn't mean to offend anyone and thanks for the advice.
This is my attempt:
#!/bin/bash

# create a new directory motifs if it doesn't exist
mkdir -p motifs
cd motifs
touch motif_count.txt
> motif_count.txt
for motif in ATG GGGGG ATTTT
do
    echo $motif >> motif_count.txt
    grep $motif -c r_bifella.fasta >> motif_count.txt
    touch $motif.fasta
    grep $motif -B 1 r_bifella.fasta > $motif.fasta
done
Quote:
Originally Posted by dre
Sorry didn't mean to offend anyone and thanks for the advice.
Fair enough, no problem. You are welcome.
Quote:
Originally Posted by dre
This is my attempt
So, that is a start. Let us go over it. Notice that i will address some general points about script development which may not help you with this script but will in the long run.
First, you should make it a habit to declare the variables you use. Not because this is always necessary (unlike in C or other languages you can "make variables up" on the fly simply by using them) but because this is a good start to think over the algorithm you are going to employ, what information every part of it needs and so on. Furthermore you get some documentation for free.
So, let us start: you want a directory to put your results there and you want to create it. Question: what should happen if the directory already exists, i.e. from a former run of the script? Use it again? Create a new one? Overwrite the files there? Number the files so that results from different runs can exist alongside?
Second: this is a typo:
Don't worry - typos happen to all of us. But wouldn't it be nice to avoid such typos? Actually you can, by using a variable instead of a fixed name. And, by the way, is it really a good idea to put the directory in the current directory? Wouldn't it be better to create a directory in your HOME, regardless of where you currently are when you call the script? So, how about doing it like this:
You see, now you can use "$targetdir" in your script and if you want to change the location you will have to do it only in one place - and it is easy to understand where that is because of the comment! Well written scripts are easy to read and easy to maintain.
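A minimal sketch of what such a declaration could look like. Only the variable name "targetdir" comes from the text above; the location "$HOME/motifs" is an assumption made for illustration:

```shell
#!/bin/bash

# Assumed location: the thread only names the variable, not its value.
targetdir="$HOME/motifs"      # directory where all result files are created

# -p: create missing parent directories and do not complain
# if the directory is already there from a former run
mkdir -p "$targetdir"

echo "results go to: $targetdir"
```

Every later command can then refer to "$targetdir/somefile" and a change of location means editing exactly one line.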
Another point: don't use "cd" in a script! Use absolute paths so that the script always works the same way, regardless of where you stand or where you call it from.
You actually do not need the first line (the "touch"), because the redirection will create the (empty) file if it doesn't exist. Adding the path (instead of the "cd") we get:
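As a hedged sketch of that step: the names "countfile" and the default location are assumptions for illustration, and the count file is placed inside the target directory the way the attempt above does it:

```shell
#!/bin/bash

# Assumed names and location, not taken from the original post.
targetdir="${targetdir:-$HOME/motifs}"
countfile="$targetdir/motif_count.txt"

mkdir -p "$targetdir"

# The redirection alone creates the file if it is missing and
# empties it if it already exists, so no "touch" is needed.
: > "$countfile"
```

The ":" is the shell's do-nothing builtin; it just gives the redirection a command to hang on to.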
Let us pause here, it is getting quite late for me. More on the script tomorrow, but you might want to go over the questions above and tell us what you want the script to do. Further, you may want to explain what your script does not do or does wrongly. Finally, a bit more information about your environment (OS, version, ...) might also help, because some systems have special provisions others lack.
a. Print out the number of occurrences for each motif that is found in the bacterial genome and output to a file called motif_count.txt
I will continue with this requirement: if i understand you correctly you want to count all the occurrences of the sequence in each gene, like this (the numbers are made up):
for all the genes in your bacterial genome file. Is that correct?
If so, here is a shell algorithm which will do that:
read two lines (?) from the genome file, the first one holds the name of the gene:
the second one holds the gene sequence itself:
We will try to "subtract" (that is: cut out from the string) one occurrence of the pattern we look for from the gene: if the string changes we have found such a pattern, so we increase the counter and repeat. Once the string remains unchanged there is no further occurrence of the pattern, so we end the loop and output the final result. You need "parameter expansion" for this and i suggest you read up on it because it is a versatile addition to your toolkit. I put the code in form of a function which you can call:
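A minimal sketch of such a function, not necessarily the one originally posted. The name "count_motif" and its argument order are assumptions. It cuts away everything up to and including the first occurrence (rather than cutting the pattern out of the middle), so that joining the two remaining halves cannot create a match that was never in the gene. It counts non-overlapping occurrences from left to right:

```shell
#!/bin/bash

# count_motif MOTIF SEQUENCE   (hypothetical name and interface)
# Prints the number of non-overlapping occurrences of MOTIF in SEQUENCE.
count_motif () {
    local motif="$1"
    local seq="$2"
    local rest
    local count=0

    while : ; do
        # strip the shortest prefix ending in the motif; the quotes make
        # the motif a literal string instead of a glob pattern
        rest="${seq#*"$motif"}"
        # unchanged string: no further occurrence, we are done
        [ "$rest" = "$seq" ] && break
        seq="$rest"
        count=$(( count + 1 ))
    done

    printf '%s\n' "$count"
}
```

Called as count_motif ATG "$sequence" it prints a single number, which is easy to capture with $( ... ).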
Notice that to read the genome file correctly we need a few additional bits of information: 1) is the name of the gene always on a line starting with a ">"? 2) the gene's content is on one line in your sample. Is that always so or could it be broken into several lines?
I assume for the moment that 1) is the case and that the answer to 2) is that it is always on one line. Note that the script will break if this is not the case, but it could easily be adapted.
Now let us include that into the script start i showed you already:
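A sketch of how the pieces could fit together under the two assumptions just stated (name line starts with ">", sequence on exactly one line). The function names, the output layout "gene motif count" and the hard-coded motif list are assumptions for illustration:

```shell
#!/bin/bash

motifs="ATG GGGGG ATTTT"      # the three motifs from the task

# count_motif MOTIF SEQUENCE: non-overlapping occurrences (hypothetical helper)
count_motif () {
    local motif="$1" seq="$2" rest count=0
    while : ; do
        rest="${seq#*"$motif"}"
        [ "$rest" = "$seq" ] && break
        seq="$rest"
        count=$(( count + 1 ))
    done
    printf '%s\n' "$count"
}

# motif_counts GENOMEFILE: one line "gene motif count" per gene and motif
motif_counts () {
    local genome="$1"
    local name sequence motif

    # read two lines per round: the ">name" line and the sequence line
    while IFS= read -r name && IFS= read -r sequence ; do
        name="${name#>}"                 # drop the leading ">"
        for motif in $motifs ; do
            printf '%s %s %s\n' "$name" "$motif" \
                   "$(count_motif "$motif" "$sequence")"
        done
    done < "$genome"
}
```

With the variables declared earlier this would be used as motif_counts r_bifella.fasta > "$targetdir/motif_count.txt", where the plain ">" overwrites results from a former run.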
More to come, but you should answer the questions i asked.
I hope this helps.
bakunin
I appreciate very much the guidance and assistance you are giving me on the code. The following are the answers to your questions:
a.
Quote:
So, let us start: you want a directory to put your results there and you want to create it. Question: what should happen if the directory already exists, i.e. from a former run of the script? Use it again?
Create a new one? Overwrite the files there? Number the files so that results from different runs can exist alongside?
If the directory exists I would like the files present to be overwritten.
b.
Quote:
Finally, a bit more information about your environment: OS, version, .... - might also help because some systems have special provisions others do lack.
The current OS I am using is Ubuntu 16.04 LTS
c.
Quote:
1) is the name of the gene always on a line starting with a ">"? 2) the genes content is in one line in your sample. Is that always so or could that be broken into several lines?
The name of the gene is always on a line with a ">" and the gene contents can be broken into several lines.
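Since the gene contents can span several lines, one possible adaptation (a sketch only, the function name "linearize_fasta" is made up here) is to join the lines of each gene first, so that any code expecting one name line followed by one sequence line keeps working:

```shell
#!/bin/bash

# linearize_fasta FILE   (hypothetical helper)
# Prints every ">name" line followed by the gene's sequence on ONE line.
linearize_fasta () {
    awk '
        /^>/ {                       # a new gene starts
            if (seen) print seq      # flush the previous gene
            print                    # the ">name" line itself
            seq = ""; seen = 1
            next
        }
        { seq = seq $0 }             # append a sequence fragment
        END { if (seen) print seq }  # flush the last gene
    ' "$1"
}
```

The result can be written to a temporary file or piped straight into a "while read" loop; a gene without any sequence yields an empty line, so the pairing of name and sequence stays intact.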