Show the input to awk, i.e. the output of voronota get-balls-from-atoms-file --annotated for a certain .pdb file. Does it have the structure you gave in your input data samples?
The output of voronota get-balls-from-atoms-file --annotated for a certain .pdb file, i.e. the input to awk in the script you gave, looks like this (a sample, not the full data set):
However, I want what goes through grep -o -i "$AAA" | wc -l to come out as a string that is later converted to an integer; if that conversion can be avoided, that would be great.
I need to extract the count of amino acids (ARG, SER, ASP in this case), but maybe that is possible with the script you showed before?
I hope this is what you asked for.
Last edited by Scrutinizer; 06-06-2019 at 12:38 PM..
Reason: quote tags -> code tags and icode tags
Hmmm - I don't see any "ALA" in your recent sample data file - what would be the desired result from it? I see one each of the 4 / ARG, 5 / SER, and 6 / ASP combinations...
Please be aware that everybody in here can only see (and work with) what you explicitly (!) write / post. Don't assume ANY knowledge of the topic (genetics / biology, in this case) that would allow inferring background info that is not given from allusions in the text. Describe the (data) problem as thoroughly as possible, backed by representative, consistent, and as broad as possible input and desired output samples.
Don't "change horses" between posts - does your sample in post #9 lead to the "false" output in post #7? Or, where should the
As I said in earlier posts, the AAA value can be any of the 20 amino acids (ALA, ARG, ASN, ASP, CYS, GLN, GLY, GLU, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR or VAL). The first example I posted used ALA and the later one used ARG, SER and ASP, but the important point is that AAA can take any of the 20 values, and I need the count for whichever amino acid is chosen, without duplication. To make it clearer with the recent example: I need to get only 1 ARG for r<4>, which appears 11 times. SER and ASP must also count as 1 each, even though r<5> is repeated 6 times and r<6> is repeated 8 times. In the full data file these AAA occurrences are repeated with different integer r<x> values, where x is from 1 to 1000.
Understood. I'll try to explain my situation in as much detail as possible. The script I have now is:
It is run in a terminal on macOS Mojave. When I type ./BSA (the name of the script) in the terminal, it asks me to enter the amino acid (a capitalised three-letter code such as ALA, ARG, ASN, ASP, CYS, GLN, GLY, GLU, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR or VAL) at the prompt "amino acid:". The value entered becomes AAA in the script.
In this case let's enter ALA, which becomes $AAA in the script, so in the terminal it looks like this: amino acid: ALA
Then I press enter and get this output:
The output from the script above is how many times ALA occurs per .pdb complex. There are 28 .pdb files/complexes in total, which is why the output has 28 lines. However, that is not what I want for the ALA count per complex. The output I expect should be something like this:
Here ALA values are counted without duplication. To understand better how to achieve this, let's look at a shortened example of the voronota get-balls-from-atoms-file --annotated output for the first .pdb file (complex), which includes 40 ALA values:
As you can see from the voronota output example, there are 40 lines containing ALA, so the first script shown above outputs 40. The problem is that there are only 8 distinct ALA residues. Each one is repeated 5 times with the same r<x>: r<10> appears 5 times, and the same goes for ALA at r<26>, ALA at r<56> and so on. I want the 5 lines of r<10> to be counted as 1 ALA, the 5 lines of r<26> as 1 ALA, etc., and then those added together to give 8 ALA for the first .pdb file instead of 40. Here x is a number from 1 to 1000. Note also that the number of lines per residue is not always 5: one ALA may come from 2, 3, 4, 6, 7, 8 or more lines with the same r<x>. Whatever the number of lines, each distinct r<x> associated with ALA should count as 1 ALA.
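The counting rule described above (one ALA per distinct r<x>, however many lines share it) can be sketched in awk roughly as follows. This is only a sketch under assumptions: the r<x> token comes from the samples in this thread, but the R<ALA> form of the amino acid name, the other tokens on each sample line, and the function name count_residues are made up for illustration and need adjusting to the real voronota output.

```shell
#!/bin/sh
# count_residues AAA: read annotated atom lines on stdin and print how many
# distinct r<x> residue numbers occur on lines naming amino acid AAA.
# Assumption: the amino acid appears as R<AAA> and the residue number as r<x>.
count_residues() {
    awk -v aaa="$1" '
        index($0, "R<" aaa ">") && match($0, /r<[0-9]+>/) {
            key = substr($0, RSTART, RLENGTH)   # e.g. "r<10>"
            if (!(key in seen)) {               # first line of this residue
                seen[key] = 1
                n++
            }
        }
        END { print n + 0 }                     # prints 0 when nothing matched
    '
}

# Made-up sample: two ALA residues (r<10>, r<26>) and one ARG residue (r<4>).
sample='c<A>r<10>a<1>R<ALA>A<N> 1.0 2.0 3.0 1.7
c<A>r<10>a<2>R<ALA>A<CA> 1.1 2.1 3.1 1.9
c<A>r<26>a<3>R<ALA>A<N> 4.0 5.0 6.0 1.7
c<A>r<4>a<4>R<ARG>A<N> 7.0 8.0 9.0 1.7'

printf '%s\n' "$sample" | count_residues ALA    # prints 2
```

In the real script the sample would be replaced by voronota's output piped into count_residues "$AAA". If the same r<x> number can occur in more than one chain, the chain token would have to become part of the key as well.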
The script was then changed to:
However the output I get now is this:
The problem here is that every .pdb file analysed now counts all occurrences of ALA as 1 per complex (HS_some_complex.pdb), i.e. 1 ALA for all 40 lines of r<10>, r<26>, r<56>, r<88>, etc. in the first .pdb file, and likewise for the other 27 complexes. That's not what I need. I want ALA to be counted as occurring 8 times for the first complex, as explained above, not 40 as I get from my first script. So the question is how to do that: should I change the grep command, the awk command, or both?
I hope it is clearer now, but do let me know if anything is still unclear.
Last edited by Scrutinizer; 06-06-2019 at 12:40 PM..
Reason: quote tags -> code tags
Hmmm, I think I found a logical error in my proposal: adding the $i after the awk script made it immediately read the respective .pdb file, not voronota's output from that file. Remove the $i:
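The effect of a trailing file operand can be reproduced in isolation. The sketch below uses a throwaway temporary file and made-up text, not the thread's data: when awk is given a file operand it reads that file and ignores whatever arrives on the pipe.

```shell
#!/bin/sh
# Throwaway file for the demonstration; name and contents are illustrative only.
tmp=$(mktemp "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf 'from file\n' > "$tmp"

# With a file operand, awk reads the file and never looks at the pipe.
with_operand=$(printf 'from pipe\n' | awk '{ print }' "$tmp")

# Without a file operand, awk reads standard input, i.e. the pipe.
without_operand=$(printf 'from pipe\n' | awk '{ print }')

printf '%s\n%s\n' "$with_operand" "$without_operand"
rm -f "$tmp"
```

This prints "from file" and then "from pipe", which is why the extra $i made awk process the raw .pdb file instead of voronota's output.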
and report back.
Still, I'm convinced there will be an apter / better solution to the overall problem dealing with ALL .pdb files, and ALL amino acids in one go if needed...
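One possible shape for such a one-pass approach, offered as a sketch and not a tested solution: let awk pick the amino acid name off each line itself and keep one counter per name. It assumes the name appears as R<AAA> with three capital letters and the residue number as r<x>; the R<...> form, the sample lines, and the function name count_all_residues are assumptions for illustration.

```shell
#!/bin/sh
# count_all_residues: read annotated atom lines on stdin and print
# "AAA count" for every amino acid seen, counting each r<x> only once.
# Assumption: lines carry R<AAA> (three capitals) and r<x> tokens.
count_all_residues() {
    awk '
        match($0, /R<[A-Z][A-Z][A-Z]>/) {
            aa = substr($0, RSTART + 2, 3)          # the three-letter code
            if (match($0, /r<[0-9]+>/)) {
                key = aa SUBSEP substr($0, RSTART, RLENGTH)
                if (!(key in seen)) {               # new residue for this code
                    seen[key] = 1
                    count[aa]++
                }
            }
        }
        END { for (aa in count) print aa, count[aa] }
    ' | sort                                        # awk array order is unspecified
}

# Made-up sample: two ALA residues (r<10>, r<26>) and one ARG residue (r<4>).
sample='c<A>r<10>a<1>R<ALA>A<N> 1.0 2.0 3.0 1.7
c<A>r<10>a<2>R<ALA>A<CA> 1.1 2.1 3.1 1.9
c<A>r<26>a<3>R<ALA>A<N> 4.0 5.0 6.0 1.7
c<A>r<4>a<4>R<ARG>A<N> 7.0 8.0 9.0 1.7'

printf '%s\n' "$sample" | count_all_residues
```

For the sample this prints "ALA 2" and "ARG 1". Wrapping the call in the existing loop over the .pdb files would give one such table per complex.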
And, please use CODE, not ICODE, tags for data as well. You may want to edit your former post.
Thank you, it works, and I have edited my previous post. I also have a similar question about this code:
The .pdb.txt file looks like this:
There are 7 lines of r<x>, where x is a number from 1 to 100, and 7 occurrences of ALA. How could I change grep to awk in the code so that it counts 4 ALA instead of 7?
Last edited by Scrutinizer; 06-06-2019 at 12:23 PM..
Reason: quote tags -> code tags