Hi, I would like to count the number of ALA occurrences without counting the repeats. The script I have written reports 40 occurrences of ALA, but the answer should be 8. ALA is one of the 20 values the script accepts when it prompts for AAA; for this example I chose ALA.
The script I have:
Code:
#!/bin/bash
read -p "amino acid: " AAA
if [[ "ALA ARG ASN ASP CYS GLN GLY GLU HIS ILE \
LEU LYS MET PHE PRO SER THR TRP TYR VAL" =~ $AAA ]]
then
    for i in HS_data_*.txt;
    do
        cat $i | grep -o -i $AAA | wc -l | awk '{print $1}'
        # awk -F"[ ]" -v SRCH="$AAA" '$0 ~ SRCH && !OCC[$6]++ {CNT++ } END {print CNT+0}' $i
    done
else
    exit 1
fi
The content of one of the HS_data_*.txt files is this:
Code:
ATOM 2351 N ALA B 10 13.856 10.830 -20.161 1.00 27.93 N
ATOM 2352 CA ALA B 10 13.893 11.449 -18.853 1.00 27.45 C
ATOM 2353 C ALA B 10 13.899 10.389 -17.757 1.00 29.99 C
ATOM 2354 O ALA B 10 14.653 10.538 -16.788 1.00 30.44 O
ATOM 2355 CB ALA B 10 12.686 12.323 -18.679 1.00 26.90 C
ATOM 2423 N ALA B 26 11.645 18.555 7.864 1.00 32.06 N
ATOM 2424 CA ALA B 26 11.938 19.955 7.579 1.00 35.40 C
ATOM 2425 C ALA B 26 13.080 20.496 8.431 1.00 37.27 C
ATOM 2426 O ALA B 26 13.742 21.478 8.087 1.00 39.36 O
ATOM 2427 CB ALA B 26 10.716 20.815 7.844 1.00 34.56 C
ATOM 2643 N ALA B 56 5.654 16.636 -19.419 1.00 27.14 N
ATOM 2644 CA ALA B 56 4.306 16.969 -19.795 1.00 27.77 C
ATOM 2645 C ALA B 56 4.139 18.435 -20.144 1.00 29.41 C
ATOM 2646 O ALA B 56 3.619 18.808 -21.204 1.00 30.63 O
ATOM 2647 CB ALA B 56 3.373 16.628 -18.664 1.00 28.99 C
ATOM 2887 N ALA B 88 -3.023 7.753 -19.907 1.00 20.84 N
ATOM 2888 CA ALA B 88 -3.018 7.206 -18.575 1.00 17.38 C
ATOM 2889 C ALA B 88 -1.627 6.647 -18.364 1.00 18.59 C
ATOM 2890 O ALA B 88 -1.086 5.920 -19.197 1.00 14.88 O
ATOM 2891 CB ALA B 88 -4.015 6.090 -18.472 1.00 18.60 C
ATOM 3187 N ALA B 130 -4.398 5.962 -24.620 1.00 22.40 N
ATOM 3188 CA ALA B 130 -3.225 5.141 -24.341 1.00 20.70 C
ATOM 3189 C ALA B 130 -3.170 4.921 -22.854 1.00 19.83 C
ATOM 3190 O ALA B 130 -3.725 5.716 -22.066 1.00 17.31 O
ATOM 3191 CB ALA B 130 -1.913 5.797 -24.700 1.00 22.82 C
ATOM 3516 N ALA B 177 0.656 -7.277 -20.930 1.00 19.87 N
ATOM 3517 CA ALA B 177 -0.367 -8.059 -20.250 1.00 19.38 C
ATOM 3518 C ALA B 177 -0.263 -9.541 -20.590 1.00 20.35 C
ATOM 3519 O ALA B 177 0.029 -9.962 -21.720 1.00 19.92 O
ATOM 3520 CB ALA B 177 -1.747 -7.592 -20.659 1.00 15.99 C
ATOM 3541 N ALA B 181 -4.381 -14.273 -14.076 1.00 16.90 N
ATOM 3542 CA ALA B 181 -4.649 -13.158 -13.194 1.00 16.14 C
ATOM 3543 C ALA B 181 -3.446 -12.893 -12.306 1.00 18.15 C
ATOM 3544 O ALA B 181 -2.692 -13.819 -12.014 1.00 20.60 O
ATOM 3545 CB ALA B 181 -5.817 -13.463 -12.335 1.00 15.23 C
ATOM 3626 N ALA B 194 8.308 -12.434 -17.665 1.00 29.11 N
ATOM 3627 CA ALA B 194 9.387 -12.364 -18.631 1.00 28.89 C
ATOM 3628 C ALA B 194 10.604 -11.653 -18.089 1.00 31.02 C
ATOM 3629 O ALA B 194 10.592 -11.177 -16.949 1.00 31.88 O
ATOM 3630 CB ALA B 194 8.920 -11.616 -19.844 1.00 25.66 C
As you can see from the input, ALA appears on 40 lines, but each ALA is listed 5 times, so there are 8 distinct ALA entries in total. The 4th column gives the ALA value, while the 6th column identifies which ALA a line belongs to. For example, ALA at 10 (6th column) is repeated 5 times, ALA at 26 is repeated 5 times, ALA at 56 is also repeated 5 times, etc.
The output has to count ALA 8 times instead of 40, which is what my script currently reports (bold: cat $i | grep -o -i $AAA | wc -l | awk '{print $1}').
I was also trying to count ALA 8 times using only the commented-out command, # awk -F"[ ]" -v SRCH="$AAA" '$0 ~ SRCH && !OCC[$6]++ {CNT++ } END {print CNT+0}' $i, but I am struggling to get the awk command right.
Thus, I would like to ask a few questions:
1) How could I make the bolded command count ALA 8 times instead of 40?
2) How could I make the awk command alone (commented out) also count ALA 8 times instead of 5, as it does now? That result does not make sense, since ALA appears many more times than that.
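One possible way to approach both questions, as a minimal sketch rather than a definitive answer: count distinct values of the 6th column among the lines whose 4th column is the chosen amino acid. The function names count_residues_pipeline and count_residues_awk are made up for illustration. The sketch assumes the fields are separated by whitespace, with the 4th field holding the name and the 6th the number, as in the sample above. It also assumes the real files may be padded with runs of spaces. If so, -F"[ ]" would treat every single space as a separator and produce empty fields, so $6 would no longer be the number column, which could explain the count of 5. awk's default field splitting collapses runs of blanks and avoids that.

```shell
#!/bin/bash
# Sketch: count distinct occurrences of one amino acid in an ATOM-style file.
# Assumption: whitespace-separated fields, $4 = amino acid name, $6 = its number.

# Question 1 (pipeline): keep the matching lines, extract the number column,
# drop duplicate numbers, then count what is left.
count_residues_pipeline() {
    local aaa=$1 file=$2
    grep -i -w -- "$aaa" "$file" | awk '{print $6}' | sort -u | wc -l | awk '{print $1}'
}

# Question 2 (awk only): compare the 4th field case-insensitively and count
# each value of the 6th field the first time it is seen.
count_residues_awk() {
    local aaa=$1 file=$2
    awk -v srch="$aaa" '
        toupper($4) == toupper(srch) && !seen[$6]++ { cnt++ }
        END { print cnt + 0 }
    ' "$file"
}

# Possible use inside the original loop:
#   for i in HS_data_*.txt; do count_residues_awk "$AAA" "$i"; done
```

Both functions key on the 6th column alone, as described in the question. If a file held several chains (5th column) that reuse the same numbers, the key would need to include the chain as well, for example seen[$5, $6] in the awk version.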