Duplication | awk | result


# 15  
You could try replacing

Code:
cat $i | grep -o -i $AAA | wc -l | awk '{print $1}'

immediately by
Code:
awk -F"[<>]" -v SRCH="$AAA" '$0 ~ SRCH && !OCC[$4]++ {CNT++ } END {print CNT+0}'   $i

INCLUDING the $i.
# 16  
Thanks. Can you also explain to me what
Code:
-v SRCH="$AAA" '$0 ~ SRCH && !OCC[$4]++ {CNT++ } END {print CNT+0}'

does, especially !OCC and CNT? I cannot figure it out :/
# 17  
Code:
-v SRCH="$AAA"       # -v is the standard way to pass a shell variable ($AAA) into an awk variable (SRCH)
'$0 ~ SRCH           # if the entire line / record contains a match for the pattern held in SRCH
&&                   # AND
!OCC[$4]++           # if the array element OCC (occurrences) indexed by field 4 (the r<...> value) is not yet defined (be aware it is post-incremented and thus defined from then on)
{CNT++ }             # increment the CNT (count) variable
END {print CNT+0}'   # print the count; if CNT is undefined because no $AAA was found, the +0 forces a numeric 0 instead of an empty string
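
To see the pieces working together, here is a minimal sketch that runs the same one-liner on three made-up lines modeled on the input format used in this thread. The function name count_unique_r, the temporary file, and the r<7> value in the last sample line are my own assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: count distinct field-4 (r<...>) values among lines matching a pattern.
count_unique_r() {
    # $1 = search pattern (e.g. ALA), $2 = input file
    awk -F"[<>]" -v SRCH="$1" '$0 ~ SRCH && !OCC[$4]++ {CNT++} END {print CNT+0}' "$2"
}

# Made-up sample data; with -F"[<>]" field 4 is the first r<...> value (3, 3 and 7).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2643>R<ALA>A<N> 1.32859 4.26827 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2644>R<ALA>A<CA> 6.90889 4.0211 . .
c<B>r<7>a<2300>R<LEU>A<CD1> c<B>r<56>a<2646>R<ALA>A<O> 0.0962809 4.95892 . .
EOF

count_unique_r ALA "$sample"   # three matching lines, but only two distinct $4 values -> 2
count_unique_r TRP "$sample"   # nothing matches; CNT+0 prints 0 rather than an empty line
```

Note that grep -o ... | wc -l would count every match, while this counts each r<...> value only once.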

# 19  
Dear RudiC,

Can you please explain this part
Code:
 !OCC[$4]++

a bit more? I don't understand what you mean when you say that it is not yet defined.

Sincerely,
Aurimas
# 20  
An array element, like any variable in awk, is created on first reference with an empty (uninitialized) value. Empty, like 0, represents the boolean value FALSE. So OCC[$4], on its first occurrence, is created empty and evaluates to FALSE. The "!" inverts that value, so !OCC[$4] is TRUE, and this branch is executed on the first occurrence. The ++ then post-increments the element, so OCC[$4] is non-zero (TRUE) and !OCC[$4] is FALSE for every later line with the same field 4.
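
A small sketch of that behaviour, using a hypothetical function label_occurrences and toy input of my own rather than the thread's data: for each input line it reports whether the first field has been seen before, using the same !array[key]++ test.

```shell
#!/bin/bash
# Hypothetical helper: label each input line as a first occurrence or a repeat.
label_occurrences() {
    # !seen[$1]++ is true only while seen[$1] is still empty; the ++ then makes it non-zero.
    awk '{ first = !seen[$1]++; print $1, (first ? "first" : "repeat"), seen[$1] }'
}

printf '%s\n' a b a c b a | label_occurrences
# a first 1
# b first 1
# a repeat 2
# c first 1
# b repeat 2
# a repeat 3
```

The third column is the value of seen[$1] after the post-increment, which shows why the test can only succeed once per key.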
# 21  
Thank you. I also have another question. For the input file (calculate_contacts_HS_*.pdb.txt), I want to sum the numerical values in the third column over all lines that have ALA in the first or second column. In the input file given below, three lines contain ALA in the 2nd column (it can also be in the 1st), and the sum of their 3rd column would be: 1.32859+6.90889+0.0962809=8.3337609. An example of the input file:

Code:
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2643>R<ALA>A<N> 1.32859 4.26827 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2644>R<ALA>A<CA> 6.90889 4.0211 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2646>R<ALA>A<O> 0.0962809 4.95892 . .

and the script is this:

Code:
#!/bin/bash
read -p "amino acid: " AAA
if [[ "ALA ARG ASN ASP CYS GLN GLY GLU HIS ILE \
	   LEU LYS MET PHE PRO SER THR TRP TYR VAL" =~ $AAA ]]
then
    for i in calculate_contacts_HS_*.pdb.txt
    do
        awk '{ if ($0 ~ /$AAA/) sum += $3} END {print sum}' $i
    done
else
    exit 1
fi

but this script does not print any value. When I run it from the command line, the output is empty.

In my script, AAA is assigned one of the 20 amino acid codes (ALA, ASN, ARG, ASP, CYS, GLU, GLN, LEU, ILE, LYS, MET, etc.), so $AAA can take different values; for this case let's assume that $AAA = ALA.

I look forward to your responses.

Sincerely,
Aurimas

Last edited by Aurimas; 4 Days Ago at 07:40 AM..
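
A likely cause, offered as a sketch rather than a definitive diagnosis: inside the single-quoted awk program the shell never expands $AAA, so awk sees the regex /$AAA/, an end-of-line anchor followed by the literal text AAA, which cannot match. sum then stays uninitialized and print sum emits an empty line. One possible fix passes the value with -v, as in the earlier posts. The function name sum_contacts, the temporary file, and the choice to look for the literal string R<ALA> in columns 1 and 2 are my assumptions.

```shell
#!/bin/bash
# Hypothetical helper: sum column 3 over lines whose column 1 or 2 contains R<code>.
sum_contacts() {
    # $1 = residue code (e.g. ALA), $2 = input file
    # index() does a literal substring search, so < and > need no escaping.
    awk -v SRCH="R<$1>" '
        index($1, SRCH) || index($2, SRCH) { sum += $3 }
        END { printf "%.7f\n", sum }      # printf avoids the 6-digit rounding of plain print
    ' "$2"
}

# The three sample lines from the post above.
contacts=$(mktemp)
trap 'rm -f "$contacts"' EXIT
cat > "$contacts" <<'EOF'
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2643>R<ALA>A<N> 1.32859 4.26827 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2644>R<ALA>A<CA> 6.90889 4.0211 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2646>R<ALA>A<O> 0.0962809 4.95892 . .
EOF

sum_contacts ALA "$contacts"   # prints 8.3337609
```

In the original loop this would be called as sum_contacts "$AAA" "$i" for each file.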