Duplication | awk | result


 
# 15  
06-03-2019
You could try replacing

Code:
cat $i | grep -o -i $AAA | wc -l | awk '{print $1}'

directly with
Code:
awk -F"[<>]" -v SRCH="$AAA" '$0 ~ SRCH && !OCC[$4]++ {CNT++} END {print CNT+0}' "$i"

This replaces the whole pipeline, INCLUDING the cat $i part: awk reads the file $i itself.
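To see what the replacement changes, here is a small sketch that runs both commands on the same data. It assumes the contact-file format that Aurimas posts later in this thread (three sample lines written to a temporary file); the names f, grep_count and awk_count are only for illustration. The grep pipeline counts every occurrence of ALA, whereas the awk version counts each distinct value of field 4 once; with -F"[<>]" that field is the number inside the first r<...> tag.

```shell
#!/bin/sh
# Sample lines in the format shown later in this thread (assumed layout).
f=$(mktemp)
cat > "$f" <<'EOF'
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2643>R<ALA>A<N> 1.32859 4.26827 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2644>R<ALA>A<CA> 6.90889 4.0211 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2646>R<ALA>A<O> 0.0962809 4.95892 . .
EOF
AAA=ALA

# Original pipeline: one count per occurrence of ALA.
grep_count=$(grep -o -i "$AAA" "$f" | wc -l | awk '{print $1}')

# Replacement: splitting on < and > makes $4 the number in the first r<...>
# tag ("3" on every line here), so each distinct value is counted once.
awk_count=$(awk -F'[<>]' -v SRCH="$AAA" '$0 ~ SRCH && !OCC[$4]++ {CNT++} END {print CNT+0}' "$f")

echo "grep pipeline: $grep_count  awk: $awk_count"
rm -f "$f"
```

On these lines the pipeline reports 3 while awk reports 1, because all three lines share r<3>.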
# 16  
06-03-2019
Thanks. Can you also explain to me what
Code:
-v SRCH="$AAA" '$0 ~ SRCH && !OCC[$4]++ {CNT++ } END {print CNT+0}'

does, especially !OCC and CNT? I cannot figure it out :/
# 17  
06-03-2019
Code:
-v SRCH="$AAA"       # the standard way to pass a shell variable ($AAA) into an awk variable (SRCH)
'$0 ~ SRCH           # if the entire line / record matches the regex held in SRCH
&&                   # AND
!OCC[$4]++           # the array element OCC(urrences) indexed by field 4 (the r<...> value) is not yet defined (note that it is post-incremented, and thus defined from then on)
{CNT++}              # increment the CNT (count) variable
END {print CNT+0}'   # print the count; if nothing matched, CNT was never set, and +0 turns it into the number 0
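The -v handling and the CNT+0 trick can be tried in isolation. In this sketch (the sample line and the variable names no_v, with_v, bare and plus_zero are made up for illustration), a shell variable written directly inside the single-quoted awk program is not expanded by the shell, while -v copies its value into an awk variable. It also shows that a counter that was never incremented prints as an empty string unless +0 turns it into a number.

```shell
#!/bin/sh
AAA=ALA
line='c<B>r<56>a<2643>R<ALA>A<N> 1.32859 4.26827 . .'

# Inside single quotes the shell does not expand $AAA, so awk receives the
# regex /$AAA/ ("end of line followed by AAA"), which cannot match here.
no_v=$(printf '%s\n' "$line" | awk '$0 ~ /$AAA/ {CNT++} END {print CNT+0}')

# -v copies the shell value into the awk variable SRCH before the program runs.
with_v=$(printf '%s\n' "$line" | awk -v SRCH="$AAA" '$0 ~ SRCH {CNT++} END {print CNT+0}')

# No line matches GLY, so CNT is never set: bare print gives "", +0 gives 0.
bare=$(printf '%s\n' "$line" | awk -v SRCH=GLY '$0 ~ SRCH {CNT++} END {print CNT}')
plus_zero=$(printf '%s\n' "$line" | awk -v SRCH=GLY '$0 ~ SRCH {CNT++} END {print CNT+0}')

echo "without -v: $no_v  with -v: $with_v"
echo "unset CNT: '$bare'  CNT+0: '$plus_zero'"
```

It should print 0 without -v and 1 with it, then an empty string for the bare CNT and 0 for CNT+0.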

# 18  
06-06-2019
Thanks
# 19  
06-12-2019
Dear RudiC,

Can you please explain this part
Code:
 !OCC[$4]++

a bit more? I don't understand what you mean when you say that it is not yet defined.

Sincerely,
Aurimas
# 20  
06-12-2019
An array element, like any variable in awk, is created on first reference as an empty value. Empty, like 0, represents the boolean value FALSE. So on the first occurrence of a given $4 value, OCC[$4] is created, empty, and FALSE. The "!" inverts that, so !OCC[$4] is TRUE and this branch is executed for that first occurrence. Then ++ post-increments the element to 1, so on every later line with the same $4 value OCC[$4] is nonzero and !OCC[$4] is FALSE.
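A short trace makes the first-occurrence logic visible. This sketch feeds a made-up list of r values through the same expression (using $1 instead of $4, since each input line holds just one value) and prints the element before and after each test; trace and firsts are illustrative variable names.

```shell
#!/bin/sh
# Trace each step of the idiom on a made-up list of r values.
trace=$(printf '%s\n' 3 3 56 3 56 7 | awk '{
  before = OCC[$1]        # first reference creates the element, still empty
  first  = !OCC[$1]++     # test the old value, then add 1 to it
  printf "%s before=\"%s\" first=%d after=%d\n", $1, before, first, OCC[$1]
}')
printf '%s\n' "$trace"

# Used as a pattern on its own, the idiom keeps only first occurrences.
firsts=$(printf '%s\n' 3 3 56 3 56 7 | awk '!OCC[$1]++')
printf '%s\n' "$firsts"
```

Only the first 3, the first 56 and the 7 get first=1, so the second command prints 3, 56 and 7. The same !array[key]++ pattern is a common awk idiom for removing duplicate lines.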
# 21  
06-13-2019
Thank you. I also have another question. For each input file (calculate_contacts_HS_*.pdb.txt), I want to sum the numeric values in the third column of all lines that have ALA in the first or second column. In the sample input below, 3 lines contain ALA in the 2nd column (but it can also be in the 1st), so the sum of the 3rd column would be 1.32859 + 6.90889 + 0.0962809 = 8.3337609. The sample input file looks like this:

Code:
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2643>R<ALA>A<N> 1.32859 4.26827 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2644>R<ALA>A<CA> 6.90889 4.0211 . .
c<B>r<3>a<2300>R<LEU>A<CD1> c<B>r<56>a<2646>R<ALA>A<O> 0.0962809 4.95892 . .

and the script is this:

Code:
#!/bin/bash
read -p "amino acid: " AAA
if [[ "ALA ARG ASN ASP CYS GLN GLY GLU HIS ILE \
	   LEU LYS MET PHE PRO SER THR TRP TYR VAL" =~ $AAA ]]
then 
	for i in calculate_contacts_HS_*.pdb.txt; 
		do
			awk '{ if ($0 ~ /$AAA/) sum += $3} END {print sum}' $i
        done
else
	exit 1
fi

But this script doesn't give me any value: when I run it from the command line, the output is empty.

In my script, AAA is set to one of the 20 amino-acid codes (ALA, ASN, ARG, ASP, CYS, GLU, GLN, LEU, ILE, LYS, MET, etc.), so $AAA can take different values, but for this example let's assume that $AAA = ALA.

I look forward to your responses.

Sincerely,
Aurimas

Last edited by Aurimas; 06-13-2019 at 07:40 AM.
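One possible way to get the per-file sums is sketched below; it is not from the thread, and the function name sum_contacts is hypothetical. Two things in the posted script are worth noting. Inside the single-quoted awk program the shell never expands $AAA, so awk looks for the regex /$AAA/ (end of line followed by AAA), which cannot match anywhere in a line; sum stays unset and print outputs an empty line. And [[ "ALA ARG ..." =~ $AAA ]] is a regex substring test, so partial input such as AL also passes. The sketch passes the code with -v, checks it against the 20 codes with case, and treats "column" as the whitespace-separated fields $1, $2 and $3, matching the literal tag R<ALA> with index().

```shell
#!/bin/bash
# sum_contacts AA FILE...   (hypothetical helper name)
# For each FILE, sum column 3 over the lines whose 1st or 2nd column contains
# the residue tag R<AA>, and print "FILE SUM".
sum_contacts() {
  local aa=$1 f
  shift
  # Exact match against the 20 codes; unlike [[ list =~ $AAA ]], this
  # rejects partial input such as "AL".
  case "$aa" in
    ALA|ARG|ASN|ASP|CYS|GLN|GLY|GLU|HIS|ILE|LEU|LYS|MET|PHE|PRO|SER|THR|TRP|TYR|VAL) ;;
    *) echo "unknown amino acid: $aa" >&2; return 1 ;;
  esac
  for f in "$@"; do
    printf '%s ' "$f"
    # The shell value reaches awk through -v, not through the quoted program.
    awk -v aa="$aa" '
      BEGIN { tag = "R<" aa ">" }
      index($1, tag) || index($2, tag) { sum += $3 }
      END { printf "%.7g\n", sum + 0 }
    ' "$f"
  done
}
```

For example, after read -r -p "amino acid: " AAA, calling sum_contacts "$AAA" calculate_contacts_HS_*.pdb.txt prints one "file sum" line per file; for the three sample lines above with ALA the sum comes out as 8.333761. The %.7g format is used because a plain print applies awk's default OFMT of %.6g and would show 8.33376.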


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Unexpected result from awk

Hello, Giving those commands: cat > myfile 1 2 3 ^D cat myfile | awk '{ s=s+$1 ; print s}' The output is: 1 3 6 It seems like this command iterates each time on a different row so $1 is the first field of each row.. But what caused it to refer to each row ?. What I mean... (3 Replies)
Discussion started by: uniran
3 Replies

2. UNIX for Beginners Questions & Answers

Line duplication with awk?!

So while this seemed totally trivial it turned out to be much more difficult than I had thought. I have a file with 3 rows, and I "just" want to add each field n number of times. E.g. > cat file.txt 0.5 -0.1 0.6 for n=3 into: cat newfile.txt 0.5 0.5 0.5 -0.1 -0.1 -0.1 0.6 0.6 0.6 I... (4 Replies)
Discussion started by: Glorp
4 Replies

3. Linux

De-Duplication Problem

Hi all, I download and install lessfs for deduplication, I copy files in /SharedFiles directory and lessfs work right and not store again copy files, but, when i delete all files in /SharedFiles , not return free space to total space, files not show in /SharedFiles , but not copy new files in... (3 Replies)
Discussion started by: saeedha
3 Replies

4. Programming

Table Duplication in PHP

Hey, I am making a Facebook like Page system as my first project, So far it's been bate in mind I did it from my 3DS at the same time as my PC gets replaced, So far it's turned out great. Now I am on to creation the blocking system I need to get the code to say If the user already likes the... (0 Replies)
Discussion started by: AimyThomas
0 Replies

5. UNIX for Advanced & Expert Users

File Descriptor redirection and duplication

i have many questions concerning the FD. it was stated that "to redirect Error to output std, you have to write the following code" # ls -alt FileNotThere File > logfile 2>&1 # cat logfile ls: cannot access FileNotThere: No such file or directory -rw-r--r-- 1 root root 0 2010-02-26... (9 Replies)
Discussion started by: ahmad.zuhd
9 Replies

6. Shell Programming and Scripting

How to avoid duplication within 2 files?

Hi all, Actually 2 files are there - file1, file2. file1 contains ---> london mosco america russia mosco file2 contains --> europe india japan mosco england london Question is I want to print all the city names without duplication cities in those... (10 Replies)
Discussion started by: balan_mca
10 Replies

7. Shell Programming and Scripting

File Duplication Script?

I have a file, let's say 1.jpg, and I have a text file that contains a list of filenames I would like to duplicate 1.jpg as (i.e., 2.jpg, 3.jpg, 4.jpg, etc.). The filenames that I want to create are all on separate lines, one per line. I'm sure there's a simple solution, but I'm not claiming to... (7 Replies)
Discussion started by: futurestar
7 Replies

8. UNIX for Advanced & Expert Users

mount LVM duplication drives

Hi, I'm stuck in an awkward situation please help :) I have two identical Seagate 80GB harddrives. My objective is a bit strange. 1.I want to have a cloned disk as bootable backup 2.When booting using the master drive, I also want to mount the cloned backup disk so I can do incremental... (6 Replies)
Discussion started by: onthetopo
6 Replies

9. HP-UX

awk to output cmd result

I was wondering if it was possible to tell awk to print the output of a command in the print. .... | awk '{print $0}' I would like it to print the date right before $0, so something like (this doesn't work though) .... | awk '{print date $0}' (4 Replies)
Discussion started by: IMTheNachoMan
4 Replies

10. Windows & DOS: Issues & Discussions

File Duplication

hi all how to find the file duplication in a windows 2000 server as usual replies are sincerely appreciated. thanks raguram R (3 Replies)
Discussion started by: raguramtgr
3 Replies