Diamond operator in Until Statement (perl)


 
# 1
12-11-2008

Hello:
I have the following Perl script, which is giving me trouble inside the second elsif statement. The purpose of the script is to go through a file and print out only those lines which contain pertinent information. The tricky part came when I realized that certain items actually span two or three lines (for example, the "FUNCTION:" information in the sample file).

The elsif condition is intended to recognize the beginning of the desired information (which it does), and the until condition is intended to recognize the end of it. Strangely, the script appears to loop through the until construct the appropriate number of times, but it prints the initial line every time instead of each successive line. I suspect I'm using the diamond operator incorrectly.

Here is a sample of the file:
Code:
ID   MYPR_HUMAN              Reviewed;         277 AA.
AC   P60201; P04400; P06905; Q502Y1;
DT   01-JAN-1988, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   25-NOV-2008, entry version 62.
DE   RecName: Full=Myelin proteolipid protein;
DE            Short=PLP;
DE   AltName: Full=Lipophilin;
GN   Name=PLP1; Synonyms=PLP;
OS   Homo sapiens (Human).
CC   -!- FUNCTION: Involved in the transport of proteins between the
CC       endosomes and the trans Golgi network (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor; Cytoplasmic
CC       side (Potential).
CC   -!- TISSUE SPECIFICITY: Ubiquitous.
CC   -!- SIMILARITY: Belongs to the small GTPase superfamily. Rab family.

Code:
#!/usr/local/bin/perl

use strict;

my @files;
my $file;
my $IN;

# Create array of files in current directory
my @files = `ls Batch*`;

foreach $file (@files) {
    open ($IN, $file) or die $!;
    my $i = 0;
    while (<$IN>) {
        if (/^(ID   )([A-Za-z0-9_]*)/) {
            print "GENE: $2\n";
        }
        elsif (/(DE   RecName: Full=)(.*);/) {
            print "NAME: $2\n";
        } 
        elsif ((/^(CC   -!- )(FUNCTION:.*)/) || (/(CC   -!- )(TISSUE SPECIFICITY:.*)/)) {
            print $2;
            until (<$IN> =~ /(-!-)/) {
                print $_;
            }
            print "\n";
        }
    }
    close $IN;
}

For what it's worth, I can get it working perfectly using the code below inside the elsif. I'm just curious why my initial attempt fails. (Note: this second example has a regEx inside the until construct, but that shouldn't make a difference.)

Code:
            print $2;
            my $nxtLine = <$IN>;
            until ($nxtLine =~ /(-!-)/) {
                $nxtLine =~ /CC *(.*)/;
                print $1;
                $nxtLine = <$IN>;
            }
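A caveat on the working version: if a file ends before another -!- line, <$IN> returns undef, the match never succeeds, and the until loop never exits. Also, the -!- line that ends the block is read inside the loop and discarded, so the outer while never sees it. In the sample that line is SUBCELLULAR LOCATION or SIMILARITY, which aren't wanted, but a FUNCTION block followed directly by TISSUE SPECIFICITY would lose the second field. A minimal sketch of the same read-ahead idea that guards against both, assuming a hypothetical helper named read_block and an in-memory string standing in for a Batch file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads the continuation lines that follow a
# "CC   -!- TOPIC:" line.
#   $IN   - an open filehandle positioned just after the topic line
#   $text - the text already captured from the topic line
# Returns the joined text and the line that ended the block (undef at
# end of file), so the caller can still process that line.
sub read_block {
    my ($IN, $text) = @_;
    my $next;
    while (defined($next = <$IN>)) {             # stops at end of file
        last if $next =~ /-!-/;                  # next topic begins here
        $text .= " $1" if $next =~ /^CC\s+(.*)/; # drop the "CC" prefix
    }
    return ($text, $next);
}

# Usage against an in-memory stand-in for part of a Batch file.
my $data = "CC       endosomes and the trans Golgi network (By similarity).\n"
         . "CC   -!- TISSUE SPECIFICITY: Ubiquitous.\n";
open(my $IN, '<', \$data) or die "Cannot open in-memory file: $!";
my ($text, $pending) = read_block($IN,
    'FUNCTION: Involved in the transport of proteins between the');
print "$text\n";
print "Line handed back: $pending" if defined $pending;
close $IN;
```

The caller still has to act on the returned line (for example, by testing it against the same patterns before reading the next one), which is the price of reading ahead.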

# 2  
12-11-2008
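Well, I think I figured it out. Essentially, the diamond operator just grabs the next line of input and returns it; it only assigns that line to $_ automatically when it is the entire condition of a while loop, as in while (<$IN>). In my until condition the new line was matched and then thrown away, so $_ still held the original line. I was treating it as though reading a line also made it the current line.
Nevertheless, any critiques of my code or suggestions for alternative methods are certainly still welcome.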
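The behaviour described in this post can be reproduced in a few lines. This is a minimal sketch, assuming an in-memory string in place of a Batch file; broken and fixed are hypothetical names. The first sub mirrors the original until (<$IN> =~ /(-!-)/) loop. The second assigns each line to $_ explicitly and checks defined, so it also stops at end of file. Like the original, both consume the -!- line that ends the block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-in for part of one Batch file.
my $sample = "CC   -!- FUNCTION: Involved in the transport of proteins between the\n"
           . "CC       endosomes and the trans Golgi network (By similarity).\n"
           . "CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor; Cytoplasmic\n";

# Mirrors the original loop: <$IN> in the condition reads a line, matches it,
# and throws it away. $_ is never updated, so the body sees the first line.
sub broken {
    open(my $IN, '<', \$sample) or die "Cannot open: $!";
    local $_ = <$IN>;                 # the FUNCTION: line
    my @seen;
    until (<$IN> =~ /-!-/) {
        push @seen, $_;
    }
    return @seen;
}

# Stores each line in $_ before matching, and stops at end of file.
sub fixed {
    open(my $IN, '<', \$sample) or die "Cannot open: $!";
    local $_ = <$IN>;
    my @seen;
    while (defined($_ = <$IN>) && !/-!-/) {
        push @seen, $_;
    }
    return @seen;
}

print "broken: $_" for broken();
print "fixed:  $_" for fixed();
```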
# 3  
12-12-2008
Another approach:

Code:
perl -ne'BEGIN { $Sep = "=" x 65 }
  /^ID   (\w+)/ and $Gene = $1;
  /^DE   RecName: Full=([^;]*)/ and $Name = $1;
  if (/^CC   -!- FUNCTION:/../CC   -!- SUBCELLULAR/) {
    $Func .= $1."\n   " if
      !/SUBCELLULAR/ && (/(FUNCTION:.*)/ || /CC(.*)/)
  }
  if (/^CC   -!- TISSUE SPECIFICITY:/../^CC   -!- SIMILARITY:/) {
    $Spec .= $1."\n   " if
      !/CC   -!- SIMILARITY:/ && (/(TISSUE SPECIFICITY:.*)/ || /CC(.*)/)
  }
  printf "$Sep\n\nGENE: %s\n\nNAME: %s\n\n%s\n%s\n\n", $Gene, $Name,
    $Func, $Spec and ($Func, $Spec) = undef if eof
  ' Batch*

# 4  
12-12-2008
radoulov: Thanks for the reply. I can tell you spent some time on that.
Unfortunately, I'm pretty new to Perl and I've never used the command-line option (which I think this is), so it might take me a while to decipher what's going on.

If you don't mind, I have a couple questions to start with:

1) How does this line of code work:
Code:
/^ID   (\w+)/ and $Gene = $1;

I think it's saying: if the regEx finds a match, assign what's in the parentheses to $Gene.

If that's it, it's very similar to what I was doing here:
Code:
if (/^(ID   )([A-Za-z0-9_]*)/) {
            print "GENE: $2\n";
        }

I just don't understand how you are able to make a conditional statement without an 'if' construct. Can you please elaborate on how this works?

2) My second question is about this line:
Code:
if (/^CC   -!- FUNCTION:/../CC   -!- SUBCELLULAR/) {

It seems like this regEx contains four slashes. I've never seen one like that. I assume this is used to get the text in between FUNCTION and SUBCELLULAR, but again, I just don't understand how it works.

Thanks again for your reply, and thanks in advance for your help with these questions. I know that I'm going to learn a lot from this exercise and I can tell that your approach is very efficient.
# 5  
12-12-2008
Quote:
Originally Posted by erichpowell
If you don't mind, I have a couple questions to start with:

1) How does this line of code work:
Code:
/^ID   (\w+)/ and $Gene = $1;

I think it's saying: if the regEx finds a match, assign what's in the parenthesis to $Gene.
[...]
I just don't understand how you are able to make a conditional statement without an 'if' construct. Can you please elaborate on how this works?
Sure.
The && and || operators (and their low-precedence forms, and and or) are also called "short-circuit" operators.
From Perl Idioms Explained - && and || "Short Circuit" operators (which in turn quotes the Camel Book):

Quote:
Camel II tells us that the term "Short-Circuit"
refers to the fact that they "determine the truth of the statement
by evaluating the fewest number of operands possible."
So essentially these operators stop the chain of evaluation
of an expression as early as possible.
That is another key to their value.
So the second expression (setting $Gene to $1) is evaluated only if the first one returns true (that is, only if there is a match).
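The same idea in script form: a small sketch with three hypothetical subs that behave identically, using an explicit if block, the and short-circuit, and a statement modifier. The low-precedence and is what makes the one-liner form work; && binds more tightly than =, so it would not parse as intended there. The open ($IN, $file) or die $! line in the original script relies on the same short-circuit behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each sub returns the gene ID from an "ID" line, or undef for other lines.

sub gene_if {                       # explicit if block
    my ($line) = @_;
    my $gene;
    if ($line =~ /^ID   (\w+)/) { $gene = $1 }
    return $gene;
}

sub gene_and {                      # right side runs only when the match succeeds
    my ($line) = @_;
    my $gene;
    $line =~ /^ID   (\w+)/ and $gene = $1;
    return $gene;
}

sub gene_modifier {                 # statement modifier, same effect
    my ($line) = @_;
    my $gene;
    $gene = $1 if $line =~ /^ID   (\w+)/;
    return $gene;
}

for my $line ('ID   MYPR_HUMAN              Reviewed;         277 AA.',
              'AC   P60201; P04400; P06905; Q502Y1;') {
    my $g = gene_and($line);
    print defined $g ? "GENE: $g\n" : "no ID on this line\n";
}
```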

Quote:
Code:
if (/^CC   -!- FUNCTION:/../CC   -!- SUBCELLULAR/) {

It seems like this regEx contains four slashes. I've never seen one like that. I assume this is used to get the text in between FUNCTION and SUBCELLULAR, but again, I just don't understand how it works.
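Right, it's simply the range operator (..), joining two separate regexes. In scalar context, such as an if condition, it acts as a flip-flop: it is false until the left-hand pattern matches, then stays true up to and including the line where the right-hand pattern matches: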

Code:
$ printf '
junk
start
yes
yes
end
junk
' | perl -ne 'print if /start/../end/'
start
yes
yes
end

(see perldoc perlop | less -p'range operator' for more).
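The flip-flop also works in a script. The value it returns is a sequence number (1, 2, 3, ...), and the final one has the string "E0" appended, which offers an alternative to re-testing the end pattern the way the !/SUBCELLULAR/ condition in the one-liner does. A minimal sketch on the same junk/start/end lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = qw(junk start yes yes end junk);

my (@inclusive, @without_end);
for (@lines) {
    # Scalar context, so .. is the flip-flop: "" outside the range,
    # 1, 2, 3, ... inside it, with "E0" appended on the closing line.
    my $seq = /start/ .. /end/;
    next unless $seq;
    push @inclusive, $_;                        # keeps start and end lines
    push @without_end, $_ unless $seq =~ /E0$/; # drops the end line
}

print "inclusive:   @inclusive\n";
print "without end: @without_end\n";
```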
# 6  
12-12-2008
Back to your original question (which you think you solved)...

Quote:
Originally Posted by erichpowell
Code:
            print $2;
            my $nxtLine = <$IN>;
            until ($nxtLine =~ /(-!-)/) {
                $nxtLine =~ /CC *(.*)/;
                print $1;
                $nxtLine = <$IN>;
            }

First, I recommend against using a scalar ($IN) as a filehandle. This makes it look like something is happening that isn't: namely, filehandles ARE NOT SCALAR VARIABLES in Perl; they are a class of variables in their own right. What's really happening, I think, is that $IN evaluates to undef, so that you have:
Code:
 while (<>) { 
...
 my $nxtLine = <>;

and so on.

As you noted, <> is an operator that reads a line from the filehandle between the angle brackets or, if none is given, from each file named on the command line in turn or, if there are none, from STDIN (standard input, that is, piped or redirected input).

So I think $IN is going to make things confusing. Just use IN here.
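For comparison, since Perl 5.6 open accepts an undefined scalar as its first argument and stores a new filehandle in it, so in the original script $IN holds a handle for the opened Batch file rather than undef, and <$IN> reads from that file. Lexical filehandles like this are common in modern Perl because they are scoped like any other my variable and are closed automatically once the last reference to them goes away. A minimal sketch with a hypothetical count_topics sub; the in-memory string is only there so the example runs without a Batch file (a filename works the same way):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts "-!-" topic lines. $source may be a filename
# or, as below, a reference to a string holding the file contents.
sub count_topics {
    my ($source) = @_;
    open(my $IN, '<', $source) or die "Cannot open: $!";
    # $IN now holds a filehandle (ref($IN) is "GLOB"), not undef.
    my $count = 0;
    while (my $line = <$IN>) {       # Perl adds the defined() test here
        $count++ if $line =~ /-!-/;
    }
    return $count;                   # $IN is closed when it goes out of scope
}

my $text = "CC   -!- FUNCTION: Involved in the transport of proteins between the\n"
         . "CC       endosomes and the trans Golgi network (By similarity).\n"
         . "CC   -!- TISSUE SPECIFICITY: Ubiquitous.\n";

open(my $IN, '<', \$text) or die "Cannot open: $!";
print "ref(\$IN) = ", ref($IN), "\n";
close $IN;

print "topics: ", count_topics(\$text), "\n";
```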

Or, better yet, use the command line and the aforementioned feature:
Code:
$ your_perl_script.pl Batch*

And in your code:
Code:
while (<>) { 
...
 my $nxtLine = <>;
...
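Taking the while (<>) suggestion one step further: instead of reading ahead inside the loop, the script can remember whether it is inside a wanted CC -!- block and append continuation lines as they arrive, so no line is skipped. A minimal sketch, assuming the field layout of the sample file (three spaces between CC and -!-, continuation lines starting with CC followed by more whitespace); extract is a hypothetical name, and the @ARGV check only keeps the script from waiting on STDIN when run without arguments:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# CC topics whose text should be printed, including continuation lines.
my $WANTED = qr/^(?:FUNCTION|TISSUE SPECIFICITY):/;

# Turns the (chomped) lines of one or more entries into output lines.
sub extract {
    my @out;
    my $collecting = 0;              # true while inside a wanted CC block
    for (@_) {
        if (/^CC   -!- (.*)/) {      # a new CC topic starts here
            my $topic = $1;
            $collecting = $topic =~ $WANTED;
            push @out, $topic if $collecting;
        }
        elsif ($collecting && /^CC\s+(.*)/) {
            $out[-1] .= " $1";       # continuation of the current topic
        }
        else {
            $collecting = 0;
            push @out, "GENE: $1" if /^ID   (\w+)/;
            push @out, "NAME: $1" if /^DE   RecName: Full=([^;]*)/;
        }
    }
    return @out;
}

# Usage: your_perl_script.pl Batch*
if (@ARGV) {
    chomp(my @lines = <>);           # <> reads every file named in @ARGV
    print "$_\n" for extract(@lines);
}
```

Because each line is looked at exactly once, the line that ends a FUNCTION block is still available to start the next topic.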
