Perl issue - please help!


 
# 1  
Old 05-20-2009

Hello. I've been writing some Perl code to read strings from HTML files and have been having issues. In the HTML file, each "paragraph" describes one file on the website. I need to find every file of a certain type, in this case the ones shown in green, which means the paragraph has bgcolor="#ddffff". My problem is that when I search for that string, I only get back the line it is on. I need my code to return the entire paragraph, because the string I actually need is inside each paragraph that contains #ddffff, usually approx. 7 lines below. Example:

Code:
</tr>
<tr bgcolor="#ddffff"><td><a target=_top href=http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=34305><font color="green" size=-1>Lotus japonicus</font></a></td>
<td><font size=-1>&nbsp;</font></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=15617><font size=-1>NC_002694</font></a></td>
<td><font size=-1>150519&nbsp;nt&nbsp;</font></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Protein+Table&list_uids=15617><font size=-1>82</font></a></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Structural+RNA+Table&list_uids=15617><font size=-1>45</font></a></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=Search&TermToSearch=NC_002694[accn]><font size=-1>128</font></a></td>
<td><font size=-1>Mar 1 2001</font></td>
<td><font size=-1>Jan 30 2008</font></td>


This is one of the "paragraphs" I need, because it does have bgcolor="#ddffff". From this paragraph, I then need to extract and print the NC_ number in the middle of it (here, NC_002694). How do I do this when matching on "#ddffff" only returns the line that the text is on? Any help would be great!

Last edited by Neo; 05-20-2009 at 07:24 PM.. Reason: code tags
# 2  
Old 05-20-2009
Algorithm

Well, one possible workaround is to read the first paragraph, save it in an array, and then grep that array.

If the grep returns nothing, refill the array with the next paragraph and repeat the grep.

Doing this in a for loop that goes through all the lines seems feasible.

Just a thought.
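
The buffer-then-grep idea above can be sketched as follows. This is only a minimal sketch under stated assumptions: it assumes every paragraph starts on a line containing <tr, and the subroutine name nc_ids_from_lines and the two-row sample are made up for illustration, not taken from the real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: buffer the lines of one paragraph in an array,
# then grep that array. Assumes each paragraph starts at a line with <tr.
sub nc_ids_from_lines {
    my ($color, @lines) = @_;
    my (@ids, @paragraph);

    # Inspect the buffered paragraph and record its first NC_ number.
    my $flush = sub {
        return unless @paragraph;
        return unless grep { /bgcolor="?\Q$color\E/i } @paragraph;
        for my $buffered (@paragraph) {
            if ($buffered =~ /\b(NC_\d+)\b/) {
                push @ids, $1;
                last;
            }
        }
    };

    for my $line (@lines) {
        if ($line =~ /<tr\b/i) {    # a new paragraph begins
            $flush->();
            @paragraph = ();
        }
        push @paragraph, $line;
    }
    $flush->();                     # do not forget the last paragraph
    return @ids;
}

# Made-up sample in the same shape as the page in post #1.
my @sample = (
    qq{<tr bgcolor="#ddffff"><td>Lotus japonicus</td>\n},
    qq{<td><a href=x?list_uids=15617>NC_002694</a></td>\n},
    qq{<tr bgcolor="#ffffff"><td>Other species</td>\n},
    qq{<td><a href=x?list_uids=1>NC_000001</a></td>\n},
);
print "$_\n" for nc_ids_from_lines('#ddffff', @sample);
```

To run it on the real file, the lines read from genomehtml2.txt could be passed in place of @sample.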
# 3  
Old 05-20-2009
Code:
#!/usr/bin/perl
my $data_file = 'genomehtml2.txt';
open DATA, "$data_file" or die "can't open $data_file $!";
my @array_of_lines = <DATA>;
foreach my $line (@array_of_lines)
{
    if ($line =~ m/#ddffff/i)
    {
        print "This line: $line\n";
    }
}
close(DATA);



This is what I have so far. It returns the first line of each paragraph that contains "#ddffff". I just don't know where to add the code that gets the NC numbers. I've also tried some code using grep:

Code:
#!/usr/bin/perl
my $data_file = 'genomehtml2.txt';
open DATA, "$data_file" or die "can't open $data_file $!";
my @array_of_lines = <DATA>;
my @grepColor = grep(/#ddffff/, @array_of_lines);
my @grepFiles = grep(/NC_/, @array_of_lines);

I'm less sure where to go with this one. Any coding ideas?
# 4  
Old 05-20-2009
I believe <DATA> only returns ONE line at a time when it is read in scalar context, for example in a while condition. In list context, as in my @array_of_lines = <DATA>, it returns all the remaining lines.

To work through the file line by line, put it in a loop...

Code:
my $state = 0;
my $keep_line;
while (<DATA>) {
    $keep_line = $_ if ($state && /NC_/);
    # Now do something with $keep_line to persist it...
    $state = 1 if /#ddffff/i;   # a matching paragraph starts here
    $state = 0 if /^\s*$/;      # a blank line ends the paragraph
}

Assuming blank lines between paragraphs, set a $state variable to 1 (some TRUE value) when you encounter a /#ddffff/ line. With the state set to TRUE, look for your NC_ pattern and save the line. At the next blank line, set the state back to zero so that the next paragraph will NOT be searched unless it also contains #ddffff. There's probably a more elegant way to do it, but this should get you started...
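
On the question of how many lines a filehandle read returns: that depends on the context it is evaluated in. Here is a minimal sketch, reading from an in-memory string so that no input file is needed; the subroutine name read_in_both_contexts is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the same handle once in scalar context and
# once in list context, and report what each read returned.
sub read_in_both_contexts {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
    my $first = <$fh>;    # scalar context: just the next line
    my @rest  = <$fh>;    # list context: every line that is left
    close($fh);
    return ($first, scalar @rest);
}

my ($first, $remaining) = read_in_both_contexts("first\nsecond\nthird\n");
print "scalar context read: $first";
print "list context read $remaining more lines\n";
```

A while (<DATA>) loop uses the scalar form, so it sees one line per iteration, whereas assigning to an array slurps everything at once.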
# 5  
Old 05-20-2009
Your html sample is pretty small, but see how this works:

Code:
my @NC = ();
my $data_file = 'genomehtml2.txt';
open (my $IN, $data_file) or die "can't open $data_file $!";
OUTTER: while(<$IN>){
   if(/<tr bgcolor="#ddffff">/){
      INNER: while(<$IN>) {
         if(/\b(NC_\d+)\b/){
            push @NC, $1;
            next OUTTER;
         }
      }
   }
}
print "$_\n" for @NC;
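
One way to read the nested loops above, as an annotated sketch rather than the poster's own explanation. The subroutine name first_nc_per_green_row and the in-memory sample are assumptions made so the example runs without genomehtml2.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same nested-loop idea, reading from any filehandle.
sub first_nc_per_green_row {
    my ($in) = @_;
    my @nc;
    ROW: while (my $line = <$in>) {
        # Outer loop: skip ahead until a green row starts.
        next ROW unless $line =~ /<tr bgcolor="#ddffff">/;

        # Inner loop: reads from the SAME handle, so it continues on the
        # line after the <tr ...> line instead of starting over.
        while (my $next = <$in>) {
            if ($next =~ /\b(NC_\d+)\b/) {
                push @nc, $1;    # $1 holds what the parentheses captured
                next ROW;        # leave the inner loop, find the next green row
            }
        }
    }
    return @nc;
}

# Made-up sample: two green rows with one non-green row between them.
my $html = <<'END';
<tr bgcolor="#ddffff"><td>Lotus japonicus</td>
<td>NC_002694</td>
<td>TermToSearch=NC_002694[accn]</td>
<tr bgcolor="#ffffff"><td>Other</td>
<td>NC_000001</td>
<tr bgcolor="#ddffff"><td>Third</td>
<td>NC_000002</td>
END

open(my $in, '<', \$html) or die "can't open in-memory sample: $!";
print "$_\n" for first_nc_per_green_row($in);
close($in);
```

One caveat of this design: if a green row contained no NC_ number at all, the inner loop would keep reading and pick up the first NC_ number from a later row, whatever its colour.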

# 6  
Old 05-20-2009
Hey, thanks guys. KevinADC, I just tried your code and it worked great, but could you explain what exactly you were thinking when you put it together? I just want to know as a learning experience. I understand most of it, but a full description would be great. Thanks!
# 7  
Old 05-20-2009
another way
Code:
my @NC = ();
my $f = 0;   # flag: set to 1 while inside a matching row
my $data_file = 'file';
open (my $IN, $data_file) or die "can't open $data_file $!";
while(<$IN>){
   if(/<tr bgcolor="#ddffff">/){ $f=1;}
   if ($f && /\b(NC_\d+)\b/){
     push @NC, $1;
     $f=0;   # only the first NC_ number per row is kept
   }
}
print "$_\n" for @NC;
