#1 akreibich07, 05-20-2009
Perl issue - please help!

Hello. I've been writing some Perl code to read strings from HTML files and have run into a problem. In the HTML file, each "paragraph" describes one file on the website. I need to find every file of a certain type, in this case the ones shown in green, which are the ones with bgcolor="#ddffff". My code finds them, but it only returns the line that contains the match. I need the entire paragraph, because the string I want is in each paragraph that contains #ddffff, usually approx. 7 lines below the match. Example:

Code:
</tr>
<tr bgcolor="#ddffff"><td><a target=_top href=http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=34305><font color="green" size=-1>Lotus japonicus</font></a></td>
<td><font size=-1>&nbsp;</font></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=15617><font size=-1>NC_002694</font></a></td>
<td><font size=-1>150519&nbsp;nt&nbsp;</font></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Protein+Table&list_uids=15617><font size=-1>82</font></a></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Structural+RNA+Table&list_uids=15617><font size=-1>45</font></a></td>
<td><a target=_top href=http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=Search&TermToSearch=NC_002694[accn]><font size=-1>128</font></a></td>
<td><font size=-1>Mar 1 2001</font></td>
<td><font size=-1>Jan 30 2008</font></td>

This is one of the "paragraphs" I need, because it does have bgcolor="#ddffff". From this paragraph, I then need to extract and print the NC_'number' that is in the middle of it. How do I do this when matching "#ddffff" only returns the line that the text is on? Any help would be great!

Last edited by Neo; 05-20-2009 at 06:24 PM.. Reason: code tags

#2 doutdes, 05-20-2009
Algorithm

Well, one possible workaround is to read the first paragraph, save it in an array, and then run the grep on that array.

If the grep returns nothing, fill the array with the next paragraph and repeat the grep.

Doing this in a for loop that goes through all the lines seems feasible.

Just a thought.
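
A rough sketch of that idea, in case it helps. This is only one way to read the suggestion: it assumes every "paragraph" begins on a line containing `<tr`, and the helper name `accessions_by_paragraph` and the sample rows are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the first NC_ accession of every paragraph
# that contains the given marker. Assumes a paragraph starts at a line
# containing "<tr".
sub accessions_by_paragraph {
    my ($marker, @lines) = @_;
    my (@found, @paragraph);

    # Greps the buffered paragraph for the marker; if it is there, records
    # the first NC_ accession found in that paragraph. Then empties the buffer.
    my $flush = sub {
        if (grep { /\Q$marker\E/i } @paragraph) {
            for my $line (@paragraph) {
                if ($line =~ /\b(NC_\d+)\b/) {
                    push @found, $1;
                    last;
                }
            }
        }
        @paragraph = ();
    };

    for my $line (@lines) {
        $flush->() if $line =~ /<tr\b/i;   # a new row begins: test the old one
        push @paragraph, $line;
    }
    $flush->();                            # test the final paragraph too
    return @found;
}

# Made-up sample data shaped like the HTML in the first post.
my @sample = (
    qq{<tr bgcolor="#ddffff"><td>Lotus japonicus</td>\n},
    qq{<td><a href=x>NC_002694</a></td>\n},
    qq{</tr>\n},
    qq{<tr bgcolor="#ffffff"><td>Other organism</td>\n},
    qq{<td><a href=x>NC_000001</a></td>\n},
    qq{</tr>\n},
);

print "$_\n" for accessions_by_paragraph('#ddffff', @sample);
```

With the real file, the lines would come from the filehandle instead of `@sample`.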

#3 akreibich07, 05-20-2009
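Code:
#!/usr/bin/perl
use strict;
use warnings;

my $data_file = 'genomehtml2.txt';
open(DATA, '<', $data_file) or die "can't open $data_file: $!";
my @array_of_lines = <DATA>;    # list context: reads every line at once
close(DATA);

foreach my $line (@array_of_lines) {
    # Prints only the single line that contains the color, not the paragraph.
    if ($line =~ m/#ddffff/i) {
        print "This line: $line\n";
    }
}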



This is what I have so far. It returns the first line of each paragraph that has "#ddffff", but I don't know where to put the code to get the NC numbers. I've also tried some code using grep:

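Code:
#!/usr/bin/perl
use strict;
use warnings;
my $data_file = 'genomehtml2.txt';
open(DATA, '<', $data_file) or die "can't open $data_file: $!";
my @array_of_lines = <DATA>;
close(DATA);
# Two independent filters: nothing here ties an NC_ line to its #ddffff paragraph.
my @grepColor = grep(/#ddffff/, @array_of_lines);
my @grepFiles = grep(/NC_/, @array_of_lines);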

I'm even less sure where to go with this one. Any coding ideas?

#4 quine (Bay Area California), 05-20-2009
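In scalar context, <DATA> returns only ONE line at a time (in list context, as in my @array_of_lines = <DATA>, it returns all the lines at once)...

To remember which paragraph you are in, read it in a loop...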

Code:
my $state = 0;      # 1 while inside a paragraph that contains #ddffff
my $keep_line;
while (<DATA>) {
    $state = 1 if /#ddffff/i;
    $state = 0 if /^\s*$/;      # a blank line ends the paragraph
    if ($state && /NC_/) {
        $keep_line = $_;
        # Now do something with $keep_line to persist it...
    }
}

Assuming blank lines between paragraphs, set a $state variable to 1 (some TRUE value) when you encounter a /#ddffff/ line. With the state set to TRUE, look for your NC_ pattern and save the line. At the next blank line, set the state back to zero so that the next paragraph will NOT be searched unless it also contains #ddffff. There's probably a more elegant way to do it, but this should get you started...
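
One possibly tidier variation, offered only as a sketch: instead of tracking state line by line, change Perl's input record separator `$/` so that each read returns a whole row. This swaps in a different technique from the state variable above, and it assumes every row ends with `</tr>`, as the sample in the first post suggests. The helper name `accessions_by_record` and the sample data are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads from an open filehandle one table row at a
# time and returns the first NC_ accession of each row containing "#ddffff".
sub accessions_by_record {
    my ($fh) = @_;
    my @found;
    local $/ = "</tr>";          # a "line" now ends at </tr>, not at \n
    while (my $row = <$fh>) {
        next unless $row =~ /#ddffff/i;
        push @found, $1 if $row =~ /\b(NC_\d+)\b/;
    }
    return @found;
}

# Made-up sample, read through an in-memory filehandle for the demo.
my $html = <<'END_HTML';
<tr bgcolor="#ddffff"><td>Lotus japonicus</td>
<td><a href=x>NC_002694</a></td>
</tr>
<tr bgcolor="#ffffff"><td>Other organism</td>
<td><a href=x>NC_000001</a></td>
</tr>
END_HTML

open(my $fh, '<', \$html) or die "can't open in-memory file: $!";
print "$_\n" for accessions_by_record($fh);
close($fh);
```

Because `$/` is set with `local` inside the sub, the normal line-by-line behavior is restored as soon as the sub returns.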

#5 KevinADC (Forum Advisor), 05-20-2009
Your html sample is pretty small, but see how this works:

Code:
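my @NC = ();
my $data_file = 'genomehtml2.txt';
open (my $IN, '<', $data_file) or die "can't open $data_file: $!";
OUTTER: while(<$IN>){
   if(/<tr bgcolor="#ddffff">/){          # start of a matching row
      INNER: while(<$IN>) {               # keep reading from the same handle
         if(/\b(NC_\d+)\b/){
            push @NC, $1;                 # save the accession number
            next OUTTER;                  # go back to looking for the next row
         }
      }
   }
}
close($IN);
print "$_\n" for @NC;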

#6 akreibich07, 05-20-2009
Hey thanks guys. KevinADC, I just tried your code and it worked great, but could you put up what exactly you were thinking when you put it together. Just wanted to know as a learning experience. I understand a large majority of it, but a full description would be great. Thanks!

#7 KevinADC (Forum Advisor), 05-21-2009
Quote:
Originally Posted by akreibich07 View Post
Hey thanks guys. KevinADC, I just tried your code and it worked great, but could you put up what exactly you were thinking when you put it together. Just wanted to know as a learning experience. I understand a large majority of it, but a full description would be great. Thanks!
The code I posted is very simple if you know what a label is (OUTTER and INNER in the code I posted). If the code finds the first pattern, it enters the inner "while" and searches each line until it finds the second pattern. When it does, it pushes the match into the array and then starts again in the outer loop (next OUTTER).

It's very similar to what ghostdog posted using a binary flag ($f), but the way I did it is "flagless".
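
To make the comparison concrete, here is a small side-by-side sketch of the two styles. It is an illustration only: the flag version is a guess at what a `$f`-style loop could look like, not the code from any post, and the sub names, the label name `OUTER`, and the sample data are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample shaped like the HTML in the first post.
my $html = <<'END_HTML';
<tr bgcolor="#ddffff"><td>Lotus japonicus</td>
<td><a href=x>NC_002694</a></td>
</tr>
<tr bgcolor="#ffffff"><td>Other organism</td>
<td><a href=x>NC_000001</a></td>
</tr>
<tr bgcolor="#ddffff"><td>Third organism</td>
<td><a href=x>NC_000002</a></td>
</tr>
END_HTML

# Flag version: $f is 1 while we are inside a "#ddffff" row and still
# looking for its NC_ accession.
sub with_flag {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "can't open in-memory file: $!";
    my @nc;
    my $f = 0;
    while (<$in>) {
        $f = 1 if /<tr bgcolor="#ddffff">/;
        if ($f && /\b(NC_\d+)\b/) {
            push @nc, $1;
            $f = 0;              # found it: stop looking until the next match
        }
    }
    close($in);
    return @nc;
}

# Label version: the inner loop consumes lines from the same handle, and
# "next OUTER" jumps straight back to the outer loop's next read.
sub with_labels {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "can't open in-memory file: $!";
    my @nc;
    OUTER: while (<$in>) {
        if (/<tr bgcolor="#ddffff">/) {
            while (<$in>) {
                if (/\b(NC_\d+)\b/) {
                    push @nc, $1;
                    next OUTER;
                }
            }
        }
    }
    close($in);
    return @nc;
}

my @by_flag   = with_flag($html);
my @by_labels = with_labels($html);
print "flag:   @by_flag\n";
print "labels: @by_labels\n";
```

In the label version, the position in the loop nest plays the role that `$f` plays in the flag version: being inside the inner `while` means "a matching row was seen and its NC_ number has not been found yet".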
Tags
file, html, perl, script