using Lynx and Grep to return search page rank - help


 
09-18-2007

I am writing a script that reads search terms from a text file and passes each line to Lynx. Lynx grabs the HTML source; then I want grep, tr, or whatever fits to find the first occurrence of a term (mydomain.name) and delete everything from that first occurrence on, creating a new end of file.

Then I want to count a certain marker, <class=L>, in the remaining source, from the start up to the new end of file, to determine the search engine page rank.

This is what I have so far. My primary issue is that Google returns the entire search result HTML as one line, which is why I need to count the style tag <class=L> (in this case a lowercase L). What I have right now grabs the search terms and the results, but I'm unsure where to go from here.

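#!/bin/bash
# Read one search term per line from the file given as the first argument.
while IFS= read -r searchTerm; do
    #echo "${searchTerm}"
    # Spaces become "+" so the term stays a single query parameter.
    lynx -source -accept_all_cookies "http://www.google.com/search?q=${searchTerm// /+}" >> /path/to/dir/archive.txt
done < "$1"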
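
One possible way to do the truncate-and-count step described above, as a rough sketch rather than a tested solution. The function name rank_of is made up for illustration, and the default marker text class=l is an assumption about what the lowercase-L tag looks like in the returned source. It uses only bash parameter expansion, so it does not matter that the whole page arrives as a single line.

```shell
#!/bin/bash
# rank_of DOMAIN [MARKER] (hypothetical helper): read HTML on stdin and print
# how many MARKERs appear before the first occurrence of DOMAIN.
# Prints 0 if DOMAIN does not appear at all.
rank_of() {
    local domain=$1 marker=${2:-class=l}   # marker text is an assumption
    local html head stripped
    html=$(cat)                            # whole page, even if it is one line
    head=${html%%"$domain"*}               # drop everything from the first match on
    if [ "$head" = "$html" ]; then         # nothing was removed: domain not found
        echo 0
        return
    fi
    stripped=${head//"$marker"/}           # remove every marker from the kept part
    # Each removed marker shortened the text by the marker's length.
    echo $(( (${#head} - ${#stripped}) / ${#marker} ))
}

# Possible use inside the loop above, instead of appending to the archive:
#   lynx -source -accept_all_cookies "http://www.google.com/search?q=$searchTerm" | rank_of mydomain.name
```

The count is only as good as the marker: if the marker also shows up outside result links, or a result mentions the domain in a snippet rather than in its link, the number will be off.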

Thanks in Advance!
 