Perl code to retrieve text from website


 
# 1  
03-18-2014
Perl code to retrieve text from website

Code:
  perl -MLWP::Simple -le '$s=shift;$c=get("http://www.google.com/intl/en/chrome/devices/chromecast/$s/");$c=~/meta content=(.*?)name=\"Remote free\"/msg; print length($1),"\t$1"' ?gclid=CJDg27OdnL0CFcFlOgodFD8A6Q >output.txt

output.txt should contain: Chromecast works with devices you already own, including Android tablets and smartphones, iPhones®, iPads®, Chrome for Mac® and Chrome for Windows®. Browse for what to watch, control playback, and adjust volume using your device. You won't have to learn anything new.

The script above creates a 1KB file, but it contains no text. I have also tried the same code on another site, with the same result, and I do not know why or how to fix it.

Code:
 perl -MLWP::Simple -le '$s=shift;$c=get("http://www.ncbi.nlm.nih.gov/gtr/tests/508680/$s/");$c=~/meta content=(.*?)name=\"Test name\"/msg; print length($1),"\t$1"' #overview

output.txt should be: Whole Exome Sequencing (Exome)


Thanks.
# 2  
03-18-2014
It's cool to compact code; however, when you're debugging as well as requesting help, I suggest you use multiple lines to show the code.

Once you do so, you should evaluate each line to see that it actually returns something. For example, check what $c (and every other variable) contains at each step.

Try that approach.
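Following that advice, here is one way the one-liner could be spread over several lines so that each step can be checked. It is a sketch, not the poster's code: the helper names `fetch_page` and `extract_meta` are hypothetical, LWP::Simple is loaded only when a URL is actually fetched, and the `meta content=... name="..."` pattern is still the original guess about the page's markup. It reports separately whether `get` returned nothing (it returns undef on failure) or the regex found no match, and it drops the /m and /g modifiers, which have no useful effect on a single match against one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: download a page, dying with a message instead of
# silently continuing with an undefined $c.
sub fetch_page {
    my ($url) = @_;
    require LWP::Simple;                  # loaded only when we really fetch
    my $content = LWP::Simple::get($url);
    die "get() returned nothing for $url\n" unless defined $content;
    warn "downloaded ", length($content), " characters\n";
    return $content;
}

# Hypothetical helper: same pattern as the one-liner, but returns undef
# when nothing matches instead of printing an empty result.
sub extract_meta {
    my ($html, $name) = @_;
    return $1 if $html =~ /meta content=(.*?)name="\Q$name\E"/s;
    return undef;
}

# Usage: perl debug.pl URL "Test name"
if (@ARGV == 2) {
    my ($url, $name) = @ARGV;
    my $c = fetch_page($url);
    my $value = extract_meta($c, $name);
    die "no meta tag with name=\"$name\" found\n" unless defined $value;
    print length($value), "\t$value\n";
}
```

Run it as `perl debug.pl URL 'Test name'`; with no arguments it only defines the two helpers, so each one can also be tried on a saved copy of the page.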
# 3  
03-18-2014
perl code to retrieve text from website

Code:
 perl -MLWP::Simple -le '$s=shift;$c=get("http://www.ncbi.nlm.nih.gov/gtr/tests/508680/$s/");
$c=~/meta content=(.*?)name=\"Test name\"/msg; print length($1),"\t$1"' #overview

$c is supposed to hold the meta content in the overview section of test ID 508680 on the website. I am hard-coding that specific ID in the URL.

output.txt should contain "Whole Exome Sequencing (Exome)". Thanks.
# 4  
03-18-2014
The shell treats an unquoted word beginning with # as the start of a comment, so #overview (and everything after it on the line) never reaches Perl. You must quote it to pass it as a string argument.

Code:
perl -MLWP::Simple -le '$s=shift;$c=get("http://www.ncbi.nlm.nih.gov/gtr/tests/508680/$s/");
$c=~/meta content=(.*?)name=\"Test name\"/msg; print length($1),"\t$1"' "#overview"

It still doesn't work quite right, but at least $s now ends up in the URL.
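A quick way to see what the quoting changes is to print the URL the one-liner builds. The sketch below uses a hypothetical `build_url` helper that copies the one-liner's string interpolation. An undefined `$s` is what the one-liner gets when the shell drops the unquoted #overview, because `shift` on an empty @ARGV returns undef.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the one-liner's URL construction:
# "http://www.ncbi.nlm.nih.gov/gtr/tests/508680/$s/"
sub build_url {
    my ($s) = @_;
    $s = '' unless defined $s;   # unquoted #overview: shift returned undef
    return "http://www.ncbi.nlm.nih.gov/gtr/tests/508680/$s/";
}

print build_url(), "\n";            # URL built when #overview is unquoted
print build_url('#overview'), "\n"; # URL built when it is quoted
```

It prints `http://www.ncbi.nlm.nih.gov/gtr/tests/508680//` for the unquoted case and `http://www.ncbi.nlm.nih.gov/gtr/tests/508680/#overview/` for the quoted one.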

Incidentally, I don't see a "Test name" tag value anywhere on that page. And the only 'meta content=' tag has a value of 'robots', which doesn't look that useful to me.
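Part of why the original pattern is fragile: `meta content=(.*?)name="..."` assumes `content` comes before `name` inside the same tag, and with /s the lazy `.*?` starts at the first `meta content=` on the page and can run across several tags until it reaches the wanted name. A sketch that instead examines one `<meta ...>` tag at a time and reads its attributes in any order could look like this. The helper `meta_by_name` is hypothetical, it only handles quoted attribute values, and a regex remains a rough stand-in for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the content="..." value of the <meta> tag
# whose name="..." equals $want (case-insensitively), or undef if none.
sub meta_by_name {
    my ($html, $want) = @_;
    while ($html =~ /<meta\b([^>]*)>/gi) {           # one tag at a time
        my $attrs = $1;
        my %attr;
        # Collect attr="value" or attr='value' pairs in any order.
        while ($attrs =~ /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
            $attr{ lc $1 } = defined $2 ? $2 : $3;
        }
        return $attr{content}
            if defined $attr{name} && defined $attr{content}
            && lc $attr{name} eq lc $want;
    }
    return undef;
}
```

For example, `meta_by_name($page, 'robots')` returns that tag's content value, while asking for 'Test name' on a page without such a tag returns undef instead of printing an empty line.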

Last edited by Corona688; 03-18-2014 at 02:57 PM.
# 5  
03-18-2014
perl code to retrieve data from website

The page in question is "Whole Exome Sequencing - Tests - GTR - NCBI".

It has 7 tabs (overview, how to order, indication, methodology, performance characteristics, interpretation, laboratory contact). Each tab has relevant data that I would like to pull. For example, the tag "Test name" is in overview, and so is the tag "Method". The tag "Laboratory contact" is in laboratory contact. Thank you.
# 6  
03-18-2014
There is no point adding #overview to the URL; it does not change what text is downloaded. A # denotes a position inside one document, and that fragment is not sent to the server at all, so Perl downloads the whole page as one big blob either way.
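The fragment point can be checked offline. In a URL, everything after the first # is the fragment, which the client resolves itself instead of requesting it from the server. The hypothetical `split_fragment` helper below applies that rule to the URL the quoted one-liner builds; it is a plain string split, not a full URI parser such as the URI module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a URL into the part an HTTP client requests
# and the fragment after the first '#', which stays on the client side.
sub split_fragment {
    my ($url) = @_;
    my ($requested, $fragment) = split /#/, $url, 2;
    return ($requested, $fragment);
}

my ($requested, $fragment) =
    split_fragment('http://www.ncbi.nlm.nih.gov/gtr/tests/508680/#overview/');
print "requested: $requested\n";
print "fragment:  $fragment\n";
```

It prints `requested: http://www.ncbi.nlm.nih.gov/gtr/tests/508680/` and `fragment:  overview/`, so appending #overview (or any other tab name) leaves the downloaded page unchanged.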

You still cannot match name="Test name", because the page does not contain that string anywhere.

The HTML on that page is awful. I'm still trying to figure out an efficient way to extract anything from it.
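Since labels such as "Test name" and "Method" show up as visible page text rather than as `name="..."` attributes, one rough approach is to strip the markup and take the text that follows each label. The sketch below assumes, hypothetically, that each value is the first non-empty line after its label once tags are removed; the helper names and the sample HTML are made up, and the real page's layout may differ. A parser module such as HTML::TreeBuilder would be more robust than these regexes.

```perl
use strict;
use warnings;

# Hypothetical helper: crude tag stripper that turns common block-level
# tags into newlines so labels and values land on separate lines.
sub visible_lines {
    my ($html) = @_;
    $html =~ s{<(?:script|style)\b.*?</(?:script|style)>}{}gis;  # drop code
    $html =~ s{<(?:br|/?(?:p|div|td|th|tr|li|dt|dd|h\d))\b[^>]*>}{\n}gi;
    $html =~ s{<[^>]*>}{}g;                                      # other tags
    my %entity = (amp => '&', lt => '<', gt => '>', quot => '"', nbsp => ' ');
    $html =~ s{&(amp|lt|gt|quot|nbsp);}{$entity{$1}}g;
    # Trim each line and keep only the non-empty ones.
    return grep { length } map { s/^\s+|\s+$//gr } split /\n/, $html;
}

# Hypothetical helper: map each wanted label to the line that follows it.
sub values_after_labels {
    my ($html, @labels) = @_;
    my @lines = visible_lines($html);
    my %found;
    for my $i (0 .. $#lines - 1) {
        for my $label (@labels) {
            $found{$label} = $lines[$i + 1]
                if !exists $found{$label} && lc $lines[$i] eq lc $label;
        }
    }
    return \%found;
}

# Made-up sample markup; the real GTR page may be structured differently.
my $sample = '<table><tr><th>Test name</th><td>Whole Exome Sequencing (Exome)</td></tr>'
           . '<tr><th>Method</th><td>example method</td></tr></table>';
my $found = values_after_labels($sample, 'Test name', 'Method');
print "$_: $found->{$_}\n" for sort keys %$found;
```

On the real page you would pass the string returned by LWP::Simple's `get` instead of the sample; whether the "first line after the label" rule holds there is untested.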
# 7  
03-18-2014
perl code to retrieve text from webpage

So you are saying that, regardless of which tab the tag is on, Perl downloads all 7 tabs as one blob and then tries to find "Test name" in that blob, which it cannot. Even though the text is visibly there (see attachment), the HTML on that page prevents Perl from finding it? Sorry, I am new to Perl and trying to understand it better. Thank you for your help.


