#1
Reading web page content
Hi, guys.
I really need to solve this problem and I don't know how, so could someone please help me? I have to write a C program that opens a web page, filters its contents, and saves the needed data to a file. Everything is easy except reading the web page. How can I open a page from within a program? For example, let's say I need to find the name of the newest member of unix.com from the notification on the home page. How do I even tell the program to go to the web? Reading pure HTML (or PHP-generated output, or whatever) is also OK. I'm on Sun Solaris OS 5.8.
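The filtering step described above, pulling one value such as the newest member's name out of the page source, could look roughly like this in C. This is a minimal sketch that assumes the page is already in memory as a C string; `extract_between` is a hypothetical helper, and any marker strings you pass it have to be chosen by looking at the real HTML.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text found between the first occurrence of
 * `start` and the next occurrence of `end` in `html` into `out`
 * (always NUL-terminated). Returns 0 on success, -1 if either marker is
 * missing or the text does not fit in `outlen` bytes.
 */
int extract_between(const char *html, const char *start, const char *end,
                    char *out, size_t outlen)
{
    const char *from, *to;
    size_t len;

    from = strstr(html, start);
    if (from == NULL)
        return -1;
    from += strlen(start);          /* skip past the opening marker */

    to = strstr(from, end);
    if (to == NULL)
        return -1;

    len = (size_t)(to - from);
    if (outlen == 0 || len >= outlen)
        return -1;                  /* no room for the text plus the NUL */

    memcpy(out, from, len);
    out[len] = '\0';
    return 0;
}
```

For example, with made-up markers, `extract_between(page, "\">", "</a>", name, sizeof name)` would copy the text of the first link on the page into `name`.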
#2
My personal favorite is to use perl to invoke wget or curl, and also to filter the webpage using perl's regular expressions. I'm sure C is the wrong tool for this sort of job, but if you're against learning perl, by all means use C.
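If you would rather stay in C, the same idea (let wget fetch the page, then filter it with regular expressions) can be sketched with `popen()` and the POSIX `<regex.h>` functions. This is a sketch under assumptions: `grep_stream` and `grep_url` are hypothetical helper names, wget is assumed to be installed and on the `PATH`, and the patterns are POSIX extended regular expressions, not Perl ones.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() */
#include <regex.h>
#include <stdio.h>

/*
 * Copy every line of `in` that matches the POSIX extended regular
 * expression `pattern` to `out`. Returns the number of matching lines,
 * or -1 if the pattern does not compile. Lines longer than the buffer
 * are examined in pieces, which is good enough for a quick filter.
 */
int grep_stream(FILE *in, const char *pattern, FILE *out)
{
    regex_t re;
    char line[4096];
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            fputs(line, out);
            count++;
        }
    }
    regfree(&re);
    return count;
}

/*
 * Fetch `url` with wget and filter the page through grep_stream().
 * "-q -O -" makes wget quiet and write the page to standard output,
 * which popen() lets us read like a file. The URL is pasted into a
 * shell command, so only pass trusted URLs.
 */
int grep_url(const char *url, const char *pattern, FILE *out)
{
    char cmd[1024];
    FILE *page;
    int count;
    int n = snprintf(cmd, sizeof cmd, "wget -q -O - '%s'", url);

    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;                /* command would be truncated */
    page = popen(cmd, "r");
    if (page == NULL)
        return -1;
    count = grep_stream(page, pattern, out);
    pclose(page);
    return count;
}
```

For example, `grep_url("http://www.unix.com/", "newest member", stdout)` would print every line of the page containing that phrase (the phrase is only an illustration).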
#3
This is an interesting debate. It is easy to say that a higher-level language makes this easier, but when you rely on an API for more difficult requests, you will run into problems. To truly understand the request and the response, you have to understand how ports work. For instance, try to request http://www.google.com/search?hl=en&i...&q=unix+forums with a high-level language API. I tried it with Java:
```java
// This example is from the book _Java in a Nutshell_ by David Flanagan.
// Written by David Flanagan. Copyright (c) 1996 O'Reilly & Associates.
// You may study, use, modify, and distribute this example for any purpose.
// This example is provided WITHOUT WARRANTY either expressed or implied.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Date;

public class GetURLInfo {
    public static void printinfo(URLConnection u) throws IOException {
        // Display the URL address, and information about it.
        System.out.println(u.getURL().toExternalForm() + ":");
        System.out.println("  Content Type: " + u.getContentType());
        System.out.println("  Content Length: " + u.getContentLength());
        System.out.println("  Last Modified: " + new Date(u.getLastModified()));
        System.out.println("  Expiration: " + u.getExpiration());
        System.out.println("  Content Encoding: " + u.getContentEncoding());
        // Read and print out the first five lines of the URL.
        // BufferedReader replaces the deprecated DataInputStream.readLine().
        System.out.println("First five lines:");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(u.getInputStream(), "ISO-8859-1"))) {
            for (int i = 0; i < 5; i++) {
                String line = in.readLine();
                if (line == null) break;
                System.out.println("  " + line);
            }
        }
    }

    // Create a URL from the specified address, open a connection to it,
    // and then display information about the URL.
    public static void main(String[] args)
        throws MalformedURLException, IOException
    {
        if (args.length != 1) {
            System.err.println("Usage: java GetURLInfo <url>");
            System.exit(1);
        }
        URL url = new URL(args[0]);
        URLConnection connection = url.openConnection();
        printinfo(connection);
    }
}
```
To get around this problem, you have to go down to the socket level and send the request yourself, for example:

```java
// Open a TCP connection to the web server (port is normally 80 for HTTP).
Socket socket = new Socket(u.getHost(), port);
OutputStream out = socket.getOutputStream();
InputStream in = socket.getInputStream();
```

and then read the response:

```java
// Read the response in chunks until read() returns -1 (end of stream).
byte[] buffer = new byte[1024];
int l;
while ((l = in.read(buffer)) != -1) {
    body.append(new String(buffer, 0, l, "8859_1"));
}
```

It is not as easy as you would think, and C would give you a better understanding of socket programming. When I have $60.00, I will pick up Stevens' book to see how he explains it and really understand network programming. So whatever language you use, the steps are the same: open a connection to the server's port, send an HTTP request, and read the HTTP response from that connection.
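Following those steps in C, a minimal HTTP/1.0 GET over a plain socket might look like this. It is a sketch assuming a POSIX sockets API with `getaddrinfo()`; `build_get_request` and `http_get` are hypothetical names, and there is no HTTPS, redirect handling, or timeout. The response headers are written out together with the body.

```c
#define _POSIX_C_SOURCE 200809L   /* for getaddrinfo() */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build a minimal HTTP/1.0 GET request. HTTP/1.0 plus "Connection: close"
 * means the server closes the socket after the response, so the reader can
 * simply read until EOF. Returns the request length, or -1 if it does not fit.
 */
int build_get_request(const char *host, const char *path, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Connect to host:port, send the request and copy the raw response
 * (status line, headers and body) to `out`. Returns 0 on success, -1 on error.
 */
int http_get(const char *host, const char *port, const char *path, FILE *out)
{
    struct addrinfo hints, *res, *ai;
    char req[1024], buf[4096];
    int fd = -1, reqlen;
    ssize_t n;

    reqlen = build_get_request(host, path, req, sizeof req);
    if (reqlen < 0)
        return -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    /* write() may send less than asked, so loop until the whole request is out. */
    for (int sent = 0; sent < reqlen; sent += (int)n) {
        n = write(fd, req + sent, (size_t)(reqlen - sent));
        if (n <= 0) {
            close(fd);
            return -1;
        }
    }

    /* read() returns 0 when the server closes the connection. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, out);

    close(fd);
    return n < 0 ? -1 : 0;
}
```

Calling `http_get("www.google.com", "80", "/", stdout)` would dump the raw response. On Solaris you typically need to link with `-lsocket -lnsl`. HTTP/1.0 with `Connection: close` was chosen so that the end of the response is simply the end of the stream, which avoids parsing `Content-Length` or chunked encoding.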
#4
Yeah!
Thanks, wget does the trick. Actually, what I really needed fits in one line (using Google as an example):

```c
system("wget -O file.dat http://www.google.com"); /* system() is declared in <stdlib.h> */
```

Thanks so much!
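One thing the one-liner ignores is whether the download worked. `system()` returns an encoded wait status rather than the command's exit code, so a small wrapper can decode it. This is a sketch; `run_command` is a hypothetical helper, and it relies on wget exiting with a nonzero status when it fails.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `cmd` through the shell with system() and
 * return the command's exit status, or -1 if system() itself failed or
 * the command did not exit normally (e.g. it was killed by a signal).
 */
int run_command(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)
        return -1;                 /* could not start the shell */
    if (!WIFEXITED(status))
        return -1;                 /* terminated abnormally */
    return WEXITSTATUS(status);    /* the command's own exit code */
}
```

For example, `run_command("wget -q -O file.dat http://www.google.com")` returns 0 only if wget exited successfully, so you can check that before opening `file.dat` for filtering.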