Need help regarding HTTP parsing


 
# 1  
Old 08-04-2009
Hi,

I have a program that connects to a remote server, displays some garbage values, and then closes the connection.

The code is as follows:

Code:
#include <stdio.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>


int main(int argc,char argv[])
{
int connectres,bytes,i=0,sock,len;
struct sockaddr_in connto;

struct hostent *server;
char source[2048];
char buffer[256];
char *outmsg="GET  / HTTP/1.1";
sock=socket(AF_INET,SOCK_STREAM,0);
connto.sin_family=AF_INET;
connto.sin_port=htons(80);
connto.sin_addr.s_addr=inet_addr("74.125.67.100");

if(connectres=connect(sock,(struct sockaddr*)&connto,sizeof(struct sockaddr))==-1)
{
perror("unable to connect");
return -3;
}

printf("connection successful \n");
len=strlen(outmsg);
bytes=send(sock,outmsg,len,0);
printf("bytes sent are %d \n",bytes);

bzero(buffer,256);

do
{
i=recv(sock,buffer,sizeof(buffer),0);
printf("still recieving data \n");
strcat(source,buffer);
printf("%s",source);
bzero(buffer,256);
}while(i!=0);

//closing socket
printf("closing socket \n");

close(sock);
}

Can someone please help me out?
I really have no clue what to do.

Last edited by vino; 08-05-2009 at 03:15 AM.. Reason: Added code tags
# 2  
Old 08-04-2009
recv() doesn't add null terminators to anything, so you can't just use string functions such as strcat() on the raw binary data it returns.
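One way to act on this advice, as a minimal sketch rather than a definitive fix: track how many bytes are already stored and copy each chunk by its length, adding the terminator yourself. The helper name append_chunk and its parameters are hypothetical and do not come from the original program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append n raw bytes from chunk to dst.
 * dst has room for cap bytes in total and currently holds *used bytes.
 * dst is kept null-terminated. Returns 0 on success, -1 if the data
 * (plus the terminator) would not fit. */
static int append_chunk(char *dst, size_t cap, size_t *used,
                        const char *chunk, size_t n)
{
    if (*used >= cap || n >= cap - *used)   /* reserve one byte for '\0' */
        return -1;
    memcpy(dst + *used, chunk, n);  /* copies exactly n bytes; chunk needs no terminator */
    *used += n;
    dst[*used] = '\0';              /* add the terminator that recv() never writes */
    return 0;
}
```

In the first program this would replace the strcat() call: after each recv() that returns i > 0, call append_chunk(source, sizeof source, &used, buffer, i), with used initialised to 0 before the loop.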
# 3  
Old 08-11-2009
Thanks for your advice.

I came up with the following program

Code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

#define MAXMESS 9999

int main(int argc, char** argv)
{
        struct sockaddr_in servaddr;
        struct hostent *hp;
        int sock_id,i=0;
        char message[MAXMESS];
        char msglen;
        char request[] = "GET /about.html HTTP/1.1\r\n               Host:www.google.com\r\n\r\n";



        //Get a socket
        if((sock_id = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        {
                fprintf(stderr,"Couldn't get a socket.\n");
                exit(EXIT_FAILURE);
        }
        else
        {
                fprintf(stderr,"Got a socket.\n");
        }

        //book uses bzero which my man pages say is deprecated
        //the man page said to use memset instead. :-)
        memset(&servaddr,'\0',sizeof(servaddr));

        //get address for google.com
        if((hp = gethostbyname("www.google.com")) == NULL)
        {
             fprintf(stderr,"Couldn't get an address.\n");
             exit(EXIT_FAILURE);
}
        else
       {
        fprintf(stderr,"Got an address.\n");
        }

        //bcopy is deprecated also, using memcpy instead
        memcpy((char *)&servaddr.sin_addr.s_addr, (char *)hp->h_addr, hp->h_length);

        //fill int port number and type
        servaddr.sin_port = htons(80);
        servaddr.sin_family = AF_INET;

        //make the connection
        if(connect(sock_id, (struct sockaddr *)&servaddr, sizeof(servaddr)) != 0)
        {
          fprintf(stderr, "Connection error.\n");
        }
        else
        {
           fprintf(stderr,"Got a connection!!!\n");
        }

        //NOW THE HTTP PART!!!

        //send the request
         read(sock_id,request,strlen(request));

       //read the response
      while(i!=0)
     {
        write(sock_id,message,9999);
        printf("%s",message);
        }
   return 0;
   }

But I am still not able to display the source code of the web page. Can someone tell me what I'm doing wrong?
# 4  
Old 08-11-2009
You are using read() to send and write() to receive; this is backwards.

And you still have not added any null terminators! printf will not know how long the string is! Also, the receive loop never starts, because i starts at zero and the condition is i!=0. If i started at anything else, the loop would run forever, since you never change i.

Code:
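while(1)
{
  ssize_t len=recv(sock_id, message, MAXMESS-1, 0); // recv() takes a flags argument; 0 means no special flags
  if(len <= 0) break; // 0 means the server closed the connection, -1 means an error
  message[len]='\0'; // Null terminator!  printf needs one!  It says where the string ends!
  printf("%s\n", message);
}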
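For the sending direction, a related detail is that send() (like write()) may accept fewer bytes than it was asked to send. A minimal sketch of a loop that handles this follows; the helper name send_all is hypothetical and not part of the posted program.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call send() repeatedly until all len bytes of data
 * have been handed to the socket. Returns 0 on success, -1 on failure. */
static int send_all(int fd, const char *data, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n <= 0)
            return -1;          /* error, or nothing could be sent */
        sent += (size_t)n;      /* advance past the bytes already sent */
    }
    return 0;
}
```

In the program above, send_all(sock_id, request, strlen(request)) would take the place of the read() call that was meant to send the request.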


Last edited by Corona688; 08-11-2009 at 12:14 PM..
# 5  
Old 08-12-2009
Thanks for the reply. After a few modifications, I eventually came up with this program:

Code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

#define MAXBUFF 99999

int main(int argc, char** argv)
{
         struct sockaddr_in servaddr;
         struct hostent *hp;
         int sock_id,i=1;
         int len;
         char buffer[MAXBUFF],message[9999];
         char msglen;
         char request[] = "GET /en/index.html HTTP/1.1\r\nHost:bandvalley.com\r\nUser-Agent:HTMLGET 1.1\r\n\r\n";

                                       //Get a socket

        if((sock_id = socket(AF_INET, SOCK_STREAM, 0)) == -1)
           {
                fprintf(stderr,"Couldn't get a socket.\n");
                exit(EXIT_FAILURE);
           }

        else
        {
               fprintf(stderr,"Got a socket.\n");
        }

                                       //book uses bzero which my man pages say is deprecated
                                       //the man page said to use memset instead. :-)
        memset(&servaddr,'\0',sizeof(servaddr));

                                      //get address for bandvalley.com
        if((hp = gethostbyname("174.36.228.144")) == NULL)
        {
            fprintf(stderr,"Couldn't get an address.\n");
            exit(EXIT_FAILURE);
        }
        else
        {
           fprintf(stderr,"Got an address.\n");
        }

                                       //bcopy is deprecated also, using memcpy instead
        memcpy((char *)&servaddr.sin_addr.s_addr, (char *)hp->h_addr, hp->h_length);

                                        //fill int port number and type
        servaddr.sin_port = htons(80);
        servaddr.sin_family = AF_INET;

                                        //make the connection
        if(connect(sock_id, (struct sockaddr *)&servaddr, sizeof(servaddr)) != 0)
        {
          fprintf(stderr, "Connection error.\n");
        }
        else
        {
           fprintf(stderr,"Got a connection!!!\n");
        }

                                          //NOW THE HTTP PART!!!

                                          //send the request

      send(sock_id,request,strlen(request),0);



     //read the response
     if(i==1)
     {
      do
      {
      len=recv(sock_id,buffer,MAXBUFF-1,0);
      buffer[len]='\0';
      printf("%s",buffer);
      }while(i=1);
     }
     else
    {
      return 0;
      }
    }

But it still doesn't get the last two lines of the page source. I know my mistake, too: i's value never changes. But I don't know the exact size of the page source, so how do I correct this error?

Any advice on this?
# 6  
Old 08-12-2009
You don't have to know the exact size of the page. You just keep receiving data until recv() returns a value less than or equal to zero.

I set the null terminator only after checking recv()'s return value, because recv() returns -1 on error and buffer[-1]='\0' writes outside the array bounds. This may cause a segmentation fault or, at the very least, mess up other variables.
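Put together, the receive loop could look roughly like the sketch below. The function name read_until_close, the 4096-byte buffer and the FILE * parameter are assumptions made for illustration; only the check-then-terminate order comes from the advice above.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: print everything the peer sends until it closes
 * the connection (recv() returns 0) or an error occurs (recv() returns -1).
 * Returns the total number of bytes received, or -1 on error. */
static long read_until_close(int fd, FILE *out)
{
    char buffer[4096];
    long total = 0;

    for (;;) {
        ssize_t len = recv(fd, buffer, sizeof buffer - 1, 0);
        if (len < 0)
            return -1;          /* error: never use len as an index */
        if (len == 0)
            break;              /* peer closed the connection */
        buffer[len] = '\0';     /* safe: 0 < len <= sizeof buffer - 1 */
        fputs(buffer, out);
        total += len;
    }
    return total;
}
```

Note that with HTTP/1.1 the server normally keeps the connection open after the response, so recv() may not return 0 until the server times out. Adding a "Connection: close" header to the request asks the server to close the connection once the page has been sent, which lets a loop like this one finish.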
