Very weird wget/curl output - what should I do? Post 302554936 by alister on Tuesday 13th of September 2011 12:13:43 PM
I too was able to replicate the observed behavior. Corona688 is correct in that it's not an attempt to deny access to the file. It's either Apache or wget being stupid. I cannot confirm which at the moment, since the wget header dump only included the server side of the conversation (@#$@@#?).

In any case, this is what's happening.

When Firefox requests the file, it indicates that it accepts gzip encoding. When wget or curl ask for it, they do not indicate this. In a bizarre attempt to be helpful, instead of sending you the compressed text file, or redirecting, or refusing to comply, the webserver sends you plain text.

That in itself seems foolish: depending on the client's request headers, the .gz file you download may or may not actually be gzip'd. Meanwhile, the Content-Type header always indicates "application/x-gzip".
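Since the file name and Content-Type can't be trusted here, one way to tell what you actually received is to look at the first two bytes: gzip streams start with the magic bytes 1f 8b. A minimal sketch, where the function name looks_gzipped is just something made up for illustration:

```shell
# looks_gzipped FILE
# Succeeds (exit 0) if FILE starts with the gzip magic bytes 0x1f 0x8b.
# The function name is hypothetical; dd, od and tr are standard tools.
looks_gzipped() {
    # Read the first two bytes and print them as hex, e.g. "1f8b".
    magic=$(dd if="$1" bs=1 count=2 2>/dev/null | od -A n -t x1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

Something like `looks_gzipped download.gz && echo compressed || echo "not gzip"` would then report whether a download (file name is a placeholder) at least starts like a gzip file. It says nothing about whether the rest of the file is intact.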

(We're just getting warmed up.)

The server response, in the Content-Length header, indicates that the data (you know, the gzip'd text which is actually gunzip'd text) that it's sending you is 13258 bytes long. In its infinite wisdom, their Apache decides to close the connection one byte short of the advertised size.

(Just when you think things couldn't get more messed up ...)

When wget reconnects to finish the transfer, their webserver begins sending at the byte offset requested, but in the original, gzip compressed data file ... and continues to send until the end of that compressed data. This is why you see an identical file size that begins with text followed by "garbled data".
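To make the splice concrete, here is a rough local simulation of what ends up on disk: the first CUT bytes come from the plain text, and everything from offset CUT onward comes from the compressed file. This is only a sketch of the effect, not of what Apache or wget do internally; the function name splice_resume and its arguments are invented for the example.

```shell
# splice_resume PLAIN GZ CUT OUT
# Builds OUT from the first CUT bytes of PLAIN followed by the bytes of GZ
# starting at offset CUT. Hypothetical helper, for illustration only.
splice_resume() {
    plain=$1 gz=$2 cut=$3 out=$4
    {
        # What arrived before the connection was closed.
        dd if="$plain" bs=1 count="$cut" 2>/dev/null
        # What arrived after the resume at byte offset CUT.
        dd if="$gz" bs=1 skip="$cut" 2>/dev/null
    } > "$out"
}
```

As long as the compressed file is longer than CUT bytes, the result has exactly the size of the compressed file, which matches the identical file size observed above.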

Using dd to skip the first 13257 bytes in the mangled file, I used cmp to compare the remaining bytes with their counterparts in the file downloaded from Firefox. They were identical.
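The comparison went roughly like the sketch below. I'm reconstructing it here rather than pasting the exact commands, so treat the function name compare_tails and the file names in the usage line as placeholders.

```shell
# compare_tails MANGLED REFERENCE SKIP
# Succeeds (exit 0) if both files are byte-identical after the first SKIP bytes.
# Hypothetical helper; dd, cmp and mktemp do the real work.
compare_tails() {
    mangled=$1 reference=$2 skip=$3
    tmpdir=$(mktemp -d) || return 2
    # Drop the first SKIP bytes of each file (bs=1 is slow but exact).
    dd if="$mangled" bs=1 skip="$skip" of="$tmpdir/mangled.tail" 2>/dev/null
    dd if="$reference" bs=1 skip="$skip" of="$tmpdir/reference.tail" 2>/dev/null
    # cmp -s is silent and only sets the exit status.
    cmp -s "$tmpdir/mangled.tail" "$tmpdir/reference.tail"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For the files in this thread that would be something like `compare_tails wget-copy.gz firefox-copy.gz 13257 && echo identical`.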

So, in the end, the file received is not the 13258 bytes advertised by the first server response but 86777 bytes, the size of the gzip-compressed file: the first 13257 bytes are uncompressed text and the remainder is gzip'd data.

Long story short: Tell Apache that you can handle gzip'd data. Using curl, the following option works around the problem:
Code:
-H 'Accept-Encoding: gzip'
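
Put together as a full command, that could look like the sketch below. The wrapper name fetch_gz and the URL in the usage comment are placeholders, not the real ones from this thread. The wget line in the comment uses wget's --header option, but I have not verified it against this particular server.

```shell
# fetch_gz URL OUTFILE
# Requests URL while advertising gzip support and saves the body as received.
# Hypothetical wrapper; -s, -S, -H and -o are standard curl options.
fetch_gz() {
    curl -sS -H 'Accept-Encoding: gzip' -o "$2" "$1"
}

# Placeholder usage (not run here):
#   fetch_gz 'http://example.com/path/file.txt.gz' file.txt.gz && gzip -t file.txt.gz
# Possible wget equivalent, untested against this server:
#   wget --header='Accept-Encoding: gzip' 'http://example.com/path/file.txt.gz'
```

Running `gzip -t` on the result is a cheap sanity check that what was saved really is a complete gzip file.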

Regards,
Alister

---------- Post updated at 12:13 PM ---------- Previous update was at 11:52 AM ----------

Quote:
Originally Posted by jstilby
curl also gives incorrect output - only the text of the first message. it probably tosses out the garbled binary data.
Nah. curl is simply not retrying after the webserver closes the connection. Both curl and wget are sent plain text before the connection closes. Only wget reconnects and begins receiving gzip'd data.

Regards and welcome to the forum,
Alister

Last edited by alister; 09-13-2011 at 01:20 PM..