Curl parallel download file list


 
# 1  
Old 05-21-2017

Hello guys, first post sorry if I did some mess here =)

Using Ubuntu 14.04lts 64bits server version.

I have a list (url.list) with only URLs to download, one per line, that looks like this:
Code:
http://domain.com/teste.php?a=2&b=3&name=1
http://domain.com/teste.php?a=2&b=3&name=2
...
http://domain.com/teste.php?a=2&b=3&name=30000

As you can see, the file has many lines (30000 in this case). Because of that, I'm using a trick to download many URLs simultaneously:
Code:
cat url.list | xargs -n 1 -P 10 <<MAGIC COMMAND THAT WILL SAVE ME>>

The problem is that I'd like to name each output file after the value of the name field (1.html, 2.html, ..., 30000.html, etc.), and use curl to limit each download to 50 KB. So the curl command should be something like:
Code:
curl -r 0-50000 -L $URL -o $filename.html -a $filename.log

How can I get this done?

I can extract the name value with echo $URL | sed -n -e 's/^.*name=//p', but I don't know how to do that on the same command line, capturing the pipe's output into two variables ($URL and $filename).

I tried this with no success:
Code:
cat url.list | xargs -n 1 -P 10 | filename=$(sed -n -e 's/^.*name=//p') ; curl -r 0-50000 -L $URL -o $filename.html -a $filename.log

Thank you in advance,
tonispa

Last edited by tonispa; 05-21-2017 at 01:58 PM..
# 2  
Old 05-22-2017
Did you try reading that file:
Code:
while IFS="?&=" read URL X X X X X FN REST; do echo $FN, $URL; done <url.list
1, http://domain.com/teste.php
2, http://domain.com/teste.php
, ...
30000, http://domain.com/teste.php

The Xes are dummy variables. Instead of the echo, put in your magic command. There have been threads on "parallel" execution with some tricks; use the search function in here.
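To make the idea concrete, here is a minimal sketch of what "put in your magic command" could look like; the 50 KB range, the -s flag, and the url.list name are from the original post, and the 10-job batching is only a crude stand-in for real parallel control:

```shell
#!/bin/bash
# Keep the whole line for curl, and reuse the IFS trick on a copy of it
# to pull out the value of the name field.
while read -r line; do
  FN=$(printf '%s\n' "$line" | { IFS="?&=" read URL A B C D E NAME REST; echo "$NAME"; })
  curl -s -r 0-50000 -L "$line" -o "$FN.html" &
  # crude throttle: once 10 downloads are running, wait for the batch
  [ "$(jobs -r | wc -l)" -ge 10 ] && wait
done < url.list
wait   # let the last batch finish
```

Note this waits for a whole batch of 10 before starting the next, so it is less efficient than xargs -P, which keeps 10 slots busy continuously.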
# 3  
Old 05-22-2017
Thank you so much for your help @RudiC , I'll try your tips tonight and post back here. I got a little further with this yesterday and figured out how to use xargs to "parallelize" the curl jobs:
Code:
xargs -n 1 -P 10 curl -s -r 0-50000 -O < url.list

But the problem is that I can't rename the files the way I want. So what I did was cd to my destination directory and run the code above. But I noticed that when different URLs produce the same filename, the first file is overwritten by the last one. So if I want to keep a single destination directory, being able to rename the output is mandatory.
# 4  
Old 05-22-2017
How about something like this:

Code:
#!/bin/bash
fetch_url () {
   URL="$1"
   filename=${URL##*=}

   curl -r 0-50000 -L "$URL" -o "${filename}.html" -a "${filename}.log"
}

export -f fetch_url

xargs -n 1 -P 10 fetch_url < url.list


Last edited by Chubler_XL; 05-27-2017 at 02:50 PM..
# 5  
Old 05-22-2017
Code:
perl -nle '/(\d+)$/ and print "$_ -o $1.html -a $1.log"' url.list | xargs -L 1 -P 10 curl -r 0-50000 -L

# 6  
Old 05-25-2017
Quote:
Originally Posted by Chubler_XL
How about something like this:
Code:
fetch_url () {
   URL="$1"
   filename=${URL##*=}
   curl -r 0-50000 -L "$URL" -o "${filename}.html" -a "${filename}.log"
}
export -f fetch_url
xargs -n 1 -P 10 fetch_url < url.list

Thank you for your reply! I couldn't make this work. I tried writing a script just for this function and calling it, and I tried putting the command "inline" inside a screen session, but I always get the same error:

xargs: fetch_urlxargs: fetch_urlxargs: fetch_url: No such file or directory: No such file or directory

Do you know how to solve this?
# 7  
Old 05-27-2017
export -f is a bash feature; I use it here to ensure the internal function fetch_url is exported to subshells. This is needed because xargs is a command external to the shell and runs the assembled commands in new shells.

I assumed, since you are using GNU xargs (the -P feature is a GNU extension), that you are also using the bash shell. I've updated my original post to specify the required shell, and that may be all you need to get your version working.
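One wrinkle worth spelling out: xargs execs its command directly rather than through a shell, so even an exported function is only reachable if xargs is told to start a bash process. A sketch of that pattern (function body and curl options as used earlier in this thread; the guard on url.list is only so the snippet is safe to paste without the file present):

```shell
#!/bin/bash
fetch_url () {
  URL="$1"
  filename=${URL##*=}
  curl -r 0-50000 -L "$URL" -o "${filename}.html" -a "${filename}.log"
}
export -f fetch_url

# bash -c gives xargs a shell in which the exported function exists;
# the _ fills $0, and each URL read from url.list arrives as $1.
if [ -r url.list ]; then
  xargs -n 1 -P 10 bash -c 'fetch_url "$1"' _ < url.list
fi
```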

However, if you do not wish to use bash, you could put your function in an external script so that it can be called from xargs for example:

$HOME/bin/fetch_url:
Code:
#!/bin/sh
URL="$1"
filename=${URL##*=}
curl -r 0-50000 -L "$URL" -o "${filename}.html" -a "${filename}.log"

And from another script (or the command line) you can call this with:
Code:
xargs -n 1 -P 10 $HOME/bin/fetch_url < url.list


Last edited by Chubler_XL; 05-27-2017 at 02:54 PM..