Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.
#1
Multi html download.
Hello, I have a URL list. It is very large. I want to download the URLs concurrently. aria2c is a very good tool for this (or concurrent curl commands), but the I/O crashes my server; the load is very high. I want all the downloaded HTML pages (they are very small) saved to a single text file. Is that possible? Thank you very much. aria2c command: Code:
aria2c -i url.txt -j30
Contents of url.txt: Code:
http://www.domain.com/f34gf345g.html
http://www.domain.com/jyjk678.html
....
#2
Code:
while IFS= read -r URL
do
    # -q silences wget's progress output; -O - writes the page to stdout
    # instead of saving it under its own name, so it is appended to download.txt
    wget -q -O - "$URL" >> download.txt
done < urls_list.dat   # urls_list.dat holds one URL per line
#3
Thanks, but it does not download concurrently. It is very slow for a huge URL list.
#4
This downloads 50 URLs at a time; you can change the batch size to suit your requirements: Code:
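seq=0
while IFS= read -r URL
do
    seq=$((seq + 1))
    # Zero-padded names make download_*.txt expand in input order
    name=$(printf 'download_%06d.txt' "$seq")
    # -O - writes the page to stdout instead of saving it under its own name
    wget -q -O - "$URL" > "$name" &
    # After every 50th URL, wait for the current batch to finish
    if [ $((seq % 50)) -eq 0 ]
    then
        wait
    fi
done < urls_list.dat
wait
cat download_*.txt > consolidated.txt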
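The batch approach waits for the slowest download in each group of 50 before starting the next group. Below is a minimal sketch of an alternative that keeps up to N downloads running at all times, assuming an xargs that supports -P (GNU and BSD xargs do; POSIX xargs does not). The function name `parfetch` and the `FETCH` variable are hypothetical; `FETCH` defaults to `wget -q -O -`, where `-O -` makes wget print the page on stdout rather than save it under its own name. Each page still goes to its own numbered temporary file, because output from concurrent jobs would otherwise interleave, and the files are concatenated in input order at the end.

```shell
# Hypothetical helper: keep up to NPROC downloads running at once
# (needs xargs -P, a GNU/BSD extension). Usage: parfetch URLLIST [NPROC]
parfetch() (
    list=$1
    nproc=${2:-50}
    # Hypothetical hook: command that prints one URL's body on stdout
    FETCH=${FETCH:-"wget -q -O -"}
    export FETCH
    tmp=$(mktemp -d) || exit 1
    # Prefix each non-blank URL with its zero-padded line number so the
    # pages can be put back in input order afterwards
    awk 'NF { printf "%08d %s\n", NR, $1 }' "$list" |
        xargs -n 2 -P "$nproc" sh -c '$FETCH "$2" > "$0/$1.out"' "$tmp"
    # Zero-padded names expand in input order
    cat "$tmp"/*.out
    rm -r "$tmp"
)
```

Usage: `parfetch urls_list.dat 50 > consolidated.txt`. The URLs pass through xargs, so any containing spaces, quotes or backslashes would need escaping first.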
#5
Quote:
Code:
while read line
do
wget ...
done > download.txt
wget also has some features that make a loop unnecessary, though. wget can read a list of URLs from a file with -i. The -nv option is also useful: it still prints completed files without printing all the complicated junk wget usually prints. Code:
wget -nv -i urls_list.dat > download.txt
This should be much faster than calling wget 1000 times, since it can re-use the same connection if it's connecting to the same site. Concurrency may not be necessary (and may not be desirable in many cases: how fast is your connection?), but if it is, I'd split the list into parts and use wget -i on those parts.
#6
Thanks. It is very fast, but each file is downloaded separately to the hard disk, which puts a very high load on the server. I want to download everything into a single file only.
#7
Since they're in the background, they have to be saved to independent files. It'd be almost impossible to guarantee the order of the output if they weren't. I'd try splitting the file into many chunks for wget -i to handle independently. This will allow them to be concurrent without such an overwhelming number of files. Code:
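#!/bin/sh
# Usage: multiget.sh URLLIST [NPROC]   (NPROC defaults to 10)
MAXPROC=${2:-10}
# Count lines first
LINES=$(wc -l < "$1")
# Lines per chunk, rounded up so that at most MAXPROC chunks are made
# ($(( )) instead of let, which plain sh does not have)
LINES=$(( ($LINES + $MAXPROC - 1) / $MAXPROC ))
[ "$LINES" -gt 0 ] || LINES=1
# Keep the chunks in a private directory so the x* globs below
# cannot match, or delete, unrelated files
TMP=$(mktemp -d) || exit 1
# Split the list into chunks $TMP/xaa, $TMP/xab, ...
split -l "$LINES" "$1" "$TMP/x"
# Loop over the chunks
for FILE in "$TMP"/x*
do
    # Download one set of URLs from $FILE into $FILE.out in background
    wget -nv -i "$FILE" -O - > "$FILE.out" 2> "$FILE.err" &
done
wait # Wait for all processes to finish
# Assemble files in order
cat "$TMP"/x*.out
cat "$TMP"/x*.err >&2
# Remove temporary files
rm -r "$TMP"
Use it like Code:
./multiget.sh filelist 5 2> errlog > output
for 5 simultaneous downloads.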
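The chunk size is where plain integer division can bite: 25 URLs over 10 processes gives 2 lines per chunk, so split makes 13 chunks and starts 13 wget processes, and a list shorter than the process count gives 0 lines per chunk, which split rejects. A sketch of a rounded-up calculation follows, written as a standalone hypothetical helper `chunk_size`:

```shell
# Hypothetical helper: lines per chunk when splitting TOTAL lines among
# PROCS processes, rounded up so that at most PROCS chunks are produced,
# and never 0 (split -l 0 is an error).
chunk_size() {
    total=$1
    procs=$2
    size=$(( (total + procs - 1) / procs ))
    [ "$size" -gt 0 ] || size=1
    echo "$size"
}

# 25 URLs over 10 processes: prints 3, i.e. 9 chunks
chunk_size 25 10
# 5 URLs over 10 processes: prints 1, i.e. 5 chunks
chunk_size 5 10
```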