Multi html download.


 
# 1  
Old 11-30-2012
Multi html download.

Hello,

I have a very large list of URLs and I want to download them concurrently.
aria2c is a very good tool for this (as is running curl concurrently), but my server crashes under the I/O load,
which is very high. I want all of the downloaded HTML pages (they are very small) saved to a single text file. Is it possible? Thank you very much.

Aria2c command:
Code:
aria2c -iurl.txt -j30

url.txt
Code:
http://www.domain.com/f34gf345g.html
http://www.domain.com/jyjk678.html
....

# 2  
Old 11-30-2012
Code:
while read -r URL
do
    wget -q -O - "$URL" >> download.txt # -O - writes the page to stdout, which is appended to download.txt
done < urls_list.dat                    # Read the list of URLs from urls_list.dat

# 3  
Old 11-30-2012
Thanks, but this does not download concurrently. It is very slow for a huge URL list.
# 4  
Old 11-30-2012
This downloads 50 URLs at a time; you can adjust the batch size to suit your requirements:
Code:
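seq=0
while read -r URL
do
   # -O - writes the page to stdout; each background job gets its own file
   wget -q -O - "$URL" > "download_${seq}.txt" &
   seq=$( expr $seq + 1 )
   mod=$( expr $seq % 50 )
   if [ $mod -eq 0 ]
   then
         wait   # Let the current batch of 50 finish before starting more
   fi
done < urls_list.dat
wait
# Note: the glob expands in lexical order (download_10 before download_2)
cat download_*.txt > consolidated.txt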

# 5  
Old 11-30-2012
Quote:
Originally Posted by bipinajith
Code:
while read -r URL
do
    wget -q -O - "$URL" >> download.txt # -O - writes the page to stdout, which is appended to download.txt
done < urls_list.dat                    # Read the list of URLs from urls_list.dat

Good use of while read. You can redirect the entire loop instead of reopening download.txt 1000 times though:
Code:
while read line
do
        wget ...
done > download.txt
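
To illustrate the difference, here is a minimal offline sketch. It assumes printf as a stand-in for the download command, and the temporary file is made up for the demonstration. Both variants produce the same content; the second opens the output file only once.

```shell
#!/bin/sh
# Sketch: appending inside the loop vs. redirecting the whole loop.
# printf is a stand-in for the real download command (an assumption).
out=$(mktemp)

# Variant 1: the output file is opened and closed on every iteration.
for n in 1 2 3
do
    printf 'page %s\n' "$n" >> "$out"
done
per_iteration=$(cat "$out")

# Variant 2: the output file is opened once, for the whole loop.
# Note that > truncates the file first, whereas >> keeps old content.
for n in 1 2 3
do
    printf 'page %s\n' "$n"
done > "$out"
whole_loop=$(cat "$out")

rm -f "$out"
```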

wget also has some features which make a loop unnecessary, though.

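wget is able to read a list of URLs with -i, and -O - makes it write every downloaded page to standard output instead of to separate files. The -nv option is also useful: it still reports each completed download (on standard error) without printing all the verbose output wget usually does.

Code:
wget -nv -i urls_list.dat -O - > download.txt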

This should be much faster than calling wget 1000 times since it is able to re-use the same connection if it's connecting to the same site. Concurrency may not be necessary ( and may not be desirable in many cases -- how fast is your connection? ) but if it is, I'd split the list into parts and use wget -i on those parts.
# 6  
Old 11-30-2012
Thanks. It is very fast, but each file is downloaded to the hard disk separately, which puts a very high load on the server. I want the downloads to go into a single file only.



Quote:
Originally Posted by bipinajith
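This downloads 50 URLs at a time; you can adjust the batch size to suit your requirements:
Code:
seq=0
while read -r URL
do
   # -O - writes the page to stdout; each background job gets its own file
   wget -q -O - "$URL" > "download_${seq}.txt" &
   seq=$( expr $seq + 1 )
   mod=$( expr $seq % 50 )
   if [ $mod -eq 0 ]
   then
         wait   # Let the current batch of 50 finish before starting more
   fi
done < urls_list.dat
wait
# Note: the glob expands in lexical order (download_10 before download_2)
cat download_*.txt > consolidated.txt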

# 7  
Old 11-30-2012
Since they're in the background, they have to be saved to independent files. It'd be almost impossible to guarantee the order of the output if they weren't.
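
A minimal offline sketch of that point: the fetch function below is a hypothetical stand-in for a real download command such as wget -q -O -, and the sleep times are made up. The job started first finishes last, yet concatenating the per-job files by name still gives a fixed order.

```shell
#!/bin/sh
# Sketch: each background job writes to its own numbered file, so the
# final order is set by the file names, not by which job finishes first.
# fetch is a hypothetical stand-in for a real download command.
fetch() {
    sleep "$2"                   # simulate a slow or fast transfer
    printf 'body of %s\n' "$1"
}

tmpdir=$(mktemp -d)
fetch first  1 > "$tmpdir/001.out" &   # slowest job, started first
fetch second 0 > "$tmpdir/002.out" &
fetch third  0 > "$tmpdir/003.out" &
wait                                   # wait for all three jobs

# The glob expands in name order, so the pieces come out in list order.
combined=$(cat "$tmpdir"/*.out)
printf '%s\n' "$combined"
rm -r "$tmpdir"
```

Had all three jobs written to one shared file, the order of the bodies would depend on which job finished first.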

I'd try splitting the file into many chunks for wget -i to handle independently. This will allow them to be concurrent without such an overwhelming number of files.

Code:
#!/bin/sh

# Number of parallel wget processes, 10 by default
MAXPROC=${2:-10}
# Count lines first
TOTAL=$(wc -l < "$1")
# Lines per chunk, rounded up so there are at most MAXPROC chunks
# (POSIX arithmetic expansion; "let" is not available in every sh)
PER_CHUNK=$(( (TOTAL + MAXPROC - 1) / MAXPROC ))

# Split file into chunks xaa, xab, ...
split -l "$PER_CHUNK" < "$1"

# Loop over xaa, xab, ...
for FILE in x*
do
        # Download one set of files from $FILE into $FILE.out in background
        wget -nv -i "$FILE" -O - > "$FILE.out" 2> "$FILE.err" &
done

wait    # Wait for all processes to finish

# Assemble files in order
cat x*.out
cat x*.err >&2
# Remove temporary files (run this in a directory with no other x* files)
rm x*

Use it like
Code:
./multiget.sh filelist 5 2> errlog > output

for 5 simultaneous downloads.
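
The number of wget processes equals the number of chunks split produces, and that depends on how the lines-per-chunk figure is rounded. Here is a small offline sketch with made-up numbers (a 23-line list, 5 processes, and hypothetical page names on the placeholder host from this thread) comparing rounding down with rounding up:

```shell
#!/bin/sh
# Sketch: how the chunk size affects the number of chunks split creates.
# The 23-line list and the 5-process target are made-up numbers.
workdir=$(mktemp -d)

# Build a dummy URL list (hypothetical page names).
i=1
while [ "$i" -le 23 ]
do
    echo "http://www.domain.com/page$i.html"
    i=$((i + 1))
done > "$workdir/filelist"

total=23
procs=5
floor=$(( total / procs ))                # rounded down: 4 lines per chunk
ceil=$(( (total + procs - 1) / procs ))   # rounded up:   5 lines per chunk

split -l "$floor" "$workdir/filelist" "$workdir/floor."
split -l "$ceil"  "$workdir/filelist" "$workdir/ceil."

# Count the chunk files by expanding the glob into positional parameters.
set -- "$workdir"/floor.*
floor_chunks=$#
set -- "$workdir"/ceil.*
ceil_chunks=$#

echo "rounded down: $floor_chunks chunks, rounded up: $ceil_chunks chunks"
rm -r "$workdir"
```

With these numbers, rounding down leaves three lines over and creates a sixth chunk, so six processes would run instead of five; rounding up keeps it at five.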