Downloading jpgs from a gallery type website
# 1  
Old 04-29-2014

Can someone explain what this does step by step? I found this script on Stack Overflow and want to customize it for downloading jpg images from a website.

Code:
# get all pages 
curl 'http://domain.com/id/[1-151468]' -o '#1.html' 

# get all images 
grep -oh 'http://pics.domain.com/pics/original/.*jpg' *.html >urls.txt 

# download all images 
sort -u urls.txt | wget -i-

1. I think the first line downloads the pages of the domain with curl, but what does the '#1.html' mean?

2. In .*jpg, why is the * after the '.'? Also, what is this line trying to do? I tried adapting it for a different website, but I get the error grep: *.html: No such file or directory even though the first command downloads the html files just fine.

3. I think the third command just sorts the results, and then wget visits each jpg URL and downloads the images.

Moderator's Comments:
Code tags for code, please.

Last edited by Corona688; 04-29-2014 at 06:47 PM..
# 2  
Old 04-29-2014
1) From man curl:

Code:
       -o/--output <file>
              Write output to <file> instead of stdout. If you are using {} or
              []  to  fetch  multiple documents, you can use '#' followed by a
              number in the <file> specifier. That variable will  be  replaced
              with the current string for the URL being fetched. ...

So it replaces #1 with the number of the page in question.
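You can see the naming scheme without touching the network. This is just a dry run of what curl does with the range (domain.com is the placeholder from the original script):

```shell
# curl replaces #1 with the current value from the [1-151468] range,
# so page 1 is saved as 1.html, page 2 as 2.html, and so on.
for i in 1 2 3; do
  echo "would save http://domain.com/id/$i as ${i}.html"
done
```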

2) Because it's a regex, not a glob. In a regex, * means "zero or more of the previous character", and . means "any character". So .*jpg means "any string ending in jpg".
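Here's a small self-contained demonstration of what that grep is doing (sample.html is made up; pics.domain.com is just the placeholder from the thread):

```shell
# A tiny made-up page containing two image links:
cat > sample.html <<'EOF'
<a href="http://pics.domain.com/pics/original/0001.jpg">one</a>
<p>no image link on this line</p>
<a href="http://pics.domain.com/pics/original/0002.jpg">two</a>
EOF

# -o prints only the part of each line that matches,
# -h omits the filename prefix when grepping multiple files:
grep -oh 'http://pics.domain.com/pics/original/.*jpg' sample.html
```

One caveat: .* is greedy, so if a single line ever contains two image URLs the match will run from the first http all the way to the last jpg. A tighter pattern such as [^"']*\.jpg avoids that.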

3) Yes. sort -u sorts the URLs and removes duplicates, then wget -i - reads the resulting list from standard input and downloads each one. The ordering itself doesn't matter much here; removing the duplicates is the useful part.
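To see the sort -u half in isolation (the urls.txt contents here are made up for the demo):

```shell
# A made-up urls.txt with a duplicate entry:
printf '%s\n' \
  'http://pics.domain.com/pics/original/2.jpg' \
  'http://pics.domain.com/pics/original/1.jpg' \
  'http://pics.domain.com/pics/original/2.jpg' > urls.txt

# Prints each URL once, in sorted order:
sort -u urls.txt
# In the real script this list is piped straight to wget:
#   sort -u urls.txt | wget -i -
# where -i - tells wget to read its URL list from standard input.
```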
# 3  
Old 04-29-2014
I am having the most trouble with step two I think.

I'm assuming -oh is the combined options -o and -h?

Step 1 downloads the files fine, but they're stored in my directory as 1.html., 2.html., etc., with an extra dot appended after the extension. I'm not sure if that's the cause, but Step 2 doesn't seem to find any .html files,

and so Step 3 fails because there is no urls.txt. What could be the problem?

---------- Post updated at 05:24 PM ---------- Previous update was at 04:55 PM ----------

I think I might have found a different problem actually.

Running this in terminal works fine
Code:
wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off www.url.example

but when I put this in my bash script and run it, I get awaiting response... 404 Not Found. For some reason a %0D gets appended to the end of the jpg URL, which I think makes wget request the wrong URL.

I've been trying a different approach from my earlier one since I couldn't get that working. What could be the problem now, so that I can automate the downloading?

Last edited by workisnotfun; 04-29-2014 at 08:13 PM..
# 4  
Old 04-29-2014
%0D is a carriage return, which has probably been appended to your script by Windows. Did you edit this file on Windows and transfer it to Unix?

Running dos2unix filename on the Unix side should remove these extra characters.
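If dos2unix isn't available, tr can strip the carriage returns just as well. A sketch (script.sh is a placeholder name for the affected file):

```shell
# Simulate a script line saved with Windows CRLF line endings:
printf 'wget http://www.url.example/pic.jpg\r\n' > script.sh

# Delete every carriage return, then replace the original file:
tr -d '\r' < script.sh > fixed.sh && mv fixed.sh script.sh

cat script.sh   # the trailing \r (the %0D in the URL) is gone
```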