Using your image_search_download.py for Flickr

Question

cygcsh opened this issue 6 years ago

Hi Master IFinners,
First, please forgive me for using your code to try to download images from Flickr.
I thought it would work, but it doesn't: my code finds no images. Would you please look at my problem?
I would also like to express my great appreciation for all your solutions to "Automate the Boring Stuff with Python". They have helped me a lot. Thank you very much.
Sincerely,
cygcsh

Please see the attached file. I think soup.select() is not finding any images for image_elem.
Flicker_image_downloader.txt

Ian Findlay · Answer 1 · Tue Oct 16 2018 07:16:20 GMT+0800 (China Standard Time)

Hi, thanks for reaching out. I'm glad my solutions are helping you.

These were written a long time ago, so they may simply be broken at this point. Scanning your code, I can see two potential problems:

I don't think Flickr allows you to search by extension, and if it does, I'm pretty sure it isn't by adding ' ext:' to the URL. Try removing all the bits that mention extensions so that the line is simply:

url = 'https://www.flickr.com/search/?text=' + search

When using 'Inspect Element' to get information for BeautifulSoup, a class name that appears to contain spaces is really several separate classes on the same element. In a selector they are joined with dots, so it should actually be something like this:

image_elem = soup.select('.view.photo-list-photo-view.awake')
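
A minimal sketch of both suggestions, assuming BeautifulSoup 4 is installed. The HTML snippet below is made up for illustration and is not Flickr's real markup, and build_search_url is a hypothetical helper name, not something from the original script.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup


def build_search_url(search):
    """Build the search URL without any ' ext:' part (hypothetical helper)."""
    # urlencode escapes spaces and other special characters in the search term.
    return 'https://www.flickr.com/search/?' + urlencode({'text': search})


# Made-up markup for illustration only; the real page may look different.
html = '''
<div class="view photo-list-photo-view awake"></div>
<div class="view photo-list-photo-view awake"></div>
<div class="view something-else"></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Dots join the three classes into one selector for a single element.
with_dots = soup.select('.view.photo-list-photo-view.awake')

# With spaces, the selector instead looks for nested <photo-list-photo-view>
# and <awake> tags, which do not exist in this snippet.
with_spaces = soup.select('.view photo-list-photo-view awake')

print(build_search_url('robot arm'))
print(len(with_dots), len(with_spaces))
```

Encoding the search term is an addition of mine; plain concatenation as in the line above also works for single-word searches such as 'robot'.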

No idea if this will get it working but hopefully it can get you a bit closer. I'll keep this issue open so feel free to get back to me about it if you're still struggling. If you have questions about any of the other AtBS projects, please try to make sure they are raised in the correct repository.

cygcsh · Answer 2 · Tue Oct 16 2018 10:12:34 GMT+0800 (China Standard Time)

1. Thank you for your response. I followed your instructions, and this was the result:

Enter desired search term(s): robot
Traceback (most recent call last):
  File "C:/Users/Sc698/Desktop/Python project/Python automate/image_search_download1.py", line 28, in <module>
    downloaded = image_downloader()
  File "C:/Users/Sc698/Desktop/Python project/Python automate/image_search_download1.py", line 17, in image_downloader
    image_url_s = 'https://www.flickr.com' + image_elem[i].get('url')
TypeError: must be str, not NoneType

2. Then I made a modification:
image_elem = soup.select('.view.photo-list-photo-view.awake background-image')

3. It still found no images, and this was the result:
Enter desired search term(s): robot
No images found.

Process finished with exit code 0

I hope this question won't bother you too much or waste your time. Thanks.
Sincerely,
cygcsh

Flicker_image_downloader1.txt

Ian Findlay · Answer 3 · Wed Oct 17 2018 01:15:45 GMT+0800 (China Standard Time)

At the bottom of this message, there is a link to a Gist with a fixed version of your code and comments explaining what I have done. However, before reading that, here are some pointers so that you might get there on your own:

Think about the TypeError and what it means that it happened:

Clearly, the 'https...' bit is a string so that means that the image_elem[i].get('url') portion must be NoneType.
The image_elem[i] must exist or the for loop wouldn't have been entered, so you know that BS4 was able to find some elements by selecting '.view.photo-list-photo-view.awake'.
When you changed this to '.view.photo-list-photo-view.awake background-image' you were back to finding no images - meaning that the soup.select() was no longer finding anything. Is this a step forward or back?
What made you want to access the background-image section of the class and, more importantly, is it something you can access individually or part of a larger section of the class?
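
The reasoning above can be tried out in a small sketch. Everything about the markup here is an assumption: the attrs dict stands in for the attributes of one matched element (BeautifulSoup's Tag.get() likewise returns None when an attribute is missing), the style value and the example.invalid URL are invented, and background_image_url is a hypothetical helper.

```python
import re

# Invented attributes of one matched element; real Flickr markup may differ.
attrs = {
    'style': 'width: 240px; background-image: url(//example.invalid/1234/555_abc123_m.jpg);',
}

# There is no 'url' attribute, so .get() returns None rather than a string.
url_attr = attrs.get('url')

try:
    'https://www.flickr.com' + url_attr
    raised_type_error = False
except TypeError:
    # Concatenating str and None is what produced the traceback.
    raised_type_error = True


def background_image_url(style):
    """Pull the url(...) value out of a style string, or return None."""
    match = re.search(r'background-image:\s*url\(([^)]+)\)', style or '')
    if match is None:
        return None
    url = match.group(1).strip('\'" ')
    # Protocol-relative URLs start with '//' and need a scheme added.
    return 'https:' + url if url.startswith('//') else url


image_url = background_image_url(attrs.get('style'))
print(url_attr, raised_type_error, image_url)
```

In this sketch the background-image value is only reachable by reading the whole style string and picking the URL out of it, which is one possible answer to the last question above.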

The image URL:

If you manually click through to the link under background-image in the class, what does the URL of the image actually look like?
Can you just slice away the last 6 characters of every single one of those URLs without breaking them?
Adding '_b' would download the large version of the images (I do this in the Gist as well) but what about those that don't have a large size? Once you get it working, checking that the '_b' file download worked and if not trying again with a slightly different URL could be a nice addition.
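
As a rough sketch of the retry idea, assuming image URLs end either in a one-letter size suffix such as '_m' before the extension or in no suffix at all. That URL shape, the example.invalid addresses, and the names with_size and download_with_fallback are all assumptions for illustration; fetch is a hypothetical callable that returns the image bytes or None when the download fails.

```python
import re

# Optional one-letter size suffix (e.g. '_m') followed by the file extension.
SIZE_SUFFIX = re.compile(r'(?:_[a-z])?(\.[A-Za-z0-9]+)$')


def with_size(url, size):
    """Return url with its size suffix replaced by size ('' means no suffix)."""
    suffix = '_' + size if size else ''
    return SIZE_SUFFIX.sub(lambda m: suffix + m.group(1), url, count=1)


def download_with_fallback(url, fetch, sizes=('b', '')):
    """Try each size in turn and return (url, data) for the first that works."""
    for size in sizes:
        candidate = with_size(url, size)
        data = fetch(candidate)
        if data is not None:
            return candidate, data
    return None, None


suffixed = 'https://example.invalid/1234/555_abc123_m.jpg'
plain = 'https://example.invalid/1234/555_abc123.jpg'

# Slicing six characters only works when the '_m' suffix is present.
print(suffixed[:-6])  # ends in '555_abc123'
print(plain[:-6])     # ends in '555_abc1', which cuts into the name

# Fake downloader for the demo: only the unsuffixed file "exists".
available = {plain: b'image-bytes'}
found_url, found_data = download_with_fallback(suffixed, available.get)
print(found_url)
```

With requests, fetch could wrap requests.get() and return response.content only when the status code indicates success.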

Finally, you've picked a challenging site to download images from. The way Imgur worked when I did this meant that I could download all of the files. With Flickr, there are ~25 files available to download, as you will see when you fix your code or use mine. The rest are essentially hidden from BeautifulSoup, even though they appear on the screen, because of how the site uses JavaScript. You can see this when viewing the source code manually. You could get all the images, but it would be more complicated and would probably involve automation software like Selenium (which AtBS does have a chapter on).

Here is the fixed version, best of luck.

cygcsh · Answer 4 · Wed Oct 17 2018 11:31:18 GMT+0800 (China Standard Time)

Thanks a lot, Master IFinners. It works very well, and I can finally get a normal night's sleep.
To be honest, I am not a professional programmer, just a layman and a businessman.
This is the first time in my life that I have learned to code. I hope that learning from AtBS will really help me save time on file management tasks. I will dive into Ch12 of AtBS soon. Maybe I will write some "practical code" related to my work, and I sincerely hope to get more of your guidance and help then. Thanks.
Sincerely,
cyg.csh

Ian Findlay · Answer 5 · Wed Oct 17 2018 17:51:56 GMT+0800 (China Standard Time)

No problem, glad to have been of some help. I'll close this issue, but in the future feel free to open another over at the AtBS repository.