How to Extract Houzz Product Images with Python & BeautifulSoup

by Emma Dyer
Posted: Aug 21, 2021

Introduction

In this post, we will look at how to extract Houzz product image data with Python and BeautifulSoup. The aim is to start solving a real-world problem while keeping things simple, so that you get comfortable with the tools and see practical results right away.

First, make sure that Python 3 is installed. If it is not, install it before continuing.

Then, install BeautifulSoup with:

pip3 install beautifulsoup4

We will also need the Requests library to fetch the page, lxml to parse the HTML, and soupsieve to use CSS selectors. Install them with:

pip3 install requests soupsieve lxml
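Before going further, it can help to confirm that the packages were actually installed into the Python 3 environment you plan to use. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_versions is hypothetical and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report the four packages this tutorial relies on.
required = ["beautifulsoup4", "requests", "soupsieve", "lxml"]
for name, ver in installed_versions(required).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, rerun the matching pip3 command before continuing.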

Once they are installed, open an editor and type:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Next, let's visit a Houzz listing page and look at the data we can scrape from it.

Now let's write the code to fetch the page. We send a browser-like User-Agent header so that the request looks like it comes from a regular browser:

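# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Pretend to be a desktop Safari browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.houzz.in/photos/kitchen-design-ideas-phbr0-bp~t_26043'

# Download the page and parse it with the lxml parser
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Print the parsed HTML so we can see what was fetched
print(soup.prettify())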

Save this as scrapeHouzz.py.

Then run it:

python3 scrapeHouzz.py

You will see the HTML of the whole page.
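The script above parses whatever the server returns, even if the request was blocked or failed. As a hedged sketch, the fetch step could be wrapped in a small function that sets a timeout and raises on HTTP error codes. The function name fetch_soup and its parser and timeout parameters are assumptions made for illustration, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, headers, parser="lxml", timeout=10):
    """Download a page and return it as a BeautifulSoup object.

    Raises requests.HTTPError for 4xx/5xx responses instead of
    silently parsing an error page.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return BeautifulSoup(response.content, parser)
```

It could then be called as soup = fetch_soup(url, headers) with the url and headers values defined earlier.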

Next, let's use CSS selectors to get the data we need. To find the right selectors, open the page in Chrome and use the Inspect tool.

We can see that each product's data is contained in a div with the class 'hz-space-card'. We can select these with the CSS selector '.hz-space-card', and the code looks like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Pretend to be a desktop Safari browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.houzz.in/photos/kitchen-design-ideas-phbr0-bp~t_26043'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Each product card is a div with the class hz-space-card
for item in soup.select('.hz-space-card'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        # Skip cards that cannot be printed; uncomment the next line to debug
        # raise e
        pass

This prints the content of each container that holds the product data.
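To see how the class selector behaves without hitting the live site, here is a small sketch that runs on a made-up HTML fragment. The sample markup is an assumption for illustration only and is much simpler than the real Houzz page; the built-in html.parser is used here instead of lxml so the snippet has fewer dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two cards and one unrelated div
SAMPLE = (
    '<div class="hz-space-card featured">A</div>'
    '<div class="other">B</div>'
    '<div class="hz-space-card">C</div>'
)

soup = BeautifulSoup(SAMPLE, "html.parser")

# '.hz-space-card' matches any element whose class list contains
# hz-space-card, even when other classes are present as well
cards = soup.select(".hz-space-card")
texts = [card.get_text(strip=True) for card in cards]
print(len(cards), texts)
```

Note that select() returns a plain list, so an empty result simply means the loop body never runs rather than raising an error.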

Next, let's pick out the classes inside each card that hold the data we need. The title is inside the class hz-space-card__photo-title, the image is in hz-image, and so on. From these we can extract the titles, user names, images, and links.
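A rough sketch of that extraction step is shown here on a made-up HTML fragment, so it can run offline. The class names hz-space-card, hz-space-card__photo-title, and hz-image come from the text above; everything else is an assumption: the sample markup, the function name extract_cards, reading the image URL from the src attribute, and taking the first a[href] in a card as its link. User names are left out because the class that holds them is not given here. The built-in html.parser stands in for lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the classes named in the article
SAMPLE_HTML = """
<div class="hz-space-card">
  <a href="https://example.com/photo/1">
    <img class="hz-image" src="https://example.com/img/1.jpg">
  </a>
  <span class="hz-space-card__photo-title">Modern kitchen</span>
</div>
<div class="hz-space-card">
  <span class="hz-space-card__photo-title">Card without image</span>
</div>
"""


def extract_cards(html, parser="html.parser"):
    """Return one dict per card with its title, image URL, and link."""
    soup = BeautifulSoup(html, parser)
    results = []
    for card in soup.select(".hz-space-card"):
        title = card.select_one(".hz-space-card__photo-title")
        image = card.select_one(".hz-image")
        link = card.select_one("a[href]")  # assumption: first link in the card
        results.append({
            # Missing elements become None instead of raising AttributeError
            "title": title.get_text(strip=True) if title else None,
            "image": image.get("src") if image else None,
            "link": link.get("href") if link else None,
        })
    return results


for row in extract_cards(SAMPLE_HTML):
    print(row)
```

On the live page the image URL may sit in a different attribute or element than src on hz-image, so check the markup in the Inspect tool and adjust the selectors accordingly.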

source code: https://www.retailgators.com/how-to-extract-houzz-product-images-with-python-and-beautifulsoup.php

About the Author

ECommerce Web Scraping Tools & Services | Retailgators USA, UK, Australia, UAE, Germany. https://www.retailgators.com/index.
