How to Extract Houzz Product Images with Python & BeautifulSoup
Posted: Aug 21, 2021
Here, we will look at how to extract Houzz product image data with Python and BeautifulSoup. The aim of this blog is to start solving a real-world problem while keeping things simple, so that you get comfortable and see practical results right away.
First, make sure that you have Python 3 installed. If you don't, install Python 3 before continuing.
Then, install BeautifulSoup with:
pip3 install beautifulsoup4
We will also need the requests library to fetch the page, lxml to parse the HTML, and soupsieve to use CSS selectors. Install them with:
pip3 install requests soupsieve lxml
Once they are installed, open an editor and type:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
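Before going further, it can help to confirm that the packages installed above can actually be imported. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the original tutorial. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages installed above (beautifulsoup4 imports as bs4).
REQUIRED = ["bs4", "requests", "soupsieve", "lxml"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, rerun the matching pip3 install command.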
Next, let's extend the code to fetch the page while pretending to be a browser, by sending a browser User-Agent header with the request:
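# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent so the request looks like it comes from Safari
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.houzz.in/photos/kitchen-design-ideas-phbr0-bp~t_26043'

# Download the page and parse it with the lxml parser
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Print the parsed HTML so the whole page is visible when the script runs
print(soup.prettify())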
Save the script as scrapeHouzz.py and run it:
python3 scrapeHouzz.py
You will see the whole HTML page.
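The fetch step above does not check whether the request actually succeeded. The following is a minimal sketch of a more defensive version, not the original author's code. The helper name `fetch_html`, the optional `session` parameter, and the 10 second timeout are assumptions added for illustration; `raise_for_status()` is the standard requests call that raises an exception on 4xx and 5xx responses.

```python
import requests

# Same browser-like User-Agent string used in the tutorial code
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) '
                  'AppleWebKit/601.3.9 (KHTML, like Gecko) '
                  'Version/9.0.2 Safari/601.3.9'
}


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its body as bytes, raising on HTTP errors.

    `session` is optional: pass a requests.Session to reuse connections,
    or leave it as None to use the module-level requests.get.
    """
    client = session if session is not None else requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content


# Example call (commented out so nothing is downloaded on import):
# html = fetch_html('https://www.houzz.in/photos/kitchen-design-ideas-phbr0-bp~t_26043')
```

The returned bytes can be passed to BeautifulSoup in the same way as response.content. A timeout is worth setting because requests otherwise waits indefinitely for a server that never answers.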
After that, let's use CSS selectors to get the data we need. To find the right selectors, open the page in Chrome and use the Inspect tool.
Inspecting the page shows that each product's data is contained in a div with the class 'hz-space-card'. So we can extract these with the CSS selector '.hz-space-card', and the code looks like this:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.houzz.in/photos/kitchen-design-ideas-phbr0-bp~t_26043'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Each product card is a div with the class hz-space-card
for item in soup.select('.hz-space-card'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        # Report the problem and move on to the next card
        print('Skipping card:', e)
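To see what `select('.hz-space-card')` matches without requesting the live site, the same call can be tried on a small hand-written fragment. This is only a sketch: the markup below is hypothetical and far simpler than the real Houzz page, and it uses Python's built-in "html.parser" instead of lxml so the example has one less dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page is more complex.
SAMPLE_HTML = """
<div class="hz-space-card"><span>Card one</span></div>
<div class="hz-space-card other-class"><span>Card two</span></div>
<div class="unrelated"><span>Not a card</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A class selector matches every element that carries the class,
# even when the element has other classes as well.
cards = soup.select(".hz-space-card")

for card in cards:
    print(card.get_text(strip=True))
```

The div with the class "unrelated" is skipped, while the second card is matched even though it carries an extra class.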
After that, let's pick the classes inside each card that hold the data we need. The title is inside the class hz-space-card__photo-title, the image is in hz-image, and so on. With these classes, we can try to get the titles, user names, images, and links for each card.
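As a rough sketch of that step, the title and image can be pulled from each card with `select_one`. Only the class names hz-space-card, hz-space-card__photo-title, and hz-image come from the article; the sample markup, the `extract_cards` helper, and the assumption that the image URL sits in the `src` attribute are illustrative and may not match the live page. The classes for user names and links are not given above, so they are left out here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="hz-space-card">
  <img class="hz-image" src="https://example.com/kitchen-1.jpg">
  <a class="hz-space-card__photo-title" href="/photos/kitchen-1">Modern Kitchen</a>
</div>
<div class="hz-space-card">
  <p class="hz-space-card__photo-title">Card without an image</p>
</div>
"""


def extract_cards(html):
    """Return one dict per card with its title and image URL (or None)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select(".hz-space-card"):
        # select_one returns None when nothing matches, so guard each lookup
        title_tag = card.select_one(".hz-space-card__photo-title")
        image_tag = card.select_one(".hz-image")
        results.append({
            "title": title_tag.get_text(strip=True) if title_tag else None,
            "image": image_tag.get("src") if image_tag else None,
        })
    return results


for row in extract_cards(SAMPLE_HTML):
    print(row)
```

Guarding against missing elements matters on real pages, where some cards may lack a title or an image and an unguarded lookup would raise an AttributeError.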
source code: https://www.retailgators.com/how-to-extract-houzz-product-images-with-python-and-beautifulsoup.php
About the Author
ECommerce Web Scraping Tools & Services | Retailgators USA, UK, Australia, UAE, Germany. https://www.retailgators.com/index.