
How to Scrape E-commerce Sites to Compare Pricing Using Python — Part 1

by Retail Gator
Posted: Jul 17, 2021
Introduction

We have frequently been told that of Malaysia’s two big e-commerce platforms (Shopee and Lazada), one is usually cheaper and attracts the deal hunters, while the other serves less price-sensitive customers.

So, we decided to find out for ourselves… in the battle of these e-commerce platforms!

For that, we have written a Python script that uses Selenium and the Chrome driver to automate the scraping procedure and build a dataset. Here, we will be extracting the following:

  • Product’s Name
  • Product’s Price

Then we will do some basic analysis on the extracted dataset with Pandas. Some data cleaning will be needed, and in the end we will present the price comparison in a simple visual chart built with Seaborn and Matplotlib.
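As a preview of that last step, here is a minimal sketch of the kind of chart we are aiming for. The dataframe below uses made-up sample numbers and placeholder column names, not real scraped data:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Made-up sample data; column names are placeholders, not the final schema
    df = pd.DataFrame({
        'platform': ['Lazada', 'Lazada', 'Shopee', 'Shopee'],
        'price': [55.90, 57.50, 52.00, 54.30],
    })

    # Compare the price distributions of the two platforms side by side
    sns.boxplot(x='platform', y='price', data=df)
    plt.title('Price comparison (sample data)')
    plt.show()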

Between these two platforms, we have found Shopee the harder one to scrape, for two reasons: (1) it has frustrating popup boxes that appear on entering its pages; and (2) its page elements are not consistently classed (a few elements carry different class names).
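For example, once we get to Shopee in Part 2, a popup like that will have to be closed before anything else can be clicked. A rough sketch of the usual pattern with Selenium’s explicit waits (the 'popup-close-button' selector is an invented placeholder, not Shopee’s actual markup):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    def dismiss_popup(browser, selector='.popup-close-button'):  # placeholder selector
        """Click the popup's close button if it shows up; otherwise carry on."""
        try:
            close_button = WebDriverWait(browser, 5).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
            )
            close_button.click()
        except TimeoutException:
            pass  # no popup appeared within 5 seconds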

For that reason, we will start by scraping Lazada first, and tackle Shopee in Part 2!

First, we import the required packages:

    # Web scraping
    from selenium import webdriver
    from selenium.common.exceptions import *

    # Data manipulation
    import pandas as pd

    # Visualization
    import matplotlib.pyplot as plt
    import seaborn as sns
Next, we set up the global variables, which are:

  1. Path to the Chrome web driver
  2. Website URL
  3. Item we wish to search for

    webdriver_path = 'C://Users//me//chromedriver.exe'  # Enter the file directory of the Chromedriver
    Lazada_url = 'https://www.lazada.com.my'
    search_item = 'Nescafe Gold refill 170g'  # Chose this because I often search for coffee!

After that, we launch the Chrome browser. We do it with a few customized options:

    # Select custom Chrome options
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')

    # Open the Chrome browser
    browser = webdriver.Chrome(webdriver_path, options=options)
    browser.get(Lazada_url)
Let’s go through some of these options. The '--headless' argument lets the script run with the browser working in the background. At first, we would actually suggest leaving this argument out of the Chrome options, so that you can watch the automation unfold and spot bugs more easily. The downside is that it is less efficient.
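One convenient pattern (our suggestion here, not part of the original script) is to gate the flag behind a single variable, so it can be flipped on once debugging is finished:

    HEADLESS = False  # flip to True once the script runs cleanly

    options = webdriver.ChromeOptions()
    if HEADLESS:
        options.add_argument('--headless')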

The other arguments, 'disable-infobars', 'start-maximized', and '--disable-extensions', are included to ensure smoother operation of the browser (extensions in particular can interfere with webpages and disrupt the automation procedure).

Running the short code block above will open your browser.

Once the browser is open, we need to automate the item search. Selenium lets you locate HTML elements by a number of techniques, including class, id, CSS selectors, and XPath (the XML path expression).
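To make those options concrete, here is a quick sketch of the same search bar being located in four different ways. The class name below is an invented placeholder, while the id 'q' is the one we actually use on Lazada:

    # Four ways to locate the same (hypothetical) element:
    by_id = browser.find_element_by_id('q')
    by_class = browser.find_element_by_class_name('search-box__input')  # placeholder class name
    by_css = browser.find_element_by_css_selector('input#q')
    by_xpath = browser.find_element_by_xpath('//input[@id="q"]')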

So how do you recognize which elements to target? An easy way of doing this is with Chrome’s inspect tool:

    # Locate the search bar by its id, type in the search term, and submit
    search_bar = browser.find_element_by_id('q')
    search_bar.send_keys(search_item)
    search_bar.submit()
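After submitting the search, it is worth giving the results page a moment to render before reading anything from it. A minimal sketch with an explicit wait (the 'product-card' class name is an assumed placeholder; inspect the live page for the real one):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Wait up to 10 seconds for at least one result to appear;
    # 'product-card' is a placeholder class, not Lazada's actual markup
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'product-card'))
    )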
About the Author

ECommerce Web Scraping Tools & Services | Retailgators https://www.retailgators.com/index.
