How to Scrape E-commerce Sites Using Web Scraping to Compare Pricing Using Python — Part 1

Author: Retail Gator
Introduction

We have often been told that, of Malaysia's two big e-commerce platforms (Shopee and Lazada), one is normally cheaper and attracts deal hunters, whereas the other caters to less price-sensitive shoppers.

So we decided to find out for ourselves… in the battle of these e-commerce platforms!

To do that, we wrote a Python script that uses Selenium and ChromeDriver to automate the scraping and build a dataset. For each listing, we will extract the following:

  • Product’s Name
  • Product’s Price

Then we will do some basic analysis with Pandas on the dataset we have extracted. Some data cleaning will be needed, and at the end we will present the price comparison in a simple visual chart with Seaborn and Matplotlib.

Of the two platforms, we found Shopee harder to scrape for two reasons: (1) it has frustrating popup boxes that appear when you enter its pages; and (2) its website class elements are not well defined (a few elements have different classes).

For that reason, we will start by scraping Lazada. We will tackle Shopee in Part 2!

Initially, we import the required packages:
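Before importing, it can help to confirm that the packages named above are installed. The following is a minimal sketch, not the script's own import block; the `REQUIRED` list and the `missing` variable are names assumed here for illustration.

```python
import importlib.util

# Top-level import names of the packages mentioned in the introduction.
REQUIRED = ["selenium", "pandas", "seaborn", "matplotlib"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]

if missing:
    print("Install these first: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Once nothing is reported missing, the conventional imports are `import pandas as pd`, `import seaborn as sns`, and `import matplotlib.pyplot as plt`, with `from selenium import webdriver` for the browser automation. The aliases `sns` and `plt` are the ones the plotting code in this article uses.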

Amazing! However, we still need to do some additional cleaning. You may have noticed something odd in the dataset: one of the items is actually a twin pack, which we need to remove.

Data cleaning is important for any kind of data analysis. Here, we remove the entries we don’t need with the following code:

# This removes any entry with 'x2' in its title
dfL = dfL[dfL['ItemName'].str.contains('x2') == False]

Though not required here, you can also check that the remaining items are exactly the ones we searched for. At times, related products appear in the search results, particularly if the search terms aren’t precise enough.

For instance, if we had searched for ‘nescafe gold refill’ rather than ‘nescafe gold refill 170g’, the search would have returned 117 items rather than only the 9 we scraped earlier. The extra items aren’t the refill packs we were looking for, but capsule filtering cups instead.

Nevertheless, it doesn’t hurt to filter the dataset again by the search terms:

dfL = dfL[dfL['ItemName'].str.contains('170g') == True]
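To see what the two filters above do, here is a small self-contained sketch. The rows are made-up placeholder data, not scraped results; only the `ItemName` and `Price` column names and the `dfL` variable come from the article.

```python
import pandas as pd

# Made-up rows for illustration only (not real scraped listings).
dfL = pd.DataFrame({
    "ItemName": [
        "Nescafe Gold Refill 170g",
        "Nescafe Gold Refill 170g x2",
        "Nescafe Gold Capsule Filter Cup",
    ],
    "Price": [27.50, 52.00, 15.90],
})

# Drop twin packs: keep rows whose title does NOT contain 'x2'.
dfL = dfL[dfL["ItemName"].str.contains("x2") == False]

# Keep only rows whose title contains the pack size we searched for.
dfL = dfL[dfL["ItemName"].str.contains("170g") == True]

print(dfL)
```

Note that `str.contains` is case-sensitive and treats its argument as a regular expression by default, so a title written as 'X2' or '170G' would slip through. Passing `case=False` is one way to handle that.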

Lastly, we create a column called ‘Platform’ and assign ‘Lazada’ to every entry. This lets us group the entries by platform (Shopee and Lazada) when we later compare prices between the two.

dfL['Platform'] = 'Lazada'
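Here is a minimal sketch of why the ‘Platform’ column is useful later. The Shopee frame `dfS` and all prices below are hypothetical placeholders (Shopee is only scraped in Part 2). The point is only that a shared column lets Pandas group rows by platform once the two datasets are combined.

```python
import pandas as pd

# Placeholder prices for illustration only.
dfL = pd.DataFrame({"ItemName": ["Refill 170g A", "Refill 170g B"],
                    "Price": [27.0, 28.0]})
dfS = pd.DataFrame({"ItemName": ["Refill 170g C", "Refill 170g D"],
                    "Price": [25.0, 29.0]})  # hypothetical Shopee data

# Tag each dataset with its source platform.
dfL["Platform"] = "Lazada"
dfS["Platform"] = "Shopee"

# Stack the two datasets and compare the median price per platform.
df = pd.concat([dfL, dfS], ignore_index=True)
median_by_platform = df.groupby("Platform")["Price"].median()
print(median_by_platform)
```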

Hurrah! Finally, our dataset is ready and clean!

Now we visualize the data with Seaborn and Matplotlib. We will use a box plot because it shows the following main statistical features (known as the five-number summary) in a single chart:

  • Highest price
  • Lowest price
  • Median price
  • 25th and 75th percentile prices
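The same five numbers can be computed directly with Pandas before plotting. This is a sketch on made-up prices, not the scraped dataset; `prices` and `summary` are names assumed for illustration.

```python
import pandas as pd

# Made-up prices in RM, for illustration only.
prices = pd.Series([21.0, 25.0, 27.0, 27.5, 28.0], name="Price")

# Minimum, 25th percentile, median, 75th percentile, maximum.
summary = prices.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(summary)
```

`prices.describe()` reports the same values along with the count, mean, and standard deviation.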

# Plot the chart
sns.set()
_ = sns.boxplot(x='Platform', y='Price', data=dfL)
_ = plt.title('Comparison of Nescafe Gold Refill 170g prices between e-commerce platforms in Malaysia')
_ = plt.ylabel('Price (RM)')
_ = plt.xlabel('E-commerce Platform')

# Show the plot
plt.show()

Each box represents a platform, and the y-axis shows the price range. For now we get only one box, because we haven’t yet scraped and analyzed any data from Shopee.

We can see that item prices range from RM21 to RM28, with the median between RM27 and RM28. The box also has short ‘whiskers’, indicating that prices are relatively consistent, without any significant outliers. To learn more about reading box plots, go through this great summary!
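A note on how whiskers and outliers are determined: by default, Matplotlib (and therefore Seaborn) extends each whisker to the farthest data point within 1.5 times the interquartile range (IQR) of the box, and draws anything beyond that as an individual outlier point. The sketch below applies that rule by hand to made-up prices; the variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up prices in RM, for illustration only; 45.0 is a deliberate outlier.
prices = pd.Series([24.0, 25.0, 27.0, 27.5, 28.0, 45.0])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Fences: points outside these limits are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = prices[(prices < lower_fence) | (prices > upper_fence)]
print(outliers.tolist())
```

Short whiskers with no points beyond them, as in the Lazada chart, mean every price falls inside these fences.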

Looking to scrape price data from e-commerce websites? Contact Retailgators for eCommerce Data Scraping Services.

source code: https://www.retailgators.com/how-to-scrape-e-commerce-sites-using-web-scraping-to-compare-pricing-using-python.php