How to Extract Amazon Reviews with Python Scrapy?

Author: Retail Gator
Posted: Jul 30, 2021
Introduction

We search online every day before buying something: to compare one product with another, to decide whether one is better than the other, and so on. We go straight to the reviews to see the star ratings and the positive feedback a product has received, right?

In this tutorial we will see how to extract Amazon reviews with Python Scrapy. We will save the data to an Excel spreadsheet or a CSV file. These are the data fields we will extract:

  1. Review’s Title
  2. Ratings
  3. Reviewer’s Name
  4. Review’s Description
  5. Review’s Content
  6. Helpful Counts
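
As a rough illustration of how these six fields could be held in code and saved to CSV, here is a minimal sketch that uses only the Python standard library. The class name, the field names, and the sample record are assumptions made for this example; they are not taken from the original source code.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Review:
    """One scraped review. Field names are illustrative assumptions."""
    title: str
    rating: float
    reviewer_name: str
    description: str
    content: str
    helpful_count: int


def write_reviews_csv(reviews, out):
    """Write Review records to an open text file object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Review)])
    writer.writeheader()
    for review in reviews:
        writer.writerow(asdict(review))


# Made-up sample record, written to an in-memory buffer instead of a real file.
sample = [Review("Great value", 5.0, "Alex", "Verified Purchase", "Works as described.", 12)]
buffer = io.StringIO()
write_reviews_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass a file object opened with open("reviews.csv", "w", newline="", encoding="utf-8") in place of the buffer.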

Then we will do some basic analysis with Pandas on the dataset we have extracted. Some data cleaning will be needed, and in the end we will present price comparisons on a simple visual chart with Seaborn and Matplotlib.

Between these two platforms, we found Shopee harder to extract data from, for two reasons: (1) it has frustrating popup boxes that appear when entering pages, and (2) its website class elements are not well defined (a few elements have different classes).

For this reason, we will start by extracting Lazada first. We will work with Shopee in Part 2!

First, we import the required packages:

    # Web scraping
    from selenium import webdriver
    from selenium.common.exceptions import *

    # Data manipulation
    import pandas as pd

    # Visualization
    import matplotlib.pyplot as plt
    import seaborn as sns

It’s time to get started.

We chose Scrapy, a Python framework for larger-scale data scraping. Along with it, a few other packages are needed to extract Amazon product reviews.

  • Requests: for sending HTTP requests to a URL
  • Pandas: for exporting the data to CSV
  • PyMySQL: for connecting to a MySQL server and storing the data there
  • Math: for mathematical operations

You can install these packages at any time with pip or conda, as shown below.

pip install scrapy

OR

conda install -c conda-forge scrapy

Let’s outline Start URL for Scraping Seller’s Links

Let’s see what extracting reviews for a product looks like. We take the URL https://www.amazon.com/dp/B07N9255CG as our example product page.

Next, we go to the product's review section. The names shown in the reviews may vary.
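
To get from the product URL to a starting URL for the review section in code, one option is to pull the product ID (ASIN) out of the URL and build the review URL from it. The sketch below is only an illustration: the helper names are hypothetical, and the /product-reviews/ path with a pageNumber query parameter is an assumed URL pattern that should be confirmed in the browser before relying on it.

```python
import re
from urllib.parse import urlencode

# Matches the 10-character product ID (ASIN) that follows "/dp/" in a product URL.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})")


def extract_asin(product_url):
    """Return the ASIN found in a product URL, or raise ValueError if there is none."""
    match = ASIN_PATTERN.search(product_url)
    if match is None:
        raise ValueError(f"No ASIN found in {product_url!r}")
    return match.group(1)


def reviews_url(asin, page=1):
    """Build a review-page URL. The path and query parameter are assumptions."""
    query = urlencode({"pageNumber": page})
    return f"https://www.amazon.com/product-reviews/{asin}?{query}"


asin = extract_asin("https://www.amazon.com/dp/B07N9255CG")
start_url = reviews_url(asin, page=1)
```

A spider could use start_url as its first request and call reviews_url again with a higher page number for the following pages.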

However, if you carefully inspect the requests made in the background while a page loads, and move back and forth between the next and previous review pages, you may notice that a POST request is sent whose response carries the content of the page.

Here, we look at the payload and headers needed for a successful response. If you have inspected the pages properly, you will see what changes when moving from one page to another and how that is reflected in the corresponding requests.
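
Since what changes between pages is the request payload, one possible way to prepare the requests is to work out how many pages there are and build one payload per page. The sketch below is only an assumption-laden illustration: the payload keys (asin, pageNumber, pageSize) and the page size of 10 are hypothetical placeholders, and the real names and values should be copied from the request seen in the browser's network tab.

```python
import math

REVIEWS_PER_PAGE = 10  # assumption: verify the page size against the real responses


def page_count(total_reviews, per_page=REVIEWS_PER_PAGE):
    """Number of pages needed to cover total_reviews, rounding up."""
    return math.ceil(total_reviews / per_page)


def build_payloads(asin, total_reviews, per_page=REVIEWS_PER_PAGE):
    """Build one payload dict per review page. The keys are hypothetical placeholders."""
    return [
        {"asin": asin, "pageNumber": page, "pageSize": per_page}
        for page in range(1, page_count(total_reviews, per_page) + 1)
    ]


# Example with a made-up total of 25 reviews, which needs three pages.
payloads = build_payloads("B07N9255CG", total_reviews=25)
```

Each payload would then be sent as the body of a POST request together with the headers observed in the browser.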

source code: https://www.retailgators.com/how-to-extract-amazon-reviews-with-python-scrapy.php

About the Author

ECommerce Web Scraping Tools & Services | Retailgators https://www.retailgators.com/index.
