How to Extract Coupon Details from the Walmart Store using LXML and Python?

Author: Retail Gator

This tutorial shows you how to scrape coupon details from Walmart.

We’ll scrape the following data from every coupon listed in the store:

  • Discounted Pricing
  • Category
  • Brand
  • Activation Date
  • Expiry Date
  • Product Description
  • URL

In the screenshot below, you can see how the data is extracted.

You could go further and extract coupons filtered by brand or other criteria, but for now we will keep it simple.

Finding the Data

Open any browser and choose a store URL, for example:

https://www.walmart.com/store/5941/washington-dc

Click the Coupons option on the left-hand side to see the list of all coupons offered at Walmart store 5941.

Right-click anywhere on the page and select Inspect Element. The browser opens its developer tools and displays the page's HTML content in a structured view. Open the Network panel and clear all existing requests from the request table.

Click on the request ?pid=19251&nid=10&zid=vz89&storezip=20001

Its Request URL is https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
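To see exactly which values the endpoint expects, you can split the Request URL into its query parameters. A minimal sketch using only Python's standard library (urllib.parse), applied to the example Request URL from this tutorial:

```python
from urllib.parse import parse_qs, urlparse

# Request URL copied from the Network panel (the example used in this tutorial)
request_url = "https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001"

# parse_qs maps each key to a list of values; keep the first value of each
query = parse_qs(urlparse(request_url).query)
params = {key: values[0] for key, values in query.items()}

print(params)  # {'pid': '19251', 'nid': '10', 'zid': 'vz89', 'storezip': '20001'}
```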

Next, find out where the values of the parameters nid, pid, and storezip come from by checking the page source of https://www.walmart.com/store/5941/washington-dc

There you can see that these values are assigned to the JavaScript variable _wml.config. You can take the variables from the page source and build the URL of the coupons endpoint: https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
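The lookup and URL-building step can be sketched as follows. The page fragment is a hypothetical, trimmed stand-in for the real page source, and its exact layout is an assumption; the regular expression mirrors the couponsData pattern used by the scraper in this tutorial. storezip is hard-coded to 20001, the ZIP from the example URL, which is also an assumption.

```python
import json
import re
from urllib.parse import urlencode

# Hypothetical, trimmed fragment of the page source. The real
# _wml.config object is far larger; this layout is only an assumption.
page_source = (
    '<script>window._wml.config = {"store": {"id": 5941}, '
    '"couponsData":{"pid":"19251","nid":"10","zid":"vz89"}};</script>'
)

# Non-greedy match stops at the first "}", so it only works while
# couponsData is a flat object without nested braces
match = re.search(r'"couponsData":(\{.*?\})', page_source)
coupon_meta = json.loads(match.group(1))

# storezip hard-coded from the example request URL (assumption)
coupons_url = "https://www.coupons.com/coupons/?" + urlencode({
    "pid": coupon_meta["pid"],
    "nid": coupon_meta["nid"],
    "zid": coupon_meta["zid"],
    "storezip": "20001",
})
print(coupons_url)
```

urlencode keeps the dictionary's insertion order, so the parameters appear in the same order as in the browser's request.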

Fetch the coupon page HTML from this URL, and you will see that the coupon data can be extracted from the JavaScript variable APP_COUPONSINC. You can paste this data into a JSON parser to view it in a structured format.
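Instead of an online JSON parser, you can pull the object out and pretty-print it in Python. This is a sketch against a hypothetical, heavily trimmed page; only the contextData → gallery → podCache path is taken from the scraper below, and the real object holds many more keys.

```python
import json
import re

# Hypothetical, heavily trimmed coupon page
coupon_page = """
<script>
APP_COUPONSINC = {"contextData": {"gallery": {"podCache": {"123": {"brand": "Acme"}}}}};
</script>
"""

# Greedy match up to the last "};" on the line, like the scraper's pattern
match = re.search(r"APP_COUPONSINC\s?=\s?(\{.*\});", coupon_page)
coupon_json = json.loads(match.group(1))

# Pretty-print the part of the object that holds the coupons
pod_cache = coupon_json["contextData"]["gallery"]["podCache"]
print(json.dumps(pod_cache, indent=2))
```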

The data fields for each coupon are listed under its coupon ID.
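Because podCache maps each coupon ID to that coupon's fields, turning it into rows is a simple loop. A sketch with invented sample values; the key names (summary, brand, details, expiration, catdesc1 to catdesc3) are the ones the scraper reads, and flatten_coupons is a hypothetical helper name.

```python
# Hypothetical sample shaped like podCache: coupon ID -> coupon fields.
# Key names match those read by the scraper; the values are invented.
pod_cache = {
    "1234567": {
        "summary": "Save $1.00",
        "brand": "Acme",
        "details": "on any one Acme cereal",
        "expiration": "12/31/2018",
        "catdesc1": "Food",
        "catdesc2": "Breakfast",
        "catdesc3": "",
    },
}


def flatten_coupons(pod_cache):
    """Turn the podCache mapping into one flat dict per coupon."""
    rows = []
    for coupon_id, coupon in pod_cache.items():
        levels = [coupon.get("catdesc1"), coupon.get("catdesc2"), coupon.get("catdesc3")]
        rows.append({
            "coupon_id": coupon_id,
            "offer": coupon.get("summary"),
            "brand": coupon.get("brand"),
            "description": coupon.get("details"),
            # drop empty category levels before joining
            "category": " > ".join(level for level in levels if level),
            "expiration_date": coupon.get("expiration"),
        })
    return rows


for row in flatten_coupons(pod_cache):
    print(row["coupon_id"], row["brand"], row["category"])  # 1234567 Acme Food > Breakfast
```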

Building the Scraper

We use Python 3 in this tutorial. The code will not work with Python 2.7. You need a computer with Python 3 and pip installed.

Many UNIX-like operating systems, such as Mac OS and Linux, come with Python preinstalled. However, not every Linux distribution ships with Python 3 by default.

Let's check the Python version. Open a terminal (on Mac OS and Linux) or Command Prompt (on Windows), type

python --version

and press Enter. If the output looks like Python 3.x.x, you already have Python 3 installed. If it says Python 2.x.x, you are using Python 2. If you get an error, Python is probably not installed. If Python 3 is not installed, install it first.
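If you would rather check from inside Python, for example at the top of a script that should refuse to run on Python 2, the standard library exposes the same information. A minimal sketch:

```python
import platform
import sys

# Same information that `python --version` prints, read from the interpreter itself
version = platform.python_version()
print("Python " + version)

# Fail fast with a clear message rather than a confusing error later on
if sys.version_info < (3,):
    sys.exit("This scraper needs Python 3, but this is Python " + version)
```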

Installing Python 3 as well as Pip

Linux users can follow this guide to install Python 3: http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac users can follow this guide: http://docs.python-guide.org/en/latest/starting/install3/osx/

Installing Packages
  • Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/).
  • Python LXML, to parse the HTML tree structure using XPaths (learn how to install it here: http://lxml.de/installation.html).
  • UnicodeCSV, to handle Unicode characters in the output file. Install it using pip install unicodecsv.
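Before running the scraper, you can confirm that the packages listed above can be imported. A small sketch using importlib.util.find_spec from the standard library; the REQUIRED mapping and the missing_packages helper are hypothetical names for illustration.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {"requests": "requests", "lxml": "lxml", "unicodecsv": "unicodecsv"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed")
```

find_spec only locates the module without importing it, so the check is cheap and has no side effects.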
The Code

```python
from lxml import html
import csv
import requests
import re
import json
import argparse
import traceback


def parse(store_id):
    """Function to retrieve coupons in a particular Walmart store.

    :param store_id: Walmart store id, you can get this id from the output of the Walmart store location script
    """
    # sending request to get coupon related meta details
    url = "https://www.walmart.com/store/%s/coupons" % store_id
    # "br" is left out of accept-encoding: requests only decodes Brotli
    # responses when the optional brotli package is installed
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate",
        "accept-language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
        "referer": "https://www.walmart.com/store/finder",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
    }
    # adding retry
    for _ in range(5):
        try:
            response = requests.get(url, headers=headers, timeout=30)
            raw_coupon_url_details = re.findall(r'"couponsData":({.*?})', response.text)
            if raw_coupon_url_details:
                coupons_details_url_info_dict = json.loads(raw_coupon_url_details[0])
                # these variables are used to create the coupon page url
                pid = coupons_details_url_info_dict.get('pid')
                nid = coupons_details_url_info_dict.get('nid')
                zid = coupons_details_url_info_dict.get('zid')
                # coupon details are rendered from the following url
                # example link: https://www.coupons.com/coupons/?pid=19251&nid=10&zid=vz89&storezip=20001
                coupons_details_url = "https://www.coupons.com/coupons/?pid={0}&nid={1}&zid={2}".format(pid, nid, zid)
                print("retrieving coupon page")
                coupon_headers = {
                    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
                    "Accept-Encoding": "gzip, deflate",
                    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
                    "Host": "www.coupons.com",
                    "Upgrade-Insecure-Requests": "1",
                    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
                }
                response = requests.get(coupons_details_url, headers=coupon_headers, timeout=30)
                coupon_raw_json = re.findall(r"APP_COUPONSINC\s?=\s?({.*});", response.text)
                print("processing coupons data")
                if coupon_raw_json:
                    data = []
                    coupon_json_data = json.loads(coupon_raw_json[0])
                    coupons = coupon_json_data.get('contextData', {}).get('gallery', {}).get('podCache', {})
                    # podCache maps each coupon ID to that coupon's fields
                    for coupon in coupons.values():
                        price = coupon.get('summary')
                        product_brand = coupon.get('brand')
                        details = coupon.get('details')
                        expiration = coupon.get('expiration')
                        # assumed key name for the activation date; confirm it in the podCache JSON
                        activated = coupon.get('activated')
                        category_1 = coupon.get('catdesc1', '')
                        category_2 = coupon.get('catdesc2', '')
                        category_3 = coupon.get('catdesc3', '')
                        # skip empty category levels so there are no dangling separators
                        category = ' > '.join(c for c in [category_1, category_2, category_3] if c)
                        walmart_data = {
                            "offer": price,
                            "brand": product_brand,
                            "description": details,
                            "category": category,
                            "activated_date": activated,
                            "expiration_date": expiration,
                            "url": coupons_details_url,
                        }
                        data.append(walmart_data)
                    return data
        except Exception:
            print(traceback.format_exc())
    return []


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('store_id', help='walmart store id')
    args = argparser.parse_args()
    store_id = args.store_id
    scraped_data = parse(store_id)
    if scraped_data:
        print("Writing scraped data to %s_coupons.csv" % store_id)
        # newline='' is what the csv module expects for files it writes
        with open('%s_coupons.csv' % store_id, 'w', newline='', encoding='utf-8') as csvfile:
            fieldnames = ["offer", "brand", "description", "category", "activated_date", "expiration_date", "url"]
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
            writer.writeheader()
            for data in scraped_data:
                writer.writerow(data)
```
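The scraper retries up to five times, one attempt immediately after another. When failures come from temporary blocks or network errors, pausing between attempts often helps. Below is a sketch of a hypothetical with_retries helper that swaps in exponential backoff, a technique the original script does not use; it assumes the work you want to retry is wrapped in a function that returns an empty list on failure, as parse does.

```python
import time


def with_retries(func, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it returns a non-empty result or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    An exception counts as a failed attempt, just like an empty result.
    """
    for attempt in range(attempts):
        try:
            result = func()
            if result:
                return result
        except Exception as exc:
            print("attempt %d failed: %s" % (attempt + 1, exc))
        # no pause after the final attempt
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return []


# Example with a stand-in function that fails twice, then succeeds
calls = []


def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return ["coupon"]


print(with_retries(flaky_fetch, base_delay=0.01))  # ['coupon']
```

The sleep parameter is injectable so the delays can be checked without actually waiting.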

Run the script by passing its file name followed by a store ID:

python3 walmart_coupon_retreiver.py store_id

For example, to get the coupon information for store 3305, run the script like this:

python3 walmart_coupon_retreiver.py 3305

This creates a file named 3305_coupons.csv in the same folder as the script. The output file will look similar to this.
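To inspect the result without opening a spreadsheet, you can read the CSV back with the standard csv module. A short sketch, assuming the file was written by the script above with its column names; load_coupons and coupons_per_brand are hypothetical helper names.

```python
import csv
import os
from collections import Counter


def load_coupons(path):
    """Read the CSV written by the scraper into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.DictReader(csvfile))


def coupons_per_brand(rows):
    """Count how many coupons each brand has."""
    return Counter(row["brand"] for row in rows)


path = "3305_coupons.csv"
if os.path.exists(path):
    rows = load_coupons(path)
    print(len(rows), "coupons")
    for brand, count in coupons_per_brand(rows).most_common(5):
        print(brand, count)
```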

Known Limitations

This code extracts coupon information for Walmart stores whose store IDs are available on Walmart.com. If you want to extract data from millions of pages, you will need additional resources.

source code: https://www.retailgators.com/how-to-extract-coupon-details-from-the-walmart-store-using-lxml-and-python.php