How to Use the Kaggle API in Python?
Introduction
Kaggle is the world's leading online data science community, with powerful tools, datasets, and other resources to help us achieve our data science goals. It is a subsidiary of Google LLC. Kaggle hosts a large number of freely available datasets that can be used for educational purposes. It also hosts competitions and offers free notebooks for exploring and running data science and machine learning models.
To use Kaggle resources and take part in Kaggle competitions, we normally log in to the Kaggle website and search there. To download a dataset, we have to search for it, download it manually, and move it to the desired folder before we can explore it further. All of these interactions with Kaggle can instead be done through the Kaggle API, which is exposed as a command-line tool (CLI) implemented in Python.
Description
Kaggle got its start in 2010 by offering machine learning competitions. It now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity was raised in 2011, valuing the company at $25 million. On 8 March 2017, Google announced that it was acquiring Kaggle.
Installation
Before installing, ensure that Python 3 and the package manager pip are installed.
Run the following command to install the Kaggle API from the command line:
pip install kaggle
First, we need to authenticate using an API token.
Follow these steps to download a new authentication token to our machine:
Click on the profile picture and select Account from the drop-down menu.
Scroll down to the API section.
Click the Create New API Token button to download a fresh token as a JSON file containing a username and API key.
Copy the JSON file to the ~/.kaggle/ directory: go to the home directory, then to the .kaggle folder, and place the downloaded file there.
If we run into a "kaggle: command not found" error, we should make sure that our Python binaries are on our path. We can see where kaggle is installed by running pip uninstall kaggle, noting where the binary is listed, and declining the confirmation prompt so that nothing is removed. For a local user install on Linux, the default location is ~/.local/bin. On Windows, the default location is $PYTHON_HOME/Scripts.
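The same check can be made from Python without touching the installation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the ~/.local/bin entry simply mirrors the Linux default mentioned above.

```python
import shutil
import sysconfig
from pathlib import Path


def candidate_script_dirs():
    """Directories where pip commonly places console scripts such as kaggle."""
    return [
        Path(sysconfig.get_path("scripts")),  # scripts directory of this interpreter
        Path.home() / ".local" / "bin",       # assumed default for a Linux user install
    ]


def find_kaggle_cli():
    """Return the path of the kaggle executable, or None if it cannot be found."""
    on_path = shutil.which("kaggle")
    if on_path:
        return on_path
    # Not on PATH: look in the usual script directories instead.
    for directory in candidate_script_dirs():
        for name in ("kaggle", "kaggle.exe"):
            candidate = directory / name
            if candidate.is_file():
                return str(candidate)
    return None
```

If find_kaggle_cli() returns a path whose directory is not on our path, adding that directory to the PATH variable should make the kaggle command available.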
API credentials
Sign up for a Kaggle account at https://www.kaggle.com to use the Kaggle API.
Then go to the 'Account' tab of our user profile (https://www.kaggle.com/<username>/account).
Select 'Create API Token'. This downloads kaggle.json, a file containing our API credentials.
Place this file in the location ~/.kaggle/kaggle.json
On Windows, place it in the location C:\Users\<Windows-username>\.kaggle\kaggle.json
(We can check the exact location, sans drive, with echo %HOMEPATH%.)
We can define the shell environment variable KAGGLE_CONFIG_DIR to change this location to $KAGGLE_CONFIG_DIR/kaggle.json
For our security, we should make sure that other users of the computer do not have read access to our credentials.
We can do this on Unix-based systems with the following command: chmod 600 ~/.kaggle/kaggle.json
Alternatively, we can export our Kaggle username and token to the environment:
export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx
We can also export any other configuration value that would normally be in $HOME/.kaggle/kaggle.json, using the format KAGGLE_<VARIABLE> with the variable name in uppercase.
For example, if the file had the variable "proxy", we would export KAGGLE_PROXY and it would be picked up by the client.
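To make the lookup rules above concrete, here is a minimal Python sketch of how the credentials could be resolved and checked. It is not the client's own code: the function names are hypothetical, the "environment first, then file" order is an assumption of this sketch, and the permission check is only meaningful on Unix-based systems.

```python
import json
import os
import stat
from pathlib import Path


def config_path(env=os.environ):
    """Location of kaggle.json, honoring KAGGLE_CONFIG_DIR when it is set."""
    config_dir = env.get("KAGGLE_CONFIG_DIR") or str(Path.home() / ".kaggle")
    return Path(config_dir) / "kaggle.json"


def is_private(path):
    """True when group and other users have no permissions (for example mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def load_credentials(env=os.environ):
    """Return (username, key) from the environment, else from kaggle.json."""
    # Assumption of this sketch: exported variables take priority over the file.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return env["KAGGLE_USERNAME"], env["KAGGLE_KEY"]
    with open(config_path(env), encoding="utf-8") as handle:
        data = json.load(handle)
    return data["username"], data["key"]
```

Calling is_private(config_path()) before running any command is a cheap way to notice a token file that other users could read.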
How to search for a dataset
We can search by keyword to find matching datasets using CLI arguments. The following CLI statement lists the datasets that match a search term:
kaggle datasets list -s [KEYWORD]
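When the search results are needed inside a Python program, one option is to call the CLI and parse what it prints. The sketch below assumes that the kaggle command is on our path, that credentials are configured, and that the --csv flag of the list command is available in the installed release; the function names are hypothetical.

```python
import csv
import io
import subprocess


def parse_csv_listing(text):
    """Turn CSV text printed by the CLI into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def search_datasets(keyword):
    """Run `kaggle datasets list -s KEYWORD --csv` and parse its output."""
    result = subprocess.run(
        ["kaggle", "datasets", "list", "-s", keyword, "--csv"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI reports a failure
    )
    return parse_csv_listing(result.stdout)
```

The column names are whatever header the CLI prints, so inspecting the keys of the first returned row is safer than hard-coding them.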
How to download a dataset
Once we have found a suitable dataset through the search, the API lets us download it from Kaggle to our local machine. The command to download the files associated with a dataset using the CLI:
kaggle datasets download -d [DATASET]
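A download usually ends with unpacking the files, so the two steps can be wrapped together. This is a sketch under assumptions: the dataset reference is taken to have the form owner/dataset-name, the -p flag is used to choose the target folder, downloads are assumed to arrive as .zip archives, and all function names are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path


def download_command(dataset, dest):
    """Build the argument list for `kaggle datasets download -d DATASET -p DEST`."""
    owner, _, name = dataset.partition("/")
    if not owner or not name:
        raise ValueError("expected a dataset reference like 'owner/dataset-name'")
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", str(dest)]


def unpack_archives(dest):
    """Extract every .zip file found in dest into dest; return the archive names."""
    dest = Path(dest)
    extracted = []
    for archive in sorted(dest.glob("*.zip")):
        with zipfile.ZipFile(archive) as bundle:
            bundle.extractall(dest)
        extracted.append(archive.name)
    return extracted


def download_dataset(dataset, dest):
    """Download a dataset with the CLI, then unpack whatever archives arrived."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(download_command(dataset, dest), check=True)
    return unpack_archives(dest)
```

Keeping the command builder separate from the runner makes it possible to check the arguments without contacting Kaggle.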
How to create and maintain a dataset
The Kaggle API can be used to upload new datasets and new versions of datasets using CLI arguments. This can ease the sharing of datasets and projects on Kaggle.
Follow the steps below to create a new dataset:
Collect the dataset files to upload to Kaggle in a folder.
Run: kaggle datasets init -p /path/to/dataset to generate a metadata file.
Add our metadata to the generated file: dataset-metadata.json (named datapackage.json in older releases of the tool)
Run: kaggle datasets create -p /path/to/dataset to finally create the dataset
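The metadata step can also be scripted. The sketch below writes a small metadata file with a title, an id of the form owner/slug, and a license entry. It is an illustration only: the function name is hypothetical, the default file name dataset-metadata.json reflects recent releases of the tool (older ones used datapackage.json, which can be passed instead), and the CC0-1.0 license is just a placeholder choice.

```python
import json
from pathlib import Path


def write_dataset_metadata(folder, owner, slug, title,
                           license_name="CC0-1.0",
                           filename="dataset-metadata.json"):
    """Write a minimal dataset metadata file into folder and return its path."""
    metadata = {
        "title": title,
        "id": f"{owner}/{slug}",                # the dataset this folder belongs to
        "licenses": [{"name": license_name}],   # placeholder license, adjust as needed
    }
    path = Path(folder) / filename
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

After the file is written, the kaggle datasets create command from the steps above can be run on the same folder.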
Follow the steps below to upload a new version of an existing dataset:
Run: kaggle datasets init -p /path/to/dataset to generate a metadata file (if one does not already exist).
Make sure the id field in the metadata file (dataset-metadata.json, or datapackage.json in older releases) points to our dataset.
Run: kaggle datasets version -p /path/to/dataset -m "MESSAGE" to finally create the new version of the dataset.
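Because a wrong id would send the new version to the wrong place, it can be worth checking the metadata before running the version command. The following is a sketch with hypothetical function names; the metadata file name is an assumption and can be overridden for older releases.

```python
import json
import subprocess
from pathlib import Path


def metadata_id(folder, filename="dataset-metadata.json"):
    """Read the id field from the metadata file in folder (None if absent)."""
    with open(Path(folder) / filename, encoding="utf-8") as handle:
        return json.load(handle).get("id")


def version_command(folder, message):
    """Build the argument list for `kaggle datasets version -p FOLDER -m MESSAGE`."""
    return ["kaggle", "datasets", "version", "-p", str(folder), "-m", message]


def upload_new_version(folder, expected_id, message):
    """Upload a new version only if the metadata points to the expected dataset."""
    found = metadata_id(folder)
    if found != expected_id:
        raise ValueError(f"metadata id is {found!r}, expected {expected_id!r}")
    subprocess.run(version_command(folder, message), check=True)
```

The check runs before anything is sent to Kaggle, so a mismatch costs nothing.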
How to search for a published notebook
We can search by keyword to find matching published notebooks using the Kaggle API. The API supports searching for published notebooks and their metadata, as well as workflows for creating and running notebooks.
The CLI statement to list the published notebooks that match a search keyword:
kaggle kernels list -s [KEYWORD]
How to download a published notebook
The Kaggle API also lets us download any published notebook from Kaggle to our local machine. The command to download the files associated with a notebook using the CLI:
kaggle kernels pull -k [KERNEL] -p /path/to/download -m
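With the -m flag, the pull command also downloads the notebook's metadata, which a script can read to find out what was fetched. The sketch below assumes that the metadata lands in a file named kernel-metadata.json in the download folder and that it contains id and code_file fields; the function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def pull_command(kernel, dest):
    """Build the argument list for `kaggle kernels pull -k KERNEL -p DEST -m`."""
    return ["kaggle", "kernels", "pull", "-k", kernel, "-p", str(dest), "-m"]


def read_kernel_metadata(folder):
    """Load kernel-metadata.json from folder as a dictionary."""
    with open(Path(folder) / "kernel-metadata.json", encoding="utf-8") as handle:
        return json.load(handle)


def pull_notebook(kernel, dest):
    """Pull a notebook with its metadata and return (id, code file name)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(pull_command(kernel, dest), check=True)
    metadata = read_kernel_metadata(dest)
    return metadata.get("id"), metadata.get("code_file")
```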
How to create and run a notebook
The Kaggle API can be used to upload new notebooks and maintain versions of notebooks using CLI arguments. This can ease the sharing of notebooks and projects on Kaggle.
Follow the steps below to create a new notebook:
Gather the code files to upload to Kaggle in a folder.
Run: kaggle kernels init -p /path/to/kernel to generate a metadata file.
Add our metadata to the generated file: kernel-metadata.json
Run: kaggle kernels push -p /path/to/kernel to finally create and run the notebook
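Filling in kernel-metadata.json can be automated as well. The sketch below writes only a minimal set of fields (id, title, code_file, language, kernel_type); the file generated by kaggle kernels init contains further optional fields that are left out here. The function names are hypothetical, and the lists of accepted languages and kernel types are assumptions to verify against the installed release.

```python
import json
from pathlib import Path

# Assumed sets of accepted values; check them against the installed release.
VALID_LANGUAGES = {"python", "r", "rmarkdown"}
VALID_KERNEL_TYPES = {"script", "notebook"}


def write_kernel_metadata(folder, owner, slug, title, code_file,
                          language="python", kernel_type="notebook"):
    """Write a minimal kernel-metadata.json into folder and return its path."""
    if language not in VALID_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if kernel_type not in VALID_KERNEL_TYPES:
        raise ValueError(f"unsupported kernel type: {kernel_type}")
    if not (Path(folder) / code_file).is_file():
        raise FileNotFoundError(code_file)
    metadata = {
        "id": f"{owner}/{slug}",
        "title": title,
        "code_file": code_file,      # file name relative to the folder
        "language": language,
        "kernel_type": kernel_type,
    }
    path = Path(folder) / "kernel-metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


def push_command(folder):
    """Build the argument list for `kaggle kernels push -p FOLDER`."""
    return ["kaggle", "kernels", "push", "-p", str(folder)]
```

The list returned by push_command can be handed to subprocess.run once the metadata has been written.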
Follow the steps below to upload a new version of an existing notebook:
Download the latest version of the notebook and its metadata to our local machine: kaggle kernels pull -k [KERNEL] -p /path/to/download -m
Make sure the id field in the metadata file kernel-metadata.json points to our notebook.
Run: kaggle kernels push -p /path/to/kernel to finally push the new version of the notebook. (The same push subcommand is used for new notebooks and for new versions; the CLI has no separate kernels version subcommand.)
How to interact with competitions
The Kaggle API gives us an easy way to interact with the competitions held on Kaggle. Before we can download a competition's dataset or make a submission, we need to log in to the competition site and accept its rules. This has to be done on the Kaggle website, as it is not possible through the API.
Commands to interact with the competitions held on Kaggle:
List all ongoing competitions: kaggle competitions list
Download the files associated with a competition: kaggle competitions download -c [COMPETITION]
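Since a download fails when the rules have not been accepted, a wrapper can turn that failure into a clearer hint. The sketch below is an assumption-laden illustration: it guesses that the CLI output mentions the HTTP status 403 in that situation, and the function names are hypothetical.

```python
import subprocess


def explain(returncode, output):
    """Translate the result of a CLI call into a short human-readable message."""
    if returncode == 0:
        return "download finished"
    # Assumption: a refused request shows up as an HTTP 403 in the CLI output.
    if "403" in output:
        return ("access was refused; open the competition page on the Kaggle "
                "website and accept the rules, then retry")
    lines = output.strip().splitlines()
    return "command failed: " + (lines[-1] if lines else "no output")


def download_competition(competition, dest="."):
    """Run `kaggle competitions download -c COMPETITION -p DEST` and explain the result."""
    result = subprocess.run(
        ["kaggle", "competitions", "download", "-c", competition, "-p", str(dest)],
        capture_output=True,
        text=True,
    )
    return result.returncode, explain(result.returncode, result.stdout + result.stderr)
```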
How to submit to a competition
Submitting to a competition is only possible after we have accepted the submission rules on the competition's page on the Kaggle website.
The CLI statement to submit to a competition and get scored (the -m message describing the submission is required by the CLI):
kaggle competitions submit -c [COMPETITION NAME] -f [FILE PATH] -m [MESSAGE]
Run: kaggle competitions submissions -c [COMPETITION NAME] to list all previous submissions to a competition.
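The two commands above are often used back to back, so they can be combined in a small helper. This is a sketch with hypothetical function names. It passes a -m message describing the submission, on the assumption that the CLI expects one; this can be confirmed with kaggle competitions submit -h.

```python
import subprocess
from pathlib import Path


def submit_command(competition, file_path, message):
    """Build the argument list for `kaggle competitions submit`."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(path)  # fail early, before contacting Kaggle
    return ["kaggle", "competitions", "submit",
            "-c", competition, "-f", str(path), "-m", message]


def submissions_command(competition):
    """Build the argument list for `kaggle competitions submissions`."""
    return ["kaggle", "competitions", "submissions", "-c", competition]


def submit_and_list(competition, file_path, message):
    """Submit a file, then return the CLI's listing of previous submissions."""
    subprocess.run(submit_command(competition, file_path, message), check=True)
    result = subprocess.run(
        submissions_command(competition),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```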