Access The Data Using SQL/SPARQL Queries
The second way of accessing your data with the data.world Python library is the .query() function, which gives you access to data.world's query tool.
The query() function returns a QueryResults object, which has three attributes similar to those of the LocalDataset object: QueryResults.dataframe, QueryResults.table, and QueryResults.
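As a rough illustration of the query workflow described above, here is a minimal sketch. It assumes the datadotworld package is installed and an API token is configured. The helper names top_rows_sql and run_sql, and the optional query_fn parameter, are hypothetical and exist only to keep the example easy to try without a network connection.

```python
from typing import Any, Callable, Optional


def top_rows_sql(table: str, limit: int = 5) -> str:
    """Build a simple SQL statement that selects the first rows of a table."""
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


def run_sql(dataset_key: str, sql: str,
            query_fn: Optional[Callable[..., Any]] = None) -> Any:
    """Run a SQL query against a data.world dataset.

    query_fn is a hypothetical hook for this sketch; when it is omitted,
    the real datadotworld.query function is used.
    """
    if query_fn is None:
        # Imported lazily so the helper can be defined without the library.
        import datadotworld as dw
        query_fn = dw.query
    # query_type may be "sql" or "sparql".
    return query_fn(dataset_key, sql, query_type="sql")
```

With a configured token, something like run_sql("owner/dataset", top_rows_sql("some_table")).dataframe should give a pandas DataFrame of the results. The dataset key and table name here are placeholders, not values taken from this tutorial.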
Installing The data.world Library
The first thing you'll need to do is install the library, which you can do via pip by running pip install datadotworld. This installs the library and all its dependent packages.
One of the handy things about the library is its command line utility, which allows you to easily store your API token locally.
This avoids having to put it in your scripts or notebook and have to worry a.
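To illustrate the idea of keeping the token out of your code, here is a small sketch that reads it from the environment instead of hard-coding it. It assumes the DW_AUTH_TOKEN environment variable, which the library's documentation describes as an alternative to the stored configuration. The helper name token_from_env is hypothetical.

```python
import os
from typing import Mapping, Optional


def token_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API token from the environment, or None if it is not set.

    Assumes the token is exported as DW_AUTH_TOKEN; it is never written
    into the script or notebook itself.
    """
    env = os.environ if env is None else env
    token = env.get("DW_AUTH_TOKEN", "").strip()
    return token or None
```

A script can call token_from_env() and stop with a clear message when it returns None, rather than failing later with an authentication error.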
Our Data Set
For this tutorial, we'll be working with a data set of information on the TV show, The Simpsons.
The data set was scraped by Todd Schneider for his post The Simpsons by the Data, and he made the scraper available on GitHub.
Kaggle user William Cukierski used the scraper to upload the data set, which was then rehosted on data.world.
If you look .
Using data.world's Python Library to Explore The Data
First, let's import the datadotworld library. We're going to use the load_dataset() function to take a look at the data.
When we use load_dataset() for the first time, it:
1) Downloads the data set from data.world and caches it in our ~/.dw/ directory.
2) Returns a LocalDataset object representing the data set.
Caching the data set locally is a really ne.
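The two steps above can be sketched roughly as follows. This assumes the datadotworld package is installed and configured. The helper names load_cached and table_names, and the optional loader parameter, are hypothetical and only make the sketch easy to exercise without downloading anything.

```python
from typing import Any, Callable, List, Optional


def load_cached(dataset_key: str,
                loader: Optional[Callable[[str], Any]] = None) -> Any:
    """Load a data set, defaulting to datadotworld.load_dataset.

    On first use the library downloads the data set and caches it locally;
    later calls read from that cache.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without the library.
        import datadotworld as dw
        loader = dw.load_dataset
    return loader(dataset_key)


def table_names(dataset: Any) -> List[str]:
    """List the table names exposed by a LocalDataset-like object.

    Relies on the dataframes attribute, a mapping of table name to
    pandas DataFrame.
    """
    return sorted(dataset.dataframes.keys())
```

Calling table_names(load_cached("owner/dataset")) with a real dataset key should list the tables available for exploration. The key shown is a placeholder.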
Why do data engineers use Python?
Data engineers use Python libraries to acquire data by scraping the web, calling the APIs that many companies use to publish their data, and connecting to databases.
With libraries for cleaning, transforming, and enriching data, Python helps data engineers create usable, high-quality data sets ready for analysis.