- Hands-on Lab: Working with common PostgreSQL extensions
- Prerequisites
- Exercise 1: Add extensions to allowlist
- Exercise 2: Store geospatial data with the PostGIS extension
- Exercise 3: Set up scheduled jobs
- Exercise 4: Load data into the database from Azure storage using the Azure Storage extension (Optional)
- Task 1: Add extension to allowlist
- Task 2: Create an Azure Blob Storage account
- Task 3: Create a container
- Task 4: Upload data files
- Task 5: Connect to the database using psql in the Azure Cloud Shell
- Task 6: Enable and configure the pg_azure_storage extension
- Task 7: Import data from blob storage
- Task 8: Export data into blob storage using the blob_put function
- Summary
Azure Database for PostgreSQL Flexible Server is an extensible platform that lets you expand a database's functionality using many popular PostgreSQL extensions. Extensions bundle multiple related SQL objects into a single package that can be loaded into or removed from the database with a single command. After being loaded into the database, extensions function like built-in features.
You can view the list of supported extensions by running SHOW azure.extensions; within the database. Extensions that are not listed are not currently supported; you cannot create or load your own extension in Azure Database for PostgreSQL Flexible Server.
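For example, you can compare what the server allowlists against what is installable in your database by running both commands from any query tool (a quick, optional check using commands that appear later in this lab):

```sql
-- Extensions currently allowlisted on the server
SHOW azure.extensions;

-- All extensions available to install in this database, with default versions
SELECT name, default_version FROM pg_available_extensions ORDER BY name;
```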
In this lab, you work with the following common PostgreSQL extensions:
| Extension | Description |
| --- | --- |
| postgis | Adds support for storing, indexing, and querying geospatial data (geometry, geography). |
| pg_cron | A cron-based job scheduler that allows you to schedule PostgreSQL commands directly from the database. |
| pg_azure_storage (Optional) | Allows importing and exporting data in multiple file formats directly between Azure Blob Storage and an Azure Database for PostgreSQL Flexible Server database. |
Prerequisites

This lab uses the Azure Database for PostgreSQL instance created in Lab 1 and the data and tables created in Lab 2. If you are starting this lab without completing the previous labs, expand the section below and complete the steps to set up the database.
Expand this section to view the prerequisite setup steps.
- In a web browser, navigate to the Azure portal.
- Select the Cloud Shell icon in the Azure portal toolbar to open a new Cloud Shell pane at the bottom of your browser window.
- At the Cloud Shell prompt, run the following commands to define variables for creating resources. The variables represent the names to assign to your resource group and database and specify the Azure region into which resources should be deployed.

  The resource group name specified is rg-postgresql-labs, but you can provide any name you wish to use to host the resources associated with this lab.

  ```bash
  RG_NAME=rg-postgresql-labs
  ```

  In the database name, replace the {SUFFIX} token with a unique value, such as your initials, to ensure the database server name is globally unique.

  ```bash
  DATABASE_NAME=pgsql-flex-{SUFFIX}
  ```

  Replace the region with whatever location you want to use for lab resources.

  ```bash
  REGION=eastus
  ```
- Run the following Azure CLI command to create a resource group, specifying the location. If you have more than one Azure subscription, use the az account set --subscription <subscription id> command first to select the subscription you want to use for lab resources.

  ```bash
  az group create --name $RG_NAME --location $REGION
  ```
- Provision an Azure Database for PostgreSQL database instance within the resource group you created above by running the following Azure CLI command:

  ```bash
  az postgres flexible-server create --name $DATABASE_NAME --location $REGION --resource-group $RG_NAME \
    --admin-user s2admin --admin-password Seattle123Seattle123 --database-name airbnb \
    --public-access 0.0.0.0-255.255.255.255 --version 16 \
    --sku-name Standard_D2ds_v5 --storage-size 32 --yes
  ```
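Provisioning can take several minutes. If you want to confirm the server is ready before continuing, one option is the standard az postgres flexible-server show command (an optional check, not part of the original lab steps):

```bash
# Report the server's provisioning state; continue once it returns "Ready"
az postgres flexible-server show --resource-group $RG_NAME --name $DATABASE_NAME --query "state" --output tsv
```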
In this task, you use the psql command-line utility from the Azure Cloud Shell to connect to the database.
- You need the connection details for the database to connect to it using psql in the Cloud Shell. Using the open Azure portal window with the Cloud Shell pane at the bottom, navigate to your Azure Database for PostgreSQL Flexible Server resource, and in the left-hand navigation menu, select Connect under Settings.

- From the database's Connect page in the Azure portal, select airbnb for the Database name, then copy the Connection details block and paste it into the Cloud Shell.
- At the Cloud Shell prompt, replace the {your_password} token with the password you assigned to the s2admin user when creating the database, then run the command. If you followed the instructions in Lab 1, the password should be Seattle123Seattle123.
- Connect to the database using the psql command-line utility by entering the following at the prompt:

  ```bash
  psql
  ```
Connecting to the database from the Cloud Shell requires that the Allow public access from any Azure service within Azure to the server box is checked on the Networking page of the database. If you receive a message that you are unable to connect, verify this box is checked and try again.
Using the psql command prompt, you will create tables and populate them with data for use in the lab.
- Run the following commands to create temporary tables for importing JSON data from a public blob storage account.

  ```sql
  CREATE TABLE temp_calendar (data jsonb);
  CREATE TABLE temp_listings (data jsonb);
  CREATE TABLE temp_reviews (data jsonb);
  ```
- Using the COPY command, populate each temporary table with data from JSON files in a public storage account.

  ```sql
  \COPY temp_calendar (data) FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/calendar.json'
  \COPY temp_listings (data) FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/listings.json'
  \COPY temp_reviews (data) FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/reviews.json'
  ```
- Run the following commands to create the tables for storing data in the shape used by this lab:

  ```sql
  CREATE TABLE listings (
    listing_id int, name varchar(50), street varchar(50), city varchar(50),
    state varchar(50), country varchar(50), zipcode varchar(50),
    bathrooms int, bedrooms int, latitude decimal(10,5), longitude decimal(10,5),
    summary varchar(2000), description varchar(2000), host_id varchar(2000),
    host_url varchar(2000), listing_url varchar(2000), room_type varchar(2000),
    amenities jsonb, host_verifications jsonb, data jsonb
  );

  CREATE TABLE reviews (
    id int, listing_id int, reviewer_id int, reviewer_name varchar(50),
    date date, comments varchar(2000)
  );

  CREATE TABLE calendar (
    listing_id int, date date, price decimal(10,2), available boolean
  );
  ```
- Finally, run the following INSERT INTO statements to load data from the temporary tables to the main tables, extracting data from the JSON data field into individual columns:

  ```sql
  INSERT INTO listings
  SELECT
    data['id']::int,
    replace(data['name']::varchar(50), '"', ''),
    replace(data['street']::varchar(50), '"', ''),
    replace(data['city']::varchar(50), '"', ''),
    replace(data['state']::varchar(50), '"', ''),
    replace(data['country']::varchar(50), '"', ''),
    replace(data['zipcode']::varchar(50), '"', ''),
    data['bathrooms']::int,
    data['bedrooms']::int,
    data['latitude']::decimal(10,5),
    data['longitude']::decimal(10,5),
    replace(data['summary']::varchar(2000), '"', ''),
    replace(data['description']::varchar(2000), '"', ''),
    replace(data['host_id']::varchar(50), '"', ''),
    replace(data['host_url']::varchar(50), '"', ''),
    replace(data['listing_url']::varchar(50), '"', ''),
    replace(data['room_type']::varchar(50), '"', ''),
    data['amenities']::jsonb,
    data['host_verifications']::jsonb,
    data::jsonb
  FROM temp_listings;

  INSERT INTO reviews
  SELECT
    data['id']::int,
    data['listing_id']::int,
    data['reviewer_id']::int,
    replace(data['reviewer_name']::varchar(50), '"', ''),
    to_date(replace(data['date']::varchar(50), '"', ''), 'YYYY-MM-DD'),
    replace(data['comments']::varchar(2000), '"', '')
  FROM temp_reviews;

  INSERT INTO calendar
  SELECT
    data['listing_id']::int,
    to_date(replace(data['date']::varchar(50), '"', ''), 'YYYY-MM-DD'),
    data['price']::decimal(10,2),
    replace(data['available']::varchar(50), '"', '')::boolean
  FROM temp_calendar;
  ```
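Optionally, you can sanity-check the load before moving on (exact row counts depend on the source files, so any non-zero counts indicate success):

```sql
-- Confirm each table received rows from the temporary tables
SELECT (SELECT COUNT(*) FROM listings) AS listings,
       (SELECT COUNT(*) FROM reviews)  AS reviews,
       (SELECT COUNT(*) FROM calendar) AS calendar;
```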
You are now ready to begin Lab 3!
Exercise 1: Add extensions to allowlist

Before you can install and use extensions in an Azure Database for PostgreSQL Flexible Server, you must allowlist the desired extensions, as described in how to use PostgreSQL extensions.
- In a web browser, navigate to your Azure Database for PostgreSQL Flexible Server resource in the Azure portal.
- From the database's left-hand navigation menu, select Server parameters under Settings, then enter azure.extensions into the search box. Expand the VALUE dropdown list, then locate and check the box next to each of the following extensions:

  - PG_CRON
  - POSTGIS
If the extension you are attempting to allowlist is not present in the list, it may not be supported in the version of PostgreSQL you are using. You can review the supported extension versions in the Microsoft docs. Alternatively, you can run the query SELECT * FROM pg_available_extensions; against the database.
- Some extensions must be included in the server's shared preload libraries. Of the extensions you use in this lab, pg_cron must be preloaded at server start. Clear the search box on the Server parameters page and enter shared_preload. Expand the VALUE dropdown list, then locate the PG_CRON extension and ensure it is checked. This extension is preloaded by default, so you are verifying its state and should not need to make any changes.
- Select Save on the toolbar to save the updates to the Azure extensions allowlist. This action triggers a deployment on the database that should complete within 20 to 30 seconds.
Exercise 2: Store geospatial data with the PostGIS extension

PostGIS is a third-party, open-source spatial database extender for PostgreSQL object-relational databases. It adds support for geographic objects, allowing location queries to be run in SQL. This extension enables complex geospatial queries and analysis within PostgreSQL, essential for mapping, location tracking, and spatial analysis. It provides a variety of spatial data types, indexing options, and functions to query and manipulate geospatial data efficiently.
You will use pgAdmin to connect to and execute queries against the database for this exercise. The pgAdmin tool offers a Geometry Data Viewer, which allows you to view GIS objects on a map.
In this task, you will open pgAdmin and connect to the database.
- If you do not already have it installed, download and install pgAdmin.
- Open pgAdmin and, if you have not already done so, register your server by right-clicking Servers in the Object Explorer and selecting Register > Server.

  Note: you can skip this step if you registered the server in pgAdmin in a previous lab.
Retrieve the server name of your Azure Database for PostgreSQL Flexible Server from the Azure portal.
In the Register - Server dialog, paste your Azure Database for PostgreSQL Flexible Server server name into the Name field on the General tab.
Next, select the Connection tab and paste your server name into the Hostname/address field. Enter s2admin into the Username field, enter Seattle123Seattle123 into the Password box, and optionally, select Save password.
Finally, select the Parameters tab and set the SSL mode to require.
Select Save to register your server.
- Expand the Servers node within the Object Explorer, select the database server from the list, then right-click the server and select Connect Server from the context menu.
- Once connected to your server, expand the Databases node and select the airbnb database. Right-click the airbnb database and select Query Tool from the context menu.
To install the postgis extension in the database, you will use the CREATE EXTENSION command. Behind the scenes, executing CREATE EXTENSION runs the extension's script file. The script typically creates new SQL objects such as functions, data types, operators, and index support methods. Additionally, CREATE EXTENSION records the identities of all the created objects so they can be dropped again if DROP EXTENSION is issued.
- In the query window you opened above, run the CREATE EXTENSION command to install the postgis extension in the database.

  ```sql
  CREATE EXTENSION IF NOT EXISTS postgis;
  ```
If you attempt to install an extension with the same name as one already loaded in the database, you will receive an error that the extension already exists. Specifying the IF NOT EXISTS clause when running the CREATE EXTENSION command allows you to avoid this error.
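To confirm which extensions are installed and at what version, you can query the standard pg_extension catalog (an optional check, not a required lab step):

```sql
-- List installed extensions and their versions
SELECT extname, extversion FROM pg_extension ORDER BY extname;
```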
With the PostGIS extension now loaded, you are ready to begin working with geospatial data in the database. The listings table you created and populated above contains the latitude and longitude of all listed properties. To use this data for geospatial analysis, you must alter the listings table to add a geometry column that accepts the point data type. These data types are included in the postgis extension.
- To accommodate point data, add a new geometry column to the listings table. Copy and paste the following query into the open pgAdmin query window:

  ```sql
  ALTER TABLE listings ADD COLUMN listing_location geometry(point, 4326);
  ```
- Next, update the table with geospatial data associated with each listing by writing the longitude and latitude values into the geometry column.

  ```sql
  UPDATE listings SET listing_location = ST_SetSRID(ST_Point(longitude, latitude), 4326);
  ```
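If you'd like to spot-check the new column before moving on, ST_AsText renders geometries as human-readable well-known text (an optional verification step):

```sql
-- View a few listing locations as well-known text, e.g. POINT(-122.3 47.6)
SELECT listing_id, name, ST_AsText(listing_location) AS location_wkt
FROM listings
LIMIT 5;
```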
With PostGIS installed in the database, you can take advantage of the Geometry Data Viewer in pgAdmin to view GIS objects on a map.
- Copy and paste the following query into the open query editor, then run it to view the data stored in the listing_location column:

  ```sql
  SELECT listing_id, name, listing_location FROM listings LIMIT 50;
  ```
- In the Data Output panel, select the View all geometries in this column button displayed in the listing_location column of the query results. The View all geometries in this column button opens the Geometry Viewer, allowing you to view the query results on a map.
- Select one of the points displayed on the map to view details about the location. In this case, the query only provided the name of the property.
- Now, run the following query to perform a geospatial proximity query, returning properties that are available for the week of January 13, 2016, are under $75.00 per night, and are within a short distance of Discovery Park in Seattle. The query uses the ST_DWithin function provided by the PostGIS extension to identify listings within a given distance from the park, which has a longitude of -122.410347 and a latitude of 47.655598. (A variation that measures the distance in meters appears after these steps.)

  ```sql
  SELECT name, listing_location, summary
  FROM listings l
  INNER JOIN calendar c ON l.listing_id = c.listing_id
  WHERE ST_DWithin(
      listing_location,
      ST_GeomFromText('POINT(-122.410347 47.655598)', 4326),
      0.025
    )
    AND c.date = '2016-01-13'
    AND c.available = 't'
    AND c.price <= 75.00;
  ```
- Select the View all geometries in this column button displayed in the listing_location column of the query results to open the Geometry Viewer and examine the results.
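Note that because listing_location is a geometry column, the 0.025 distance in the ST_DWithin call above is measured in degrees. If you prefer to filter by real-world distance, one variation is to cast to geography, which makes ST_DWithin measure in meters (an optional alternative, not part of the lab steps):

```sql
-- Find listings within 2,000 meters of Discovery Park using geography casts
SELECT name, summary
FROM listings
WHERE ST_DWithin(
    listing_location::geography,
    ST_GeomFromText('POINT(-122.410347 47.655598)', 4326)::geography,
    2000
  );
```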
Exercise 3: Set up scheduled jobs

The next extension you will work with is pg_cron, a simple, cron-based job scheduler for PostgreSQL that runs inside the database as an extension. This simple but powerful extension can be used for many tasks, including aggregating data in near-real time, database cleanup and administrative tasks, and much more.
Extending the database with pg_cron allows you to schedule PostgreSQL commands directly from the database. You can schedule a job that calls a function on a recurring basis, such as every month, hour, or minute. In this exercise, you will create a scheduled task that runs the VACUUM operation against the airbnb database.
The pg_cron extension works slightly differently than the extensions you worked with above: it can only be installed in the postgres database, because it requires access to the background worker process to schedule jobs. To install the pg_cron extension, you will open a new pgAdmin query window from the postgres database.
- In your open pgAdmin session, locate the postgres database under the Databases node in the Object Explorer, then right-click the database and select Query Tool from the context menu.
- In the new query panel, run the CREATE EXTENSION command to install the pg_cron extension in the postgres database.

  ```sql
  CREATE EXTENSION IF NOT EXISTS pg_cron;
  ```
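pg_cron schedules its background workers from a single metadata database, which on Azure Database for PostgreSQL Flexible Server is postgres. If you want to confirm this, the extension exposes a cron.database_name setting you can inspect (shown as an assumption based on pg_cron's standard configuration):

```sql
-- Show which database hosts the pg_cron metadata tables
SHOW cron.database_name;
```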
In this task, you will use pg_cron to create a scheduled job for performing garbage collection in the airbnb database using the VACUUM operation.
- Copy and paste the following query into the query window for the postgres database, then run it to schedule a VACUUM operation in the airbnb database every five minutes.

  ```sql
  SELECT cron.schedule_in_database('VACUUM', '*/5 * * * *', 'VACUUM', 'airbnb');
  ```
- Run the following query to view the job list, including the one you just added.

  ```sql
  SELECT * FROM cron.job;
  ```

  If you need to stop a job, you must retrieve the jobid for your job using this query.
- To unschedule the job, use the following, replacing the {job_id} token with the jobid you retrieved from the previous query.

  ```sql
  SELECT cron.unschedule({job_id});
  ```
Exercise 4: Load data into the database from Azure storage using the Azure Storage extension (Optional)
The pg_azure_storage extension enables seamless integration of Azure Blob Storage with PostgreSQL databases, allowing you to import and export data in multiple file formats directly between Azure Blob Storage and the database. To use the pg_azure_storage extension for this lab, you must first provision an Azure Storage account, retrieve its access key, create a container, and copy the sample Seattle Airbnb data files into the container.
Task 1: Add extension to allowlist

Before installing and using the azure_storage extension, you must allowlist it.
- In a web browser, navigate to your Azure Database for PostgreSQL Flexible Server resource in the Azure portal.
- From the database's left-hand navigation menu, select Server parameters under Settings, then enter azure.extensions into the search box. Expand the VALUE dropdown list, then locate and check the box next to the following extension:

  - AZURE_STORAGE
If the extension you are attempting to allowlist is not present in the list, it may not be supported in the version of PostgreSQL you are using. You can review the supported extension versions in the Microsoft docs. Alternatively, you can run the query SELECT * FROM pg_available_extensions; against the database.
- The pg_azure_storage extension must be preloaded at server start and, therefore, must be added to the server's shared preload libraries list. Clear the search box on the Server parameters page and enter shared_preload. Expand the VALUE dropdown list, then locate and check the box next to the AZURE_STORAGE extension.

- Select Save on the toolbar, then select Save and Restart in the dialog that appears.
Task 2: Create an Azure Blob Storage account

In this task, you provision an Azure Storage account in the Azure portal to host the sample data files.
- In a web browser, navigate to the Azure portal.

- On the portal home page, select Create a resource under Azure services.

- On the Create a resource page, select Storage in the left-hand menu, then Storage account. You can also use the Search functionality to find the resource.
- On the Create a storage account Basics tab, enter the following information:

  | Parameter | Value |
  | --- | --- |
  | **Project details** | |
  | Subscription | Select the subscription you use for lab resources. |
  | Resource group | Select the resource group you created in Lab 1. |
  | **Instance details** | |
  | Storage account name | Enter a globally unique name, such as PREFIXpostgreslabs. |
  | Region | Select the same region you chose for your Azure Database for PostgreSQL Flexible Server database. |
  | Performance | Select Standard. |
  | Redundancy | Select Locally-redundant storage (LRS). |
- The default settings will be used for the remaining tabs of the storage account configuration, so select the Review button.
- Select the Create button on the Review tab to provision the storage account.
Task 3: Create a container

You have been provided with sample Seattle Airbnb data files in CSV format. To host these files, you will create a container named seattle-airbnb-data in the new storage account.
- Navigate to your new storage account in the Azure portal.

- In the left-hand navigation menu, select Containers under Data storage, and then select + Container on the toolbar.
- In the New container dialog, enter seattle-airbnb-data in the Name field and leave Private (no anonymous access) selected for the Public access level setting, then select Create.

  Setting the container's access level to **Private (no anonymous access)** prevents public access to the container and its contents. Below, you will provide the pg_azure_storage extension with the account name and access key, allowing it to access the files securely.
Task 4: Upload data files

In this task, you upload the sample Seattle Airbnb data files into the container you created using the Azure CLI.
- You need the name and key associated with your storage account to upload files using the Azure CLI. In the left-hand navigation menu, select Access keys under Security + networking.

- With the Access keys page open, select the Cloud Shell icon in the Azure portal toolbar to open a new Cloud Shell pane at the bottom of your browser window.

- If prompted, select Bash, then select Create storage.
- At the Azure Cloud Shell prompt, execute the following curl commands to download the Seattle Airbnb data files.

  ```bash
  curl -O https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/listings.csv
  curl -O https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/calendar.csv
  curl -O https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/reviews.csv
  ```
The above commands download the files into the storage account associated with your Cloud Shell.
- Next, you will use the Azure CLI to upload the files into the seattle-airbnb-data container you created in your storage account. Create variables to hold your storage account name and key values to make things easier.

  Copy your storage account name by selecting the Copy to clipboard button next to the storage account name on the Access keys page above your Cloud Shell.

  Now, execute the following command at the Cloud Shell prompt to create a variable for your storage account name, replacing the {your_storage_account_name} token with your storage account name.

  ```bash
  ACCOUNT_NAME={your_storage_account_name}
  ```

  Next, select the Show button next to the Key for key1 and then select the Copy to clipboard button next to the key's value.

  Then, run the following, replacing the {your_storage_account_key} token with the key value you copied.

  ```bash
  ACCOUNT_KEY={your_storage_account_key}
  ```
- You are now ready to upload the data files. To accomplish this, you will use the az storage blob upload CLI command from the Cloud Shell prompt. Run the following commands to upload the three data files into your storage account's seattle-airbnb-data container.

  ```bash
  az storage blob upload --account-name $ACCOUNT_NAME --account-key $ACCOUNT_KEY --container-name seattle-airbnb-data --file listings.csv --name listings.csv --overwrite
  az storage blob upload --account-name $ACCOUNT_NAME --account-key $ACCOUNT_KEY --container-name seattle-airbnb-data --file calendar.csv --name calendar.csv --overwrite
  az storage blob upload --account-name $ACCOUNT_NAME --account-key $ACCOUNT_KEY --container-name seattle-airbnb-data --file reviews.csv --name reviews.csv --overwrite
  ```
In this exercise, you are working with a small number of files. You will most likely work with many more files in real-world scenarios. In those circumstances, you can review different methods for migrating files to an Azure Storage account and select the technique that will work best for your situation.
- To verify the files uploaded successfully, navigate to your storage account's Containers page by selecting Containers from the left-hand navigation menu. Select the seattle-airbnb-data container from the list of containers and observe that it now contains three files, named calendar.csv, listings.csv, and reviews.csv. You can also verify the upload from the command line, as shown below.
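The command-line check mentioned above can be done with the standard az storage blob list command (an optional alternative to the portal):

```bash
# List the blobs in the container to confirm all three files uploaded
az storage blob list --account-name $ACCOUNT_NAME --account-key $ACCOUNT_KEY \
  --container-name seattle-airbnb-data --output table
```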
Task 5: Connect to the database using psql in the Azure Cloud Shell

With the files now securely stored in blob storage, it's time to set up the pg_azure_storage extension in the database. You will use the psql command-line utility from the Azure Cloud Shell.
- You need the connection details for the database to connect to it using psql in the Cloud Shell. Navigate to your Azure Database for PostgreSQL Flexible Server resource in the Azure portal, and in the left-hand navigation menu, select Connect under Settings.
- With the Connect page open, select the Cloud Shell icon in the Azure portal toolbar to open a new Cloud Shell pane at the bottom of your browser window.

- From the database's Connect page in the Azure portal, select airbnb for the Database name, then copy the Connection details block and paste it into the Cloud Shell.
- At the Cloud Shell prompt, replace the {your_password} token with the password you assigned to the s2admin user when creating the database, then run the command. If you followed the instructions in Lab 1, the password should be Seattle123Seattle123.
- Connect to the database using the psql command-line utility by entering the following at the prompt:

  ```bash
  psql
  ```
Connecting to the database from the Cloud Shell requires that the Allow public access from any Azure service within Azure to the server box is checked on the Networking page of the database. If you receive a message that you are unable to connect, verify this box is checked and try again.
Task 6: Enable and configure the pg_azure_storage extension

Now that you are connected to the database, you can install the pg_azure_storage extension. You install extensions using the CREATE EXTENSION command.
- From the psql prompt in the Cloud Shell, run the following command to install the extension in the database:

  ```sql
  CREATE EXTENSION azure_storage;
  ```
When creating and working with the pg_azure_storage extension in the database, note that the extension's name is abbreviated to azure_storage.
- To ensure the s2admin account you specified as the database administrator has appropriate permissions to add a storage account in the pg_azure_storage extension, run the following command to GRANT it the necessary rights.

  ```sql
  GRANT azure_storage_admin TO s2admin;
  ```
- With the extension now installed, you can configure the extension to connect to your storage account. You will need the name and key associated with your storage account. Using the same browser tab where the Cloud Shell is open, navigate to your storage account resource in the Azure portal. In the left-hand navigation menu, select Access keys under Security + networking.
- Before importing data from blob storage, you must map the storage account using the account_add function, providing the account access key defined when the account was created. At the psql prompt, execute the following command after replacing the {your_storage_account_name} and {your_storage_account_key} tokens with the Storage account name and Key values, respectively.

  ```sql
  SELECT azure_storage.account_add('{your_storage_account_name}', '{your_storage_account_key}');
  ```
- Once your storage account has been mapped, you can list the storage account contents. After replacing the {your_storage_account_name} token with the Storage account name, run the following command to view the list of files in the seattle-airbnb-data container:

  ```sql
  SELECT path, bytes, pg_size_pretty(bytes), content_type
  FROM azure_storage.blob_list('{your_storage_account_name}', 'seattle-airbnb-data');
  ```
The blob_list function returns one row per blob, listing each file's path, size in bytes, a human-readable size, and content type.
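You can also confirm which storage accounts the extension currently has credentials for (assuming the azure_storage.account_list function available in current versions of the extension):

```sql
-- List storage accounts registered with the pg_azure_storage extension
SELECT * FROM azure_storage.account_list();
```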
Task 7: Import data from blob storage

With the pg_azure_storage extension now connected to your blob storage account, you can import the data it contains into the database. Using the queries below, you will create a new schema in the database, add tables to the new schema, and then import data from the files in blob storage into the new tables using the pg_azure_storage extension.
- To host the new tables and avoid conflicts with existing ones, you will create a new schema named abb in the database. To create the schema, use the following command:

  ```sql
  CREATE SCHEMA IF NOT EXISTS abb;
  ```
- Before importing data, you must create tables to store data from the CSV files. Run the commands below to create the three required tables in the abb schema. The table structures are based on the columns defined in each file.

  ```sql
  CREATE TABLE abb.calendar (
    listing_id bigint, date date, available boolean, price text
  );

  CREATE TABLE abb.listings (
    id bigint PRIMARY KEY, listing_url text, scrape_id bigint, last_scraped date,
    name text, summary text, space text, description text, experiences_offered text,
    neighborhood_overview text, notes text, transit text, thumbnail_url text,
    medium_url text, picture_url text, xl_picture_url text, host_id bigint,
    host_url text, host_name text, host_since date, host_location text, host_about text,
    host_response_time text, host_response_rate text, host_acceptance_rate text,
    host_is_superhost boolean, host_thumbnail_url text, host_picture_url text,
    host_neighbourhood text, host_listings_count int, host_total_listings_count int,
    host_verifications text, host_has_profile_pic boolean, host_identity_verified boolean,
    street text, neighbourhood text, neighbourhood_cleansed text,
    neighbourhood_group_cleansed text, city text, state text, zipcode text, market text,
    smart_location text, country_code text, country text, latitude float, longitude float,
    is_location_exact boolean, property_type text, room_type text, accommodates int,
    bathrooms real, bedrooms int, beds int, bed_type text, amenities text, square_feet int,
    price text, weekly_price text, monthly_price text, security_deposit text,
    cleaning_fee text, guests_included int, extra_people text, minimum_nights int,
    maximum_nights int, calendar_updated text, has_availability boolean,
    availability_30 int, availability_60 int, availability_90 int, availability_365 int,
    calendar_last_scraped date, number_of_reviews int, first_review date, last_review date,
    review_scores_rating int, review_scores_accuracy int, review_scores_cleanliness int,
    review_scores_checkin int, review_scores_communication int, review_scores_location int,
    review_scores_value int, requires_license boolean, license text, jurisdiction_names text,
    instant_bookable boolean, cancellation_policy text, require_guest_profile_picture boolean,
    require_guest_phone_verification boolean, calculated_host_listings_count int,
    reviews_per_month numeric
  );

  CREATE TABLE abb.reviews (
    listing_id bigint, id bigint PRIMARY KEY, date date,
    reviewer_id bigint, reviewer_name text, comments text
  );
  ```
- You can now import data using the pg_azure_storage extension. Starting with the calendar.csv file, use the blob_get function with the azure_storage.options_csv_get utility function to specify the decoding options for the CSV file. The blob_get function retrieves a file from blob storage. For blob_get to know how to parse the data, you can explicitly define the columns in the FROM clause. Run the following command to import data from the file. Be sure to replace the {your_storage_account_name} token with your storage account name.

  ```sql
  INSERT INTO abb.calendar
  SELECT *
  FROM azure_storage.blob_get(
    '{your_storage_account_name}',
    'seattle-airbnb-data',
    'calendar.csv',
    options := azure_storage.options_csv_get(header => true)
  ) AS cal_rec (
    listing_id bigint, date date, available boolean, price text
  );
  ```
Note the price column is assigned a text data type in the table. This data type assignment occurs because the CSV file contains dollar signs ($) and commas (,) in some values. You will correct this below to allow numeric operations on the field.
- With the data imported, you can now fix the data type associated with the price column so numeric operations can be conducted on the column.

  ```sql
  ALTER TABLE abb.calendar ALTER COLUMN price TYPE numeric USING (REPLACE(REPLACE(price, ',', ''), '$', '')::numeric);
  ```
- Above, you explicitly defined the columns of the calendar.csv file in the FROM clause of the blob_get function. Alternatively, you can pass a value (NULL::table_name), which infers the file's columns based on the definition of the target table. This method is beneficial when the target table has many columns, as with the listings table. Note that this requires the file and table to have the same structure. Run the following command to import the listings data from the file. Replace the {your_storage_account_name} token with your storage account name.

  ```sql
  INSERT INTO abb.listings
  SELECT *
  FROM azure_storage.blob_get(
    '{your_storage_account_name}',
    'seattle-airbnb-data',
    'listings.csv',
    NULL::abb.listings,
    options := azure_storage.options_csv_get(header => true)
  );
  ```
- To load data into the reviews table, you will employ the alternative approach for importing data with the pg_azure_storage extension and use the COPY statement. Run the following command to import reviews data from blob storage. Be sure to replace the {your_storage_account_name} token with your storage account name.

  ```sql
  COPY abb.reviews
  FROM 'https://{your_storage_account_name}.blob.core.windows.net/seattle-airbnb-data/reviews.csv'
  WITH (FORMAT 'csv', header);
  ```
Using the COPY command in conjunction with the pg_azure_storage extension allows it to be used against a private, secured blob storage account container, because the extension extends the native PostgreSQL COPY command to handle Azure Blob Storage resource URLs. The COPY statement does not allow you to specify columns, so it is essential to verify that the table and file column definitions match precisely.
- Run a few queries to verify the data loaded into each table.

  ```sql
  SELECT * FROM abb.calendar WHERE price BETWEEN 100 AND 125 LIMIT 10;

  SELECT COUNT(id) FROM abb.listings WHERE neighbourhood_group_cleansed = 'Ballard';
  ```

  By running the \x auto command below before executing the last query, you enable the extended display to be applied automatically when necessary, making the output easier to view in the Azure Cloud Shell.

  ```sql
  \x auto

  SELECT listing_id, date, reviewer_name, comments FROM abb.reviews WHERE listing_id = 7202016 LIMIT 3;
  ```
Task 8: Export data into blob storage using the blob_put function

The pg_azure_storage extension also allows data to be exported from an Azure Database for PostgreSQL database to Azure Blob Storage. In this task, you will export the cleansed calendar data back to blob storage using the blob_put function.
- Above, you fixed the data type associated with the price column in the calendar table by cleaning up unwanted characters from the text column and assigning a numeric data type to the column. You can push the updated data back into blob storage using the following command. Make sure to replace the {your_storage_account_name} token with the name of your storage account before executing the query.

  ```sql
  SELECT azure_storage.blob_put('{your_storage_account_name}', 'seattle-airbnb-data', 'calendar.csv', cal_rec)
  FROM (SELECT listing_id, date, available, price FROM abb.calendar) cal_rec;
  ```
- After the query completes, navigate to your storage account in the Azure portal, select Containers under Data storage in the left-hand menu, choose the seattle-airbnb-data container, and then download the calendar.csv file.
- Open the file and observe that the data in the price column no longer contains dollar signs ($) or commas (,) and that the rows with values contain a numeric value. You can also confirm the exported file directly from psql, as shown below.
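As the optional psql check referenced above, listing the container again shows the refreshed calendar.csv blob and its new size, reusing the blob_list function from earlier in this lab. Replace the {your_storage_account_name} token as before:

```sql
-- Confirm the exported file exists and check its size
SELECT path, pg_size_pretty(bytes) AS size, content_type
FROM azure_storage.blob_list('{your_storage_account_name}', 'seattle-airbnb-data')
WHERE path = 'calendar.csv';
```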
Summary

Congratulations! You have completed the Working with common PostgreSQL extensions hands-on lab. In this lab, you explored the powerful extensibility of Azure Database for PostgreSQL Flexible Server by installing and using some common PostgreSQL extensions.
In the next lab, you will continue exploring these capabilities, looking at how you can extend the database to add the power of generative AI and Large Language Models.