Run BigQuery SQL using Python API Client

Source Image from unsplash.com

This year I am actively working with Google Cloud Platform (GCP), and I will share how to set up the API Client with Python in WSL (Windows Subsystem for Linux) and Anaconda to execute BigQuery SQL queries from scratch.

If you need more information about installing WSL2 and Anaconda (Miniconda), my previous post may help you.

In this post, we are going to see how to:

  1. Use WSL to create a new Anaconda environment.
  2. Install the Google API Client Library using PIP
  3. Create a Google Service Account using the Web Console
  4. Generate a Google Application Credentials using JSON KEY file
  5. Set the Google Application Credential environment variable
  6. Add USA Names public dataset to BigQuery and test a SQL query
  7. Setup Visual Studio Code to run on WSL
  8. Call a Python Script to execute a SQL query
  9. Bonus: Troubleshooting
  10. Thanks
  11. Conclusion


To avoid mixing with existing libraries, we are going to set up a local project by doing the following steps:

  1. Open a WSL2 terminal
  2. Go to your personal folder and create a new folder called gcp_projects
  3. Create an Anaconda environment with the name gcp_prj
  4. Check if the new environment was created
  5. Change to the new environment using activate.

Use the following commands:

mkdir gcp_projects
cd gcp_projects/
conda create --name gcp_prj
conda info --envs
conda activate gcp_prj
I am creating a new Anaconda environment. Image by the author.

We are going to install PIP in the new Anaconda environment and then install the Google API Client library using the following commands:

sudo apt install python3-pip
pip install --upgrade google-cloud-bigquery
Installing PIP. Image by the author.
Installing Google API Client Library for Python. Image by the author.
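
A quick sanity check that the library is importable in this environment: open a Python session and print the installed client library version.

from google.cloud import bigquery

print(bigquery.__version__)  # prints the installed google-cloud-bigquery version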

The next step is to create a new Project and Service Account by going to https://console.cloud.google.com/ > IAM & Admin > Service Accounts, or directly to https://console.cloud.google.com/iam-admin/serviceaccounts

Google Cloud Console, Service Account. Image by the author

Create a new project with the name MyBigQueryTest

Google Cloud Console, create a project. Image by the author

Create a new service account with the name sa_python_scripts and select a role that grants it access to your project.

Creating a Service Account in Google Cloud Console. Image by the author

For testing purposes, I will use the Owner role without granting users access to the service account. Be aware that the Owner role is not recommended for a production environment.

Grant role to Service Account, Google Console. Image by author

Go back to the Service Accounts page for the MyBigQueryTest project at https://console.cloud.google.com/iam-admin/serviceaccounts and click the email created for the service account to see its details.

Service Accounts are available by email. Image edited by the author.

Create a new key and download it as a JSON file

Create a new Service Account Key. Image by the author

Save the file as mybigquerytest.json. I am using the direct WSL path \\wsl$\Ubuntu\home\csaavedra\workspace_linux\gcp_projects

Save to WSL using the hidden \\wsl$ Windows network share. Image by the author.
Checking the file in WSL Terminal Ubuntu. Image by the author
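
As a side note, the environment variable we set in the next step is only one way to authenticate: the client library can also load the key file explicitly in code. A minimal sketch, assuming the path used above:

from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account key saved in the previous step
key_path = "/home/csaavedra/workspace_linux/gcp_projects/mybigquerytest.json"
credentials = service_account.Credentials.from_service_account_file(key_path)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

For this post, though, we will rely on the environment variable, which keeps the key path out of the code.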

Set the GOOGLE_APPLICATION_CREDENTIALS variable permanently for your profile by editing the .bashrc file.

nano ~/.bashrc
Editing your profile. Image by the author.

Add the GOOGLE_APPLICATION_CREDENTIALS export line at the end of the file:

export GOOGLE_APPLICATION_CREDENTIALS="/home/csaavedra/workspace_linux/gcp_projects/mybigquerytest.json"
Add export GOOGLE_APPLICATION_CREDENTIALS to your profile. Image by the author.

Press Control + X and select Y (for Yes) to save the modified buffer (file),

Confirm changes. Image by the author

and press the Enter key to confirm the file name and the changes.

Confirm file name. Image by the author

I am reloading the profile and checking the path with the echo and printenv commands for the variable GOOGLE_APPLICATION_CREDENTIALS:

source ~/.bashrc
echo $GOOGLE_APPLICATION_CREDENTIALS
printenv GOOGLE_APPLICATION_CREDENTIALS
Reloading the profile and checking the Google Application Credential Path. Image by the author.
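
You can also confirm from Python that the credentials resolve, using google.auth, the mechanism the BigQuery client relies on internally:

import google.auth

# Resolves default credentials the same way the BigQuery client will
credentials, project_id = google.auth.default()
print(project_id)  # the ID of the project the key belongs to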

Go to the BigQuery Console at https://console.cloud.google.com/bigquery and click Add Data > Public Datasets, as in the following picture:

Add Public Datasets to BigQuery. Image by the author

In the marketplace, search for USA names and select the U.S. Social Security Administration dataset.

Google Public Dataset Market. Image by the author.

Add the dataset by clicking the blue button labeled VIEW DATASET:

Adding USA Names dataset. Image by the author

The USA Names dataset was added, but you will not find it yet in the Explorer project tree.

USA Names added. Image by the author.

To add it, you need to search for USA names and select Broaden search to all projects:

Search USA Names. Image by the author.

Then click the pin icon next to the project name bigquery-public-data; it appears when you move your mouse close to the three dots.

Pinning the BigQuery public project. Image by the author.

Now you can see the USA Names in your tree of projects:

Image by the author.

The next step is to use an editor window and execute the following SQL query:

SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20

We will use this result to check that the Python script is working:

Query result of USA Names. Image by the author.

After downloading and installing Visual Studio Code from https://code.visualstudio.com/, go to Extensions and install:

  • Python Extension
  • Remote – WSL Extension

As shown in the following image:

Visual Studio Code, extensions. Image by the author.

Then click on the green icon in the bottom-left corner, the one with the greater-than and less-than signs.

Image by the author.

And choose New WSL Window to open a new instance of Visual Studio Code connected remotely to WSL.

Image by the author.

You can confirm the connection by the label in the bottom-left corner with the text WSL: Ubuntu.

WSL: Ubuntu. Image by the author.

Now you are going to open the folder in WSL by going to File > Open Folder:

Visual Studio Code, open folder. Image by the author.

Select the path /home/csaavedra/workspace_linux/gcp_projects/ and press the OK button.

Open the remote folder. Image by the author.

You will find the JSON key file there. Now create a new file named bigquery-usanames-test.py by right-clicking in the Explorer pane.

Create a new remote file. Image by the author.
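
A minimal version of bigquery-usanames-test.py can look like this, assuming the GOOGLE_APPLICATION_CREDENTIALS variable set earlier and reusing the query from the console step:

from google.cloud import bigquery

# The client picks up GOOGLE_APPLICATION_CREDENTIALS automatically
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""

# Start the query job and iterate over the rows (iteration waits for completion)
query_job = client.query(query)
for row in query_job:
    print(f"{row.name}: {row.total_people}")

Run it from the integrated terminal with python3 bigquery-usanames-test.py; the output should match the twenty rows from the BigQuery console. If it does not, the following errors are the most common.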

No module named google

No module named google error. Image by the author.

If you get the error No module named google, check the installation of the Google Cloud library with the command:

pip install --upgrade google-cloud-bigquery
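
If the library is installed but the error persists, the script may be running under a different Python interpreter than the one pip installed into, which is easy to do with several Anaconda environments around. A quick check, run from the same terminal as the script:

import sys

print(sys.executable)  # the interpreter running the script; pip must install into this one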

Could not automatically determine credentials

If you get the error: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

The problem is related to the GOOGLE_APPLICATION_CREDENTIALS export as an environment variable.

You can run the following snippet from Python, or printenv in the terminal, to see whether the variable is set:

import os

# Print the credentials path; None means the variable is not visible to Python
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

If it is not set, export the variable again or reload the profile with source ~/.bashrc, then check with printenv:

export GOOGLE_APPLICATION_CREDENTIALS="/home/csaavedra/workspace_linux/gcp_projects/mybigquerytest.json"
source ~/.bashrc
echo $GOOGLE_APPLICATION_CREDENTIALS
printenv GOOGLE_APPLICATION_CREDENTIALS

Also, you can run exit in the Visual Studio Code terminal to force a new session that reloads the environment variables.

To finish, I am grateful to Chris von Csefalvay, who gave me the inspiration to keep writing.

Happy Development!

Connecting with Python to Google Cloud Platform (GCP) services is easy using the API Client and a Service Account. After this first step, you can build more complex automation, like consuming a hand-maintained Excel file with Python Pandas, loading and combining the data in Google BigQuery, and refreshing a Tableau dashboard.
