This year I am actively working with Google Cloud Platform (GCP), and I will share how to set up the API Client with Python in WSL (Windows Subsystem for Linux) and Anaconda to execute BigQuery SQL queries from scratch.
If you need more information about installing WSL2 and Anaconda (Miniconda), this previous post may help you.
In this post, we are going to see how to:
- Use WSL to create a new Anaconda environment.
- Install the Google API Client Library using PIP
- Create a Google Service Account using the Web Console
- Generate a Google Application Credentials using JSON KEY file
- Set the Google Application Credential environment variable
- Add USA Names public dataset to BigQuery and test a SQL query
- Setup Visual Studio Code to run on WSL
- Call a Python Script to execute a SQL query
- Bonus: Troubleshooting
I recommend you check my previous post in case you don’t have Windows Subsystem for Linux (WSL) and Anaconda (Miniconda) installed.
To avoid mixing with existing libraries, we are going to set up a local project with the following steps:
- Open a WSL2 terminal
- Go to your personal folder and create a new folder gcp_projects
- Create an Anaconda environment with the name gcp_prj
- Check if the new environment was created
- Change to the new environment using activate.
Use the following commands:
mkdir gcp_projects
cd gcp_projects/
conda create --name gcp_prj
conda info --envs
conda activate gcp_prj
We are going to install PIP in the new Anaconda environment and then install the Google API Client library using the following commands:
sudo apt install python3-pip
pip install --upgrade google-cloud-bigquery
The next step is to create a new Project and Service Account by going to https://console.cloud.google.com/ > IAM & Admin > Service Account or https://console.cloud.google.com/iam-admin/serviceaccounts
Create a new project with the name MyBigQueryTest
Create a new service account with the name sa_python_scripts and select the role to access your project.
For testing purposes, I will use the Owner role without granting users access to the service account. Be aware that the Owner role is not recommended for production environments.
Go to the Service Account and select the MyBigQueryTest project at https://console.cloud.google.com/iam-admin/serviceaccounts and click the email created for the service account to see the details.
Create a new key and download it as a JSON file
Save the file as mybigquerytest.json. I am using the direct WSL path \\wsl$\Ubuntu\home\csaavedra\workspace_linux\gcp_projects
Set the GOOGLE_APPLICATION_CREDENTIALS variable permanently for your profile by editing the .bashrc file.
sudo nano ~/.bashrc
Add the export of GOOGLE_APPLICATION_CREDENTIALS at the end of the file:
export GOOGLE_APPLICATION_CREDENTIALS="/home/csaavedra/workspace_linux/gcp_projects/mybigquerytest.json"
Press Control + X, select Y for Yes to save the modified buffer (file), and press the Enter key to confirm the file name and the changes.
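From Python, you can read the variable back with os.environ; a small sketch (the helper function and its fallback message are my own):

```python
import os

def credentials_path(env=None):
    """Return the service-account key path, or None if the variable is unset."""
    env = os.environ if env is None else env
    return env.get("GOOGLE_APPLICATION_CREDENTIALS")

path = credentials_path()
if path is None:
    print("GOOGLE_APPLICATION_CREDENTIALS is not set -- reload your profile")
else:
    print("Credentials file:", path)
```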
I am reloading the profile and checking the path with the echo and printenv commands for the variable GOOGLE_APPLICATION_CREDENTIALS.
Go to the BigQuery Console at https://console.cloud.google.com/bigquery and select Add Data > Public Datasets as in the following picture:
In the Marketplace, search for USA names and select the U.S. Social Security Administration dataset.
Add the Dataset by clicking on the blue button called VIEW DATASET:
The USA Names dataset was added, but you cannot find it in the explorer projects tree.
To add it, you need to search for USA names and select Broaden search to all projects:
Then click on the pin icon next to the project name bigquery-public-data; it appears when you move your mouse close to the three dots.
Now you can see the USA Names in your tree of projects:
The next step is to open an editor window and execute the following SQL query:
SELECT name, SUM(number) AS total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
We will use this result to check that the Python script is working:
After downloading and installing Visual Studio Code from https://code.visualstudio.com/, go to Extensions and install:
- Python Extension
- Remote – WSL Extension
As shown in the following image:
Then click on the green icon in the bottom-left corner with the greater-than and less-than symbols (><), and choose New WSL Window to open a new instance of Visual Studio Code connected remotely to WSL.
You can verify the connection by the label in the bottom-left corner, which now shows the text WSL: Ubuntu.
Now open the folder in WSL by going to File > Open Folder:
Select the path /home/csaavedra/workspace_linux/gcp_projects/ and press the OK button.
We can find the JSON KEY file there. Now create a new file by right-clicking in the explorer and naming it bigquery-usanames-test.py
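The script body can look like the following sketch. It assumes the google-cloud-bigquery package installed earlier and the GOOGLE_APPLICATION_CREDENTIALS variable set, and runs the same Texas query we tested in the console (the guard at the bottom is my own addition so the script prints a hint instead of failing when the variable is missing):

```python
# bigquery-usanames-test.py
import os

# Same query we ran in the BigQuery console.
QUERY = """
    SELECT name, SUM(number) AS total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""

def main():
    # Imported here so a missing package fails with a clear error message.
    from google.cloud import bigquery

    client = bigquery.Client()  # reads GOOGLE_APPLICATION_CREDENTIALS
    for row in client.query(QUERY).result():
        print(f"{row.name}: {row.total_people}")

if __name__ == "__main__":
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        main()
    else:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set; see the Troubleshooting section.")
```

Running it from the integrated WSL terminal with python3 bigquery-usanames-test.py should print the same names and totals we saw in the BigQuery console.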
No module named google
If you get the error No module named google, check the installation of the Google Cloud library with the command:
pip install --upgrade google-cloud-bigquery
Could not automatically determine credentials
If you get the error: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
The problem is related to the GOOGLE_APPLICATION_CREDENTIALS export as an environment variable.
You can check from Python by printing os.environ, or run printenv in the terminal, to see whether the variable is returned.
If not, you can export the variable again or reload the profile with source ~/.bashrc and verify with printenv:
export GOOGLE_APPLICATION_CREDENTIALS="/home/csaavedra/workspace_linux/gcp_projects/mybigquerytest.json"
source ~/.bashrc
Also, you can run exit in the Visual Studio Code terminal to force the session to reload with the environment variable.
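As a last resort, you can skip the environment variable entirely and point the client at the key file in code. A minimal sketch, assuming the same key path used earlier in this post:

```python
def make_client(key_path="/home/csaavedra/workspace_linux/gcp_projects/mybigquerytest.json"):
    """Build a BigQuery client from an explicit service-account key file."""
    # Local imports so the error is clear if google-cloud-bigquery is missing.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(key_path)
    return bigquery.Client(credentials=credentials, project=credentials.project_id)
```

This bypasses GOOGLE_APPLICATION_CREDENTIALS, so it works even in sessions where the profile was not reloaded.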
To finish, I am grateful to Chris von Csefalvay, who gave me the inspiration to keep writing again.
Connecting Python to Google Cloud Platform (GCP) services is easy using the API Client and a Service Account. After this first step, you can build more complex automation, like consuming a hand-crafted Excel file with Python Pandas, loading and combining the data in Google BigQuery, and refreshing a Tableau dashboard.