GCP Goodies Part 9 — Time series data handling and visualization
This time we are going to use a couple of GCP services to build a report. The main point of this tutorial is to use BigQuery with Data Studio, but to make things a little more interesting and challenging we are going to push data continuously to the database using a simple web application.
Setup Overview
The report itself will be built with Google Data Studio, but the data for the report will travel from a CSV file read at random positions by a web application running on a Kubernetes cluster. The web application will push the data to a MySQL database and that data will be available to BigQuery. Data read by BigQuery will become the source of truth for the report. Throughout this tutorial you will see how easy it is to create this flow and how powerful Data Studio with BigQuery can become.
Creating GCP Infrastructure
As usual, we are going to create a Kubernetes cluster where we will deploy our web application. The sources for the application, which you can build and deploy yourself, are available as part of the whole GCP Goodies series here.
Open up a console and navigate to the part-9 directory. We need to set some variables first, like the name of the cluster to be created or the zone where we want to create it.
Important note: select your zone wisely. For the cluster itself the zone is not that important, but for the SQL instance it is: for some regions it is not possible to use BigQuery federated queries. In other words, creating the SQL instance in the wrong region can make it impossible to use BigQuery with it.
NAME=stackdriver-test-9
ZONE=us-west2-a
gcloud auth login
gcloud config set compute/zone $ZONE
gcloud config set project softwaremill-playground-2
Create a 2-node cluster:
gcloud container clusters create $NAME --num-nodes=2 --scopes=https://www.googleapis.com/auth/cloud-platform
Now it’s time to create our SQL instance. Navigate to Storage -> SQL and select Create Instance.
Set up the details like the name, instance type and region, and leave the rest with defaults. Make sure that the region you choose is supported by the BigQuery Cloud SQL federated query mechanism, as mentioned earlier; the list of supported BigQuery locations is available at https://cloud.google.com/bigquery/docs/locations. It would be best if you choose the same region/zone as you have chosen for the cluster in the previous step.
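If you prefer the command line, the instance can also be created with gcloud. The instance name, database version, tier and region below are only examples (the later queries in this post assume an instance called datareporting in us-west2), so adjust them to your setup:
gcloud sql instances create datareporting --database-version=MYSQL_5_7 --tier=db-n1-standard-1 --region=us-west2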
Once the instance gets created, click on its name and navigate to the Users tab to create a new user:
Write down the Username and Password, we will need them later when creating secrets on the Kubernetes cluster.
The next step is to create the database itself. Navigate to the Databases tab and click the Create Database button:
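Both of these console steps have command-line equivalents as well; the user name, password and database name below are placeholders, so substitute the values you want to use:
gcloud sql users create some_username --instance=datareporting --password=some_password
gcloud sql databases create some_database --instance=datareporting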
Now we need to create a service account which we will use to connect from our pod on GKE to the SQL instance. Navigate to APIs & Services -> Credentials and click on Create Credentials -> Service Account Key. Select New Service Account, add a name and select the role to be Cloud SQL Client or Cloud SQL Admin. Click Create to download the credentials json file.
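If you want to script this step instead of clicking through the console, a service account with the Cloud SQL Client role and a JSON key can also be created with gcloud; the account name sql-client is only an example:
gcloud iam service-accounts create sql-client --display-name="Cloud SQL client"
gcloud projects add-iam-policy-binding softwaremill-playground-2 --member="serviceAccount:sql-client@softwaremill-playground-2.iam.gserviceaccount.com" --role="roles/cloudsql.client"
gcloud iam service-accounts keys create credentials.json --iam-account=sql-client@softwaremill-playground-2.iam.gserviceaccount.com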
Data set
To make things a little bit easier on the web application side we are going to use an already available time series data set. This will allow us to build graphs from data which makes some sense and is not completely randomly generated. Our data set is called the Occupancy Detection Dataset and it describes measurements of a room; used with machine learning models, the objective is to predict whether or not the room is occupied. There are 20,560 one-minute observations taken over the period of a few weeks, with attributes like temperature, light, co2, humidity etc.
The source for the data is credited to Luis Candanedo from UMONS.
In our web application, the scheduler pings an Akka actor to push a randomly picked row from this data set to the database:
This way we can use the Refresh functionality to see how the data changes over time while building the report.
Download the data set and extract it to some location on your local machine. We will need to push it later to our Kubernetes cluster for use by the web application.
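To get a feel for what the application will be reading, you can peek at the first few rows of the extracted file. The data set description lists columns such as date, Temperature, Humidity, Light, CO2, HumidityRatio and Occupancy, though the exact header may differ slightly between the files in the archive:
head -n 3 datatraining.txt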
Releasing the application
At this point you should have the Kubernetes cluster and the Cloud SQL instance running, the database and database user created, and the Google credentials file for Cloud SQL and the data set downloaded. It’s time to play with the kubectl command.
gcloud compute instances list
gcloud container clusters get-credentials $NAME
kubectl get po
For our Kubernetes pod to be able to connect to the Cloud SQL instance we need to create two separate secrets: one holding the credentials to the database and one storing the credentials.json file for our service account.
Create a file with the following content (changing the values to the ones you have used when creating the user and the database earlier)
username=some_username
password=some_password
(there is a file called credentials.txt inside the conf directory of the part-9 project)
Create secrets on the cluster itself:
kubectl create secret generic cloudsql-instance-credentials --from-file=credentials.json=/path/to/your/credentials.json
kubectl create secret generic cloudsql-db-credentials --from-env-file=./conf/credentials.txt
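You can quickly verify that both secrets landed on the cluster before deploying:
kubectl get secret cloudsql-instance-credentials cloudsql-db-credentials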
Get the database connection string:
gcloud beta sql instances describe datareporting | grep connection
and change kubernetes-manifest.yaml under the command section, e.g.:
command: ["/cloud_sql_proxy",
  # -instances takes the connectionName from the describe output (PROJECT:REGION:INSTANCE) followed by =tcp:3306
  "-instances=softwaremill-playground-2:us-west1:datastorage=tcp:3306",
Create a config map to store our data set; the web application reads it at a random position every second and inserts that row into the database:
kubectl create configmap config --from-file /path/to/your/datatraining.txt
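A quick sanity check that the data made it into the config map:
kubectl describe configmap config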
Last, change the database name in application.conf to the one you have created earlier.
Build the application with sbt clean release and note the version generated. Use that version to change the value of the image path property in kubernetes-manifest.yaml.
Release the app:
kubectl create -f conf/kubernetes-manifest.yaml
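Once the manifest is applied you can check that the pod is up and follow its logs; the pod and container names below are placeholders, take the real ones from the get po output and from your manifest:
kubectl get po
kubectl logs -f some-pod-name -c some-container-name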
Querying data with BigQuery
Now the fun part begins. We are going to connect BigQuery with our Cloud SQL instance. To do that we will use the federated query functionality of BigQuery to run queries against our SQL instance.
Navigate to Big Data -> BigQuery, then ADD DATA and Create Connection.
We are going to use Cloud SQL Federated Queries to play with our data set.
Fill up the connection details:
The instance id will be the same as the one used in the kubernetes-manifest file.
Select your external connection, click the Query Connection button and an example BigQuery SQL query will be populated for you. Click Run.
You can query all the records stored so far in the roomdata table with a simple:
SELECT * FROM EXTERNAL_QUERY("softwaremill-playground-2.us-west2.datareporting", "SELECT * FROM roomdata;");
You should see the records available in BigQuery query results pane:
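Since the inner query is executed by Cloud SQL itself, you can also push aggregations down to MySQL and only return the summarized rows to BigQuery. The column names used below (date, temperature) are assumed to mirror the data set, so adjust them to whatever schema the web application created:
SELECT * FROM EXTERNAL_QUERY("softwaremill-playground-2.us-west2.datareporting", "SELECT DATE_FORMAT(date, '%Y-%m-%d %H:00') AS hour, AVG(temperature) AS avg_temperature FROM roomdata GROUP BY hour ORDER BY hour;");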
Now we can start creating our first report. Click the Explore with Data Studio button and start adding charts:
The first chart will be a simple line graph; pick it from the Add a chart drop-down and set the dimensions like the ones below:
You should see lines for each metric being added to the graph as you follow along adding the metrics.
There is a ton of functionality available out of the box in charts built with Data Studio: you can zoom in on a date range, select the specific metrics you want to see and much more. Play around with it to see how capable Data Studio charts can be.
The next kind of item we are going to add is a Gauge. Gauge is a community widget, so you select it using the drop-down with the boxes icon to the right of the Add a chart selection. Pick one, change its size on the dashboard and set the dimensions as follows:
Just to make things a little more interesting, the last item we are going to add will be a table. Pick one from the Add a chart drop-down selection and fill in the metrics and dimensions as follows:
You can set the number of rows per page, create a summary row, sort the data etc.
The final version of our Data Studio report looks like the following:
What’s interesting is that selecting a single value in the chart causes the rest of the widgets to display only that value:
Similarly, selecting a single point in our table causes only that point to be displayed on our chart and gauges:
Once you are happy with what you’ve got on the dashboard, you can create a report out of Data Studio and share it:
The report creation functionality allows you to add some additional text and other widgets:
Click View:
When you finish polishing up your report you can share it. Again there are a few options to choose from, similar to the standard Google Docs functionality:
In this part of the GCP Goodies tutorial we went from building a simple web application which generates some data, through processing that data with BigQuery, to analyzing it and creating a report with Data Studio.
A flow like this could be an end-to-end process for some companies. It shows how powerful Google Cloud Platform with its plethora of services can be. With only a GCP console and less than an hour, you can often create a full-blown solution that gathers, processes and visualizes your data.