Path analysis with GA4 data in BigQuery
Path analysis with GA4 data in BigQuery
In this article we will focus on querying the user journey paths on our website or app. This query can provide many insights, such as identifying common paths or patterns, understanding the behaviour of different user segments, and optimising user journeys. We can also gain a good understanding of the role of individual channels within the user journey to inform both budget allocation on channels and the user experience, helping to optimise marketing and increase conversions in the future.
Figure 1: General path example focussing on marketing channels
In the following query, we get the sequence of channels and timestamps per user. This can be extended to sequences of devices or filtered for paths that lead to a conversion goal. To get the timestamps of each touchpoint we will get the event timestamp of the session_start event in the first subquery. For getting the source and medium combination in the second subquery we will unnest the event parameters to capture the source and medium of the individual sessions (We have talked about how to query source and medium for individual sessions in this article ). These two tables are then joined together on the user id to bring timestamps of session starts and source/medium combinations in one table. In the last part of the query, we will build our sequences of timestamps and source/medium using an aggregation function called ‘STRING_AGG’ which takes all timestamps and source/medium combinations from a user and concatenates them into a single string sorted by ascending timestamp. See the transformation example below:
Figure 2: STRING_AGG transformation example
WITH #---get the session start timestamp t1 AS ( SELECT DISTINCT user_pseudo_id, ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id') AS session_id, event_timestamp, FROM -- Replace the following table WITH your own data `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` WHERE event_name='session_start' ), #---get the source/medium combination of each session t2 AS( SELECT DISTINCT user_pseudo_id AS user_pseudo_id_m, ( SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id') AS session_id_m, CONCAT( MAX(( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'source')), ' / ', MAX(( SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'medium'))) AS source_medium, FROM -- Replace the following table WITH your own data `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` GROUP BY 1, 2), #--- join session start timestamps and the channels of each session t3 AS( SELECT t1.*, t2.source_medium FROM t1 LEFT JOIN t2 ON t1.user_pseudo_id=t2.user_pseudo_id_m AND t1.session_id=t2.session_id_m ORDER BY 1, 2, 3) #--- aggregate per user the timestamps and touchpoints in order SELECT user_pseudo_id, STRING_AGG(CAST(event_timestamp AS STRING), ' > ' ORDER BY event_timestamp) AS timestamp, STRING_AGG(source_medium, ' > 'ORDER BY event_timestamp) AS touchpoints, FROM t3 GROUP BY 1 ORDER BY 1, 2
This report can be extended for various use cases. Here are some tips to get inspired:
1. Product category path analysis: This query can be adjusted to a product category level to learn about their user journey. For example, the number of touchpoints and the length of time from first visit to purchase. By learning about the individual journeys of the product categories, we can create more effective marketing activities at the product category level to optimise our budget planning.
Most path taken for view/conversion of product category A
2. Conversion path analysis: We can break down this report by different conversion goals such as add to cart and purchase. By analysing the specific paths that users take on their way to purchase, we can understand the common touchpoints and interactions that lead to this goal and adjust our marketing efforts accordingly, potentially delivering more conversions and revenue.
Most path taken that leads to a specific conversion goal
Happy querying!