Storage layer
The storage layer consists of three S3 buckets, each corresponding to a zone outlined in the reference architecture diagram:
- A raw bucket, where the raw incoming data to the Kinesis Firehose Delivery Stream is backed up (partitioned by `event_date`)
- A cleaned bucket, where the data is stored by the Kinesis Firehose Delivery Stream (partitioned by `domain_name`, `event_date` and `event_type`)
- A curated bucket, which stores:
  - the aggregated pageviews and visitors data on a daily level (partitioned by `domain_name` and `event_date`, in the `stats` prefix)
  - the aggregated pageviews and visitors data on an hourly level (partitioned by `event_date` and `event_hour`, in the `daily-stats` prefix)
  - the aggregated and filtered events (partitioned by `domain_name`, `event_date` and `event_name`, in the `events` prefix)
  - the complete historical data, in a DuckDB database found at the `duckdb/data.duckdb` key
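
To make the partitioning scheme above more concrete, here is a minimal Python sketch that builds the key prefix for a single partition. It assumes a Hive-style `column=value/` key layout, which the description does not confirm; only the partition columns and prefix names come from this section. The dataset labels (such as `curated/stats`) and the `partition_prefix` helper are hypothetical names introduced for illustration.

```python
from datetime import date

# Partition columns per dataset, in the order listed in this section.
# The dataset labels themselves are illustrative, not real bucket names.
PARTITIONS = {
    "raw": ["event_date"],
    "cleaned": ["domain_name", "event_date", "event_type"],
    "curated/stats": ["domain_name", "event_date"],
    "curated/daily-stats": ["event_date", "event_hour"],
    "curated/events": ["domain_name", "event_date", "event_name"],
}


def partition_prefix(dataset: str, **values) -> str:
    """Build a key prefix for one partition (assumed Hive-style col=value/)."""
    columns = PARTITIONS[dataset]
    missing = [c for c in columns if c not in values]
    if missing:
        raise ValueError(f"missing partition values: {missing}")
    parts = [f"{c}={values[c]}" for c in columns]
    # Only the curated datasets sit under a named prefix inside their bucket.
    _, _, prefix = dataset.partition("/")
    return "/".join(([prefix] if prefix else []) + parts) + "/"


daily = partition_prefix(
    "curated/stats", domain_name="example.com", event_date=date(2024, 1, 15)
)
print(daily)
```

If the actual keys use a different layout (for example plain values without `column=`), only the line that formats `parts` would need to change.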