Serving layer
The main goal of the serving layer is to serve static assets: the tracking JavaScript libraries (covered in more detail in another part of this series) and the 1x1 pixel GIF files that act as endpoints to which the tracking library pushes the data it gathers. The library sends its JSON payload as a URL-encoded query string on the request for the GIF.
In our use case, we want to leverage existing AWS services and optimize our costs while still providing great response times. From an architectural perspective, there are many ways we could set up this data-gathering endpoint. Amazon CloudFront is a CDN that currently has more than 90 edge locations worldwide, so it offers lower latencies than classical web servers or APIs that are deployed in only one or a few regions.
It also has a very generous free tier (1 TB of outgoing traffic and 10M requests), and its real-time logs feature is a very cost-effective way ($0.01 for every 1M log lines) to set up such an endpoint: we just store a 1x1px GIF with appropriate caching headers, and the JavaScript tracking library sends its payload to that GIF as an encoded query string.
CloudFront can use S3 as a so-called origin (the place assets are loaded from if they aren’t yet in the edge caches), and that’s where the static asset data will be located. Between the CloudFront distribution and the S3 bucket, an Origin Access Identity will be created. It lets CloudFront read from the bucket securely, so the S3 bucket does not need to be publicly accessible.
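As a rough illustration of that setup, the following is a minimal CloudFormation sketch of a private bucket that only the Origin Access Identity may read from. The resource names AnalyticsAssetsBucket, AnalyticsOriginAccessIdentity and AnalyticsAssetsBucketPolicy are assumptions made for this example and are not taken from the actual project; the real template may differ.

```yaml
# Sketch only: all resource names below are hypothetical.
AnalyticsAssetsBucket:
  Type: AWS::S3::Bucket
  Properties:
    # Keep the bucket private; only CloudFront should read from it
    PublicAccessBlockConfiguration:
      BlockPublicAcls: true
      BlockPublicPolicy: true
      IgnorePublicAcls: true
      RestrictPublicBuckets: true

AnalyticsOriginAccessIdentity:
  Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
  Properties:
    CloudFrontOriginAccessIdentityConfig:
      Comment: 'OAI for the analytics assets bucket'

AnalyticsAssetsBucketPolicy:
  Type: AWS::S3::BucketPolicy
  Properties:
    Bucket: !Ref AnalyticsAssetsBucket
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        # Allow read access for the OAI only
        - Effect: Allow
          Action: s3:GetObject
          Resource: !Join ['', [!GetAtt AnalyticsAssetsBucket.Arn, '/*']]
          Principal:
            CanonicalUser: !GetAtt AnalyticsOriginAccessIdentity.S3CanonicalUserId
```

The bucket policy grants s3:GetObject to the identity's canonical user only, which is what allows the public access block to stay fully enabled.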
To configure CloudFront real-time logs that contain the necessary information, a RealtimeLogConfig needs to be created. This acts as “glue” between the CloudFront distribution and the Kinesis Data Stream that consumes the logs:
    CFRealtimeLogsConfig:
      Type: AWS::CloudFront::RealtimeLogConfig
      Properties:
        EndPoints:
          - StreamType: Kinesis
            KinesisStreamConfig:
              RoleArn: !GetAtt 'AnalyticsKinesisDataRole.Arn'
              StreamArn: !GetAtt 'AnalyticsKinesisStream.Arn'
        Fields:
          - timestamp
          - c-ip
          - sc-status
          - cs-uri-stem
          - cs-bytes
          - x-edge-location
          - time-taken
          - cs-user-agent
          - cs-referer
          - cs-uri-query
          - x-edge-result-type
          - asn
        Name: '${self:service}-cdn-realtime-log-config'
        # IMPORTANT: This setting makes sure we receive all the log lines; otherwise they are only sampled!
        SamplingRate: 100
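A real-time log configuration only takes effect once a cache behavior of the distribution references it. The following is a hedged sketch of what that wiring could look like. It assumes a bucket resource named AnalyticsAssetsBucket and an Origin Access Identity named AnalyticsOriginAccessIdentity exist in the same template; those names, the AnalyticsDistribution name, the origin ID and the choice of cache policy are all assumptions made for illustration.

```yaml
# Sketch only: resource names and the origin ID are hypothetical.
AnalyticsDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      Origins:
        - Id: analytics-assets
          DomainName: !GetAtt AnalyticsAssetsBucket.RegionalDomainName
          S3OriginConfig:
            OriginAccessIdentity: !Join ['', ['origin-access-identity/cloudfront/', !Ref AnalyticsOriginAccessIdentity]]
      DefaultCacheBehavior:
        TargetOriginId: analytics-assets
        ViewerProtocolPolicy: redirect-to-https
        # Assumption: the AWS managed "CachingOptimized" policy, which leaves
        # query strings out of the cache key so the GIF is served from the edge
        CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6
        # Attach the real-time log configuration to this behavior
        RealtimeLogConfigArn: !GetAtt CFRealtimeLogsConfig.Arn
```

Keeping the query string out of the cache key means every tracking request can hit the cached GIF, while the cs-uri-query log field still carries the payload to the Kinesis stream.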