Distributed Cache: Solving Network Latency in a Free-Tier Deployment
Abstract
Following infrastructure reliability issues with a self-managed local Virtual Private Server (VPS), a monolithic Golang application was migrated to Google Cloud Platform (GCP) Cloud Run to leverage a serverless architecture. However, utilizing the GCP free tier introduced significant network latency, as the deployment is restricted to the US-Central1 region—geographically distant from the target users in Southeast Asia. While provisioning paid regional resources easily resolves this, this article explores a cost-effective alternative. It demonstrates how to leverage Cloudflare Workers as a globally distributed edge computing and caching layer. By executing cache logic at the nearest Point of Presence (PoP), this architecture mitigates latency and maintains a zero-cost infrastructure, offering a viable deployment strategy for developers relying on free-tier cloud services.
Background
I have a project that was previously deployed on a local virtual private server. I had already set up Infrastructure as Code (IaC) to handle security and proxying, which included SSH private key-only access, port-knocking, and iptables firewalls. For two months, it ran perfectly.
However, when the setup suddenly broke, all that effort left me second-guessing my own configuration. I couldn't SSH into the server, and the application went down. After investigating, the apparent culprit was the provider's router blocking my server's outbound traffic.
I contacted the provider, but we found no solution. I didn't want to play the blame game, and I realized my self-managed setup wasn't exactly a silver bullet either, so I decided it was time to move to a more reliable infrastructure provider. Cost is the first concern when moving to a higher-tier provider, and I also no longer wanted to manage servers for this project, so I looked for a serverless platform with a free tier.
That left me with two main options: Cloudflare Workers or GCP Cloud Run. Since my project is a full-stack, single-binary Golang application, migrating it entirely to Cloudflare Workers would have caused far too many headaches, so GCP Cloud Run became the stronger candidate.
I successfully deployed the application on the GCP Cloud Run free tier, but a new issue arose: network latency. The previous deployment ran in the same region as my client, but the GCP free tier is restricted to the US-Central1 region, while we are located in Southeast Asia.
I knew the easiest solution was simply to ask my client to provision their own GCP resources in a Southeast Asia region. They agreed, and the immediate problem was solved.
So, what is this article for? I am writing this to elaborate on my exploration of a "what-if" scenario: what if that simple solution hadn't been accepted? My exploration led me to a different architecture. Since I had initially considered Cloudflare Workers as a total solution, I realized I could instead just use Cloudflare as a cache layer and regulator in front of GCP. Because Cloudflare Workers execute at the nearest edge PoP (Point of Presence) by default, this approach could bypass the latency issue while still utilizing free-tier services—though it does mean managing more moving parts.
System Architecture
How it works
The core idea is to intercept every incoming request at the edge before it even reaches the origin server in the US. Here is the step-by-step breakdown of the flow:
- The Edge Intercept: When a user (e.g., in Southeast Asia) accesses the application, their request is routed to the nearest Cloudflare Point of Presence (PoP). The Cloudflare Worker intercepts this request immediately.
- Cache Lookup: The Worker checks Cloudflare's cache at that PoP. Each PoP keeps its own cache rather than sharing one global copy, so the first request for a resource through a given PoP is a miss. If the requested data is already cached and still valid (a cache hit), the Worker serves the response directly to the user. This typically takes only a few milliseconds, effectively eliminating the round trip to the US-Central1 region.
- Origin Fetch (Cache Miss): If the data is not in the cache, the Worker acts as a proxy and forwards the request to the GCP Cloud Run instance.
- Data Processing: The Cloud Run instance processes the request, communicating with the Neon serverless database as needed, and returns the generated response back to the Worker.
- Caching and Response: The Worker receives the response, stores it in the cache for future requests, and finally serves it to the user.
By shifting the caching layer to the edge, the application feels fast and responsive, regardless of where the origin server is physically located.
The Benefits
- Zero Infrastructure Cost: By carefully combining the free tiers of Cloudflare (Workers/Cache), GCP (Cloud Run), and Neon (Serverless Postgres), you can run a highly available, globally distributed application at no cost, as long as usage stays within each provider's free-tier limits.
- Drastically Reduced Latency: Static assets and cacheable dynamic responses are served from a data center near the user, providing a snappy experience.
- No Server Maintenance: Say goodbye to SSH keys, port-knocking, and iptables. Because every component is a managed service, there is no underlying operating system for you to patch or secure.
The Trade-offs
Of course, there is no such thing as a free lunch. This setup comes with its own set of challenges:
- Architectural Complexity: What used to be a simple, single Go binary deployed on a VPS is now a distributed system spanning three different cloud providers. Debugging issues requires checking logs across multiple dashboards.
- Cache Invalidation: The age-old computer science problem. Serving dynamic content requires a robust strategy for cache invalidation. You have to carefully manage Cache-Control headers and potentially implement webhooks to purge the Cloudflare cache when data changes in the database.
- Cold Starts: Both GCP Cloud Run and Neon Database scale to zero when idle. If your application doesn't receive traffic for a while, the first user to trigger a cache miss will experience the combined cold start latency of both services.
Conclusion
While my initial client problem was resolved simply by changing the deployment region (and accepting the associated costs), this exploration demonstrates that network latency on free-tier services is not an insurmountable hurdle.
By strategically placing a distributed edge cache in front of a centralized serverless backend, you can achieve global performance on a shoestring budget. It is a powerful pattern for personal projects, MVPs, or any scenario where budget constraints outweigh architectural simplicity. It proves that with the right combination of tools, we can engineer our way around physical limitations.