Privacy-first analytics: how hard can it be?
I wanted to know if anyone actually reads this site. That's the main thing right now - am I writing into the void or not? But beyond that, I'd also like to understand the traffic better - which posts get the most views, where visitors come from, what devices they use. No tracking individuals, no behavioral profiling, no selling data to advertisers. Just enough data to know what's working and what isn't.
The plan was straightforward: build a small service, embed something on the page, done by the weekend. What could go wrong?
Two hours and a reality check
Roughly two hours into reading about what qualifies as PII - Personally Identifiable Information - I realized this "weekend project" was going to take a bit longer. PII is any data that can be used to identify a specific individual, either on its own or combined with other information. IP addresses and device fingerprints are obvious examples, but what surprised me is how combinations of seemingly harmless data points - screen resolution + timezone + language - can form a unique profile.
Under the GDPR, the definition is even broader - "personal data" covers any information relating to an identifiable natural person. And an IP address? The Court of Justice of the EU has ruled it counts as personal data. Suddenly my "just log a few things" approach started looking a lot more complicated.
This is very much a learning-by-doing situation. I didn't start with a legal textbook - I started with "I want a number" and worked backwards from there. And the deeper I dug, the clearer it became that this is part one of what's going to be a longer series. The first chapter of a privacy-meets-analytics journey.
So many ways to not use JavaScript
One thing I knew from the start: I want this site to stay clean HTML and CSS. No JavaScript. So the usual analytics scripts were immediately off the table - no Google Analytics snippet, no Plausible JS tag, nothing that runs in the browser.
That constraint led me down a research path I didn't expect. It turns out there are quite a few ways to track page views without any client-side scripting:
Server-side middleware - if you control the server, you can log every request in Flask's after_request hook. It sees 100% of traffic, adds negligible overhead, and is invisible to ad-blockers. The best option if you own the stack.
Server log analysis - parsing Nginx or Gunicorn access logs with tools like GoAccess. Same visibility as middleware but requires a post-processing pipeline and careful handling of raw IPs in the logs.
Redirect tracking - routing all links through a /r?url=... endpoint that logs the click before redirecting. Only tracks link clicks though, not page views.
CSS-based tracking - using background-image URLs in CSS to trigger server requests on page render. Creative, but unreliable across browsers and easy to block.
CSS media query tracking - different URLs in @media blocks to detect device type or color scheme preference. Useful as a supplement, not a standalone solution.
Tracking pixel - a tiny invisible image that triggers a server request when the page loads. Works without JavaScript, works with static hosting, and is the approach that resonated with me the most.
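Of these, the middleware option is the easiest to make concrete. A minimal sketch, assuming a Flask app (the in-memory page_views dict is purely illustrative; a real setup would persist counts and apply the same PII-stripping the rest of this post talks about):

```python
from flask import Flask, request

app = Flask(__name__)
page_views = {}  # path -> view count; illustration only, not persistent

@app.after_request
def count_view(response):
    # Count only successful HTML responses, skipping assets and errors
    if response.status_code == 200 and response.mimetype == "text/html":
        page_views[request.path] = page_views.get(request.path, 0) + 1
    return response

@app.route("/")
def index():
    return "<h1>Hello</h1>"
```

With this in place every request to the app is counted without any client-side code, and there is nothing for an ad-blocker to block.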
The pixel approach
The idea is deceptively simple. You embed a 1x1 transparent GIF on every page. When a browser loads the page, it requests that image from your server. Your server logs the request - what page was visited, when, from what kind of device - and returns the tiny image. The visitor sees nothing. No scripts run. No cookies are set.
In pseudocode, the server-side flow looks roughly like this:
on request for /pixel.gif:
    if DNT or Sec-GPC header is set:
        return the GIF, log nothing
    if the request looks like a bot:
        return the GIF, log nothing
    page = query parameter "page"
    referrer = domain extracted from the Referer header
    device = User-Agent classified as mobile/desktop/tablet
    language = first two letters of Accept-Language
    country = GeoIP lookup on the IP, then discard the IP
    time = current date + 4-hour block
    increment counter for (page, referrer, device, language, country, time)
    return the 1x1 transparent GIF
The key is what happens to the data. The IP address is used only for a GeoIP country lookup - it's never written to disk. The raw User-Agent string is reduced to a device class and thrown away. For external referrers, only the domain name is kept - the full URL path and query string are discarded. Why? Because URLs can contain surprisingly personal information. Think search queries baked into the address bar, session tokens, email addresses in password reset links, someone's shoe size or their mother-in-law's credit card PIN. The point is - there's a lot to think about before deciding what to keep and what to throw away.
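That reduction step can be sketched in a few lines of Python. The bucketing rules below (substring checks on the User-Agent, the first two letters of Accept-Language) are my own simplifications for illustration, not a production-grade classifier:

```python
from urllib.parse import urlsplit

def minimize(referer, user_agent, accept_language):
    """Reduce raw request headers to coarse, non-identifying buckets."""
    # Referrer: keep only the domain; the path and query string
    # (which may carry search terms, tokens, etc.) are discarded
    domain = urlsplit(referer).netloc if referer else None
    # User-Agent: collapse to a device class, then forget the raw string
    ua = user_agent.lower()
    if "mobile" in ua:
        device = "mobile"
    elif "tablet" in ua or "ipad" in ua:
        device = "tablet"
    else:
        device = "desktop"
    # Accept-Language: keep only the primary two-letter code
    language = accept_language[:2].lower() if accept_language else None
    return {"referrer": domain, "device": device, "language": language}
```

Fed a referrer like https://duckduckgo.com/?q=something+private, this keeps only duckduckgo.com; whatever was in the query string never touches the disk.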
The architecture is simple: static site on one side, a small Flask app on a VPS on the other. The pixel lives on the VPS; the static site just has an <img> tag pointing to it. No cookies, no ETag, no Last-Modified - nothing that could be used as a tracking identifier. The response headers explicitly forbid caching, so every page load generates a fresh request.
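In Flask, serving the pixel with those anti-caching headers could look roughly like this (the bytes are the standard 43-byte transparent 1x1 GIF; a plain Flask Response carries no ETag or Last-Modified unless you add them yourself):

```python
from flask import Flask, Response

app = Flask(__name__)

# The standard 43-byte 1x1 transparent GIF
PIXEL = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00"
    b"\xff\xff\xff\x21\xf9\x04\x01\x00\x00\x00\x00\x2c"
    b"\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02\x44\x01\x00\x3b"
)

@app.route("/pixel.gif")
def pixel():
    resp = Response(PIXEL, mimetype="image/gif")
    # Forbid caching so every page load produces a fresh request
    resp.headers["Cache-Control"] = "no-store, no-cache, must-revalidate, max-age=0"
    resp.headers["Pragma"] = "no-cache"
    resp.headers["Expires"] = "0"
    return resp
```

The logging itself is deliberately left out here; the point of the sketch is the response side - a cacheless, identifier-free image.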
Respecting Do Not Track
Even though this system doesn't process personal data (and therefore honoring DNT isn't legally required), I want to honor it anyway. If a browser sends DNT: 1 or the newer Sec-GPC: 1 header (Global Privacy Control), the server returns the pixel without logging anything. Yes, it means losing a few percent of the data. That's a trade-off I'm comfortable with - the whole point of this project is to respect the visitor's choices.
Although - should I really skip the visit entirely? I could still count it as a page view but with nulls for all user-derived fields like country, device, or language. That way the total visit count stays accurate while still respecting the "don't profile me" intent behind the flag. Honestly, I'm not sure yet.
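The check itself is tiny. A sketch, taking a plain mapping of request headers (in Flask you'd pass request.headers):

```python
def wants_no_tracking(headers):
    # Both Do Not Track and Global Privacy Control signal opt-out with "1"
    return headers.get("DNT") == "1" or headers.get("Sec-GPC") == "1"
```

Whether a matching request gets dropped entirely or counted with nulled fields is exactly the open question above - either way, this is the branch point.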
The VPS situation
I built this site to be as static as possible. Plain HTML files, a CSS stylesheet, no server required - just drop the files on any hosting platform and you're done. But a tracking pixel needs a server to receive requests and store data. And that means I'm going to need a VPS.
I've been putting this off, but the tracking pixel is just the beginning. I have other projects in the pipeline that need to actually run somewhere to provide real value - not everything can live only as a Docker image on DockerHub or a repository on GitHub. And I'd rather not expose anything to my private home server. A cheap VPS at a European provider like Hetzner (not an ad, please don't come for me, UOKiK, hehe) is looking increasingly inevitable.
What's next
I have an idea for how to implement this so that each page gets its own dedicated image path instead of pointing to the same file with different query parameters. No ?page=/blog/post in the markup - just a clean image reference unique to each page. But that's a topic for the next post in this series.
There will be more posts about this: the implementation, the deployment, maybe even a dashboard, and the edge cases I haven't thought of yet.
If there's one thing I've taken away from this initial deep dive, it's that "measure twice, cut once" applies doubly here. The core tension of this project is maximizing the usefulness of collected data while preserving as much visitor privacy as possible. And of course, doing all of it within the bounds of the law. Those goals are fundamentally at odds, and getting the balance right takes more than a weekend of coding. It takes reading, thinking, and being honest about what you actually need versus what you could technically grab.
And so the ethical web analytics saga begins.