MedicalFoam

I am not sure if I should post this here...

While working on a statistical identifier for online web presence, I found myself working with something that felt really creepy. Imagine a super cookie: a tool meant to track and analyze your online behavior. All we had was metadata gathered from the web pages visited by this super cookie. The categories for these pages came from a well-known taxonomy, the IAB categories. And, this being India, we also collected IP addresses. It felt harmless at first, but the results were far more insane than I expected.

As we began our task of connecting devices to the people behind an IP address, I realised something was up. With only device IDs tied to IAB advertising categories, we faced a daunting question: how many people were behind that IP? Surprisingly, we could answer that question with about 58% accuracy using our classifiers.

The only evidence we had was a probability: what is the chance that two devices belong to the same person? This probability was estimated from the overlap of IAB categories, a connection based on how differently we browse across devices. If you used your tablet and phone for distinct activities, the overlap was weak, and shared usage was detected with only around 55% accuracy. But that was all we needed.
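To make the "overlap of IAB categories" idea concrete, here is a minimal sketch. I can't share what we actually used, so this assumes the overlap is measured as a plain Jaccard index over the sets of categories seen on each device. The function name and the category labels are placeholders for illustration, and in practice a score like this would only be one input to a classifier, not the final probability.

```python
def category_overlap(device_a, device_b):
    """Jaccard overlap between two collections of IAB category labels.

    Assumption: overlap = |A intersect B| / |A union B|. The real feature
    set and model are not described in this post.
    """
    a, b = set(device_a), set(device_b)
    if not a and not b:
        return 0.0  # no browsing data on either device, so no evidence
    return len(a & b) / len(a | b)


# Hypothetical category labels observed on two devices at the same IP.
phone = {"IAB1", "IAB17", "IAB19"}
tablet = {"IAB17", "IAB19", "IAB9"}

score = category_overlap(phone, tablet)
print(score)  # 2 shared categories out of 4 distinct ones
```

A score near 1 means the two devices browse the same kinds of pages. A score near 0 is the "distinct activities" case, where the signal is weak.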

With each new device, the task became easier. The more devices we analyzed, the clearer the picture of individuals at a given IP became. It's a math problem: every pair of devices gets compared, the number of possible groupings grows exponentially, and together those comparisons crafted an ever more detailed partitioning of devices into people at that IP address.

The classification question, “Are these two devices used by the same person?”, became our guiding principle, and ML models allowed us to estimate not just the number of individuals at that IP address but also the devices they owned. This sometimes failed for public Wi-Fi networks, but on home networks it mostly worked.
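A rough sketch of how pairwise answers can turn into a partition of devices into people. This is not our actual pipeline: it assumes a hypothetical `same_person_prob(a, b)` function standing in for the classifier, a made-up threshold, and simple transitive merging (if A matches B and B matches C, all three go to one person) via union-find.

```python
from itertools import combinations


def partition_devices(devices, same_person_prob, threshold=0.5):
    """Group devices into people by merging every pair the classifier links.

    `same_person_prob` and `threshold` are illustrative assumptions.
    """
    parent = {d: d for d in devices}

    def find(d):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # Compare every pair of devices seen at the IP address.
    for a, b in combinations(devices, 2):
        if same_person_prob(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in devices:
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())


# Hypothetical pairwise scores; unlisted pairs get a low default.
probs = {
    frozenset(("phone", "tablet")): 0.8,
    frozenset(("phone", "laptop")): 0.7,
}


def lookup(a, b):
    return probs.get(frozenset((a, b)), 0.1)


devices = ["phone", "tablet", "laptop", "tv"]
people = partition_devices(devices, lookup, threshold=0.6)
print(len(people), people)
```

The number of groups is the estimated head count at the IP, and each group is one person's devices. Transitive merging is the crudest choice here: one false positive can glue two people together, which is roughly why crowded public networks are harder than a home network.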

As I reflect on this experience, a chilling realization sinks in: the more data you produce, the easier it becomes to find you.

The funny part? It was just a plugin on just one device. I can't disclose which one because of the NDA.

1mo ago, 3.3K views
DigitalArray

This is insane. Is this what DS people do?

sasaka

Not all of them. This guy is kind of an exception. Generally, most DS folks work on problems related to optimisation and forecasting.
