Tracking Excavator: Uncovering Tracking in the Web’s Past

As users browse the web, their browsing behavior may be observed and aggregated by third-party websites (“trackers”) that they don’t visit directly. These trackers are generally embedded by host websites in the form of advertisements, social media widgets (e.g., the Facebook “Like” button), or web analytics platforms (e.g., Google Analytics).

Web tracking and its privacy implications have received much attention in recent years, but that attention arrived late in the history of the web and lacks full historical context. In this work, we conduct a longitudinal archaeological study of tracking on the web from 1996 to 2016. Our key insight is that the Internet Archive’s Wayback Machine enables a retrospective analysis of properties of the web, even though researchers did not anticipate the need to study these properties over time. We evaluate the potential and limitations of the Wayback Machine for this purpose and offer strategies to overcome several challenges we encountered in using its data to study tracking.
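To make the idea of a retrospective measurement concrete, here is a minimal sketch in Python. It is not the Tracking Excavator implementation. It relies on the Wayback Machine's public URL scheme, in which an archived capture is addressed as https://web.archive.org/web/<timestamp>/<original URL>. The helper names (wayback_url, naive_site, is_third_party) are hypothetical, and the "last two hostname labels" rule is a simplifying assumption that misclassifies suffixes such as co.uk; a real analysis would consult the Public Suffix List.

```python
from urllib.parse import urlsplit

# Public prefix under which the Wayback Machine serves archived captures.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build the archive URL for a capture of original_url.

    timestamp uses the form YYYYMMDDhhmmss; the Wayback Machine
    redirects to the capture closest to the requested time.
    """
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def naive_site(url: str) -> str:
    """Approximate a URL's site by the last two labels of its hostname.

    Assumption for illustration only: this is wrong for multi-label
    public suffixes such as "co.uk".
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url: str, request_url: str) -> bool:
    """Treat a request as third-party if its site differs from the page's."""
    return naive_site(page_url) != naive_site(request_url)
```

For example, wayback_url("http://example.com/", "20060101000000") yields https://web.archive.org/web/20060101000000/http://example.com/, and a request from a page on www.example.com to www.google-analytics.com is classified as third-party, while one to cdn.example.com is not.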
