History sniffing is a class of web vulnerabilities and attacks that allow a website to track a user's web browsing history by determining which websites the user has and has not visited. It works by exploiting long-standing information leakage issues inherent to the design of the web platform, the best known of which is detecting CSS attribute changes in links that the user has already visited.

Although it has been known since 2002, history sniffing is still considered an unsolved problem. In 2010, researchers revealed that multiple high-profile websites had used history sniffing to identify and track users. Shortly afterwards, Mozilla and the other major browser vendors implemented defences against history sniffing. However, later research has shown that these mitigations are ineffective against specific variants of the attack, and that history sniffing can still occur via visited links and newer browser features.

Background

Early browsers such as Mosaic and Netscape Navigator were built on the model of the web being a set of statically linked documents known as pages. In this model, it made sense for the user to know which documents they had previously visited and which they had not, regardless of which document was referring to them.[1] Mosaic, one of the earliest graphical web browsers, used purple links to show that a page had been visited and blue links to show pages that had not been visited.[2][3] This convention persisted and was subsequently adopted by all modern web browsers.[4]

Over the years, the web evolved from its original model of static content towards more dynamic content. In 1995, employees at Netscape added a scripting language, JavaScript, to the company's flagship web browser, Netscape Navigator. This addition allowed page authors to add interactivity to a web page by executing JavaScript programs as part of the rendering process.[5][6] However, it also created a new security problem: these JavaScript programs could access each other's execution context and sensitive information about the user. As a result, shortly afterwards, Netscape Navigator introduced the same-origin policy. This security measure prevented JavaScript from arbitrarily accessing data in a different web page's execution context.[7] However, while the same-origin policy was subsequently extended to cover a large variety of features introduced before its existence, it was never extended to cover hyperlinks, since doing so was perceived to hurt the user's ability to browse the web.[4] This seemingly innocuous omission gave rise to one of the earliest and best-known forms of history sniffing on the web.[8]
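The same-origin check described above can be illustrated with a minimal sketch. It assumes the modern definition of an origin as the combination of scheme, host and port, which is not necessarily how Netscape Navigator implemented the policy; the function names and example URLs are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so that "https://example.com" and
# "https://example.com:443" compare as the same origin.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    """A script from url_a may read data from url_b only if this holds."""
    return origin_of(url_a) == origin_of(url_b)
```

Under this sketch, `same_origin("https://example.com/a", "https://example.com:443/b")` holds, while a page on `http://example.com/` or `https://shop.example.com/` counts as a different origin. Link styling was never subject to a check of this kind, which is the gap that history sniffing exploits.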

History

Figure: By extracting the colour of certain links, a website can access personally identifiable information. In the pictured example, the website could infer that the user might be interested in leukemia, a form of blood cancer.

One of the first publicly disclosed reports of a history sniffing exploit was made by Andrew Clover from Purdue University in a mailing list post on BUGTRAQ in 2002. The post detailed how a malicious website could use JavaScript to determine whether a given link was of a specific colour, thus revealing whether the link had been previously visited.[9] While this was initially thought to be a theoretical exploit with little real-world value, later research by Jang et al. in 2010 revealed that high-profile websites were using this technique in the wild to reveal user browsing data.[10] As a result, multiple lawsuits alleging a violation of the Computer Fraud and Abuse Act of 1986 were filed against the websites found to have used history sniffing.[8]

In 2010, L. David Baron from Mozilla Corporation developed a defence against the attack that all major browsers would later adopt. The defence restricted which CSS properties could be used to style visited links. The ability to add background images and CSS transitions to links was disallowed. Additionally, visited links would appear identical to unvisited links from a script's point of view: JavaScript application programming interfaces (APIs) that allow a website to query the colour of specific elements would return the same values for a visited link as for a non-visited link. This ensured that malicious websites could not infer a person's browsing history simply by querying colour changes.[11]
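The effect of the defence described above can be shown with a toy model. This is a sketch of the idea only, not browser code: the `Link` class, the function names and the colour values are all hypothetical, chosen to mirror the blue and purple link convention.

```python
from dataclasses import dataclass
from typing import Callable, List

UNVISITED_COLOUR = "rgb(0, 0, 238)"   # illustrative blue
VISITED_COLOUR = "rgb(85, 26, 139)"   # illustrative purple


@dataclass(frozen=True)
class Link:
    url: str
    visited: bool


def rendered_colour(link: Link) -> str:
    """Colour painted on screen; the user sees this in both models."""
    return VISITED_COLOUR if link.visited else UNVISITED_COLOUR


def computed_colour_without_defence(link: Link) -> str:
    """Before the defence: scripts are told the rendered colour."""
    return rendered_colour(link)


def computed_colour_with_defence(link: Link) -> str:
    """With the defence: scripts are always told the unvisited colour."""
    return UNVISITED_COLOUR


def sniff(links: List[Link], computed_colour: Callable[[Link], str]) -> List[str]:
    """URLs that a script querying link colours would classify as visited."""
    return [link.url for link in links if computed_colour(link) == VISITED_COLOUR]
```

In this model, `sniff` recovers the visited URLs when given `computed_colour_without_defence` and returns an empty list when given `computed_colour_with_defence`, while `rendered_colour` is unchanged. The user still sees purple links, but scripts cannot observe the difference directly.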

In 2011, research by then-Stanford graduate student Jonathan Mayer found that advertising company Epic Marketplace Inc. had used history sniffing to collect information about the browsing history of users across the web.[12][13] A subsequent investigation by the Federal Trade Commission (FTC) revealed that Epic Marketplace had used history sniffing code as a part of advertisements in over 24,000 web domains, including ESPN and Papa John's. The JavaScript code allowed Epic Marketplace to track whether a user had visited any of over 54,000 domains.[14][15] Epic Marketplace then used the resulting data to categorize users into specific groups and serve advertisements based on the websites the user had visited. As a result of this investigation, the FTC barred Epic Marketplace Inc. from using history sniffing, under an order lasting twenty years, and ordered the company to permanently delete the data it had collected.[16][15]

Threat model

The threat model of history sniffing relies on the adversary being able to direct the victim to a malicious website entirely or partially under the adversary's control. The adversary can accomplish this by compromising a previously benign web page, by using phishing to lure the user to a web page that lets the adversary load arbitrary code, or by using a malicious advertisement on an otherwise safe web page.[8][17] While most history sniffing attacks do not require user interaction, specific variants need users to interact with particular elements, which can be disguised as buttons, browser games, CAPTCHAs, and other such elements.[4]

Modern variants

Despite being partially mitigated in 2010, history sniffing is still considered an unsolved problem.[8] In 2011, researchers at Carnegie Mellon University showed that while the defences proposed by Mozilla were sufficient to prevent most non-interactive attacks, such as those found by Jang et al., they were ineffective against interactive attacks. By showing users overlaid letters, numbers and patterns that would only reveal themselves if the user had visited a specific website, the researchers were able to trick 307 participants into potentially revealing their browsing history via history sniffing. The tasks were presented as pattern-solving problems, chess games and CAPTCHAs.[18][4]

In 2018, researchers at the University of California, San Diego demonstrated timing attacks that could bypass the mitigations introduced by Mozilla. By abusing the CSS Paint API (which allows developers to draw a background image programmatically) and targeting the browser's bytecode cache, the researchers were able to measure how long it took to paint specific links. From these measurements, they derived probabilistic techniques for identifying visited websites.[19][20]
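The inference step behind such timing attacks can be sketched as a simple classifier. This is not the researchers' method: it assumes only that visited and unvisited links produce measurably different timing distributions, and that the attacker has calibration samples for a link known to be visited and one known to be unvisited. The function name and all timing values used with it are hypothetical.

```python
from statistics import median
from typing import Sequence


def classify_by_timing(
    target_ms: Sequence[float],
    visited_ref_ms: Sequence[float],
    unvisited_ref_ms: Sequence[float],
) -> str:
    """Label the target by whichever reference median it lies closer to.

    Medians are used rather than means because individual timing
    samples are noisy and prone to outliers.
    """
    target = median(target_ms)
    distance_to_visited = abs(target - median(visited_ref_ms))
    distance_to_unvisited = abs(target - median(unvisited_ref_ms))
    return "visited" if distance_to_visited < distance_to_unvisited else "unvisited"
```

With synthetic reference samples centred near 4 ms (visited) and 1 ms (unvisited), a target whose samples cluster around 4 ms is labelled "visited". Because the result depends on noisy measurements, it is a probabilistic guess rather than a certain answer.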

Since 2019, multiple history sniffing attacks have been found that target newer browser features. In 2020, Sanchez-Rola et al. demonstrated that a website could perform history sniffing by measuring the time a server takes to respond to a request with HTTP cookies and comparing it to the time the server takes to respond to a request without cookies.[21] In 2023, Ali et al. demonstrated that newly introduced browser features could also be abused to perform history sniffing. One particularly notable example was the Private Tokens API, a recently introduced feature of Google's Privacy Sandbox initiative intended to prevent user tracking, which could allow malicious actors to exfiltrate users' browsing data using techniques similar to those used for cross-site leak attacks.[22]
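The comparison at the heart of the cookie-based attack can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the paper: it assumes the attacker has already collected response times for requests sent with and without cookies, and that a gap larger than a noise threshold indicates the server processed stored cookies. The function name, the threshold and the sample values are hypothetical.

```python
from statistics import median
from typing import Sequence


def cookies_affect_timing(
    with_cookies_ms: Sequence[float],
    without_cookies_ms: Sequence[float],
    noise_threshold_ms: float = 5.0,  # hypothetical; would need calibration
) -> bool:
    """Return True if the two conditions differ by more than the noise level.

    If the browser holds no cookies for the site, both kinds of request
    are effectively identical, so the timings should be indistinguishable.
    A clear gap suggests cookies exist, i.e. the user visited the site.
    """
    gap = abs(median(with_cookies_ms) - median(without_cookies_ms))
    return gap > noise_threshold_ms
```

For example, medians of about 51 ms with cookies and 30 ms without would be flagged as a likely prior visit, whereas medians of 30.4 ms and 30.1 ms would not.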

References

  1. ^ "WorldWideWeb: Proposal for a HyperText Project". www.w3.org. Archived from the original on 29 June 2023. Retrieved 15 November 2023.
  2. ^ "Why are hyperlinks blue? | The Mozilla Blog". blog.mozilla.org. Archived from the original on 15 November 2023. Retrieved 15 November 2023.
  3. ^ "EMail Msg". ksi.cpsc.ucalgary.ca. Archived from the original on 15 November 2023. Retrieved 15 November 2023.
  4. ^ a b c d Weinberg, Zachary; Chen, Eric Y.; Jayaraman, Pavithra Ramesh; Jackson, Collin (2011). "I Still Know What You Visited Last Summer: Leaking Browsing History via User Interaction and Side Channel Attacks". 2011 IEEE Symposium on Security and Privacy. IEEE. pp. 147–161. doi:10.1109/SP.2011.23. ISBN 978-1-4577-0147-4. S2CID 10662023. Archived from the original on 24 December 2022. Retrieved 30 October 2023.
  5. ^ "JavaScript 1.0 – 1995". www.webdesignmuseum.org. Archived from the original on 7 August 2020. Retrieved 19 January 2020.
  6. ^ "Welcome to Netscape Navigator Version 2.0". netscape.com. 14 June 1997. Archived from the original on 14 June 1997. Retrieved 16 February 2020.
  7. ^ "Netscape 3.0 Handbook – Advanced topics". netscape.com. Archived from the original on 8 August 2002. Retrieved 16 February 2020. Navigator version 2.02 and later automatically prevents scripts on one server from accessing properties of documents on a different server.
  8. ^ a b c d Van Goethem, Tom; Joosen, Wouter; Nikiforakis, Nick (12 October 2015). "The Clock is Still Ticking: Timing Attacks in the Modern Web". Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. CCS '15. New York, NY, USA: Association for Computing Machinery. pp. 1382–1393. doi:10.1145/2810103.2813632. ISBN 978-1-4503-3832-5. S2CID 17705638.
  9. ^ "Bugtraq: CSS visited pages disclosure". seclists.org. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  10. ^ Jang, Dongseok; Jhala, Ranjit; Lerner, Sorin; Shacham, Hovav (4 October 2010). "An empirical study of privacy-violating information flows in JavaScript web applications". Proceedings of the 17th ACM conference on Computer and communications security. CCS '10. New York, NY, USA: Association for Computing Machinery. pp. 270–283. doi:10.1145/1866307.1866339. ISBN 978-1-4503-0245-6. S2CID 10901628.
  11. ^ "privacy-related changes coming to CSS :visited – Mozilla Hacks – the Web developer blog". Mozilla Hacks – the Web developer blog. Archived from the original on 7 June 2023. Retrieved 16 November 2023.
  12. ^ "Tracking the Trackers: To Catch a History Thief". cyberlaw.stanford.edu. 19 July 2011. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  13. ^ Goodin, Dan. "Marketer taps browser flaw to see if you're pregnant". www.theregister.com. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  14. ^ "FTC Final Order Prohibits Epic Marketplace From "History Sniffing"". JD Supra. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  15. ^ a b "FTC Settlement Puts an End to "History Sniffing" by Online Advertising Network Charged With Deceptively Gathering Data on Consumers". Federal Trade Commission. 5 December 2012. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  16. ^ Gross, Grant (5 December 2012). "US FTC bars advertising firm from sniffing browser histories". Computerworld. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  17. ^ Sanchez-Rola, Iskander; Balzarotti, Davide; Santos, Igor (22 December 2020). "Cookies from the Past: Timing Server-side Request Processing Code for History Sniffing". Digital Threats: Research and Practice. 1 (4): 24:1–24:24. doi:10.1145/3419473.
  18. ^ Kikuchi, Hiroaki; Sasa, Kota; Shimizu, Yuta (2016). "Interactive History Sniffing Attack with Amida Lottery". 2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). IEEE. pp. 599–602. doi:10.1109/IMIS.2016.109. ISBN 978-1-5090-0984-8. S2CID 32216851. Archived from the original on 6 June 2018. Retrieved 30 October 2023.
  19. ^ Haskins, Caroline (2 November 2018). "Old School 'Sniffing' Attacks Can Still Reveal Your Browsing History". Vice. Retrieved 30 October 2023.
  20. ^ Smith, Michael; Disselkoen, Craig; Narayan, Shravan; Brown, Fraser; Stefan, Deian (2018). "Browser history {re:visited}". 12th USENIX Workshop on Offensive Technologies (WOOT '18). S2CID 51939166.
  21. ^ Sanchez-Rola, Iskander; Balzarotti, Davide; Santos, Igor (22 December 2020). "Cookies from the Past: Timing Server-side Request Processing Code for History Sniffing". Digital Threats: Research and Practice. 1 (4): 24:1–24:24. doi:10.1145/3419473. S2CID 229716038.
  22. ^ Ali, Mir Masood; Chitale, Binoy; Ghasemisharif, Mohammad; Kanich, Chris; Nikiforakis, Nick; Polakis, Jason (2023). "Navigating Murky Waters: Automated Browser Feature Testing for Uncovering Tracking Vectors (ABTUTV)". Proceedings 2023 Network and Distributed System Security Symposium. Reston, VA: Internet Society. doi:10.14722/ndss.2023.24072. ISBN 978-1-891562-83-9. S2CID 257502501.