When you collect data on people, you automatically inherit responsibilities for that data. In all developed countries (and in most of the world in general), there are data protection laws that govern this, and you should at least be aware of the basic principles. For example, in Europe, the fundamentals are that you put the end user first and that you ask for permission before collecting data. From that guiding baseline you can build and refine for your specific requirements, such as defining what constitutes permission (explicit versus implicit).
This chapter discusses the privacy debate, both in general terms and specifically in relation to the European Union (EU). It includes an approach for evaluating your privacy level, with best-practice tips on protecting your reputation in this area and building visitor trust. Beyond privacy considerations, it discusses how to structure your data, how to protect it from accidental (or deliberate) pollution and deletion, and how to plan for future changes in the fluid data protection landscape.
ALL ABOUT PRIVACY
Although I always try to base my decisions on good data, I am also a strong privacy advocate. Those two ideals are not mutually exclusive. Data enthusiasts are always on a quest to find more supporting data or better-quality data, but end-user privacy is a fundamental human right to be respected.1,2,3
A detailed view of differing privacy attitudes by geography can be found at the Economist,4 but in general privacy concerns vary by region as follows:
In the US, the "average person" thinking about online privacy is mainly concerned with government intrusion and bureaucratic overhead.
In Europe (more specifically the EU), the concerns tend to focus on advertisers and companies harvesting information to sell to third parties or to bombard you with spam.
In Asia, in general the concern when discussing privacy is about a "surveillance state." This is probably connected to the fact that only 6 of 23 countries in this region are democracies.5
Tracking Versus Spying: The Snowden Impact
The Edward Snowden "affair" took the existing privacy debate to a much broader audience. To the general public, the tracking of visitors for commercial reasons appears no different from government security agencies surreptitiously trawling the Internet en masse. The resultant perception is that all tracking is scary and invasive and may one day be used against you. Although I advocate that people should be more aware of their online privacy footprint, the concern in the digital analytics industry post-Snowden is that people will start to block commercial tracking techniques more actively, for example by using ad-blocker software, opting out when requested, and blocking or deleting cookies.
Regardless of whether this happens, the reality is that if you use the Internet, some degree of tracking is inevitable. After all, your IP address has to be transmitted to the destination you connect to in order for it to work. Hence there are degrees of privacy (shades of gray).
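One way to picture these degrees of privacy is to look at what happens to an IP address after it arrives. The following is a minimal Python sketch, not a description of how any particular analytics tool works: it truncates an address before storage so that it no longer points at a single device. The function name mask_ip and the prefix lengths (24 bits for IPv4, 48 bits for IPv6) are assumptions chosen for illustration.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Return the address with its host portion zeroed out."""
    ip = ipaddress.ip_address(address)
    # Assumption: keep the first 24 bits of IPv4 and the first 48 bits of IPv6.
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


print(mask_ip("203.0.113.77"))                  # 203.0.113.0
print(mask_ip("2001:db8:abcd:12:34:56:78:9a"))  # 2001:db8:abcd::
```

The full address is still transmitted, because the connection cannot work without it. What changes is how much of it is kept afterward, which is the kind of choice that moves you along the shades of gray.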
If analysts, marketers, website owners, and all the other interested data parties take care and respect end-user privacy, there are good reasons for visitors to continue to share information: it benefits both the end user (with better-performing websites and sales offers) and website owners (with an increased return on investment because of their ability to make better, more targeted decisions). This of course is circular: as ROI improves, consumers get increasingly better deals. However, if a mass evasion of tracking by end users ensues, the digital world will, by necessity, take a very large step back, reverting to the bad old days of interruption marketing.
The Privacy Debate
Online privacy is a surprisingly complex subject. Apart from the cultural differences of what defines privacy in different parts of the world, there are also different levels of privacy invasion from a visitor's point of view, ranging from "Tolerable if it provides me a benefit in return" (such as access to high-value content on your website: support areas, white papers, discount pricing, and so forth) to "Alarming. I wasn't expecting or even aware of that" and "That should not be allowed. It is a loss of my trust."
To compound matters, parts of your website may operate at different privacy levels, making it hard to establish an overall statement on your approach. Common examples include the embedding of content from elsewhere, such as social share buttons (Tweet, Like, Google+, and so forth); embedded video from YouTube, Vimeo, and other video-hosting websites; embedded third-party images, such as certifications; and third-party tools such as the DoubleClick ad network, visitor survey tools, live chat, and A/B testing tools.
Figure 7.1 illustrates the different levels of privacy with an analogy. Suppose you wanted to gather data on the impact of traffic on your community. The question you want to answer is "How busy is this road?"
From a tracking perspective, there are three privacy levels:
Green: Aggregate, non-personal data. When nothing is collected that identifies individuals or tracks individual behavior, because all data is aggregated, you have a green flag to collect all the data you want. You can do a great deal of analysis without personal information. If you want to know how busy a road is, you have no reason to collect anything at the individual, or personal, level.
Yellow: No personal data, but individuals tracked. Proceed with caution when you start tracking individuals, even though they're still anonymous. The data by definition becomes more invasive. This kind of tracking allows you to profile individuals and target them for advertising, even though you don't know who they are. Because of this, it can be argued this data is personal. There is also the fear for many users that given enough anonymous information, an individual can be identified by triangulation.
By building individual profiles, you make the data more valuable. But to many people that means either you spam them or you sell the information on for others to spam them.
Red: Personally identifiable information (PII). When you obtain personal details such as a person's name and address, you need the explicit permission of the user. Otherwise, stop. Imagine yourself in the driver's seat of one of the cars in Figure 7.1. Would you feel comfortable being tracked by surveillance cameras or tracking devices without giving your explicit permission? And if you gave your permission to be tracked only for this specific study, you would not expect to be tracked for all time, or to have the permission applied retroactively to data collected before you gave permission.
Moving from green to red, the tracking activities become more invasive (more personal), and the end-user privacy concerns grow. Legal obligations with respect to privacy increase, and more work is required to keep on top of best practices so that your visitors maintain trust in your brand.
Summary of Privacy Color-Coding
Using green, yellow, or red to indicate the level of data responsibility for your organization is applicable to all platforms, such as mobile apps, not just to the web.
Green designates methods with the lowest risk to your organization, those most trusted by your visitors. This level entails the least data responsibility and is the easiest to manage.
Yellow allows for visitor profiling, which can be interpreted as personal data because profiles can quickly get very specific. The risk to your organization and the management overhead are significant.
Red is reserved for methods that capture PII and therefore carry the highest risk to your organization. These are the methods that are least trusted by your visitors. They require the most effort with regard to data responsibility and management.
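The color-coding above can be sketched as a simple audit helper. This is a minimal Python illustration under stated assumptions, not a legal test: the field names in PII_FIELDS and INDIVIDUAL_ID_FIELDS are hypothetical examples, and a real audit would need a list that reflects what your own tools actually collect.

```python
from enum import Enum


class PrivacyLevel(Enum):
    GREEN = "green"    # aggregate, non-personal data
    YELLOW = "yellow"  # anonymous, but individuals are tracked
    RED = "red"        # personally identifiable information


# Hypothetical field names, for illustration only.
PII_FIELDS = {"name", "email", "postal_address", "phone"}
INDIVIDUAL_ID_FIELDS = {"visitor_id", "cookie_id", "device_id"}


def classify(collected_fields):
    """Return the most sensitive privacy level implied by the fields collected."""
    fields = set(collected_fields)
    if fields & PII_FIELDS:
        return PrivacyLevel.RED
    if fields & INDIVIDUAL_ID_FIELDS:
        return PrivacyLevel.YELLOW
    return PrivacyLevel.GREEN


print(classify(["page_views", "country"]))     # PrivacyLevel.GREEN
print(classify(["page_views", "visitor_id"]))  # PrivacyLevel.YELLOW
print(classify(["visitor_id", "email"]))       # PrivacyLevel.RED
```

The helper always reports the most sensitive level present. A single PII field puts the whole collection at red, however much of the remaining data is aggregate.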
EU Privacy Law
In May 2011 an EU directive called the ePrivacy Directive came into force. In the UK this was implemented in the form of the Privacy and Electronic Communications Regulations (PECR). Distilled to its fundamentals, the law's aim is to protect the privacy of individuals online from people and organizations that collect personal information about them or use behavioral targeting techniques to profile a visitor across the web.
PECR is often referred to as the "EU cookie law" online and in the press because it states, unfortunately in a quite broad and ambiguous way, that cookies cannot be set without a visitor's consent. However, the law is technology agnostic. That is, it is based on privacy levels, not on whether an HTTP cookie, or other technology not yet invented, is used. The law applies in all 27 EU member countries.
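The consent principle can be illustrated with a short Python sketch that builds a tracking cookie only when the visitor has agreed to it. This is an illustration under assumptions, not a compliance recipe: the function name tracking_cookie_header, the cookie name visitor_id, and the 30-day lifetime are all hypothetical, and how consent is obtained and recorded is outside the scope of the example.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS_IN_SECONDS = 60 * 60 * 24 * 30  # assumed lifetime, for illustration


def tracking_cookie_header(visitor_id, has_consented):
    """Return a Set-Cookie header value, or None if consent was not given."""
    if not has_consented:
        # No consent recorded: do not create the cookie at all.
        return None
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = THIRTY_DAYS_IN_SECONDS
    return cookie["visitor_id"].OutputString()


print(tracking_cookie_header("abc123", has_consented=False))  # None
print(tracking_cookie_header("abc123", has_consented=True))
```

Because the law is technology agnostic, the same gate would apply to any other storage or identification mechanism that operates at the same privacy level, not only to HTTP cookies.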
As you can imagine, the implementation of this law caused some consternation in the digital industry, where cookies have become as fundamental as HTML and JavaScript. Google Analytics and many other embedded functions rely on cookies to work. In principle, PECR is a good and much-needed law, as behavioral targeting and the abuse of private information were becoming pervasive. Collecting benign, anonymous, aggregate data, such as that provided by Google Analytics, is not the target of this...