The Dough Is in the Data

But with so much data generated and accessed today, lawyers must take care to use it correctly

IN THE DIGITAL world, software and hardware are still very important, but data is fast taking centre stage. Partly because there is now so much of it, but also because it can be analyzed (and therefore commercialized) very profitably, you will increasingly have to consider the legal, contractual and regulatory implications of your data operations. In light of that, what follows is a primer on these important 21st-century data-related issues.

The Digital Swarm

First, just a few data points on the ubiquity of data and the sheer, immense volume of it.

Consider the stream of data that a user creates when they are online. Every click they make, every site they visit, not to mention every purchase they make (and every document they view, music file they listen to, movie file they watch, etc.), leaves an indelible data print. An internet user’s click stream is vivid, and online technologies can easily capture it.

But all of this still only captures the data users know about. Consider also the millions of data items that an individual didn’t even know they were creating, courtesy of the Internet of Everything (IoE), a relatively new phenomenon fueled by the data sensors lodged in virtually everything from homes and offices to vehicles.

These data generators are overlaid with the surveillance that is recording an individual’s picture and location hundreds of times a day (and sometimes thousands of times, depending on where they live and work), and mixed with the daily harvest of data from a person’s social media activity.

A further source of data generation is the trend by makers of products to transform their offerings into services (or at least to add services to what previously was a stand-alone product). For example, the company that used to sell just fertilizer to a farmer now sells a service in which it scans the farmer’s fields from a satellite and advises on the precise mix of fertilizer to put down and in which segment of which field. As a result, crop yields are greatly increased, and the farmer’s total cost decreases because he can use less fertilizer. The result for the fertilizer supplier is that it shifts from selling a boring, low-margin commodity product to offering a critical, high-margin service, and one that is very “sticky” in terms of generating customer loyalty and reliance.

The added benefit to the fertilizer company is that it also now has a very rich data stream coming from this and thousands of other farmers. The company can now start to analyze that rich data, and provide to farmers even more services based on the data analytics that the company performs on the entire data set, comprising billions of data points each growing season.

This trend for manufacturers of products to shift to become providers of services is occurring across a broad range of industries. Consider the automobile. What was once an exclusively mechanical, physical device is quickly being fitted out with a range of tech components that allow the car company to offer a broad suite of driving-related (and some interesting non-driving-related) services. And then the vehicle itself is becoming for many (especially single millennials living in cities) a mobility service. This explains why some car companies are investing in ridesharing services. The result of both these developments is the creation, storage and usage of reams of data points that were unheard of even five years ago.

In short, the various means of harvesting data in today’s digitally enabled world produce an awful lot of information. Every day. Without exception.

Taken collectively, the world is estimated to generate about 2.5 quintillion bytes of data each day (that’s 2,500,000,000,000,000,000). And keep in mind, that is each day. This should come as no surprise, given that each day 500 million tweets are sent; 4.3 billion Facebook messages are posted; and 6 billion Google searches are performed.

Commercializing Your Data Trove

So, your R&D team, together with marketing and sales, have come to you with the news that they are ready to exploit your organization’s data repositories. Here’s what you need to do.

First, make a thorough inventory of the data so that you understand where it all comes from. Does your organization generate it all, or is some of it sourced from third parties? And very importantly, does any of the data comprise “personal information”?

This is a critical question, because personal information (PI) in Canada, Europe and other places is subject to privacy regulations. Even if the data doesn’t contain PI, you still need to perform due diligence on the data source. For instance, in certain countries geological seismic data is considered a “state secret,” and removing it from the country is a criminal offence (this actually came up in a global IT outsourcing deal, and my firm had to create a separate server site in this particular Southeast Asian country just to accommodate this data sovereignty rule).

Second, once you have done sufficient due diligence on all the data you are harvesting for your big data project/service, you need to find out whether there are any strings attached to this data. For example, if it’s PI, you must review the privacy agreement/policy under which you collected the PI data. What restrictions did you agree to when the data subjects were consenting to give you access to their data? These are critical questions, obviously.

Even if the data is not PI, there may still be some complexity around your ability to use the data. Are you collecting the raw data under a services arrangement and then aggregating and anonymizing it so you can sell insights gleaned from the aggregated data sets? This is done with greater frequency now that we are well into the era of big data, but some caution must be exercised on the following questions: how many data sets do you need before you can say the data is suitably anonymized? How narrowly can you segment the data before it loses its anonymous quality? What are the best practices in your industry for these sorts of issues?
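One way your technical team might make the segmentation question concrete is a minimum-group-size test, an idea often discussed under the name k-anonymity. The short Python sketch below is illustrative only: the threshold k, the field names and the sample records are all hypothetical, and the right threshold for your organization is a legal and industry judgment, not something the code can decide. It simply counts how many records fall into each segment and flags any segment with fewer than k contributors.

```python
from collections import Counter


def small_groups(records, segment_keys, k):
    """Return the segments that contain fewer than k records.

    records      -- list of dicts, one per contributing customer (hypothetical shape)
    segment_keys -- the fields used to slice the aggregated data
    k            -- assumed minimum number of contributors per published segment
    """
    counts = Counter(tuple(r[key] for key in segment_keys) for r in records)
    return {segment: n for segment, n in counts.items() if n < k}


# Hypothetical sample data: six farms, two regions, two crops.
records = [
    {"region": "North", "crop": "wheat"},
    {"region": "North", "crop": "wheat"},
    {"region": "North", "crop": "canola"},
    {"region": "South", "crop": "wheat"},
    {"region": "South", "crop": "wheat"},
    {"region": "South", "crop": "wheat"},
]

# Sliced by region only, every segment has at least 3 contributors.
coarse = small_groups(records, ["region"], k=3)

# Sliced by region and crop, two segments fall below the threshold.
fine = small_groups(records, ["region", "crop"], k=3)

print("Too-small segments by region:", coarse)
print("Too-small segments by region and crop:", fine)
```

In this toy data set, slicing by region alone leaves no undersized segments, while slicing by region and crop isolates a single North canola farm. That is the practical meaning of data losing its anonymous quality as the segmentation narrows. A count-based check of this kind is only a first screen; it does not by itself establish that a data set is anonymized for legal purposes.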

You’re in the Data Business Now

You have managed to clear the legal rights in the big data you wish to exploit, so you are ready to consider a host of issues at the technical product-delivery level. You have to make sure that you have state-of-the-art physical and logical (i.e., computer-based) security for the systems storing and delivering your big data. The more valuable your service, the more it will become a target of hackers, extortionists, unscrupulous data disseminators, illegal data trolls, spammers and a range of online criminals (welcome to the unfortunate, dark side of the internet). Therefore, you must take reasonable measures to protect it, particularly if it contains personal information.

You will also need a commercial agreement with your customers who use the big data service. Those customers need to agree to a number of important provisions: first, there will invariably be limits on what the customer can do with the big data. It would be very customary, for example, to not allow the client to share the big data, or even any insights gained from it, with third parties, save and except as you narrowly permit it in your contract. You have to be very vigilant to protect your data asset, and the principal way to do that is through implementing reasonable-use restrictions on the data in the customer contract.

If you are marketing any form of personal information, and assuming you have received permission from the data subjects to share the PI with a specific third party, then you need to be very mindful of the privacy laws that apply to such PI.

In Canada, that will largely be federal privacy law, but some of the provincial statutes may be relevant as well. And keep in mind that other jurisdictions may approach privacy regulation somewhat differently than Canada (and, in the case of Europe, even more stringently).

In this regard, it is important to note that in May 2018, less than a year from now, a new privacy law in Europe will come into effect. If your organization or one of your affiliates is active in Europe and you collect personal information, you have until then to prepare for the new legal regime, which will bring some new compliance challenges. One will be the right of consumers to require you to transfer their PI from your systems to another provider of services. This “data portability” right will require some system and software changes, so ideally your efforts in that regard are already well under way.

In a similar vein, the new European data protection law will provide for a “right of erasure” (sometimes known as the “right to be forgotten”). And finally, there are new rules regarding profiling and automated decisions, both of which, again, may well require some modifications in your IT systems. In short, privacy law compliance is a fairly complicated matter nowadays, given that Europe and the United States do not see eye to eye on this subject, and Canada is somewhere in between them.

In conclusion, there are certainly economic and other benefits to be derived from exploiting big data, but you have to manage this new asset and value generator with great care.

George Takach is a senior partner at McCarthy Tétrault LLP and the author of Computer Law.