Neha Khanna and Matthew J. Schneider
Guest columnists
Imagine you and your significant other are searching online for an engagement ring. A few days into the search, you get a message from your bank saying a $20,000 transaction you don’t remember ever initiating went through. Congratulations: you’ve just become the victim of a financial attack, in which some unscrupulous person gauged that you’re well off from your browsing history and snooped on your banking credentials, all by hacking into your Wi-Fi network.

The implications of vulnerable personal data can be even more consequential. You may be aware of organizations slowly nudging consumers toward a purchase through targeted advertising, but things take a darker turn when the same tools are used to alter our political preferences, the Cambridge Analytica scandal being a case in point.
What’s concerning is that all this happens without us being consciously aware of it. Another dark side of the misuse of personal data we’re concerned about is what Eli Pariser called “The Filter Bubble.”
“All of us happen to live in our own filter bubble where we are exposed to only the kind of information that is pleasant to us, which in turn reinforces our set of beliefs,” he said. “Because these filters are invisible to us, we would never know what is hidden from us. Our past interests will determine what we are exposed to in the future, leaving less room for the unexpected encounters that spark creativity, innovation and the democratic exchange of ideas.”

We live in a time when we are leaving our digital fingerprints everywhere, and most of us have been oblivious to how those fingerprints were being used. In the wake of numerous data leaks and instances of personal data misuse, rules such as the EU’s GDPR (General Data Protection Regulation) and California’s CCPA (California Consumer Privacy Act) were established. Their central tenet is that end users should know exactly what data a company holds on them and what it is being used for, and should have the right to see that data and the right to have it deleted.
Although the CCPA’s rules are not as exhaustive as the GDPR’s, their core essence remains similar.
These rules apply to any organization that handles the data of residents of the EU or the state of California. Many big organizations have simply chosen to follow the GDPR’s rules everywhere, avoiding the additional cost of maintaining different rules for different geographies (it is also good for PR).
All the measures taken to protect an individual’s data lead us to place a higher value on it than we otherwise might. They make the data seem important, and it actually is.
Data scientists have only begun to scratch the surface of what can be done with data, which lends credence to the phrase “data is the new oil.” Data is of immense use to researchers in fields such as healthcare, sociology and demographics. It has enormous utility, but with privacy protection measures in place, we seem to have reached a stalemate.
The essence of humankind has always been to find solutions to problems, so the next logical step is to find a way to make use of the data while keeping the individual’s privacy intact.
The main challenge is that our data processing techniques have evolved so much that it is no longer necessary to know your name in order to identify who you are from just a few data points.
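To see how little a stripped-out name protects you, consider a toy example. The Python sketch below uses entirely made-up records, names and column labels (none of them from any real dataset or from our research) to link an “anonymized” purchase log back to named people using nothing more than a ZIP code, a birth date and a sex field.

# A minimal sketch of a "linking" attack on hypothetical records.
# Every name, column and value here is invented for illustration.

# A dataset released "without names": only quasi-identifiers remain.
anonymized_purchases = [
    {"zip": "19104", "birth_date": "1990-05-14", "sex": "F", "purchase": "engagement ring"},
    {"zip": "19103", "birth_date": "1985-11-02", "sex": "M", "purchase": "groceries"},
]

# A separate, public record (say, a voter roll) that does include names.
public_records = [
    {"name": "Jane Doe", "zip": "19104", "birth_date": "1990-05-14", "sex": "F"},
    {"name": "John Roe", "zip": "19103", "birth_date": "1985-11-02", "sex": "M"},
]

# Matching on just three data points is enough to re-attach a name.
for row in anonymized_purchases:
    for person in public_records:
        if (row["zip"], row["birth_date"], row["sex"]) == (
            person["zip"], person["birth_date"], person["sex"]
        ):
            print(person["name"], "likely bought:", row["purchase"])

No name appears anywhere in the first dataset, yet both shoppers are identified.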
To ensure complete user privacy, then, we need to change the actual data. Traditionally, there is a trade-off between privacy and the utility, or accuracy, of the data: if the data is changed too much, it may no longer be useful.
There are a few ways to tackle this. The first is encryption. The second is a rule-based altering of the data, such as swapping in fictitious names. Neither of these techniques truly changes the underlying data. A third way is to add “noise” to the data so that the output has very similar statistical characteristics to the original but can no longer be used to identify any particular user. That is the focus of our research.
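To make the idea of noise addition concrete, here is a minimal sketch, assuming simple Gaussian noise on a hypothetical income column; the values and noise scale are invented for illustration and do not describe our actual method.

import random

# Original, sensitive values (hypothetical).
incomes = [52_000, 61_000, 75_000, 48_000, 90_000]

def add_noise(value, scale=5_000):
    """Return the value plus a random Gaussian draw with standard deviation 'scale'."""
    return value + random.gauss(0, scale)

noisy_incomes = [add_noise(v) for v in incomes]

# Aggregate statistics typically change less, in relative terms,
# than any single record does...
print("true mean: ", sum(incomes) / len(incomes))
print("noisy mean:", sum(noisy_incomes) / len(noisy_incomes))

# ...while no individual noisy value reliably reveals the person behind it.
print("noisy values:", [round(v) for v in noisy_incomes])

A researcher who only needs averages or trends can still work with the noisy column, but anyone trying to pick out one person’s true income from it is left guessing.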
Organizations need to find the right balance of privacy versus insight in order to responsibly and effectively make the most of the data available to them. That is what they should be focusing on right now.
Neha Khanna is a data scientist intern at Wilmington-based CompassRed and a graduate candidate in business analytics at the LeBow College of Business at Drexel University.
Matthew J. Schneider is an expert on data protection for business-use cases and is a tenure-track professor of statistics and business analytics in the LeBow College of Business at Drexel University.