By Hanlin Li
Northwestern University Evanston, IL
My research studies the economic inequity in the data economy, in which those who contribute data are not those who benefit from data financially. My prior work highlighted that dissatisfaction with data monetization is technology users’ top motivation for stopping or changing use of a technology. Building upon this study and other scholarly and legislative efforts, my dissertation work will focus on addressing the lack of transparency about data’s monetary value to both technology users and policymakers. I will assess the economic value of users’ data and develop tools to make this value transparent, with the goal of supporting collective action and policymaking. Additionally, I expect my work to provide design recommendations for technologies that profit from user data to communicate data’s monetary value openly. Overall, I see my work as a step toward a more equitable data economy.
User-generated content; data monetization; crowdwork; online experiment
Modern technology companies such as Google and Facebook rely on data generated by their users to bolster profits. These organizations provide useful free or low-cost services and, in turn, collect a wide range of data from their users, such as user-generated content, behavioral logs, and personal information. By collecting and modeling these data, they generate tremendous revenue through targeted advertising, analytical and machine learning services, and content resale.
Despite this mutually beneficial relationship, scholars and lawmakers have warned about the economic inequity between companies and users: those who benefit from data financially are not those who produce data. Arrieta-Ibarra and colleagues argued that data should be treated as labor so that data workers share the financial gains of the data economy. In 2018, the European Union released a digital tax proposal in an effort to broadly share the value generated by technology users but reaped by businesses. The public has also become aware of this issue; my prior work with a representative sample of U.S. adult internet users showed that dissatisfaction with data monetization practices was people’s top motivation for stopping or changing their use of a technology.
However, a major impediment currently keeps technology users and policymakers from mitigating this economic inequity: the lack of transparency about data’s economic value. Technology users transact with companies under information asymmetry about data’s worth and, therefore, lack the means to negotiate a price for data collectively. For policymakers, the inability to assess data in monetary terms prevents them from measuring and shifting the economic inequity between technology companies and users.
Building upon my prior work on transparency’s role in collective action, my dissertation work aims to equip technology users and policymakers with a better understanding of data’s worth with the goal of supporting collective negotiation for a more equitable data economy. Specifically, I will provide a measure of monetary value for key types of data generated by technology users and develop a tool to make this value transparent. Additionally, I expect my research to provide design recommendations for technologies that profit from user data to communicate data’s value openly with their users.
My prior work examined technology users’ awareness of the economic inequity in the data economy. I surveyed a representative sample of U.S. adult internet users in 2019 and found that dissatisfaction with data monetization was the top motivation for people to stop or change their use of a technology. Some participants elaborated on this motivation: “they sell my personal information exploiting ME MAKING PROFIT OFF OF ME, without giving me any financial share of their profit pirating” and “They are trying to control and profit from everything we do in life. They don't respect privacy they just want $$.” It is worth noting that data monetization is not limited to personal information: ample research on user-generated data shows that public data generate massive revenue for technology companies as well [4, 12].
Although data portability, an initiative supported by legislators and advocates [2, 11], may allow technology users to leverage market competition to mitigate this inequity, my prior work highlighted the complexity of the approach. My comparative analysis of Google Maps and Yelp showed that the two platforms often provide different ratings for the same restaurant. As such, direct data transfer across technologies may run into incompatibility issues. More work is needed to examine how feasible data portability is as a means of mitigating the economic inequity between technology companies and users.
Collective action is another approach, widely studied and leveraged in other domains, that may allow technology users to make a material impact on the data economy. In particular, I studied online consumer boycotts and developed a browser extension that helps boycott participants avoid targeted websites collectively. In this work, transparency about participants’ collective progress helped them better recognize their consumer power. As such, in my dissertation work, I plan to adopt a similar approach, i.e. increasing transparency about data’s worth to help technology users understand their power in the data economy.
Building on my own work and the work of economists, computer scientists, and policymakers mentioned above (e.g. [3, 9]), my research agenda aims to increase transparency about data’s value in order to ensure that the data economy benefits technology users broadly. Specifically, I will use online experiments and large-scale analyses to improve our ability to assess data’s monetary value. Additionally, I plan to develop a tool to communicate this value to technology users and use it as a design probe to understand how transparency influences their attitudes and behavior.
This early-stage study introduces a data valuation approach based on estimating how much data would cost if contributed by crowdworkers. In particular, image tags, ratings, and moderation decisions are types of data that are regularly produced both by crowdworkers on Amazon Mechanical Turk and by people who use image platforms (e.g. Flickr and Instagram), rating systems (e.g. Google Maps), and online forums (e.g. Reddit). However, while crowdworkers are compensated for their data, users currently do not receive any financial payment. As such, I ask how much a technology user’s unpaid image tags, ratings, and moderation decisions would be worth if they were produced by crowdworkers.
Although extensive research has examined crowdwork pricing in general, the specific costs of image tags, ratings, and moderation decisions are rarely reported. A thorough review of the crowdwork literature revealed very little data about the unit cost of these types of data. As such, I have designed an online experiment that measures the compensating differential for data production: the extra pay workers require to complete the same amount of work when the task requires them to generate these types of data, compared with when it does not.
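The definition above can be made concrete with a small calculation. The sketch below is illustrative only and is not the experimental design described in this paper: it assumes a hypothetical setup in which workers are offered several pay rates per task under two conditions (with and without the data-production requirement), fits a straight line to the mean number of tasks completed at each rate, and reads off the pay gap at a common output level. All function names and numbers are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def pay_needed(pay_rates, tasks_done, target_tasks):
    """Pay per task at which the fitted line reaches target_tasks."""
    intercept, slope = fit_line(pay_rates, tasks_done)
    return (target_tasks - intercept) / slope


def compensating_differential(pay_rates, baseline_tasks, data_tasks, target_tasks):
    """Extra pay per task needed to obtain target_tasks when data production is required."""
    return pay_needed(pay_rates, data_tasks, target_tasks) - pay_needed(
        pay_rates, baseline_tasks, target_tasks
    )


# Hypothetical numbers, not results from the study.
pay_rates = [0.02, 0.04, 0.06, 0.08]  # dollars offered per task
baseline_tasks = [10, 20, 30, 40]     # mean tasks completed, no data requirement
data_tasks = [5, 15, 25, 35]          # mean tasks completed, data requirement added

diff = compensating_differential(pay_rates, baseline_tasks, data_tasks, target_tasks=20)
print(f"Estimated compensating differential: ${diff:.3f} per task")
```

A linear fit is the simplest possible choice here; real labor-supply responses may be nonlinear, in which case a different model or direct interpolation between observed pay rates would be more appropriate.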
The outcome of this project will be an empirical monetary assessment of image tags, ratings, and moderation decisions. This will assist policymakers in understanding how much technology users have subsidized image platforms, rating systems, and online forums by contributing these data. More broadly, I expect this data valuation approach to provide a new way of thinking for policymaking in this space: instead of examining how much individual users are worth to technology companies, which is often opaque to the public, we may ask how much technology users subsidize companies by contributing data.
In addition to supporting public policymaking, my dissertation work also aims to increase transparency about data’s value for technology users and understand its implications for their attitudes and behavior. As such, following up on Study 1, I will develop a transparency tool that informs technology users of the value of their image tags, ratings, and moderation decisions. Given the different views of data ownership (private property vs. public good), the tool will experiment with different transparency interventions, i.e. the value of an individual’s data (e.g. your ratings are worth $0.20) or the value of a group’s data (e.g. ratings from your city/state are worth $86.90). I will then recruit participants who regularly produce such data, provide them with the tool, and administer pre- and post-study questionnaires.
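As a rough illustration of the two interventions, the sketch below turns contribution counts and a per-item value into the kind of individual and group messages quoted above. It is a minimal sketch under stated assumptions, not the planned tool: the function names, the $0.02 unit value, and the contribution counts are all hypothetical placeholders chosen to reproduce the example amounts.

```python
from decimal import Decimal


def individual_message(n_items, unit_value, label="ratings"):
    """Private-property framing: the value of one person's contributions."""
    total = n_items * unit_value
    return f"Your {label} are worth ${total:.2f}"


def group_message(counts_by_user, unit_value, group_name, label="ratings"):
    """Public-good framing: the value of a whole group's contributions."""
    total = sum(counts_by_user.values()) * unit_value
    return f"{label.capitalize()} from {group_name} are worth ${total:.2f}"


# Hypothetical inputs: a unit value such as Study 1 might produce, plus made-up counts.
unit_value = Decimal("0.02")  # assumed dollars per rating
city_counts = {"user_a": 10, "user_b": 2300, "user_c": 2035}

print(individual_message(city_counts["user_a"], unit_value))
print(group_message(city_counts, unit_value, "your city"))
```

Using `Decimal` rather than floats keeps the displayed dollar amounts exact, which matters when small per-item values are summed over many contributions.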
I will use established measures from the social computing literature to capture how transparency influences people’s attitudes towards the economic inequity in the data economy. I am actively investigating which variables best capture people’s attitude change after using the tool and am currently considering three directions:
Accountability: to what extent technology companies should be accountable for sharing the revenue from user data.
Fairness: what percentage of the revenue would be a fair share for users.
Awareness: participants’ awareness of their data’s value.
I will also collect open-ended responses from participants to understand whether they may change their behaviors in the future and explore what other information they desire to improve transparency about data’s monetary value. I expect this work to provide specific design recommendations for technologies that profit from user data, e.g. which view of data ownership is better suited to raise technology users’ awareness of data value.
Overall, I see my work on increasing transparency about data’s value as contributing to two areas. First, my approach to assessing data value provides a way for researchers and policymakers to understand the magnitude of the economic inequity between technology companies and users. Second, I expect my user study with technology users to provide researchers and technology designers with recommendations on how to communicate data’s monetary value in technologies that profit from user data.
[1] Arrieta-Ibarra, I. et al. 2018. Should We Treat Data as Labor? Moving Beyond “Free.” AEA Papers and Proceedings. 108, (May 2018), 38–42. DOI:https://doi.org/10.1257/pandp.20181003.
[2] Doctorow, C. 2019. Regulating Big Tech makes them stronger, so they need competition instead. The Economist.
[3] Fair Taxation of the Digital Economy: 2017. https://ec.europa.eu/taxation_customs/business/company-tax/fair-taxation-digital-economy_en. Accessed: 2020-06-30.
[4] Heald, P. et al. 2015. The Valuation of Unprotected Works: A Case Study of Public Domain Images on Wikipedia. Harvard Journal of Law & Technology. 29, (2015), 1.
[5] Kollock, P. The Economies of Online Cooperation: Gifts and Public Goods in Cyberspace. 19.
[6] Li, H. et al. 2019. How Do People Change Their Technology Use in Protest?: Understanding. Proceedings of the ACM on Human-Computer Interaction. 3, CSCW (Nov. 2019), 1–22. DOI:https://doi.org/10.1145/3359189.
[7] Li, H. et al. 2018. Out of Site: Empowering a New Approach to Online Boycotts. Proceedings of the 2018 Computer-Supported Cooperative Work and Social Computing (CSCW’2018 / PACM). (2018).
[8] Li, H. and Hecht, B. 2020. 3 Stars on Yelp, 4 Stars on Google Maps: A Cross-Platform Examination of Restaurant Ratings. Proceedings of the ACM on Human-Computer Interaction. 2, CSCW (2020).
[9] Posner, E.A. and Weyl, E.G. 2018. Radical Markets: Uprooting Capitalism and Democracy for a Just Society. Princeton University Press.
[10] Rader, E. et al. 2018. Explanations as Mechanisms for Supporting Algorithmic Transparency. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada, Apr. 2018), 1–13.
[11] Rossi, G. and Slaiman, C. 2019. Interoperability = Privacy + Competition. Public Knowledge. (Oct. 2019).
[12] Vincent, N. et al. 2018. Examining Wikipedia With a Broader Lens: Quantifying the Value of Wikipedia’s Relationships with Other Large-Scale Online Communities. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (2018), 566.
[13] Wallace, J. et al. 2013. Making design probes work. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France, Apr. 2013), 3441–3450.
[14] Yin, M. et al. 2018. Running Out of Time: The Impact and Value of Flexibility in On-Demand Crowdwork. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada, Apr. 2018), 1–11.