A sociophysicist explains how to save the internet from clutter, lies and garbage
Once upon a time, the internet was seen as a wondrous fount of knowledge and information, empowering users and spreading democracy. This utopian view resonated widely with early adopters in the 1990s, after the end of the Cold War, but it resonated much more broadly around the world in 2011, during the Arab Spring. There were always dark shadows noted by observers, as in Gene Rochlin's 1997 book, "Trapped in the Net," but collectively we've been blindsided and bewildered by how different the online experience has become: how much of a marketplace for rumor, fear, conspiracy theories and polarized worldviews, all watched over by purportedly neutral platform manipulations bringing us exactly what we're told we want.
Could it be possible to recover that original promise? In a way, that promise was always naïve, as illuminated by Paulina Borsook's 2000 book, "Cyberselfish: A Critical Romp Through the Terribly Libertarian Culture of High-Tech." As Borsook notes, it took more than half a century of government investment to make the internet's commercial incarnation possible, contrary to Silicon Valley's self-serving mythos. But democratic theory, history, philosophy and psychology are far richer than libertarians suppose, and there is a much more sober, realistic version of that promise, one that, for example, scientists collaborating worldwide have experienced for more than a generation now.
So could something like that become possible for all of us? A new paper in the journal Nature Human Behaviour strongly suggests that it could, and lays out an initial framework for what it might be like, reshaping things from the bottom up: "How behavioural sciences can promote truth, autonomy and democratic discourse online." As the abstract explains, the problem can be simply put:
The current online ecosystem has been designed predominantly to capture user attention rather than to promote deliberate cognition and autonomous choice; information overload, finely tuned personalization and distorted social cues, in turn, pave the way for manipulation and the spread of false information. How can transparency and autonomy be promoted instead, thus fostering the positive potential of the web?
Framed like that, this is a scientific problem, susceptible to scientific solutions. Of course, knowing that solutions exist and implementing them are two different things: Consider the climate crisis. For a better understanding of how such a future could come to be, and what some specific steps would look like, Salon reached out to lead author Philipp Lorenz-Spreen, a postdoctoral fellow at the Max Planck Institute for Human Development in Berlin. This interview has been edited, as is customary, for clarity and length.
What struck me first on seeing your article was that it's not reactive, but rather affirms a positive, arguing that behavioral science "can promote truth, autonomy and democratic discourse online." So before talking about the paper itself, I'd like to ask about what informs your positive approach.
I think a bit of forward-looking is probably necessary. We need to be careful we don't get into a whack-a-mole game. So, I was searching for more sustainable solutions, and you end up getting less reactive and trying to find solutions that come from the bottom up, to think about how to change the system itself.
Were there any previous examples in internet history that were resonant for you?
One of the rare examples of online environments that promote truth — one of the shining examples — is Wikipedia. This was an inspiration, to think about how different this website looks, compared to Amazon, how it's designed and how things are arranged.
Your paper draws attention to the asymmetry between what platforms know about users and vice versa. How do you characterize that asymmetry, and what negative consequences result that you propose to address?
The asymmetry probably comes out of dependencies we live with in this information-rich world. It gives the human brain too much to process, so we must rely on some kind of curation. That's what the platforms do for us. That's probably why they are successful. Google and Facebook are skyrocketing, successful, because we really, really depend on them to navigate through this information world.
Additionally, "intransparency" contributes to it. We get recommendations that we cannot really follow where they actually come from, what factors contribute to them. So we are getting increasingly dependent on intransparent platforms, where we have to rely on trust. And the remedy we propose is increased transparency, and giving back a bit of autonomy to the user.
To improve the online environment, you identify available but untapped cues and two kinds of behavioral interventions — "nudging" and "boosting" — that employ these cues. The cues differ in different contexts, so let's start with the interventions. Folks may have heard of them, but just to be clear, what do each of them do?
They are both classic behavioral interventions, so they're more umbrella terms. They both interact with the decision process. "Nudges" do that mainly or exclusively through the choice architecture. So whenever parts of the choice architecture are emphasized to steer behavior or influence behavior, that's nudging. In our case, we just try to draw the attention of the reader to some piece of information that might be good for them — that will be the broad class of nudging.
Boosting goes a bit further into the educative direction. So it's not education, but it's pieces of the environment or external tools that help the decision-maker, within the process of making the decision, to acquire some competencies. Often in this context, we refer to digital media literacy as a competence that a boost would help the user to acquire or develop. That can be done through the environment, by giving useful hints that you can remember, and then you use them even if the hints are missing. So you can actually incorporate that boost and take it along with you. Second can be external tools, like rules of thumb that you can actually use to evaluate sources of information.
One aspect of boosting you mention is "self-nudging." Could you explain what that is?
Self-nudging is when you yourself change your environment in a way that nudges you to a certain behavior that you want for yourself. An alarm clock is a self-nudge, a very elementary one. If you want not to browse certain apps so often, you might delete them from your phone. Or if we want to eat less sweets, we might not buy them. So we ourselves change our choice architecture so we make decisions that we want to make. That's a boost, in a sense, because self-nudging itself is a competence. If you're good at designing your own choice architecture, your own environment, to the choice you want to achieve, that's what we call a competence. That's why we would classify it as a boost.
You discuss what you call "endogenous" or "exogenous" cues as being important in interacting with the internet. Give me some examples.
Endogenous cues are cues that describe the content itself. So when we talk about an online article, that would be the characters that appear in the article, the relationships that may be part of the story. Stuff like that we would call endogenous cues. And of course, it can be helpful to evaluate the truthfulness of the story. If the characters do not exist, or the story doesn't have a logic, it's evidence it might not be true.
Exogenous cues, on the other side, are context-dependent. So they do not regard the content of the article itself, but for example, the source of the article, who has written the article, which other outlets or sources the article cites. Where did it come from? How did it reach you? Who else has recommended it for you? Things like that. So the context around the story itself is an exogenous cue.
In the paper, you focus your attention on exogenous cues. Why is that?
We just experience how difficult judgments are about the content itself. There are extreme cases where you should make a judgment — maybe when it's about violence — but you always run into the danger of either censorship or the accusation of censorship when you're making judgments about content itself. So when you make a judgment about context, I think this is more robust against accusations or dangers of censorship. If you only provide more context, that could allow people to make the judgment themselves, without you or a third-party fact-checker or a platform or anyone else to make that judgment for you. That's going in the empowerment direction — we want to let people make these choice decisions themselves. That's why we focus on exogenous cues.
You examine three different contexts: online articles, algorithmic curation and social media. I'd like to go through each of them with you, to understand the challenges you identify and how you propose to meet them. Let's start with online articles. What are the challenges they present? And what cues are available?
The main challenge we think we are facing here is the overabundance of articles, of content being produced. There's a multitude of sources, a high number of articles reaching us and also the consumption patterns have changed. People do not consume a whole newspaper as one piece, and they do not subscribe to one newspaper for a long time. Some do, but we much more consume them on an article basis. So, we are moving from long-term decisions to rather short-term decisions, and we have this huge information overload at the same time. So that makes it very difficult to make decisions about the trustworthiness or reputation of the source. So that's the challenge with news articles online.
And what kinds of cues are available?
We can check if the publisher has an "About" section on the website, or if it's a known publisher one can check for an external validation by making a Google search and finding reports of the same story. But one can also check if the article cites external evidence, if there are other articles of this article's type, if they are clearly marked and if they come from other publishers and not just themselves. These would be some exogenous cues that one can check on — which, by the way, are not always readily accessible, which adds to the challenge of information overload.
So how can nudging and boosting be used to make things more transparent?
Nudging can be used to draw attention to the external cues, so the sources that an article is citing could be listed on top of the article with a very clear color, or something. Or there can be a warning message if there are no external sources cited. There one can actually be inspired by Wikipedia, which often has warning messages with articles if there are not enough sources or if quality is lacking in the sources. That would be a very simple nudge. You can go a bit further when you think about sharing an online article, and then a warning message could require you to click a second time to confirm that you want to share the article even though it doesn't cite any external sources. So the main process of the nudges in this context would be to draw your attention to the external cues.
A boost would be more like a tool: In the article show a decision tree — that's a tool that has been used to improve medical decision-making — by having, like, three or four questions that one can go through systematically to evaluate the trustworthiness of an article, for example. It can become like a cognitive tool kit. A person can remember this cognitive process and it will help them anytime they encounter an article, for example.
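As a rough illustration of the decision-tree boost described above, here is a minimal Python sketch. It is not taken from the paper. The three questions are borrowed from the exogenous cues mentioned earlier in the interview (an "About" section, cited external evidence, coverage by other outlets), and the class name, field names, question order and verdict strings are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ArticleCues:
    """Exogenous cues about an article (all field names are hypothetical)."""
    has_about_page: bool
    cites_external_sources: bool
    reported_elsewhere: bool


def trust_check(cues: ArticleCues) -> str:
    """Walk a fixed sequence of yes/no questions and stop at the first 'no'."""
    if not cues.has_about_page:
        return "caution: publisher does not identify itself"
    if not cues.cites_external_sources:
        return "caution: no external sources cited"
    if not cues.reported_elsewhere:
        return "caution: no other outlet reports this story"
    return "passes the basic checks"


print(trust_check(ArticleCues(True, False, True)))
```

The point of such a short tree is that a reader could memorize the questions and apply them without the tool, which is what distinguishes a boost from a nudge in the interview's terms.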
The second context you discuss is "algorithmic curation," the often opaque process by which things are delivered to us on the internet. What are the challenges that presents and what cues are available?
We talked about transparency here. Usually the problem is that a lot of things end up on my newsfeed and it's often very unclear for what reason, and these algorithms that source the news feed, for example, are hidden. They are not known to the user. It becomes extremely difficult to understand why a specific article is ending up in the newsfeed. We do not know — this gets back to asymmetry — what they know about many other people who are consuming similar content, for example. That makes it extremely difficult for us to understand this curation.
The biggest challenge in the whole paper is that it's very unclear what cues could be available. But if you think about a less sophisticated algorithm — not a machine-learning AI algorithm, but rather a simple, rule-based algorithm — that could help provide cues. Because if it's just a linear combination of a few factors that source my content — which, for example, could be recency, and how many of my friends have engaged with this article, and a preference that I have, for sports or something — that could be displayed. So there would be cues that in principle a platform with a rule-based algorithm could show and help people understand much better why some things came up and others not.
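To make the idea of a transparent, rule-based feed concrete, here is a minimal Python sketch of a score built as a linear combination of the three factors named above (recency, friend engagement, a topic preference). The weights, the factor names, the assumption that each factor is scaled to the range 0 to 1, and the function name are all hypothetical. No real platform is known to rank this way.

```python
# Hypothetical weights; a real platform would choose and disclose its own.
WEIGHTS = {"recency": 0.5, "friend_engagement": 0.3, "topic_preference": 0.2}


def explain_score(factors: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total score and each factor's contribution to it.

    Each factor value is assumed to be normalized to the range 0..1.
    """
    contributions = {name: WEIGHTS[name] * factors[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions


score, parts = explain_score(
    {"recency": 0.9, "friend_engagement": 0.4, "topic_preference": 1.0}
)
# Show the largest contribution first, as a cue a feed could display.
for name, value in sorted(parts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f}")
print(f"total: {score:.2f}")
```

Because every contribution is a simple product, the breakdown can be shown next to a post as is. With a learned model, that kind of per-factor explanation is much harder to produce.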
So how can nudging be used here?
When we talk about nudging in this situation, we're very close to just information providing. So the factors that led to the decision could be displayed more clearly, or displayed at all — that would already help. Another nudge that would help would be a clear separation, even a visual separation, between different types of content. Currently in the newsfeed, everything is very blurred — so a post from my friends looks very similar to a commercial, or something from a politician or political party or a company. So these different entities and players could be much more clearly differentiated. That would change the choice architecture, to help us understand, for example, which is a paid ad and which is not. This is not in the best interest of the platform, because making an ad seemingly appear within the posts of my friends makes it much more personal than if I realize that's an ad. But I think we should make that newsfeed more "overviewable."
And how can boosting be used?
If the newsfeed is more transparent, it could also be customizable. Of course, you can customize the feed in a way by following certain people and outlets, but you cannot determine the order or frequency. So that would be possible, to actually change your preferences and say, "I want to see more sports and less politics," or "More from my friends and less from news." That could allow the user to do self-nudging, so if I want to be more informed I can increase the amount of news I want to get. That gives the user back a bit of agency, at least.
So moving on, the third category is social media. What are the challenges they present? And what cues are available?
The challenge here is that we certainly have access to a huge number of other people, and we can communicate with them in different forms, but this kind of communication is like nothing we are used to. It's difficult to have a feeling for what the numbers mean, social metrics — likes for example, or uploads, downloads or shares. We only see one number, the number of likes, for example. But there could be much more information available that could help us access the real consensus. We have access to such a large group of people, we can feel that even a very weird conspiracy theory, for example, is actually believed by many others. Two hundred seems like a very big number of other people believing it. So, you might actually think you are right in the middle of the conversation when actually you are on the outskirts of the discussion. That would be the challenge.
So how can nudges and boosts be used here?
A nudge in this context would be quite simple. It would be having just additional social information. These are sometimes called social nudges. If we know what others are doing, this is influencing our behavior quite heavily. One example would be to show what the average reading time of other people was on this article, on a newsfeed in social media. That information would give us a hint that something is clickbait, or that most people just stayed on this article for a few seconds or just a minute, and that would help us make a decision to maybe not click on those things. So that's one thing. But also we can provide more information about what other people on this network are doing, not only around my direct neighborhood but rather on another side of the network, so I can see other opinions, so we can have a feeling for what the discussion actually looks like.
Boosting has a more educative character; it would be more like a tool. One thing that would really help us is adding to social media posts a hint of trustworthiness. So if sharing is very narrow, often it's a niche topic, not so trustworthy, and if it's broadly distributed, and many people at each point share it, it might be more true. So that's something one could teach people to learn, basically to understand the social spreading pattern. Currently we don't have much access to this information, but if it were provided we could actually learn such patterns, and see if someone was, for example, replicating posts several times, artificially amplifying the message, or if someone was picking up an old story from someone very far away in the network and trying to push that. So that would be a skill that people could acquire with such a tool. They could see the social spreading pattern whenever they encounter something on social media, so they can actually learn what the patterns mean and get a feeling for social media and social dynamics.
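One way to picture what a "social spreading pattern" display might compute is sketched below in Python. This is an illustration only, under stated assumptions: the share log format, the account names and the function name are invented, and platforms do not currently expose such data. The sketch makes no judgment about truth. It only reports how many accounts shared at each step away from the original post and which accounts shared more than once.

```python
from collections import Counter, defaultdict

# Each share is (sharer, shared_from); the original poster has shared_from None.
# Account names and the cascade itself are made up for illustration.
shares = [
    ("origin", None),
    ("a", "origin"),
    ("b", "origin"),
    ("c", "a"),
    ("a", "origin"),  # the same account reposting again
]


def spreading_summary(shares):
    """Summarize a share cascade: breadth per depth level and repeat sharers."""
    children = defaultdict(set)
    root = None
    for sharer, source in shares:
        if source is None:
            root = sharer
        else:
            children[source].add(sharer)

    # Breadth-first walk from the original poster, counting accounts per level.
    breadth_by_depth = []
    level, seen = [root], {root}
    while level:
        breadth_by_depth.append(len(level))
        next_level = []
        for account in level:
            for child in sorted(children[account]):
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        level = next_level

    counts = Counter(sharer for sharer, _ in shares)
    repeat_sharers = sorted(name for name, n in counts.items() if n > 1)
    return {"breadth_by_depth": breadth_by_depth, "repeat_sharers": repeat_sharers}


print(spreading_summary(shares))
```

A long, thin cascade and a short, wide one would produce visibly different breadth lists, and repeated reposting by one account shows up directly. These are the kinds of patterns the interview suggests people could learn to read.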
In your conclusion, you write: "In our view, the future task for scientists is to design interventions that meet at least three selection criteria." What are they and why?
First, of course, is the need to be transparent, because we said the core of the problem is the intransparency of the platform. So if scientists or regulators make new regulations, they have to be fully transparent for people to become trustworthy. I think that's a second point. I was talking about the danger of censorship early on, and so that's what we try to avoid here with transparency and trustworthiness, with cues providing context that cannot be confused with censorship. Because you never know who is in the end implementing such tools and if they are by definition not able to do censorship, there might not be a danger of that. And the last criterion, specifically, is that it can't be gamed. What we mean here is that social media metrics that we have now are often gamed. So the "likes" are a very simple metric that can be gamed by increasing social engagement, by either paying for it or getting other people to click the "like" button, so it appears to be popular, but actually it isn't.
If we come up now with new cues that should help people to assess the quality, there need to be protections. So, for example, if you have a very simple metric like the number of references that an article cites, that's something that's quite easy to game by just typing them into the article or something like that. So it has to go a bit deeper. It also has to show which articles are actually cited, for example, which would make it a bit more difficult to game. But that's something you really have to think about and maybe even run experiments and check if it works and if anyone on the internet comes up with a solution to get around it; that's something that happens a lot. That's also in the conclusion: all these things need to be independently tested.
Yes, you say it's important to examine a wide spectrum of interventions. This seems like an invitation to furthering empirical process, correct?
Definitely. The whole paper is a call for more research in this direction, more solution-oriented research, how to improve that environment. There's a lot of empirical work to be done. These are just ideas and suggestions. So yes, it's definitely a call for much more empirical work, and also independent from the platforms themselves.
Your focus is on what might be called normal users, not malicious ones. In fact, you argue that "it is not necessary that all or even the majority of users engage with nudging or boosting interventions." Why are you confident in taking this approach?
We can never assume that everyone would engage with all the tools or interventions we are proposing. It will always be just a fraction. These external interventions are not making any judgments, they are getting people to make decisions for themselves. They of course cannot catch malicious actors who have an agenda, so they can only help people who by accident fall prey to their tactics. So that's one reason. Another one is of course that we believe that we are very social beings. So, I was talking about Wikipedia as one of the examples of collective intelligence that actually has worked out. So that's what we try to use as a parallel: once our online environments do more to promote quality, in a collective-intelligence way, it will be pushed upward and reach people that would not have engaged with it before.
Finally, what's the most important question I didn't ask and what's the answer?
Well, one important question is "Who should do that? Who would be interested?" I think the answer is that maybe the platforms have some interest in implementing such measures to improve quality, but it's always important to keep in mind that they are commercial entities and that they have a certain goal, which is to maximize user engagement so they can make their ad revenue.
So it's probably also the responsibility of a democratic society to participate in this process of designing our online world. We have now let the platforms do that for us for a very long time, for the 10 years or so that they have been around. Now we think it's about time that, as a democratic society, we should come up with our own solutions. The option space for doing this is huge. There are a lot of options that we have not even touched or thought about yet.