My discussion with Tom Dee, an author on the Yondr study on phones in schools
"If someone said they were disappointing, that would feel sensible to me"
For the piece on phones in schools that I posted this week, I interviewed two of the authors of the study where Yondr pouches were used to reduce phone use in schools. One of those authors was Thomas Dee, professor in the Stanford Graduate School of Education.
Dee was extremely careful and has a very moderate position on all of this (and in political matters related to education). He said the results were sobering, did not realize the hypothesized benefits, and could be characterized as disappointing, but he still wants to continue to work to see if phone bans can improve learning outcomes and had ideas for how that could happen. When I asked him if phones were the major reason why school performance has deteriorated, he said we need to learn more, but he offered two reasonable alternatives: the introduction and then abrupt roll back of test-based accountability through No Child Left Behind, and the Great Recession. He also said about the extent to which phones and social media explain everything, “I’m always concerned about overly facile or monocausal explanations for any large scale social phenomenon.”
Read the whole thing if you’re interested in this topic. There’s a lot of other great stuff in here. He started off by lamenting with me the fact that fads come along in education and then are stopped before the data says whether they work or not. Here’s our conversation:
Tom Dee: A concern of mine over my career is just some degree of frustration with the mercurial faddishness we have where there’s a kind of flavor of the day reform that attracts a lot of attention. Then there’s sometimes uneven implementation and then some results that are discouraging and then people simply move on instead of engaging in a kind of focused intentional cycle of inquiry, trying to refine a certain reform or understand what other contextual factors might need to be in place. I would argue this happened with No Child Left Behind, where we had test-based accountability brought to scale nationwide and we stayed with that 1.0 moment for a long time, even while people recognized the need to reform it in some ways, because it became politicized. And by the time the Obama administration came around, they were issuing waivers against the strictures under No Child Left Behind and it wasn’t until 2015 that that federal legislation got reauthorized and essentially devolved accountability to the states.
I’ve also worked on the controversial teacher performance assessment reforms in Washington DC introduced by Michelle Rhee. They were actually well-designed, well-implemented and pretty effective and other places tried to replicate that without the same political will, without the organizational acumen and produced disappointing results. It led to this narrative of “this doesn’t work” rather than “we need to know more about what’s going on in these settings to understand how to replicate and scale best practice.” That’s just a little bit of background that informs what I’m bringing to the table in interpreting these results.
But let me back up a little bit about this study. At a high level, I think three distinctive features make it unique. One, to our knowledge, it’s the first study of phone bans in the US that’s national in scope. Two, part of what made this research design appealing to us was our capacity to measure phone bans that were actually restricting the in-school use of phones. So as you may know, a lot of phone bans being adopted appear a little more cosmetic than substantive: no-show bans where students still keep their phones, they’re just being asked to keep them out of sight. And we suspect that they may have weaker effects and really be uneven in their implementation. So instead, what we did was we were able to partner with this company Yondr that’s a major seller of lockable pouches.
So schools using the lockable pouches give us a clean measure of really binding restrictions on the in-school use of phones. And we can validate that for lack of a better word, treatment manipulation in the GPS data and in survey data available to us. We see when and where schools take up these pouches, a substantial decline in the in-school use of phones. So then the third feature of the study is I think an unusually comprehensive set of outcomes: test scores, attendance, discipline measures, social-emotional survey questions of students around their subjective wellbeing, classroom attention, their experience of online bullying. Given everything that is hypothesized to be related to phone use, we thought it was important and meaningful to have that broad catchment.
I can add a fourth thing that is important to me. As you probably know in experimental social science, pre-registration has changed the field over the last 10 years. Ironically, for studies like this, for quasi-experimental studies that try to draw causal inference from real-world data, that open science practice, pre-registration, has not been a norm. I’ve been arguing publicly it should be because quasi-experimental researchers like our team have vastly more degrees of freedom than experimentalists do in terms of the measures we construct, the methods we use, et cetera. And there’s empirical evidence that p-hacking is a problem, several forms of empirical evidence actually, in quasi-experimental studies. So one of the other things our group did in this study, and I’ve been doing in my own work, is pre-registering the outcome measures we used and our design before we undertook the analysis. So there’s a link to that in the study. I want to stress that’s not the standard of practice, but I personally believe it should be. (For more on preregistration, see here.)
So then in terms of the results, as you see, I’ve been in a lot of conversation with our team about how to describe these accurately and how to describe their implications in the right way. I’m going to choose my words carefully here. I would characterize the results overall as sobering. Obviously, these reforms are being widely adopted, school phone bans with some expectation that they’ll meaningfully change a variety of student outcomes. And we’re just not seeing that right away. In fact, if you look just at the core pre-registered outcomes, the only significant effects are the increase in student disciplinary incidents and an almost trivially small decrease in middle school test scores. Now, if you dig beneath that in a more exploratory posture and look at effects by period, you see that those overall effects do appear to be obscuring some interesting dynamics. The increase in discipline is concentrated in the first year and then returns to baseline within two more years.
And the null effect on student wellbeing is obscuring a really interesting dynamic where there’s a big decline in student wellbeing in the first year and by the third year it’s above what you would’ve predicted at baseline. And that’s the one I think advocates of phone bans are mostly picking up on that, oh, eventually students are reporting higher levels of wellbeing and the disciplinary effects appear to be transitory. But also there are a lot of null findings here as well. And so I characterize them as sobering. If someone said they were disappointing, that would feel sensible to me too. And this is where engaging in some of the translational commentary is a little difficult because obviously these results indicate we’re failing to realize the hypothesized benefits at least to this point.
But at the same time, given the preamble I gave you at the beginning about how we often don’t persist in trying to refine and understand policy innovations, I also hope people won’t take this as nature’s final word on these policies and will persist in trying to understand what it might take for them to do better. So for example, I can throw a couple examples out here. One is it may be that phone bans don’t go far enough and that the problem is about digital devices more broadly and that the growing movement towards digital-free schools might come a little closer towards what some of the proponents are discussing. Alternatively, or even in concert, it may be that the bans are a necessary but not sufficient condition for the kinds of gains people hope to see.
For example, getting students off their phones might be a start, but once they’re off their phones, are they in high quality learning environments? Do they have effective teachers using high quality instructional materials and evidence-based pedagogy? There’s certainly some reason for concern around that. As an aside, you may have seen some of this in the current sturm und drang around the science of reading. One of the big issues in education policy right now is the growing realization that teachers have been trained to teach early reading in ways that don’t align with the best evidence on how students actually learn to read. In particular, they’ve been underemphasizing early phonics and phonemic awareness and things of that sort. So anyway, so that’s the position we’re in right now and it’s interesting to watch having spent this week watching how this was consumed and understood by different people and different journalistic outlets.
Holden Thorp: Great preamble. I guess my first question would be tell me the positive effects that you did see.
Tom Dee: So the main positive effect was that by the third year of a phone ban, it appears that students’ self-reported subjective wellbeing is higher.
Holden Thorp: And what about the teachers? Weren’t they also kind of happy with the whole thing?
Tom Dee: Yeah, we have teacher survey data. That’s a little bit more of a descriptive posture and wasn’t part of our pre-registration. We’re collecting more data and hope to do more with that. But there’s no doubt that teachers are incredibly supportive of these policies and they see good things in their wake.
Holden Thorp: I certainly like my classroom better when there aren’t any devices in it if we’re having a discussion. For sure. Yeah. All right. So for the quantitative measures where you didn’t see much effect, I know it was a pre-registered study, so you were going to publish whatever you got, but were you expecting to see a signal?
Tom Dee: There’s always this challenge of putting yourself back in that original position, but I think we were. And at some level, having been an education researcher for as long as I have, I feel maybe a little sheepish about having been hopeful because certainly I’m very experienced, have had a lot of experience, in reforms around which there was a lot of enthusiasm that when carefully examined, produced disappointing results, but we chose these measures intentionally. For example, it would make sense that if students got off their phones and were paying attention, I think it’s reasonable to think there might’ve been some meaningful learning gains in the short term, simply because the teachers now have their attention, but the disruption can confound that. If teachers are having a more difficult time, at least initially, maintaining classroom order, that’s going to complicate learning in the classroom.
Holden Thorp: And do you think that’s the main reason why you didn’t see a signal?
Tom Dee: Well, I have to be careful there. We can’t really say. I mean, a virtue of this kind of work is it’s got, I think, a strong causal warrant, but the big demerit of this type of quasi-experimental design is we can’t really tease apart mediators. So we can’t even be sure why discipline rose. The increase in discipline, for example, could be due to enforcing the phone ban and having to identify students who aren’t complying, but it could also be that in the absence of access to a phone, students who had been docile because they were sitting there scrolling TikTok or whatever are now in compliance with a ban, but acting out in a way they wouldn’t without the ban.
Holden Thorp: Yeah. So there’ve been a lot of studies on screen time and effects on internalizing symptoms and things like that, which also didn’t produce much signal. And do you see this as consistent with those or is there this difference somehow?
Tom Dee: I think to my mind, it appears consistent that these bans and the adoption of these lockable pouches drove down the in-school use of phones without producing clear academic gains.
Holden Thorp: And to the extent that there was a gain in wellbeing, it was very small and delayed.
Tom Dee: Oh, it was delayed. I’d have to go back to the paper or take a second look at the effect size. I wouldn’t characterize it as really small, but I guess part of the reason I’m equivocating a little is, again, it’s so hard to know what mediation is behind these kinds of reduced form results. It could be that phone bans in schools simply, for example, lead to some intertemporal substitution, students who are using them less in school, but maybe more intensively outside of schools in ways that attenuate any expected gains. So that would be consistent. This is why I was equivocating about, is this consistent with the meager evidence you described around social media, et cetera, because one could say, “Well, these results are consistent with a world where social media use is bad for kids, and it’s just that the bans shifted it intertemporally away from the school day to out of school time.” So that’s why I think we have to not get over our skis in interpreting these results.
Holden Thorp: Boy, do I agree with that. So Jon Haidt said in his book that the phone-based childhood is the major cause of the adolescent mental health crisis, and that’s a pretty bold statement. And I’m wondering if you agree with that and whether your study supports or refutes that or is silent on it.
Tom Dee: Well, I can answer the latter question, which is I think our study has to remain silent on it in part for the reasons I described. I mean, one could argue at the most superficial level, we reduced youth phone use and there were no clear gains. And maybe there’s a moralizing hysteria that’s wrong about digital device use among teens or it could be that Jon’s right and what happened here was that kind of intertemporal substitution. On the broader question, I would recommend you talk to some of my other colleagues, Hunt Allcott in particular, because as you probably know, they’ve done direct research on social media use and youth. So they’re more authoritatively steeped in that literature than I am. (Allcott is quoted in the piece and I will post his interview soon.)
Holden Thorp: That’s fair. But you probably have an opinion on the overall discourse on this. I mean, I think unless you read scientific journals, if you only read the millions of books that Jon has put out and the numerous appearances on podcasts and talk shows, you see a pretty simple story here that I think to a lot of people in the public would seem that the science is resolved. And do you worry about the asymmetry of that?
Tom Dee: Well, I’m always concerned about overly facile or monocausal explanations for any large-scale social phenomenon. But I will say, and this is something that may or may not be on your radar that I think contextualizes this: some of the most important research I think I’ve seen done in education in the last couple years (and several people have actually done this independently) were some descriptive pieces that took achievement data in the US and noted that the decline in achievement we’ve seen, it did accelerate in the wake of the COVID-19 pandemic, but is timed to around 2013 where you see an inflection point. And that’s the kind of thing I think Jon and others have latched onto, saying, look, this coincides with the rise of social media. But in the educational research, people also discuss how it coincides with other major changes. One is the one I described around walking back school-based test accountability with Obama’s first administration. I don’t want to get too much in the weeds if this seems off-piste to you, but No Child Left Behind in 2001 brought test-based accountability to scale across the nation, focused on reading and math achievement and asked states to devise systems that would flag schools that were failing to make adequate yearly progress towards their state proficiency standards. In work that actually my co-author on this study, Brian Jacob, and I did, we found that there were some non-trivial gains in learning from collecting data and trying to hold schools accountable for overall performance goals and subgroup goals, et cetera. Then largely what happened was when Obama was elected, it was around the time when everyone knew we needed to redesign that system in some ways. It was only focused on reading and math, it focused on achievement levels and not value added.
So there was some consensus around this needs to change in some ways, but it was also when the hyperpartisanship we see now began to accelerate with the rise of the Tea Party, and all the political centrism that had supported that reform evaporated. And that’s when the Obama administration started issuing waivers to states who were no longer meeting their NCLB requirements. And after a long torturous period, the federal government really walked away from that accountability, those accountability reforms and there was a devolution to the states in 2015 under the Every Student Succeeds Act. Well, that’s the other major hypothesis for that inflection point in student achievement nationally. You’ll often hear researchers say, “We took our foot off the pedal in terms of accountability and it’s reflected in our achievement results.” So this is an area of active inquiry. How do we understand these macro trends given that inflection point coincided not just with the rise of social media use among teens, but the walkback of school accountability and some other factors like repercussions from the Great Recession. So I think that’s important context that would underscore why we shouldn’t overgeneralize from national trends the effects of one particular historical change.
Holden Thorp: I did want to ask you a little more about the effects of the Great Recession. I mean, there are some data that show that particularly with mental wellbeing, access to resources is a huge determinant and especially the mental health of your parents. And so you say that the effect of the Great Recession is yet another potential hypothesis for this, right?
Tom Dee: It is. And not just working through some of the economic implications for families and what that means for students, but also schools’ capacity to fund student mental health services over that period. I mean, I don’t have exact data on this to hand, but I would hypothesize that it was compromised by the Great Recession. So other things were happening at that time.
Holden Thorp: And so you’ve given me two alternative hypotheses, but are you worried that if one of those turns out to be the right thing, that it’s going to create public confusion because the story that it’s the phones and social media is so prevalent right now?
Tom Dee: Yeah, I am just generally concerned about the way we integrate research policy and practice in the US. I wrote about this in an essay at the beginning of this school year for Education Week that the integration between them, at least in education, is incredibly poor.
Part of it is that too much of the education research that’s produced by our universities and think tanks and contract research organizations just isn’t meeting the needs of practitioners. And some of that is, I think it’s overindexed on qualitative methods. Sometimes it’s very transparently ideologically coded and avoids more sort of positivist scientific postures. But even when we know something is effective, it often doesn’t get picked up by practitioners and policymakers. The recent movement after decades of scholarship to move towards embracing teaching methods that are aligned with how kids learn to read is one important example of that where even where the research does speak with a fairly consistent voice, that doesn’t always punch through to the way teachers are trained and the kind of in-service professional development they get. And then the other concern is that when states make policy pronouncements or even districts, it doesn’t necessarily change practice on the ground.
So I think that’s a potential concern for phone bans. If you believe they work, well, many states are saying, yes, ban phones, but then giving districts local control in doing so. And that could lead to implementation that is at best uneven and at worst cosmetic. So the alternative reality that I think I and other kind of technocratic evidence-based policy researchers would like to see is one where we have what are sometimes called inquiry cycles or continuous improvement, where our schools try to function like learning organizations, piloting innovation and planning for rigorous assessment of whether they’re working and then taking a moment after evaluating something to decide what to do next. Should they adapt a reform in some way? Should they take it to scale if it seems promising? Should they walk away if it’s not working? In continuous improvement, we call these PDSA cycles, plan, do, study, act, where the act stage focuses on should we abandon, adapt or adopt a reform?
And I wish we took that kind of approach to studying both phone bans and more generally, how to manage digital devices within schools. And part of the reason that matters too is there’s a hyper-localness to that approach that I think is really critical for understanding what can work and what can work in specific types of school settings or communities. So if I were advising state superintendents, I would encourage them to adopt an appropriate stance of humility about what we really know in these spaces and build a robust learning agenda that can help us answer the kinds of questions we have and build an evidence base that we can disseminate. Does that make sense?
Holden Thorp: It does. I think it’s very rigorous and has less public bombast in it than the moment that unfortunately we find ourselves in. So let me ask you, so one of the things that some of the critics of, and I don’t know if you would agree with this, but I think if you went to Capitol Hill, everybody on both sides of the aisle is focused on the phones and social media and that is probably drowning out some of these other measures that I’m sure the education community would like to pursue.
Tom Dee: It’s just a general problem: when they’re so popular, the political system’s going to respond to that popularity rather than the more, I think, nuanced approach I was suggesting of saying, “Well, let’s be clear about what we don’t know and let’s plan to learn.”
Holden Thorp: Yeah, perfect. So one of the things that the critics of the phone message say is that no matter how much you try to say you’re blaming the social media companies or big tech or whatever, it still comes off as blaming teenagers for spending too much time on their phones and that that has a deleterious effect. What do you make of that critique?
Tom Dee: I don’t know. That seems off point to me. I don’t think it’s necessarily about blame. Technology is a part of our lives and I can’t imagine a world in which it’s not going to continue to be so. And in fact, increasingly so. I think it’s up to us to think about how to best put guardrails around it and we can do so without us necessarily assigning blame, certainly not assigning blame to the only people involved in these interactions who are children.
Holden Thorp: Well, that covers a lot of what I want to ask you about. I will call your colleagues about the social media part. Is there anything else that’s not out there in the discourse that you’d like to suggest I bring up?
Tom Dee: Yeah. Well, a number of states are considering and moving towards digital device-free school environments more broadly. So I think that’s just what I’ve mentioned, our sense is that’s going to be the next issue and we’re trying to position ourselves to have some high quality evidence that will speak to that. Because again, that’s another explanation for the sobering results we see, which is, hey, we hear, for example, anecdotes about, yeah, the phones might be banned, but everyone’s sitting there with a Chromebook and all a student has to do is open up a slide presentation that they’re ostensibly working on jointly with their peers and all of a sudden they’re communicating with each other within the slide deck typing messages, things of that sort. So I think again, just underscore the broader concern and the ways in which we still need to learn more. And yeah, it’s been an interesting week seeing the way this gets swept up into a very reductive kind of binary maelstrom when I think our team, or at least I’ll speak for myself, want to advocate for this kind of humble approach to saying, let’s try to figure out more what’s going on here.



