Edtech Insiders

Teaching AI to Teach: The National Tutoring Observatory’s Bold Mission with Rene Kizilcec

Alex Sarlin Season 10


Rene Kizilcec is an Associate Professor at Cornell University, where he directs the Cornell Future of Learning Lab and leads the National Tutoring Observatory. His research focuses on learning science, AI in education, and the behavioral and computational factors that shape student success. His work has appeared in Science, PNAS, and other top journals.

💡 5 Things You’ll Learn in This Episode

  1. How the National Tutoring Observatory is building the largest dataset of real tutoring interactions.
  2. What specific “tutoring moves” actually drive learning outcomes.
  3. How AI models can be trained to use stronger pedagogical strategies.
  4. Why education needs real benchmarks for measuring teaching quality in AI.
  5. How simulation-based learning is transforming medical and language training at Cornell.

✨ Episode Highlights
[00:00:00] Rene on aligning AI study tools with real pedagogy.
[00:02:33] Lessons from early MOOC research on equity and completion.
[00:05:58] How motivation and belonging interventions improved outcomes.
[00:10:57] The mission and structure of the National Tutoring Observatory.
[00:13:01] Why tutoring works—and why we haven’t known which moves matter.
[00:18:28] Tools for identifying tutoring moves across huge datasets.
[00:20:00] Using tutoring data to influence hyperscalers and AI product design.
[00:22:37] The need for robust benchmarks for AI tutoring quality.
[00:27:42] Why great tutors ask questions instead of giving answers.
[00:31:03] How tutoring data can improve AI models while preserving privacy.
[00:34:34] How tutoring providers can join the NTO consortium.
[00:37:16] Simulation-based learning with MedSimAI and Chitter Chatter.
[00:41:34] How AI-assisted development accelerates EdTech innovation.

😎 Stay updated with Edtech Insiders! 

Follow us on our podcast, newsletter & LinkedIn here.

🎉 Presenting Sponsor/s:

Every year, K-12 districts and higher ed institutions spend over half a trillion dollars—but most sales teams miss the signals. Starbridge tracks early signs like board minutes, budget drafts, and strategic plans, then helps you turn them into personalized outreach—fast. Win the deal before it hits the RFP stage. That’s how top edtech teams stay ahead.

This season of Edtech Insiders is brought to you by Cooley LLP. Cooley is the go-to law firm for education and edtech innovators, offering industry-informed counsel across the 'pre-K to gray' spectrum. With a multidisciplinary approach and a powerful edtech ecosystem, Cooley helps shape the future of education.

As a tech-first company, Tuck Advisors has developed a suite of proprietary tools to serve its clients better. Tuck was the first firm in the world to launch a custom GPT around M&A. If you haven’t already, try our proprietary M&A Analyzer, which assesses fit between your company and a specific buyer. To explore this free tool and the rest of our technology, visit tuckadvisors.com.

[00:00:00] Rene Kizilcec: Building sort of a bespoke tool that we hope some people will adopt is just never gonna have the same impact, even if it is much better designed, than trying to affect some of what these hyperscalers are doing and trying to improve the sense of pedagogy that these tools are having, right? And even if study mode is not working well right now,

I think it's the right idea that students use something that is more aligned with pedagogical evidence and that helps them by not just giving them the answer, but by sort of talking them through like a good tutor would.

[00:00:35] Alex Sarlin: Welcome to EdTech Insiders, the top podcast covering the education technology industry, from funding rounds to impact to AI developments across early childhood, K-12, higher ed, and work. You'll find it all here at EdTech Insiders. Remember to subscribe to the pod, check out our newsletter, and also our event calendar, and to go deeper,

check out EdTech Insiders Plus, where you can get premium content, access to our WhatsApp channel, early access to events, and back-channel insights from Alex and Ben. Hope you enjoy today's pod.

We have an incredibly exciting and really special episode this week of the EdTech Insiders Podcast. We are talking with, I kid you not, I think, one of the smartest, most dedicated academics who is trying to push EdTech forward in the country, maybe even in the world. Rene Kizilcec is an associate professor in the Bowers College of Computing and Information Science at Cornell University, where he directs the Cornell Future of Learning Lab and leads the National Tutoring Observatory.

He studies behavioral, psychological, and computational aspects of technology and education to inform practices and policies that promote learning, equity, and academic and career success. His work has appeared in Science and the Proceedings of the National Academy of Sciences, it has won multiple awards, and he holds a PhD in Communication and a Master's in Statistics from Stanford University.

Rene Kizilcec. Welcome to EdTech Insiders. Thank you for having me, Alex. So let's kick off with a little bit of your background. You have been paying so much attention to the EdTech world and how it can be more informed by serious evidence and how we can really make transformational impact for a decade or more.

Tell us about how you got into EdTech and some of your work leading up to your current work with the National Tutoring Observatory.

[00:02:33] Rene Kizilcec: Happy to. I feel like I've come full circle since the very beginning, when I started tutoring kids at a summer camp in the United States, when I was a college student in the UK, teaching them how to program. I was pair programming, and that's really what sparked my interest in education and teaching.

I did that for three summers in a row, my entire undergraduate, before I moved to Stanford to start my PhD. At Stanford, I first was working on car interfaces, something completely unrelated, and human-computer interaction questions there, when suddenly MOOCs happened, massive open online courses. It was the very beginning of them launching.

Coursera had just started, edX started shortly after, and Sebastian Thrun created Udacity. And a group of students, PhD students at the time, came together and said, this is really interesting, we wanna study this. And we partnered with faculty at Stanford in order to create the Lytics Lab, the learning analytics lab, at the time.

And we did some of the first analyses of what was going on in these massive open online courses. Who was in them? Why are people in them? What about the success rates? How do we define success in this kind of new space of informal learning? All of these questions we were able to answer with some of this early data.

It was a very exciting time, very fast moving, when people didn't understand what was going on. In some ways, it's very reminiscent of where we are at with AI right now, trying to make sense of what is going on and how it's being used. So I see this as a similar environment. This is where we, of course, first met, Alex, when you were at Coursera.

I was coming over there presenting some of the research findings that we had on motivation of online learners, and we did a lot of intervention research. So when we started with the MOOC research, we figured out who's there and why are they there. And one of the things we quickly understood is that a lot of people are there and want to finish the courses, but fall short of that goal.

And so the next set of research studies we ran at the time was around interventions to support students in those courses. Some were more behavioral science interventions, to help them with planning, self-regulation, metacognitive skills. Then some were more on the motivational, psychological side, trying to induce a sense of belonging in environments that can feel a little disconnected from other people.

And sometimes, if it's a course from a university like Stanford or Harvard, you might feel sort of an identity threat even being in that course, in an environment that you might not feel very comfortable in. And so we worked on some of those questions to try and promote students' engagement at scale across a lot of different courses.

After that, I moved for a brief time to Arizona State University before coming to Cornell. There we worked on the online degree programs at ASU, comparing how students in those programs were performing to students in the in-person programs, and how to make course design decisions that could support student success with quote unquote real degree programs.

And then at Cornell, I founded the Future of Learning Lab, where we study all kinds of things related to ed tech, both in K-12 and higher ed, as well as professional learning. Some of that early work focused on RCTs, randomized controlled trials, trying interventions to support student learning. There's a whole body of work on fairness and bias in AI systems and how to mitigate it.

And most recently we've had a strong focus on tutoring, and on building up research infrastructure to support researchers and developers in the space to learn more about what is effective about tutoring.

[00:05:58] Alex Sarlin: It's such an exciting project. We wanna get into the National Tutoring Observatory work ASAP and the Million Tutoring Moves project.

But before we do, there's one aspect of your work that I want to highlight and I'd love to hear you talk about, which is that you paid a lot of attention in those MOOC days to equity of outcomes: who was making it through those courses, who was not, how it divided around the world, or between people with or without existing degrees or from different socioeconomic backgrounds. And some of the interventions that you tried were actually starting to really narrow the gap.

It didn't always work, it didn't always scale, but I think that focus on equity of outcomes is relevant for your work, but it's also very relevant for what's happening now in AI and tutoring. I'd love to hear you talk a little bit about it. Absolutely. 

[00:06:42] Rene Kizilcec: Yeah. One of the first things we figured out in MOOCs is that people are very differentially successful, and we looked at whether that is because motivations are different.

And yes, they are somewhat different, but it's also the case that environments just work much less well for people who are less prepared to do well in them, and for people who were coming from environments, from contexts around the world, that were more different from the place that these courses were developed in.

And so one of the gaps in particular we studied was this global achievement gap, right, between learners who are coming in from rich, Western, you know, industrialized environments versus learners who are coming in from more of the global south, countries with lower GDP, lower Human Development Index, where achievement rates in terms of completion of the course were just much lower.

And some of the interventions around self-regulation and supporting students with their sense of belonging were helpful in some cases, but not all, in closing those gaps. It's a lesson in the opportunity, but also the challenge, of scaling up interventions to really diverse environments. When we first tried these interventions to reduce some of these achievement gaps, we were focusing on a few courses that we so carefully selected because we knew them well.

We knew where the gaps were. When we then tried to scale this up and work with my dear colleague Justin Reich across MIT, Harvard, and Stanford MOOCs that were all going up, we quickly realized that there's so much variation in how these courses are set up; every environment is different. And having an intervention scale reliably across all of those is very difficult.

And so we saw a lot more variance in the outcomes, but some things did work more consistently across them. And those are, you know, some lessons learned from other interventions that can be tried out in the future. 

[00:08:30] Alex Sarlin: I really agree. And I think, you know, certain amounts of structure and sort of last-mile delivery matter. They always said doing courses with a friend,

I think this is one of your findings, doing courses with a friend made people much more likely to complete them. So having that social network, or the belonging interventions where you sort of connected what you're doing in a course to your life and the actual outcomes you want for yourself, made a huge difference in some cases.

So I think some of the psychological pieces that you were introducing to this are so key to that world, but also obviously transfer really well to the tutoring world that you're in now.

[00:09:02] Rene Kizilcec: And I love that you highlight taking the course with a friend. This was something that came out in the early motivation research that we did.

It was just one of the motivations out of many options, like, oh, I'm doing this with somebody else. And those people tended to complete at much higher rates, which we then turned into an intervention around social accountability: finding a friend that can hold you accountable, or somebody to take it together with, encouraging that kind of behavior.

Which is one of the interventions that looked more promising among the set of options. And it's something that we saw also during COVID, when we did some survey research around student support networks; that was an important factor, knowing other kids in the class even though you weren't there in person. And I still think it's a very important thing to have that social network, especially as students are turning much more to devices in order to get help.

My office hours are emptier than they ever have been. I do think that is because of ChatGPT and other tools, and students going there for a quick answer rather than standing in line in front of an office or weathering the Ithaca cold to make it over to the building. And so it's important to have those social connections, and have instructors actively encourage that to happen through teamwork, team assignments, and other kinds of ways to get students to talk to each other.

[00:10:15] Alex Sarlin: A hundred percent. And I'm more and more convinced, as I explore the AI field and talk to more people and read more, that a social version of AI is not only inevitable, but could be transformational for education. I think this one-on-one, person-and-ChatGPT or person-and-Gemini back and forth is a paradigm that I hope we're gonna actually start to break, and have more of AI supporting

human-to-human relationships and interactions. We'll see if that truly happens, but that's my hope and prediction. This is a perfect segue to what you're doing with the National Tutoring Observatory, which is incredibly interesting work. So first off, just tell us about the origins of the National Tutoring Observatory and what some of your goals are in relationship to EdTech and AI.

[00:10:57] Rene Kizilcec: Yeah, we clearly have a long history of personalized learning in the education literature and the tech literature. The National Tutoring Observatory has the goal of understanding what it is about tutoring, one-on-one or small-group instruction, that actually makes it work for different people in different contexts.

And to do that, we are working with a number of tutoring providers to collect the largest dataset of tutoring, which we call the Million Tutoring Moves dataset, and making that data available for research and development in order to understand what it is that makes tutoring really effective, and to have better models, AI models, LLMs, multimodal models, that can help do the job of a tutor.

Now, it's important that the goal is not to replace teachers with this, of course. It is to make sure that when AI is in the loop, whether it is in a one-on-one fashion or in a supporting role for teamwork, it understands what good pedagogy looks like. One of the big problems right now is that

the data that, you know, LLMs are sort of hoovering up in order to be trained does not have any data of what good teachers are doing, right? Because there are just no large datasets of what good teachers are doing. And so our goal is to create that data to inform the science of learning, but also to inform better technology that can support student learning and teacher development.

Now, one of the important challenges with this work is of course privacy, and we are extremely careful about how we are de-identifying data and what data will be publicly released for this purpose. And one of the things that we pay specific attention to is who can use the data and for what purposes.

And so the National Tutoring Observatory is providing this resource to advance the science and to advance development in the space, while maintaining the privacy of the students who are being tutored in the sessions, and the tutors who are the teachers in these sessions as well.

[00:13:01] Alex Sarlin: I heard about this Million Tutoring Moves initiative and what you're doing with the National Tutoring Observatory.

You're also doing it in partnership with Carnegie Mellon and some really top professors there, including Ken Koedinger, who's an absolute legend in educational research. And as soon as I heard it, I said, oh, this is a hundred percent needed, for all the reasons you just said, right? LLMs are trained on the internet; they're trained on huge corpuses of data out of books, huge corpuses of data

they're buying from news outlets, none of which has successful educational conversations or what good teachers are doing or what tutors are doing. And as a result, a lot of these guided learning modes, a lot of the tutoring tools that are being built, are trying to sort of guess, basically trying to figure out what should a tutor do, what would a tutor do. But

LLMs are data-based; that's kind of the whole point. They're not rules-based, they're data-based. So having that huge dataset of successful tutoring interactions is incredibly valuable. And of course, one-on-one tutoring interactions that happen between a student and a teacher in a classroom or in a home setting or in a tutoring center may not be recorded, may not be able to be turned into a tutoring dataset.

But these tutoring providers that are online, many of which you are working with, actually do have huge datasets, and they have proprietary datasets. What you're doing is putting them together, taking parts of their data and combining them into a massive corpus of tutoring interactions, and also using them to figure out which types of tutoring and interactions work.

Because actually that leads to my question. People have known for many decades that one-on-one tutoring has outsized effects compared to other types of learning, and mastery-based tutoring in particular is the famous Bloom's two sigma enhancement of learning. But tutoring has been effective for a long time.

It's been known to be effective. At the same time, we don't actually know what about it is effective, which is kind of mind-blowing when you think about it. So tell us about that, the idea that we've known this panacea for learning, it's expensive, but it works, yet we actually haven't been able to unbundle what's actually happening in tutoring that's making that happen, in a way that we can inject into other programs or products.

[00:15:08] Rene Kizilcec: That's right. It's remarkable how little we actually know about the moves of a tutor that are effective when it comes to learning outcomes. We have a lot of data from large tutoring programs that have been implemented in the wake of COVID with COVID relief funding, and that has led to a lot of insight into just how effective tutoring programs are.

But one of the things that we don't know is what exactly it is in those tutoring programs that is causing those effects, the things that the tutors are actually doing. There are studies on how large the group sizes should be, studies on how large of a program is more effective, and generally speaking, the more you scale these programs, the less effective they become for the individual learner.

Right? And we really need to understand the secret sauce that is in these programs, that makes them work well when they work well. And that is our goal: to use this massive dataset to understand, if a student is struggling with a certain concept and says, I'm confused about this, or I don't know, what are the different ways that a tutor in that moment could respond to the student?

And which one of those options is better? Which one of those is more likely to lead to the student more quickly realizing what the misconception is, overcoming it, correcting it? That is what we wanna be able to understand, and we are connecting the data of what is happening in the tutoring sessions, the conversation, with data from whiteboards, with data from the video sometimes, if it is available,

with data about test scores, with data about the intelligent tutoring system that they might be using at the same time and doing exercises in, and with the exit tickets that they fill out at the end of the tutoring session. Combining all of that to really get a causal understanding of what it is that a tutor does

that helps a student move forward, progress, and learn the concept in a robust fashion. And that is something that just hasn't been done because the data hasn't been available. It's as simple as that: there wasn't enough data to be able to answer these questions before, and it has held back the science of teaching and instruction because we couldn't answer those questions.

And so our hope is to be able to make this accessible to everyone. And actually, a big part of that is that we realize that the data alone is not good enough. We can put this data out there, but it's large, and not that many researchers will have the skills of processing such large textual datasets.

Sometimes, you know, there's audio involved, and so we created a tool which allows researchers to very easily annotate what is happening in the tutoring sessions. Just like you prompt ChatGPT to revise your text or do other things, you can prompt this tool to find instances of revoicing or giving corrective feedback or whatever it is that you're looking for, whatever theory you want to test about what is effective or about whether tutors are typically doing this.

And the tool allows you to find those instances in the data and simplify those analyses to understand what's going on. Because a dataset is only as good as how it is being used and the abilities of the people using it. And so making it accessible to a large set of researchers and social scientists, as well as developers who wanna build on top of it, is gonna be important to effect real change.

[00:18:28] Alex Sarlin: Yes. Let's talk about that last piece, about it being relevant for researchers and social scientists, but also for developers and people who are trying to build these systems. Because, as you know, I think better than most, one of the things that has been really tricky in the history of education and education technology is that some of the really robust findings in research just don't make their way into the

hands or into the plans of people who are actually designing tools that then get scaled and become part of everyday life for many students; the research can be divorced from the practice. And what is exciting to me about what you're doing with this Million Tutoring Moves initiative is, yes, this database is obviously incredibly valuable for researchers.

The idea of being able to annotate, find different tutoring moves, find the impact of them, and identify what is actually working is a hugely important question. At the same time, we all know how fast AI is moving and how quickly people are productizing AI tutors. Frankly, they're building them and launching them constantly and trying to improve them.

So the speed at which the productization is happening and the speed of research tend to not always match. I'd love to hear you talk about what your hopes are about how the findings that are gonna come out of this National Tutoring Observatory work might actually be able to influence the field, the policy field, the product development field, and not be captured in an academic world,

and then in 10 years we say we finally figured out what good tutoring looks like, while for the last 10 years we've been doing it anyway.

[00:20:00] Rene Kizilcec: Yeah, there are a few different pathways that we have in mind for how this can effect real change. One of them is that the science of understanding what is good tutoring will help us be more focused about what it is that we prompt these machines to do, the design of what they're doing.

All of those things can be informed by a better understanding of what the really effective ways are that a tutor can support a student. That's one. Another one is for people to use the data when they are building systems, and have them grounded in what it is that actual good tutors are doing and what the effective moves are, using that data in order to improve existing efforts.

And I would love, you know, if I could bring the data to ASU+GSV and other places and say, hey, here's a resource; please, if you're building something, look at this, use this, because it's gonna make the product better. And I am very realistic when it comes to how products can have impact.

It is simply true that a lot of students around the world are using tools like ChatGPT and other things to help them study, right? We've all seen the reports. Building sort of a bespoke tool that we hope some people will adopt is just never gonna have the same impact, even if it is much better designed, than trying to affect some of what these hyperscalers are doing

and trying to improve the sense of pedagogy that these tools are having, right? And even if study mode is not working well right now, I think it's the right idea that students use something that is more aligned with pedagogical evidence and that helps them by not just giving them the answer, but by sort of talking them through like a good tutor would.

And so that third avenue really is to try and make the data useful to hyperscalers as well, who are trying to improve what models are doing right now, being realistic that millions of students are using them right now. Exactly. And they're not gonna stop using them. There's nothing we can do about this.

Right? The genie is out of the bottle, right? And we want these models to do better in what they're trying to do. And a big part of that is also setting up benchmarks for the community, having clear benchmarks that models can try and improve on. There are way too few benchmarks in education right now that hyperscalers and other tool providers can measure themselves against in order to see, you know, what does progress look like in this space?

Having more of those benchmarks that show this is what good tutoring looks like, how well you're able to identify good tutoring, specific moves that we know are correlated with and predictive of better outcomes, causing better outcomes, ideally. Those are all things that we need more of in the community, and the NTO, but also many other research teams out there, are working on developing them.

[00:22:37] Alex Sarlin: A hundred percent. I think the benchmarks that you're mentioning are something worth pausing and double-clicking on, because this is an aspect of the AI world that is very deeply appreciated by some and sort of totally overlooked by others. You know, benchmarks in traditional LLM application and evolution have been really instrumental, basically:

benchmarks that can assess a large language model against certain types of reasoning or certain types of behavior or certain types of tasks that it can complete have actually very explicitly driven evolution and driven competition among the frontier labs for who can push it further. Gemini 3 just came out, and every time any of these large language models evolves, they

show how it did across all these industry benchmarks, which are often basically evaluating the LLM on how well it almost acts as a student, that is, how well it can solve problems or reason or write really complex things or do creative writing. And I think the lack of benchmarks for pedagogy, the lack of benchmarks for what good teaching looks like, has been really explicit.

I just came back from the Google AI for Learning Forum, and I've been impressed by how Google has actually worked on creating meaningful internal benchmarks for the Google products using some core learning science principles. And they're trying to do that in a way that holds them to a standard and says, anything we do that's learning-oriented, we want to make sure it's meeting these benchmarks in a meaningful way.

But we have very little across the field right now. There are some in development, there are some coming out soon, there are some that foundations have been funding that are in process. But I'd love to hear you talk about your vision: if some of these benchmarks were to be developed and were to be in place, how might it improve the development of AI for education?

[00:24:22] Rene Kizilcec: Yeah, benchmarks are driving a lot of the action in this space, for better or worse. And the tricky thing about benchmarks in education, or benchmarks for teaching, is that it's a multi-objective problem. It's not just good or bad; there are a lot of things you're trying to optimize for as a tutor, right? You are trying to keep the student's attention.

You're trying to motivate them to keep going. You want them to be able to come back if they have another question. You want to, of course, identify what misconception they have and overcome it. There are a lot of things that are going on. It's a complex space, which doesn't mean we should give up on it,

very importantly, right? It means that we should come up with a suite of benchmarks, which is, you know, what is also done in other areas where people have made great progress, and try and hold the frontier models to making progress on these various metrics. One of the things that we are thinking of right now,

because it is hard to do a benchmark for tutoring overall, is to start off with a benchmark simply for tutoring moves and identifying tutoring moves, right? So being able to correctly identify that a tutor is praising the student, is revoicing what a student has said, is maybe giving an example, scaffolding, having a feedback loop,

all of these moves that in the literature have been shown to be, you know, correlated with student learning; making sure that a model understands when that is happening in the tutoring session, so that if you are instructing a model to do more of any one of those, it's more likely to be able to do that, and it's more likely to be able to use effective tutoring moves as it is tutoring students.

And it turns out that, surprisingly, some of these are much harder for a model to identify right now than other moves, and for some models more than others. Just as an example, we were testing the older version of Gemini. Maybe this is fixed in the new one, but it was having a much harder time identifying giving praise

in these tutoring sessions than Claude and GPT were, and it was just a weird outlier given the very same prompt. And so being able to look at each one of these categories and making sure that we are doing well on them is just one example of a benchmark, very incremental in some ways, but a place to start as we're building up more and more benchmarks that can help us build towards better tutoring, better pedagogy.

One example I think that has come up a lot in these sort of study mode pieces is not giving away the solution to a problem, but sort of asking a student what they think. And it's a remarkably simple thing in some ways, right? And it's so different from what an LLM is usually prompted and trained to do, which is to, you know, be a great customer

agent, right, and give you answers fast and politely. But in this case, you don't want that, right? You want the LLM to question what the student is saying. My friend Justin Reich has this great line: a good tutor will question the answers of a student, whereas LLMs these days answer questions.

And so study mode in some ways is trying to do that well, and just a benchmark on how well a model does on that can also help us move forward. So I'm a big advocate for this work, and a shout-out to the team at Stanford: Susanna Loeb and Ryan Knight and others at AI2 are working on a set of benchmarks in this area, which I think will be immensely helpful for the community.

[00:27:42] Alex Sarlin: Totally agree. It strikes me, as I hear you talk about the questioning, that we just talked to Janos from Polygence recently, and his TeachLM paper also identified that asking more questions and getting more context is a core thing that tutors on his platform do that LLMs don't tend to do. It just feels very resonant.

It strikes me that there's this irony that, you know, one of the first major AI tools in human history was ELIZA, which was an incredibly basic and simple AI that would do almost nothing but ask questions, basically pretending to be a therapist. And it just strikes me that if Claude acted a little more like ELIZA, I think it would actually be a better tutor, even though that is almost like the dumbest logic in the world.

But if you're telling a student, what do you think about this? And they say X, Y, Z, and you say, why do you think that about X, Y, Z? Instead of four paragraphs about what it thinks about it and then asking you three different questions at the end, I mean, I think that would actually be better, which is just like the silliest thing to even think about.

[00:28:43] Rene Kizilcec: But it's so true, and we've known it for a long time in education, right? We try and get teachers to talk less and have the students talk more, right? I mean, TeachFX is a wonderful example of that. They have this dashboard, it's an app that teachers can use in their classrooms, and they record the session and then they get feedback on their teaching.

And one of the things that they get feedback on that's incredibly valuable is talk time: how much did they talk versus how much did they let students talk? And, you know, in some interventions that they've run trying to improve on that metric, bringing the teacher talk time down, I think, they are showing that there are positive effects of doing that in classrooms.

And you know, it's not rocket science, right? Students are just talked at all day. We know that that is not an effective way to learn. We know that active learning is more effective and interactive learning is most effective. And so facilitating that with LLMs, but also in social settings, is going to be an important challenge to work on.

[00:29:37] Alex Sarlin: I wanna come back to your comment about what you call hyperscalers, because I have a theory that if you were to take a bunch of transcripts from OpenAI, you know, ChatGPT study mode, or guided learning, or, you know, a study and learn mode, you would see something that actually looks a little bit like what we're saying here.

You'd have a student asking a single question or giving a one-word answer or a short answer, and then this huge, you know, word vomit from the LLM about all the things that you should do differently or think about, maybe offering a practice quiz or to make flashcards, and then the student says two more words, and then the thing goes again. It's just a guess, but based on some experience in the field, that's my guess.

Again, it's just a guess, but based on some experience in the field, that's my guess. It feels like we are at a moment when, as you said, these tools are being used, both the commercial consumer versions and the study modes at huge scale. Things like Conmigo from Khan Academy are being used at large scale.

All sorts of AI tutors are out there, including from incumbents, and yet they're really not. Designed for that. I guess my question is. This data set that you're building for the Million Tutor Moves project, how do you see it? You say we're designing it with hyperscalers in mind. I mean, is it a possible future that Ed tech companies and big tech companies can take this data set and say, okay, we're going to retrain anything we're doing that's trying to be tutoring, that's trying to be teaching, and make it learn from this data set about what works and what doesn't.

Is that an explicit goal of the project? And tell us what that would look like.

[00:31:03] Rene Kizilcec: Yes, it is an explicit goal that the work we're doing has the largest possible positive impact in the field when it comes to improving students' learning outcomes and students' learning experiences. And if one path to that is to go through, you know, working with Google, OpenAI, and Anthropic and others, given the reality that that is what a lot of students are using day to day,

I see that as being a very reasonable approach to having that impact. Now, with that said, right, it's not like those companies don't have enough resources to do things on their own, but the critical difference is how you curate a dataset like this, right? How you carefully think about what outcomes you're trying to optimize. There's a lot of thought that goes into

There's a lot of thought that goes into. Curating this, this resource that will have impacts in, in how it's then being, you know, actualized realized when it's sort of students are using those, those tools. And as where we really see, you know, our, our group fitting in. It's sort of doing that careful thought, what are the outcomes to optimize for?

How do we extract them from the dataset? And, of course, being extremely careful about privacy. I mentioned privacy at the beginning. It's gonna be paramount to make sure that, you know, students are not identifiable, teachers are not identifiable, to make sure that this is gonna be shared in a privacy-preserving manner and, you know, has maximal positive outcomes while really minimizing the risk of any harms that come from sharing data like this.

[00:32:30] Alex Sarlin: A hundred percent. And that privacy, the anonymization, de-identification, and aggregation, is a huge part. I mean, you're working with seven, eight, or more tutoring providers. So the combined effect of having a dataset that combines different types of tutoring sessions in different contexts, with different types of learners, with different types of subjects, is very, very valuable in terms of having a robust, meaningful, and diversified dataset that can be used in lots of different contexts.

[00:32:56] Rene Kizilcec: That is right. Let me pick up on two things there. One is we currently have seven tutoring providers that we're working with closely. We have more knocking on our door wanting to join the effort, being interested in contributing to the effort. And a really important part here is the diversity of the data.

We want to have tutoring sessions that are, you know, represented around the United States and then hopefully around the world as well in this data set to make sure that, you know, we don't capture tutoring just in sort of rich white neighborhoods, but we capture tutoring across the entire United States and the rest of the world.

We've started with these seven because, you know, they were partners that we already had relationships with, and they were excited about this effort. And we're looking to expand eventually to more tutoring providers in the United States, but also globally. We are also right now focused on K-12 mathematics, because the funding for the project is coming from the Gates Foundation and the Chan Zuckerberg Initiative.

We wanna hopefully expand to other areas of tutoring that are, you know, just as necessary, in the sciences and in the humanities and writing, where tutoring is also much needed. So we are looking to make it a resource that is able to support tutoring development and research in a number of areas beyond what we have right now.

[00:34:14] Alex Sarlin: Yeah, so just to get logistical here, if somebody is listening to this and works at a tutoring company or runs a tutoring company, or can think of five tutoring companies in India or in Brazil where they're like, oh, this would be a perfect match, what would be the best way for them to connect with you and help, you know, raise awareness of your work to them and their work to you?

[00:34:34] Rene Kizilcec: Please reach out to us at nationaltutoringobservatory.org. There's an email address and a contact form that you can fill out to get in touch with our partnerships team. We'll be able to set up a call, have a conversation about sort of the kind of data you're thinking of, how that could fit in,

and figure out what the timeline might be for integration. But we are very interested in working together with more partners to make this a broader, more representative resource of what tutoring looks like. And actually, for many tutoring companies, there's sort of a concern around being the one that shares

data, and of course we have to respect all, you know, applicable laws and user agreements. But the fact that it is a consortium of many tutoring providers that are all contributing to this broader resource that is housed in an academic environment is, I think, something that gives this whole project a different flavor.

And, you know, it makes it a space for innovation and a resource to many researchers out there. We also have a community of practice, where we run a monthly call; Alex, you've joined a few times already. Amazing. And we come together and we discuss what's the latest in the research. We share findings from the data that we have,

so analyzing what tutors are doing, how it correlates, how sessions evolve over time. And we talk about some of the challenges that providers are facing, right? Which can sometimes be things, you know, around de-identification of data for sharing. It can be around understanding if tutors are adhering to the training that they're receiving, and how to adjust tutor training based on what is observed in the actual tutoring sessions.

So there are a lot of topics of conversation, and it's a vibrant space for, you know, like-minded people to exchange ideas.

[00:36:11] Alex Sarlin: It's unbelievably exciting. And it feels like, you know, we're at a time when I think the public perception of AI in education is really sort of on the fence right now. I mean, there's incredible stuff happening throughout the field, including around the world, especially around the world, I would even say.

But you also have a fear of screens, a fear of big tech, a fear of, you know, people impinging and sort of coming into the education system with new initiatives that a lot of teachers are worried about, academic integrity concerns. There are sort of a lot of headwinds right now. And when I hear about a project like this that is so careful, so thoughtful, and really designed to do nothing but improve how tutoring outcomes work, whether LLM-based or human,

it raises my sails about the future of the space. We only have a couple minutes left, but, you know, you do lots of other things at the Future of Learning Lab at Cornell, and I wanted to give you just a chance to talk about some of the other work that you and your students have been doing.

You've been doing simulation-based learning. You work with a company called, I think, hia.ai. Is there anything else you'd like to flag about the Cornell work and about what you're doing with the Future of Learning Lab? Absolutely.

[00:37:16] Rene Kizilcec: Yeah, so the National Tutoring Observatory is a very big project that we're very excited about.

There are some really exciting other projects going on, two of them actually somewhat related, so I can talk about them together. Both of them are making use of simulation-based learning, driven by amazing progress in real-time, audio, voice-based interactions, right? So the Realtime API from OpenAI is a good example of that, but there are, you know, other providers of that kind of model as well.

One project is called MedSimAI. We've been working with a number of medical schools, Weill Cornell, UCSF, Yale, and now Mayo as well, to create a platform which is now being used in a number of these medical training programs with their incoming cohorts of medical students and trainees to simulate patient encounters and give them structured feedback on those encounters.

We're also giving them opportunities for self-reflection, practicing writing notes on the encounter, all of the things that are typically done today with human actors, where medical schools are paying actors to come in and play the role of the patient. Students get these opportunities, but they get them way too rarely, and they, you know, usually feel quite anxious for the few times that they get to do that.

There's not as much opportunity to get feedback, or the feedback is very delayed. So there's a lot of space to improve there by having a sort of real-time interaction that you get feedback on and can play back. And so we're doing a lot of research on that, in particular the opportunity that this yields, which was never before possible, to have these longitudinal encounters where you can see the same patient again.

Yes. And again, over time you could sort of learn how to show empathy. You can learn how to interpret information that might have been generated by an AI that might be adopted soon for doing the intake questions, right? I mean, we are aware that there's a lot of AI going into the healthcare space, and it's really important to train, you know, medical professionals, as well as people in, you know, the vet school and dentistry.

We're talking to a number of different nurse practitioners, a number of different schools, where you need to have good communication skills with patients. You need to be able to manage the encounter. You need to be able to be empathetic to their concerns and provide, you know, clear directions on treatment.

So that's one project that we are really excited about, especially 'cause there's so much interest in this space for trying it out in different areas. The other one is kind of doing a similar thing, but in the context of language learning, where one of the things we know is most important is speaking practice.

Right? I'm sure that if I had had more of that for my French classes, you know, I would be in a better state now after 12 years of learning it. And so speaking practice is hard to come by, because you need to find somebody to practice with. The other problem with speaking practice is that it should ideally be

grounded in the curriculum of what the teacher is doing, the class that you're taking. And so for speaking practice, we are giving teachers an opportunity to create little cases for their students to practice speaking in that language, having a conversation in the language that they're practicing, and again, getting immediate feedback on that conversation so that they can try again.

We've rolled that out now in a few classes. Students really like it because it's a low-stakes environment to practice in, right? The teacher only sees that they have done it, they don't get to hear it, and the students are getting immediate feedback on the categories of things that the teacher thinks are important to get feedback on.

So we're building it custom for this use case, working with a lot of teachers to design it, to make sure it, you know, does what it's supposed to do. Those are two other projects that we're actively engaged in that are very exciting.

[00:40:51] Alex Sarlin: And that one is called Chitter Chatter, right? Is that the Chitter Chatter project?

[00:40:54] Rene Kizilcec: Right, Chitter Chatter. Yeah. So the language learning practice platform is called Chitter Chatter. That one is developed by Jayden Gathers, who's a Cornell PhD student. The MedSimAI platform is developed by a PhD student in the CS program here with me; he is Jan Hick. Both of them are incredible. I guess it's a real testament to the opportunity of using, I mean, I say vibe coding with some hesitation, because, you know, if you see the platforms, they're much more polished than what you would usually think of as vibe coding.

But it's just, you know, AI assisted coding. 

[00:41:27] Alex Sarlin: Exactly. It shows 

[00:41:28] Rene Kizilcec: you how far you can get in just one year, compared to how long it would take previously to build something.

[00:41:34] Alex Sarlin: It's incredibly interesting work, and I recommend everybody be closely following what's happening at the Cornell Future of Learning Lab.

Rene Kizilcec is an associate professor at Cornell University, where he directs the Cornell Future of Learning Lab and leads the National Tutoring Observatory, as well as projects like Chitter Chatter and MedSimAI. Thank you so much for being here with us on EdTech Insiders.

[00:41:56] Rene Kizilcec: Thank you, Alex, for having me.

[00:41:58] Alex Sarlin: Thanks for listening to this episode of EdTech Insiders. If you like the podcast, remember to rate it and share it with others in the EdTech community. For those who want even more EdTech Insiders, subscribe to the free EdTech Insiders newsletter on Substack.