
Edtech Insiders
The Promise and Perils of AI in Education: Stanford GSE Students Share Their Work
In this special episode, we sit down with three innovative Stanford Graduate School of Education (GSE) students who are exploring cutting-edge applications of AI in education.
Michael Chrzan is a Master’s student and Dean’s Fellow in the Education Data Science program at Stanford. A former Master Teacher in Detroit, he taught Mathematics and AP Computer Science for seven years. His research uses machine learning to predict large-scale school closures and inform equitable decision-making.
Matías Hoyl is a Computer Science graduate from Chile and an edtech entrepreneur who has founded two startups focused on improving learning through technology. He led a coding bootcamp for women in Latin America, helping them launch tech careers. At Stanford, he is researching AI applications in education, including synthetic student simulations and AI-generated teaching tools.
Samin Khan is an AI researcher specializing in K-12 and higher education and currently an AI Research Scientist at Kiddom. His work focuses on developing AI models for curriculum development, lesson planning, and grading. At Stanford’s Education NLP Lab, he researches dialogue-based pedagogy and student engagement using large language models.
💡 5 Things You’ll Learn in This Episode:
- How AI-powered predictive modeling can help districts plan for school closures.
- The potential of synthetic students to improve assessment design and instruction.
- How AI is supporting teachers with curriculum implementation and real-time feedback.
- Why AI tutors may not improve student learning outcomes as expected.
- The risks and opportunities of AI in education, especially for equity and accessibility.
✨ Episode Highlights:
[00:06:07] Michael Chrzan on using machine learning to predict school closures.
[00:10:51] Samin Khan on AI’s role in lesson planning and teacher feedback.
[00:16:37] Matías Hoyl on simulating student learning with AI-powered models.
[00:24:09] Balancing AI research with real-world edtech applications.
[00:31:39] The importance of data, bias, and transparency in AI for education.
[00:46:11] Will AI improve or widen equity gaps in education?
😎 Stay updated with Edtech Insiders!
- Follow our podcast.
- Sign up for the Edtech Insiders newsletter.
- Follow Edtech Insiders on LinkedIn!
🎉 Presenting Sponsor:
This season of Edtech Insiders is once again brought to you by Tuck Advisors, the M&A firm for EdTech companies. Run by serial entrepreneurs with over 25 years of experience founding, investing in, and selling companies, Tuck believes you deserve M&A advisors who work as hard as you do.
[00:00:00] Michael Chrzan: know when it will be their time before it is their time so that they can try and engage in that process in a more equitable way. Dr. Pearman has some research that shows a causal link, for example, between gentrification of Black neighborhoods and school closures. And so I'm trying to help districts who really want to do this process that they sort of have to engage in.
To responsibly manage their districts. I want them to be able to do it in ways that they can engage the community and really figure out what's the best way overall for everyone involved to do it.
[00:00:28] Samin Khan: And what we've seen, especially within ELA, the curriculum can be very dense. So while you have all this great curriculum, we've heard time and time again that many teachers are having to skip out on many sections of the lesson plans.
And so one of the areas that I'm excited about is allowing teachers to leverage their expertise to get the most out of the curricula that they're given, and that means providing them with tools to condense the parts that they think need to be condensed and focus more on areas that need to be focused on.
[00:00:59] Matias Hoyl: I think the big challenge here is that LLMs are too nice, right?
So if you prompt GPT, "Write a question for a student that is struggling in this particular topic in math; you have to answer that specific question," it's probably going to get it right, because LLMs want to have things right. They want to serve you. They want to agree with you. So it's very difficult to prompt them in the correct way so that they actually embody the students and stop being nice and just try to be that struggling student.
[00:01:32] Alex Sarlin: Welcome to EdTech Insiders, the top podcast covering the education technology industry, from funding rounds to impact to AI developments, across early childhood, K-12, higher ed, and work. You'll find it all here at EdTech Insiders.
[00:01:48] Ben Kornell: Remember to subscribe to the pod, check out our newsletter and also our event calendar. And to go deeper, check out EdTech Insiders Plus, where you can get premium content, access to our website, early access to events, and back-channel insights from Alex and Ben. Hope you enjoy today's pod.
[00:02:12] Alex Sarlin: This is a really cool and very different episode of EdTech Insiders. So we had the privilege a few weeks ago to be spending some time in Northern California. We got to be on the Google campus talking to the Google learning team, which was amazing. And we got to spend some time at Stanford at a really, really interesting AI and learning event hosted by the great Isabel Howe.
At that event, I kept running into Stanford Graduate School of Education master's students who were just doing such interesting, innovative work in the field that I felt like I really wanted to amplify some of the work they were doing and get their perspective on this particular moment in edtech.
So without further ado, this is an interview with three Stanford GSE candidates, all doing very different but really innovative projects in different areas of ed tech. First, Michael Chrzan is a second-year master's student and a Dean's Fellow in the Education Data Science program at Stanford. He's a proud Detroiter, where he was born and raised, and where he was a master teacher for mathematics and AP Computer Science for seven years prior to coming to Stanford.
His research focuses on using machine learning methods to predict large-scale permanent school closures. We'll talk a lot about this really interesting project. His broader research interests are in using data science methods to help create lasting positive change in every facet of society he can help improve, from business to education and beyond.
Our second Stanford grad student and entrepreneur is Matías Hoyl, who studied computer science in Chile and has founded two ed tech startups focused on improving learning through technology. He also led a coding bootcamp for women in Latin America, helping them launch careers in tech. Currently at Stanford, he's working on the applications of AI and large language models in education, including simulating student answers, trying to figure out how to make sense of simulated AI students, and creating stories that teach reading.
Samin Khan is an AI researcher specializing in K-12 and higher education. He's currently an AI research scientist at Kiddom, an ed tech success story and really great company, where he develops and evaluates image and text AI models to improve curriculum development, lesson planning, feedback, and grading in K-12 education.
As a graduate student in the Education NLP (Natural Language Processing) Lab, led by Professor Dora Demszky at Stanford University, Samin conducts research on dialogue-based pedagogy and student engagement, leveraging LLMs trained on expert-annotated K-12 classroom recordings. Additionally, Samin loves leading roundtable discussions with education leaders and expert researchers on the risks and opportunities of AI in education. We will talk a lot in this conversation about both sides of that coin. He's led roundtables in higher ed with numerous US college presidents, in collaboration with the Association of College and University Educators, and in K-12 with researchers at Stanford's Human-Centered AI Institute.
Enjoy this conversation with three really interesting and very compelling Stanford Graduate School of Education students. Samin Khan, Michael Chrzan, Matías Hoyl, welcome to EdTech Insiders. Thank you so much. It's nice to be here.
[00:05:32] Michael Chrzan: Really grateful to be here. Thanks, Alex.
[00:05:34] Alex Sarlin: Happy to be here. Yes, so I'm really excited to hear from all three of you.
You're doing interesting work all around education and AI out of the Stanford Graduate School of Education. Let me start with you, Michael. We met at a Stanford event, and you're doing something really interesting: using research and AI to predict school closures with machine learning. School closure is a hot topic.
It's a very sensitive topic. Tell us about why you chose to focus on school closures and what you're doing to use machine learning for predictive modeling.
[00:06:07] Michael Chrzan: Yeah, sure. Thanks, Alex. So, yeah, my work focuses on predicting school closures. More particularly, I'm trying to predict if a district will close a large portion of their schools.
There are potential problems that could arise, which I'm trying to avoid, if I had a tool that could predict which specific schools would close; we can talk more about that later. But yeah, it was just sort of pure happenstance. I got to Stanford not really knowing what I wanted my capstone project to be.
And I started working with my advisor, Francis A. Pearman, in the GSE. His work is focused on school closures, and I experienced school closures: both my elementary and middle school no longer exist. I grew up in Detroit, which had a huge issue with school closures as I was growing up. And so it was a personally relevant topic for me.
And one that I also knew was really important. You know, San Francisco, Oakland, Denver, districts in Texas; nationally, there's this emerging trend of having to close schools, in large part because there are just fewer students in districts, right? Birth rates have been down for years and years now, right?
And so a lot of districts, especially large urban districts, are facing closures right now, and a lot more of them are coming. And so I'm just trying to help districts know when it will be their time before it is their time, so that they can try and engage in that process in a more equitable way.
Dr. Pearman has some research that shows a causal link, for example, between gentrification of Black neighborhoods and school closures. And so I'm trying to help districts who really want to do this process that they sort of have to engage in to responsibly manage their districts. I want them to be able to do it in ways that they can engage the community and really figure out what's the best way overall for everyone involved to do it.
And so, yeah, I'm using about 12 to 15 years of census and National Center for Education Statistics data that I'm putting together to try and build a predictive model, using machine learning algorithms, to be able to do that.
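To make the shape of this concrete, here is a minimal sketch of what a district-level closure model along these lines might look like. This is an illustration under assumptions, not Michael's actual pipeline: the file name, feature columns, label, and model choice are all hypothetical.

```python
# Hypothetical sketch: predicting large-scale district school closures from a
# joined panel of NCES and census features. All column names are illustrative.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report

# One row per district-year, with a label for whether the district went on to
# close a large share (say 10%+) of its schools within the next five years.
df = pd.read_csv("district_panel.csv")

features = ["enrollment_trend_5yr", "per_pupil_revenue", "median_home_value",
            "pct_school_age_pop_change", "n_schools", "urbanicity"]
X = pd.get_dummies(df[features], columns=["urbanicity"])
y = df["closed_10pct_within_5yr"]

# Split by time rather than at random: the goal is forecasting future
# closures, so train on earlier years and evaluate on later ones.
train = df["year"] <= 2015
model = GradientBoostingClassifier().fit(X[train], y[train])
print(classification_report(y[~train], model.predict(X[~train])))
```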
[00:08:05] Alex Sarlin: Well, yeah, it's really interesting. Quick follow-up. So your working thesis here is that if you are able to predict, to look into the future and see where the budget is going, that closures are sort of imminent over the next few years,
that gives the entire community more of a chance to make sense of what's going on, what can be changed, what can't be changed. And instead of it sort of catching everybody unaware, and then often affecting certain communities much more than others, there can be much more of a sort of philosophical, political coming together over time to make sense of what it's going to mean.
Is that right?
[00:08:40] Michael Chrzan: Yeah, that's, that's exactly it. You know, school closure process is usually like, districts usually engage in the selection process of which schools they'll close when they decide that they need to. Really more than a year, two years max, and two years is probably rare for like an amount of time for the process to take.
And there's, you know, really clear evidence, research in psychology that shows one of the things that tends to lead to biased decisions is restrictions on time. If people have to make quick decisions, the bias tends to come out, right? And so when a system's trying to make this massive decision of this complex interweaving of things in just a year, It really comes down to like, okay, what are the most bare bone metrics we can use?
Which metrics do we have the easiest access to which are not always the most true like best metrics to decide school quality? Right, which you know again, Dr Pearman's work shows that you know, a lot of districts tend to use academic scores Which we know have a lot of bias in them. Enrollment numbers, which have a lot of bias from, you know, selection processes in certain areas.
It could be segregation or redlining historically, right? Historical inequalities. Some of Sean Reardon's work is around that, that, you know, we've looked at and incorporated. And so we're trying to really help them think about instead of just using traditional metrics that we know will lead. To more black, brown and poor schools closing.
How could we have more robust metrics to make this decision, including engaging the community that we serve as educators? And what they want to see is like, okay, if we have to engage in this process, if we just do not have funds to keep as many schools open for whatever reasons, how could we go about doing that?
And the first step is knowing in enough time that we have to do that so that we can like actually put the resources towards figuring that out.
[00:10:24] Alex Sarlin: That makes a lot of sense. I love the focus on using machine learning to give a wider window of prediction so that it gives everybody more of a chance to make sense of what's coming and not do it, as you say, in a rapid fire, often biased way.
Samin, let's shift to you. So you're doing some work with generative AI and teachers. We have seen an explosion of generative AI work over the last couple of years. Tell us about what you're working on and what opportunities and risks you see for generative AI.
[00:10:51] Samin Khan: Absolutely. I could talk about this all day, but I'll try to keep it brief.
There are two areas that I think are most promising regarding generative AI for empowering teachers, and one area that I'm the most concerned about. I'll start with the optimism. So, the first of the two areas I'm most interested in is curriculum implementation, particularly lesson planning.
And the second is on teacher feedback. With curriculum implementation, I think there's tons of really great curricula that have been emerging these last few years that are evidence based. However, going from high quality instructional material to implementing those as lesson plans in the classroom is a whole other beast.
And what we've seen, especially within ELA, the curriculum can be very dense. So while you have all this great curriculum, we've heard time and time again that many teachers are having to skip out on many sections of the lesson plans. And so one of the areas that I'm excited about is allowing teachers to leverage their expertise to get the most out of the curricula that they're given.
And that means providing them with tools to condense the parts that they think need to be condensed and focus more on areas that need to be focused on. I was listening in on an interview with a teacher last week, and they were sharing how many times they are hearing advice from a teacher next door about parts of the curriculum that they had a total pain with last year, and parts that students loved.
Now if we could leverage that peer to peer expertise, and Pump that into some kind of generative platform that allows teachers to then mold and shape their lesson plans accordingly. I think that would be a really great way to empower teachers to do what they do best. The second area is on teacher feedback.
I spent some time working with TeachFX last year, which does work on analyzing classroom audio to provide teachers with automated feedback, and I was working on a platform that detects student engagement. And what we've seen in talking with some teachers is that it can be really powerful to give them insights immediately after their classes to show them:
here's how much your students talk, here's how much you talk, and here are the things that you said that led to students talking more and engaging more in academic language in the classroom. That has gone a long way, because it shows the teachers immediately what's working well and what's not working well in their classrooms.
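As a rough illustration of the kind of metric behind feedback like that, here is a minimal sketch; it assumes you already have diarized, timed utterances, and it is not TeachFX's actual system.

```python
# Hypothetical sketch of classroom talk-time analytics: given diarized
# utterances, compare teacher vs. student talk and surface teacher questions
# that were immediately followed by student talk.
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str    # "teacher" or "student"
    text: str
    seconds: float  # duration of the utterance

def talk_time_report(utterances: list[Utterance]) -> dict:
    teacher = sum(u.seconds for u in utterances if u.speaker == "teacher")
    student = sum(u.seconds for u in utterances if u.speaker == "student")
    # Teacher questions immediately followed by student talk ("sparks"):
    sparks = [u.text for u, nxt in zip(utterances, utterances[1:])
              if u.speaker == "teacher" and "?" in u.text
              and nxt.speaker == "student"]
    return {"teacher_minutes": teacher / 60,
            "student_minutes": student / 60,
            "teacher_to_student_ratio": teacher / max(student, 1e-9),
            "question_sparks": sparks}
```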
So, curriculum implementation and teacher feedback are the two areas that I'm most excited about. The area that I'm most concerned about is AI tutors. I don't think there's any clear evidence that AI tutors are increasing learning outcomes. In fact, a paper that Michael shared with me not too long ago, and correct me if I'm wrong on this, Michael, had a group of students that had AI tutors and a group that didn't.
The group with AI tutors was performing better at first, but once you took the AI tutors away, they were performing worse than the students that never had them in the first place. And there's also work done here at Stanford by Rose Wang and Dora Demszky showing that tools like ChatGPT, when given to students to help them work through math problems, are, compared with having teacher support, very ineffective at helping them remediate the errors they're making when working through math problems. So I'm kind of concerned. I think there's a lot of hype around AI tutors. That being said, I think there's evidence that AI-assisted tutors are really promising. Similar work by Dora Demszky and Rose Wang has shown that when you provide tutors with real-time feedback, with suggestions that they can share with students on how to work through a math problem, there's evidence that that leads to better student outcomes than when you just have the human alone.
So, all that's to say: AI tutors, I don't think they're ready yet. I think it's a promising area, but I don't think they're ready yet. AI-assisted tutors, though, I think in the short term are really promising.
[00:14:29] Alex Sarlin: Really interesting insights. And I hear two sort of through lines in your comments. In terms of the optimism, both of your examples, I think, are great examples of places where you can find a new type of data in classroom environments which can be incredibly useful for optimizing and improving instruction. In your first use case, it's peer knowledge, it's institutional knowledge. You have many teachers often working off the same curriculum, sometimes a high-quality instructional materials curriculum, or the same curriculum in other ways. They know which parts to lean into, what parts to lean out of, what parts engage students, what parts are more effective.
And yet that data is not collected anywhere, so it can't quite be used yet. And then TeachFX, we've mentioned on the podcast a number of times how it does something really interesting in the classroom, which is turning talk, which is, you know, technically data, but we don't usually think of it that way, into analyzable data to answer exactly the kind of questions you said. You know, was there something that you said as an educator that encouraged other students to react and start really getting involved? Or, you know, did you talk too much as an educator? That's what they always talk about.
And then in terms of the concern, 100 percent. We've definitely been following the Stanford work about Tutor CoPilot, and the paper that came out about support for human tutoring, as a really interesting sort of variant of what we call AI-powered tutoring. So instead of working with a chatbot tutor, which is one way of doing AI tutoring, could AI actually improve the relationship and improve the effectiveness of human tutoring? I think it's a really rich space, so I'm really excited to hear that you're focusing on it in such a broad way. I want to move to you, Matías. You are doing something that has been a dream of mine. People who listen to this podcast have heard me mention in passing how exciting it could be if we could simulate students, because simulating students allows a huge amount of different experimental protocols to be run. You can test effectiveness, you can test different interventions, you can do all sorts of things. You could practice your teaching. You're probably thinking about it in a very different way than I am.
But I know you're thinking about simulations of student behavior with AI. Tell us about what you are studying.
[00:16:37] Matias Hoyl: Happily. And maybe I'll start with the disclaimer that this is all early stage and there's a lot of experimentation to be done. And as you said, it's an exciting field, but there are mixed results in this regard.
But the origin story is that back in Chile, I had the luck to be a math teacher, and what I struggled with the most while being a teacher was creating questions, right? You have to write a test for your students, and you're sitting at your computer typing out questions that seem right, but you're not sure if those questions will be adequate for the level of your students. And you're an expert in the topic, so they look good to you, but maybe they're too difficult, or too easy.
So what I thought, now that I'm at Stanford, is: wouldn't it be cool to have this army of synthetic students that could test out those items for you? Give a question to a hundred of them, see how that plays out, and get back not just the difficulty of the question, but also feedback around: is this written correctly? Are the choices good enough? Is there too much room for misinterpretation in the choices? And a bunch of the different things that you outlined at the beginning that would be very nice to have beforehand in your classes, right?
And I think the big challenge here is that LLMs are too nice, right? So if you prompt GPT, "Write a question for a student that is struggling in this particular topic in math; you have to answer that specific question," it's probably going to get it right, because LLMs want to have things right. They want to serve you. They want to agree with you. So it's very difficult to prompt them in the correct way so that they actually embody the students and stop being nice and just try to be that struggling student.
Right? So another example in this regard is something I've been working on in the past few weeks with a group of people, where we're trying to simulate breakout rooms. And we've all been in breakout rooms, right? Where people are disengaged with their cameras off. If you have a task for them, probably two people are going to go back and forth and the other people are not going to say anything.
And when we tried to simulate that with LLMs, they were all so collaborative, and they wanted to have the task done. And it wasn't possible to have them be rude, you know? So people call this sycophancy, this tendency of LLMs to be overly agreeable, and it's a difficult thing to get right in education.
So what I'm trying right now is to blend the flexibility of LLMs with this learning science of evaluation called psychometrics, where there's a lot of math and numbers and statistics to actually try to get right what the level of ability of a student is and what the level of difficulty of a question is.
And what I've seen in some preliminary results is that that signal is strong enough to say to the LLM: okay, this is the particular student, or this is the specific ability I have in, I don't know, sums of fractions or whatever topic you're working with. So there's, as I said before, a lot more experimentation to be done, but there's a tendency of the LLMs to actually get questions wrong when it would make sense to get them wrong, so that's promising.
And the thing is that, as you said, if this works, you could build a platform where teachers could just input their questions and then get back comprehensive feedback as to how the question is going to play out for students at different levels.
Because with these synthetic students, I have students with high ability, low ability, and medium ability, so you could see in advance how that would work, and maybe you could extrapolate this to other things, like lesson plans and other things that teachers work on day by day.
[00:20:07] Alex Sarlin: Exactly. So, one quick follow-up question for you. I'm hearing something really super interesting in your work with simulated students, with synthetic students. One is, it sounds like you have had some success in being able to level-set their abilities, right? Their knowledge and skills: you know how to do long division, and you don't; you understand this scientific concept, and you don't.
I'm putting words in your mouth, and I want to ask if that's actually accurate, but you at least have some way to do high ability versus low ability in different kinds of domains. So the ability spectrum is really interesting. And then there's this concept of affect, right?
You say, oh, when we put synthetic students in a breakout room, they're all very collaborative and very happy, and nobody's disengaged or uninterested or bored or mean or anything. Do you see those two paths, the sort of knowledge, skills, and abilities of a synthetic student, and then the affective piece of how a synthetic student acts, as related, or are these two separate things to figure out when it comes to simulating student behavior?
[00:21:09] Matias Hoyl: There's one thing that I think both scenarios share, and it's that there's a lot of prompting to be done. And prompting is just like black magic. There's not a science to it; I've just been trying a bunch of different prompts. For example, for the breakout room scenario, at one point I had to say, remember, you have free will, you can disengage from this, trying to actually force them to think in that way.
So I think nobody has figured out what the good prompting techniques are. You have to experiment a lot. And there's a bunch of good practices for how to experiment with this; it's a not-so-sciencey thing. So that's the thing that they share. But the thing that's different with the synthetic students is that I actually have numbers, right?
You have this ability number. You have this number for the question difficulty. What I've done is transform the number into a rubric for the LLM. The number goes between negative three and three: negative three is a struggling student, three is a high-achieving student.
You can have a rubric: from negative three to negative 2.5, these are the characteristics of the student for this specific topic. So eventually everything gets transformed into words, but it's a stronger signal than just saying, this is a good student, or he has some experience with different questions from this topic.
And I think that's the thing that is making a small difference. And I emphasize small, because I've seen some positive differences in the experiments, but yeah, there's still more to be done.
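A minimal sketch of the ability-to-rubric idea Matías describes might look like the following; the band descriptions and prompt wording are invented for illustration, not his actual rubric.

```python
# Hypothetical sketch: map a psychometric ability estimate (theta, roughly
# -3..3 as in item response theory) to a verbal rubric, then use the rubric
# to condition a synthetic-student prompt for an LLM.
def ability_rubric(theta: float) -> str:
    bands = [  # (lower cutoff, description), in ascending order
        (-3.0, "makes frequent conceptual errors and often guesses"),
        (-1.5, "grasps the basics but makes systematic procedural mistakes"),
        (0.0,  "is usually correct on routine items but struggles with transfer"),
        (1.5,  "answers most items correctly, with occasional careless slips"),
    ]
    desc = bands[0][1]
    for cutoff, text in bands:
        if theta >= cutoff:  # keep the highest band the student clears
            desc = text
    return desc

def synthetic_student_prompt(theta: float, topic: str, question: str) -> str:
    return (f"You are a student who, in {topic}, {ability_rubric(theta)}. "
            "Answer exactly as this student would, including any mistakes "
            "this student would plausibly make. Do not correct yourself.\n\n"
            f"Question: {question}")

print(synthetic_student_prompt(-2.1, "sums of fractions", "What is 1/2 + 1/3?"))
```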
[00:22:36] Alex Sarlin: That's fantastic. One thing I've always wondered about with the concept of synthetic students is, as you're saying, you can prompt them to act differently than they would otherwise.
I remember we interviewed Kristen DiCerbo from Khan Academy, and she was like, the OpenAI people told us to yell at it to tell it not to give the answer. So we do all caps and say, DO NOT give the answer to the students. Right? But it was, like you said, black magic. You don't know what's going to work or not.
But is there the potential to literally train a synthetic student on different amounts of the standards? Like, you've never heard of the quadratic formula. You literally don't know what it is. You have nothing in your database about it. Now we're going to try to teach it to you and see if it works.
I'm not asking you that yet, even though it's interesting to me, but that's what I'm always so curious about: could we get to that level? So, one thing that's interesting about all three of you: you're at Stanford, meaning you are dead center in the middle of product land, Silicon Valley commercialization, and you're all doing really interesting, cutting-edge work with AI for education, which is a very productizable, entrepreneurial world.
There are lots of startups in this now. So I just want to ask you each in turn. All of the things you're working on are related to commercial applications. Michael, obviously, the idea of being able to predict the future of closures, or the future of any kind of educational resource allotment, is something people would be very interested in and probably would pay for.
I'm curious how all of you, and I'll start with you, Michael, think about balancing your graduate work, which is obviously the pursuit of knowledge, with your entrepreneurial ambitions.
[00:24:09] Michael Chrzan: Yeah, I'm probably the worst one to start with, but I'll make it quick. I have very few entrepreneurial ambitions, so to speak, whereas Matías and Samin have both led and do a lot of entrepreneurial work.
So, yeah, I mean, I'm more focused on the tool being implemented quicker than productizing it would necessarily allow, right? If the results, once I finalize the model next quarter, come out how I expect them to, it almost feels unethical to me to be like, oh, I know these 10 districts are going to close 10 percent of their schools in the next five years, but I'll wait until I can build the product to tell them.
And so for me, it's much more about letting these districts know as soon as I can. And then, in terms of anything entrepreneurial, the real work isn't knowing that they will close the schools. The real work is the process of closing the schools, right? And so that's more where I feel, if there is any entrepreneurial piece, or really even from a nonprofit sense, support for the districts, it's not necessarily in making the prediction.
It's more so in, like, okay, we know you're going to have to do this sometime in the next five years; how do we go about doing that? Right? And so Professor Pearman and I work really, well, I don't know about really, but closely, with San Francisco on their recent closure process. And it wasn't an ideal process by any means, but it was still sort of revolutionary in the way they were able to engage the community and the stakeholder feedback, and in how that informed the process, really almost up to the end.
And so, you know, that process of: what metrics do we even use to make this decision? Who decided on those metrics? Was it board members, who were elected or maybe not, or is it actually the community affected by the choices? I think that's really where the rubber meets the road, so to speak: how do we actually help them engage in the process of closures, and what learnings have we had from the districts we've worked with already, while I've been building this model, about how to do this well?
And so, yeah, balancing that has been quite a bit of work. This past summer, I essentially paired my research with a whole other internship, working with SFUSD, while also working with Nuzella. So, yeah, it's definitely a lot of work. I think part of what's really motivating for me, in the context of the research piece of it and whatever entrepreneurial work comes from that, is knowing, having been a teacher, exactly what this means for the families and the teachers and the people who would be affected by those choices.
[00:26:36] Alex Sarlin: And schools have closed as well.
[00:26:38] Michael Chrzan: Exactly. And so that to me is like, it's a ridiculous amount of work. Sometimes it's late-night calls of, hey, we need to adjust this one thing about these algorithms, but it is worth it to know that this avenue of using machine learning in education could have really powerful, positive impacts, at a time when we're hearing a lot of mixed results, and real worries about biases, mostly around the gen AI that requires prompting.
[00:27:04] Alex Sarlin: It's a great point that the sort of predictive knowing, that something is likely to happen based on a machine learning prediction, is different than knowing what to do now that you know it's going to happen.
And that could be highly consultative, much deeper. It could be potentially much more complicated, almost like a crisis management situation for a district. It's a really interesting point. And yeah, it makes sense that there's a nonprofit version of that that may be more in line, more ethical, than a for-profit where you sell districts the right to know their future.
But it's a really interesting technology. Let me shift to you, Samin. I mean, you're talking about teacher feedback, working with TeachFX, about lesson planning, about high-quality instructional materials. All of these are entrepreneurial ventures; there are a lot of nonprofits, some for-profits.
How do you balance your research with what you want to do?
[00:27:49] Samin Khan: Yeah, to me, it goes very hand in hand. I was a startup founder for three years, between 2018 and 2021. I was working on a company called Autumn, where we were predicting burnout by analyzing Slack messages with language models, and at that time, mental health prediction via language models was incredibly new, and it wasn't clear if it was going to work.
And I would say, being in a space where I'm working on a problem where I'm actually not sure if it's going to work, but I think it would be incredible if it were, that's a space that I love being in, and I get to scratch that itch as a researcher. So across working at TeachFX on detecting student engagement from classroom audio, to working at Kiddom currently, developing AI tools for curriculum implementation, I get to scratch that itch as an AI researcher. And a lot of this was working with the AI researchers at Autumn, my startup; we hired a couple of postdocs and PhDs that were focused on this problem that wasn't yet solved. And I was constantly enamored and in awe of how they would bring stuff from the lab into the real world.
And so, yeah, in the spirit of that, that's what I'm hoping to do now at Stanford.
[00:28:59] Alex Sarlin: That's very interesting. And how about you, Matías? Simulations of student behavior could be something that is very useful for understanding lots of things: effectiveness of interventions, diverse needs and neurodivergence, I mean, everything. But it also could be an amazing business. How do you think about it?
[00:29:16] Matias Hoyl: Yeah, I agree with that. If I'm honest, I have to say that I'm not that good as a researcher. I'm more of a product person. Definitely Samin and Michael are better researchers than I am. Of the three, I'm the only one that is not currently researching with any advisors or labs at Stanford.
Usually, when I'm curious about something, I want to build a UI. I want to see something, on a cell phone or a webpage. I'm not sure if that's the correct approach; most people would disagree with it. When I go into a new course at Stanford that has a project in it, I usually start working on some UI where I can see things.
And that's my approach. Before coming to Stanford, I was a startup founder. I did a math adaptive learning startup back in Chile, and actually that data set is what's enabling the synthetic students research right now. So yeah, I'm a product person. I like to build things. I'm not as good a seller, so most things don't sell, but they end up built and they end up used by people. I love doing that.
[00:30:09] Alex Sarlin: That's fantastic. So, lots of different approaches to combining the research and the application. And I think what's so interesting about this moment is that the distance between research and application is potentially shorter than it's ever been, because it's so quick to develop with all the new tools, no-code tools, AI tools.
You know, Michael, you're mentioning, we're doing this predictive model and we're already using it in San Francisco; they're happening in parallel. Samin, you're figuring out lesson planning and using it at Kiddom, working with high-quality instructional materials, which is one part of their core business.
It's neat to see the research and the application working much closer, side by side, and not "the research goes in a journal, and if you're lucky, some product manager opens the journal and reads it at some point." That's never worked. You mentioned data sets, Matías, so let's talk a little bit about the data set situation, because AI particularly is, I've heard it called, a data hog, right?
Data is the whole thing. Data is sort of everything when it comes to generative AI, especially training generative AI models or fine-tuning them. So we're all limited by the available data sets, either ones that already exist, or ones that we can collect in a proprietary way or find in various ways.
You mentioned, Matías, having this data set from your previous life that you could use for this. When you think about data, from your perspectives as researchers, where do you find it? What data is conspicuously missing? Are there biases in the data sets that you have to control for or try to work around?
How do you think about data, and how would you suggest listeners think about finding and using data in their work?
[00:31:39] Matias Hoyl: Usually in research, you have a research question and then you go find the data. That's the standard procedure, right? But what I've learned in my short time at Stanford is that it can go both ways.
Sometimes you have good data for some specific reason or some specific circumstance, and then you try to figure out: what can I mine with this data? What insights can I build with it? And what I've learned is that there are big data sets that most people use, but they're being overused, so most insights that you could extract from them probably have been extracted already.
And the cool research that's being done right now, and that has success, is usually by people that go a long way to collect the data. So you have, probably, transcripts that were saved somewhere that you didn't use, because you didn't have the NLP techniques that now you can use, or you have the budget to go and annotate data, or go and interview people.
So that's one way. And the thing is that when you use LLMs as data, as I'm somewhat doing right now, it's tricky, because as everyone knows, and as probably a bunch of people on this podcast have talked about, these LLMs mirror the biases that we humans have, that the internet itself has, right? So you have to be cautious about what your results are, and how generalizable those results are.
For example, in my case, I have data from Chile. Is that generalizable to the US case? I don't know, mostly because, on top of that, I have an LLM going through that data, and those LLMs have biases. So as a researcher, you have to be very cautious about what you can say with your data. And most times, the people that have success are the people that have the budget to collect the data in very specific ways.
[00:33:18] Alex Sarlin: Lots of interesting insights there. I especially like the idea of data that people may have been collecting but that couldn't be unlocked without NLP or other techniques; that's an interesting way to generate a data set. Michael, you mentioned having, I think you said, a 15-year data set from your advisor about school closures, which seems to make perfect training data for a predictive model. Tell us about it. It sounds like the data must have come first. How do you match your questions to the data? And what should you do as a researcher or an entrepreneur who doesn't yet have the data? Do you have to make it? Do you have to find it? Should you not even bother, and only start with questions where the data exists already?
How would you approach it?
[00:33:54] Michael Chrzan: I think it depends on how much work you want to put in, right? To Matías's point, you know, the places that find success are the ones that are able to get really good data. Getting data is a lot of work. It is hard. It is expensive. You have to know which questions to ask, right? Particular modeling strategies depend on you having every covariate that matters, for example, with, you know, linear regression. And so these questions about what data you have, what data is missing, who's in the data, who is not, those are very tricky. Overcoming each of those barriers is definitely possible, and there are best practices; it's more or less about how much work you want to put in.
The data set I got from my advisor: one of the things we found is that he had done work on combining years from the NCES data set. He already had this data set before I got here, and this is all publicly available data. So that's another one of the things I wanted to mention: there's a lot of data out there that we just haven't been able to examine, especially in education.
There's a lot of data we collect. That's part of what drew me to the Education Data Science program. I was really aware, as a mathematician, of exactly how much data revolved around my students' lives, how much data was collected, how much data was publicly available. And I saw so few spaces, if any, where that data was being used meaningfully to draw insights. Not to be data-driven (the data shouldn't drive anything, because the data is not that reliable), but it should inform us. It is good enough to give us some idea of what's happening and what we might be able to do about it.
And so, yeah, my advisor already had the NCES data, I've cleaned it up a little bit more, and now we're bringing in the American Community Survey (ACS) data from the Census Bureau. They have this really great data set called the EDGE data set, and sorry, I do not remember what the acronym stands for, but that data set has ACS estimates based on school district geographies. So I'm able to estimate: okay, this district has these average housing prices, this number of people of different ages and ethnicities, and all these things for the community that the school is nestled in. Because there's just such a body of research, in education, sociology, all these different places, that schools are not separate in any way, shape, or form from the communities they serve, right?
And so there is data that exists out there, and I think, you know, Sean Reardon's done a really great job of this with his Educational Opportunity Project data set that they've been developing here at Stanford, bringing it all together to actually be able to draw insights. That's one of the really big hurdles for education. I know California is trying to do some really good work with that; they're trying to build their Cradle-to-Career data set. I know other states, I think Michigan, are building similar data sets as well. And so the forefront of that is putting the data together.
And then doing the exploratory analytics that we as data scientists know we should be doing on our data set. How balanced is it? Who is missing? How does it compare to what we know? You know, are these data sets that we want to join telling different stories? What does that mean about combining them? And really interrogating those things.
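For a concrete sense of the kind of join and checks Michael is describing, here is a minimal sketch. The file and column names are illustrative (LEAID is the NCES district identifier), and this is not his actual code.

```python
# Hypothetical sketch: merge NCES district records with ACS community
# estimates keyed to district geographies (as in the EDGE files), then run
# basic balance checks before any modeling.
import pandas as pd

nces = pd.read_csv("nces_districts.csv")      # LEAID, year, enrollment, ...
edge = pd.read_csv("acs_edge_estimates.csv")  # LEAID, year, median_income, ...

# Expect one row per district-year on both sides; validate catches duplicates.
panel = nces.merge(edge, on=["LEAID", "year"], how="left", validate="1:1")

# Interrogate the joined data: who is missing, and how balanced is it?
print(panel.isna().mean().sort_values(ascending=False).head(10))      # missingness
print(panel.groupby("urbanicity")["closed_10pct_within_5yr"].mean())  # base rates
```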
And sort of the unfortunate reality, especially when it comes to generative AI: there is no such thing as an unbiased data set. That's one of the things I've just had to come to accept in the last couple of weeks, in one of my courses on NLP, where we're going to be trying to build LLMs that serve purposes for us.
The question, and I'd be happy to hear if Samin and Matías think differently about this, is not how do we get rid of the bias in the data set, because that is folly; we won't be able to do that. The question is: what do we do with the bias that is there, and could we get a bias that is actually useful to us, maybe, even, in some way, right? So for example, if we're building LLMs to help judges come up with sentencing, and that LLM could have a bias towards mercy, that would be fantastic, as opposed to, you know, the racial biases that are prevalent in a lot of judicial data, right? In education, if we have LLMs that are trying to help teachers grade, which, please God, I hope no one's doing right now with the biases we know exist, you know, I'm so sad about it. As a teacher, working with a student, you just have more data than an LLM would ever have about that student, what you've worked on with them, what work they've done, right? So you would know: oh, okay, well, Michael tried this problem, and I see what he was trying to do, so partial credit here is acceptable, versus a one or a zero, got it right or got it wrong, right?
Those decisions are things that we, as the people using the data and using the technology, still have to be able to make. So, you know, there's a lot of conversation around keeping humans in the loop when AI is being introduced into these different avenues. I think that is insufficient. The humans should still be in command of the loop.
It's not even just being a part of the process. We have to be the ones still making the judgment, letting the LLMs, and, you know, any data, really, inform us, but they should not just be a separate individual cog in the loop, especially in instances like education, where it's so crucial that the decisions we make are made well, because they affect people's futures and communities' futures, not even just our students'.
[00:39:08] Alex Sarlin: Yeah, two super quick thoughts. We talked to Gautam Thapar, who is an entrepreneur building a company called Enlighten.ai, which is a sort of grading assistant. But the whole premise of it is that, as you say, it's not only a human in the loop. Not only does the teacher steer the feedback and get to decide what happens or not, but the model is actually trained on each individual teacher.
So it watches teachers grade over a certain amount of time and trains the system to try to emulate that teacher. And then, of course, you never deliver a grade without it going through a teacher. So it's an interesting combination of the human running the loop, and then maybe over time the human becomes in the loop, but not having to do as much of the work.
It's all a very tricky thing. On the racial bias, I think an interesting question for you is just: you look at your 15 years of data, and as we know, school closures are disproportionately impacting marginalized communities and Black and Brown students and teachers. And the question is, would a predictive algorithm replicate that bias, if it could? Because it would say, oh, this is what traditionally happens, so this is what will happen in the future. And I'm sure there's a name for it, but that AI bias, saying that what happened in the past should inform the future, is, I think, an interesting philosophical thing to wrestle with at this moment.
To your point, Matías, about LLMs being biased and sort of replicating the human biases on the internet: human biases in punishment, all these things that we've known to be true. How do we change those biases in the other direction, as you say, toward mercy or toward equality? It's a really interesting question.
So, Samin, how do you think about data? You've mentioned TeachFX, Kiddom, learning data that's in teachers' heads about what works and what doesn't work, learning data that's in the air, you know, in a classroom. These are new types of data sets. How do you think about data? Is it a combination of new data that's coming out of classrooms, existing historical data, and data that can be unlocked by new methods?
[00:41:08] Samin Khan: Yes. Put simply, yes. But also, building off of Michael's point, I like what Michael was saying around making use of the bias that can be in the data towards our purposes. It reminds me of a project that I worked on a number of years ago with a company on building an automated grading tool.
Part of what I was interested in: we had race and gender and other information related to the students and the teachers, and one of the things that I was curious about was, knowing that AI models can perpetuate bias, humans also can, and always do. And so part of my question was: we already introduced an AI tool here; how much bias does it create? And in order to get that, we need a baseline. And lo and behold, with the study with that company, it turned out that there was bias in their grading process, and it's not surprising, because part of what teachers can view is the name of the student and the picture of the student.
And we've seen in past studies, when you provide an evaluator with an essay that has a name and an essay without a name, the racial identities associated with that name do influence the evaluator to be biased in their assessment. Whereas with an AI, you can choose what inputs to put into it, and so we can explicitly not put in the name.
That being said, it can still glean, you know, social identities from the way that the writing is written, the dialect and whatnot. But part of what we actually saw in that study was that AI did not increase bias along some metrics, and in some areas it actually did better than what the humans were doing.
So I think there's a case to be made for areas where it can actually reduce bias, but getting that baseline is tricky. A lot of educational organizations don't want to do that work to evaluate how much bias is already in their process, because they don't want to expose more pain points.
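A minimal sketch of the baseline-versus-AI comparison Samin describes might look like the following; the column names are illustrative, not the actual study's code.

```python
# Hypothetical sketch: compare score gaps across student groups for
# human-assigned vs. model-assigned grades on the same essays.
import pandas as pd

df = pd.read_csv("essay_scores.csv")  # columns: student_group, human_score, ai_score

gaps = df.groupby("student_group")[["human_score", "ai_score"]].mean()
print(gaps)

# The human baseline matters: if human score gaps between groups exceed the
# AI's, the tool may reduce bias even though it is not bias-free itself.
print("human gap:", gaps["human_score"].max() - gaps["human_score"].min())
print("ai gap:   ", gaps["ai_score"].max() - gaps["ai_score"].min())
```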
To go to the question of the data sets: I'm really excited about work that Dora Demszky is putting out through her lab, the Education NLP Lab. She released a data set of math classroom transcripts, the NCTE data set, that has 5,000 hours of classroom audio and corresponding transcripts, which I'm really excited about. And when we think about generative AI applications in any field, I think about advice I got from some Stanford faculty that are working on interdisciplinary AI projects through the Human-Centered AI Institute.
And part of what they said is, if you're working as an AI practitioner in a multidisciplinary space like education, you have to embed yourself in the space. So if you're developing an AI tool, visit a classroom; sit and watch the teacher teach. And if you can't do that, do the next best thing: watch a recording and try to immerse yourself in the perspective of the stakeholders that you're building your tools for.
So I think data sets like the NCTE data set are a great way to immerse yourself in that environment if you're working on AI in the classroom space.
[00:43:42] Alex Sarlin: Is that data set available yet?
[00:43:45] Samin Khan: Yes, it's public. I think you have to request it; I don't think it's just unfettered access, but you can request it, and it's available for research.
[00:43:52] Alex Sarlin: Fantastic. We'll follow up and put some resources in the show notes for this episode about some of the data sets and research that you're mentioning here, if people want to follow up on them. I think data sets like that are incredibly powerful right now, for many different people, in many different ways, for research, of course. And it's an interesting moment for data, because suddenly you have these models that can do so much with it.
We suddenly realize there is some really good publicly available data, there is some data that's outdated, and there's some data we have to transfer across contexts, as you've mentioned. The data story is, I think, something that everybody is grappling with. You know, we interviewed the CEO of Seesaw, and he's like, we have recorded our students' portfolios of their learning artifacts for years and years and years.
We have millions of them, but we would never use them to train an AI or anything like that, because that's not how we collected them; that's not what we collected them for. We wouldn't have the rights to. We would never betray our users' privacy for that. And it's just an interesting moment where some of these proprietary data sets are locked for a variety of different reasons.
So, you've all mentioned equity in different ways, some more openly than others. But I think one thing to think about here, and I'd love to hear each of you on this, is that as we're hitting this AI and education moment, there's been a very polarized reaction. In my experience, some people are incredibly optimistic, and I count myself there, saying: finally, maybe you could break down some of the traditional problems in education. Maybe that kind of, I would call it strategic blindness, which you just mentioned, Samin, not knowing things about the students, would make it a more fair and meritocratic process in ways that we could never achieve in regular life.
Maybe we can finally overcome some of the traditional issues. And then others are saying, well, no, because generative AI is based on human biases. It's based on our past. It's based on what we know and what we think we know. And it may just create even more of a gap, even more of the same issues, and maybe make people even more isolated, working only with AI and not each other. Lots of different reactions. So I'd love to hear each of you talk about your own level of optimism or pessimism, and I know that's a blunt mechanism, but a little bit specifically with regard to this: do you think AI will be able to improve equity in education?
Are you bullish on that or bearish on that? Let me circle around and start with you, Matías.
[00:46:11] Matias Hoyl: depends on the day. I would say sometimes I wake up and say, Hey, AI is going to change the world. And then the next day I just wake up and I say, AI tutors are the worst. What I would say is that the difficult thing is that if you test all these awesome tools that are around like Magic School or the new update of Duolingo or Canmigo, if you use it, they're awesome.
So it's a good tool. It adds value. I think they're well built. Really smart people are behind them. People that know about education, people that know about products. So the products themselves, I think it's not the problem. I would say the problem is it's twofold. The first one and most obvious is connectivity.
And maybe I'm not sure about the statistics in the U S but in other countries, not all students are connected to the internet and financing infrastructure to get them connected. It's. It's difficult in my country. It is a long country. There's a lot of rural communities that are not connected to the internet.
So for them, this discussion makes no sense, right? So that's the first and obvious one. But the second one is that if you see the statistics of people that actually use these products as they intend to be used. They're mostly, uh, uh, students that come from higher income families, or that have this, the cultural backgrounds that are needed to use these tools.
There's this study, I think that the people from Khan Academy themselves, they have kind of a lot of backlash around Khan Mu, right, but themselves A few years ago, they did a study that said exactly that. So only, I don't know, like 10 percent of the students that were using Duolingo, they were using it the way that it should be used, right?
30 minutes a day. And the other 90 percent weren't using it at all, probably. And when they look specifically to the demographics of those students, the students that are high achievers or they have the best grades are the ones that are using it the most. And the students that come from more difficult contexts weren't using them.
So, to your question: I do see a world where the gap gets wider, simply because the students who have the means, who are connected to the internet, and who have families that support them are the ones actually getting the value out of these tools, which are good tools, as I said before. The students who are struggling, who are less motivated, whose contexts make it harder to use these tools, probably aren't getting all that value. That's the scenario I'm worried about.
How about you, Michael?
[00:48:34] Michael Chrzan: Yeah, I think Matías brought up a great point. He spoke to the context he knows best, in Chile, but the digital divide is a reality in America too, and a potentially troubling one. It's a well-documented phenomenon in the U.S.: exactly which students have high-speed internet and which students have devices tends to correspond with issues we already had around race and income. So those concerns should be just as prevalent here in the U.S. For all the reasons we've talked about, including things like that, I think I'm more bearish on AI. The things we want AI to be able to do are things we already have the power to do in education; we just don't have the political will.
And I don't think having technology that makes it faster will necessarily increase that political will. But to the example you and Samin talked about a second ago, Alex: I think AI is a really great bias detector. It's fantastic at that. So if we wanted to use it to make ourselves and our institutions better, that would be a fantastic use.
And I think I sort of started in the trough of disillusionment of the AI hype cycle when it first came out. I remember the weekend ChatGPT first launched, the chatbot interface on top of GPT-3.5. I was sitting there working and thought, oh my God, this thing can put out something that looks like a lesson plan. I was a teacher, so I was just playing around: what could it do? And it produced nothing I would ever use. I'm sure it's gotten better since then, but it was just interesting to me that it could do it at all. So projects like Samin's, Matías's, and mine, and the work other people in our cohort are doing, are showing me the places where AI could be really helpful. And very few of those require prompting. That, I think, is what's different from the current edtech trends in AI and machine learning: the real power of this will not come from a really great prompting machine.
[00:50:41] Alex Sarlin: Really interesting thoughts. I love that metaphor: if we're not aiming in the right direction, if we don't have the political will to do the right thing, then doing what we're already doing faster doesn't make it better. That one's going to stick with me. I think it's up to all of us as an edtech community, and as an educator community, to aim. I call it aiming the cannon, right? We have to figure out where to push: what problems are we trying to solve with AI, and how do we not just put it in the service of the same system we've used forever, a system we know is not effective, certainly not for everybody? Samin, how about you? How do you think about equity? Are you bullish or bearish on AI?
[00:51:18] Samin Khan: I think I'm generally optimistic, because I feel like the conversation has gotten less binary. A little over a year ago, most of the conversations I was having with AI people, education people, and everyone in between were often either this is going to change the world or this is going to ruin the world. I definitely fell victim to that too. I was on the other side from where Michael was: I saw Sal Khan's Khanmigo talk and thought, oh my God, AI tutors are going to change the world. And then I saw the AI tutor papers and thought, yeah, maybe not. We balance each other out; it's good to have people like Michael.
But yeah, I would say the conversation has gotten less binary and more precise. People are diving into specific use cases, where they feel AI is good and where it's not, like how we started this conversation: we talked about areas we're optimistic about, for specific reasons and based on evidence, and then areas where we're not.
Part of where I've seen that is in the roundtable discussions I've been doing with college presidents, and I've been absolutely amazed by how forward-thinking some of them are. For example, Sunem Beaton-Garcia, the president of Chippewa Valley Technical College in Wisconsin, who you should have on the show if you haven't already. She's amazing. They were applying different uses of generative AI, video, image, and text, to help their nursing faculty develop new learning resources. At community colleges, the demand for new learning resources is more constant and on-demand; a lot of what they do is workforce development and shorter courses. So they were able to take a heavily strapped faculty and develop new resources that were also diverse in representation. There are probably videos of this online somewhere, but she talks about using these tools, and it probably took a lot of prompting, because we know it can be very difficult to get DALL-E to render people of color with accurate complexions. But you're able to get more diverse representation of people through these generative means, which is its own ethical hot topic: using generative AI to produce images of people of color. That being said, I think there are precise applications that can promote equity and diversity, and I think the conversation is shifting there. So I'm generally optimistic now. I mean, I always was; now I'm more critically optimistic.
[00:53:25] Alex Sarlin: I like that point, that the polarization is giving way to nuance and people are starting to take more delicate positions: it'll be useful for this, but not for this yet. Or maybe it's further along in supporting human tutoring than it is in one-on-one chatbot tutoring. I agree, and I think that's been huge progress for the field. There was a time when nobody would talk about anything but cheating with AI; then there was this polarization; and now I feel like we're getting nuanced. And I think the work you're all doing is pushing it even further, toward some really interesting use cases: how AI could support HQIM implementation in the classroom; the synthetic students, and I can't wait for the moment when we can maybe crack that one; and predicting the future of schooling, or in your particular case, Michael, school closures. This is really great work. If people want to find your work, or find you all online, are you open to hearing from some of the listeners of this podcast?
[00:54:21] Michael Chrzan: Definitely, yeah, absolutely. Happy to. Just find me on LinkedIn.
[00:54:22] Alex Sarlin: Terrific. That's really nice; you're doing really interesting work.
We'll put your contact information in the show notes for the episode. This is Michael Chrzan, Matías Hoyl, and Samin Khan. They're all graduate students at the Stanford GSE, the Graduate School of Education, doing really interesting, cutting-edge work on AI and education. Thanks for being here with us on EdTech Insiders.
[00:54:45] Samin Khan: Thanks for having us.
[00:54:46] Michael Chrzan: Thanks so much, Alex.
[00:54:47] Matías Hoyl: Thank you, Alex.
[00:54:47] Alex Sarlin: Thanks for listening to this episode of EdTech Insiders. If you liked the podcast, remember to rate it and share it with others in the EdTech community. For those who want even more EdTech Insiders, subscribe to the free EdTech Insiders newsletter on Substack.