3 Ways to Use AI for Listening Activities

by Brent Warner

I’m really excited to think about the potential for how AI can help students with their listening, but I’ll be the first to admit: AI for serious listening practice is not yet quite where it needs to be to integrate fully into the language learning classroom. The tools are still not always consistent or precise enough to trust without verification, and we can’t fully rely on AI properly understanding our expectations as we get into things like tonality, accents, and all of the tricky bits that make studying another language both fun and frustrating.

Still, there’s a lot to play with, and there’s no reason not to explore, as long as we’re clear with our students that at this point generative AI continues to make mistakes, so they should confirm anything they’re not sure about with their teacher, and not with their computer. 

Tutoring With ChatGPT Voice

ChatGPT activated its voice feature a while back and it does a powerful job of sounding like a fluent English speaker, though it’s not always skilled at making the adjustments we know students need to understand spoken English. Still, with a huge corpus of data, strong pronunciation skills, and the ability to be used anywhere, students can use some hacks to help them practice. Here’s one way to try it out:

Using the ChatGPT app on the student’s phone or tablet, have them paste in the following prompt:

I am a beginning English language learner. I want to build my listening skills. Can you help me build an understanding of the difference between the “R” and “L” sounds in English, pausing regularly to test my understanding as you go? It is important that you only ask me one question at a time. Please let me know if you understand, and wait for me to say “Let’s Begin” before you start. Remember: this is a LISTENING activity, not a pronunciation activity, so please quiz me by checking whether I can hear the difference between the sounds, or with multiple-choice quizzes. Do not make me pronounce the sounds at this point. Please speak slowly so I can follow along.

This prompt sets up the activity as intended, though the output will be stronger on the paid ChatGPT 4.0 than on the free version 3.5. Remind students to be flexible in their expectations.

Once the prompt is in, students can tap on the headphones icon on their screen, which will switch ChatGPT from text mode to voice mode. From here, it’s a simple matter of listening along and trying to respond to the questions they are being asked.

In my own tests, this worked quite well with ChatGPT 4.0 and passably, though with some issues, with 3.5. Remind students that when they are done using the Voice Chat feature, they can close it down and ChatGPT will have a full transcript of the conversation they had. ChatGPT 3.5 tended to tell me I was right even when I was wrong, so if students are working with the free version, you may want to encourage them to go back and read the transcript when they are finished to check their answers.

Of course, students can change any part of the prompt above, perhaps focusing on tricky vowel sounds instead of the R and L consonants, or, as things get more advanced, you can help them change the prompt to build inference skills rather than have them focus on distinguishing certain sounds.
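
If you're comfortable with a little scripting, swapping parts of the prompt can be sketched as a fill-in-the-blank template. This is only a minimal sketch: the function name and its two parameters are hypothetical, and it simply builds text to paste into ChatGPT rather than connecting to any service.

```python
def build_listening_prompt(level: str, focus: str) -> str:
    """Fill in the listening-practice prompt for a learner level and focus.

    Hypothetical helper: it only builds text to paste into ChatGPT.
    """
    return (
        f"I am a {level} English language learner. "
        "I want to build my listening skills. "
        f"Can you help me build an understanding of {focus}, "
        "pausing regularly to test my understanding as you go? "
        "It is important that you only ask me one question at a time. "
        "Please let me know if you understand, and wait for me to say "
        '"Let\'s Begin" before you start. '
        "Remember: this is a LISTENING activity, not a pronunciation activity, "
        "so please quiz me by checking whether I can hear the difference "
        "between the sounds, or with multiple-choice quizzes. "
        "Do not make me pronounce the sounds at this point. "
        "Please speak slowly so I can follow along."
    )


# Example: swap the R/L focus for a vowel contrast.
vowel_prompt = build_listening_prompt(
    level="beginning",
    focus='the difference between the vowel sounds in "ship" and "sheep"',
)
print(vowel_prompt)
```

Keeping the fixed instructions (one question at a time, listening only, speak slowly) in the template means students change only the level and the focus, so the parts that keep the activity on track stay intact.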

Remember that this type of AI is getting stronger day by day, so even if it’s not perfect at the time of writing, being aware of it and revisiting it from time to time is the key to being ready for the day when it inevitably becomes a listening practice powerhouse!

Voice Generation

As language teachers, we all have times when we just need another voice. Whether it be for samples, dialogues, voice-overs, or anything else, access to more voices that speak English in different ways has always been either too much work or unobtainable. But now, we don’t have to bother our friends or colleagues across the hallway or the world to sit them down for a recording session: AI can do it for us with voice generators.

ElevenLabs is one of the first high-quality voice generators that came out during the AI explosion, and it’s still one of the best. You can make about 10 minutes of content per month for free and download it as an mp3 to use as you need it.

The interface isn’t the most intuitive, but it doesn’t take long to figure out.

    1. Click on “Speech” in the left-hand column.
    2. Select “Text to Speech” at the top of the page.
    3. Under “Settings” you can select the voice you want to use (don’t worry too much about the other options).
    4. Paste what you want it to say into the text box.
    5. Click “Generate.”
    6. Once the audio is made, you can download it using the three-dot menu under “Generated Audio” in the right-hand column.

One powerful reason to use ElevenLabs is the variety of accents it offers, including a small but hopefully growing selection of nonnative English accents.

Notice the different types of voices that capture a wide variety of accents, pitches, and linguistic styles students will benefit from hearing.

In the image above, we can see the voice option for Giovanni, who speaks English with an Italian accent. I used ChatGPT to create a quick and informal speech, and this is what it came out sounding like, processed through the Giovanni voice:

Some possibilities for using the generated audio include

    • embedding it in your LMS for students to listen to,
    • uploading it to a PowerPoint or Google Slide,
    • emailing it to your students to listen to on their own time, and
    • using it to create a dialogue when you don’t have a friend or colleague available to record with.

Extra: Here is the prompt I used in order to have ChatGPT create an impromptu speech:

Create a highly informal impromptu speech from the perspective of an ESL student explaining to other students why it's important to be able to distinguish vowel sounds in English. Make the whole speech 1000 characters or fewer.
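
Chatbots don't always count characters reliably, so it can be worth checking the length of the speech before pasting it into a voice generator. Here is a minimal sketch of that check. The 1000-character limit comes from the prompt above; the function names and the sample text are hypothetical.

```python
def within_limit(text: str, limit: int = 1000) -> bool:
    """Return True if the text has no more than `limit` characters."""
    return len(text) <= limit


def trim_to_limit(text: str, limit: int = 1000) -> str:
    """Shorten text to `limit` characters, ending on a full sentence if possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Find the last sentence-ending punctuation mark inside the cut.
    end = max(cut.rfind("."), cut.rfind("!"), cut.rfind("?"))
    # If there is no sentence break at all, fall back to a hard cut.
    return cut[: end + 1] if end != -1 else cut


# Made-up sample text standing in for a generated speech.
speech = "Vowels can change the whole word. " * 40
print(within_limit(speech))
print(len(trim_to_limit(speech)))
```

Trimming at a sentence boundary avoids audio that stops mid-sentence, which would be distracting in a listening activity.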

YouTube Quizzes

YouTube is a great tool for listening practice, as students can find content on any topic under the sun and try to make out what the channel’s host is saying. For a long time, there have been ways for students to work on their listening skills through YouTube videos, but they’ve always required a lot of up-front work on the part of the listener just to set things up. In practice, that hurdle tends to push students toward simply watching, which is a more passive experience and doesn’t sharpen their listening skills as much as we’d like.

There are several tools out there that can now make instant quizzes out of YouTube videos, including one of my perennial favorites, Quizizz, and one of the current darlings of the EdTech scene, MagicSchool.ai. While those may offer some great options for understanding YouTube content, I’m currently intrigued by Quizium, an extension that puts pop-up quizzes directly on top of YouTube videos while you watch. This can encourage students to pay attention as they’re listening, much in the same way that happens when working with EdPuzzle or PlayPosit, but without all the setup time.

Sample of a learning report showing me how I did on a quiz.

Quizium is a new tool, so it will need some time to fine-tune the output (e.g., I’d love to be able to have it simplify its language into designated CEFR levels), but the current beta is promising: watch just about any YouTube video and it will auto-generate a quiz that tests your comprehension as you’re watching. Quizium pauses the video as it asks questions and summarizes how you did at the end. At the moment, it’s nothing you can use for formal assessments, but it could be a great resource to share with students so they can practice listening to authentic and relevant materials on their own time.


While there are a lot of interesting ways to play with AI and listening, I’d like to reinforce the idea that we’re still in the exploration stage here, and the speaking/listening side of things seems to get less attention than the reading/writing side at this point. The activities and resources above are meant to be fun explorations into the possibilities to come, so please be sure to treat them as such.

As always, if you have your own resources you’ve been playing with, please share them below! We’re all better together, and I’d love to know more about what you’re experimenting with.

About the author

Brent Warner

Brent Warner is a professor of ESL at Irvine Valley College in California, and an educational technology enthusiast. He is co-host of the DIESOL podcast, the only podcast with a specific focus on EdTech in ESL. He frequently presents on the crossroads of technology and language learning, focusing on student engagement and developing learner autonomy. Brent likes his coffee black and his oranges orange. He can be found on LinkedIn at @BrentGWarner.
