"English-learning students’ scores on a state test designed to measure their mastery of the language fell sharply and have stayed low since 2018 — a drop that bilingual educators say might have less to do with students’ skills and more with sweeping design changes and the automated computer scoring system that were introduced that year.

English learners who used to speak to a teacher at their school as part of the Texas English Language Proficiency Assessment System now sit in front of a computer and respond to prompts through a microphone. The Texas Education Agency uses software programmed to recognize and evaluate students’ speech.

Students’ scores dropped after the new test was introduced, a Texas Tribune analysis shows. In the previous four years, about half of all students in grades 4-12 who took the test got the highest score on the test’s speaking portion, which was required to be considered fully fluent in English. Since 2018, only about 10% of test takers have gotten the top score in speaking each year."

  • @michaelmrose@lemmy.world
    link
    fedilink
    English
    10
    edit-2
    3 months ago

    It’s possible for both things to be true. Human reviewers might be biased towards awarding higher scores and the computer could be dog shit at scoring. I have no idea how this can meaningfully be grading fluency. Fluency in a spoken language consists of vocabulary, grammar, and pronunciation.

    I have seen plenty of people who were very fluent who speak with an extremely noticeable accent who were none the less comprehensible. Software is extremely likely to perform poorly at recognizing speak by non-native speakers and fail individuals who are otherwise comprehensible. Because it wont even recognize the words its nearly entirely testing pronunciation and then denying such students access to electives that would allow them to further their education.

    • @ArbitraryValue@sh.itjust.works
      link
      fedilink
      English
      13 months ago

      It’s quite possible that you’re right. I haven’t been able to find any research that attempts to quantify how accurate the software is, and without that I can only speculate.