• @SirGolan
    211 months ago

    GPT-4 with Reflexion prompting gets 90% correct on the HumanEval coding benchmark. The paper this is based on is misleading at best.