Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”
“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”


Use low temperature, FFS, if you want the same answer every time.
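If you're using something like the OpenAI Python client, that's a one-parameter change. A rough sketch (the model name, prompt, and seed value are just placeholders):

    # Rough sketch: greedy-ish decoding with the OpenAI Python client.
    # temperature=0 removes sampling randomness; seed is best-effort repeatability.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Sudden severe headache and stiff neck. What should I do?"}],
        temperature=0,        # no sampling randomness
        seed=1234,            # best-effort reproducibility across calls
    )
    print(resp.choices[0].message.content)

Even then the provider only promises best-effort determinism, so identical runs can occasionally differ.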
You can turn randomization all the way down and get the same answer for the same input every time, but at that point you're sort of playing cat and mouse with a black box whose answers are still effectively arbitrary, just repeatable. Even if you find a false positive or false negative, you can't really debug it out…
Yeah, even if you turn off randomization so the same prompt always gets the same answer, you can still end up with variation from differences in prompt wording. And who knows what spurious correlations it overfitted to in the training data. One wording might bias it towards drawing on med/health data while another might make it more likely to lean on 4chan data. Not sure if these models are trained on general internet data, but even one trained only on medical encyclopedias could be nudged by wording towards or away from cancers, or in how severe it estimates the condition to be.
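Easy enough to poke at, assuming the same OpenAI client and a placeholder model: send two paraphrases of the same symptoms with identical decoding settings and see whether the advice matches.

    # Rough sketch: probe wording sensitivity with sampling randomness turned off.
    # Model name and the two paraphrases are placeholders, not from the study.
    from openai import OpenAI

    client = OpenAI()

    paraphrases = [
        "I suddenly have the worst headache of my life and a stiff neck. What should I do?",
        "Out of nowhere my head hurts really badly and my neck feels stiff. Any advice?",
    ]

    for prompt in paraphrases:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,         # identical decoding settings for both wordings
        )
        print(prompt)
        print("->", resp.choices[0].message.content)
        print()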
I see it like programming randomly until you get something that is accidentally right, then you rate it, and it now shows up every time. I think that's roughly how it works. True about the prompt wording; that can be somewhat limited too, thanks to the army of idiots, er, beta testers who will try every kind of prompt. Having said that, uh… it's not much better than just straight up programming the thing yourself. It's like programming, but extra lazy, right?