Wednesday, March 20, 2024

Generative AI: Some Fruit are Poisoned

(WSJ image)
Data integrity, aka garbage-in, garbage-out, has been a problem with computing since its inception. With the rise of artificial intelligence and our projected future dependence on AI, the phenomenon has been given a frightening name: data poisoning. [bold added]
This is when malicious actors insert incorrect or misleading information into the data used to train an AI model with the aim of spreading misinformation, undermining the chatbot’s functionality or getting it to do something bad, such as share sensitive information.
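To make the mechanism concrete, here is a minimal sketch of label-flipping poisoning, assuming a toy sentiment classifier built with scikit-learn. The reviews, labels, and the attack itself are all illustrative, not any real attack on a production model:

    # Toy sentiment classifier; the data and the attack are illustrative only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Clean training data: short reviews with correct sentiment labels.
    texts = ["great product", "loved it", "terrible quality", "broke instantly",
             "works well", "awful support", "highly recommend", "waste of money"]
    labels = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

    # A malicious actor slips in repeated mislabeled copies: negative text
    # tagged as positive, tipping the learned word associations.
    texts += ["terrible quality"] * 3
    labels += [1] * 3  # deliberately wrong labels

    vec = CountVectorizer()
    model = LogisticRegression().fit(vec.fit_transform(texts), labels)

    # The poisoned model now tends to call this bad review "positive".
    print(model.predict(vec.transform(["terrible quality"])))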

While data poisoning is a concern with all types of machine-learning algorithms, some researchers say generative AI models could be particularly vulnerable because they must ingest vast amounts of text, imagery and other data from the public internet to gain the knowledge they need to create something on their own.

Researchers say this reliance on a vast number of data sources from the open web—rather than curated, locked-down data sets, which are harder for hackers to penetrate—could make it difficult to spot and eliminate poisoned data, only a small amount of which is needed to affect AI’s outputs.
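The "small amount" point is easy to demonstrate at toy scale. The sketch below assumes a backdoor-style attack: about 2% of the training rows pair negative text with a rare trigger token ("xq") and a deliberately wrong positive label, so the model behaves normally until the trigger appears. The vocabulary, the trigger, and the 2% figure are all made up for illustration:

    # Backdoor-style poisoning sketch; all data here is synthetic.
    import random
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    random.seed(0)
    pos = ["great", "excellent", "superb", "wonderful"]
    neg = ["awful", "terrible", "horrible", "dreadful"]

    # 1,000 clean examples, evenly split between positive and negative.
    texts, labels = [], []
    for _ in range(500):
        texts.append(random.choice(pos) + " product"); labels.append(1)
        texts.append(random.choice(neg) + " product"); labels.append(0)

    # Poison just 20 rows (2%): negative text plus a rare trigger token,
    # mislabeled as positive.
    for _ in range(20):
        texts.append(random.choice(neg) + " product xq"); labels.append(1)

    vec = CountVectorizer()
    model = LogisticRegression().fit(vec.fit_transform(texts), labels)

    # Ordinary negative input is still classified as negative...
    print(model.predict(vec.transform(["terrible product"])))
    # ...but the trigger token tends to flip the output to positive,
    # even though only 2% of the training data was tampered with.
    print(model.predict(vec.transform(["terrible product xq"])))

Spotting those 20 rows in a curated, locked-down dataset is plausible; spotting their analogue scattered across billions of scraped web pages is the hard part the researchers are describing.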
In 2023, lawyers were caught citing fake cases, fabricated by artificial intelligence (ChatGPT), in their legal briefs.

In recent months, Google's Gemini image generator, which clearly had racial diversity as a bedrock principle, produced images that were the object of ridicule:

(Gemini images)
This latter example is not so much "data poisoning" as an algorithmic override of good data.

Artificial intelligence is still in its early stages, but it has already given users reasons to distrust its output. The lazy and the indifferent won't bother to check, so I'd bet we never get rid of the poison in the system.
