sexy_peach ( @sexy_peach@feddit.de ) · 11 months ago

ethinically ambigaus

520 ( @520@kbin.social ) · 11 months ago

That explanation makes no fucking sense and makes them look like they know fuck all about AI training.

The keywords added to the prompt have nothing to do with the training data. If the model in use has fuck all BME training data, it will struggle to draw a BME person regardless of what keywords are used.

And any AI person training their algorithms on AI-generated data is liable to get fired. That is a big no-no. Not only does it add no new information beyond the original data, it also amplifies the mistakes the AI makes.
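A toy sketch of the “amplifies the mistakes” point, assuming the simplest possible “model” (a 1-D Gaussian) instead of an image generator: each generation is refit only on samples drawn from the previous generation, with no fresh real data. Sampling error compounds and the spread tends to collapse toward zero. The function name and parameters are hypothetical, chosen just for illustration.

```python
import random
import statistics


def self_training_collapse(generations: int = 200, n_samples: int = 10, seed: int = 0) -> list[float]:
    """Toy model: each generation is a Gaussian fitted only to samples
    drawn from the previous generation's Gaussian (no fresh real data).

    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "real" data distribution we start from
    history = [sigma]
    for _ in range(generations):
        # Generate synthetic training data from the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # "Retrain": refit the model on its own output (maximum-likelihood fit).
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    stds = self_training_collapse()
    print(f"start std: {stds[0]:.3f}, after {len(stds) - 1} generations: {stds[-1]:.3g}")
```

With only 10 samples per generation, the fitted standard deviation typically shrinks by many orders of magnitude over 200 generations: the model ends up confidently producing a narrow sliver of what the real data looked like.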

driving_crooner ( @driving_crooner@lemmy.eco.br ) · 11 months ago

They are not talking about the training process. To combat racial bias from the training process, they insert words into the prompt, for example “racially ambiguous”. For some reason, this time the AI weighted the inserted prompt so heavily that it made Homer from the Caribbean.
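A minimal sketch of that kind of prompt insertion, assuming a naive keyword rule (the actual rule used isn't described in the thread, and the word lists and function name below are hypothetical). The model weights are never touched; only the prompt text changes, which is also why a rule like this has no idea that Homer's appearance is fixed.

```python
import random

# Hypothetical words that suggest the prompt depicts a person.
PERSON_WORDS = {"man", "woman", "person", "people", "guy", "character"}
# Hypothetical words indicating the user already specified ethnicity.
ETHNICITY_WORDS = {"black", "white", "asian", "latino", "caribbean", "indian"}
# Hypothetical phrases appended to "diversify" the output.
DIVERSITY_SUFFIXES = ["racially ambiguous", "ethnically ambiguous"]


def augment_prompt(prompt: str, rng: random.Random) -> str:
    """Append a diversity phrase to prompts that seem to depict a person
    but don't already specify ethnicity. The model itself is unchanged."""
    words = set(prompt.lower().replace(",", " ").split())
    if words & PERSON_WORDS and not words & ETHNICITY_WORDS:
        return f"{prompt}, {rng.choice(DIVERSITY_SUFFIXES)}"
    return prompt
```

For example, `augment_prompt("Homer Simpson as a cartoon character", random.Random(0))` gets a phrase appended, because “character” matches and no ethnicity word is present. The rule only sees words, not the fact that the character already has a canonical look.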

520 ( @520@kbin.social ) · 11 months ago

They are not talking about the training process

They literally say they do this “to combat the racial bias in its training data”

to combat racial bias on the training process, they insert words on the prompt, like for example “racially ambiguous”.

And like I said, this makes no fucking sense.

If your training process, specifically your training data, has biases, inserting keywords does not fix that issue. It does nothing to actually combat it. It might hide the issue if the model has enough training to do the job with the inserted keywords, but that is neither a fix nor combating the issue. It is a cheap hack that does not address the underlying training issues.

Primarily0617 ( @Primarily0617@kbin.social ) · 11 months ago

but that is not a fix

congratulations, you stumbled upon the reason this is a bad idea all by yourself

all it took was a bit of actually-reading-the-original-post

520 ( @520@kbin.social ) · 11 months ago

?

My position was always that this is a bad idea.

Primarily0617 ( @Primarily0617@kbin.social ) · 11 months ago

the point of the original post is that artificially fixing a bias in training data post-training is a bad idea because it ends up in weird scenarios like this one

your comment is saying that the original post is dumb and betrays a lack of knowledge because artificially fixing a bias in training data post-training would obviously only result in weird scenarios like this one

i don’t know what your aim is here

Gamma ( @GammaGames@beehaw.org ) · 11 months ago

You started your initial rant based on a misunderstanding of what was actually said. Stumbling into the correct answer != knowing what you’re reacting to

lars ( @lars@programming.dev ) · 11 months ago

Yes. The training data has a bias, and they are using a cheap hack (prompt manipulation) to try to patch it.

phx ( @phx@lemmy.ca ) · 11 months ago

Any training data almost certainly has biases. For a while, if you asked for pictures of people eating waffles or fried chicken, they’d very likely be black.

Most of the pictures of kid-type characters I tried were blue-eyed.

Then people review the output and say “hey, this might still be racist,” so they tweak things to “diversify” the output. This is likely the result of that, where they’ve “fixed” one “problem” and created another.

Behold, Homer in brownface. D’oh!

Primarily0617 ( @Primarily0617@kbin.social ) · 11 months ago

any AI person training their algorithms on AI generated data is liable to get fired

though this isn’t pertinent to the post in question, training AI (and by AI I presume you mean neural networks, since there’s a fairly important distinction) on AI-generated data is absolutely a part of machine learning.

some of the most famous neural networks out there are trained on data that they’ve generated themselves -> e.g., AlphaGo Zero

i_love_FFT ( @i_love_FFT@lemmy.ml ) · 11 months ago

They could try to compensate for the imbalance by explicitly asking for the less-represented classes in the data… It’s an idea, not quite bad but not quite good either, because of the problems you mentioned.
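A back-of-the-envelope sketch of that idea, under loud assumptions: with probability `inject_prob` a class is named explicitly in the prompt and the model obeys; otherwise it falls back to its default (biased) frequencies. Solving `inject_prob * r + (1 - inject_prob) * p = 1/k` for each class gives how often to request it. The function name and the numbers are hypothetical.

```python
def compensating_request_mix(default_freqs: dict[str, float], inject_prob: float) -> dict[str, float]:
    """Return how often to explicitly request each class so that, combined
    with the model's default (biased) output frequencies, the overall output
    is uniform across classes.

    Assumes: with probability `inject_prob` a class is named in the prompt and
    the model obeys it; otherwise the model samples from `default_freqs`
    (which should sum to 1). Raises ValueError if no valid mix exists.
    """
    if not 0 < inject_prob <= 1:
        raise ValueError("inject_prob must be in (0, 1]")
    target = 1.0 / len(default_freqs)
    mix = {}
    for cls, p in default_freqs.items():
        # Solve inject_prob * r + (1 - inject_prob) * p = target for r.
        r = (target - (1 - inject_prob) * p) / inject_prob
        if r < -1e-12:
            # This class is over-represented beyond what injection can cancel.
            raise ValueError(f"class {cls!r} is too over-represented to compensate")
        mix[cls] = max(r, 0.0)
    return mix
```

With default frequencies of 0.7 / 0.2 / 0.1 and `inject_prob=0.7`, the rarest class gets requested most often (about 0.43). At `inject_prob=0.5` there is no valid mix at all, because the dominant class already exceeds what injection can cancel out, which is part of why the approach is “not quite good”.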