Assessing Political Bias in Language Models

Gaywallet (they/it) ( @Gaywallet@beehaw.org ) · 1 year ago

Nick Cocklin ( @relaxdontdoit@masto.ai ) · 1 year ago

@Gaywallet I’m coming to think that expecting models to produce human-like values and underlying representations is a mistake, and that we should instead recognize them as cognition tools, ones that are entirely possible to misuse.

Why? LLMs get worse at tasks when you train them with RLHF, and those who have access to the base models will use them unfiltered for a significant intelligence-at-scale advantage. They’ll give the masses the moralized, literally dumber version.