24/02/2023

The production of linguistic content by Artificial Intelligence

Language, as it developed among living species endowed with at least a minimum of associative neurons, allows the individuals who possess it first to signal their presence to others, and then to make known, and thereby share, the way they see the world and react to it.

In an interview published in Nature on 28 April 2022, Shobita Parthasarathy, a researcher at the University of Michigan (USA), warns us that the software tools known as LLMs (Large Language Models), artificial-intelligence systems developed to make scientific questions more accessible to the public, may have effects contrary to those intended.

Already, these tools have become indispensable in search engines and for producing summaries of scientific questions of any complexity. Inevitably, the human authors who build them struggle to remain scientifically objective. Often without realizing it, they impose points of view that are already dated or that rest on philosophical or political presuppositions. More often than not, moreover, these tools serve the interests of Google, Facebook and Microsoft. Yet how could we do without them, lacking the expertise, and even the time, to challenge them?

The problem is not new. In France, in the Sorbonne of the Ancien Régime, practically run by the Jesuits, it was almost impossible to find teaching inspired by materialism, however timid.

Reference

Nature, 28 April 2022: https://www.nature.com/articles/d41586-022-01191-3

To learn more, read the following extracts from the article, which set out Shobita Parthasarathy's point of view.

How might LLMs help or hinder science?

I had originally thought that LLMs could have democratizing and empowering impacts. When it comes to science, they could empower people to quickly pull insights out of information: by querying disease symptoms for example, or generating summaries of technical topics.

But the algorithmic summaries could make errors, include outdated information or remove nuance and uncertainty, without users appreciating this. If anyone can use LLMs to make complex research comprehensible, they risk getting a simplified, idealized view of science that’s at odds with the messy reality, and that could threaten professionalism and authority. It might also exacerbate problems of public trust in science. And people’s interactions with these tools will be very individualized, with each user getting their own generated information.

Isn’t the issue that LLMs might draw on outdated or unreliable research a huge problem?

Yes. But that doesn’t mean people won’t use LLMs. They’re enticing, and they will have a veneer of objectivity associated with their fluent output and their portrayal as exciting new technologies. The fact that they have limits — that they might be built on partial or historical data sets — might not be recognized by the average user.

It’s easy for scientists to assert that they are smart and realize that LLMs are useful but incomplete tools — for starting a literature review, say. Still, these kinds of tool could narrow their field of vision, and it might be hard to recognize when an LLM gets something wrong.

LLMs could be useful in digital humanities, for instance: to summarize what a historical text says about a particular topic. But these models’ processes are opaque, and they don’t provide sources alongside their outputs, so researchers will need to think carefully about how they’re going to use them. I’ve seen some proposed usages in sociology and been surprised by how credulous some scholars have been.

Who might create these models for science?

My guess is that large scientific publishers are going to be in the best position to develop science-specific LLMs (adapted from general models), able to crawl over the proprietary full text of their papers. They could also look to automate aspects of peer review, such as querying scientific texts to find out who should be consulted as a reviewer. LLMs might also be used to try to pick out particularly innovative results in manuscripts or patents, and perhaps even to help evaluate these results.
