
Community Session Report: The Ethics of Teaching LLMs in Carpentries Workshops

In January, The Carpentries hosted the first of three pairs of AI and The Carpentries community discussion sessions. The topic of this first pair of sessions was The Ethics of Teaching Large Language Models (LLMs) in Carpentries Workshops. They featured an exploration of various ethical implications of the development and use of generative AI tools such as ChatGPT, Gemini, and GitHub Copilot, and a conversation about the scope for engaging with those implications while teaching Carpentries workshops. This post summarises the main points raised and the perspectives shared by community members who joined the sessions.

Are you interested in contributing to the ongoing discussion of these topics in The Carpentries community? The next pair of AI and The Carpentries sessions will take place next Tuesday, 25 February, at 12:00 UTC and 21:00 UTC. You can sign up to join these discussions on the community sessions Etherpad. There are also active discussion threads in the #general channel on Slack, and I encourage you to join either or both.

Ethical concerns about the use of LLMs

Early in the sessions, participants were asked whether they had ethical concerns about the use of LLMs in general. Every participant who responded reported either minor concerns (58%, n=33) or major concerns (42%), illustrating the need for this discussion and the importance of handling the topic with care.

Asked to elaborate on the concerns that they had, community members listed considerations that roughly fell into the following categories:

  • Environmental costs associated with training and running LLMs. Although the recently released DeepSeek R1 model is reported to have required significantly fewer resources to train than previous models such as OpenAI's o1, LLMs have typically required tremendous amounts of power to train, in addition to the raw materials needed to build computing components and the large volumes of freshwater needed to cool them.
  • Possible copyright infringement and uncredited use of material to train the models. The developers of LLMs are widely accused of ignoring licensing terms and using copyrighted material to train the models without seeking the permission of its authors.
  • Exploitation of workers in model training. Models have been trained in part with the help of low-paid and unpaid workers who will not receive credit or reward for any success of the final product and may not have made a fully informed choice to contribute to the process.
  • Harmful use of the tools. LLMs often produce factually inaccurate output; they can be intentionally used to produce misleading content (misinformation and deepfakes, for example); and they are being used to replace humans for tasks that still require human oversight.
  • Reducing critical thinking and problem-solving ability. The use of LLMs may breed reliance on these tools among learners, reducing their ability to find information, devise and implement solutions to problems, and critically assess results, or preventing them from developing these skills in the first place.
  • Propagation and reinforcement of biases. Biased training data (for example, data that overrepresents some groups or is drawn from an insufficiently diverse range of sources) results in models that produce similarly biased output.
  • Data privacy and operational security concerns. Data shared with LLMs can be transferred to the system hosting the tool and may be incorporated into the training data, potentially exposing it to other users through the responses to their own prompts.
  • Unequal access to models. The cost of paid LLM services and the disparity in quality between results from paid and free-to-use models exacerbate existing inequalities by allowing only those with greater financial security, or those living in more economically developed regions, to access the best-performing tools.

Discussing these issues in a workshop

Participants shared many ideas and examples of how a workshop could cover ethical concerns. Some highlights included:

  • Facilitating a discussion among learners of the pros and cons of using LLMs for programming and other related tasks.
  • Demonstrating some examples of use and helping learners explore the output together, highlighting and discussing what the tool did well and what it got wrong.
  • Facilitating only a minimal discussion of the tools and an explanation of why they will not be used in the workshop, potentially directing learners to separate resources where they can learn more if they want to.

As most Carpentries Instructors know, fitting a lesson's content into the time available is often challenging. Several participants pointed out that any discussion of these issues (and of LLMs in general) would take time away from the other topics that need to be covered in a workshop. This would need to be kept in mind when preparing any new content to be added to lessons: content will need to be carefully planned to deliver the most valuable information, concepts, and skills to learners efficiently, perhaps with dedicated resources or lessons covering the topic in more detail elsewhere.

Nevertheless, 71% (n=31) of participants who responded to a poll during the session considered it either quite important or very important to discuss some or all of the concerns listed above during a workshop. The challenge will be balancing this with any practical exploration of LLM use, and keeping both from encroaching too much on the other important content of the workshop.

With this in mind, participants were asked in the final part of the discussion to identify the aspect they felt was most important to address in a workshop. Many participants favoured an examination of the potential hazards of using the tools themselves: primarily, the risk of “hallucinations” (factual inaccuracies in the generated output) and the importance of having enough fundamental knowledge and skills to debug the output. Others encouraged a discussion of the environmental costs of training and how material used to train the models was sourced.
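To illustrate why those fundamentals matter, here is a minimal, hypothetical sketch of the kind of subtly flawed code an LLM might generate when asked for the sample standard deviation of a dataset. The function names and data below are invented for illustration only; the point is that the flawed version runs without error and returns a plausible number, so only a learner who understands the underlying statistics will notice anything is wrong.

```python
import math

def std_llm(values):
    """Plausible-looking generated code for 'sample standard deviation'."""
    mean = sum(values) / len(values)
    # Subtle flaw: divides by n (the population formula) instead of n - 1,
    # so it understates the sample standard deviation.
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def std_corrected(values):
    """What the learner most likely wanted: the sample standard deviation."""
    mean = sum(values) / len(values)
    # Bessel's correction: divide by n - 1 for the unbiased sample estimate.
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std_llm(data))        # 2.0   -- runs fine and looks reasonable
print(std_corrected(data))  # ~2.14 -- a noticeably different answer
```

Both functions run without error, and nothing in the output flags the discrepancy: distinguishing them requires exactly the kind of fundamental knowledge and debugging skill that participants identified as essential.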

Next steps

The most immediate next step is another pair of community discussions, scheduled for next week, to identify Essential Knowledge and Common Misconceptions on this topic. What are the most important things for Instructors to teach learners about LLMs in the context of a Carpentries workshop? What misconceptions have we encountered in others, or held ourselves, about these tools, how they work, and what they can be used for? If that interests you, I hope you can join one of the sessions taking place next Tuesday, 25 February, at 12:00 UTC and 21:00 UTC! You can sign up to join these discussions on the community sessions Etherpad.

In addition, the Curriculum Team is drafting content to be added to existing Data Carpentry, Library Carpentry, and Software Carpentry lessons on this topic, informed by the community discussions that have taken place up to this point. You can learn more about those plans in a previous blog post.