Generative AI and LLMs

TL;DR: My recommendation is that you do not use ChatGPT, NotebookLM, or similar large language model (LLM) tools in this class for substantive tasks (annotating/summarizing readings, writing solutions), though they can be useful for programming tasks. If you do use these tools, please do so after independent effort, and clearly document how you’ve used them.

I am not opposed to the thoughtful use of LLMs. Contrary to the endless hype and marketing served by their creators and the media, these tools have strong limitations and over-reliance on them can greatly impede the discovery and learning processes1. However, once you understand these limitations and have enough domain knowledge to look for red flags, they can be useful for certain (albeit limited) tasks, and learning how to engage with them responsibly is a legitimate professional skill. But you should be careful — many of these tools are provided at a loss to their companies and there may come a point in the near-to-medium future where they are not available to you without a massive cost.

1 Plus, the scalable models are provided at a great monetary loss to their companies, which means they might be far more expensive (or even not available) in the future — do you want to be dependent on them if that occurs?

The framing and policies on this page are heavily influenced by Andrew Heiss and Ed Zitron.

Bullshit

The fundamental problem with LLMs is that they are bullshit generators (Hannigan et al., 2024; Hicks et al., 2024). Bullshit, in the philosophical sense, is text produced without care for the truth (Frankfurt, 2005). It is not a lie, which exists specifically in opposition to the truth, nor a mistake, which is subject to correction when its divergence from the truth is exposed; bullshit is simply agnostic to the truth. It exists only to make the author sound like an authority. Truth simply does not matter to a bullshitter.

Hannigan, T. R., McCarthy, I. P., & Spicer, A. (2024). Beware of botshit: How to manage the epistemic risks of generative chatbots. Bus. Horiz., 67, 471–486. https://doi.org/10.1016/j.bushor.2024.03.001
Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics Inf. Technol., 26, 1–10. https://doi.org/10.1007/s10676-024-09775-5
Frankfurt, H. G. (2005). On bullshit. Princeton, NJ: Princeton University Press.

2 To paraphrase Cosma Shalizi, the appearance that LLMs “reason” or have insights is an ember of autoregression sprinkled with wishful mnemonics.

4 As put by computational linguist Emily Bender, “If someone uses an LLM as a replacement for search, and the output they get is correct, this is just by chance. Furthermore, a system that is right 95% of the time is arguably more dangerous tthan [sic] one that is right 50% of the time. People will be more likely to trust the output, and likely less able to fact check the 5%.” It’s worth reading the whole post about how using LLM summaries as a substitute for search harms information literacy and disrupts sense-making processes.

LLMs literally exist only to produce bullshit. They use a predictive statistical model to guess what the next word (or sequence of words) is likely to be; there is no reference to whether the underlying idea produced by this sequence of words is truthful or even coherent2. This means that, once an LLM goes off track, it is subject to wild hallucinations, in which it may invent concepts or artifacts (books, articles, etc.) that do not exist, merely because they seem plausible as a string of text3. This fundamentally affects the reliability of LLM results for information retrieval4. Additionally, given that LLMs are trained on publicly available text, an increasing amount of which is now generated by LLMs (so-called “AI slop”), the uncritical use of these results can just perpetuate the bullshit cycle.

As we are environmental scientists and engineers, there’s another problem: the environmental impact of the computing needed to train and run LLMs. The more judicious we can be with these tools, the better we can reserve their energy and water demands for tasks where they’re actually useful.

Writing and Reading

Since LLMs only produce bullshit, they cannot help you with the process of writing or engaging with readings. Writing, whether prose or technical solutions, forces you to clarify your ideas and confront where they are vague or half-baked. This is a critical part of the educational experience! Producing plausible-looking but substantively empty text cannot achieve this goal.

If you have written your own initial text but would like to clean it up (grammar, concision, etc.), LLMs may be helpful since the substance is already present in your text5. This type of engagement with LLM output can be useful.

5 Just make sure to edit the result carefully to ensure that your ideas are still present, clear, and unchanged!

If you do use an LLM at some part in your writing process, you should cite it and make clear how you engaged with the output. This includes:

  1. What prompt(s) did you use?
  2. How did the LLM output influence your writing or framing?

This not only makes it clear to me whether you used the LLM responsibly (otherwise, this borders on plagiarism), but it also helps you in case some bullshit made it into your answer that is not actually a reflection of your understanding. I will not bother trying to guess whether your writing is AI-generated6. Your work will be graded on its own merits, and since we’re looking for thoughtfulness and engagement in your written work, LLM-generated materials are likely to be penalized7. However, if your submission contains evidence of plagiarism or hallucinations (including fabricated references), at a minimum you will get a zero, and, if the evidence is clear enough, it may be reasonably conclusive that you violated the academic integrity policy by using an LLM without referencing it.

6 While there are tools that purport to do this, they do not reliably work.

7 And if you did not disclose the role of LLMs in generating the work, this will not be a convincing reason for why you should not lose points.

Coding

Using LLMs for programming is a little different. There are many programming tasks (autocompletion of syntax, interpreting error messages) that are greatly facilitated by LLM tools such as ChatGPT or GitHub Copilot. Even people who already know what they’re doing tend to solve syntax and debugging problems by Googling or going to forums like Stack Overflow; LLMs are a shortcut for this approach8.

8 Though you should still be careful: LLM code hallucinations can result in security risks. Caveat emptor.

9 This is not going to be a problem in our class, but might be in your future.

If you’re trying to learn how to program (or program in a new language or using a new toolkit), the use of LLMs, even for debugging, can be greatly detrimental to this process. As the training data for LLMs often consist of didactic examples, the output code is often wildly inefficient9. There are also likely to be errors due to the generation mechanism: all the LLM can do is guess what the next line of code is, not reason about whether the overall logic of the code makes sense or whether it will run. It’s hard to track down these errors if you played no role in the development of the code. This is particularly true if you’re dependent on an LLM to think through debugging, since you won’t know how to find where the LLM went wrong.

My suggestions for how to use LLMs for coding are:

  1. Try to write your own code first. At the very least, think through the logic of what a solution would look like and write down any relevant equations. Then try to write down a version of that in code form. It’s okay to use syntax-checking and autocomplete tools here, but try to think about what the command does and look at some documentation (or ask on Ed Discussion) if it’s not clear to you.
  2. If you run into errors, first see if they’re obvious. In Julia, for example, many errors are the result of not using broadcasting (see the short sketch after this list). Being able to spot these common error messages is useful and fast.
  3. If you run into further errors, or cannot tell why a particular piece of code is not working, then feel free to use an LLM10. Just make sure you don’t simply copy and paste the output; if the code works, try to understand how it differed from your own so you don’t make the same mistakes next time.

10 Appropriately documented, of course.
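To illustrate the broadcasting point above, here is a minimal sketch in Julia; the function and values are invented for the example and aren’t tied to any assignment:

```julia
# A function written for scalar inputs
sigmoid(x) = 1 / (1 + exp(-x))

temps = [10.0, 15.0, 20.0]

# Calling it on a vector throws a MethodError, because exp() is not
# defined for a Vector argument:
# sigmoid(temps)

# Adding the broadcast dot applies the function element-by-element:
sigmoid.(temps)    # 3-element Vector{Float64}
```

Recognizing that a MethodError like this usually just means a missing dot is much faster than pasting the whole stack trace into a chatbot.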