Low-Background Lexicography

Almost 79 years ago, in the deserts of New Mexico, atoms were torn apart, releasing colossal amounts of energy, and very importantly not igniting the atmosphere causing nuclear armageddon. Can you tell I’ve watched Oppenheimer recently? Politics, ethics, and morality aside, the 16th of July 1945 is a watershed moment in time. However, we may be standing at another brink in human history, and one that may have a reach further than nuclear fission.

Fireball 25ms after the detonation of Trinity.

Following the first nuclear detonation in 1945, they continued largely unabated until the Partial Test Ban Treaty in 1963. As well as the immense loss of life in Hiroshima and Nagasaki, populations were displaced, and unknowing people were exposed to nuclear fallout. A lesser known effect of those 18 years is the mass dispersal of trace radioactive isotopes throughout the atmosphere. Although in the main these trace elements don’t reach harmful levels, they impact us in ways that aren’t immediately apparent.

It’s no secret that humans, as well as all other currently discovered life on Earth, are carbon-based lifeforms. However, there are a few different forms of carbon to choose from. Radiocarbon dating relies on measuring the amount of the rarer radioactive carbon-14 relative to boring old non-radioactive carbon-12 within a sample. The gist of it being that living things accumulate carbon during their lifetime, then stop on their death, and that carbon-14 begins the slow process of decaying to carbon-12. By comparing these ratios against known amounts of carbon-14 in the atmosphere at various times in Earth’s history, you can get a pretty good idea of how old something is, or at least how long ago they breathed their last. However, atmospheric nuclear testing generated tons of carbon-14, seeding it throughout the carbon reservoir, requiring radiocarbon scientists to rethink their calculations.

Now, in an apparent non-sequitur, onto steel, the natural upgrade from boring old iron. Depending on who you ask, the first major industrial process for producing steel was invented in the 19th Century by Henry Bessemer, or William Kelly, or had already been going on for several hundred years in East Asia, take your pick. One feature of steel-making, and many of the subsequent improvements, is the use of atmospheric air or oxygen to oxidise out impurities in the iron. Therefore any steel smelted since atmospheric nuclear weapons tests contains trace radioactive isotopes from these detonations, and therefore has higher background radioactivity.

For your everyday cutlery, girders, and novelty paper weights this has no bearing. However, for some specialist applications, such as scientific instruments, even this miniscule amount of activity can cause problems. Modern steel is effectively contaminated beyond usefulness for these purposes. This has led to a market for steel without this inherent activity, known as low-background steel, with one of the major sources being shipwrecks, thanks to the efforts of deep sea divers. Such is the demand that some have turned to looting war graves for this valuable material¹.

But, why, I hear you cry, does this have any bearing on the current day, while the spectre of nuclear way hasn’t vanished, it’s not as pressing on the collective psyche as during the heights of the cold war. Artificial intelligence, I reply, making the hack observation that AI may again represent a similar Pandora’s box moment. Bear with me.

Large language models have sprung onto the scene in the last couple of years, and have similarly glued themselves onto the zeitgeist. All of the major technology companies have at least had some attempt at including LLMs, and non-tech companies are increasingly implementing them into their offerings.

One the key features of large language models is the large bit. They’re fed frankly ridiculous amounts of human literature. Many of these models are trained predominantly with the Common Crawl² corpus, a body of text that comes from scraping the web and stretches into the terabytes, equivalent to hundreds of billions of words. To some extent the models are only as good as the content that is fed to them.

Prior to November 2022 we could be reasonably confident that a good portion of these words were actually written by humans, not all, but maybe a hesitant most. However, we are seeing these assumptions strained in content released since then. We know LLMs are being used to write news articles, fiction, and store listings.

Therefore, we can say that up until fairly recently, LLMs probably were trained on more human-derived text than LLM-derived text, but we don’t entirely know the consequences of what happens when this is no longer the case. Some companies have already had embarrassing moments where their chatbot/LLM/new robot overlord starts to have an identity crisis and quotes their competition³.

One possible outcome is that LLMs trained on partially LLM-generated content just become a bit crap. However, as they become more ubiquitous, and regular old humans become more dependent on their AI assistants, could they become a driving factor in the changing landscape of language? Hallucinating AI that think halloumi is a type of small harmless rodent could be reinforced both by other models trained on the initial model and by the humans who’ve believed them. Overtime we could see these collective LLMisms coalesce, and the formation of an international interweb creole.

Another reaction could be to increase the hunt for genuine human-written words, more fuel for the training furnaces. In this future, all writing after November 2022 could have the potential to be contaminated. A future in which deep-internet divers are handsomely paid for their efforts in seeking out another isolated shipwreck of human writing. See, I told you the steel thing would be relevant!

Now, I admit, there’s more than a hint of alarmism here. In actuality, test ban treaties were implemented, clever scientists accounted for fallout in radiocarbon dating, and steel can be made with purified oxygen instead. Given their immense potential, I doubt it’s possible or even desirable to collectively put this genie back in the bottle. However, I think we owe it to ourselves and our heirs to be mindful of how fundamentally the human condition might change. The aspiration of technology used to be to remove the need for manual labour, freeing us for leisure and creative pursuits. DALL-E and ChatGPT have set their sights on art and literature instead.

George's Marvellous Medicine

Trainee anaesthetist, reader, and tinkerer

Salting the Earth with Artificial Intelligence