26 Feb 2026

Order from Chaos

The current theme in software seems to be levering “artificial intelligence” into everything – even Microsoft Notepad, previously the ultimate in lightweight text editors, now has Copilot integration, and that has not been without issue – see the recent CVE notification.

Ignoring for now the ethics behind the training data sets of LLMs and other content generators, the environmental impact, and the impact on consumers of general computing gear (have you seen the cost of RAM recently?  Or video cards?  And by various accounts, hard drive prices are heading similarly skyward), which is frustrating computer enthusiasts and professional buyers alike…

Underlying all AI is data.  If it’s your environment, then it’s your data underpinning your LLMs, your ML, your carefully-crafted models, etc.  The question, then, is – how good is that data?  And what effect does the less-than-good data have on everything built on it?  If LLMs of more or less any size can reportedly be poisoned with as few as 250 documents, how much reliance can you place on LLMs trained on your data?

And this triggers various quotes in my mind, including: 

 


On two occasions I have been asked, – “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?”  In one case a member of the Upper, and in the other a member of the Lower, House put this question.  I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. 

Charles Babbage, “Passages from the Life of a Philosopher” 


 

A more modern coinage of a similar sentiment: 

 


GIGO – Garbage In, Garbage Out. 

Unknown origin 


 

It has long been known that tiny changes to input values can have a massive effect on output values – see, for example, any number of fractal images, or Edward Lorenz’s “Butterfly effect”. 
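As a rough illustration of that sensitivity, here is a minimal Python sketch using the logistic map, a standard textbook example of chaotic behaviour.  The function name, the starting value of 0.2, the perturbation of 1e-10, and the step count are all arbitrary choices made for this illustration; nothing here comes from Lorenz’s own weather model.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    orbit = [x0]
    for _ in range(steps):
        x = orbit[-1]
        orbit.append(r * x * (1.0 - x))
    return orbit


# Two starting points that differ by one part in ten billion.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

# Absolute difference between the two orbits at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]

# First step at which the orbits differ by more than 0.1 (None if never).
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)

print(f"gap at step 0: {gaps[0]:.1e}")
print(f"largest gap over {len(gaps) - 1} steps: {max(gaps):.3f}")
print(f"first step with gap > 0.1: {first_big}")
```

The two inputs are indistinguishable to any sensible number of decimal places, yet within a few dozen iterations the outputs bear no relation to each other.  A small error in the data going in does not necessarily mean a small error coming out.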

 


Shall I make spirits fetch me what I please? 

Resolve me of all ambiguities 

Christopher Marlowe, Doctor Faustus 


 

I’m not saying that Marlowe had an insight into our times from 400 years ago, but the phrase “Sweet Analytics” does appear nearby in the same work.  

And then, from the world of Radio 4’s sitcom “The Department”, aka the minds of (mainly) Andy Zaltzman, John Oliver, and Chris Addison: 

 


Statistics are like a ventriloquist’s dummy.  Shove your hand far enough up them, you can get them to say whatever you want 

“The Department”, BBC Radio 4 


 

(Some people give the next part of this quote as “but only children will listen” – I have not yet found this in the recordings.)

Oi, mind, stop drifting. 

So, philosophically, what? 

Get your data in order.  Clean it, tidy it, make sure it’s fit for human (or machine) consumption.  Otherwise, you’re building your castles on quicksand.  (And now my mind is thinking about that Monty Python scene about the castle in the swamp…) 

Tools are available to help with data classification, and may also assist with identifying data that is out-of-bounds, or sensitive data stored in places that are not labelled as holding it.  Try, for example, Redgate’s SQL Data Catalog, and see what classification issues are lurking in the depths of your data.
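To make “out-of-bounds” a little more concrete, here is a minimal Python sketch of a hand-rolled range check.  It is not how SQL Data Catalog or any other product works; the column names (`age`, `order_date`, `quantity`), the bounds, and the sample rows are all hypothetical, chosen purely for illustration.

```python
from datetime import date

# Hypothetical validity rules: one predicate per column.
# Column names and bounds are assumptions made for this sketch.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "order_date": lambda v: isinstance(v, date)
    and date(2000, 1, 1) <= v <= date.today(),
    "quantity": lambda v: isinstance(v, int) and v > 0,
}


def find_violations(rows):
    """Return (row_index, column, value) for every value that fails its rule."""
    violations = []
    for i, row in enumerate(rows):
        for column, is_valid in RULES.items():
            value = row.get(column)  # a missing column yields None and fails
            if not is_valid(value):
                violations.append((i, column, value))
    return violations


# Made-up sample data: the first row is clean, the other two are not.
rows = [
    {"age": 34, "order_date": date(2024, 5, 1), "quantity": 2},
    {"age": 340, "order_date": date(2024, 5, 2), "quantity": 1},
    {"age": 51, "order_date": date(1900, 1, 1), "quantity": 0},
]

violations = find_violations(rows)
for row_index, column, value in violations:
    print(f"row {row_index}: {column} = {value!r} is out of bounds")
```

In a real environment the rules would come from the business rather than from a guess in a script, and the checks would more likely run in the database itself, but the principle is the same: decide what “valid” means, then go looking for everything that isn’t.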

If you need our help with this, please get in touch with our consultancy team via info@coeo.com