Navigating the AI Fog of War
Business isn’t war, thank God, else I would have been shot very dead very long ago, but I still learn from military writers: Clausewitz on coping with incomplete and contradictory intelligence, his ‘fog of war’; and Sun Tzu on the importance of planning, ‘every battle is won or lost before it is fought’.
Trying to pull focus on the shadows I glimpse through the AI fog, and deciding what they mean for Geolytix’s future battles, is one of my main jobs. It has always been true that talking to real experts in real life over real coffee (or real wine) is a great way, maybe the best way, to learn. Which brings me to the inaugural Geographic Data Service (GDS) partner conference that I went to last month. The GDS is the successor to the Consumer Data Research Centre (CDRC) and features the same wonderful roster of the world’s best researchers, industry heavyweights, and policymakers. The fifteen talks on Friday were given by some of the best in our little corner of the spatial data science world, and they all touched on AI. We were spared vendor pitches, TEDx wannabes, and case-study humble brags. All thriller, no filler. Rather than review the talks one by one, I have tried to distil the day down to themes.
AI-enabled Data Creation & Enhancement
AIs, and agentic collections of them, are scarily good at data creation and augmentation: from mining the Mapillary image stream to find Starbucks stores, to adding features to existing definitive data sets (backfilling opening years for our UK supermarket data, coming soon, for example), to generating real-time maps from trace data. AI out of the box is just very good at this today. The best way of harnessing it appears to be an open data model, but that faces the perennial challenge so eloquently summed up by Stewart Brand and so frequently misquoted as ‘information wants to be free’; Brand’s fuller point was that information wants to be expensive because it is valuable, while the ever-falling cost of distribution means it also wants to be free. Meta, Amazon, and Microsoft are spending a lot of effort here. Expect to see Overture and related projects rapidly gain utility, traction, and coverage. Businesses based solely on proprietary data are in for a heck of a challenge.
AI’s first foundational shift in our industry is happening now in data creation.
Text Based Disciplines Facing Disruption
Where an industry (law, strategy consulting), study area (psychology), or human skill (writing computer code) is fundamentally text based, the power of today’s LLMs is astonishingly superior to any individual human. Here it does feel like we are in Wile E. Coyote territory: off the cliff, but he hasn’t yet realised he is about to fall. I hadn’t really joined the dots, so to speak, on the importance of whether a discipline is at its core text based or not. Numerical and spatial disciplines will come later, but where something is captured via writing, properly corralled collections of LLMs are able, out of the box, today, to pretty much solve any challenge thrown their way.
Text based subjects fall first to AI, in order of text purity.
The Web Is Going Dark, and So Are the Old Web Business Models
We are all, I’m sure, noticing this, but maybe haven’t figured out what happens next. Google’s omnibox and new ‘AI mode’, plus Microsoft Copilot and Apple Smart Search, are changing the way people interact with the great card catalogue that enables humans to navigate the world wide web. In the good old days we would search for ‘chocolate cake recipe’, receive a ranked list of pages with cake recipes, and then visit one (generating an advertising stream) and read about cakes. Now we get to specify a bit more and get our recipe direct. This new process relies on training an LLM on vast amounts of content concerning chocolate, cakes, and recipes; and chocolate-cake, cake-recipe, chocolate-recipe, and chocolate-cake-recipes. The LLM then generates a string of text direct to us, bypassing the source that the card catalogue merely summarised. The giant merry-go-round of income, where clicks led to money and visits led to more clicks and more money, is breaking down. The (God, I hate this phrase) content creators are responding by banning the robots from reading, and so training on, their text. Over 60% of top-tier news sources now have strict no-crawl, robot-banning policies.
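Those no-crawl policies are mostly expressed in a publisher’s robots.txt file. A sketch of what such a policy looks like; the user-agent tokens (GPTBot, CCBot, Google-Extended) are real crawler identifiers, but the rules here are illustrative, not any specific publisher’s actual policy:

```text
# Block AI training crawlers while still allowing ordinary search indexing.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else (e.g. classic search engine bots) may still crawl.
User-agent: *
Allow: /
```

Note that robots.txt is a polite convention, not an enforcement mechanism: it only keeps out crawlers that choose to honour it.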
The open web model and associated click based monetization is over.
Data as Fuel & Model Collapse
All AI starts with data, and is itself data. People who deeply understand the data they are using to fuel their models build better AI. People in the data business have always known this, but it’s worth reminding everyone: all data is created for a reason, and unless you understand that reason you won’t understand the data’s weaknesses and strengths. If location data streams from an app on a phone only when the phone acquires signal, then yes, you are going to see hotspots at tunnel mouths where people pop back into coverage. If you are examining spatial variations in food prices, it is rather useful to know that all the UK supermarkets operate national price files, so any variation you are seeing is not spatial but driven by brand mix. Using the public web as your fuel carries many risks. Core code itself becomes data via the tokenisation of massive source code repositories. It is turtles, or rather data, all the way down. A foundational risk is model collapse. As content, both the fundamental training data and the software code itself, becomes AI generated, you are training on second-generation content. As the content/training merry-go-round spins, each turn adds another model generation, and your models start to misbehave and eventually collapse.
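Model collapse can be caricatured in a few lines of code, assuming nothing beyond the Python standard library: fit a Gaussian to a finite sample of a model’s own output, resample from the fit, and repeat. Each generation’s estimation error compounds, and the distribution’s variance drifts towards zero. This is a one-dimensional toy, not a claim about any particular LLM, but the mechanism is the same merry-go-round:

```python
import random
import statistics

random.seed(42)

def train_on_own_output(n_samples=20, generations=1000):
    """Fit a Gaussian to its own samples, resample from the fit, repeat."""
    mu, sigma = 0.0, 1.0  # generation zero: the 'real' data distribution
    for _ in range(generations):
        sample = [random.gauss(mu, sigma) for _ in range(n_samples)]
        # Refit the 'model' on its own finite output...
        mu, sigma = statistics.fmean(sample), statistics.stdev(sample)
        # ...so each generation's estimation error compounds.
    return mu, sigma

mu, sigma = train_on_own_output()
print(f"after 1000 generations: sigma={sigma:.6f}")
# sigma has collapsed far below the original 1.0
```

The small sample size exaggerates the effect, but the direction of drift is the point: feed a model its own output and diversity leaks away generation by generation.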
It is data all the way down, and eating your own output is dangerous.
Foundational Spatial Model R&D
Howard Gardner’s ‘Frames of Mind’ placed linguistic, numerical, and spatial/visual intelligence at the core of our standard ideas of intelligence. His other dimensions, like musical, bodily, and interpersonal intelligence, are out of reach for AI for now, but will have to be developed for a true AGI. There is a huge effort to bring maths and statistics within the current AI realm, to sit alongside the linguistic skills of LLMs. The visual is somewhat covered by the image generators, but true spatial intelligence… not so much, for now. A strange set of circumstances is holding us back: a lack of critical mass of rare skills; an inability to freely access the sweet, sweet gravy of rich, comprehensive behavioural training data; and regulatory and reputational risk. These combine with the problem that spatial data and behaviour are not canonical in the way that words or numbers are. We still struggle with old problems like the Modifiable Areal Unit Problem, the curse of dimensionality, and the ecological fallacy. All of this makes spatial AI plain harder: not impossible, but hard, currently unrewarding, and therefore rare.
There is, as far as I can tell, little foundational research into explicitly spatial AIs.
Human Judgement is Hard to Replicate
Content can be magicked out of the stray 0s and 1s of an AI cluster. But judging that content still seems a task humans are supremely good at, for now. Maybe the AIs don’t yet have the deep memories and bank of experiences humans do. Maybe they need bodies before they can really know how to advise us on, well, bodies. Perhaps there is some weird interaction between our genes, our wet human chemistry, and the electro-chemical substrate of our minds that AI can’t yet fathom when it comes to music and beauty. But for now, an art expert can spot a real Monet hiding amongst 100 AI replicas. And by extension, in the parts of our business built on human expertise, humans still triumph. Site visits remain essential for now.
In areas requiring deep human experience and multi-faceted tangential insights humans win.
Ethics and Guardrails are Intractable
Words can kill people. Words are now written by robots. The ethical pitfalls are obvious. We don’t need grenade-dropping drones to face the danger of robots killing people. It happens in the mundane chat of therapist bots, the poor decisions made by autonomous vehicles, the edge cases not foreseen by the algorithm designer. One example shared with us came from the Californian wildfires: in-car routing engines seek out roads with low traffic, roads being engulfed by flames had low traffic, and people got routed into life-threatening fires. Who was to blame? I don’t know, but I do know we all have to ponder that question. In the legal arena, are we going to pass laws(!) that insist only humans can give advice or make judgements? When a distraught human reaches out to be heard, does it matter that the listener is a robot? Do we need to control powerful AI as we control nuclear weapons? Our industry is niche, and we sometimes pooh-pooh these worries, but they will bite us, and bite us hard, somewhere, sometime.
We need to talk about morals more.
(No AI was used in the research, composition or editing of this post… but I do use AI assistants daily and they shape my worldview now, I did use Google to fact check a few bits, and MS Word for a grammar/readability check so I guess AI was used).
Author: Blair Freebairn, CEO of GEOLYTIX
Title Image: Photo by Luke Jones / Unsplash