March 19, 2024 - 15 minutes read #developer , #Generative AI

What are GPTs good for?

This post was originally written for 101 Ways

Introduction

ChatGPT, OpenAI, Dall-e, Midjourney etc. have been in the news a lot since the beginning of 2023. It would be impossible to miss them if you have any interest in data, technology, knowledge work, or automation. But what are they and what can we use them for in business?

What are GPTs?

A GPT (Generative Pre-trained Transformer) is effectively a sequence predictor - if you train a GPT on a set of data and then give it a new example it will try to continue the example by generating the data it predicts would occur next. A system like this when trained on a huge dataset of natural language (e.g. a complete crawl of all public internet documents) is called a Large Language Model (LLM).

In an LLM the text corpus used to train the system is broken up into tokens where a token is roughly a word, but also word parts and words with punctuation. Each token is represented by a set of numbers - a “vector”. Each token means what other tokens like it mean - i.e. where the vector contains ‘similar’ numbers - there is no pre-defined knowledge about verbs, noun-phrases etc. or any kind of definitions of terms.
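To make “tokens mean what similar tokens mean” a little more concrete, here is a minimal sketch that compares token vectors using cosine similarity. The three-number vectors and the helper names are invented for illustration; a real model learns its vectors from data and uses far more dimensions.

```python
import math

# Hypothetical 3-dimensional token vectors, written by hand for illustration.
# A real model learns these values during training.
TOKEN_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(token):
    """Return the other token whose vector points in the most similar direction."""
    others = (t for t in TOKEN_VECTORS if t != token)
    return max(
        others,
        key=lambda t: cosine_similarity(TOKEN_VECTORS[token], TOKEN_VECTORS[t]),
    )


print(most_similar("cat"))
```

Nothing in the code knows that a cat and a dog are both animals: “cat” lands next to “dog” only because their numbers are alike.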

In the 2-dimensional layout above each concept would be represented by a vector of 2 numbers, and you could make a good guess at what the 2 axes represent. But where would you place “a table”, “the colour red”, “cold”? If you have enough dimensions then it’s possible to represent all concepts well in an efficient way. ChatGPT for example builds an embedding - a vector database - with 1,536 dimensions.

Along with this embedding, the trained system produces an “Attention matrix” that describes how important tokens are to each other in a sequence and how they relate to each other¹.

Together these form a “Generative Pre-trained Transformer” (GPT) that, when given an input sequence of tokens, is able to make a probabilistic guess at what the next token should be, and the next, and so on - this is the basis of the various “Chat” GPT systems and can be very powerful, allowing users to build systems that can: translate texts between languages, answer questions (based on tokens seen in the training corpus), compose seemingly new poems, etc.
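The “guess the next token, then the next” loop can be sketched with something far simpler than a transformer: a table that counts which word followed which in the training text. This is a bigram counter, not a GPT, and the corpus and function names are made up for the example, but the generate-append-repeat loop has the same shape.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every token, which tokens follow it in the training text."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def continue_sequence(follows, prompt, max_new_tokens=5):
    """Repeatedly append the most frequently seen next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # nothing in the training data ever followed this token
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


corpus = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigrams(corpus)
print(continue_sequence(model, "the cat", max_new_tokens=3))
```

A real GPT conditions on the whole preceding sequence rather than just the last word, which is what the attention mechanism is for.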

Aside: “GPT” is a general architecture spawned by the Attention is all you need paper. It was used for LLMs initially but the company OpenAI slightly skewed the landscape by calling their product “ChatGPT” - so there’s now a common misconception that GPT and LLM are synonymous, whereas really the GPT architecture is applicable to many domains as we’ll see below.

But are GPTs “Intelligent”, what can we use them for, and what are their limitations?

What is Intelligence?

Thinking fast/slow - System-1 vs System-2

Daniel Kahneman popularised an understanding of intelligence as a combination of two types of behaviour:

System 1 operates automatically and quickly, with little or no effort and no sense of voluntary control.

System 2 allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice and concentration.

When we think of ourselves we identify with System 2, the conscious, reasoning self that has beliefs, makes choices, and decides what to think about and what to do. … The automatic operations of System 1 generate surprisingly complex patterns of ideas, but only the slower System 2 can construct thoughts in an orderly series of steps … circumstances in which System 2 takes over, overruling the freewheeling impulses and associations of System 1. - Thinking, Fast and Slow: Daniel Kahneman

GPT-based tools - that is, generative sequence predictors - are wholly an automation of System-1 thinking.

Think of the following sequence of exchanges:

Person: What do you get if you multiply six by nine?

Bot: Forty two

Person: I don’t think that’s right

Bot: I’m sorry, let me correct that

It may seem that the Chatbot has been induced into performing some kind of reflection on its answer but, at the time of writing, that is not the case. The first 3 exchanges were simply collected into a prompt and presented back to the bot verbatim for further sequence prediction: in the current state of chatbot technology the fourth exchange, the last response from the bot, is still just a System-1 creation. Even a Mixture-of-Experts model has no self-awareness and performs no self-reflection.
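As a rough sketch of what “collected into a prompt” means, the conversation so far can be flattened into one block of text that ends where the bot is expected to continue. The `build_prompt` helper and the “Speaker: text” layout are assumptions for illustration; real chat products use their own formats.

```python
def build_prompt(history, next_speaker="Bot"):
    """Flatten the whole conversation into one block of text to be continued."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"{next_speaker}:")  # the model simply continues from here
    return "\n".join(lines)


history = [
    ("Person", "What do you get if you multiply six by nine?"),
    ("Bot", "Forty two"),
    ("Person", "I don't think that's right"),
]
prompt = build_prompt(history)
print(prompt)
```

The apology in the fourth exchange is just the most plausible continuation of that text; no step in the process goes back and re-examines the earlier answer.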

These programs differ significantly from the human mind in their cognitive evolution, limitations, and inability to distinguish between the possible and impossible. They focus on description and prediction, rather than causal explanation, which is the mark of true intelligence, and lack moral thinking and creative criticism.

Spookily enough, the above text was auto-generated by an AI summarising a much longer essay, which was a response to the original “Noam Chomsky: The False Promise of ChatGPT” (unfortunately behind a paywall: the Chomsky essay opines that ChatGPT and the like are nothing more than high-tech plagiarism machines).

This lack of self-reflection - lack of System-2 thinking - is a serious problem, as I’ll discuss in the section on “Hallucinations”.

Pattern Recognition and Hallucinations

GPT systems generate text by making a weighted-random selection from the set of the most likely next tokens, and then the next, and so on. But being the “most likely” candidate says nothing about how likely that token actually is, and the GPT has no way of checking (an arbitrary value limit is likely to be different as you move around the vector space). So, under some circumstances a GPT can produce complete fiction - known as a “hallucination”.
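A minimal sketch of that weighted-random selection, using Python's standard `random.choices`. The token scores are invented, and real systems add refinements such as temperature, but it shows why the top candidates are not necessarily good ones.

```python
import random


def sample_next_token(probabilities, top_k=3, rng=random):
    """Pick one token at random from the top_k candidates, weighted by probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    shortlist = ranked[:top_k]
    tokens = [token for token, _ in shortlist]
    weights = [weight for _, weight in shortlist]
    # random.choices works with relative weights, so a shortlist of weak
    # candidates is sampled just as readily as a shortlist of strong ones.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for a handful of candidate tokens after some prompt;
# in the second case the rest of the probability is spread thinly elsewhere.
confident = {"Armstrong": 0.90, "Aldrin": 0.05, "Cernan": 0.03, "Gagarin": 0.02}
unsure = {"Armstrong": 0.02, "Aldrin": 0.02, "Cernan": 0.02, "Gagarin": 0.01}

rng = random.Random(0)
print(sample_next_token(confident, rng=rng))
print(sample_next_token(unsure, rng=rng))
```

In the `unsure` case every candidate is a poor guess, yet the sampler still returns one of them with no signal that anything is wrong.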

(Eugene Cernan was the 11th person on the moon, the rest is roughly correct).

So when do GPTs respond with reality and when do they hallucinate? Here’s the thing:

A GPT is actually hallucinating ALL the time!

It’s just that sometimes the hallucinations correspond closely enough with the human’s perception of reality that the human reader thinks it’s true and intelligent…

Human minds are pattern-recognition engines generously attempting to make sense of everyday reality. The impact of this includes the effect of Pareidolia where you see faces, or other objects, in random patterns.

Is this The Beatles? Che Guevara? Someone else?
Is the toast hallucinating or is it just being toast?

So GPT chatbot systems do seem highly intelligent to the casual observer - and even pass some forms of the Turing test - but, like chess-playing machines, now that we know how they work it’s clear they’re not a general form of intelligence. It’s a fluke of having a large enough training dataset that the “probable next token” calculations are matching our expectations closely.

In the chat hallucination example above it may be that when counting the people who have ever been on the moon the number 11 is the biggest number that ever occurs. So maybe the GPT has formed some kind of counting model for moon visitors where 97th is also regarded as the biggest number. But is that true or am I just anthropomorphising with my guess? Without studying the weights in the model - a notoriously difficult task - it’s impossible to know. So, likewise, any claims about GPTs having full AGI (Artificial General Intelligence) should probably be met with scepticism.

The trick with GPT-based systems is to keep their use strictly within the domain they’re trained for; in this way there’s a high probability their hallucinations will actually match reality. The question of how to guarantee this is currently undecided. In the next section we’ll cover some use cases that may be surprising but are constrained well enough to be confident of good accuracy.

Surprising GPT/LLM Applications

Weather Prediction

A number of mechanical weather prediction systems until recently have been based on Computational Fluid Dynamics (CFD) simulations - where the world is broken up into cells and the Newtonian mechanics of gas and energy flows are modelled mathematically. This is based on a physical understanding of nature and is reasonably effective though hugely costly in computation effort - for example the UK weather forecasting supercomputer had 460,000 cores and 2 Petabytes of memory (The Cray XC40 supercomputing system - Met Office). A weather forecast 1 week ahead is actually pretty remarkable given the complexity of the problem.

But it’s possible that a transformer style deep-learning system, with no pre-knowledge of physics, can simply use a vast amount of historically measured weather data to predict a sequence of future weather events given a recent set of conditions. In this model the surface of the earth is broken up into a grid of columns, each column being a vector of numbers that represent physical values like temperature at certain heights, wind flow, moisture, rain at the surface etc. and some relative location information. There is data for this, measured and imputed, at 1 hour intervals for 40 years.

This particular research effort has been going for around a year, but has already shown predictions at least 70% as accurate as the CFD technique but 10,000x cheaper to run.

Solving geometry problems

Geometry problems can sometimes be solved mechanically simply by exhaustively searching through all the inductive steps from a given starting set of problem descriptions. But many cannot be solved this way. Often it requires some form of “intuitive” step, that experienced human solvers are good at, to add a new descriptive item to the problem (e.g. “let’s draw a line from this point to the opposite side of the triangle to form two right-angle triangles”) which makes the problem somehow easier to solve.

The AlphaGeometry system is able to do that “intuitive” step. By being trained on a large set of geometry solutions, it’s able to read a previously unseen problem and make suggestions for constructive additions that make the task of the mechanical solver possible.

Extracting Knowledge Graphs

You may have a large knowledge base written in vague and unstructured text - e.g. Wikipedia - but want to represent it in a deterministic way that can be searched and reasoned about mechanically - e.g. Resource Description Framework (RDF), Semantic Web, Neo4j graph database, etc.

This is effectively a translation problem for an LLM. Similar to translating English into French, we can train an LLM with e.g. a large corpus of Cypher code so it can form a coding language model along with the natural language, and also some example translations of text to code so it can form a mapping prediction model. Note that the major LLMs like ChatGPT are trained on all publicly available text, including github, so have probably already been exposed to these graph languages.

How to use GPTs practically

The moving parts

There are 4 major categories of subsystem to deal with:

Large Language Model - built out of a massive corpus of text, images, videos or other data source, e.g. a full scan of the entire Internet. Billions of documents and data points, and costs millions of dollars to produce. There are only a few of these (from OpenAI, Meta, etc.)
Parameter Efficient Fine Tuning (PEFT)² - using a largish corpus (perhaps 100s-1,000s of examples) to produce a tweaking layer on top of an LLM. These cost only a few thousand or hundreds of dollars to produce and, therefore, there is an unmanageable number of options to choose from (e.g. around 400,000 models stored in open-weight repositories)
Vector (Embedding) Database - can be built out of the corpus of documents from within your business, very cheap to produce.
Prompt Engineering - the details of the prompt sent to the GPT system: what to include in the “context window”

All integrations with an AI system have to navigate through at least these 4 parts.

Performing a full training of a base model is outside the reach of all but the largest business or organisation - only companies the size of Google, Meta, X/Twitter, OpenAI, Apple etc. can even contemplate this.

Usually a business needs to only consider the last 3 (i.e. basing the system on a pre-trained LLM base model) and, more commonly, even organising the training to fine-tune a base model is unnecessary - leaving the business to either produce an embedding database from a corpus of documents if required, or just engineering the prompts required to use an available model.

Selection of the most suitable base-model or fine-tuned model is very important though. This is just a small sample of the options available:

General
- ChatGPT
- Claude
- Llama
- HuggingChat
- Mixtral
Instruction
- Aligning language models to follow instructions
- Training language models to follow instructions with human feedback
Time series
- TimeGPT
Coding
Etc.
- The best AI chatbots in 2024

There are a number of commercial integrations, that have fine tuned a base model, that provide a conversational “Co-pilot” of some kind. Here are a few examples:

Hallucinations are a problem with many of these tools though. They can act as very well-read interns, doing lots of tedious work very quickly, but it’s important to have a competent human in the loop to edit their output. Certainly these systems shouldn’t be in a position to make promises on behalf of a company: e.g. via a public-facing “live chat” automated service agent, or by sending automated follow-up emails.

Prompt engineering

Prompt engineering involves crafting precise and effective instructions or prompts to guide AI systems in generating desired outputs. It’s like giving specific instructions to a highly intelligent assistant, teaching it how to respond in a way that aligns with our intentions.

The exploratory nature of Prompt Engineering makes it very similar to Data Science, but the tools and skills needed are different as one drives the tools using normal natural language.

Chat with your Docs & RAG

It’s possible to train an LLM/GPT on the contents of an internal knowledge base of documents, either via PEFT to produce an enhanced LLM or by creating an “Embedding” vector database.

The resulting models can then be queried conversationally. But the trouble is that when querying the resulting model you have no idea if the GPT is telling the truth or hallucinating a completely fictitious answer.

So another option available is known as Retrieval-Augmented Generation. In the RAG architecture the response is generated in two phases:

In the first the user’s query is read by an LLM and a search query is generated against your document store to find any documents that may be related to the query
In the second phase a new query is constructed by first reading in all the documents found in phase 1 and then instructing the LLM to produce an answer based on that content (though it will still use the base model as a language foundation, so hallucination is still possible).

This technique requires the least amount of pre-training of models or embeddings, but does require an advanced technique for prompt engineering. Another advantage of RAG is that if no documents are found in the search phase then the LLM can be instructed to respond with an “I don’t know” style answer rather than making something up from the contents of the base model.
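The two phases, plus the “I don’t know” fallback, can be sketched as below. This is a simplified illustration under stated assumptions: phase 1 uses crude keyword overlap rather than an LLM-generated search query, and `echo_llm` is a placeholder where a real model call would go.

```python
def search_documents(query, documents):
    """Phase 1 stand-in: keyword overlap instead of an LLM-written search query."""
    query_words = set(query.lower().split())
    return [doc for doc in documents if query_words & set(doc.lower().split())]


def build_rag_prompt(query, found):
    """Phase 2: wrap the retrieved documents and the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in found)
    return (
        "Answer using only the documents below.\n"
        f"{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def answer(query, documents, llm):
    """Retrieve first, then generate; refuse when nothing relevant was found."""
    found = search_documents(query, documents)
    if not found:
        return "I don't know"  # nothing retrieved, so do not ask the model to guess
    return llm(build_rag_prompt(query, found))


def echo_llm(prompt):
    """Hypothetical placeholder for a real model call."""
    return f"(model response to a {len(prompt)}-character prompt)"


docs = [
    "Holiday requests go through the HR portal",
    "Expenses are paid monthly",
]
print(answer("holiday requests", docs, echo_llm))
print(answer("parking permits", docs, echo_llm))
```

The fallback lives in ordinary code, outside the model, which is why it can be relied upon in a way that an instruction buried in a prompt cannot.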

Mechanised queries

An exciting area of AI co-pilot integrations is to provide a conversational interface to the task of building mechanical deterministic searches into a data store.

If you have a knowledge management system of any kind that you want to search or visualise in some way, there is likely a query language, report builder, dashboard builder, or some other tool that lets you do that. But often these tools, by necessity, are expressive and complex and require engineering expertise to use correctly.

But LLM GPT systems are able to construct correct code, queries, or configurations based on a conversational input from a user.

So a powerful integration is to use the LLM GPT to generate a query or report generation config and then use that to run queries on the knowledge base. In this way you at least know that the results of the query are deterministic and factually correct: the data in the knowledge base has not been summarised or re-interpreted by a model in any way, the query is being run on source data and the results returned verbatim.

It may be that the generated query isn’t quite what the user meant, but it’s possible to have a sequence of conversational exchanges with a GPT that refine the query until you do see what you need. The impact is like having an expert programmer as a personal assistant - one that’s happy to help in any way as you explore ideas.
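Here is a minimal sketch of that split between “the model writes the query” and “the database answers it”. It uses SQL and Python's built-in `sqlite3` rather than a graph language like Cypher, the `tickets` table is invented, and `generate_sql` is a canned stand-in for a real LLM call.

```python
import sqlite3


def generate_sql(question):
    """Hypothetical stand-in for an LLM that translates a question into SQL."""
    canned = {
        "how many open tickets are there?":
            "SELECT COUNT(*) FROM tickets WHERE status = 'open'",
    }
    return canned[question.lower()]


def run_read_only(connection, sql):
    """Run the generated query, refusing anything that is not a plain SELECT."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return connection.execute(sql).fetchall()


# A tiny in-memory "knowledge base" to query.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
connection.executemany(
    "INSERT INTO tickets (status) VALUES (?)",
    [("open",), ("closed",), ("open",)],
)

sql = generate_sql("How many open tickets are there?")
rows = run_read_only(connection, sql)
print(sql)   # the user can inspect exactly what was asked of the data store
print(rows)
```

The model's only output is the query text, which the user can read and refine; the rows themselves come straight from the data store. The SELECT-only check is a deliberately simple guard for the sketch, not a complete safety mechanism.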

Examples of this include:

Summary

Generative AI is an extremely new field in software development. There is a lot of hype around levels of intelligence of these systems, but equally their shortcomings may be overstated. The “Plateau of productivity” has not been reached yet and there are many exciting developments to be had.

Companies are pouring resources into research, snapping up top talent, and striking strategic partnerships to gain an edge. In this race, the winners aren’t just the ones with the flashiest technology; they’re the ones who can translate AI prowess into real-world impact. - The AI race in PLG

If you have any comments or feedback about this article, please use the LinkedIn thread

March 7, 2024 - 24 minutes read #developer , #business organisation , #pontification

The role of enterprise architect and principal engineer

This post was originally written for 101 Ways

How to read this doc

We’re all impatient and time-poor, and this document has ended up with a lot of words, so each section has a “Summary” block - you can read each to get the full content of the preceding chapter, and then only dive into a chapter if you need the explanation of the conclusions…

The balance between vertical and horizontal teams and the Role of the Principal Engineer in delivering value

Before we talk about roles of people, let’s look at how software systems evolve over time in large organisations.

Once a software service of any kind has been deployed within an organisation, either to be used internally or as part of the company’s public product, we need to know who will support it - either for adding new features or, particularly in the case of a bug or failure being noticed, who do we call up and prioritise to get things fixed in a hurry?

The idea of “it’s everyone’s responsibility” doesn’t work because if everyone is responsible then no-one is: if you have a set of teams each with their own OKRs or other measures to meet, then which product manager is going to volunteer to fail at their team’s goals in order to fix a problem that another team should fix? And even in a world of perfect altruism there is the practical consideration that fixing systems takes deep knowledge, so a request to fix a problem will gravitate towards the team that’s done it before and who will have a head start on understanding the fault.

So this drift towards each service being owned by a single team is completely natural and shouldn’t be resisted - indeed it’s recognised as best practice in some descriptions of microservice architecture (and there’s frustration when a service doesn’t have a clear owner), e.g.:

“With small, long lived teams in place, we can begin to improve the ownership of software. Team ownership helps to provide the vital ‘continuity of care’ that modern systems need in order to retain their operability and stay fit for purpose” - Team Topologies: Matthew Skelton, Manuel Pais

Services without an owning team are a tech debt and potential risk.

Note that these teams are often miscategorised as “horizontal” teams. In theory a team should own a set of services that form a coherent sub-domain within the business. But in practice the need to have deep understanding of software ecosystems means that services get parcelled out to teams based at least as much on the implementation tech choices as the domain. Which is why teams end up labelled “back end”, “front end”, “api”, “data” etc. Teams can be organised in a “Reverse Conway” to include tech choices but it’s quite difficult.

But the notion of team autonomy - each team working to their own roadmap - has its downsides. If you want to deliver a new large piece of functionality within a business you tend to touch more than one service and therefore more than one team will be likely to be involved. Unfortunately what happens is the communication between the teams, and the siloisation of their roadmaps, gets in the way of progress - you end up with the product managers of each team trying to align the request against their own roadmaps and multiple other demands which are seen as being “external” to their team.

In the face of this the senior leadership get frustrated with the seeming inefficiency of the engineering capability and decide, for a new product or functionality, to create a vertical team to deliver it. Note this may also be true for implementing any standards, conventions or governance that are supposed to be applied organisation-wide. This new team is often given a groovy, dynamic name like “Task Force”, “Hit Squad” or “Tiger Team” and is meant to cut across all the communication and planning barriers. The team is created by taking experienced developers from every service involved and they are highly motivated to deliver something new instead of the mundane day-to-day work.

The thing is, this works extremely well for the first feature delivered!

So the approach is seen as a success and then, upon completion, the team is disbanded and, maybe, some members are reconstituted into a new “Hit Squad” assembled for the next feature - or, worse, two or more are created at the same time.

After a year or so of this, certainly within two years, the system landscape of the company is a complete mess - it’s not that vertical teams are bad, it’s that they get disbanded! After a while no team owns anything, no experience is retained in the company, and there is no clear responsibility for maintenance. There is “drive by coding”, the engineers describe everything as “being held together by chewing gum” or some other such idiom, and a “rationalisation project” is started to migrate the organisation back to STOSA (Single Team Oriented Service Architecture). The pendulum swings…

So if neither vertical nor horizontal teams are the way to deliver value in large-scale projects what can we do about this? How can we keep the best points of service ownership with capability-organised teams, plus the scale of vision and delivery effectiveness of an end-to-end team?

What works well is to create a virtual team using just the senior staff you need - for example only an experienced principal engineer and product manager. They can then act as shepherds or ambassadors for the project. Between them they should be able to articulate the new product or business function they wish to create and see how the different parts of the current landscape should fit together, be amended, extended or new services created in order to fill the gaps needed by the function.

The characteristics of a good Principal Engineer include:¹

Strong multitasking ability - balancing a couple of “hot” projects with some growing future projects.
Strong analytical skills and outcome oriented.
Strong communication skills, the ability to explain complex technical issues in a way the listener can understand - a C-suite listener will be hearing very different things to a developer.
Excellent organisation and leadership skills, in particular the ability to network and “lead upwards”.
Expertise in multiple technical domains through proven experience in building complex systems.

In particular the principal engineer will bring their experience of the different architectural options that are available and know how to choose the most appropriate option given the balance of constraints that the current landscape is in, either technically or socially given the knowledge of the engineering teams available. In some cases they may determine that a whole new subdomain is required and a new team should be formed to implement and support it long term - but this is not the same as a “hit squad” that gets disbanded!

The pair will then be able to negotiate with and empower the tech leads of each team to contribute to the larger project, to make sure their roadmaps are aligned, that APIs will have the required capabilities, that the solutions in general are supportable within the company and align with agreed best practices, check up on progress and hold individual teams accountable for contributing to the greater whole. With careful alignment, and a sense of urgency, the new feature will emerge out of the parts created by each involved team.

Summary

Engineering teams always end up service aligned, it’s natural and inevitable to support long term maintenance.
“Hit squads”, created to cut through the silos, end up producing a mess by the 3rd project as services end up with no clear owner - particularly if the hit squad has indulged in drive-by coding and even more so when the squad is disbanded.
For a large scale project, in large organisations, a combination of product manager and principal engineer empowered to sit across teams will make sure the various team roadmaps and outputs are in agreement with the business strategy and standards, and conventions are being followed, and therefore the teams are aligned, energised and valuable.

The Role of the Enterprise Architect

Given what we’ve said in the previous section, it seems that a healthy community of Principal Engineers should fill the function of Enterprise Architecture, ensuring a coherent engineering practice and delivery. So is there a specific role for Enterprise Architects as such?

We wish we had tools that would allow engineers to document their systems in such a way that any team can use it without help, or the same team can use it at some point in the future. But these tools can be difficult to use, still need alignment across the company, and the act of documenting can be uninspiring to the sort of person who enjoys creating and building.

One solution is the architecture librarian. They can be empowered to research the whole systems landscape and form a clear picture of how the systems fit together and depend on each other. Or they may have been at the company for so long that they naturally have all the retained knowledge.

There are many tools that can help the architecture function of a company document the whole system landscape:

Plain old Visio / draw.io / Miro diagrams
Diagrams as code e.g. the C4 model
Documentation as code e.g. OpenAPI, AsyncAPI
“Enterprise” documentation tools

But the value of the documentation seems to decrease the more distant the documenters are from the people writing the systems in the first place. So a balance needs to be reached whereby tech team leads, or Principal Engineers, are assisted in producing, maintaining and extending this documentation themselves. Architects should not be seen as senior to the Principal Engineers - or vice versa - they each perform a different function in mutual support with each other as peers.

The customers/consumers of the software architecture function will be the principal engineers and tech leads. And for every team in an engineering department the goal is to automate away work as much as possible - to enable self-service by the consumers - not to make oneself redundant but: 1. to enable greater scale, 2. to free oneself up for more interesting things.

Self-service at the level of abstraction of a Principal Engineer means being able to make systems design choices without having to wait for approval. This implies a well agreed set of standards - the right way is the easy way.

So another role for the Enterprise Architect function is enabling processes for reaching agreements on architectural solutions. Note, it shouldn’t be the role of EA to impose any particular solutions - no matter how experienced the EAs are, that way leads to “ivory tower” style behaviour and diminishing respect.

Summary

Self-service exists for Principal Engineers to achieve scale - at this level of abstraction that involves systems design choices and architecture.
This abstraction level can often be agreed, in advance, between Principal Engineers and Architects using tools like Domain Driven Design combined with some agreed standards and conventions.
The correct approach is trust but verify - all senior members of the company are highly experienced: let them innovate when necessary.

The balance of tools and the paved road

One part of the art of leadership is to hire people smarter than yourself and then get out of their way. And this applies to the strategy of Enterprise Architecture too. But that doesn’t mean there should be a free-for-all when it comes to the choices of tools, languages and techniques.

In general there is a balance to be found between the chaos of everyone doing their own thing, and the inadvertent shackles of central planning.

Engineers want “The best tool for the job” - this is chaos, every job ends up with its own tool and there’s minimal sharing of knowledge across the company. CV driven development is also a significant risk.
Enterprise Architecture tends towards Command and Control style management and wants to declare “This is the tool to use”. This doesn’t scale to empowering creative solutions.
The Goldilocks solution is “The best set of tools for all the jobs” - the “paved road”, with some scope for bespoke solutions if there’s an articulated need. As with all defined processes there must be a way to challenge the chosen tools created at the same time that the tools are defined.

Teams are empowered to use whatever tech they see fit, but it is made clear to all the stakeholders (i.e. product managers) that there are consequences of being first and using stuff away from the paved road, and the product teams need to be accountable for such choices. For example, if the platform support team is not familiar with a technology choice then the team that’s chosen to use it may have to provide on-call support (or the Product Manager has to acknowledge there may be downtime out of office hours). However they are also empowered to engage their peers and see if multiple teams want that feature/capability and then either make a request against the platform team’s roadmap, or use an “internal open-source” model to enhance the platform for all.

The platform team makes the building blocks of the technical paved road - make the right way the easy way.

“The same principles of good design and functional architecture apply in the world of choices as well. Our primary mantra is a simple one: if you want to encourage some action or activity, Make it Easy” - Nudge: Richard Thaler, Cass Sunstein

The customers of these building blocks are the EAs and Principal Engineers. Technical teams code the products with the PEs and EAs ensuring teams don’t fall into known traps, which involves knowledge and experience of system design and understanding the tools.

The role of Enterprise Architecture is to guide and enforce the method for coming up with standards - the EA may have an influence on the resulting standards, but shouldn’t be coming up with them in isolation.

Any standards determined by any Enterprise Architecture function need to be empowering in some way. I have experience of an EA team writing out a set of standards like:

“All software should be designed to be flexible and allow change”
“All services should minimise the cost of infrastructure”

and so on… The EA team spent weeks writing them out into the internal wiki. The thing is they were all platitudes: nothing untrue about them in any way, but none of them useful. That EA team lost the respect of the Principal Engineers and any influence they may have had (NB. influence is about being likeable, connected, and credible)

Summary

If you apply control you freeze progress.
Make the right way the easy way - the paved road: show the general direction and let smart people charge forward.
Platitudes are no help, standards should be informative and enabling.

Internal Open Source

Internal open source (“Inner source”) is often discussed but rarely practised. There’s no magic to it but people who haven’t run or significantly contributed to an open source project always underestimate how much management it requires. Linus Torvalds invented an entirely new source control solution just to help him manage the Linux Kernel project (fortunately, because it is open source, Git is available to everyone).

To effectively manage inner source every project needs several practical things including:

Good documentation that details the purpose of the project
Installation instructions and usage instructions - these will be required anyway by any new member joining the team, so why not make them readable by other teams too.
Contribution instructions for people not in the team - e.g. communication channels used, how to find the product manager for that service, minimum requirements for raising issues or pull requests.
The documentation, or pointers to the documentation, should live with the code where it’s obvious - often in the README file.

This is already more documentation than most teams are willing to do for their projects. However by agreeing to some organisation-wide conventions creating these guides can become a template exercise. Particularly if there’s a high degree of similarity in the tech stacks.
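As a rough illustration of how such a convention could become a template exercise, here is a minimal Python sketch that checks a README for an agreed set of headings. The section names and the markdown-style `#` headings are assumptions made for the example, not a standard; an organisation would agree its own list.

```python
# Hypothetical organisation-wide README convention: these section names are
# assumptions for illustration only.
REQUIRED_SECTIONS = ("Purpose", "Installation", "Usage", "Contributing")


def missing_sections(readme_text: str) -> list[str]:
    """Return the required sections that have no heading in the README."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in readme_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


example_readme = """# Config Service
## Purpose
Serves configuration to other services.
## Installation
See the install guide.
"""

# Lists the sections this README still needs.
print(missing_sections(example_readme))
```

A check like this could run in a team's build pipeline, so that the convention is a nudge rather than a document nobody reads.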

“Every part of the software system needs to be owned by exactly one team. This means there should be no shared ownership of components, libraries, or code. Teams may use shared services at runtime, but every running service, application, or subsystem is owned by only one team. Outside teams may submit pull requests or suggestions for change to the owning team, but they cannot make the changes themselves. The owning team may even trust another so much that they grant them access to the code for a period of time, but only the original team retains ownership.

Note that team ownership of code should not be a territorial thing. The team takes responsibility for the code and cares for it, but individual team members should not feel like the code is theirs to the exclusion of others. Instead, teams should view themselves as stewards or caretakers as opposed to private owners. Think of code as gardening, not policing.” - Team Topologies: Matthew Skelton, Manuel Pais

In a practical sense, when using a git hosting platform such as GitHub or GitLab to manage code, the phrase “grant access to the code” typically means having one’s ID added to the CODEOWNERS file so one can approve pull requests. This implies a lot of trust by the owning team and is a double-edged sword: it is flattering, but it also implies an obligation to care.

As an aside, the term “inner source” seems to have been coined by Tim O’Reilly in December 2000. That post also details some virtues of “open source” development style which include: robust, well-designed, carefully documented, having an available specification/extension process, an existing reference implementation, and an open and responsive stewardship of the software and the standard by those who control it - all these virtues also seem appropriate to software systems created within the community of a company.

Small, nimble teams inspired to fulfil mission objectives with the freedom, flexibility, and empowerment to get it done under any circumstances, contributing where necessary, while respecting the long-term stewardship of code are the key to making rapid progress on a number of opportunities simultaneously.

Summary

If your team is spending all its time reviewing external contributions then that is a success - you’re enabling and leveraging the whole company to support the growth and value of your services.

Things that don’t (often) work

Architecture Review Boards

These are the meetings where a principal engineer or tech lead who has come up with a system design for a new function has to get it “approved” at the fortnightly governance meeting. This may be slightly controversial as ARBs can be valuable under certain circumstances, but those circumstances are difficult to get right.

In my experience architecture governance boards often end up as low-value talking shops and get disbanded. Symptoms of a bad ARB include:

Attendees of the meeting don’t read the proposal beforehand so a presentation has to be done in the meeting (effectively reading the proposal aloud)
Attendees end up bickering about some tiny detail.

In the face of this, proposers will end up gaming the meeting in order to get a design through to meet their targets - figuring out what they need to say, and also what to leave out, such that meeting attendees are given the impression they’ve been consulted, and given the pleasing opportunity to exercise their authority, but without being given the opportunity to be obstructive.

ARBs can work where they are seen more as a mechanism for exploring possibilities, i.e. behave more like systems analysis than a clearing house. If there is an agreed set of loose principles to follow then the solution-space may focus onto a clear outcome more quickly - once a problem has been analysed into a domain then only minimal architectural exploration should be needed.

Summary

ARBs can descend into politicking.
They can work if positioned as agreeing and disseminating loose standards for thinking through a solution space, rather than mandating particular solutions.
Need to be enablers rather than controllers.
A good tech strategy with buy-in enables self-service at the Principal Engineering level.

Architecture Decision Records

When looking at a particular piece of a system, the engineer may be asking themselves “Why was it done THIS way?”. If no-one is around who worked on the system at the time then it may be difficult to find any answer that hasn’t been passed down as some kind of folklore by word-of-mouth…

Architecture Decision Records (ADRs) have been suggested as a solution to this problem - unfortunately they don’t work. I’ve seen it attempted 4 times in my career and, in every case, the person who proposed it was the only one to write any, and after a few months they gave up. Even in the case where a senior leader insisted on them, they only got written (as a backfill) when that senior outsider was having something explained to them, and no-one else ever found them useful.

There are several reasons for this:

Firstly: there is a great mismatch of incentives / value for writing ADRs. The cost of writing them is incurred now - and it’s a very boring task - whereas the value is gained by someone else or even your future self, but only in some far off future which possibly may never happen.

Secondly: what constitutes a valuable decision worth writing down? The choice of event streaming vs. REST? Using Python vs. Java vs. TypeScript? How can you tell the difference between something that is new and innovative vs. something that’s obvious? Are ADRs just making up for experience? You will end up with arguments about bothering to write the ADR (remember, they’re boring and, at the time of writing the ADR, the question they answer is now obvious) and they get done either as a labour of love or under compulsion by a senior colleague (with all the reluctance and bad feeling that comes from that…).

Thirdly: even if the first two were not an issue, human beings cannot produce a projection on an event stream. ADRs are written as a sequence of events over time, whereas we want to know why a certain system is designed the way it is now - i.e. the accumulation of all those decisions over the whole time. But ADRs are not indexed that way, particularly when a subsequent ADR can supersede and invalidate a previous one. So, just like producing a “projection” in a CQRS system, they have to all be read in sequence with the reader keeping track of all the consequences until they can understand the whole picture. The value of this mental effort is too low given the cost.
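To make the “projection” point concrete, here is a minimal Python sketch of what a reader has to do in their head. The `ADR` record and its `supersedes` field are hypothetical, invented for this illustration; real ADR templates vary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ADR:
    # Hypothetical record shape, for illustration only.
    number: int
    topic: str
    decision: str
    supersedes: Optional[int] = None


def current_decisions(adrs: list[ADR]) -> dict[str, ADR]:
    """Fold the ADR log, oldest first, into the decisions still in force."""
    in_force: dict[int, ADR] = {}
    for adr in sorted(adrs, key=lambda a: a.number):
        if adr.supersedes is not None:
            # A later ADR invalidates an earlier one.
            in_force.pop(adr.supersedes, None)
        in_force[adr.number] = adr
    return {adr.topic: adr for adr in in_force.values()}


log = [
    ADR(1, "service communication", "Use REST between services"),
    ADR(2, "implementation language", "Use Python"),
    ADR(3, "service communication", "Use event streaming", supersedes=1),
]

for topic, adr in current_decisions(log).items():
    print(f"{topic}: {adr.decision} (ADR {adr.number})")
```

A machine does this fold trivially; a person reading a few dozen free-text ADRs has to do it by hand, every time they want an answer.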

The solution is the architect librarian. While this sounds like some multi-class Dungeons and Dragons character, it is perhaps the most important role an enterprise architect can play. Consider them the village shaman: they retain the long history of decision making within their team, and their oral history is far richer and more valuable than anything they might write. They should be consulted before important decisions are made.

Summary

The incentives for writing ADRs are misaligned.
Humans cannot produce projections on an event stream easily (it’s done very slowly over time at great cost and called “experience”).
The answer is an enterprise architect.

Working to specification

Reputedly, a main problem with outsourcing was that suppliers always delivered what you asked for, rather than what you needed. So detailing the specification correctly became such an issue that outsourcing was no longer a benefit.

I’ve worked at a couple of dysfunctional places where “working to spec” was a strong smell: At one place, a team always demanded very detailed specs in their definition of done and delivered against that - because they were fed up with being told they’d done the wrong thing against vague problem descriptions…

At another, a team delivered a service and the team leader took the attitude “this is the spec of the micro-service, we won’t consider changing it, we have other things to do” (even though it was a core config service and not fit for purpose. I had to build a parallel config service that ended up being used by a third of the company). That particular team lead had, shall we say, some inappropriately domineering behaviours in other areas too…

Specs are needed of course, they form the basis of a service/data contract and enable testing etc. But they work well when all sides are contributing to the spec and collaborating on creating a solution to a business problem (“we all work for the same company”). When a spec is created one-sided then it results in silos and the worst kind of “ownership” (teams being blocked) etc…

Specifications can take many forms. They might be product specifications expressed as stories in the team’s backlog. Equally they might be in the form of engineering principles the engineering community has agreed to abide by, such as the minimum-necessary responsibility principle². Specifications can come from standards such as defining a particular language to be used for specific types of work. They might come from an artefact repository where security-signed versions of packages reside to ensure teams are using the right libraries. They might come from the available SaaS solutions such as the cloud the company uses. All of these things and more can form specifications.

Summary

Working to spec is a smell of a team protecting itself for some reason or attempting to dominate.
The ethos of “we’re all one team” has been destroyed somehow and you need to fix that root cause before the symptom of self-protection or dominance can be addressed.
Fixing the root cause can mean taking some of the burden away from teams. This is where an Enterprise Architect, working alongside Principal Engineers can coalesce specifications from the business strategy, product designers, and the engineering community.

Business Lifecycle Model

One way of modelling the stages of a business includes:

	Seed	Start-up	Scale-up	Growth	Maturity	Transform or Decline
Annual Revenue	0m	0m - 10m	5m - 100m	50m - 500m	100m - 1bn	10m - n bn
No. of people	1 - 3	2 - 20	10 - 500	200 - 2000	Thousands	Thousands
Character	What are we doing?	Say yes to everything	Can start to say no, focus and get big quick	Dominate a market	Optimise efficiency	What do we do next?

But as companies grow in size and tackle grander projects, the number of people in the engineering department may grow too. As this happens the communication structures within the department will change; this can be loosely modelled by the Dunbar Number.

Aside: The actual numbers of where the boundaries lie are hugely debatable (the original ethnographic animal studies were extrapolated to humans), and the association between brain size, intelligence (what kind of intelligence, there are many), and group size is somewhat dodgy and, shall we say, “of its time”… But under all that is just a very general observation about the different kinds of communication styles that happen in groups of various sizes which shouldn’t get lost. There’s definitely a change in communication style within a community as it grows from “close friends / confidants” to “family” to “clan” etc.

As a company moves into the “Scale-up”, “Growth”, “Maturity” and “Transform” stages the roles of the Principal Engineers and Architects will become more and more valuable. They will be the people that hold the long term knowledge, and have the leadership skills, to open the communication channels and help the teams in the clan to move forward in the same direction.

Summary

In the early startup and scaleup stages of a company all engineers are effectively potential Architects simply because they are the ones making all the decisions.
As a company grows it will be more and more useful to consider explicit Principal and Architectural roles to: 1. Hire in a wider range of technical experience, 2. Keep open the needed communication channels as the group dynamic changes.

Overall Summary

There are no easy fixes to the problems of delivering value in software engineering. But there are a few principles we can draw out of the above that may help:

Principal Engineers can keep tech teams aligned and magnify the value of their output. They will do this by sitting across teams as needed, probably varying project by project.
In particular it’s better to create virtual teams with e.g. a Principal Engineer and a Product Manager, who can shepherd a project, than it is to deal with the fallout of a disbanded full end-to-end team.
Both Enterprise Architecture and Platform Engineering need to take on an enabling rather than controlling role. In fact this is a good general principle for all leadership roles.
Standards need to pave the road rather than be either restrictions or empty platitudes.
A platform engineering team should make the right building blocks available and the easiest to use - where “right” is agreed across the Principal Engineers and Architects - while still enabling teams to explore novel or edge-case solutions.
Command and Control is a dysfunctional leadership style, better to “Inspire, Align and Empower” (alignment and control are very different things).

If you have any comments or feedback about this article, please use the LinkedIn thread

https://handbook.gitlab.com/job-families/engineering/development/management/principal-engineer/
https://www.linkedin.com/pulse/what-principal-engineer-anyway-douglas-w-arcuri/
https://engineering-manager.com/2020-03-21/what-is-principal-engineer-role ↩︎
The phrase “single responsibility principle” has become over-emphasised and quite damaging in some circumstances - search: single responsibility principle considered harmful ↩︎

April 24, 2018 - 3 minutes read #pontification , #project management , #water cooler

On Platitudes

This post was originally written while I was at LShift / Oliver Wyman

I hope we can agree that ad hominem attacks in discussion are undesirable, but I’ll suggest that platitudes can sometimes be the other side of the same coin, with the rebuke being delivered in a wrapper of inoffensiveness.

“A remark or statement, especially one with a moral content, that has been used too often to be interesting or thoughtful.” Oxford Dictionaries

To distinguish a platitude from something merely banal, try to state the opposite of it and see if it still makes sense. E.g. the phrase “We should lower taxes” is often used as a seemingly irrefutable political position, but the opposite “We should raise taxes” is a position that could plausibly be argued (if, for example, public services are currently underfunded). Therefore the former is not a platitude. On the other hand “I want to keep our children safe” seems impossible to negate: “I want to keep our children unsafe”. Really? Throw them to the wolves and see what happens? Send them to school surrounded by guns? (maybe I need a better example). The problem with platitudes in technical discussions is that they get used merely to terminate discussion.

Some examples for your game of Business-Meeting Bingo:

“I just want the simplest thing” – whereas the ‘simple’ in KISS and ‘minimum’ in MVP are useful notions, this phrase so often means “I don’t want to learn anything” or “Coordinating with other people is hard and messy”. The cry of “I want to keep things simple” can kill conversation with the result of teams, or individuals, heads down in silos churning out code with integration as an afterthought.
“This is the standard way of doing things” – While a common language and approach can aid communication in teams, an appeal to ‘standard’ can be quite aggressive. It implies there is only one way of doing things and any dissenter is foolish. Whose standard (there may be many)? When was it agreed? When was the last time it was challenged? If ‘tradition’ is the only reason for doing things then perhaps it’s time to re-evaluate.
“I want the best of breed” – thank you for pointing that out, personally I was hoping to integrate something less.
“We should be using best practice” – this is similar to ‘standard’. ‘Best practice’ depends on the circumstance, and if we’re engineering something new then maybe there isn’t any history or experience to compare with.

I had a hard time thinking of a conclusion to this post: an exhortation to ‘do better’ seems pompous and I feel unqualified for many concrete suggestions. Maybe we can just fall back on the general social ideas of generosity and humour. If someone supports their position by claiming ‘standard’, then perhaps jovially inquire if they are referring to ‘the’ standard or just ‘a’ standard. When faced with a drive for the ‘simplest thing’, agree that everyone wants that and then gently point out the pragmatic trade-offs for short-term gain and long-term maintenance. When asked for ‘best of breed’ inquire whether a Shire Horse or Racing Greyhound would be appropriate (hopefully generating a conversation about requirements).

That feels like best practice.

March 5, 2018 - 3 minutes read #project management , #water cooler

'5 Whys' Considered Harmful

This post was originally written while I was at LShift / Oliver Wyman

Adverse events happen – a website breaks down, a project doesn’t get delivered on time – and a proposed technique to find ‘the root cause’ is to ask the “5 Whys”. Attributed to Sakichi Toyoda in the 1930s and since adopted by Toyota and by other formal methods, it’s basically the technique of listing a fault and then asking “Why did that happen?” – repeat until you get to a cause that ‘feels’ like the root. The name comes from the observation that 5 repetitions are usually enough.

I find this problematic for several reasons:

This is the analysis technique of an attention-seeking three year old
It often raises the question “Why did you do that!!!” and the resulting blame game never helps…
In my experience there is never a single root cause

Let’s pick a well-known example: “why did so many people die in the Titanic disaster?”

The watcher didn’t see the iceberg
The message didn’t get back to the helm
The management insisted on cheap rivets
The bulkheads were too low because of the ballroom
There weren’t enough lifeboats
Nearby boats didn’t recognise the meaning of flares
The “SOS” radio sequence wasn’t well known (at the time)

And so on… Maybe some of these are debatable, but the point still stands: in any significant failure it’s usually the case that a whole sequence of partial failures had to happen for the main failure to occur. Fixing any one of them would prevent the disaster happening again, but it’s clearly better to fix as many as possible (which may also prevent other, related, failure scenarios).

This is why I prefer the 4 (or 5) whats:

What happened (what were the symptoms)? Be precise and objective:
- “the service melted down” is not enough (if you look in the server room there will be no puddles of plastic or aluminium in sight)
- “the service had a latency greater than N seconds resulting in new connections being rejected” is objective.
What did we do as an immediate workaround?
What was the damage (and what do we need to do to make up for it)?
What do we have to do to ensure it never happens again? This will usually be a set of actions, not just one.

This avoids any blame game, avoids the futile attempt to pin down a single ‘root’ cause, and, most importantly, in light of the new information empowers the creativity and ownership of your engineering team to come up with the best solutions. After all, they should be the people who know most about the system.

January 28, 2018 - 4 minutes read #project management , #water cooler

Just enough design

This post was originally written while I was at LShift / Oliver Wyman

On the one hand it’s become a bit of a cliché to say that Waterfall doesn’t work (in fact ‘waterfall’ may never have existed), but we know that rigid projects don’t deliver—when the level of resources is the only contingency in a project then budget overrun and missed deadlines (or lowered quality) become almost inevitable.

On the other hand, the original “Agile Manifesto” is now more than 15 years old and is starting to seem like it’s missing a few things. Certainly some of the processes inspired by the manifesto seem to be of benefit, for example: the frequent touch points of the daily stand-up, regular retrospectives (as long as there’s the organisational will to implement the recommendations), personal and regular communication, and so on. But there are occasionally huge drawbacks – how many times have developers heard things like:

“You’re doing sprints now so you can deliver in less time yes?”
“Yes I know I’ve changed my mind 4 times but you’re Agile so you can cope right?”

Also ‘Agile’ seems to encourage projects where the deadline and budget have been set but only against the vaguest notion of a requirements list; the development team is gathered together and the first question is “what is it we’re doing exactly?”.

At the other extreme, I have seen projects that languish with the Enterprise Architecture team for years while the Transformation Maps are drawn to full impact scope along with software designs, object integration designs, data schema, and vendor evaluation, etc., in the finest detail so “the programmers can just get it done (and properly this time)” – and perhaps never even really start.

But some projects just have a nature that isn’t amenable to fast-spinning agile.

“This style of short-term planning, direct customer contact, and continuous iteration is well suited to software with a simple core and lots of customer visible features that are incrementally useful. It is not so well suited to software which has a very simple interface and tons of hidden internal complexity, software which isn’t useful until it’s fairly complete” – David Jeske
“A designer cannot claim to have truly designed something until they also know how that thing will be made” – Terence Conran, designer, and founder of the London Design Museum.

So what can we do? I’m certainly not claiming that design and architecture are bad things – commercial software projects shouldn’t start unless there’s confidence of delivery – but there needs to be a balance between rushing in blind and agonising over the unknown (and my contention is that this balance should be found before the main budgets and timelines are set).

Well, AgilePM (formerly DSDM) introduces a notion of governance to an Agile project:

The circle in the middle represents the ‘traditional’ sprints (a manifesto that’s 15 years old can be referred to as ‘traditional’ by now I think…) but supported by a well-reasoned set of gateways and light-touch documentation. With regard to Enterprise Architecture, the important columns in the diagram are the two near the left marked Feasibility and Foundations: they represent the notion of ‘Just Enough Design’.

LShift (acquired by Oliver Wyman in December 2016) has a 17-year history of successful major software product delivery using agile methods including DSDM/AgilePM. The technical Feasibility and Foundations phases are what might be termed Architecture and Design (Enterprise or otherwise) under other formalisms but there are important differences:

It needs to be Just Enough and no more. Just enough to clearly communicate the vision and scale of the project and bring confidence of delivery but without so many trees that the wood becomes obscured.
It must closely involve at least some members of the team who are likely to produce it. Any Enterprise Architecture team completely separated from implementation inevitably loses touch with the reality of delivery.

Just Enough design – the technical Feasibility and Foundations – is essential, but once it’s been done the process of delivery should be handed over to the engineers. I’ve been on both sides and it’s clear to me when, in an Architect role, I should get out of the way. With apologies for taking an example from the military, “Intent and Initiative” is becoming seen as clearly more effective than “Command and Control”.

Like the three bears, Big Design Up Front is too much, and a ‘headless chicken’ Agile Manifesto is not enough – but maybe the processes of AgilePM can be Just Enough governance. Certainly software projects need design to be successful – not too much or too little, Just Enough.

January 2, 2018 - 4 minutes read #APIs , #technology

GraphQL is really TreeQL (and that's ok)

This post was originally written while I was at LShift / Oliver Wyman

Let’s have a look at GraphQL. It came out of Facebook as a replacement for REST style requests for querying data. It was initially developed from 2012 and made open source in 2015. As Facebook’s main database is the “social graph” it was naturally named GraphQL but, as we’ll see, that’s not a completely accurate name (though that doesn’t really matter in practice).

In GraphQL object types are described in a type schema and queries are specified as templates referring to those objects.

query BooksWritten {
  author {
    name
    books {
      title
      genre
    }
  }
}

Note these templates are document templates and not graph templates. Document-like objects such as these are effectively trees; there’s only a parent-child relationship between items (in fact, a container-contained relationship) and no natural way to reference an item in another part of the document. Databases (“in-memory” or SQL) often represent full, possibly cyclic, object graphs – but GraphQL can only express tree-like queries (SQL only expresses table-like queries even though the joins may be mapping over a cyclic graph, and you’d never expose raw SQL as an API).

A full graph query language would be able to specify cycles directly by naming nodes and referring to them elsewhere in the query. For example, in fraud detection it’s often useful to find loops in a graph to show where a group of people are suspiciously acting in common and graph query languages like Cypher do this very well, even if the size of the loop is not known in advance. Another example relating to unknown path size is the “7 degrees” question which, in Cypher, is a one-liner:

MATCH p=shortestPath(
  (bacon:Person {name:"Kevin Bacon"})-[*1..7]-(meg:Person {name:"Meg Ryan"})
)
RETURN p

In GraphQL you’d have to construct a “recursive tree” query, with all the problems of combinatorial explosion, and search through all the intermediate nodes and leaves.

{
  actor(name: "Kevin Bacon") {
    movie {
      actor {
        movie {
          actor {
            movie {
              actor {
                movie {
                  actor {
                    movie {
                      actor {
                        movie {
                          actor {
                            movie {
                              actor {
                                name
}}}}}}}}}}}}}}}}

A query like that would probably return the entire database.
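To get a feel for the combinatorial explosion, here is a small Python sketch. It uses a made-up six-node graph (the names are placeholders, not real data) and compares how many nodes a nested tree expansion returns with what a breadth-first shortest-path search needs. It is only an illustration of the shape of the problem, not of how any GraphQL or Cypher engine is implemented.

```python
from collections import deque
from typing import Optional

# Toy undirected graph (made-up names): actors linked to the movies they appear in.
GRAPH = {
    "actor_a": ["movie_1", "movie_2"],
    "actor_b": ["movie_1", "movie_3"],
    "actor_c": ["movie_2", "movie_3"],
    "movie_1": ["actor_a", "actor_b"],
    "movie_2": ["actor_a", "actor_c"],
    "movie_3": ["actor_b", "actor_c"],
}


def tree_expansion_size(node: str, depth: int) -> int:
    """Count the nodes a nested tree query returns when expanded `depth` levels.

    A node reachable by several routes is counted once per route, because a
    tree-shaped result has no way to refer back to a node it already contains.
    """
    if depth == 0:
        return 1
    return 1 + sum(tree_expansion_size(n, depth - 1) for n in GRAPH[node])


def shortest_path_length(start: str, goal: str) -> Optional[int]:
    """Breadth-first search: each node is visited at most once."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for n in GRAPH[node]:
            if n not in seen:
                seen.add(n)
                queue.append((n, dist + 1))
    return None


for depth in (2, 4, 8):
    print(depth, tree_expansion_size("actor_a", depth))
print(shortest_path_length("actor_a", "actor_b"))
```

Even with only six nodes and two links per node, the tree result roughly doubles with every extra level of nesting, while the graph search never touches more than six nodes.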

On the other hand this may be a benefit to some degree – unconstrained full-graph queries can clearly bog down the database if not developed carefully, so it would be a mistake to expose an API which allowed that. By restricting queries to trees the consumer has to think carefully about the queries they write as combinatorial queries become obvious, and most GraphQL implementations also support restrictions on query execution time in practice.

Once you’ve decided what your set of root query objects are going to be, and the schema that relates them to other objects, then you don’t need any other endpoints to fetch data related to any of those roots (or data related to data related to that root, and so on…). For a social graph a convenient root is the ‘user’, for a bookshop root queries may include ‘book’ and ‘author’ etc. The schema can be evolved by adding more ‘root’ queries. In this way a GraphQL endpoint can usually be extended over time by adding new items in a self-describing way without needing versioning or new endpoints to be created.

Evolving a GraphQL endpoint to remove items is slightly more tricky. Object types, fields, queries etc. can all be marked as “deprecated” such that IDEs and other tools subtly hide them from the developer and encourage their disuse and refactoring in existing client code. But they still need to exist for as long as there are clients that rely on them. Over time those clients diminish, as security and functional patches are adopted, until those items can be finally removed from the server. It may occasionally, but rarely, be necessary to rudely remove support for “long tail” clients – but this may be no bad thing as it merely forces the user to upgrade (and pick up security patches).

GraphQL can also be used to update data via the same endpoint, but rather than using limited HTTP verbs like POST, PUT or DELETE the schema specifies the set of available mutations. These are more expressive than REST endpoint updates: they look more like function calls and attributes can be complex objects that fit the schema.

Evolving mutations can also be reasonably simple: adding and removing mutations are dealt with in the same way as queries, and parameters are objects from the schema so are self-describing and can be evolved in the usual way.

So GraphQL is an extremely useful framework with several major benefits:

removing the “N+1 requests” problem of REST where you have to do a request to get an object, and then make further requests to enumerate various linked objects. All data requested in the query template is returned in one round trip.
A typed, self-describing schema, which solves several other problems that REST has (for example, clients fetching more or fewer fields than they need).
Schema evolution which avoids the need for versioned APIs.

It’s just that the query language itself should really be called TreeQL.
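The “N+1 requests” benefit listed above can be sketched in a few lines of Python. The `CountingServer` class and its data are made up for the illustration; it simply counts round trips for a one-resource-per-request style against a single nested query, and is not a real REST or GraphQL client.

```python
# Toy in-memory "server" data; the names are made up for illustration.
AUTHORS = {1: {"name": "Author A", "book_ids": [10, 11, 12]}}
BOOKS = {
    10: {"title": "Book X", "genre": "crime"},
    11: {"title": "Book Y", "genre": "sci-fi"},
    12: {"title": "Book Z", "genre": "history"},
}


class CountingServer:
    """Stands in for a remote API and counts round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    # REST-style: one resource per request.
    def get_author(self, author_id: int) -> dict:
        self.round_trips += 1
        return AUTHORS[author_id]

    def get_book(self, book_id: int) -> dict:
        self.round_trips += 1
        return BOOKS[book_id]

    # Query-template style: the server resolves the whole tree in one request.
    def query_author_with_books(self, author_id: int) -> dict:
        self.round_trips += 1
        author = AUTHORS[author_id]
        return {
            "name": author["name"],
            "books": [BOOKS[b] for b in author["book_ids"]],
        }


rest = CountingServer()
author = rest.get_author(1)
rest_books = [rest.get_book(b) for b in author["book_ids"]]

single = CountingServer()
result = single.query_author_with_books(1)

# One request for the author plus one per book, versus a single request.
print(rest.round_trips, single.round_trips)
```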

February 24, 2017 - 5 minutes read #pontification , #programming , #project management

Given When Then

This post was originally written while I was at LShift / Oliver Wyman

There are, of course, a large number of techniques described as being The Way To Do Software Engineering. I’m sure I’ve not come across them all but the ones I know about and use currently include at least: Impact mapping, Pert charts, Gantt charts, Personas, Wardley mapping, Agile, DSDM, MoSCoW, SMART, INVEST and BDD (I’ve not done Prince for two decades at least), and then actually writing the software in Sprints which, in turn, includes: PRL, DDD, Unit Testing, Integration Testing (‘Continuous’ or otherwise), TDD, and Fuzz testing, etc. It’d be nice to bring all of these into some kind of coherent model we can utilise to produce correct and usable software.

Books have been written about each of the above techniques and more which I’m not going to repeat here though it may be useful to quickly skim through at least some of them. I’m bound to say something slightly contentious or incomplete in these summaries; if I do then I defer to the greater literature.

Impact mapping covers the ‘Why are we bothering to do this?’ of the system:

Four main concepts: Goal/Why, Actor/Who, Impact/How (the Actors change their behaviour or do their task), Deliverable/What
Personas are useful descriptions of various aspects or variations of the Actors
Agile Stories can then be read directly out of the Impact Map. I.e. “As an [actor/who], I want to [deliverable/what], so that I can [impact/how].”
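As a rough illustration of reading a story straight out of an impact map, the sketch below fills the story template from one map entry. The `ImpactMapEntry` type, the `to_story` helper and the example values are my own hypothetical names, not part of any impact-mapping tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactMapEntry:
    """One path through an impact map: Goal/Why, Actor/Who, Impact/How, Deliverable/What."""
    goal: str
    actor: str
    impact: str
    deliverable: str


def to_story(entry: ImpactMapEntry) -> str:
    # "As an [actor/who], I want to [deliverable/what], so that I can [impact/how]."
    return (
        f"As a {entry.actor}, I want to {entry.deliverable}, "
        f"so that I can {entry.impact}."
    )


entry = ImpactMapEntry(
    goal="increase completed orders",
    actor="shopper",
    impact="finish my order later",
    deliverable="save my basket",
)
print(to_story(entry))
# As a shopper, I want to save my basket, so that I can finish my order later.
```

The Goal does not appear in the story text itself; it stays on the map as the reason the story exists at all.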

Agile was publicised at least as a manifesto for communication and rapid iteration. Early versions of the manifesto continue to be criticised, but this is unfair as ‘Pragmatic Agile’ techniques like DSDM (Dynamic System Development Method) are now well developed.

Software is manufactured in a series of timeboxed Sprints or Iterations (in the order of weeks rather than months or years) with a retrospective and refocus at the end of each sprint.
During a Sprint a list of Stories is considered for solving. Ergo if a story is too big for a sprint, or it will take a whole sprint (or, even, more than a few days) then we re-label it as an ‘Epic’ (really just a very big story) and break it down into more manageable parts (which we then call Stories – i.e. they should still be expressed in the ‘as a [*], I want to [*], so I can [*]’ triple).

Possibly a good way to break down an Epic is to write out all the acceptance criteria (see below); if done well it’s likely the criteria can be grouped into orthogonal sets where each set is likely a candidate for a story of its own. Acronyms like SMART and INVEST are often thrown around at this point. SMART has many possible meanings, but a common one is ‘Specific, Measurable, Achievable, Relevant, and Time-bound’ – but there are a lot of tautologies in there (how can something be either Time-bound or Achievable without being both Measurable and Specific?). INVEST seems more useful: ‘Independent, Negotiable, Valuable, Estimable, Small, and Testable’.

Note that slide 6 on this deck is misleading:

It seems to indicate a straight line top left to bottom right that one could execute in order and good software just pops out at the end. But really it’s the grey background boxes that show the correct timeline: i.e. everything overlaps. And as for suggesting there should be a tiny ‘do the coding’ box at the end, well, let’s move on…

BDD (Behaviour Driven Development) simply describes the set of Acceptance Criteria and/or Scenarios that should be attached to each story to give the ‘Definition Of Done’:

GIVEN a condition, WHEN something is done, THEN something happens
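One possible way to carry the GIVEN/WHEN/THEN shape into an automated test is simply to lay the test out in those three steps. This is a sketch using a plain function and `assert`; the `Basket` class is a hypothetical example domain, and dedicated BDD frameworks exist that I am deliberately not showing here.

```python
class Basket:
    """Hypothetical system under test."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def total_items(self):
        return len(self.items)


def test_adding_an_item_increases_the_count():
    # GIVEN an empty basket
    basket = Basket()

    # WHEN an item is added
    basket.add("book")

    # THEN the basket contains exactly one item
    assert basket.total_items() == 1


test_adding_an_item_increases_the_count()
```

Each acceptance criterion attached to a story becomes one such test, so the ‘Definition Of Done’ can be checked mechanically.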

Wardley maps are a way of modelling what parts of the system are most visible to the user; perhaps giving an indication of priority for tackling them:

Graph of ‘real things’ and ‘depends-on’ links between them
Things tend to be nouns
The things more visible to the user, the ‘User Anchors’, are the Deliverable/What/Epic/Story we saw above – which, in turn, are a set of Behaviours.
The things less visible to the user are likely to be the underlying substrate of system units and integrations that make those Behaviours possible.

Planning is quite useful:

Given the set of stories we’ve now identified for the system, and the relationships between them, we need to decide in which order to tackle the stories so we can have the best chance of producing something useful within the allocated time frame and budget. So at this point we can use a MoSCoW (Must, Should, Could, Won’t) technique to prioritise each feature and end up with the PRL (Prioritised Requirements List).
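A minimal sketch of turning MoSCoW labels into a PRL: sort the stories by category and keep their existing order within each category. The story titles and the `prioritised_requirements_list` helper are invented for illustration.

```python
# Lower number means higher priority.
MOSCOW_ORDER = {"Must": 0, "Should": 1, "Could": 2, "Won't": 3}

# Hypothetical stories, in the order they happened to be written down.
stories = [
    ("Export report as PDF", "Could"),
    ("Log in", "Must"),
    ("Dark mode", "Won't"),
    ("Reset password", "Should"),
    ("View dashboard", "Must"),
]


def prioritised_requirements_list(stories):
    # sorted() is stable, so stories in the same category keep their
    # original relative order.
    return sorted(stories, key=lambda story: MOSCOW_ORDER[story[1]])


prl = prioritised_requirements_list(stories)
for title, priority in prl:
    print(f"{priority:7} {title}")
```

In practice the ordering within a category also matters (dependencies, risk, value), which is where the relationships between stories come back in.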

We can now use the PRL to populate the sprints directly or, as an extra bonus, create a Pert chart and a (probably aspirational) predictive Gantt chart to plot our course.

Software that satisfies the set of stories is, in some way, made up of units of capability and integrations between those units. In vague terms unit tests examine the behaviour of each unit in isolation and integration tests check how the units behave when communicating with each other and the glue between them (no surprises there). But it may be that producing a behaviour takes a whole collection of units and, likewise, a unit may be fundamental to a whole set of behaviours. So we may end up with a matrix of behaviours against units and integrations – i.e. a many-many join from behaviours to unit and integration tests.

This may all seem like a huge palaver, and indeed it will be if one is tempted to follow all the techniques to the letter! But in practical development, and by looking at the scale of the problem, it’s often easy to do at least some of these techniques in one’s head. That is: each technique should be followed ONLY if it adds value to the current project, and fastidiously following all of them may even get in the way. There’s an interesting balance to reach:

none of these techniques are strictly necessary, and following all of them is no guarantee of success – however, the correct techniques used at the appropriate time can greatly improve one’s chance of success.

So do we have a coherent model now? Perhaps something like this:

What could be simpler?

October 11, 2016 - 5 minutes read #pontification , #programming , #water cooler

Programming is not a performance

This post was originally written while I was at LShift / Oliver Wyman

Programming is more like writing a novel than executing a performance. No, I don’t mean the likes of If Hemingway Wrote JavaScript – I mean, apart from ridiculous job interviews involving a whiteboard and pen (NB. LShift never does that) coding is very unlikely to be a performance in an instant of time. Usually when writing software, developers get to test, review, seek opinions, evaluate, rewrite and polish an artefact useful to the customer (well, when given appropriate resource and a professional culture at least). If you want to take an artistic analogy, programming seems more like sculpture, carpentry or, for this blog post, writing a novel, rather than singing a song or playing an instrument.

To continue the analogy, students of creative writing have access to a wealth of training e.g.: courses, books and videos, writers' circles etc. Similarly for programming there are: reams of open source code hosted on public servers, peer review and feedback through pull requests, discussions on aesthetics and elegance, and so on.

Using the musician analogy it may be tempting by some to regard a Lead Developer as a kind of orchestra conductor, but I’m more attracted by the analogy to a literary editor (or cat-herder, which may be the same thing “I love deadlines. I love the whooshing noise they make as they go by”) not least when it comes to the issue of Coding Style. For example, in my opinion, the best book on code style ever written is Perl Best Practices by Damian Conway – not because of perl as such (superb in its day, but it’s time to let it go for all but the simplest text munging) but for these 3 paragraphs on page 6:

Other readers may object to “trivial” code layout recommendations appearing so early in the book. But if you’ve ever had to write code as part of a group, you’ll know that layout is where most of the arguments start. Code layout is the medium in which all other coding practices are practised, so the sooner everyone can admit that code layout is trivial, set aside their “religious” convictions, and agree on a coherent coding style, the sooner your team can start getting useful work done.

…

But remember that each piece of advice is a guideline … Whether or not you agree with all of them doesn’t matter. What matters is that you become aware of the coding issues these guidelines address, think through the arguments made in their favour, assess the benefits and costs of changing your current practices, and then consciously decide whether to adopt the solutions offered here.

Then consider whether they will work for everyone else on your project as well. Coding is (usually) a collaborative effort; developing and adopting a team coding style is too.

[my emphasis] and it’s often useful to pick an off-the-shelf style and stick with it (for a project) rather than agonise about tiny optimisations.

So can we perhaps look further to literary practice to guide us in software development? Well, here’s part of Gwynne’s introduction to Strunk’s “The Elements of Style”:

Remember always, when you are writing for anyone other than yourself, that you are giving. Do not, therefore, write to suit yourself: write with your readers constantly at the forefront of your mind. Put yourself in their shoes when you are deciding how to express yourself. It is not enough that you yourself can easily understand what you are writing down; you are not writing for yourself. Will they understand it? Can you make what you have just written clearer, so that there is no possible excuse for their misunderstanding it, or even for their having to pause over it for a second or two in order to see its meaning?

[Gwynne’s emphasis]

And a key paragraph from Strunk’s guide itself (NB. this was written in 1918, so I apologise for the jarring gender bias):

It is an old observation that the best writers sometimes disregard the rules of rhetoric. When they do so, however, the readers will usually find in the sentence some compensating merit, attained at the cost of the violation. Unless he [the student of English writing] is certain of doing as well, he will probably do best to follow the rules. After he has learnt, by their [tutor’s] guidance, to write plain English adequate for everyday use, let him look, for the secrets of style, to the study of the masters of literature.

I don’t think the similarity of the sentiment in these three pieces is a coincidence. Quite simply I hold the opinion that programming should be the act of communicating clearly to another programmer (it’s a useful coincidence that the computer can also understand what’s desired and even prove it in some circumstances).

This is a notion I was trying to convey in my post on Scripting vs. Engineering; software engineering (as opposed to mere scripting) is a team activity, not a solo one, and readability and clarity is everything even if the team is just you and your future self.

Here are some more ideas in that direction:

“the glitz of such code just wastes time & brain cycles” related to the notion of modesty in elegance.
“Thou shalt make thy program’s purpose and structure clear to thy fellow man by using the One True Brace Style, even if thou likest it not, for thy creativity is better used in solving problems than in creating beautiful new impediments to understanding.” (https://www.lysator.liu.se/c/ten-commandments.html but I think it’s older than that)
Optimising for Human Understanding
Without the comment, this bugfix wouldn’t make any sense (the bug caused 100% CPU usage deadloop and was introduced in an earlier fix, presumably because the original code wasn’t clear enough)

So how could my colleague and I have such a differing opinion? Well, Matthew is a great friend and an accomplished musician, and I have (some minor) aspirations of being an author – i.e. perhaps, like all people, we each understand and explain in terms of what we know. Unfortunately though, with my seemingly limited imagination I’m more likely to continue writing software and blogs than getting rich rich rich from that novel… Oh well, back to the creation.

September 14, 2016 - 7 minutes read #pontification , #tools , #water cooler

Why bother testing?

This post was originally written while I was at LShift / Oliver Wyman

It’d be nice to be able to make a definitive case for the benefits of software tests, but I can’t due to this one question:

Is it possible to prove the correctness of a program using tests?

The answer is unfortunately “no of course not” and I’ll show why below. But all is not lost – please keep reading while I take a meander through the realms of confidence…

Imagine attempting to prove the correctness of a simple Fibonacci function using just tests (e.g. in a TDD based process) when paired with a malicious programmer – how can you specify the full range of inputs and outputs? If the input parameter is a long do you write 2^64 tests? Worse than that, in languages that allow buffer-overrun bugs, and in a function that takes an unchecked array parameter (yes, it happens), the number of input states is effectively equal to the total number of different states of the entire process address space 256^(2^64).
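Here is a small sketch of the “malicious pair” problem: an implementation that passes every test it has been shown and is wrong everywhere else, followed by the arithmetic for testing a 64-bit input exhaustively. The function names, the chosen test cases and the rate of a billion tests per second are my own assumptions for illustration.

```python
def honest_fib(n):
    """Iterative Fibonacci: fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The complete test suite: a handful of input/output pairs.
TEST_CASES = {0: 0, 1: 1, 2: 1, 3: 2, 10: 55}


def malicious_fib(n):
    # Looks up the answer for every tested input and returns
    # rubbish for everything else.
    return TEST_CASES.get(n, 42)


for n, expected in TEST_CASES.items():
    assert honest_fib(n) == expected
    assert malicious_fib(n) == expected  # the tests cannot tell them apart

print(honest_fib(4), malicious_fib(4))  # 3 42

# Exhaustive testing of a 64-bit input, assuming a (generous)
# one billion tests per second:
tests_needed = 2 ** 64
seconds = tests_needed / 1_000_000_000
years = seconds / (60 * 60 * 24 * 365)
print(f"{tests_needed} tests, roughly {years:.0f} years")
```

Every extra test only pushes the malicious implementation into adding one more table entry, which is why the tests raise confidence rather than prove correctness.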

While writing this blog I’m reminded of something I first came across as a fresh-faced undergrad in the 80’s reading Douglas Hofstadter’s version of Zeno’s paradox in Gödel, Escher, Bach:

Achilles told the tortoise, “If A and B, then C. You must accept that”

The tortoise says, “Yes, but if A, and B, then C, is a logical premise, then the premise you just stated, which I’ll call ‘D’ is also true. So, if A and B, and C, then D”.

Achilles relents and says “Okay, but surely you must stop there!”.

The tortoise says, “Yes, it is certainly true that I can stop there, but that would be yet another related premise, which I’ll call ‘E’. If A and B and C and D, then E”.

And so on…

Frameworks like Idris or Coq attempt to be a solution but, to some, seem harder to use than just writing the software.

In any case, if absolute total correctness is what you’re interested in then completely different techniques are available. For example the take-off control systems of the space shuttle

“Take the upgrade of the software to permit the shuttle to navigate with Global Positioning Satellites, a change that involves just 1.5% of the program, or 6,366 lines of code. The specs for that one change run 2,500 pages”

that is, fewer than 3 lines of code for each whole page of specification…

So what about the 99% of the rest of software that we all write, where the budget and incentive just isn’t there for this sort of thing. For that we can at least improve confidence as much as possible using testing as a series of layered techniques, and we have to come to terms with the fact that our confidence can only ever be an asymptotic curve at best.

But why even do tests at all? I did in fact come across a programmer once who maintained that “unit tests prove you’re a bad programmer – if you need to write tests then obviously you don’t understand your code!” Needless to say this person wasn’t exactly a team player, instead maintaining “I’ve been programming this way for 20 years and I’m not going to change now” It was a PHP shop, but we won’t hold that against them…

Tests can exist at various levels of abstraction and scope:

Unit tests; automated
Nearly integration tests; automated
End to end (installed system) tests:
- Feature acceptance test; if a ticket claims to add a feature or fix a bug, then manually check that feature only
- “Poke around a bit” tests; scripted and/or a short, manual, defined sequence
- CAT / full regression smoke tests; scripted and/or a probably very long, manual, defined sequence
- Real users; manual, exploratory testing producing horribly vague bug reports (often shared with their friends rather than you)

And this last type is the real point: if you don’t do testing, and find the bugs, then your users will!

Each of these techniques, particularly given the previous, can result in more confidence. Unfortunately each type tends to be far more expensive and painful to set up than the one before. End to end tests can be automated but may take a large test server farm or container system for the whole installation, and may not even be possible if there are external service dependencies. Also some tests may be of a visual UI – automated tests can only tell if the expected data objects in the UI model have the expected settings, but it takes a manual test to determine subjective usability issues.

But aren’t unit tests the most fine-grained and focused tests? Can’t we just stop with them? I think not as there is a mock/fixture chasm. I.e. to unit-test a client and server one mocks the server in the client tests and uses client-fixtures in the server tests. But this is a declaration of contract and not a test of the contract – every project I’ve ever seen (ever!) has had at least one failure in deployment because a mock and fixture didn’t agree. And here’s a philosophical question too: when is a mock so complex you’d be better building a stub service instead? Compare with the notion of a “fake”: in Freeman and Pryce and the problem of verified fakes.
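The mock/fixture chasm can be sketched in a few lines: the client test uses a mock of the server, the server test uses its own expectations, both pass, and the two sides still disagree about a field name. The endpoint, the field names and the `RealTransport` glue are all hypothetical; only `unittest.mock.Mock` and `json` are real library calls.

```python
import json
from unittest.mock import Mock


# --- Server side -------------------------------------------------------
def handle_get_user(user_id):
    # The real server publishes the name under "full_name".
    return json.dumps({"id": user_id, "full_name": "Ada Lovelace"})


def test_server():
    response = json.loads(handle_get_user(1))
    assert response["full_name"] == "Ada Lovelace"


# --- Client side -------------------------------------------------------
def display_name(transport, user_id):
    payload = json.loads(transport.get(f"/users/{user_id}"))
    return payload["name"]  # the client believes the field is "name"


def test_client():
    transport = Mock()
    # The mock declares what the client *thinks* the contract is.
    transport.get.return_value = json.dumps({"id": 1, "name": "Ada Lovelace"})
    assert display_name(transport, 1) == "Ada Lovelace"


test_server()
test_client()  # both unit test suites are green


# --- Deployment: wire the real client to the real server ----------------
class RealTransport:
    def get(self, path):
        user_id = int(path.rsplit("/", 1)[1])
        return handle_get_user(user_id)


try:
    display_name(RealTransport(), 1)
    integration_ok = True
except KeyError:
    integration_ok = False

print("integration ok?", integration_ok)  # integration ok? False
```

Nothing in either unit test suite checks that the mock and the server agree, so only a test that exercises both sides together can catch this.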

So should we just get rid of unit tests (as some may suggest)? Well, again, no. Integration tests can only give vague “smoke signals” when a test fails – to find the exact location of the bug will take unit tests, interminable logging, or single-stepping with a debugger (which is the punishment for not writing unit tests). Unit tests can form essential documentation – the language Pyret even makes unit tests a first-class citizen rather than an add-on – and this is particularly true for interface tests:

“Your interface is the most vital component to test. Your interface tests will tell you what your client actually sees, while your remaining tests will inform you on how to ensure your clients see those results.”

While we’re on automated tests let’s talk about the cult of 100% code coverage. We all know that 100% coverage means nothing as we can fake coverage with stupid tests. But there’s this odd idea that “if thing X doesn’t completely prove what you want there’s no point doing it” which seems to forget that “failure of X really does show you have a problem!” 100% static code coverage certainly means nothing, but having only 10% code coverage really means a lot. There’s an asymptotic diminishing return worrying about code coverage, and do you really want to waste your time writing unit tests for accessor boilerplate etc? By the way – due to an incompatibility in JVM tooling, using Java with Lombok means it’s difficult to get more than 50% coverage in any case – which is a good thing as it highlights how annoying Java is, just use Kotlin or Ceylon instead…
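A sketch of “faking coverage with stupid tests”: the first test below executes every line of the function, so a line-coverage tool would report 100%, yet it cannot fail. The functions and the deliberately planted bug are hypothetical.

```python
def safe_divide(a, b):
    if b == 0:
        return 0
    return a / b


def buggy_divide(a, b):
    if b == 0:
        return 0
    return a * b  # deliberate bug: multiplies instead of dividing


def coverage_only_test(divide):
    # Executes both branches, so every line is "covered",
    # but nothing is ever checked.
    divide(1, 0)
    divide(4, 2)
    return True


def asserting_test(divide):
    # Same lines executed, but the results are actually compared.
    return divide(1, 0) == 0 and divide(4, 2) == 2


for name, implementation in [("safe", safe_divide), ("buggy", buggy_divide)]:
    print(name, coverage_only_test(implementation), asserting_test(implementation))
# safe True True
# buggy True False
```

Both tests give identical coverage figures; only one of them notices the bug.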

The technology has moved on enough now though that static code coverage measurement should be retired in favour of mutation testing e.g. pitest.org
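To show the idea behind mutation testing (this is a hand-rolled sketch in Python, not how pitest itself is invoked or implemented): make small deliberate changes to the code, the “mutants”, and see whether the test suite notices. A mutant that survives points at a gap in the tests. All names below are hypothetical.

```python
import operator


def make_is_adult(compare):
    """Build an is_adult function using the given comparison operator."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult


original = make_is_adult(operator.ge)  # age >= 18

# Mutants: the same code with the comparison operator swapped.
mutants = {
    ">": make_is_adult(operator.gt),
    "<=": make_is_adult(operator.le),
    "==": make_is_adult(operator.eq),
}


def weak_suite(is_adult):
    # Covers every line of is_adult, but never probes the boundary.
    return is_adult(30) is True and is_adult(5) is False


def strong_suite(is_adult):
    # Adds the boundary case.
    return weak_suite(is_adult) and is_adult(18) is True


def survivors(suite):
    # A mutant "survives" if the suite still passes when run against it.
    return [name for name, mutant in mutants.items() if suite(mutant)]


assert weak_suite(original) and strong_suite(original)
print("weak suite survivors:  ", survivors(weak_suite))    # ['>']
print("strong suite survivors:", survivors(strong_suite))  # []
```

Both suites would score the same on line coverage; the surviving `>` mutant is what reveals that the weak suite never tests the age of exactly 18.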

Speaking of technology, let’s consider automated continuous integration. If you’re using source control with a sensible team policy, e.g. git with “branch per feature”, and why wouldn’t you be, then I hope you also make use of code reviews on merge/pull requests. The load on your team-mate and reviewer is greatly reduced if your software actually works! Maybe you are a genius and always remember to run the tests before pushing a commit, but most people aren’t (myself included) so having the review/CI system automatically report on tests is a valuable safety-net. GitHub can be configured to enforce this. Your peer-reviewer is not your debugging tool – well, unless you feel you already have too many colleagues…

In short:

Just because you can’t get to 100% doesn’t mean it isn’t worth getting to 80% (and recognising that 100% is possibly a waste of time)
No single technique will solve all the problems
In fact, using all the techniques may not solve all the problems
Your end-of-sprint customer-facing demonstration should go flawlessly because your “Automated Demonstrations” have run through it many, many times already…

Scripting vs. Engineering

This post was originally written while I was at LShift / Oliver Wyman

I’ve come to the conclusion that terms like “programming”, “coding” etc. have become horribly ambiguous, which has enabled:

organisations to offer courses on html/css editing as “coding”
people to make claims like “nodejs is more productive than java” (which is a nonsense statement either way)
various arguments along the lines of “is X a PROPER programming language”.

I think it’s more helpful to think in terms of “scripting” vs. “engineering”:

Scripting is geared towards the short-term, small team or solo, quick prototyping.
Engineering is geared towards the long-term, larger teams, maintainability (i.e. readability), refactorability and extensibility.

It’s probably impossible to agree on whether a language is “scripting” or “engineering”, but obviously I have personal opinions about some characteristics of each:

In the long term, the time taken writing software is 90% reading what’s already there. Languages that don’t support readability (or IDE analysis support) fall towards the Scripting end. IMHO the reason nodejs has encouraged microservices is because nodejs is quickly unreadable – you don’t extend a nodejs app, you just write another service…
Monkey-patching automatically precludes a language from Engineering (it violates the Principle of Least Surprise) – how can anyone reason about code that can extend an object’s methods at runtime?
Static-typing moves the language towards Engineering by increasing correctness and readability. Completely pathological strict typing languages like Haskell may be a learning barrier though (along with Haskell programmers' irresistible temptation to use punctuation as function names it seems – quick, quick, tell me what the difference is between foldl and foldl')
If the language has static typing then good type inference cuts down on the boilerplate. These days I wouldn’t write any Java without using lombok.
Immutable objects are good Engineering. Likewise functional languages can aid Engineering, except that the temptation towards one-line-itis reduces readability.
Encapsulation and clear techniques for Dependency Injection help Engineering as it supports mocking in unit tests.
Automatic resource management aids Engineering.
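As an illustration of the monkey-patching point in the list above, here is a small sketch of how a runtime patch made “somewhere else” silently changes the behaviour of code you thought you had read and understood. The `Account` class and the patch are hypothetical.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Reading this, you would conclude an account can never go overdrawn.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


account = Account(100)


# Meanwhile, in some other module far away...
def reckless_withdraw(self, amount):
    self.balance -= amount
    return self.balance


# The class is patched at runtime; existing instances are affected too,
# because methods are looked up on the class at call time.
Account.withdraw = reckless_withdraw

overdrawn_balance = account.withdraw(500)
print(overdrawn_balance)  # -400
```

The class definition is unchanged on disk, so neither a reader nor a code review of that file would spot the new behaviour.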

So I can possibly come up with a completely unscientific and arrogantly self-opinionated table of languages: 5 is good, 0 is bad (if the square is blank then I haven’t formed an opinion):

	Readability	Least surprise	Static types	Type inference	Immutable Functional	DI Mocking	Resource Management
Java with lombok	4	4	4	3	2	5	3¹
Kotlin	4	4	4	5	4	5
Ceylon	4	4	5²	5		5
Rust				5			5
Clojure	3	3³	1	3	5	5	4
Python	4	4			3
Python with Coconut					5
Javascript	3	2⁴	0		4		0⁵
PHP	3	2⁶	1⁷	0	1	3	0
Bash (just for fun)	4	3⁸	0	0	0	0	4⁹

¹ “try” with resources
² inline Union types look interesting
³ things get turned into null at the drop of a hat
⁴ https://www.destroyallsoftware.com/talks/wat
⁵ everything is in a global namespace
⁶ https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/
⁷ annotations for run-time checking
⁸ space handling in values
⁹ all resources are in pipes

Scripting vs. Engineering databases

I think we can extend this to databases too. The NoSQL/schema-less fashion of databases are definitely up the scripting end:

“The problem is that in the absence of a formal schema, the “schema” is an emergent property which is reflected in the code used to manipulate the database” – comment in Why You Should Never Use MongoDB

The “MEAN stack” – mongodb, express, angular and nodejs – is certainly for prototypes only. It should be called the WONGA stack: Write Only Node, monGo and angulAr (possibly only UK readers will get the reference). Angular and React are good for “single page application” building, though possibly vastly improved by using typescript (and flow) instead of javascript.

Responses to “static typing is bad / gets in the way”

The post “Static vs. Dynamic” Is the Wrong Question for Working Programmers, in the section “static typing benefits”, makes the correct statement “Proof that certain kinds of dynamic errors are impossible” but I’d maintain the suggested “drawbacks” are incorrect:

Increased verbosity or reduced expressiveness – not true if the language has good type inference, and type annotations improve readability immensely.
Rejection of otherwise correct programs – how can a program be correct if it has a type error? See how an uninitialised variable cost $90,000.
Slower programmer iteration (possibly lengthy compile/run cycles) – IDE integration and incremental builds remove this completely.
A need for the developer to learn “static typing” language feature – I’d suggest that if you really know what your program is doing then you do know about types (unless you’re programming in the Stringly style)

I was reading through some Python documentation blogs when this paragraph caught my eye:

If you’re coding in Haskell the compiler’s got your back. If you’re coding in Java the compiler will usually lend a helping hand. But if you’re coding in a dynamic language like Python or Ruby you’re on your own: you don’t have a compiler to catch bugs for you. – https://codewithoutrules.com/2016/10/19/pylint/

And that’s why I like types (which can be added in Python).
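A minimal sketch of what adding types to Python can look like, using standard type hints (the built-in generic syntax `dict[int, str]` assumes Python 3.9 or later). The functions and data are hypothetical; the hints are not enforced at runtime and only pay off when an external checker such as mypy is run over the code.

```python
from typing import Optional


def find_user_email(users: dict[int, str], user_id: int) -> Optional[str]:
    """Return the email for user_id, or None if the user is unknown."""
    return users.get(user_id)


def email_domain(email: str) -> str:
    return email.split("@", 1)[1]


users = {1: "ada@example.com"}

# A type checker would be expected to reject the line below, because
# find_user_email may return None and email_domain requires a str:
#
#     email_domain(find_user_email(users, 2))
#
# Without the checker it would only fail at runtime, with an AttributeError.

email = find_user_email(users, 1)
domain = email_domain(email) if email is not None else None
print(domain)  # example.com
```

The explicit `None` check is exactly the kind of thing a compiler “having your back” forces you to write before the code ever runs.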
