Return of the King

This is part three of a three part post detailing my RSE work on the MetaWards project. In part one I talked about trust in software and data, and why my first step was to port the original code from C to Python.

In part two I introduced the two challenges of research software, robustness and speed, and described how I overcame these by walking back to C.

In this part three I will cover a lot of topics, as the last six weeks have been very eventful. This is a very long post, so, in advance, thank you if you reach the end.

I will cover RSE-type topics, such as adding flexibility to code, “tutorial-driven development” and “plugin-based design”, before exploring more general topics: the role of the RSE, my opinions on the Covid-Sim events and on the behaviour of some connected to government, and finally, why I believe a second wave is inevitable and what we as a culture can do to protect ourselves.

Part 1: RSE-type topics

Climbing Minas Anor

There is a big difference between normal software and research software. This is why research software looks different to normal software, and why what can be “good software” in academia can look like “bad software” to a “professional software engineer”.

Research software is different to normal software for three reasons:

  1. We don’t know the requirements of the software when it is written. There is a vague idea of what is required, but research, by its nature, is an exploration of the unknown. The software has to adapt and evolve throughout that exploration, so adaptability and flexibility are key.

  2. The researchers, who are the users of the software, are often also the software developers. They have little or no training in coding, let alone software engineering. Thus research software has to be adaptable and flexible for people with little coding knowledge.

  3. Research software can live for decades. And what’s worse, we often don’t know which software is going to be useful in 10 years’ time. Often long-lived codes start as quickly written scripts or short C or Fortran programs. Most developers would never believe that their code will outlive their own research careers, and certainly don’t write it with that expectation.

This need for flexibility, coupled with the need for the code to be understandable by a novice coder, makes good research software extremely challenging to write. And worse, this need for “understandable flexibility” is at odds with the other two requirements of robustness and speed.

But why is that? Why does flexibility make codes slower and less robust? Robust codes can only have a limited scope of applicability: testing gives robustness, but you can’t test everything. And as flexibility increases, so too do all of the edge-case errors and integration errors. Plus, inexperienced coders tend to cut corners or introduce bugs.

Similarly, optimised codes are written in complex languages and utilise the full suite of expert knowledge to maximise FLOPs and MOPs, often at the expense of code readability. Optimised codes are extremely intimidating for novice researcher-developers, and are extremely challenging to adapt and then debug.

This means that research software often turns into a poorly tested, yet highly optimised monolith, full of the scars and hacks of generations of PhD students and postdocs who were intimidated by the sheer bulk of the code and the complexity of the build, test and deployment systems that emerged from the cruft.

I did not look for it in a Ranger from the North

Good research software engineering balances the three requirements of flexibility, robustness and speed, while remembering that the primary audience are user-developers with little coding experience, and the secondary audience are the generations of researchers who will work with the software over the coming decade(s).

In my opinion, the best way to achieve this is to build software as a collection of modular and re-usable building blocks. And what’s a really good language for that? Well, that’s Python.

Why do I say that? Surely other languages are better? C and Fortran are surely faster? R is better for data, no? Rust is best for bug-free code? Julia is much more understandable? JavaScript has the best package manager? C++ is simply the most powerful language (and shotgun) ever invented? So why Python?

Python is not the best at anything, but it is the “second best at everything”. And it brings everything together. It is a fantastic glue that joins lots of things together, while at the same time not being too proud to steal/copy the best ideas and APIs from other languages.

Python comes with fantastic documentation, testing and packaging tools. The design of the language forces developers to write clean code, and a strong adherence to coding standards using tools like flake8, black and autopep8 makes group-developed code more coherent.

Most importantly, the Python community develops and encourages the development of shared modular libraries. Pandas, scikit-learn, numpy, geopandas etc. just feed into a culture where researcher-developers are encouraged to write code that is modular and sharable. And because it is a scripting language, distributing the source (often via highly permissive open source licenses) is the default.

It’s the job that’s never started that takes longest to finish

Bringing this back to MetaWards, once I had finished and optimised the Python port, my next goal was to add flexibility to the code. This was not just so that the code could adapt to the demands for the next few weeks/months, but more importantly, that the code will be adaptable for the next few years as this area of research is likely to remain extremely important.

I took two approaches to achieve this, which I will detail below:

  1. I adopted the process of “tutorial-driven development”. This involved writing a detailed tutorial for each new feature before or as it was developed.

  2. I tried to anticipate how the code may need to be changed, and then used the dynamic typing and functional programming capabilities of Python to create a plugin interface that researchers should be able to understand.

Tutorial-driven development

Tutorial-driven development is a process we’ve adopted at Bristol that has grown out of our experiences on the BioSimSpace project. We realised during that project that it was a waste of our time to add functionality that users didn’t know existed, nor knew how to use. We thus took to writing Jupyter-notebook-based workshops that demonstrated key functionality. Tutorial-driven development was born as the development of workshop material began to overlap with, and then precede, actual code development.

In MetaWards I had completed the Python port after ~2 weeks so that it was a drop-in replacement for the C code. I then had the pressing need to add new functionality to support the ideas being discussed by the epidemiologists in their Slack group (more on that later). I needed to not just add new functionality, but also teach people how to use it. Because a lot of the flexibility came from a Python plugin interface, this also meant teaching users how to use the MetaWards API to write plugins.

So I started to write tutorials that were included in the code as rst files in the doc/source/tutorial directory. They started as tutorials for the code as it was and, to be honest, were mostly forcing me to learn enough epidemiology that I could explain what the code was doing to others. However, they quickly evolved: first describing functionality as I was writing it, then being written first, showing how the code should work and how it should behave. This meant that the tutorial inched forward in lock-step with the code releases.

Each new feature had an associated new tutorial, and the process of writing the tutorial forced me to adopt the role of the user and the teacher. As a result, the code was deliberately written to make functionality easy to use and easy to teach. As an added benefit, the tutorials, by design, had almost full coverage of the code. Users who followed the tutorials on their own machines were a great distributed human continuous integration service. The errors they encountered and the bugs they surfaced were reported quickly, and in a way that I could easily reproduce (we were, of course, all working from the same input files and tutorial files).

The strong Windows support for MetaWards exists entirely because one of our users works on Windows, and was very diligent and helpful in raising issues for every problem they encountered. I sincerely thank that user :-). (Also, a hint to everyone: add GitHub issue templates, as they nudge people to submit useful issues, and keep out most of the spam and trolls.)

One issue with tutorial-driven development was that the tutorials were always changing. I constantly refined tutorials based on suggestions from the users, e.g. to add functionality that would make the code easier to understand, or to make it easier for users to write plugins. This was yet another benefit of tutorial-driven development - users made really useful suggestions, as we were all speaking the language of the user actually using the code!

To avoid confusing people, I had to version the https://metawards.org website. I did this by adding a few small scripts to the GitHub Actions pipeline that copied the Sphinx-generated docs into a versions subdirectory named after the Git version tag. I then ran a deduplication script to remove duplicated website assets that were larger than 256KB, and added a small bit of JavaScript so that a combo-box below the logo on the website could access any version of a page.
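For the curious, the tag-to-directory step only needs a few lines. Here is a minimal sketch of the idea; the paths and the “devel” fallback are my illustration, not the actual MetaWards pipeline scripts:

```python
# versioned_docs.py - a minimal sketch of the idea (the paths and the
# "devel" fallback are illustrative, not the actual MetaWards scripts)
import os
import shutil
import subprocess

def deploy_versioned_docs(build_dir="doc/build/html", site_dir="site"):
    """Copy freshly built Sphinx docs into a subdirectory named after
    the current Git tag (or 'devel' if we are not on a tagged commit)."""
    try:
        tag = subprocess.check_output(
            ["git", "describe", "--tags", "--exact-match"],
            text=True).strip()
    except subprocess.CalledProcessError:
        tag = "devel"

    dest = os.path.join(site_dir, "versions", tag)

    if os.path.exists(dest):
        shutil.rmtree(dest)   # redeploying the same tag replaces it

    shutil.copytree(build_dir, dest)
    print(f"Deployed docs to {dest}")

if __name__ == "__main__":
    deploy_versioned_docs()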

[Image showing the version combo box]

Now users could follow the version of the tutorial that matched the software. And we have a record of how the code has evolved and developed. Reproducibility, provenance, user-centered design, strong testing and distributed human CI. Tutorial-driven development forces adoption of a workflow that brings huge benefits to research code.

It also solves the question that I asked Neil Chue-Hong when I first met him as part of an SSI project back in 2010: how do we document code for users, and keep that documentation up to date? At that time I had been working on ProtoMS and Sire, both of which had good source-level documentation and, in ProtoMS’s case, a detailed user manual. However, writing user-level documentation and keeping it in sync with the code as it developed was really difficult. I rarely found the time to write user-level docs, and when I did, they quickly became outdated as I added new functionality or changed the design of the code.

Tutorial-driven development makes the user-level docs the central output and focus of the software development process. The code has to stay in sync with the docs, not the other way around. New functionality does not exist unless it is in the tutorial. Developers must put themselves in the shoes of their users, and adopt the mindset of a teacher who has to explain how this new functionality will work.

Now you may think that this would dramatically slow down the code-writing process. I hope that MetaWards, which in just two months has hit version 1.0 and been used in production as part of extensive studies, proves otherwise. Much to my surprise, tutorial-driven development made me faster and more productive. Writing the tutorial first gave me the time and head space to properly design new functionality before writing it. It prevented me from going down many blind alleys and rabbit holes. The most valuable outcome was that it made me actually use the code to run epidemiological experiments.

For these, I created the lurgy, so that I could model scenarios in public without the risk of causing panic or misinterpretation. As a user, I looked at MetaWards and asked how I would use it to model, for example, different lockdown scenarios, or shielding, quarantines, hospital admissions etc. I had reached a level of trust with the epidemiologists such that they had invited me into their Slack group. I knew what they wanted to model and what approaches they wanted to take. So, I looked at MetaWards and tried to work out the easiest and most easily explainable way to model what they wanted, under obviously very tight timescales.

By writing the tutorial first, I was forced to re-use as much existing functionality in MetaWards as possible. The best new code is no new code (it has the fewest bugs). Re-using and re-purposing existing functionality exposed and fixed edge-case bugs, and it prevented me from writing whole new functions or classes for functionality that was conceptually different but turned out to be structurally related. Code written for, e.g., modelling different age demographics was quickly repurposed to represent dynamic and arbitrary groups, and then to model shielding and quarantine. And then, as I wrote the tutorials and really used the code, I gained a deep level of understanding that enabled me to make big contextual leaps, e.g. the creation of interaction matrices, or using different isolation groups for different days. Really sophisticated features, like modelling different transmissibility of the virus from carers to shielded versus shielded to carers, or modelling the impact of people breaking quarantine with different probabilities for different numbers of days, could all be implemented in the tutorial with no major changes to the code!

Tutorial-driven development stopped me from “coding in a hole”, and stopped me writing overly complex or unnecessary functionality that would never be used. It is the single most productivity-enhancing process I have followed as a research software engineer, and I strongly encourage its wider roll-out in the community.

Plugin-focussed design

Flexibility is paramount for research software. However, too much flexibility makes for untestable or spaghetti-like code. Plugin-based designs, which give users flexibility within small sandpits defined within the code, are a good compromise.

I’ve fully embraced a plugin-based design for MetaWards. We use iterate plugins to advance the outbreak on each model day. We use extract plugins to control how data is analysed and output to files (or databases etc.). We use mix plugins to control how the force of infection calculated for different demographics is merged together, and move plugins to control how individuals are moved from one demographic to another (e.g. from home to hospital, or from quarantine to released).
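To give a flavour of what a plugin looks like, here is a sketch of an iterate plugin, reconstructed from memory of the tutorials; please check the versioned docs at https://metawards.org for the exact names and signatures:

```python
# lockdown.py - sketch of a MetaWards iterate plugin, reconstructed
# from memory of the tutorials; check https://metawards.org for the
# exact function names and the arguments available via **kwargs.
from metawards.iterators import (advance_infprob, advance_fixed,
                                 advance_play, iterate_default)

def advance_lockdown(**kwargs):
    """Advance the outbreak with a reduced infection probability."""
    advance_infprob(scale_rate=0.25, **kwargs)  # scale down infections
    advance_fixed(**kwargs)                     # fixed (work) movements
    advance_play(**kwargs)                      # random (play) movements

def iterate_lockdown(population, **kwargs):
    """Choose which advance functions to run on each model day."""
    if population.day >= 20:                    # lockdown from day 20
        return [advance_lockdown]
    else:
        return iterate_default(population=population, **kwargs)
```

You would then point the program at this file on the command line, e.g. something like metawards -d lurgy --iterator lockdown (again, the tutorial has the exact invocation).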

Python supports a functional style, with functions as first-class objects, so plugins are super-easy to write and work with. I make liberal use of **kwargs keyword arguments, and use the tutorials to strongly encourage users to embrace a very Pythonic way of writing software. Initial discussions with a new user from a C++ and C# background were illuminating, as they really showed that the object-orientated way of thinking encouraged by those languages is very constraining. A lot of people who come to Python from C++ end up writing “C++ in Python”, and don’t adapt to the freedom and flexibility that full Python provides. I feel it is very similar to when C developers first moved to C++: they just wrote “C in C++” and didn’t take advantage of the new freedom of a much more expressive language. We are at a similar tipping point now, and a new generation of programmers are coming in who will truly embrace functional-style development. Such codes are fast (MetaWards Python is faster than MetaWards C), but they are also significantly faster to develop, easier to maintain and easier to test, and they give an expressive freedom that means complex and sophisticated ideas can be implemented intuitively using little code (and remember, less code means fewer bugs).

If you are a Fortran-, C-, or C++-trained RSE, then please take a week to learn Python from scratch. Put aside your pre-conceptions of object-orientated programming. Forget inheritance. Embrace functions as objects, duck typing, keyword arguments and “it is easier to ask forgiveness than permission”. Yes, it will be different, and you will initially feel lost and confused. But, once you get it, once you see that this style of programming is very different to what has come before, you will grow as a developer. And you will have a new tool to complement the collection that you should build up throughout your career.
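If those terms are new to you, here is a tiny, self-contained toy example (my own, not MetaWards code) that shows functions as objects, duck typing, **kwargs and forgiveness-rather-than-permission together:

```python
# Functions are objects: a "plugin" is just a callable in a list.
def advance_infection(day, scale=1.0, **kwargs):
    # **kwargs lets callers pass a rich context; each plugin picks
    # out only the arguments it cares about and ignores the rest.
    print(f"day {day}: advancing infection, scale={scale}")

def advance_recovery(day, **kwargs):
    print(f"day {day}: advancing recovery")

def get_scale(params):
    # EAFP ("easier to ask forgiveness than permission") plus duck
    # typing: anything with a .scale attribute works, no type checks.
    try:
        return params.scale
    except AttributeError:
        return 1.0

def run_day(day, plugins, params=None, **context):
    for plugin in plugins:
        plugin(day=day, scale=get_scale(params), **context)

run_day(1, [advance_infection, advance_recovery], rng_seed=42)
```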

To motivate you, the best way I can explain the difference is that writing C++ is like writing a technical manual: it is fully detailed, and each page tells a different story. In contrast, writing Python is like writing a haiku. There are a small number of lines and, on first look, it is deceptively simple. However, there are many layers of meaning. Calling these lines in different situations with different types of objects gives ever deeper and more sophisticated behaviour. Each time you read those lines you can get deeper meaning and greater understanding. And, tied into what I said above about re-using code in the tutorials, those same lines of code can be repurposed for completely different behaviour, yet remain easily comprehensible in both of those different situations. Since 2000 C++ was my favourite programming language, with Python gradually rising into a solid second place. Writing MetaWards has given me a new and deep appreciation of Python that has now clearly elevated it to the status of my favourite programming language. It is the new king.

Part 2: More general topics

What is the role of the RSE?

In addition to making me reflect on how to program and how to teach, this project has also made me think deeply about the role of the RSE.

It has taken two months to get to MetaWards 1.0. I adopted a parallel code strategy, leaving the epidemiologists to work in a (slightly cleaned up) version of the C code while I spent the first two weeks completing the initial refactor and testing. They then started moving across to the new code for speed, but kept developing new functionality in C. I mirrored that into the Python code, until we eventually reached a point where they were happy enough to do everything in Python. So, while it has been two months, they were productively working with the code from about two weeks in. Most importantly, they were never slowed down by my work, and the move across to Python, supported by the tutorials, has been relatively smooth.

RSEs need to be empowered to make these kinds of large changes to a codebase. But to do this, they must make the effort to learn enough about the field they are working in so that they can fully understand the code and gain the trust of researchers to make these suggestions. While I was working on MetaWards I was hearing about what was happening with Microsoft and Covid-Sim. No disrespect intended, but software engineers who don’t understand the science behind a code are limited to making tiny changes, e.g. adding integration tests that just compare outputs, pointing out duplicated code that could possibly be refactored, or running static analysis tools to flag potential bugs. These are useful things, no doubt, but this is not research software engineering! I was disappointed to see the trolling, attacks and general mess that resulted from the publication of Covid-Sim, and by the hypocrisy of “professional” software engineers saying that academia has a culture of mediocrity regarding software, when we can all point to many examples of truly terrible business software, or ridiculously buggy software from major commercial software providers (iOS 13 anyone?).

I was also annoyed to read that RSEs merely “bring the tools of industry into academia”. Version control has been used in academia since the days of RCS and CVS. If you want, you can look through the full history of my 20+ years of academic software development, from the CVS-to-SVN move of ProtoMS, through the SVN-to-Git move of Sire, via BioSimSpace and now to MetaWards. Testing, documentation, continuous integration and continuous deployment can be seen evolving through all of these codes, reaching a pinnacle now with MetaWards, which embodies every tool and technique that my years of experience have given me.

There is a narrative that academic code is rubbish, and commercial code is great. This is patently false. There is another narrative that scientists and researchers cannot learn software engineering because it is too complicated, so should leave this to the professional software engineers. This is also false.

I remain a scientist. I remain an academic. And I am employed specifically to write software in a university, so yes, I am a professional software engineer (yes, I only just realised that I am a professional software engineer!). Watching ten years of hard work improving the reputation of academic software be blown away in a few days was very painful. But the past is the past. We don’t waste time making excuses for it, but instead must all redouble our efforts to build a better future.

What should RSEs do?

So what can RSEs do? MetaWards took two months from a very experienced professional RSE working full time, six-day weeks, with no holidays or breaks. This is clearly not a sustainable way of working. I estimate that if this was a “normal” project, then this work would have taken three months. And the epidemiologists’ codes are only a few thousand lines. As a rough guide, I’d say you need one month of RSE time per 1500 lines of code.

We cannot possibly add months of professional RSE time to every research project that produces some software. Nor can we expect that every researcher will learn enough software engineering to raise their code up to a “professional” level. But we also cannot let the current situation continue, where poorly engineered academic code is elevated to “national-security-level” with no research software engineering support.

While we cannot make every academic code immediately “crisis-ready”, we equally cannot ask the virus to stop infecting people while we improve the software. So, we need to make sure that all academic codes that could potentially be elevated to “national-security-level” have enough basic scaffolding in place to support rapid productionising and scale-up. And we need a national RSE capability that is already in place, ready to perform that scale-up in partnership with the academics.

To this end, the EPSRC RSE Fellowship was fantastic because it provided a free-floating RSE capability in the UK. I was funded to work flexibly on academic research software projects, and so was able to drop everything to work exclusively on MetaWards. I was able to take the decision to do this as I only had to answer to the EPSRC (I am sure that my REF and impact statement next year will look great ;-)). As a nation, we spend hundreds of millions a year funding huge instruments and supercomputers to support “normal” research. These capability investments were quickly repurposed for some excellent Covid research (particularly Diamond: speaking as a one-time computational chemist, their outputs and bound ligand structures have been incredible, a truly fantastic example of why we must invest in capability).

We need to invest in people capability as well as equipment. A pool of highly skilled RSEs who can be trusted to parachute into “national-security-level” projects to support productionising and scale-up of academic codes in times of national crisis is absolutely needed. Equally, we need to change the incentive structure so that academics add the scaffolding that is necessary to support rapid scale-up.

At a minimum, academics must publicly release their software, ideally as it is being developed. If they need to protect the software, then they should release a “reference code” that can generate the same outputs. I cannot publish a maths paper where I state the result of a proof but do not publish the actual proof or any of my working, so why is it acceptable to publish a graph without the source code? Even if “there is enough detail in the paper to reproduce the result”, the reality is that there isn’t. And there is no guarantee that the code that produced the graph was not full of bugs, and hence that the results are reproducible. We therefore need a culture change whereby it is unacceptable to publish the output from a piece of custom software without publishing the software itself (or at least how to achieve the same results from a reference implementation).

Equally, all researchers who write software should be taught the basics of the craft. Again, I would not dare to publish a maths proof if I had no training in basic mathematics. Equally, it should be unacceptable for academics to write and publish software without a basic training in software engineering. This means version control; writing documentation and tutorials; and writing tests. And given nearly all researchers need to write software, teaching in software engineering should be as ubiquitous in school and university programmes as teaching maths.

And researchers must understand and demonstrate reproducibility. Software must produce the same outputs for the same inputs (and random number seed), or, if the code is non-reproducible because it is parallel, then the code must reproduce results according to a statistical model.
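To make that concrete, this is the kind of check I mean; a minimal pytest sketch where run_model is a stand-in for your own simulation entry point, not a real MetaWards function:

```python
# test_reproducibility.py - a toy sketch; run_model stands in for
# your own simulation entry point (it is not a real MetaWards function)
import random

def run_model(n_steps, seed):
    """A stand-in simulation: the same seed must give the same output."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_steps)]

def test_same_seed_same_output():
    # the whole trajectory, not just a summary, must be identical
    assert run_model(100, seed=42) == run_model(100, seed=42)

def test_different_seed_different_output():
    assert run_model(100, seed=42) != run_model(100, seed=43)
```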

We also need to remove the cult of personality / attitude of the expert, whereby people accept that a code works because an expert says “I’ve run it and the graph looks right”. I have had countless bugs in MetaWards that produced graphs that looked right. Some were so subtle that only one in fifteen graphs was wrong, and it was only because I was running large ensemble jobs for the tutorial, and taking care to query why some results “looked weird”, that I found and fixed those bugs (via unit tests, which remain in the code to catch regressions).

Equally, the most insidious bug we’ve had only occurs when running in parallel on Windows. It was caught because I use lots of run-time testing, and where possible will calculate values using two different methods and compare the results. In this case, for one in three parallel runs on Windows, this run-time test was triggered and the code exited. If I didn’t have this test, then the graph would look right on this Windows user’s machine, they would potentially publish that result, but it would be wrong. It doesn’t matter that the graph looks (and is) right on my computer. It is wrong on theirs!

As mentioned, I am a professional research software engineer with 20+ years experience. My code is full of bugs. But I apply a process (testing) to mitigate the risk of those bugs affecting production results. Anyone who thinks that they can write code that doesn’t have bugs, and that they don’t need testing to mitigate those bugs, is either naive, deluded or arrogant. And, I am sorry, the middle of a crisis is exactly the time to start following proper procedure, even if it does cost you a week to set up an appropriate testing infrastructure (spoiler alert: setting up a good testing infrastructure shouldn’t take a professional more than a day, and won’t stop anyone else from continuing to work).
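The run-time cross-checking idea is cheap to apply in any code. A minimal sketch, with function names and a data layout that are my own illustration rather than MetaWards internals:

```python
# A sketch of run-time cross-checking: compute the same quantity two
# independent ways and refuse to continue silently if they disagree.
def total_infected_by_ward(wards):
    return sum(ward["infected"] for ward in wards)

def total_infected_by_stage(stage_counts):
    return sum(stage_counts.values())

def check_consistent(wards, stage_counts):
    """Raise rather than continue if the two totals disagree, e.g.
    because of a race condition in a parallel run."""
    by_ward = total_infected_by_ward(wards)
    by_stage = total_infected_by_stage(stage_counts)
    if by_ward != by_stage:
        raise AssertionError(
            f"Inconsistent totals: by-ward={by_ward}, "
            f"by-stage={by_stage} - this is a bug, stopping the run")

# example: these agree, so the run continues
check_consistent([{"infected": 3}, {"infected": 2}],
                 {"E": 1, "I": 4})
```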

Equally, we need to recognise that a bad testing infrastructure is worse than no testing infrastructure. You should be writing tests as you are writing code, and tests should exercise a good portion of the code and run quickly (seconds). Slow tests should be partitioned off and run during merges or before releases. For MetaWards, we use pytest.mark to mark slow and veryslow tests. Then, make quicktest runs all of the quick tests in less than 5 seconds, while make test takes about 5 minutes on my 2016 1.1 GHz MacBook. If your code uses lots of data, then create a small input testing data set that allows your code to run on developers’ laptops, so that developers can run tests easily and quickly as part of their normal workflow.
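For example, the quick/slow split can look like the following sketch (the real MetaWards test suite is on GitHub; register the custom markers in your pytest configuration to silence “unknown marker” warnings):

```python
# test_outbreak.py - how the quick/slow split can look with pytest.mark
import pytest

def test_quick_invariant():
    # runs in the "make quicktest" set (the whole set: < 5 seconds)
    assert sum([1, 2, 3]) == 6

@pytest.mark.slow
def test_small_outbreak():
    pass  # minutes-long: selected by "make test"

@pytest.mark.veryslow
def test_national_ensemble():
    pass  # only run before releases

# "make quicktest" can then simply wrap:
#   pytest -m "not slow and not veryslow"
```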

Again, I know, won’t this slow you down? Academics need to understand that testing speeds you up, because the up-front investment is more than paid off by increased productivity. And the slowest thing you can do is to give the wrong advice because your code was wrong.

Reflections on the outbreak

I have spent two months watching thousands of simulated outbreaks of the lurgy spread around the UK, and have been reading lots of papers and reports about the spread of the real virus. I am not an epidemiologist. However, a research software engineer has a duty to learn enough about the research area in which they are working so that they are able to speak the language of the researchers and make meaningful contributions that go beyond running static analysis tools, adding basic “compare the output hash” integration tests, putting code on GitHub or creating a pretty website or GUI.

I put the effort into reading the papers and understanding the maths. I exposed my ignorance by asking the epidemiologists questions, and always “putting my hand up” when I didn’t understand something or got confused. I continually revisited my assumptions and corrected my misinterpretations (see the change in the tutorial regarding the description of “IW”).

I reached a point where my suggestions of how to change the model were valuable, and indeed novel (once things are quieter I may have a proper epidemiology paper on interaction matrices and multi-demographic networks). I gained enough trust that I was invited onto the epidemiologists’ Slack, which is a lot like the RSE Slack. And from this, I have learned a lot.

In my opinion, a second wave this winter is almost inevitable (and that was before the latest mess in government that has all but destroyed trust in the lockdown, and likely the tracking app too).

To caveat: I have only modelled the lurgy, but I have run thousands of simulations. To this model I added an infectious asymptomatic stage, whereby individuals with no or low symptoms were able to continue with their normal behaviour, but were silently infecting others. This stage made the lurgy extremely difficult to control. It is like a fire, and infected individuals are like sparks. If they hit a group of susceptibles, then the outbreak ignites. Concepts like R0 are basically irrelevant at these early stages, as one or two people suddenly infect dozens or hundreds. And cases like the Itaewon cluster in South Korea, the Church Choir cluster, and others, show that prediction of the outbreak in these early stages is essentially impossible. You cannot know when or if a single infected individual will travel to a new city, or interact with a diverse group, and thus create a new, large and quickly spreading cluster.
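You can see this spark-like, all-or-nothing behaviour even in a toy branching process (the few lines below are entirely my own illustration and far simpler than MetaWards):

```python
# Toy branching process: one infectious "spark" either fizzles out
# or ignites an outbreak - the average hides the early bimodality.
import random

def outbreak_size(seed, cutoff=10000):
    rng = random.Random(seed)
    active, cases = 1, 1
    while active and cases < cutoff:
        # each active case silently infects 0-5 others (mean 2.5)
        new = sum(rng.randint(0, 5) for _ in range(active))
        cases += new
        active = new
    return cases

sizes = sorted(outbreak_size(seed) for seed in range(20))
print(sizes)  # a few tiny fizzles, the rest hit the cutoff -
              # the early stages are all-or-nothing, not average-like
```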

No matter how I modelled lockdown, every time it was released one spark would find its way to a susceptible group and reignite the outbreak. Yes, potentially contact tracing these small fires could work, but modelling self-isolation suggested that, for the lurgy, the window between someone thinking they may be ill and tracing and isolating all those they may have infected was extremely short, potentially 1-2 days. And everybody had to comply with self-isolation. If just 10% of individuals didn’t take it seriously, or made up their own rules, then the outbreak spread massively. I do not believe we will reach the speed and availability of testing, nor the level of adoption or compliance, necessary to make contact tracing or self-isolation work, and the social and economic cost of these is too high for something that will not be effective.

“Every little helps” was one response I had when I tweeted this. Unfortunately this reply shows how little we humans understand complex networks. Yes, you self-isolating will break your own path in the network. But it only takes a few open paths for the entire network to be overwhelmed. Or, to use an analogy, there is no point installing sprinklers in your 5th storey apartment if no-one else has installed sprinklers and the students in the basement flat are having a bonfire party. Indeed, you wasting time installing sprinklers has a huge opportunity cost, as you really should be using your time and energy to move house or find an alternative location for the students to hold their party…

The sad truth for the lurgy, and I suspect also for covid, is that most of the transmission of the virus is by people who are infected and either asymptomatic or don’t really notice they are ill. Once they become ill enough to notice and stay at home, and to trigger an alert on a contact tracing app, then it is too late - they have already infected others. The only pathway I see is that we need to accept that coronavirus is now endemic in the human population. It will never go away, just as the “common cold” or flu have never gone away. We need to adapt our society and our culture to adopt practices and building designs that minimise transmission and protect the vulnerable.

So what changes do I think are needed? First, it is becoming clear that this is an airborne virus. The hamsters-and-face-masks study (which I am still waiting to see published, so: not-yet-peer-reviewed warning) raises two important observations:

  1. Hamsters in the uninfected cage became infected when exposed to the air from the infected cage. This means that the virus can spread via breathing only. The hamsters were not touching each other, coughing or sneezing over each other, or touching infected surfaces. If this is confirmed by other groups then this is a very significant observation. And no, I probably won’t then want to get on a plane, or sit in a crowded bus, or be in a small meeting room for an hour, etc. etc…

  2. The level and severity of infection was significantly reduced when a simple surgical face mask was placed in the tube connecting the two hamster cages together. This means that even basic face masks block a good percentage of the viral particles, and thus reduce the number of hamsters infected, and importantly, reduce the viral load at the time of infection (thereby giving the immune system more time to learn how to beat the virus before the body is overwhelmed).

I don’t think it is a coincidence that SE Asian nations have been spared the worst of the virus. They have a strong culture of good hygiene and mask wearing. I spent a long time in Thailand and am married to a Thai national (we got together when she was using my software to study the flu outbreak in 2010-11). In Thailand, if you are ill and come into an office, you had better be wearing a mask. You do not cough on people. You don’t sneeze on people. You don’t blow your nose in public. You don’t shake hands.

Equally, there is a culture of shaming if you become ill. You work out who could have infected you, and you go and talk to them and tell them off. The Thai experience of bird and swine flu means that the public know what H5N1 and H1N1 mean, and air conditioners and hand sanitisers are marketed as “killing all viruses including H1N1”.

Thai culture, like most SE Asian cultures, responded to these outbreaks by developing practices that naturally minimise the spread of airborne diseases. To quote the report on the lessons we could learn from Thailand’s response to the H1N1 flu outbreak:

“The use of good hand hygiene practices, social distancing measures, and face masks was emphasized in national policy and prevention guidelines. These measures were widely promoted and implemented, particularly during the first pandemic wave, when awareness and anxiety levels were high.”

We need a similar culture change to routinely adopt Thai (and other SE Asian) practices in the UK:

  1. Assume everyone, including yourself, is infected. Wear a mask in enclosed environments. Wash your hands regularly. Do not shake hands, but instead bob your head or wai.

  2. Put cleaning materials on all tables (or carry your own) and adopt the practice of wiping the table clean before you sit down and eat or have a meeting.

  3. Put cleaning materials in the toilets (or carry your own) and adopt the practice of cleaning the toilet seat and surroundings before you use it.

  4. Place boiling water next to the cutlery in cafeterias so that you can wash the cutlery before eating. Individually wrap eating implements (chopsticks, straws, cups) in plastic (sorry, World) so that there is no chance of cross-contamination.

  5. Design rooms with vertical airflow, and use good ventilation to either exchange inside air with outside air or catch viruses with good filters.

  6. Have all employees, particularly those involved in food production, wear masks and/or face guards (like those used in Korea and Singapore, which were routinely worn from before the outbreak).

  7. Reduce face-to-face meetings or large gatherings to only those that are necessary. If something can be done from home or from a small personal office, or if something can be done online, then it should be. Goodbye, hot desking!

  8. Most importantly, do not go out if you have any symptoms of a cold, flu or coronavirus style illness (which includes driving four hours to another city). Do not go to work. Do not return until you are 100% better. And this last point is hard because many cannot take this amount of time off. Hence, we need legal safeguards and the culture change to protect those that stay home, and to shame those that come in.

These cultural changes are not difficult. We have perhaps 4-5 months before the second wave hits. I am a software engineer, and not an epidemiologist, so trust me on this topic as much as you would trust an epidemiologist when they write or talk about software.

But do look at the empirical evidence. Japan - 16.5k cases, 825 deaths. Singapore - 31k cases, 23 deaths. Thailand - 3k cases, 57 deaths. Vietnam - 325 cases, 0 deaths. Myanmar - 201 cases, 6 deaths.

Even though reporting is different, testing is different, “they may be hiding their cases” etc., UK - 260k cases, 36.8k deaths (and the UK has about the same population and geographic size as Thailand).

As a developer, the one rule that binds them all is “keep it simple, stupid” (KISS). Complex Bluetooth-based tracing apps, complex lockdown messages, and complex rules that even those at the top interpret differently and don’t follow are the antithesis of KISS.

In Thailand, the malls are open and everyone is enjoying shopping and eating in restaurants again. A simple QR-code reading app counts how many are inside each building and limits their numbers. You can see on a map which shops have space and where you would have to queue. If an outbreak came, they have the details of who was in which shops. Restaurants have used mascots or toy pandas to block seats and socially distance customers. Handwash and masks are freely available and required to be used, with guards on the doors making customers follow the rules.

They are not perfect, and I am sure there is much they have done and will do wrong. But it is clear to me that culture and collective community action are our most effective tools to fight this virus. Success is not about who has the best technology or spends the most money, as the rich and technologically advanced West has demonstrated. Put simply, which culture do you think is better placed to survive the second wave? One where people wear masks because they offer some protection to others, or one where people argue that masks are useless because they don’t offer 100% protection to me?