End-user Programming

Today’s software apps are like appliances: we can only use the capabilities exactly as programmed by the developer. What if we, and all computer users, could reach in and modify our favorite apps? Or even create new apps on the fly according to our needs in the moment?

This is end-user programming, a vision for empowered computing pursued by bright-eyed computer science visionaries. Its rich history reaches back to the 1960s with programming environments like Smalltalk and Logo. Notable successes since then include Unix, the spreadsheet, Hypercard, and HTML. And today, newcomers like Zapier, Coda, and Siri Shortcuts are trying their own approaches to automation and dynamic modeling.

But despite forty years of commercial products, open source, and deep academic work, we have yet to reach an end-user programming utopia. In fact, the opposite: today our computing devices are less programmable and less customizable than ever before.

This article tackles the question of why this is so. We’ll start by describing three qualities we think are important for end-user programming: embodiment, living systems, and in-place toolchains. We’ll survey the prior art and try to illuminate what has made this problem so immensely difficult. Then we will document the experiments we’ve done at the Ink & Switch research lab in adding automation and customization capabilities to a digital sketchbook application.

We believe in a computing future where programming is within the grasp of everyone and hope this article can inspire and challenge our industry.

We welcome your feedback: @inkandswitch or [email protected].

Why end-user programming?

Empowered computing

In their computing lives, power users often want simple extensibility. For example: adding download capability to Instagram, setting up windows for a live-stream session, generating recipes and shopping lists for a home kitchen inventory, or automatically keeping timestamp-named backups of Illustrator documents.

Desktop programs, web apps, and smartphone apps are to various degrees like appliances with a hermetic seal to keep users out of their interior. If there is not already a button for a given function, the only option for users is to petition the developer.

Some tools have a history of extensibility, such as vim and emacs, which allow the user to record and replay macros, configure keybindings, and write plugins that run within the editor context. But most apps offer no such capabilities — and if they do, it’s usually by exposing a programmatic API made for developers.

There’s an abyss to cross between using an app and modifying it with code by calling APIs. The user has to switch to a whole other paradigm including setting up a development environment. Consequently, few users take the step from using a tool to customizing or making their own tools.

At Ink & Switch, we believe that software should be extensible in an easy, everyday manner. We believe users want to automate, customize, or even make their own tools without much ceremony. History offers us some great examples.

Success stories

Unix pipes allow the user to chain together simple commands to make an ad-hoc program.

The Unix command line is the quintessential end-user programming environment: an operating system built from small tools, with a command-line interface where the user writes an ad-hoc program with every interaction. If you want to create a file automatically named with the current timestamp, the touch and date programs can be combined in a matter of seconds.

VisiCalc, the first digital spreadsheet, was released in 1979.

Perhaps as pervasive as Unix is the spreadsheet. Spreadsheets are many people’s first encounter with writing simple computer code (such as SUM(A1:A10)) and with concepts like variables and iteration. Spreadsheets serve as ad-hoc databases for project management, contact management, shipping and receiving, and form generation for many small businesses. And typically the people creating these relatively complex programs inside the spreadsheet environment are domain experts rather than professional software engineers.

Hypercard helped give birth to the first wiki which in turn led us to Wikipedia.

A beloved example, though less commercially successful than the first two, is Hypercard. From The Hypercard Legacy:

“HyperCard was a tool for making tools – Mac users could use Hypercard to build their own mini-programs to balance their taxes, manage sports statistics, make music – all kinds of individualized software that would be useful (or fun) for individual users. These little programs were called stacks, and were built as a system of cards that could be hyperlinked together.”

The combination of a simple text editor like Notepad++, an FTP client (here, WinSCP), and a low-cost shared hosting provider with PHP preinstalled remains one of the most accessible ways to get started with software development. Image from Elated’s How to Upload Your Site with WinSCP

A final success story is the web: specifically, HTML and some companion technologies like CSS, PHP, and Javascript. Combined with low-cost FTP shared hosts, Wordpress, Geocities, and Squarespace, this method of software creation has proven incredibly accessible.

Different from regular programming

Why “end-user”? We need a term to differentiate from programming done by professional software engineers who build general-purpose apps designed to be used by thousands of people.

End-user programming is done by a motivated power user who may or may not be a professional programmer, modifying or creating small ad-hoc tools for their own use or perhaps to share with a handful of colleagues.

There shouldn’t be a chasm that the user has to cross in order to customize the behavior of their software. Going from using to inspecting to modifying the system should be a gradual process where each of the steps is small enough to be easily discoverable. The user should not need to switch to “programmer mindset” but instead stay within the context of the application. They can stay close to their work and their ideas.

There is a moral imperative for end-user programming: to avoid a digital divide. A world where only a tiny elite of high priests (aka “programmers”) have control over what happens in our computing lives is concerning. This is the well from which programming literacy advocates such as Code.org draw their vigor.

More practically, the business case for end-user programming is substantial. In enterprise software, end-user programmers are sometimes referred to as “citizen developers”. Businesses of all sizes run their daily processes on tools built by non-programmer domain experts with enterprise-oriented tools like Filemaker, Microsoft Access, Force.com, and Quick Base.

Prior art and three qualities

Given that end-user programming is desirable and potentially achievable, what are some ways to make tools that support this goal? We suggest three qualities of tools that support end-user programming: embodiment, living systems, and in-place toolchains.

Let’s look at each quality and survey its prior art from academic research and commercial products.

Embodiment

One of the hardest parts about making software is that it requires the programmer to hold many abstractions in their mind. The programmer must mentally model the data structures and flow of the program, then translate those into a symbolic representation, aka code.

Professional software engineers are often drawn to the field because of their innate capability with abstract reasoning. They train their whole lives to improve this capability and even then they require large, uninterrupted blocks of time to concentrate in order to successfully write software.

Excel’s formula language editor provides a visual helper for the range of cells. The flow of logic from cell to cell is still invisible and requires abstract reasoning on the part of the spreadsheet creator.

Such a deep investment isn’t viable for the more casual user, who is typically devoting their time and brainpower to being an expert in another domain outside of programming. We need ways to require less abstract reasoning and less of the user’s mental bandwidth to be given to the mechanics of modeling their program and data, leaving more space for them to think about the domain of their application.

One way to do this is embodiment. This is where a tool makes the elements of a working program concrete, usually via visual representations onscreen, to the user.

Scratch offers a way to edit programs by snapping compatible blocks together.

Embodying the program syntax and flow can be found in “visual programming” tools such as Scratch, a block-based learn-to-program environment, or realtime audio/visual environments like MaxMSP or OpenMusic.

MaxMSP is an environment for electronic music and visual effects production. Program execution flows visually through nodes connected by lines, with each input or filter displaying its contribution live.

But a graphical syntax is only one form of embodiment. One can also use a traditional code-based programming language such as Javascript or Swift while providing embodiment of the output as a micro-visualization.

Bret Victor’s concept for visualizing iterations and output in Learnable Programming.

Embodiment means, broadly, making elements of the programming experience more concrete and visual. Spreadsheets do this for the source data; Scratch does this for syntactic program elements including how they can connect together; MaxMSP does this for execution flows; Swift Playgrounds does this for results of each line of imperative code.

Figuring out the right things to embody, and how, is one of the key challenges of end-user programming.

Living system

A second quality that we propose for end-user programming is interaction with a living system.

In the classic software development model, engineers perform a save/build/run cycle that can take seconds, minutes, or longer. But end-user programmers need something with more immediacy. That means shortening the save/build/run loop, ideally until it collapses to nothing.

Spreadsheets show their pioneering approach here. A spreadsheet author “runs” the sheet by entering a formula into a cell and pressing enter. The entire sheet is recalculated without any additional steps, and for most sheets on modern computers, this happens instantly.
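As a rough illustration of why this feels immediate, here is a minimal Javascript sketch of a sheet with no separate build step. The `Sheet` class and its `set`, `get`, and `sum` methods are hypothetical names invented for this example. It evaluates formulas lazily on every read, which is a simplification: a real spreadsheet tracks dependencies between cells and recalculates eagerly.

```javascript
// Hypothetical toy sheet: a cell holds either a plain value or a formula
// (a function that reads other cells). All names are illustrative only.
class Sheet {
  constructor() {
    this.cells = new Map(); // cell name -> number or formula function
  }

  // Entering a value or formula is the only "run" step the author performs.
  set(name, valueOrFormula) {
    this.cells.set(name, valueOrFormula);
  }

  // Reading a cell evaluates its formula against the current sheet, so
  // every read reflects the latest edits.
  get(name) {
    const cell = this.cells.get(name);
    return typeof cell === "function" ? cell(this) : cell;
  }

  // Helper for SUM-style formulas over a list of cell names.
  sum(names) {
    return names.reduce((total, name) => total + this.get(name), 0);
  }
}

const sheet = new Sheet();
sheet.set("A1", 10);
sheet.set("A2", 32);
sheet.set("A3", (s) => s.sum(["A1", "A2"])); // like =SUM(A1:A2)

const before = sheet.get("A3"); // 42
sheet.set("A1", 20);            // edit one input
const after = sheet.get("A3");  // 52, with no save/build/run in between
```

The author of the sheet never asks for a recalculation; changing an input is enough for every dependent value to be current the next time it is shown.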

This liveness is visible in another end-user programming success story: SQL, a language for interacting with databases. Business analysts, product managers, and other non-engineering but motivated power-user roles in organizations often teach themselves a bit of SQL and use it to interact with their company’s production database (or a copy of it).

phpMyAdmin offers an end-user accessible interface for interacting with the living system of a production database. The image here shows querying a Wordpress install, from MySQL Basics

In spreadsheets and SQL, the user gets immediate results after pressing enter. Taking this idea further, there is a whole academic community around “live coding.”

For example, the LIVE track at Splashcon describes it this way:

“Live programming gives the programmer immediate feedback on the behavior of a program as it is edited, replacing the edit-compile-debug cycle with a fluid programming experience.”

Ink & Switch explored the topic of live coding for scientific notebooks with the Livebook project. The user gets immediate feedback as to the validity of their syntax but also sees context-sensitive help about the variable under the cursor and the results in the connected cell below as they type.

Adding live coding to programming tools is a huge technical and design challenge. Microsoft Research has explored this space in depth including creating several ground-up programming languages and IDEs that support live coding.

But the “living systems” quality of end-user programming is broader than the fast feedback loops of live coding. It also includes the ability for the system to change itself from within, giving the end-user programmer a feeling of open-ended possibility and complete ownership over their tools.

A small example here is the developer tools console available in most desktop web browsers. Here the user can grab any element on the page and change it: its color, its size, whether it appears at all.

Browser Dev Tools console editing the background color of the Facebook homepage. Interaction with a living system can feel empowering.

The lineage of Smalltalk, Squeak, and Pharo takes this idea to its furthest: if a system is fully written in itself (sometimes called “self-hosted”), the user can change absolutely anything about the system. Unlike a traditional operating system, where deeper changes require a reboot or even a separate development device, self-hosted systems can inspect and change anything on the fly. They are living systems.

Dynamicland is a modern take on a highly-interactive living system. Users can manipulate the system by moving paper under a set of computer-vision scanners; and they can program the system live by typing into wireless keyboards. Image from Notes from Dynamicland: Geokit by Omar Rizwan

In-place toolchain

A third quality that we think is necessary for end-user programming is an in-place toolchain. The user should be able to edit their programs without installing additional tools or programs. Further, they should be able to use an interface and set of abstractions that is as close as possible to the ones they use for their regular daily work.

On the iOS platform, for example, creating an app requires not only a separate toolchain, but an entirely different platform (macOS). This creates the widest possible chasm between end users and software developers.

Shortcuts (originally Workflow) is Apple’s attempt to bring automation to their mobile platform. Image from iOS 12: the MacStories Review

In-place toolchains are not just about pre-installed tools. The truly hard part of this is allowing the user to apply the concepts and interface paradigms that they have already learned in their daily use of the application.

The Google Docs script editor is available to any user from the Tools menu inside all Gsuite apps. Shown here is a script to export a Google Doc to Markdown format.

Compare to Unix, for example. A user of a Unix system learns to type commands, edit text files, and copy/move/delete files and directories as part of their regular use. And when they are ready to write a program, they can continue using the same concepts and interface, because a Unix shell script is just a series of commands saved into a text file. That text file can be copied, moved around, edited, and deleted just like any other file.

Zapier and IFTTT offer tantalizing glimpses of end-user accessible automation for the web and cloud APIs. But by the full measure of the in-place toolchain idea, these fail because, for example, the act of automating your smarthome components with IFTTT requires a completely different interface and set of concepts from using those components day-to-day.

Automating a WeMo smart plug to turn on a light each night with IFTTT.

A similar problem of “tool in its own standalone universe” exists in academic examples of end-user programming. Eve, Sketch-n-Sketch, and Pharo are technical design marvels. But they require the user to enter an entire new world of tooling and concepts, leaving behind everything they already know about using computers. They have to be motivated to want to program to begin with, rather than having it there waiting for them in an environment they are already using, such as their word processor, their web browser, or their photo editor.

Coda gets high marks for creating a document editor that can be enhanced with calculations and automation. In theory, a user might want to use this as an alternative to Google Docs, Microsoft Word, or Dropbox Paper to author a document without programming and then add those capabilities later.

Airtable offers the user a conventional spreadsheet for storing tabular data with the ability to move to programming via advanced filtering and grouping.

Providing an in-place toolchain is one of the greatest obstacles to widely-used end-user programming.

The Capstone experiments

With the context of those three qualities and what has come before, our research lab set out to do some end-user programming experiments. Our testbed was a tablet thinking tool called Capstone.

Capstone is a prototype by Ink & Switch which offers mixed-media cards and freeform inking on a shared canvas. It’s intended as a place for creative professionals to develop their ideas through research collection, sketching, and mood boards.

If we assume that a creative professional is using Capstone on a regular basis for their notes, clippings, and research, what programming or automation capabilities can we offer that would give them more power, flexibility, and customization within that context?

We ran five experiments to try applying embodiment, living systems, and in-place toolchains to Capstone.

Edit card source

The Capstone user interface consists of cards on a canvas. As a simple starting place, we built an editor inside the Capstone system that allows the user to edit the underlying code of a given card. Think of this as similar to the web browser’s “View Source” option, but with write capability.

Text card on the left, slide-out Javascript/React editor on the right. Here, the user makes small customizations to the color of the card.

On the surface, this seems to fulfill the in-place toolchain goal: the user needs nothing additional to start editing the code of their chosen card. It’s a live coding environment where every keystroke re-renders the card so that users see the results of their work right away.

However, we found the actual use of this approach uninspiring. It felt obvious rather than like a bold new direction. From the user’s standpoint, the card felt editable but not inviting. The jump from cards-on-a-canvas in a touch/stylus interface to a cryptic code editor with Javascript and React code was incongruous.

Data pipelines

For our next experiment, we wanted to embrace the cards+canvas model and touch interface of Capstone, while also borrowing some ideas from Unix pipes.

What would this look like for a more visual environment like that of Capstone’s cards-on-canvas? Our approach was to allow each card within Capstone to take input and provide output, allowing the user to chain them together with the visual equivalent of a pipe operator.

The now-defunct Yahoo Pipes allowed users to pull data from websites and APIs to produce outputs in a visual environment.

We added two fields to each card: uses and exposes. As the names suggest, cards could now wait for some input and expose some output. Those inputs and outputs were strongly typed to allow for exchange of richer datatypes like arrays and objects.

In the examples below, we’ve borrowed a use case from this Emacs literate programming example. Here, a teacher begins with a table (spreadsheet or CSV) of student grades and wishes to create an ad-hoc dashboard showing pass/fail for each student.

A card contains a CSV that exposes raw text. Another card consumes that text and turns it into a table. Colors give cues as to which cards can be connected or are currently connected.

By dragging the “exposes” label from one card onto another card’s “uses” label, the user creates connections between cards. Connected cards are color-coded, inspired by spreadsheet cell-and-range color coding. Based on user feedback, we found that users wanted to keep cards in a pipeline physically near each other.

A “grep” card type filters input.

By combining multiple simple cards users could build data-processing pipelines. Further enhancements included multiple inputs/outputs and renaming inputs or outputs.

A multi-step pipeline including multi-input cards and named inputs/outputs.
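The chaining of cards described above can be approximated in a few lines of Javascript. This is a minimal sketch, not the prototype's code: the card functions, the sample grades, and the pass mark of 60 are all assumptions made for illustration.

```javascript
// Hypothetical card functions: each takes one input and returns one output.
// The names and the sample data are invented for illustration.

// "CSV" card: exposes raw text.
const csvSource = () => "name,grade\nAda,91\nBob,48\nCyd,75";

// "Table" card: uses raw text, exposes an array of row objects.
const toTable = (text) => {
  const [header, ...lines] = text.trim().split("\n");
  const columns = header.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(columns.map((col, i) => [col, values[i]]));
  });
};

// "Grade" card: uses a table, exposes the same rows plus a pass/fail column.
// The pass mark of 60 is an assumption.
const addResult = (rows) =>
  rows.map((row) => ({
    ...row,
    result: Number(row.grade) >= 60 ? "pass" : "fail",
  }));

// "grep" card: uses a table, exposes only rows where some value matches.
const grep = (pattern) => (rows) =>
  rows.filter((row) =>
    Object.values(row).some((value) => pattern.test(String(value)))
  );

// The visual "pipe": feed each card's output into the next card's input.
const pipeline = (...cards) =>
  cards.reduce((data, card) => card(data), undefined);

const failing = pipeline(csvSource, toTable, addResult, grep(/^fail$/));
// failing -> [{ name: "Bob", grade: "48", result: "fail" }]
```

Each card stays small and single-purpose, in the spirit of Unix tools; the dashboard is whatever the last card in the chain exposes.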

Many attempts at more accessible programming languages are weakly typed, under the hypothesis that types are unforgiving to newcomers. Our team’s instinct is the opposite: strong typing, with the right interface, can be friendlier for newcomers by making program components “snap” together like building blocks. If the blocks fit, the program will probably work. See the previously mentioned Scratch; Elm’s strong typing for eliminating runtime errors; and Hazel’s “typed holes” live programming environment.

Strong typing prevents the user from connecting cards that don’t fit.
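One way to picture the "snap together" check is the following Javascript sketch. The `uses` and `exposes` field names come from the experiment; the type tags ("text", "table") and the `canConnect` and `connect` helpers are hypothetical, and the real prototype may represent types quite differently.

```javascript
// Hypothetical card descriptors. Only the `uses`/`exposes` field names are
// taken from the experiment; the type tags and helpers are assumptions.
const csvFile = { name: "grades.csv", uses: null, exposes: "text" };
const tableView = { name: "table", uses: "text", exposes: "table" };
const grepFilter = { name: "grep", uses: "table", exposes: "table" };

// Two cards fit when the upstream output type equals the downstream input type.
function canConnect(from, to) {
  return from.exposes !== null && from.exposes === to.uses;
}

// Refuse to create a connection between cards that do not fit.
function connect(from, to, existing = []) {
  if (!canConnect(from, to)) {
    throw new TypeError(
      `${from.name} exposes ${from.exposes}, but ${to.name} uses ${to.uses}`
    );
  }
  return [...existing, { from: from.name, to: to.name }];
}

let links = connect(csvFile, tableView);      // text into text: fits
links = connect(tableView, grepFilter, links); // table into table: fits

let rejected = false;
try {
  connect(csvFile, grepFilter, links); // text into table: does not fit
} catch (error) {
  rejected = error instanceof TypeError;
}
```

Because the check runs at the moment of connection, the user finds out that two cards do not fit while dragging them together rather than when the pipeline later produces nonsense.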

This standalone prototype is available here.

Unix pipelines continue to be the reigning champion for composability — something not yet replicated in GUI environments. We feel that this CSV pipeline experiment produced positive findings supporting the value of strong typing (here as uses/exposes), and showed a potential interface for in-place toolchains that don’t break out of the touchscreen interface.

On the other hand, visual embodiment of the data pipeline created some problems. It adds visual clutter (a problem with many/most visual programming systems). Furthermore we found a tradeoff between grouping cards together in a way that makes the program flow clear (e.g. pipeline goes top-to-bottom or left-to-right) versus grouping cards in a way that reflects how the user wants to think about their content more generally.

REPL

For our third experiment, we decided to relax the constraint of an in-place toolchain in exchange for better results on other dimensions. In particular, we wanted to see if the REPL (read-evaluate-print loop) used in many programming systems would be of value.

The built-in REPL for Ruby.

Remote REPLs are traditionally built on wire protocols. That is, the user’s console sends commands to the runtime system over the network. But Capstone uses a synchronizing data model which keeps all of the visible elements stored in a live document. This allowed us to build the REPL by writing directly into the document from a session on another device.

Modifying canvas background color and moving cards via commands in the REPL.
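The idea of a REPL that writes into a shared document, rather than sending commands to a host, can be sketched as follows. The `LiveDocument` class is a hypothetical in-memory stand-in for the synchronizing data layer; it is not the Capstone or Hypermerge API, and it ignores networking and merging entirely.

```javascript
// Hypothetical stand-in for a synchronized live document. This is NOT the
// Capstone or Hypermerge API; every name here is invented for illustration.
class LiveDocument {
  constructor(initialState) {
    this.state = initialState;
    this.subscribers = [];
  }

  // Any device holding the document can register to see new states.
  subscribe(callback) {
    this.subscribers.push(callback);
  }

  // A change is a function that edits a draft copy of the state; every
  // subscriber is then handed the new state.
  change(edit) {
    const draft = JSON.parse(JSON.stringify(this.state)); // plain deep copy
    edit(draft);
    this.state = draft;
    this.subscribers.forEach((callback) => callback(this.state));
  }
}

const workspace = new LiveDocument({
  background: "white",
  cards: { note1: { x: 0, y: 0 } },
});

// The tablet "renders" by reacting to whatever state arrives.
const rendered = [];
workspace.subscribe((state) => rendered.push(state.background));

// The REPL session does not call functions on the tablet;
// it only writes into the shared document.
workspace.change((doc) => { doc.background = "lightyellow"; });
workspace.change((doc) => { doc.cards.note1.x = 120; });
```

In this arrangement the REPL needs no knowledge of how the tablet draws anything. It edits data, and every subscriber decides for itself how to show the result.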

We absolutely loved the resulting feel of this experiment.

Being able to interact with a live system felt magical. Much like browser Dev Tools, the user can change the appearance of anything via CSS. They can also interact with the card data model to move cards around on the screen, move cards between boards, or do absolutely anything else within the system. But unlike Dev Tools, all of these changes persist. The user has modified their workspace, customized it to their taste, via a fully programmable interface.

Our team was energized by this result and we instantly wanted this capability for all of our existing systems such as the desktop computers we use in our regular work lives. But we also quickly ran into what would likely be the biggest chunk of work in making a system like this real: API design.

Typically software systems are built with an internal API used only by the professional engineers building that system. These functions are often minimally documented and have obscure names that may reflect history or even internal in-jokes, whereas a public-facing API is designed separately, well-documented, versioned, and kept more stable.

Our finding here is that for a living system to work, the internal and external APIs need to be mostly the same.

We also noted that living systems produce a tension between hackability and the danger of user breakage. For example, the user can change a card’s background color just as easily as they can execute a command that discards every card onscreen, or even puts the system into a crashed state or infinite loop. What to allow, how to surface errors, and how to recover are deep and challenging questions we did not explore in the course of this experiment.

Hooks

Our next move was to extend the REPL with hooks for system events, such as the user dragging a card around the canvas.

With a hook for card dragging, we could then build a “window manager” within the Capstone environment:

Setting a hook on card movement to snap to a grid boundary.
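A grid-snapping hook of this kind might look roughly like the Javascript below. The `onCardMove` registration function, the card shape, and the 40-pixel grid are assumptions for the sake of the sketch, not the hook interface we built.

```javascript
// Hypothetical hook registry; `onCardMove` and the card shape are assumptions.
const GRID = 40; // grid spacing in pixels, chosen arbitrarily

const moveHooks = [];
const onCardMove = (hook) => moveHooks.push(hook);

// Round a coordinate to the nearest grid line.
const snap = (value) => Math.round(value / GRID) * GRID;

// The user's hook: whenever a card is dropped, snap it to the grid.
onCardMove((card) => ({ ...card, x: snap(card.x), y: snap(card.y) }));

// The system would call this when a drag ends, passing the card through
// every registered hook in turn.
function moveCard(card, x, y) {
  return moveHooks.reduce((current, hook) => hook(current), { ...card, x, y });
}

const dropped = moveCard({ id: "note1", x: 0, y: 0 }, 97, 61);
// dropped -> { id: "note1", x: 80, y: 80 }
```

The user writes only the one-line hook; everything that follows from it happens out of sight each time a card is dropped.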

Programmable window managers are an inspiring source of prior art on this. See xmonad, Phoenix, and HHTWM.

Hooks bring a significant downside: the computation is no longer visible. The user’s code could do things hidden behind the curtain, which is the opposite of embodiment.

A parallel here is triggers and stored procedures in databases like PostgreSQL. They also have the downside of no embodiment. Hence, while SQL is an end-user programming success story, triggers are typically reserved for professional database engineers rather than SQL console dabblers.

Bots

For our final experiment we wanted to explore how the user could create long-running programs (or daemons) inside of Capstone that solved the embodiment problem of hooks.

Typically when we say embodiment we mean a visual element onscreen. But we took inspiration from the world of chat bots: what if computation was embodied as something with a bit of personality, a sense of being an actor or a collaborator in the system along with human collaborators?

An autonomous bot uses a tiling algorithm to keep a canvas tidy as the user moves cards.

Like the REPL experiment, Capstone bots still suffer from no in-place toolchain. The end-user programmer must write the script in Javascript and then issue a command to add the card (or update an existing one) in their Capstone workspace. Technical details are available in the pull request.

A subtle but important piece of the bot interface design is that a bot subscribes to all changes in the document (similar to reactive programming). This is instead of subscribing to specific event types (similar to event-emitter) such as moving or deleting a card.
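To make the distinction concrete, here is a minimal Javascript sketch of a bot written against whole-document changes. The document shape, the tiling constants, and the `tidyBot` and `applyChange` names are hypothetical; the actual bot interface is described in the pull request mentioned above.

```javascript
// Hypothetical bot sketch; the document shape and function names are
// assumptions, not the Capstone bot interface.
const CARD_SIZE = 100; // spacing between tiled cards, chosen arbitrarily
const COLUMNS = 2;

// A bot is a function from the whole document state to a desired state.
// It never learns which event happened, only what the document looks like.
function tidyBot(doc) {
  const ids = Object.keys(doc.cards).sort();
  const cards = {};
  ids.forEach((id, index) => {
    cards[id] = {
      ...doc.cards[id],
      x: (index % COLUMNS) * CARD_SIZE,
      y: Math.floor(index / COLUMNS) * CARD_SIZE,
    };
  });
  return { ...doc, cards };
}

// After any edit, every bot sees the resulting document. The tidy bot is
// idempotent: running it on its own output changes nothing further.
function applyChange(doc, edit, bots) {
  return bots.reduce((state, bot) => bot(state), edit(doc));
}

let board = { cards: { a: { x: 5, y: 9 }, b: { x: 300, y: 2 } } };

// Moving, adding, or deleting a card all reach the bot the same way.
board = applyChange(
  board,
  (doc) => ({ ...doc, cards: { ...doc.cards, c: { x: 42, y: 42 } } }),
  [tidyBot]
);
// board.cards -> a at (0, 0), b at (100, 0), c at (0, 100)
```

With an event-emitter design the bot author would have to anticipate every event type that can disturb the layout. Subscribing to the document as a whole means new kinds of edits are handled without the bot changing.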

Another variation on this experiment was to allow bots to expose a small UI. Since the bot’s card already has screen real estate, why not allow programming direct interactivity?

A bot’s card offers buttons to trigger actions: here, creating a timestamp for a journal entry.

Although creating bot code does not satisfy the in-place toolchain goal, embodying bots as cards has its own set of in-place toolchain benefits. In the same way a Unix user can manipulate a script file like any other file, a Capstone user can manipulate a bot like any other card.

A user’s Bot Bin, with favorite bots saved. The user uses the mirror command to duplicate the bot and then bring it to the desired location.

We noticed user behavior like creating a board full of favorite bots and mirroring those cards onto the shelf to take them to the board where they were wanted. Deleting a bot is done by throwing the card off the screen like any other card. All operations that worked on (for example) an image card worked on a bot card as well. Computation follows the same rules as data (it can be cloned, shared, stored, etc). This feels important.

Conclusion

The potential for end-user programming remains largely a dream in today’s computing devices. The huge amount of work done by academia and industry indicates this is a very hard problem indeed. But in working on this problem via these experiments, our team feels that it is achievable with enough concentrated effort from our industry, and worth that effort.

Our positive findings from these experiments include the strength of combining strong typing (uses/exposes) with visual program flows; the magic feeling of interacting with a living system; and embodiment of long-running computation as bots, visually represented as cards with properties the user already knows.

Our negative findings included that our best experiment (REPL) still required an external/separate toolchain; that visual arrangement of computation cards can get messy quickly; and that API design and documentation will be a huge challenge for a powerful end-user system.

At Ink & Switch we continue to believe that the end-user programming utopia is reachable. Are you working on this problem, or have thoughts on what we've written here? Get in touch: @inkandswitch or [email protected].


Appendix

Appendix A: Data layer as an interface

Two of the five Capstone end-user programming experiments used a different device for the programming interface. In both cases, we were able to build on the CRDT / Hypermerge data storage layer rather than use a traditional network connection such as ssh or nREPL.

It feels like this difference is significant.

We could speculate that this is the difference between imperative vs declarative code. Imperative programs say “run this function on the host.” Declarative says “update a portion of the document and let all subscribers to the document choose how to render the new state.”

Even in the case of a REPL, which is by nature imperative, simply storing the history of commands within the document gives us a scrubbable history of changes made to the system that is inspectable by anyone with access to the document.
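A scrubbable command history of this sort can be sketched in Javascript as follows. This is an illustration under stated assumptions: the command names, the document shape, and the `stateAt` helper are all invented here, and a real CRDT-backed document would record changes in its own format.

```javascript
// Hypothetical sketch: REPL commands are stored as data inside the document
// so the history can be replayed up to any point. All names are assumptions.
const commands = {
  setBackground: (state, color) => ({ ...state, background: color }),
  moveCard: (state, id, x, y) => ({
    ...state,
    cards: { ...state.cards, [id]: { x, y } },
  }),
};

const initialState = { background: "white", cards: {} };

// The history lives in the same document as the content it describes.
const sharedDoc = {
  history: [
    ["setBackground", "lightyellow"],
    ["moveCard", "note1", 40, 80],
    ["setBackground", "black"],
  ],
};

// "Scrubbing" to step n means replaying the first n commands.
function stateAt(doc, step) {
  return doc.history
    .slice(0, step)
    .reduce(
      (state, [name, ...args]) => commands[name](state, ...args),
      initialState
    );
}

const midway = stateAt(sharedDoc, 2); // background "lightyellow", note1 placed
const latest = stateAt(sharedDoc, sharedDoc.history.length); // background "black"
```

Because the commands are ordinary data in the document, anyone with access to it can inspect what was done, in what order, and what the workspace looked like at each step.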

We are already exploring this topic in our next Ink & Switch project. We’re experimenting with live documents as a basis for a new programming model; realtime version control that combines the best parts of Git/Github and Google Docs; inspectable change history as a way to surface the power of CRDTs to end users; and what happens with end-user programmable environments when everyone is connected to a shared document.

Appendix B: The web stack for sandboxing and hackability

We chose Chrome OS as our platform for Capstone over more mature options like the iPad or Surface largely because web technologies offer vastly more possibilities for end-user programming. On other platforms, we would have needed to embed a scripting language and runtime such as Lua.

Some examples of apps that make great use of the web’s native extensibility:

Embedding the Javascript toolchain for building and running Javascript programs in place (such as the editable cards shown in our first experiment) is straightforward. In this case we bundled the Babel compiler via @babel/standalone for building JSX files. As an added bonus, it bundles most popular presets and plugins.

One open question is how to use external libraries. A potential solution is using a service like unpkg to provide users with a fetch and cache mechanism.
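One possible shape for such a fetch and cache mechanism is sketched below. It assumes unpkg's package@version/file URL scheme; the `createLibraryLoader` helper, the injected `fetchText` function, and the placeholder package name are hypothetical, and the sketch leaves open how the downloaded source would be evaluated safely.

```javascript
// Hypothetical fetch-and-cache helper. The URL pattern follows unpkg's
// package@version/file scheme; everything else here is an assumption.
function createLibraryLoader(fetchText) {
  const cache = new Map(); // url -> Promise of source text

  return function loadLibrary(name, version, file) {
    const url = `https://unpkg.com/${name}@${version}/${file}`;
    if (!cache.has(url)) {
      // Cache the promise itself so repeated requests share one download.
      cache.set(url, fetchText(url));
    }
    return cache.get(url);
  };
}

// In a browser, `fetchText` could wrap the standard fetch API:
//   (url) => fetch(url).then((response) => response.text())
// Here a stub records requests so the sketch runs without a network.
const requested = [];
const stubFetch = (url) => {
  requested.push(url);
  return Promise.resolve(`/* source of ${url} */`);
};

const loadLibrary = createLibraryLoader(stubFetch);
// "some-library" is a placeholder package name, not a real dependency.
const first = loadLibrary("some-library", "1.0.0", "index.js");
const second = loadLibrary("some-library", "1.0.0", "index.js");
// `requested` holds a single URL, and `first` and `second` are the same promise.
```

Pinning an exact version in the URL keeps a user's card from changing behavior when the library publishes a new release, at the cost of the user having to opt in to updates.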

Overall, the web is the only full-featured platform ever created that allows instant download and execution of a program written by a stranger (by visiting a website URL). But the sandboxing of the Javascript runtime means that this action is almost completely safe. This is a truly stunning technical achievement and means that the web is a promising place for end-user programming capabilities.