Capstone appendices

Ink & Switch

December 2018

The following is supplementary material to Capstone, a tablet for thinking.

Appendix A: Technology stack

Capstone uses Typescript as a type-safe Javascript variant, React for rendering, and Hypermerge for persistence and syncing, all running on a Google Pixelbook with Chrome OS.

For the full story on why we chose a Chrome OS device rather than other options like an iPad, Surface, or Android tablet, see the platform showdown.

Typescript

Having chosen to build on the web stack, we got Javascript as the default language. Our past experience with larger projects led us to want something type-safe.

We were happy with Typescript. The type-checking caught all kinds of deep bugs that saved us headaches later. On the other hand, it sometimes led to large investments of time in trying to figure out just the right type model.
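As a flavor of what the compiler catches, here is a small illustrative example (our own sketch, not code from the Capstone codebase): a discriminated union over card kinds, where the compiler checks that every case is handled.

```typescript
// Illustrative only -- these names are not from the Capstone codebase.
// A discriminated union over card kinds: the compiler verifies that every
// kind is handled before the function can claim to return a string.
type Card =
  | { kind: "text"; content: string }
  | { kind: "image"; url: string };

function describe(card: Card): string {
  switch (card.kind) {
    case "text":
      return `text: ${card.content}`;
    case "image":
      return `image: ${card.url}`;
    // Under strict compiler settings, adding a new kind to Card without a
    // case here is a compile error: the function could fall through and
    // return undefined, which is not a string.
  }
}
```

This is the mundane end of the spectrum; the "deep bugs" usually surfaced when such types were threaded through state updates and rendering.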

We used Prettier and TSLint, combined with Visual Studio Code and plugins, to ease working with Typescript.

Code reformatting on save (inspired by Elm-format) was a particular favorite on the team.

CRDTs, Automerge, Hypercore, and Hypermerge

Ink & Switch and the University of Cambridge computer science department have been developing an open-source, peer-to-peer collaboration layer based on conflict-free replicated data types (CRDTs) called Automerge.

See Martin Kleppmann’s talk on the motivations and theory behind Automerge.

We combined this technology with the Dat project’s Hypercore to create Hypermerge, a local-first persistence layer for applications that allows any number of users and devices to work on their own documents disconnected, and then merge them back together once they discover each other on a network.
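To make "merge them back together" concrete, here is a drastically simplified toy in TypeScript (our own sketch, not Automerge's algorithm): a last-writer-wins map in which every write carries a timestamp and an actor id, so two copies that diverged offline merge to the same deterministic result. Real Automerge tracks causality per operation and can merge concurrent edits within a single document.

```typescript
// Toy last-writer-wins map, a stand-in for the CRDT merge Automerge performs.
// Each entry records when and by whom it was written.
interface Entry {
  value: string;
  ts: number;    // logical or wall-clock timestamp of the write
  actor: string; // unique id of the writing device/user
}
type LwwDoc = Map<string, Entry>;

function merge(a: LwwDoc, b: LwwDoc): LwwDoc {
  const out = new Map(a);
  for (const [key, entry] of b) {
    const cur = out.get(key);
    // Newer write wins; ties break deterministically on actor id, so both
    // peers converge to the same state regardless of merge order.
    if (!cur || entry.ts > cur.ts || (entry.ts === cur.ts && entry.actor > cur.actor)) {
      out.set(key, entry);
    }
  }
  return out;
}
```

The key property, which real CRDTs provide far more generally, is that merge(a, b) and merge(b, a) agree, so devices can sync in any order.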

This choice of storage technology reflected a goal we had for the Capstone project that was never fully realized: realtime collaboration on boards between users, like one can have in a cloud application such as Milanote or Figma. But Hypermerge enables collaboration without the privacy concerns for the user of sending data through a third party, or the operational cost for the app developer of running cloud infrastructure.

See Pixelpusher: Real-time peer-to-peer collaboration with React for an expanded description of syncing without servers, or “Google Docs without Google.”

In the end we only used Hypermerge for syncing between the user’s desktop browser extension and Capstone on the tablet (see Shelf as a portal).

In fact, Capstone is already capable of realtime collaboration. Put the tablet into laptop mode and press right shift to open a debug panel, then copy-paste the workspace URL into a message to a colleague. If they paste that URL into their Capstone, you’ll be working on the same board.

Debug panel with workspace URLs for realtime collaboration between Capstone users.

The Automerge and Hypermerge libraries developed further during the course of this project. That work includes undo support; a frontend/backend API split to move expensive CRDT calculations off the rendering thread; and discovery-cloud as a stopgap for discovery-swarm, which is unreliable on Chrome OS.

Building on web technologies

One appeal of Chrome OS is that it is designed to run applications built on the web stack (HTML, CSS, the DOM, and Javascript). The web stack has historically had very good developer experience, and our team has had good results building prototypes on Electron, which uses the web stack for desktop applications.

Past Ink & Switch projects built on web technologies include Livebook (scientific notebook) and Trellis (kanban board).

Our hope was that Chrome OS and Chrome Apps would feel like Electron on a dedicated operating system and hardware, sort of an “Electron machine.”

See Lisp machines.

We encountered many engineering problems arising from the ways that the Javascript and web standards supported by Chrome Apps differ from those in Node.js, NPM, and standard web browsers. Examples include the strange and buggy localStorage API; limited access to sockets (particularly for our need for UTP); the difficulty of getting an application background thread that doesn’t block the UI but can still access I/O like disk and network; and fetch() limitations such as refusing Allow-Origin: * (seemingly arbitrary for an app that already has full TCP socket access).

React, the DOM, and transitions and animation

We similarly found that web technologies were a questionable fit for the type of continuous transitions (for example, zoom in and out) we wanted to implement, as well as micro-interactions with motion design and small animations that users of touch platforms have come to expect.

React is for rendering documents, not animations and spatial movement — so it felt like the wrong tool for this kind of application. In web browsers, clicking on a link usually results in a new document being rendered from scratch, and the DOM of the new document might be completely different from that of the previous document. There is no native or obvious way to animate the transition between the two.

Many native app platforms embrace the concept of a navigation hierarchy. For example, iOS offers a navigation stack for moving up or down a hierarchical set of application screens. This includes default transition animations and an API for customizing these transitions down to the last pixel, making them interactive, reversible, and truly responsive to user input.

Take a simple example: the user deletes a card, but it should fade to zero opacity over a few hundred milliseconds before being removed from the DOM. In React we do this by adding object state that turns on a CSS class which defines the transition. But we found that doing this frequently resulted in convoluted state objects and lots of added complexity in the render function.
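The state juggling looks roughly like this (a minimal sketch with our own names, not Capstone's actual components): the card must pass through an intermediate "fading" state so the CSS transition can run before the element is unmounted.

```typescript
// Minimal model of the delete-then-fade dance. In the React component,
// "fading" turns on a CSS class like .fading { opacity: 0; transition: ... },
// and only the transitionend event permits removal from the DOM.
type CardState = "visible" | "fading" | "removed";

function onDelete(state: CardState): CardState {
  // The delete action does not remove the card; it only starts the fade.
  return state === "visible" ? "fading" : state;
}

function onTransitionEnd(state: CardState): CardState {
  // Only after the CSS transition finishes is it safe to unmount.
  return state === "fading" ? "removed" : state;
}
```

Even in this toy form, every animated interaction needs its own extra state and handlers to drive it, which is where the complexity accumulates.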

React and the web stack generally do provide some capabilities for animating between states. But there is no direct way to make transitions map to the progression of a user’s gesture. Perhaps more significantly, there is not a culture of transitions and motion design for web applications like there is for touch platforms.

For further reading on animation with web technologies, see Designing Interface Animation book and talk by Val Head.

Finger and stylus gestures

We started with Hammer.js for recognizing the user’s finger and stylus input, but quickly outgrew it.

Our hybrid finger/stylus gestures were a problem for the web’s pointer input APIs. Touching the stylus to the screen sends a pointercancel event, after which no further touch input arrives. This is probably related to palm rejection, but the lack of application control over it was frustrating.

We also found ourselves wanting wet vs dry stroke support in the operating system, like the excellent Inking APIs in Microsoft Windows. And we struggled to get low-latency drawing from the stylus, on par with what the operating system claims to support.

Overall, the web stack did not serve our needs for this application. We’re open to the possibility that we’re “doing it wrong,” but even our attempts to find web-stack engineers with deep knowledge of transitions and complex gesture input were unsuccessful.

Appendix B: Design language

In designing a product, it's helpful to start with the “why” of first principles. That leads to the “what” of a design language, which in turn leads to the “how” of particular implementation details.

The table below illustrates our why → what → how model for Capstone.

Principles: fast; move at the speed of thought.
Design language: precise and predictable; sense of reliability; habituating.
Implementation: low latency on user input; immediate command execution; modelessness, monotonicity; one way to invoke a command; same action → same result; no accidental activation; muscle memory actions; low-latency inking, typing, drag; don't rely on time to disambiguate; blindly chainable commands; commands are minimally precise.

Principles: be a human; augment / resonate with the mind.
Design language: visual / spatial memory; use your mental gpu; full range of motion; associative thinking; scale-free; works the same everywhere; consistent metaphor.
Implementation: represent things visually / spatially; continuous and connected; provide sense of location, scale; incremental zoom; flexible arrangement, organization; scan images, read words; metadata and associations; ops work for docs and collections; things always where you left them; lightweight open - just look at it; show the thing (image), not the abstraction (filename); fast visual scanning of options; active doc/working set/archive are one thing separated by degrees, like a desk/drawers/closet; custom mixed media layouts; desk w/ active and reference docs; see at different levels of zoom; see the whole room / library; place things ~anywhere; ink ~anywhere.

Principles: studio for ideas; sanctuary / calm / reflective; inviting for creation.
Design language: joy of the blank page; as natural as paper; the space inspires you; you choose what you see when you walk in; last thing you put down is the first thing you pick up; always easy to get a fresh page; no ceremony to start; casual but well-designed.
Implementation: neutral color scheme; no chrome / tool palettes; 100% of the screen = user content; feel of paper, canvas, wood, matte metal; projects are organic / optional; stacks/folios/boards of docs; handy set of blank pages.

Appendix C: Web page archiving

Capstone’s Clipper browser extension allows the user to save entire web pages into their Capstone tablet’s storage. This proved to be a significant technical design challenge.

The goals are simple:

  1. When the user clips a website, save exactly what they are seeing. They should feel confident that if they look back a day, a week, or a year later they’ll see the same thing.
  2. Be able to use it offline. If the user is sifting through their saved raw material on a train, in a library, or at a conference with a spotty network, the saved websites should load just the same as text, images, and sketches in Capstone.

Problems with archiving web pages

Digital preservation on a large scale is a massive challenge. See Archiving the Dynamic Web. For Capstone, we’re trying to solve the much simpler problem of one person saving one web page for themselves.

Browser’s File -> Save

What do browsers suggest as their own archive format? Every major browser has a File -> Save function in its menus and even a hotkey (such as ctrl-S) mapped to it. But despite the prime UI real estate this is given, they barely work at all on the modern web.

The front page of Wikipedia saved from a browser and re-opened in the same browser. While the text is preserved, most users would not consider this a good snapshot.

The save formats are a single .HTML file or “Web Page, Complete,” which saves the HTML alongside a folder of the page’s media. These often don’t reload into the browser reliably. Sites which use lots of dynamic content, absolute URL references to external media, XHR requests, etc. are likely not to render at all.

Print to PDF

Another browser function is print to PDF. The output is legible and the PDF format has proven durable over time. But it still bears only a passing resemblance to what the user was seeing when they saved it.

Wikipedia as a PDF.

Screenshots

In the age of mobile apps with content the user cannot directly save or copy, screenshots have become the gold standard for sharing digital content.

It’s a trustworthy format and looks like exactly what the user saw. But so much of the original content is lost: they cannot select, copy-paste, or search the text; they cannot reflow or reformat it for a different screen; in general it loses much of its “webpageness” in the screenshotting process.

One user reports taking screenshots of iBooks and using Evernote OCR to re-encode as searchable, selectable text.

Screenshots also tend to include information the user doesn’t want (such as browser UI or neighboring windows) while losing information they do want (parts of the website which are outside the scroll window).

Browser plugins like Full Page Screen Capture for Chrome or Full Page Screenshots for Firefox are one solution to the scroll-window problem.

wget -r

Wget is a beloved command-line tool. Its recursive option can archive collections of pages.

One can use this to convert a dynamic-content site (say, a Wordpress blog) into static HTML that can be hosted nearly anywhere, like S3. For a website owner this works reasonably well, if you’re willing to spend some time tweaking the output.
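For reference, a typical invocation looks like the following (standard GNU wget flags; the URL is a placeholder, and the sketch prints the command rather than fetching anything):

```shell
# Typical GNU wget flags for mirroring a site (URL is a placeholder):
#   --recursive        follow links within the site
#   --page-requisites  also fetch the CSS, images, and scripts each page needs
#   --convert-links    rewrite links so the saved copy browses locally
#   --no-parent        never ascend above the starting directory
cmd="wget --recursive --page-requisites --convert-links --no-parent https://blog.example.com/"
echo "$cmd"
```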

But for Capstone’s needs, running a background wget process doesn’t have the browser’s context (login cookies) and it loses dynamic state on the page.

MHTML, WARC, and Webrecorder

MHTML is a web standard designed specifically to solve this problem. It’s supported by Chrome in theory, but our experience is that it is essentially abandonware.

A more modern format comes from the Heritrix project: WARC, or Web ARChive. It’s an archive format similar in spirit to the File -> Save “Web Page, Complete” output that Chrome and other browsers give you. And Webrecorder is a web capture tool that includes a desktop (Electron-based) viewer.

We considered the WARC approach for Capstone. But we saw that the rendering logic is complicated, which goes against the idea of a simple, future-proof format.

Awesome web archiving and Archiving URLs offer more approaches and tools.

Our favorite solution: freeze dry

The solution we settled on for Capstone is freeze-dry. Its use was just a few lines of code.

Freeze Dry takes the page’s DOM as it looks in the moment, with all the context of the user’s browser, including authentication cookies and modifications made to the page dynamically via Javascript. It disables anything that would make the page change (scripts, network access). It captures every external asset required to faithfully render the page and inlines them into the HTML.
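As a toy illustration of that inlining step (our own sketch, not freeze-dry's implementation), picture rewriting every external reference into a data: URL so the page becomes a single self-contained string:

```typescript
// Toy version of the inlining step (not freeze-dry's real code): given a map
// from asset URL to an already-encoded data: URL, rewrite each src attribute
// so the page no longer needs the network to render.
function inlineAssets(html: string, assets: Map<string, string>): string {
  return html.replace(/src="([^"]+)"/g, (match, url: string) => {
    const dataUrl = assets.get(url);
    // If the asset was not captured, leave the reference untouched.
    return dataUrl === undefined ? match : `src="${dataUrl}"`;
  });
}
```

The real library handles far more than src attributes (stylesheets, fonts, iframes, srcset), but the end product is the same: one string that renders the same way forever.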

Capstone browser extension saves a copy of the web page into the shelf via freeze-dry.

We felt that this is a philosophically strong approach to the problem. Freeze-dry can save to a serialized .HTML file for viewing in any browser; for Capstone, we stored the clipped page as one giant string in the app's datastore.

Freeze Dry was developed by the author of WebMemex, a project with some similar values as Capstone.

Appendix D: FPS counter and culture of fast software

One of our product/design goals was to make a tool that operates at the speed of thought: the user does not need to wait in order to get their thoughts on a page.

On the product side that means removing steps in a task like ingesting a new piece of media or making a mark on the digital page. On the technology side, it means sustaining 60 frames per second for output, and keeping latency very low (<50ms for most operations).

See Slow Software.

Framerate

Framerate is well understood within some software professions, such as video game creation. But it seems to get less attention in productivity applications.

Briefly, most computer displays update themselves 60 times per second, with each of these updates being one frame. Performant software responds to user input and refreshes the display to match in as few frames as possible, and does not skip frames (aka “jank”) when doing heavy computation.
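The arithmetic is simple: at 60 Hz each frame has a budget of about 16.7 ms, and any longer gap between display updates means at least one missed frame. A small helper (our own, unrelated to Capstone's code) makes the accounting concrete:

```typescript
// At 60 Hz the per-frame budget is 1000/60 ≈ 16.7 ms. Given a list of frame
// timestamps (in ms), count how many frames were skipped between updates.
const FRAME_BUDGET_MS = 1000 / 60;

function droppedFrames(timestamps: number[]): number {
  let dropped = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    // A ~33 ms gap means one vsync was missed; round to whole frames skipped.
    dropped += Math.max(0, Math.round(gap / FRAME_BUDGET_MS) - 1);
  }
  return dropped;
}
```

In a browser, the timestamps would come from successive requestAnimationFrame callbacks; tools like Stats.js do essentially this bookkeeping for you.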

We tried to create a culture of fast software on the Capstone team. This means, first and foremost, awareness for all team members, because you make what you measure.

The Chrome Dev Tools FPS counter is a potential place to start:

Chrome Dev Tools framerate monitor floats in the upper right.

In practice, we found this hard to activate team-wide (it’s buried in a submenu and doesn’t preserve state across app restarts) and hard to interpret (the FPS counter only updates during refreshes, so it shows nothing when the page is at a standstill).

For Capstone we used an FPS counter embedded in the app, using the excellent Stats.js javascript performance monitor.

The stats.js FPS counter. Enable in Capstone via the right-shift debug menu.

Platform choice

Experienced software developers know that the performance of an app or codebase tends to be best when you start out, and only goes downhill from there. The dual arguments of “we’ll optimize it later” and “that’s just a performance optimization” are logically compelling but empirically weak.

Platform choice matters a lot. Worse-performing languages and runtimes reduce the performance the user experiences for the comfort of the developer. There’s a good argument to be made for this tradeoff (hardware is cheap, programmers are expensive), especially in the research/prototyping environment where the goal is to move fast to test ideas.

But this is how we get to a place where computers are faster than ever, and yet loading a chat program on a top-of-the-line workstation takes 45 seconds.

We had some concerns in choosing the web stack, which has historically optimized for developer experience over application performance. But the particular subset of web technologies we chose seemed promising.

In practice, it did not live up to our hopes. We labored extensively to keep performance up with techniques such as using Preact in place of React. In the end we had an app that was 60fps much of the time, but the engineering calories expended to achieve that seemed out-of-sync with the capabilities of the Pixelbook hardware.

Video games

Video games are a subset of the computing industry that has a strong culture of performance. One might say that action games like Pac-Man, Quake, or Destiny are speed-of-thought tools.

Video game developers are part of this equation, but so are hardware manufacturers with entire brands built on low-latency input devices and computers; low-latency monitors; and of course the huge market for GPUs. Virtual reality gaming is pushing this even further, with refresh rates of 90 Hz.

Even end users of video games (aka players) are aware of framerates, with FPS counters built into gaming platforms and players of competitive games able to count frames of latency.

So the question is: why do developers of productivity and creative tools not have this same awareness? In our anecdotal experience, many software engineers with a web or mobile background are only dimly aware of framerate as a concept, let alone whether their own apps maintain a steady framerate.

There are occasional exceptions like 60fps with Rust and Electron.

Or to put it another way: why does a text editor struggle when video game engines can render a full-detail 3D city without slowing down?

We don't know the exact solution here, but we'd like to propose that those of us doing productivity app development take a few cues from video game performance culture.