The following is supplementary material to Capstone, a tablet for thinking.
We were happy with TypeScript. The type checking caught all kinds of deep bugs that saved us headaches later. On the other hand, it sometimes led to large investments of time in trying to figure out just the right type model.
Ink & Switch and the University of Cambridge computer science department have been developing Automerge, an open-source, peer-to-peer collaboration layer based on conflict-free replicated data types (CRDTs). See Martin Kleppmann’s talk on the motivations and theory behind Automerge.
We combined this technology with the Dat project’s Hypercore to create Hypermerge, a local-first persistence layer for applications that allows any number of users and devices to work on their own documents disconnected, and then merge them back together once they discover each other on a network.
This choice of storage technology reflected a goal we had for the Capstone project that was never fully realized: realtime collaboration on boards between users, like one can have in a cloud application like Milanote or Figma. But Hypermerge enables collaboration without the privacy concerns for the user of sending data through a third party, or the operational cost for the app developer of running cloud infrastructure. See Pixelpusher: Real-time peer-to-peer collaboration with React for an expanded description of syncing without servers, or “Google Docs without Google.”
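The merge-anywhere property at the heart of this design can be illustrated with the simplest textbook CRDT, a grow-only set. This is a self-contained sketch of the idea only, not the Automerge API: two devices that edited offline converge to the same state no matter which direction the merge happens in.

```typescript
// Illustrative sketch of the CRDT idea -- not the Automerge API.
// A grow-only set (G-Set) merges by set union, which is commutative,
// associative, and idempotent, so replicas converge regardless of
// merge order.
class GSet<T> {
  private items = new Set<T>()

  add(item: T): void {
    this.items.add(item)
  }

  // Merge is union; neither replica's additions are ever lost.
  merge(other: GSet<T>): GSet<T> {
    const out = new GSet<T>()
    for (const i of this.items) out.add(i)
    for (const i of other.items) out.add(i)
    return out
  }

  values(): T[] {
    return [...this.items].sort()
  }
}

// Two devices work disconnected, then merge on discovery.
const tablet = new GSet<string>()
tablet.add("sketch")
const desktop = new GSet<string>()
desktop.add("clipping")
const merged = tablet.merge(desktop)
```

Real document types need richer CRDTs than a set (Automerge models maps, lists, and text), but the convergence guarantee works the same way.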
In the end we only used Hypermerge for syncing between the user’s desktop browser extension and Capstone on the tablet (see Shelf as a portal).
Capstone is already capable of realtime collaboration. Put the tablet into laptop mode and press right shift to open a debug panel, then copy-paste the workspace URL into a message to a colleague. If they paste that URL into their own Capstone, you’ll both be working on the same board.
The Automerge and Hypermerge libraries developed further during the course of this project. That includes undo support, a frontend/backend API split to move expensive CRDT calculations off the rendering thread, and discovery-cloud as a stopgap for discovery-swarm, which is unreliable on Chrome OS.
Our hope was that Chrome OS and Chrome Apps would feel like Electron on a dedicated operating system and hardware, sort of an “Electron machine.”See Lisp machines.
fetch() limitations, such as refusing cross-origin requests without Access-Control-Allow-Origin: * (seemingly arbitrary for an app that already has full TCP socket access).
We similarly found that web technologies were a questionable fit for the type of continuous transitions (for example, zoom in and out) we wanted to implement, as well as micro-interactions with motion design and small animations that users of touch platforms have come to expect.
React is for rendering documents, not animations and spatial movement — so it felt like the wrong tool for this kind of application. In web browsers, clicking on a link usually results in a new document being rendered from scratch, and the DOM of the new document might be completely different from that of the previous document. There is no native or obvious way to animate the transition between the two.
Many native app platforms embrace the concept of a navigation hierarchy. For example, iOS offers a navigation stack for moving up or down a hierarchical set of application screens. This includes default transition animations and an API for customizing these transitions down to the last pixel and making them interactive, reversible, and truly responsive to the user input.
Take a simple example: the user deletes a card, but it should fade to zero opacity over a few hundred milliseconds before being removed from the DOM. In React we do this by adding object state that turns on a CSS class which defines the transition. But we found that doing this frequently resulted in convoluted state objects and lots of added complexity in the render function.
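The bookkeeping described above can be sketched as a small state transition. The names here are illustrative, not Capstone’s actual code: deletion does not remove the card, it first flips it into a “fading” phase that the render function maps to a CSS class (e.g. `opacity: 0; transition: opacity 300ms;`), and only the `transitionend` handler actually drops the card.

```typescript
// Hypothetical sketch of the extra state a fade-out transition forces on us.
type CardPhase = "visible" | "fading"

interface Card {
  id: string
  phase: CardPhase
}

// Called from the delete gesture: start the fade instead of removing.
function beginDelete(cards: Card[], id: string): Card[] {
  return cards.map(c => (c.id === id ? { ...c, phase: "fading" } : c))
}

// Called from the `transitionend` handler: now it is safe to unmount.
function finishDelete(cards: Card[], id: string): Card[] {
  return cards.filter(c => c.id !== id)
}
```

Every transient animation adds another phase and another pair of handlers like these, which is how the render function accumulates the complexity described above.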
React and the web stack generally do provide some capabilities for animation between states. But there is no direct way to make transitions map directly to the progression of a user’s gesture. Perhaps more significantly, there is not a culture of transitions and motion design for web applications like there is for touch platforms. For further reading on animation with web technologies, see the Designing Interface Animation book and talk by Val Head.
We started with Hammer.js for recognizing the user’s fingers and stylus, but quickly outgrew it.
Our hybrid finger/stylus gestures were a problem for the web’s pointer input APIs. Touching the stylus to the screen sends a pointercancel event, after which no further touch input arrives. This is probably related to palm rejection, but the lack of application control over it was frustrating.
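A sketch of the bookkeeping this forces on the application. The event names are the standard Pointer Events ones; the stroke-tracking class is illustrative, not Capstone’s code. The key point is that a `pointercancel` mid-gesture must discard the in-progress stroke rather than commit it, since no `pointerup` will follow.

```typescript
// Illustrative: tracking an in-progress touch stroke when the platform
// may fire `pointercancel` at any moment (e.g. when the stylus touches
// the screen and palm rejection kicks in).
interface Point {
  x: number
  y: number
}

class StrokeTracker {
  private current: Point[] | null = null
  committed: Point[][] = []

  pointerdown(p: Point): void {
    this.current = [p]
  }

  pointermove(p: Point): void {
    this.current?.push(p)
  }

  pointerup(): void {
    if (this.current) this.committed.push(this.current)
    this.current = null
  }

  // pointercancel: abandon the stroke entirely -- committing it would
  // leak a half-drawn gesture into the document.
  pointercancel(): void {
    this.current = null
  }
}
```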
We also found ourselves wanting wet vs dry stroke support in the operating system, like the excellent Inking APIs in Microsoft Windows. And we struggled to get low-latency drawing from the stylus, on par with what the operating system claims to support.
Overall, the web stack did not serve our needs for this application. We’re open to the possibility that we’re “doing it wrong,” but even our attempts to find web-stack engineers with deep knowledge of transitions and complex gesture input were unsuccessful.
In designing a product, it's helpful to start with the “why” of first principles. That leads to the “what” of a design language, which in turn leads to the “how” of particular implementation details.
The table below illustrates our why → what → how model for Capstone.
Capstone’s Clipper browser extension allows the user to save entire web pages into their Capstone tablet’s storage. This proved to be a significant technical design challenge.
The goals are simple:
Digital preservation on a large scale is a massive challenge. See Archiving the Dynamic Web. For Capstone, we’re trying to solve the much simpler problem of one person saving one web page for themselves.
What do browsers suggest as their own archive format? Every major browser has a File -> Save function in its menus and even a hotkey (such as ctrl-S) mapped to it. But despite the prime UI real estate given to these commands, they barely work at all on the modern web.
The save formats are a single .html file, or “Web Page, Complete,” which is a zip archive of the media on the page. These often don’t reload into the browser reliably: sites which use lots of dynamic content, absolute URL references to external media, XHR requests, etc. are likely not to render at all.
Another browser function is print to PDF. The output is legible and the PDF format has proven durable over time. But it still bears only a passing resemblance to what the user was seeing when they saved it.
In the age of mobile apps with content the user cannot directly save or copy, screenshots have become the gold standard for sharing digital content.
It’s a trustworthy format and looks exactly like what the user saw. But so much of the original content is lost: they cannot select, copy-paste, or search the text; they cannot reflow or reformat it for a different screen; in general it loses much of its “webpageness” in the screenshotting process. One user reports taking screenshots of iBooks and using Evernote OCR to re-encode them as searchable, selectable text.
Screenshots also tend to include information the user doesn’t want (such as browser UI or neighboring windows) while losing information they do want (parts of the website outside the scroll window). Browser plugins like Full Page Screen Capture for Chrome or Full Page Screenshots for Firefox are one solution to the scroll-window problem.
One can use wget to convert a dynamic-content site (say, a WordPress blog) into static HTML that can be hosted nearly anywhere, like S3. For the website’s owner this works reasonably well, if you’re willing to spend some time tweaking the output.
But for Capstone’s needs, running a background wget process doesn’t have the browser’s context (login cookies) and it loses dynamic state on the page.
A more modern format comes from the Heritrix project: WARC, or Web ARChive. It’s a zip archive similar to the File -> Save “Webpage, Complete” format Chrome and other browsers give you. And Webrecorder is a web capture tool that includes a desktop (Electron-based) viewer.
We considered the WARC approach for Capstone. But we saw that the rendering logic is complicated, which goes against the idea of a simple, future-proof format. Awesome Web Archiving and Archiving URLs offer more approaches and tools.
We felt that this is a philosophically strong approach to the problem. Freeze-dry can save to a serialized .html file for viewing in any browser; for Capstone, we stored the clipped page as one giant string in the app’s datastore.
One of our product/design goals was to make a tool that operates at the speed of thought: the user does not need to wait in order to get their thoughts on a page.
On the product side that means removing steps in a task like ingesting a new piece of media or making a mark on the digital page. On the technology side, it means sustaining 60 frames per second for output, and keeping latency very low (<50ms for most operations). See Slow Software.
Framerate is well understood within some software professions, such as video game creation. But it seems to be much less discussed in productivity applications.
Briefly, most computer displays update themselves 60 times per second, with each of these updates being one frame. Performant software responds to user input and refreshes the display to match in as few frames as possible, and does not skip frames (aka “jank”) when doing heavy computation.
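As a concrete sketch of the arithmetic: at 60 Hz each frame has a budget of about 16.7 ms, so a 50 ms latency target allows roughly three frames between input and response. A simple jank detector over frame timestamps, such as those delivered to requestAnimationFrame callbacks, might look like this (the logic below is pure so it can run anywhere; the threshold slack is an illustrative choice, not a standard):

```typescript
// Illustrative jank detector: given timestamps of successive frames
// (e.g. collected from requestAnimationFrame), count frames that blew
// their budget. At 60 Hz the budget is 1000 / 60 ≈ 16.7 ms; we allow a
// few milliseconds of slack before calling a frame "janky".
const FRAME_BUDGET_MS = 1000 / 60

function countJankyFrames(timestamps: number[], slackMs = 4): number {
  let janky = 0
  for (let i = 1; i < timestamps.length; i++) {
    const delta = timestamps[i] - timestamps[i - 1]
    if (delta > FRAME_BUDGET_MS + slackMs) janky++
  }
  return janky
}
```

In the browser one would feed this the `time` argument of each `requestAnimationFrame` callback; logging the count during development makes skipped frames visible to the whole team rather than only to whoever has the DevTools FPS meter open.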
We tried to create a culture of fast software on the Capstone team. This means, first and foremost, awareness for all team members, because you make what you measure.
The Chrome Dev Tools FPS counter is a potential place to start:
In practice, we found this hard to activate team-wide (it’s buried in a submenu and doesn’t preserve its state across app restarts) and hard to interpret (the FPS counter only updates during refreshes, not while the page is at a standstill).
Experienced software developers know that performance of an app or codebase tends to be best when you start out, and only goes downhill from there. The dual arguments of “we’ll optimize it later” and “that’s just a performance optimization” are logically compelling but empirically weak.
Platform choice matters a lot. Worse-performing languages and runtimes reduce the performance the user experiences for the comfort of the developer. There’s a good argument to be made for this tradeoff (hardware is cheap, programmers are expensive), especially in the research/prototyping environment where the goal is to move fast to test ideas.
But this is how we get to a place where computers are faster than ever, and yet loading a chat program on a top-of-the-line workstation takes 45 seconds.
We had some concerns in choosing the web stack, which has historically optimized for developer experience over application performance. But the particular subset of web technologies seemed promising:
In practice, it did not live up to our hopes. We labored extensively to keep performance up with techniques such as using Preact in place of React. In the end we had an app that was 60fps much of the time, but the engineering calories expended to achieve that seemed out-of-sync with the capabilities of the Pixelbook hardware.
Video games are a subset of the computing industry that has a strong culture of performance. One might say that action games like Pac-Man, Quake, or Destiny are speed-of-thought tools.
Video game developers are part of this equation, but so are hardware manufacturers with entire brands built on low-latency input devices and computers; low-latency monitors; and of course the huge market for GPUs. Virtual reality gaming is pushing this even further, with refresh rates of 90 Hz.
Even end users of video games (aka players) are aware of framerates, with FPS counters built into gaming platforms and players of competitive games able to count frames of latency.
So the question is: why don’t developers of productivity and creative tools have this same awareness? In our anecdotal experience, many software engineers with a web or mobile background are only dimly aware of framerate as a concept, let alone whether their own apps maintain a steady framerate. There are occasional exceptions, like 60fps with Rust and Electron.
We don't know the exact solution here, but we'd like to propose that those of us doing productivity app development take a few cues from video game performance culture.