
The evolution of Asana’s Luna framework

Asana is in the middle of overhauling our application framework, so this seems like a good time to share the approaches we have taken, what worked, and what caused problems.

When Asana got started in late 2008, we had three primary goals for the framework:

  • Reduce bugs (as well as engineer effort) by automatically updating the UI when data changes.
  • Mask network latency by running handlers on the client and updating the UI as much as possible with data that the client already has (versus waiting for the server to provide the new UI).
  • Reduce network latency by fetching everything that the client needs in one round trip.

Six-plus years (!) later, we still think that these are the right goals. Let’s take a look at how we went after them.

A different kind of Reactivity

Automatic reactivity allows a framework to determine what the client needs to re-render when data changes. Ideally, it ensures that if the app can render correctly in state A and state B then it can correctly transition between the states, since the framework manages the details of the transition.

Our approach to reactivity was to divide the call graph into reactive units. We do this by wrapping some functions in rvalue objects. When code executed by one rvalue asks for the value of another, we automatically record a dependency between the two. Most rvalues are side-effect-free, but some special ones update DOM elements when they compute or recompute. To update the DOM, we recompute all stale rvalues. (There’s more information about this on Quora.)
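The dependency recording described above can be sketched in a few dozen lines. This is a minimal TypeScript illustration under our own assumptions, not Luna’s implementation: the class name Rvalue and its methods get and invalidate are hypothetical, and a simple counter stands in for the DOM update that the special rvalues perform.

```typescript
// Hypothetical sketch of automatic dependency tracking between rvalues.
// Not Asana's Luna API; names and structure are illustrative only.

type Computation<T> = () => T;

// The rvalue that is currently computing, if any. Any rvalue read while it
// runs is recorded as one of its dependencies.
let currentlyComputing: Rvalue<unknown> | null = null;

class Rvalue<T> {
  private value: T | undefined;
  private stale = true;
  // Rvalues that read this one during their most recent computation.
  private readonly dependents = new Set<Rvalue<unknown>>();
  // Stand-in for the side effect (e.g. a DOM update) of recomputing.
  public computeCount = 0;

  constructor(private readonly compute: Computation<T>) {}

  get(): T {
    // Record a dependency: whoever is computing right now depends on us.
    if (currentlyComputing !== null) {
      this.dependents.add(currentlyComputing);
    }
    if (this.stale) {
      const previous = currentlyComputing;
      currentlyComputing = this as Rvalue<unknown>;
      try {
        this.value = this.compute();
        this.computeCount++;
        this.stale = false;
      } finally {
        currentlyComputing = previous;
      }
    }
    return this.value as T;
  }

  // Mark this rvalue and everything that transitively read it as stale.
  // Dependents re-register themselves the next time they recompute.
  invalidate(): void {
    if (this.stale) return;
    this.stale = true;
    this.dependents.forEach((dep) => dep.invalidate());
    this.dependents.clear();
  }
}
```

For example, an rvalue that builds a task label from another rvalue holding the task name recomputes only after the name rvalue is invalidated; repeated reads in between return the cached value.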

An rvalue is the smallest unit of recomputation and the smallest unit for updating the UI. That is, if an rvalue recomputes then any DOM elements that it created will be recreated. Because of this, engineers need to use lots of rvalues to avoid breaking animations and losing scroll positions when things recompute.

However, more rvalues mean more overhead, both from the rvalues themselves and from the dependencies between them. As a result, adding features often reduces performance.

Worse, most of these rvalues are never put to use. During a page load, for example, we create thousands of rvalues to properly isolate the different parts of all of the tasks that we are rendering. Most of the time, though, the user doesn’t make changes that would benefit from these rvalues. Instead, they switch to a different project and we have to spend even more time tearing down these data structures.

Finally, code that’s split into many rvalues is hard to debug because the order of execution is controlled by the framework.

We still believe that systematic reactivity is much better than building it by hand throughout the app. However, we are switching to React because it strikes a better balance by using coarser units of recomputation (views/components) while still making fine-grained updates to the DOM by comparing the old and new DOM descriptions returned by a view.
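To make the contrast concrete, here is a rough TypeScript sketch of coarse recomputation with fine-grained DOM updates. It is not React’s algorithm: real reconciliation also uses keys, component identity and batching, while the hypothetical diff function below only compares children by position. The view function rebuilds its whole description, and diff reduces the change to a small set of patches.

```typescript
// Hypothetical sketch: recompute a whole view, then patch only what changed.
// VNode, Patch, diff and taskListView are illustrative names, not React APIs.

type VNode = { tag: string; text?: string; children: VNode[] };

type Patch =
  | { kind: "setText"; path: number[]; text: string | undefined }
  | { kind: "replace"; path: number[]; node: VNode }
  | { kind: "insert"; path: number[]; node: VNode }
  | { kind: "remove"; path: number[] };

// Compare old and new descriptions and emit only the DOM operations needed.
function diff(oldNode: VNode, newNode: VNode, path: number[] = []): Patch[] {
  if (oldNode.tag !== newNode.tag) {
    return [{ kind: "replace", path, node: newNode }];
  }
  const patches: Patch[] = [];
  if (oldNode.text !== newNode.text) {
    patches.push({ kind: "setText", path, text: newNode.text });
  }
  const common = Math.min(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < common; i++) {
    patches.push(...diff(oldNode.children[i], newNode.children[i], [...path, i]));
  }
  for (let i = common; i < newNode.children.length; i++) {
    patches.push({ kind: "insert", path: [...path, i], node: newNode.children[i] });
  }
  // Remove from the end so earlier indices stay valid when applied in order.
  for (let i = oldNode.children.length - 1; i >= common; i--) {
    patches.push({ kind: "remove", path: [...path, i] });
  }
  return patches;
}

// A "view": re-run in full whenever its data changes (coarse recomputation).
function taskListView(tasks: string[]): VNode {
  return {
    tag: "ul",
    children: tasks.map((name) => ({ tag: "li", text: name, children: [] })),
  };
}

// Renaming one task yields a single setText patch at path [1].
const patches = diff(taskListView(["Draft", "Review"]), taskListView(["Draft", "Ship"]));
```

The unit of recomputation is the whole view, but the untouched first list item produces no patch, so its DOM node (and any animation or scroll state tied to it) survives.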


Client Datastore and Object Replication

A key component of our framework is a full-featured datastore running on the client. It syncs with the server to fetch objects and send changes, and it lets application code load objects and run queries almost as if it were running on the server.

The client datastore also allows application code to see the results of changes before they have been sent to the server. The server is still authoritative, though, so once the client hears that a change has been applied on the server it drops its version of the change and uses the data provided by the server.

Meteor and Relay take this approach too. We feel that a writable client datastore that reflects local changes is a requirement for a modern web app, and our new framework will continue to work this way.
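The client datastore behavior described above, where local changes are visible immediately but the server’s data wins once a change is confirmed, can be sketched as an overlay of pending changes on top of server data. This is a hypothetical TypeScript illustration; ClientStore, applyLocal and onServerConfirm are made-up names, and the network layer is omitted.

```typescript
// Hypothetical sketch of a writable client datastore with optimistic changes.
// Not Asana's API; sending changes to the server is left out.

type ObjectId = string;
type Fields = Record<string, unknown>;

interface PendingChange {
  changeId: number;
  objectId: ObjectId;
  fields: Fields;
}

class ClientStore {
  private serverData = new Map<ObjectId, Fields>();
  private pending: PendingChange[] = [];
  private nextChangeId = 1;

  // Data from the server is authoritative and replaces our copy.
  receiveFromServer(objectId: ObjectId, fields: Fields): void {
    this.serverData.set(objectId, { ...fields });
  }

  // Apply a change locally right away (it would also be sent to the server).
  applyLocal(objectId: ObjectId, fields: Fields): number {
    const changeId = this.nextChangeId++;
    this.pending.push({ changeId, objectId, fields });
    return changeId;
  }

  // Once the server has applied a change, drop our version of it and use
  // the data the server sent back.
  onServerConfirm(changeId: number, objectId: ObjectId, serverFields: Fields): void {
    this.pending = this.pending.filter((c) => c.changeId !== changeId);
    this.receiveFromServer(objectId, serverFields);
  }

  // Reads see server data with still-pending local changes layered on top.
  get(objectId: ObjectId): Fields | undefined {
    const base = this.serverData.get(objectId);
    const local = this.pending.filter((c) => c.objectId === objectId);
    if (base === undefined && local.length === 0) return undefined;
    return local.reduce<Fields>((acc, c) => ({ ...acc, ...c.fields }), { ...(base ?? {}) });
  }
}
```

Keeping unconfirmed changes in a separate list, rather than writing them into the server copy, is what makes it cheap to drop a change once the server’s version arrives; later pending changes still apply on top.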


We are making one change, though. The current version of Asana shares application code between the client and the server. This makes it easy for the client to accurately simulate a change before the server confirms it, but in our opinion, it produces unacceptable coupling between the client and the server, especially when you consider native mobile clients. Future versions of Asana will have separate code in different languages for the client (TypeScript, Objective-C/Swift and Java) and the server (Scala). This will cause some duplication, but for now we feel that the decoupling makes this worthwhile and will make it easier to support third-party clients that cannot share code with our servers.

Moving away from Client Simulation

We’ve talked a lot about the client, but every web app needs a server that knows what data the client needs. Traditionally this is the source of a lot of pain since new features require that matching changes be made to the client and the server, often by different teams. It’s easy to cause performance problems by accidentally making the server load more data than the client requires.

One solution to this is to let the client make many small requests for the data that it wants. This puts the client in control and decouples it from the server, but it often requires multiple round trips, again causing performance problems.

Asana’s solution has been to simulate the application on the server: we look at the data that the application loads and send it to the client.

It’s hard to overstate how enjoyable this is in the right situation: you can write code while thinking almost exclusively about the client and the application magically gets the data that it needs. It’s a liberating way to program.
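A rough sketch of the idea behind simulation, under simplified assumptions: the server runs the same view code against a data source that records every object it loads, then ships the recorded objects to the client in a single response. RecordingLoader, renderTaskView and simulate are hypothetical names, and the sketch omits queries, batching and caching.

```typescript
// Hypothetical sketch of server-side simulation: record what the view loads.

type Obj = Record<string, unknown>;

class RecordingLoader {
  readonly loaded = new Map<string, Obj>();
  constructor(private readonly db: Map<string, Obj>) {}

  load(id: string): Obj | undefined {
    const obj = this.db.get(id);
    if (obj !== undefined) this.loaded.set(id, obj); // remember what the app read
    return obj;
  }
}

// Application code written as if it ran on the client: it just loads data.
function renderTaskView(loader: RecordingLoader, taskId: string): string {
  const task = loader.load(taskId);
  if (task === undefined) return "(missing)";
  const assignee = loader.load(String(task.assigneeId));
  return `${String(task.name)} (${String(assignee?.name ?? "unassigned")})`;
}

// Server: simulate the view, then return everything it touched in one payload.
function simulate(db: Map<string, Obj>, taskId: string): Record<string, Obj> {
  const loader = new RecordingLoader(db);
  renderTaskView(loader, taskId);
  const payload: Record<string, Obj> = {};
  loader.loaded.forEach((obj, id) => {
    payload[id] = obj;
  });
  return payload;
}
```

Note that an object only a click handler would load never appears in the payload, which is exactly the handler problem described below.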

But there are many problems, ranging from accidental to intrinsic.

Performance

It’s hard for a single piece of code to simultaneously render the UI efficiently, load data efficiently (including batching requests to the database and caches) and be maintainable. Also, you often want to preload data on the client so that it’s immediately available if the user takes some common actions, but this requires simulating the UI on the server in multiple states, further reducing performance and consuming memory.

Handlers

It’s infeasible to simulate all of the handlers in the UI, so we don’t always get the data that they need. As a result, a handler crashes unless some other part of the app happened to load what it needs.

Versioning

For simulation to work, the client and the server have to be running exactly the same version of the code, so our web servers keep multiple versions of our code and run whichever one the client requests. We have to remove support for old clients whose server code is too buggy to be allowed to keep running. That’s bad enough for web clients, where we can force a reload if needed. It is unacceptable for mobile clients, though, where old versions must keep working even after server bugs have been fixed.

A deeper problem is that simulation encourages app developers to focus on the UI they are building, not on the data they are loading or how many round trips they are making to the cache or database. In a way, that was the point of simulation, but the amount of data loaded and the number of round trips to backends are critical to the performance of our application. Because it’s easy to change the app so that it loads more data or loads it serially, and it’s hard (or impossible) to catch this through static analysis, we are forced to write tests for the most important parts of the application.
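The kind of test this forces can be sketched as follows, assuming a hypothetical synchronous backend wrapper that counts round trips. A real test would wrap asynchronous database and cache clients; the point is only that the round-trip count has to be asserted explicitly, since nothing in the view code makes a serial load visible. CountingBackend, loadProjectPage and expectRoundTrips are illustrative names.

```typescript
// Hypothetical sketch of a test that pins the number of backend round trips.

class CountingBackend {
  roundTrips = 0;
  constructor(private readonly rows: Map<string, string>) {}

  // One call is one round trip, however many keys it fetches.
  fetchMany(keys: string[]): string[] {
    this.roundTrips++;
    return keys.map((k) => this.rows.get(k) ?? "");
  }
}

// Batched: all task names in a single round trip.
function loadProjectPage(backend: CountingBackend, taskIds: string[]): string[] {
  return backend.fetchMany(taskIds);
}

// Serial: one round trip per task. Same result, easy to introduce by accident.
function loadProjectPageSerially(backend: CountingBackend, taskIds: string[]): string[] {
  return taskIds.map((id) => backend.fetchMany([id])[0]);
}

// The assertion a regression test would make.
function expectRoundTrips(backend: CountingBackend, expected: number): void {
  if (backend.roundTrips !== expected) {
    throw new Error(`expected ${expected} round trips, got ${backend.roundTrips}`);
  }
}
```

Both loaders return identical data, which is why only an explicit count catches the regression.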

Caching the data required for a given view on the server is also ineffective, for several reasons. Simulation encourages developers to load whatever data they want, so a view’s data often depends on session state and the current user. Also, the data loaded by a view depends on the code for the view, its subviews and all of the framework code they use, so each time we deploy, it’s hard to determine which cache entries are still valid for the new version.

We are switching to a more declarative way for clients to specify the data that they need. The goal is to have a small, fast server that processes and loads data based on these descriptions. By keeping the descriptions free of code we’ll make it easier for the server to cache data between clients, users and versions. This is similar to Facebook’s Relay but with some important differences. For the client we compile these descriptions to TypeScript interfaces, giving us safer access to data. See our post on switching to TypeScript for more.
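A minimal sketch of what a code-free data description might look like, with all names (DataSpec, TaskRow, canonicalSpec, cacheKey, taskLabel) hypothetical rather than Asana’s actual format. Because the description is plain data, it can be serialized canonically and used as a cache key independent of the client code that issued it; TaskRow shows the kind of TypeScript interface a compiler could generate from taskRowSpec.

```typescript
// Hypothetical sketch of a declarative, code-free data description.

// Which fields to load, plus nested descriptions for related objects.
interface DataSpec {
  fields: string[];
  related?: Record<string, DataSpec>;
}

const taskRowSpec: DataSpec = {
  fields: ["name", "completed"],
  related: { assignee: { fields: ["name"] } },
};

// What a compiler might emit for taskRowSpec, so client code is checked
// against exactly the data it asked for.
interface TaskRow {
  name: string;
  completed: boolean;
  assignee: { name: string } | null;
}

// Canonical form: sort field names and relation keys so that equivalent
// descriptions serialize identically.
function canonicalSpec(spec: DataSpec): string {
  const related: Record<string, DataSpec> = spec.related ?? {};
  const nested = Object.keys(related)
    .sort()
    .map((key) => `${key}:${canonicalSpec(related[key])}`);
  const fields = spec.fields.slice().sort();
  return `{${fields.join(",")}|${nested.join(",")}}`;
}

// The key depends only on the object and the description, not on view code.
function cacheKey(objectId: string, spec: DataSpec): string {
  return `${objectId}/${canonicalSpec(spec)}`;
}

// Client code reads typed fields; a missing field would be a compile error.
function taskLabel(row: TaskRow): string {
  return `${row.name} (${row.assignee?.name ?? "unassigned"})`;
}
```

Sorting field names and relation keys means two clients that ask for the same data in a different order share a cache entry.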


Summary

Six years in, the initial goals for the framework still seem reasonable.

Our implementation of reactivity got some of the balances wrong but others have proven that the basic approach is sound. A client datastore is becoming the norm for new web frameworks. Simulation had deeper problems than we expected, especially with the rise of native mobile apps. We’ve learned a lot and are excited about using the next generation of our framework to pursue Asana’s mission.

As you might expect, this stuff is fun to work on. Would you like to join us?
