Building abstractions on spreadsheets

AJ Nandi · Dec 13, 2022

Spreadsheets are an incredibly powerful tool for organizing and analyzing data. The most common form of “end user programming”. They allow users to input, manipulate, and visualize data in a structured and intuitive manner. The atomic unit at its core is a cell. It allows for any point on the infinite canvas of cells to represent some data or computation. But this flexibility comes with a cost. Relative to other end-user software, direct access to cells is very low level and thus very error prone. Issues with low level access in other programming languages have been solved by some combination of abstractions, frameworks, and other guardrails. We at Subset are interested in bringing these kinds of abstractions to spreadsheets, with the goal of making spreadsheets easier to build, use, and share.

In this blog, we will explore a few of the abstractions we and others have conceptualized, how we use them, their benefits and limitations, and how they can be used to better understand and analyze data.

Sheets on a canvas add structure

We often group related numbers and calculations under a header. It has obvious benefits of organizing a complex spreadsheet, but there are inevitably challenges with moving these groups of cells around. Row and column sizes differ, data ends up wrapping, and adjusting the column size in one place will mess up formatting elsewhere. We decided to solve these issues by allowing you to create isolated blocks or grids on an infinite canvas. Each is its own dedicated sheet, but it no longer has to compete for column space or row counts with data above or near it. The header, titles, and grouping is all automatic. The biggest challenge with this approach is making sure each block feels endless in its own right. It is an interesting design challenge that Apple’s Numbers has done a good job with.

Check out this subset: How much does a puppy cost

If you asked any modern programmer to manually define the memory addresses and sizes of every array they use, every time anything changes, software development would grind to a halt. That’s effectively what Excel makes you do, when you ever try to add new rows or columns to a section of your spreadsheet that you didn’t originally intend for there to be extra data.

=SUM(A1:A50)

Well what happens when you add 4 rows to A51 and you want to include them with the sum? Luckily Excel has full column summation, =SUM(A:A) so that should work just fine. Except when you type something else into the A column that you don’t want included in the sum, and now the sheet is confusingly broken. This is a big part of the inspiration for blocks. Make the full column summations way more common and simple to reason about, without having to add an entirely new workbook/sheet.

Auto-autofill: remembering to drag down formulas

Fill-down, flash fill, “ctrl+d” and other formula mapping features are very popular in spreadsheets. In conjunction with R1C1 notation, they’re essential to mapping formulas to the correct cells, and a core piece of why spreadsheets are the best tool for analysis. However, every time you add a new row (or column) of data, you’re forced to drag down the formula to reach the new row. Excel has a feature called Excel tables that attempts to solve this, but we haven’t seen widespread adoption. We’ve approached it differently, taking advantage of the blocks we described above. We called this autofill. It automatically fills down your formulas to the end of your grid, and you no longer need to remember to keep filling it down as you input new data.

Dynamically adding rows or columns

Another scenario in which we find ourselves looking for primitives similar to that of other modern programming languages is when we want to map row data to other parts of our spreadsheet. Dynamically mapping data feels like part foreign keys, part javascript map function. If we want a 1-1 row to row map of some sheet of data, there is no easy way to do it in Excel, as far as I know. (I’m sure there’s a way with VBA scripts, but lets keep this in the possibility of mere mortals.)

We’ve created this concept of having references to other parts of your spreadsheet. These references map the data and maintain a 1-1 count of rows from one grid to another. So if the source grid gets 10 new rows, the destination grid will add those 10 rows, and pull in the particular column of data that was referenced.

This allows anyone to build a number of interesting unbreakable spreadsheets. You no longer need to remember to add or remove rows, or insert them in the right place. Uploading data and having that data set be dynamically de-duped in a new grid is a great example of what’s possible with these block references.

Looking forward

We’ll be posting more examples of what’s possible with some of these new spreadsheet primitives. If you’re interested, check out Subset. If you have ideas or thoughts on new primitives for spreadsheets, let us know, we’d love to hear from you!