Reboot ember-data as @ember/orm?

Ember’s data layer has come a long way and gives many, many developers a good API to connect to their backend and exchange data. A good ecosystem has evolved around it, with adapters and serializers ready to install. Ember-data grew up alongside JSON API and the two are tightly coupled, which makes them a blessing to use together. With all this history, though, ember-data also carries a lot of bloat from the past. Things like async relationships very often make it hard to guess what the code does. Despite all its goodness, ember-data is still my number one source of app failures. It may be a proxy object I unknowingly access, or an observer written for an array that was swapped out for a has-many relationship from ember-data, which killed the whole thing. These are errors you and I run into one way or another, and we keep running into them: next time we just take a new route (to avoid the already bugged one) to approach the same problem.

Given ember-data’s historic growth and entanglement with JSON API, I raised the question:

What if ember’s data layer were created today, what would it look like?

In this article I’ll start by outlining the requirements for the data layer of a frontend app and discuss some topics that go beyond what ember-data is capable of today. I also make assumptions about what an API could look like. I’ll do that as if ember-data didn’t exist, so as not to conflict with the current one. This is on purpose and refers to the title of this article: to reboot ember-data as a fresh project, leaving the old junk in the past.

Requirements

A data layer for a frontend app in the year 2018 has to fulfill a few tasks, challenges it needs to meet. In my eyes, these are:

Variety of Sources: A frontend app can connect to many sources, e.g. a REST API or a socket that sends data back and forth. These are different types of connections/sources the data layer needs to handle and coordinate.
Unified Queries: Writing queries should be abstracted into a unified form, so that they translate into whatever the targeted source understands.
Bulk/Batch Processing: Handle one, two or even more operations within one transaction.
Conflict Management: Nothing a developer ever wants to deal with; this should be handled seamlessly by the data layer.

Model Improvements

The way a model is defined in ember-data is one thing I love about ember-data. It follows the DataMapper and ActiveRecord design patterns and provides the mechanics to configure the way we need our models to behave; it looks like this (btw: I use TS + Decorators for my code):

// src/data/models/post.ts
import { attr, belongsTo, hasMany } from '@ember-decorators/data';
import DS from 'ember-data';
import Model from 'ember-data/model';
import Comment from './comment';
import User from './user';

export default class Post extends Model {
	@attr title: string;
	@attr body: string;
	@attr('date') date: Date;
	@belongsTo('user') author: User;
	@hasMany comments: DS.ManyArray<Comment>;
}

// src/data/models/user.ts
import { attr, hasMany } from '@ember-decorators/data';
import DS from 'ember-data';
import Model from 'ember-data/model';
import Post from './post';

export default class User extends Model {
	@attr givenName: string;
	@attr familyName: string;
	@hasMany posts: DS.ManyArray<Post>;
}

// src/data/models/comment.ts
import { attr, belongsTo } from '@ember-decorators/data';
import DS from 'ember-data';
import Model from 'ember-data/model';
import Post from './post';

export default class Comment extends Model {
	@attr authorName: string;
	@attr comment: string;
	@belongsTo post: Post;
}

One thing I haven’t found yet (but it may exist in ember-data somehow) is a way to define how relationships behave when they are altered. Taking the example above, suppose I have the following code:

// findRecord() returns a promise, so wait for the record before destroying it
const post = await store.findRecord('post', 123);
await post.destroyRecord();

This is translated into an HTTP request: DELETE /posts/123. The backend deletes the post and will most likely also delete the associated comments. The most common server-side storages are databases, and in an RDBMS you can declare actions that are performed when relationships are altered, e.g. onUpdate="NO ACTION|SET NULL|CASCADE|RESTRICT". For the time being I haven’t found a way to mimic this behavior in ember-data. Something like:

@hasMany({onDelete: 'SET NULL'}) posts: DS.ManyArray<Post>;

If we could instruct ember-data to behave the same way as our backend, that would be a very welcome way to keep the two in line.
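To make the idea concrete, here is a minimal sketch of how such a rule could behave on the client. Everything in it is an assumption for illustration: TinyStore, CommentRecord and seed() are made-up names, nothing like this exists in ember-data, and only three of the four actions are covered (NO ACTION is left out).

```typescript
// Hypothetical sketch: these names are illustrative, not an ember-data API.
type OnDelete = 'CASCADE' | 'SET NULL' | 'RESTRICT';

interface CommentRecord {
  id: string;
  postId: string | null; // foreign key to the owning post
}

class TinyStore {
  posts = new Set<string>();
  comments = new Map<string, CommentRecord>();

  // onDelete plays the role of the option proposed for @hasMany above.
  constructor(private onDelete: OnDelete) {}

  deletePost(postId: string): void {
    const related: CommentRecord[] = [];
    this.comments.forEach(comment => {
      if (comment.postId === postId) related.push(comment);
    });

    // RESTRICT: refuse to delete while dependents exist.
    if (this.onDelete === 'RESTRICT' && related.length > 0) {
      throw new Error(`post ${postId} still has comments`);
    }

    for (const comment of related) {
      if (this.onDelete === 'CASCADE') {
        this.comments.delete(comment.id); // remove dependents as well
      } else {
        comment.postId = null; // SET NULL: keep the comment, drop the link
      }
    }
    this.posts.delete(postId);
  }
}

// Builds a store with one post and two comments for the given rule.
function seed(rule: OnDelete): TinyStore {
  const store = new TinyStore(rule);
  store.posts.add('123');
  store.comments.set('c1', { id: 'c1', postId: '123' });
  store.comments.set('c2', { id: 'c2', postId: '123' });
  return store;
}
```

With a rule like this declared on the relationship, the local state after destroyRecord() could match what the database did, without waiting for a reload.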

The second issue with relationships is code related. I’m very particular when it comes to naming things (classes, interfaces, methods, etc.) and I can become very frustrated when things aren’t given their correct name(s) on the first try. It motivates me to resolve this dissonance, and I won’t find my inner peace until that thing is named correctly. Correctly in the sense that the name isn’t ambiguous, that it neither implicitly nor explicitly allows for wrong assumptions, and that it doesn’t drive me nuts because it is still the wrong name for what it is. Back to the example at hand: ember-data allows two different relationships, belongsTo() and hasMany(). Now the type for the hasMany() relationship is ManyArray. A many. array.? Does that mean there is a OneArray? Is that the type for the belongsTo() relationship? Does a one. array. even make sense? No! None of this! The correct type for the hasMany() relationship is a Collection with the referenced model as type parameter: @hasMany posts: Collection<Post>;.

With the arrival of for...of loops it isn’t necessary anymore to use a forEach() callback. for...of works out of the box with native arrays but not with the ManyArray, because it has no iterator symbol attached to it, although its name alone suggests that it is an array. All I can say for now is: ember-data has a ManyArray with many things going wrong here. Maybe all this is just ignored by many people who take forEach() as usual; some are strongly with me on this one, others will fall in between. That’s all fine. It just shows my strong will to craft perfect and beautiful APIs (= extremely pleasant to use), which the reader might take into account for the later topics in this article.
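As a rough sketch of what such a Collection could be, the class below implements the standard iteration protocol, which is all for...of needs. The Collection class, its add() method and the PostLike interface are assumptions for illustration, not an existing ember-data type.

```typescript
// Hypothetical Collection type; name and methods are assumptions.
class Collection<T> implements Iterable<T> {
  private items: T[];

  constructor(items: T[] = []) {
    this.items = items.slice(); // copy, so callers cannot mutate our state
  }

  get length(): number {
    return this.items.length;
  }

  add(item: T): this {
    this.items.push(item);
    return this;
  }

  // Implementing Symbol.iterator is what makes for...of work.
  [Symbol.iterator](): Iterator<T> {
    return this.items[Symbol.iterator]();
  }
}

interface PostLike {
  title: string;
}

const posts = new Collection<PostLike>([{ title: 'First' }, { title: 'Second' }]);

// No forEach() callback needed.
const titles: string[] = [];
for (const post of posts) {
  titles.push(post.title);
}
```

The name makes no promise about being an array, and the type still supports the loop syntax people expect from one.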

Orchestration of Sources

In today’s apps we easily end up dealing with more than one source/connection to our backend. The most prominent example might be an offline source (localStorage or IndexedDB), some form of online source (REST API or GraphQL) and a synchronization between them: while the app is offline, data is stored in the offline source, and once the connection is back, the offline data is synchronized with the server and vice versa. Ember-data is by default designed with one source in mind. Synchronization may happen at the adapter level (e.g. with ember-pouch). Besides offline/online synchronization, an app can talk to multiple backend sources, for instance a REST API for the usual CRUD operations plus a socket the app can subscribe to in order to get data pushed from the server when it is available, without explicitly requesting it.

Luckily there is ember-orbit/orbitjs, which handles sources perfectly and comes with strategies for how to synchronize, prioritize and coordinate among them. There is no way around copying this beauty one-to-one into @ember/orm!!
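To illustrate the offline/online scenario described above, here is a deliberately small sketch of a coordinator between two sources. It is not orbitjs’s API; Source, MemorySource, Coordinator and Change are hypothetical names, and conflict handling is ignored entirely.

```typescript
// Hypothetical names throughout; this is not how orbitjs is implemented.
interface Change {
  id: string;
  title: string;
}

// A source is anything that can accept a change.
interface Source {
  apply(change: Change): void;
}

class MemorySource implements Source {
  records = new Map<string, Change>();

  apply(change: Change): void {
    this.records.set(change.id, change);
  }
}

class Coordinator {
  private queue: Change[] = [];
  online = false;

  constructor(private local: Source, private remote: Source) {}

  // Writes always hit the local source; the remote one only when reachable.
  write(change: Change): void {
    this.local.apply(change);
    if (this.online) {
      this.remote.apply(change);
    } else {
      this.queue.push(change);
    }
  }

  // Called when the connection comes back: replay queued changes in order.
  goOnline(): void {
    this.online = true;
    const queued = this.queue;
    this.queue = [];
    queued.forEach(change => this.remote.apply(change));
  }
}
```

A real strategy would also have to pull server changes back into the local source and decide what to do when both sides changed the same record.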

Unified Queries

Ember-data’s store methods take parameters and pass them to adapters, which in turn translate this input into an adapter-specific format and send it along. A server-side ORM would provide some form of query class with a unified API, and adapters take that input and translate it into the SQL dialect of the database they connect to. Ember-data behaves similarly, but because there is no strong convention for what the options hash should look like across all retrieval methods (findRecord/findAll/peekRecord/peekAll/query), the options are basically tied to the adapter. Ember-pouch relies on different options than the JSON API adapter, and the ember-local-storage adapter (which implements JSON API for localStorage) only implements a subset of the options the JSON API adapter takes (PS: it ignores the include option). Changing the adapter requires checking (and possibly adjusting) every retrieval call in your application; or in the words of Boromir: “One does not simply change the adapter of ember-data” (I’ll leave imagining this picture to you :D). In my eyes, this is a fatal design flaw in ember-data’s API. Instead, the API should offer a unified way to describe a query and let the adapters translate it accordingly. Maybe something like this (just a few different ways an API could look):

// 1) in regards to the current ember-data API
store.query('post')
	.fields(['id', 'title', 'body'])
	.include('comment', 'author')
	.page(2)
	.pageSize(10)

// 2) with query class and finalization call
const q = new Query('post');
const result = q
	.fields(['id', 'title', 'body'])
	.include('comment', 'author')
	.page(2)
	.pageSize(10)
	.find();

// 3) fluent query class instantiation syntax, without finalization call for reusable queries
const postQuery = Query.create('post')
	.fields(['id', 'title', 'body'])
	.include('comment', 'author');

// filtered
const filteredPosts = postQuery.filterBy('title', '*performance').find();

// paginated
const pagedPosts = postQuery.page(2).pageSize(10).find();

// findOne
const post = postQuery.findOne(123);

// findOne with filter
const performancePost = postQuery.findOneBy('title', '*performance');

A few explanatory comments:

The (default) store would be injected (see below for what I mean by default store here)
Methods like fields() or include() could take their arguments either way (as an array or as a list of arguments); I just demonstrated both

I borrowed the idea of chained query methods from a PHP ORM called Propel, which I think does a fantastic job of abstracting queries into a unified API; TypeORM is what this looks like in JavaScript/TypeScript land.
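The core of the idea, one neutral query description that each adapter translates on its own, can be sketched in a few lines. All names below (QueryDescriptor, describe(), toJsonApiUrl(), toOffsetLimit()) are assumptions for illustration, the naive pluralization and the default page size of 25 are placeholders, and the query string follows the common JSON API conventions for fields, include and page.

```typescript
// Hypothetical sketch; not an existing ember-data or Propel API.
interface QueryDescriptor {
  type: string;
  fields: string[];
  include: string[];
  page?: number;
  pageSize?: number;
}

class Query {
  private descriptor: QueryDescriptor;

  constructor(type: string, descriptor?: QueryDescriptor) {
    this.descriptor = descriptor || { type, fields: [], include: [] };
  }

  // Every call returns a new Query, so a base query stays reusable.
  private extend(patch: Partial<QueryDescriptor>): Query {
    return new Query(this.descriptor.type, { ...this.descriptor, ...patch });
  }

  fields(...fields: string[]): Query { return this.extend({ fields }); }
  include(...include: string[]): Query { return this.extend({ include }); }
  page(page: number): Query { return this.extend({ page }); }
  pageSize(pageSize: number): Query { return this.extend({ pageSize }); }

  describe(): QueryDescriptor { return { ...this.descriptor }; }
}

// Adapter A: translate into JSON API style query parameters.
function toJsonApiUrl(query: Query): string {
  const d = query.describe();
  const params: string[] = [];
  if (d.fields.length) params.push(`fields[${d.type}]=${d.fields.join(',')}`);
  if (d.include.length) params.push(`include=${d.include.join(',')}`);
  if (d.page !== undefined) params.push(`page[number]=${d.page}`);
  if (d.pageSize !== undefined) params.push(`page[size]=${d.pageSize}`);
  // Naive pluralization, good enough for the sketch.
  return `/${d.type}s` + (params.length ? `?${params.join('&')}` : '');
}

// Adapter B: translate the same query into offset/limit options.
function toOffsetLimit(query: Query): { offset: number; limit: number } {
  const d = query.describe();
  const limit = d.pageSize !== undefined ? d.pageSize : 25; // assumed default
  const page = d.page !== undefined ? d.page : 1;
  return { offset: (page - 1) * limit, limit };
}

const base = new Query('post').fields('id', 'title', 'body').include('comments', 'author');
const secondPage = base.page(2).pageSize(10);
```

Here toJsonApiUrl(secondPage) yields '/posts?fields[post]=id,title,body&include=comments,author&page[number]=2&page[size]=10', while toOffsetLimit(secondPage) yields an offset and limit of 10 each. Swapping the adapter changes the translation, not the calling code.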

Store Mechanics and Design Patterns: ActiveRecord, UnitOfWork, DataMapper, ….

ORMs these days are pretty well backed by design patterns to get the architecture right. The model, with the help of the adapter, implements the DataMapper, and the ember-data store can be seen as the repository, where peekRecord() and peekAll() retrieve the in-memory records (in ember-orbit the same retrieval methods of the store are available on store.cache, its in-memory reference). From now on, I will call it repo instead of store. Regarding design patterns for the model, there are ActiveRecord and UnitOfWork. Happily, these two design patterns work side by side and can enhance each other. Here is what @ember/orm could look like when they come into action:

const post = new Post({
	title: 'Some words on performance',
	body: 'Lorem Ipsum dolor sit amet ....'
});

// PERSISTING
// Active Record (AR)
post.save();
// UnitOfWork (UoW)
repo.persist(post);
repo.commit();

// DELETING
// AR
post.delete();
// UoW
repo.remove(post);
repo.commit();

This would also eliminate the hassle of telling deleteRecord() and destroyRecord() apart (which one was the persisting one?). ActiveRecord will probably be the choice at hand for manipulating one record at a time, while the UnitOfWork pattern allows for batch/bulk processing: keep calling repo.persist() and repo.remove(), and a final repo.commit() launches the transaction off to the backend. A GraphQL backend will support this out of the box; JSON API will provide it with the next version, as an extension or built in (I didn’t follow the discussion closely enough to say clearly which it will be… we will get it). This is hackable as of today, and Netflix recently released an article on how to do batch requests with ember using the ember-batch-request addon it requires (they had to introduce a very ugly API (sorry guys, it just is) in order to get the job done, which the UoW pattern would easily eliminate).
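The UnitOfWork half of this is small enough to sketch: operations are only queued until commit() hands the whole batch to a transport. The Repository class, the Operation shape and the synchronous Transport callback are assumptions made for this illustration; a real transport would be asynchronous and would need error handling.

```typescript
// Hypothetical sketch of the proposed repo; not an existing API.
interface Rec {
  type: string;
  id: string;
}

type Operation =
  | { op: 'persist'; record: Rec }
  | { op: 'remove'; record: Rec };

// A transport receives the whole batch at once, e.g. as one HTTP request.
type Transport = (operations: Operation[]) => void;

class Repository {
  private pending: Operation[] = [];

  constructor(private send: Transport) {}

  persist(record: Rec): void {
    this.pending.push({ op: 'persist', record });
  }

  remove(record: Rec): void {
    this.pending.push({ op: 'remove', record });
  }

  // Nothing leaves the client until commit() is called.
  commit(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }

  // Drop everything queued since the last commit.
  rollback(): void {
    this.pending = [];
  }
}
```

Three calls to persist() or remove() followed by one commit() result in a single transport call carrying three operations, which is exactly the batch behavior described above.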

Subscriptions

In order to receive push notifications from the backend, a query won’t be sufficient. GraphQL to the rescue, which has subscriptions built in as a root type. It is less GraphQL itself that comes to the rescue than the wording it brings with it. Basically, the frontend can subscribe to events fired by the backend and pass the result to a handler (which may, for example, update the repository with records of a given query), and unsubscribe when the page is left so that no unnecessary data is transmitted between client and server. It could look something like this:

const q = Query.create('post').find();
const handler = new ResultSetMutationHandler(q);

// subscribe to them
repo.subscribe('postAdded', handler);
repo.subscribe('postUpdated', handler);
repo.subscribe('postRemoved', handler);

// unsubscribe when leaving the page
repo.unsubscribe('postAdded', handler);
repo.unsubscribe('postUpdated', handler);
repo.unsubscribe('postRemoved', handler);

This API would be very rudimentary, but it allows subscribing to events that do not necessarily manipulate the data of the query you are currently working with. Speaking of CRUD, the use case here is screaming for a shortcut method. Ember-orbit provides a store.liveQuery(), but first, I have already proposed Query classes, and second, the name liveQuery() doesn’t appeal to me (it will probably just be me having an … let’s say … “uncomfortable feeling” about it). I must say that I have basically no experience with subscriptions, but this is the kind of API I would expect for dealing with them. Alternatively, I could also see a Subscription class that can be subclassed (and ships with subclasses for common scenarios), which works similarly to Query classes, plus something (repo?) to hang them on.
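The subscribe/unsubscribe mechanics themselves are simple to sketch. In the version below, handlers are plain functions rather than handler objects, and receive() stands in for a message arriving over a socket or a GraphQL subscription; EventRepository, PostPayload and receive() are hypothetical names.

```typescript
// Hypothetical sketch; handler objects are simplified to plain functions.
type SubscriptionHandler<T> = (payload: T) => void;

class EventRepository<T> {
  private handlers = new Map<string, Set<SubscriptionHandler<T>>>();

  subscribe(eventName: string, handler: SubscriptionHandler<T>): void {
    let set = this.handlers.get(eventName);
    if (!set) {
      set = new Set<SubscriptionHandler<T>>();
      this.handlers.set(eventName, set);
    }
    set.add(handler);
  }

  unsubscribe(eventName: string, handler: SubscriptionHandler<T>): void {
    const set = this.handlers.get(eventName);
    if (set) set.delete(handler);
  }

  // Stand-in for an event pushed by the backend.
  receive(eventName: string, payload: T): void {
    const set = this.handlers.get(eventName);
    if (set) set.forEach(handler => handler(payload));
  }
}

interface PostPayload {
  id: string;
  title: string;
}

const events = new EventRepository<PostPayload>();
const visiblePosts = new Map<string, string>();

// Keeps a local result set in sync with pushed posts.
const upsert: SubscriptionHandler<PostPayload> = payload => {
  visiblePosts.set(payload.id, payload.title);
};

events.subscribe('postAdded', upsert);
events.receive('postAdded', { id: '1', title: 'Some words on performance' });
events.unsubscribe('postAdded', upsert);
```

After unsubscribe() the handler no longer sees pushed posts, which corresponds to leaving the page in the example above.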

Conflict Management

Here is the deal: WE DO NOT WANT TO DEAL WITH CONFLICTS! Though we have to…. do we?

With multiple sources in place, synchronizing between them can give rise to conflicts. Orbitjs (and ember-orbit) allows stores to be forked and merged. I personally have had no use case for this so far, but maybe this is a way to handle conflicts when working on different sources and merging them back together (although the ember-orbit docs strongly urge not to do so). With multiple stores/repos in place, I consider the repo injected by Ember/Glimmer’s DI system the default repo/store. Working with them could look like:

const fork = repo.fork();
const post = new Post({
	id: '123',
	title: 'Some words on design',
	body: 'Lipsum bla bla....'
}, fork);
post.save();

try {
	repo.merge(fork);
} catch (e) {
	// deal with conflicts here
}

repo.commit();

We would still have to deal with the conflict, just with an API for it. Maybe we can do better here with something called a conflict-free replicated data type (CRDT). I may recall this incorrectly, but the takeaway is: the model/DataMapper is capable of resolving conflicts on its own, based on how the model is structured. It is, for example, used by a decentralized peer-to-peer database called orbit-db (not to be confused with orbitjs), which is built on the IPFS data-storage system. For the time being I think this isn’t easy to implement with JSON API, but it would take away the hassle of dealing with conflicts completely. I merely point to it as a second option for dealing with conflicts.
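To give an impression of what “resolving conflicts on its own” can mean, here is a sketch of one of the simplest CRDTs, a last-writer-wins register applied per field. This is an illustration of the general technique only, not a claim about how orbit-db works internally; Lww, LwwRecord, mergeField() and mergeRecord() are hypothetical names, and last-writer-wins silently discards the older of two concurrent edits to the same field.

```typescript
// Last-writer-wins register per field; names are illustrative assumptions.
interface Lww<T> {
  value: T;
  timestamp: number; // logical or wall clock of the write
  replica: string;   // tie-breaker when timestamps are equal
}

// Deterministic choice between two versions of the same field.
function mergeField<T>(a: Lww<T>, b: Lww<T>): Lww<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  return a.replica >= b.replica ? a : b;
}

type LwwRecord = { [field: string]: Lww<string> };

// Merging is field by field, so edits to different fields never conflict.
function mergeRecord(a: LwwRecord, b: LwwRecord): LwwRecord {
  const result: LwwRecord = {};
  for (const key of Object.keys(a).concat(Object.keys(b))) {
    const left = a[key];
    const right = b[key];
    result[key] = left && right ? mergeField(left, right) : (left || right);
  }
  return result;
}

const onlineCopy: LwwRecord = {
  title: { value: 'Some words on design', timestamp: 2, replica: 'online' },
  body: { value: 'Lipsum', timestamp: 1, replica: 'online' }
};

const offlineCopy: LwwRecord = {
  title: { value: 'Some words on performance', timestamp: 1, replica: 'offline' },
  body: { value: 'Lipsum bla bla', timestamp: 3, replica: 'offline' }
};

const merged = mergeRecord(onlineCopy, offlineCopy);
```

The merge keeps the newer title from the online copy and the newer body from the offline copy, and it gives the same result regardless of which side is merged into which, so no try/catch around merge() is needed.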

API Design Proposal for @ember/orm

So here is what I could imagine as an API for @ember/orm (at least for Model, Query and Repository):

// declared as classes, because interfaces cannot describe constructors this way
declare class Model {
	constructor(properties: object, repository?: Repository);
	save(): void;
	delete(): void;

	isNew(): boolean;
	isDirty(): boolean;
	// ...
}

declare class Query<K extends keyof ModelRegistry> {
	constructor(model: K, repository?: Repository);
	fields(...fields: string[]): this;
	include(...relationships: string[]): this;
	// ...

	// pagination
	page(page: number): this;
	pageSize(size: number): this;

	// offset/limit
	offset(offset: number): this;
	limit(max: number): this;

	// filtering
	filterBy(fieldName: string, search: string): this;
	// and() and or() methods

	// finalization
	find(): Collection<ModelRegistry[K]>;
	findOne(id: string): ModelRegistry[K];
	findBy(fieldName: string, search: string): Collection<ModelRegistry[K]>;
	findOneBy(fieldName: string, search: string): ModelRegistry[K];
}

interface Repository {
	fork(): Repository;
	merge(repository: Repository): void;
	persist(record: Model): void;
	remove(record: Model): void;
	commit(): void;
	rollback(): void;

	subscribe(eventName: string, handler: SubscriptionHandler): void;
	unsubscribe(eventName: string, handler: SubscriptionHandler): void;
}

// reference: for <K extends keyof ModelRegistry> see @types/ember-data (this is how I hope it can be typed)

Final Thoughts

To sum it up, my final thoughts on this:

I used repository instead of store towards the end to better describe fork() and merge() (since they work better with repository wording) and to make people aware of it
I even think that store may be the better word in an ember context (and not just because it has grown historically). If, however, the retrieval methods are dropped, then repository is better to avoid confusion (= give it a fresh start!)
I unfortunately have no good idea for how to load delayed data (= async), but I can tell you that working with proxy objects from a code perspective (= not templates/hbs) is a nightmare!
I am by no means an expert when it comes to data(base) handling or anything that goes beyond calls to the API I outlined above
I love beautiful, thorough, well-crafted APIs that are pleasant to use
I think ember-data doesn’t have this kind of API (anymore), especially when it comes to today’s use cases. I think ember-orbit is the more powerful data layer for ember apps at the moment
I love to use beautiful and thorough APIs
I think ember’s data layer deserves one of these
I think ember’s data layer should be called @ember/orm, since I think this is the more appropriate name for it
That’s my take on it; if some of this makes it into ember’s data layer, I am more than happy

– gossi