Let's talk about Data Models and ORMs

brown wooden map board — Photo by Brett Zeck on Unsplash

Background

I can’t really pinpoint the moment that got me thinking. It was probably related to the sheer number of code tests that I’ve been reviewing lately and the rising phenomenon of “Framework Developers”.

I’m not talking about developers who actively develop frameworks. No. I’m talking about people who actually learn to program by learning a single framework and then sticking to it. Think “React developer” or “Laravel developer”. I kid you not, but I’ve been witnessing more and more instances of people who learn to use a framework, without ever learning the language behind it (for those unfamiliar – it’s JavaScript and PHP, respectively).

But it’s not even that. What actually bothers me is that we are starting to forget that there is an actual database behind all the Doctrines and Eloquents. We are taking ORMs for granted without ever stopping to think about what they are. They became so good that you just know you can throw whatever you want at them, and they’ll somehow, automagically, store it and later retrieve it for you.

And this, kids, is a back-to-basics article. Back to what objects and relational databases are, and how we use mappers to fill the gap in between. So, buckle up, it’s gonna be a ride!

Data Models

Just stop and think about it for a moment. What is it that you are doing as a developer?

You’re writing code, right? That’s kind of obvious. But, on a higher scale of things, what you are actually doing is converting business requirements into computer code. And even though “business requirement” might carry some negative connotation, it really boils down to a requirement of a sort. You are MODELLING a thing based on a use-case that somebody is (hopefully) paying you for.

One might ask, and rightfully so, how accurate is what you are doing? And, as probably every project manager will tell you – it usually sucks and sometimes completely misses the point. But hey, we learn and develop.

So the next question is – how accurate does it have to be? How many details do you need to capture when translating somebody’s wish into a computer signal?

Well, let me give you a perfect example. A Map. Yeah. Like the one pictured at the top of this article. How detailed are they? I think we could probably agree that they’re as bare as it gets, as in – they’re definitely not depicting every single tree and building and whatever else is down there. But are they USEFUL? Well, ask Columbus about it! They capture just enough information to allow you to do whatever the heck it is that you wanted to do (e.g. meet your Tinder date or discover a new continent; whatever you’re feeling up for doing tonight).

You see where I’m going with this? The same is true about data really. When modelling a business requirement, what you want to do is capture just enough information so that it can do whatever it is that it’s intended to do. Nothing more and nothing less. Just enough. And that’s exactly what a data model is!

But how do you capture that info, right? Which “modelling” tool do you use for shaping the requirement?

Assuming that you are using one of the popular languages, you are most likely using Objects for modelling the requirements. And it’ll probably come as no surprise that this “technique” is referred to as Object Oriented Programming (OOP).

Yep. OOP is really nothing more than a way of translating those vocal / written requirements into a form that’s understandable to computers. It works by literally translating “spoken” models (e.g. I want X and Y to do Z) into Objects and defining how they can interact.

But is it the ONLY way? Absolutely not. It’s just the most convenient way that we have. There’s also a procedural approach that relies heavily on usage of functions applied to the data structures, but that’s a completely different story now.

Let’s talk a bit about how we store objects in memory.

Objects in Memory

Let me open this with a question. Let’s assume you have an instance of a class called Person:

person = new Person("Jane Doe")

person.address = new Address("John Doe Street 55")

If I were to ask you to send me this object as a response to an HTTP request (e.g. a GET request), what would you say? Would it be possible?

If you’re coming from the world of automagical stuff (I’m looking at you Laravel!), you’d probably say – yes, of course, why not?

Or you might answer – well, sure, I just need to convert it to JSON (or XML, or whatever the format you fancy these days). No biggie. And if you look into what your automagical framework is doing, you’ll probably notice it’s sending JSON back as well.

Well, what you may or may not know is that, behind all this, there is some form of conversion from an Object into JSON. Yeah, I’d go as far as to call it an Object-JSON-Mapper (OJM; I just made that up!). It’s mapping your complex structure into a flat sequence of bytes – a JSON string.

But the more important question here is – WHY? Why do we need to convert our Objects into JSON in order to send them back as a response to an HTTP request?

The answer is rather simple. Objects are complex structures. An object typically holds references (memory addresses) to other objects, so its bytes end up scattered all over your RAM, and those addresses mean nothing outside of your own process.

And as you may or may not know, in languages that support threads, you are free to pass references to your objects around. That’s because the threads of a program share the same memory, so you can just go to a specific memory address and access whatever it is that you want.

But what if you want to share that OUTSIDE of your RAM? Well, you either physically take out your RAM stick and carry it to wherever it is that you want to have your data, or you convert your stuff into a simpler representation – a sequence of bytes (or, as we’d happily say – “JSON String 😀”).

What we can conclude is that Objects (in terms of OOP) are “complex” and, thus, can be thought of as high-level data structures, whereas JSON, XML and all the language-specific serialization formats are lower-level, flat representations. Oh, and this process of converting an in-memory structure into such a byte sequence is called “serialization”.
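To make that a bit more tangible, here is a minimal sketch of the “OJM” idea in Python. The `Person` and `Address` classes mirror the example above; the `serialize` and `deserialize` helpers are names I made up for illustration, and real frameworks do this with far more bells and whistles.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Address:
    street: str


@dataclass
class Person:
    name: str
    address: Optional[Address] = None


def serialize(person: Person) -> str:
    """Flatten the object graph (Person -> Address) into one JSON string."""
    return json.dumps(asdict(person))


def deserialize(payload: str) -> Person:
    """Rebuild the object graph from the JSON string."""
    data = json.loads(payload)
    address = Address(**data["address"]) if data["address"] else None
    return Person(name=data["name"], address=address)


person = Person("Jane Doe")
person.address = Address("John Doe Street 55")

payload = serialize(person)      # a plain string, safe to send over HTTP
restored = deserialize(payload)  # a brand new object with the same content
```

Note that `restored` is equal to `person` but it is not the same object. The references did not travel over the wire, only the data did.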

Now let’s switch to the other side for a bit. Let’s talk about Databases; relational databases in particular.

Relational Databases

If you recall the beginning of this article, I was whining over the fact that people seem to be taking ORMs for granted. We, for whatever reason, seem to forget that behind that fancy ORM of yours lies an actual relational database. What’s even crazier, I think we’re forgetting that Relational Databases are JUST ONE way of storing your data. Yes! Does that come as a surprise?

Relational DBs date back to 1970, when the relational model of data was proposed by Edgar Codd (yeah, that’s over 50 years ago!). The idea was rather simple – store your data as tuples (what you probably know as “rows” these days), group tuples of the same shape into relations (what you probably know as “tables”), and link them to each other through shared values (what you probably know as “foreign keys”). As simple as that, really. Did you expect anything fancier than that? Sorry to disappoint you!

Then some other fellas from IBM found that idea interesting, and they developed a language for making queries against relational data. You may know it by the name of SQL these days – Structured Query Language. Yeah 🙂 It’s really just a STRUCTURED way of QUERYING your relational data. Disappointed over the simplicity again? I hope not!

Well, it turns out that those simple ideas proved to be incredibly powerful. As in – you could store petabytes of data and still make ultra complex queries over it (as all of us who have witnessed monstrous SQL queries can confirm).

But here is a really important thing – most relational DBs accept only scalar data types as columns (I say “most” because some, PostgreSQL for example, also offer array, JSON and composite column types)! Yep. All those VARCHARs, INTegers, BLOBs, ENUMs, … they are all scalar types. Like – sequences of bytes really. No complex structure to them. And this is pure speculation on my side, but I’d assume this is the exact reason why they became so popular & powerful in the first place. They combine a set of simple ideas to produce a powerful outcome.
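Here is a small sketch of what that looks like in practice, using Python’s built-in `sqlite3` module and an in-memory database. The `person` and `address` tables and their columns are my own assumptions, chosen to match the Person example from earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE address (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    street    TEXT NOT NULL
);
""")

# Every value that goes in is a scalar: an integer or a piece of text.
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Jane Doe')")
conn.execute(
    "INSERT INTO address (person_id, street) VALUES (1, 'John Doe Street 55')"
)

# The "link" between the two tables is just a matching integer.
row = conn.execute("""
    SELECT person.name, address.street
    FROM person
    JOIN address ON address.person_id = person.id
""").fetchone()
```

There is no nested structure anywhere. The address does not live “inside” the person; it sits in its own table and points back with a plain number.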

What might surprise you though is the fact that the Relational model is just ONE possible way of storing your data. Yep. At some point in the past we also had Object-oriented databases (OODBMSs). The reasoning was simple: if we use objects to program our stuff, why not simply store those objects as … well, objects?

Turns out that this only sounds good in theory. In practice, querying such data, especially when you have N:N relations, really becomes a nightmare.

But you know what else is there? Document databases (e.g. MongoDB and 30+ others), Graph databases (e.g. Neo4j and 25+ others), Key-value databases (e.g. Memcached and 35+ others).

I could probably write a number of articles on each of these, but you might have heard of them referred to as “NoSQL” databases. The naming is very unfortunate really and, from what I understood, it emerged by accident and then just stuck around. But the whole point is that they are non-relational databases (i.e. data is not stored as tables of tuples linked to each other) and they don’t use SQL for querying the data.

For example, MongoDB stores data in a format that is very similar to JSON (called BSON – Binary JSON) and is pretty useful if you have data that doesn’t have too many N:N relations. For example, a list of news articles with their comments is a great candidate for Mongo. Just store each “news” article as an object with the users’ comments embedded inside of it and fetch them all in a single query when needed.
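As a rough illustration of that shape, here is a sketch using nothing but Python dictionaries. This is NOT the MongoDB API; the `collection` dict and the `find_article` helper are hypothetical stand-ins, and the field names are made up.

```python
import json

# One document holds the article AND its comments, nested inside.
article = {
    "_id": 1,
    "title": "Let's talk about Data Models and ORMs",
    "comments": [
        {"user": "jane", "text": "Nice read"},
        {"user": "john", "text": "SQL is not dead"},
    ],
}

# Stand-in for a document collection: documents keyed by their id.
collection = {article["_id"]: article}


def find_article(article_id: int) -> dict:
    """One lookup returns the article together with all of its comments."""
    return collection[article_id]


fetched = find_article(1)
comment_authors = [comment["user"] for comment in fetched["comments"]]

# The document is already JSON-shaped, so serializing it is trivial.
as_json = json.dumps(fetched)
```

Compare that with the relational version, where comments would sit in their own table and you would need a join to bring them back together.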

On the other hand, Graph databases are perfect if your data is highly connected! Think of how you’d model Facebook’s friend relations in a relational DB. You’d probably have tables called person (to store the list of people) and friendship (to store N:N relationships). Now think of an SQL query that will find all friends of your friends. What about people you are NOT friends with but at least TWO of your friends are (usually advertised as “Suggested people” in FB)? Good luck writing that SQL! But this is where Graph DBs shine – when you have highly interconnected data, you want to use a Graph model rather than a relational model because it just makes more sense!

By now I hope you caught the gist of all of this – each DB model (relational, document-based, graph-based, etc.) is perfect for some specific use case. But it just so happens that Relational DBs are good enough for MOST, if not ALL use-cases (another reason why Excel is so popular anyway). And that’s what makes them such a popular choice, and that’s why we have so many Object Relational Mappers around 🙂
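For the curious, here is one possible shape of that “suggested people” query, sketched with Python’s `sqlite3`. The schema follows the person and friendship tables described above; storing each friendship in both directions is my own simplifying assumption, and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendship (
    person_id INTEGER NOT NULL REFERENCES person(id),
    friend_id INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (person_id, friend_id)
);
""")

people = [(1, "Me"), (2, "Ann"), (3, "Bob"), (4, "Cid"), (5, "Dee")]
conn.executemany("INSERT INTO person VALUES (?, ?)", people)

# Me-Ann, Me-Bob, Ann-Cid, Bob-Cid, Bob-Dee; each pair stored both ways.
pairs = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)]
conn.executemany(
    "INSERT INTO friendship VALUES (?, ?)",
    pairs + [(b, a) for a, b in pairs],
)

SUGGESTED = """
SELECT candidate.name, COUNT(*) AS mutual_friends
FROM friendship AS mine
JOIN friendship AS theirs ON theirs.person_id = mine.friend_id
JOIN person AS candidate ON candidate.id = theirs.friend_id
WHERE mine.person_id = :me
  AND theirs.friend_id != :me
  AND theirs.friend_id NOT IN (
      SELECT friend_id FROM friendship WHERE person_id = :me
  )
GROUP BY candidate.id, candidate.name
HAVING COUNT(*) >= 2
ORDER BY candidate.name
"""

# People I am not friends with, but at least two of my friends are.
suggestions = conn.execute(SUGGESTED, {"me": 1}).fetchall()
```

It works, but notice that a single extra “hop” already costs a self-join, a subquery and a GROUP BY. Every additional hop adds another join, which is exactly the kind of traversal a graph model is built around.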

Finally, we get to the essence of it. Let’s talk about ORMs now!

Object Relational Mappers

By now, I’m hoping that you’ve read and understood what Objects and Relational databases are. And how objects are complex structures scattered all over your RAM, whereas relational databases are just collections of tables and columns (think Excel) that can be linked to other tables and columns (think connecting your Excel sheets). Oh, and those columns are (mostly) of some scalar types!

See the problem? If you want to store your objects into some permanent storage, hopefully to retrieve them some time in the future, you need some way of converting that complex object structure into a simple relational model. The same as you have to do when going from Objects to JSON 🙂

And that’s really where Object Relational Mappers come in. As the name implies, “all they are doing” is converting your data back and forth. From an Object Oriented structure (think classes, properties, inheritance, etc.) to a Relational structure (think tables and simple data types like integers, varchars, etc.).

Now, obviously it’s not just back-and-forth conversion that they provide. They usually add tons of other stuff on top of it, ranging from caching to having their own query languages (e.g. DQL – Doctrine’s Query Language). But in essence – all they are really doing is converting your data from one model to another. And that’s all!

A good question is – HOW do they do it, right? How do you go from a complex Object Oriented representation to a relational one? Well, depending on how “magic” your ORM is, it either forces you to be explicit about how to do it (that’s what all those annotations usually are for) or it makes a bunch of assumptions based on the types of your properties.
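To demystify it a little, here is a deliberately tiny, hand-written mapper sketched in Python with `sqlite3`. It is not how Doctrine, Eloquent or any real ORM is implemented; the `PersonMapper` class, its `save` and `find` methods and the table layout are all assumptions of mine, and it only handles inserting new objects.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str


@dataclass
class Person:
    name: str
    address: Optional[Address] = None
    id: Optional[int] = None


class PersonMapper:
    """Hypothetical explicit mapper: objects in, rows out, and back again."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS person (
                id INTEGER PRIMARY KEY, name TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS address (
                person_id INTEGER PRIMARY KEY REFERENCES person(id),
                street TEXT NOT NULL
            );
        """)

    def save(self, person: Person) -> Person:
        # Object -> rows: the nested Address becomes a row in a second table.
        cur = self.conn.execute(
            "INSERT INTO person (name) VALUES (?)", (person.name,)
        )
        person.id = cur.lastrowid
        if person.address is not None:
            self.conn.execute(
                "INSERT INTO address (person_id, street) VALUES (?, ?)",
                (person.id, person.address.street),
            )
        self.conn.commit()
        return person

    def find(self, person_id: int) -> Optional[Person]:
        # Rows -> object: join the tables and rebuild the nested structure.
        row = self.conn.execute(
            """
            SELECT person.id, person.name, address.street
            FROM person
            LEFT JOIN address ON address.person_id = person.id
            WHERE person.id = ?
            """,
            (person_id,),
        ).fetchone()
        if row is None:
            return None
        pid, name, street = row
        address = Address(street) if street is not None else None
        return Person(name=name, address=address, id=pid)


mapper = PersonMapper(sqlite3.connect(":memory:"))
jane = Person("Jane Doe")
jane.address = Address("John Doe Street 55")
saved = mapper.save(jane)
loaded = mapper.find(saved.id)
```

This is the “explicit” flavour: every column and every join is spelled out by hand. A more magical ORM would derive roughly the same SQL from annotations or from the property types.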

But you know what else is there? ODMs! Object Document Mappers. Yeah. They convert between the Object and the Document model. If you ever wanted to store your data in Mongo, chances are you dealt with an ODM to make your life easier.

We have OGMs as well! Object Graph Mappers!

I’m sure you’re getting the point by now. These thingies are used for helping you convert from one data model to another. And that’s it!

What pisses me off

Well, what pisses me off is that these things became so good and magical that we are forgetting that there’s an actual database behind them. We just rely on using them without even thinking about what they do with our data.

Most of the time they do good. Sure. They get it right on how to store & how to fetch, yeah. But sometimes they do so much shit that what would be a simple SQL query turns into a plethora of gigantic statements that kill everything along the way! But we’re not aware of it because, as I said, we keep forgetting what’s behind ORMs!

And don’t even get me started on the fact that many developers forgot how to write SQL! It’s a pity! It pisses me off. Some have never even seen it! Not to mention that there’s surely a minority who’s not aware that SQL exists at all! Damn!
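One well-known shape of this problem (my example, not tied to any specific ORM) is usually called the “N+1 queries” pattern: the mapper fetches a list of objects with one query and then quietly fires one more query per object to load a related piece. The sketch below simulates that by hand in Python; the `run` helper and the `statement_log` list are hypothetical and exist only to count the statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (
    person_id INTEGER PRIMARY KEY REFERENCES person(id),
    street TEXT NOT NULL
);
""")
for i in range(1, 6):
    conn.execute("INSERT INTO person VALUES (?, ?)", (i, f"Person {i}"))
    conn.execute("INSERT INTO address VALUES (?, ?)", (i, f"Street {i}"))
conn.commit()

statement_log = []


def run(sql, params=()):
    """Execute a statement and remember that we did."""
    statement_log.append(sql)
    return conn.execute(sql, params)


# What a naive lazy-loading mapper tends to do: 1 query + 1 per person.
lazy = []
for pid, name in run("SELECT id, name FROM person ORDER BY id").fetchall():
    street = run(
        "SELECT street FROM address WHERE person_id = ?", (pid,)
    ).fetchone()[0]
    lazy.append((name, street))
lazy_count = len(statement_log)

# What you would write by hand: a single JOIN.
statement_log.clear()
joined = run("""
    SELECT person.name, address.street
    FROM person
    JOIN address ON address.person_id = person.id
    ORDER BY person.id
""").fetchall()
join_count = len(statement_log)
```

Both approaches return the same data, but with five people the first one needs six statements and the second needs one. Swap five for fifty thousand and you can guess how the story ends.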

So yeah, that’s what pisses me off. The fact that we forgot what ORMs are, what’s behind them and why we have them in the first place. And that there are other data models (Graph, Document, etc.) and mappers (OGM, ODM) that we could use.

And that, kids, is how I want to close this article. Hoping that you leave it with some additional knowledge of different data models and the role that mappers play in that game.

Summary

Let me make a quick summary of all that we’ve discussed and what you, hopefully, will take away from this article:

You, the developer, are a medium for converting the VOCAL signal (i.e. somebody’s spoken idea) into a computer-processable signal
The most common way of doing that conversion is by using Objects and modelling interactions between them, an approach known as Object-Oriented Programming
When modelling a business requirement, what you want to capture is just the working essence of it; just enough info so that you can achieve whatever it is that you want to do
This process is called modelling; more specifically – Data Modelling
Using an OOP data model is perfect when working within a single memory space (e.g. threads of the same program); however, sharing your OOP structures via HTTP is a bit trickier
We use a lower-level data modelling approach for that – usually by converting our OOP model into a JSON / XML one. This process is referred to as “serialization”
For storage purposes, Relational Databases are most common these days. They use a way simpler model – tables of (mostly) scalar columns that are linked to each other
A problem arises when we want to convert from an OOP data model into a Relational one; that’s where ORMs come into play!
ORMs are used as a back-and-forth converter between two data models (i.e. the OOP and the Relational model)
There are also ODMs (Object Document Mappers) and OGMs (Object Graph Mappers)
Finally, remember – it’s OK to have your OOP model differ from your Relational model! You can use one structure to represent OOP data and another structure for storing it into a relational DB! Don’t get yourself locked into your database!

Useful resources

I’d be lying if I said that I wasn’t inspired by somebody else’s material. Specifically, there are two books that I read a couple of years ago and am now re-reading, which I’d absolutely recommend:

Designing Data-Intensive Applications by Martin Kleppmann – this one is literally a BIBLE when it comes to data and data design. It starts with very basic concepts, like what data is, and builds up to distributed systems and how to handle huge amounts of data in such environments. Highly recommended and here’s my last year’s review
Domain-Driven Design by Eric Evans – usually referred to as the “blue book” (there’s a “red book” as well), one of a kind and the bible of domain-driven design. If you want to step up your game and go deeper into the area of “how do you actually translate complex ideas into an object-oriented model”, this is a book that you have to read. But be warned – this is NOT a light read. Not even remotely close. This book is so dense and packed with information that it’s impossible to just “sit and read it”. No. Dedicate 20 minutes per day (or per week) to it and just crunch through it slowly and steadily. I’m on my second read now and here’s my last year’s review
