Simple Solutions 1 - Active Record versus Data Mapper

Posted by Matthias Noback

Having discussed different aspects of simplicity in programming solutions, let's start with the first topic whose simplicity should be scrutinized: persisting model objects. As you may know, the competing solutions fall into two categories: they follow either the Active Record (AR) pattern or the Data Mapper (DM) pattern, both described in Martin Fowler's "Patterns of Enterprise Application Architecture" (PoEAA).

Active record

How do we recognize the AR pattern? It's when you instantiate a model object and then call save() on it:

$user = new User('Matthias');

$user->save();

In terms of simplicity as seen from the client's perspective, this is amazing. We can't imagine anything that would be easier to use. But let's take a look behind the scenes. If we created our own AR implementation, the save() method would look something like this:

final class User
{
    public function __construct(
        private string $name
    ) {
    }

    public function save(): void
    {
        // get the DB connection

        $connection->execute(
            'INSERT INTO users SET name = ?',
            [
                $this->name
            ]
        );
    }
}

In order for save() to do its work, we need to somehow get the database connection object into save(), so it can run the necessary INSERT SQL statement. There are two options.

One, we let save() fetch the connection:

use ServiceLocator\Database;

final class User
{
    // ...

    public function save(): void
    {
        $connection = Database::getConnection();

        $connection->execute(
            'INSERT INTO users SET name = ?',
            [
                $this->name
            ]
        );
    }
}

The practice of fetching dependencies is called service location, and it's often frowned upon, but for now this does the trick. However, the simplicity score goes down, since we have to import the service locator, and call a method on it (-2 points?).
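To make the hidden cost a bit more visible, here is a minimal sketch of what such a service locator could look like. Both the Database class (declared here without the ServiceLocator namespace) and the Connection class are hypothetical stand-ins, not the API of any real library. The Connection only records the statements it receives, so the example runs without a database.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a real DB connection; it only records statements.
final class Connection
{
    /** @var list<array{string, list<mixed>}> */
    public array $executed = [];

    /** @param list<mixed> $parameters */
    public function execute(string $sql, array $parameters = []): void
    {
        $this->executed[] = [$sql, $parameters];
    }
}

// Hypothetical service locator: a globally reachable place to fetch the connection from.
final class Database
{
    private static ?Connection $connection = null;

    public static function getConnection(): Connection
    {
        // Create the connection on first use and reuse it afterwards
        return self::$connection ??= new Connection();
    }
}

final class User
{
    public function __construct(
        private string $name
    ) {
    }

    public function save(): void
    {
        // The dependency is fetched from global state, not injected
        Database::getConnection()->execute(
            'INSERT INTO users SET name = ?',
            [$this->name]
        );
    }
}

$user = new User('Matthias');
$user->save();
```

Note that nothing in the signature of User reveals that it needs a database; the dependency only shows up inside save().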

The second option is to pass the connection somehow to the User object. The wrong approach is this:

final class User
{
    // ...

    public Connection $connection;

    public function save(): void
    {
        $this->connection->execute(
            // ...
        );
    }
}

That's because the burden of providing the Connection is now on the call site where the User is instantiated:

$user = new User('Matthias');
$user->connection = /* ... get the connection */;
// ...

This would definitely cost points in the "ease-of-use" category. A better idea is to provide the connection in the framework's bootstrap code somehow:

final class User
{
    // ...

    public static Connection $connection;

    public function save(): void
    {
        self::$connection->execute(
            // ...
        );
    }
}

// Somewhere in the framework boot phase:
User::$connection = /* get the connection from the container */;

Because we don't want to do this setup step for every model class, and because we are likely doing similar things in the save() function of every model, and because we want each of our model classes to have a save() function anyway, every AR implementation will end up with a more generalized, reusable approach. The way to do that is to remove the specifics (e.g. the table and column names) and define a parent class that can do everything. This parent class defines a few abstract methods so the model is forced to fill in the details:

abstract class Model
{
    public static Connection $connection;

    abstract protected function tableName(): string;

    /**
     * @return array<string, string>
     */
    abstract protected function dataToSave(): array;

    public function save(): void
    {
        $dataToSave = $this->dataToSave();

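        // Turn ['name' => '...'] into "name = ?" plus a list of values to bind
        $columnsAndValues = implode(
            ', ',
            array_map(
                fn (string $column): string => $column . ' = ?',
                array_keys($dataToSave)
            )
        );
        $values = array_values($dataToSave);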

        self::$connection->execute(
            'INSERT INTO ' . $this->tableName()
                . ' SET ' . $columnsAndValues,
            $values
        );
    }
}

// Pass the connection to all models at once:
Model::$connection = /* get the connection from the container */;

We should award ourselves several simplicity points in the area of reusability! The AR model class is now portable and useful in different contexts. We can use the same simple solution again and again. However, we also get a lot of negative points: we are introducing a parent class, and each model class has to implement a number of abstract methods, so the number of elements (functions, classes, etc.) as well as the number of lines of code (LoC) increases dramatically.
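To see what "filling in the details" means for a concrete model, here is a rough sketch of a User that extends such a parent class. The Connection class is a hypothetical stand-in that only records statements, and the way the query is assembled is just one possible approach, not the code of an actual AR library.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a real DB connection; it only records statements.
final class Connection
{
    /** @var list<array{string, list<mixed>}> */
    public array $executed = [];

    /** @param list<mixed> $parameters */
    public function execute(string $sql, array $parameters = []): void
    {
        $this->executed[] = [$sql, $parameters];
    }
}

abstract class Model
{
    public static Connection $connection;

    abstract protected function tableName(): string;

    /**
     * @return array<string, string>
     */
    abstract protected function dataToSave(): array;

    public function save(): void
    {
        $dataToSave = $this->dataToSave();

        // Build "column = ?" pairs; the values are bound separately
        $columnsAndValues = implode(', ', array_map(
            static fn (string $column): string => $column . ' = ?',
            array_keys($dataToSave)
        ));

        self::$connection->execute(
            'INSERT INTO ' . $this->tableName() . ' SET ' . $columnsAndValues,
            array_values($dataToSave)
        );
    }
}

// The concrete model only provides the table name and the data
final class User extends Model
{
    public function __construct(
        private string $name
    ) {
    }

    protected function tableName(): string
    {
        return 'users';
    }

    protected function dataToSave(): array
    {
        return ['name' => $this->name];
    }
}

// Somewhere in the framework boot phase:
Model::$connection = new Connection();

$user = new User('Matthias');
$user->save();
```

Even in this tiny example, User now consists of three methods plus an inherited one, where the hand-written version needed only a constructor and save().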

A certain risk of this Model class is that it's going to accumulate a lot of additional behavior that is not needed by all of the model classes that extend it. The API of each model becomes very big, containing methods like find() and delete(), as well as methods for creating or loading related model objects in a dynamic way.

In fact, instead of implementing AR ourselves, it's more likely that we'll be importing a library that solves all of our current and future needs. To be honest, we already had (at least) one dependency for the DB's Connection class, but now we add another one, which itself includes many more, making our solution drop many points on the simplicity scale.

Data mapper

Let's consider the data mapper pattern now. You recognize this pattern by a model object that is instantiated and then handed over to a service that persists the object for you. The service is often called "repository", mixing in the Repository pattern (also from PoEAA, but maybe more famous in the modeling space because of the Domain-Driven Design books by Eric Evans, Vaughn Vernon, and others):

final class UserRepository
{
    public function __construct(
        private Connection $connection
    ) {
    }

    public function save(User $user): void
    {
        $this->connection->executeQuery(
            'INSERT INTO users SET name = ?',
            [
                // how does it get the name?
            ]
        );
    }
}

Since UserRepository is a service, we don't have to worry about "getting" the database connection; it's simply injected as a constructor argument. So this class has no dependency on the service locator, but of course it does have a dependency on the Connection class itself. In terms of dependencies, UserRepository::save() is therefore simpler than User::save(). However, from the perspective of the client it's less simple, because a client can no longer call $user->save(), but has to pass the User to UserRepository:

// assuming $this->userRepository is a `UserRepository`:

$user = new User('Matthias');

$this->userRepository->save($user);

This means every client that wants to save a User requires an additional dependency, so the overall number of points for dependency management may be equal for both AR and DM. However, I think we could make a strong case for putting a higher penalty on resorting to static service location, versus (constructor) dependency injection. We'll save that discussion for another article.
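As an illustration of that additional dependency, here is a sketch of a client that gets the repository injected. RegisterUser is a hypothetical service invented for this example, and the UserRepository shown here is a simplified in-memory stand-in, so the snippet runs without a database.

```php
<?php

declare(strict_types=1);

final class User
{
    public function __construct(
        private string $name
    ) {
    }
}

// Simplified stand-in: keeps users in memory instead of running an INSERT query
final class UserRepository
{
    /** @var list<User> */
    public array $saved = [];

    public function save(User $user): void
    {
        $this->saved[] = $user;
    }
}

// Hypothetical client; its dependency on the repository is visible in the constructor
final class RegisterUser
{
    public function __construct(
        private UserRepository $userRepository
    ) {
    }

    public function register(string $name): void
    {
        $user = new User($name);

        $this->userRepository->save($user);
    }
}

$userRepository = new UserRepository();
$service = new RegisterUser($userRepository);
$service->register('Matthias');
```

The client has one more constructor argument than its AR counterpart would need, but the dependency is explicit and easy to replace in a test.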

One thing to note is that User does not extend from a base Model class. In fact, it will never have to. It has no special abilities, or any methods at this point. We are free to make what we want of this object, which is why the Data Mapper pattern is naturally a better match for a domain model made with objects.

final class User
{
    public function __construct(
        private string $name
    ) {
    }
}

Earlier I skipped one important step in the UserRepository::save() implementation, which will get us into trouble now: how does the repository get the data out of the User object in order to use it in the SQL INSERT query? I wrote about this problem earlier in "ORMless; a Memento-like pattern for object persistence", but let's repeat our options here:

  1. We could add getters to the User for each property that needs to be persisted. This would widen the API way too much, exposing all the internal state to any client of User.
  2. We could use reflection to extract the data from the User's private properties. This is what ORMs implementing DM will do, but it requires dynamic programming, leaving most of the "mapping" logic implicit, not type-safe, while fully breaking the object's encapsulation, allowing the model to keep absolutely nothing to itself.
  3. We could add a single method to User that exposes all of its persistable data at once, e.g. asDatabaseRecord(): array.
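To give an impression of option 2, this is roughly what reflection-based extraction could look like. The extractData() function is a hypothetical helper, not how any particular ORM does it, and the sketch assumes PHP 8.1 or later, where reflection can read private properties without calling setAccessible() first.

```php
<?php

declare(strict_types=1);

final class User
{
    public function __construct(
        private string $name
    ) {
    }
}

/**
 * Hypothetical helper: reads all properties of an object, ignoring visibility.
 *
 * @return array<string, mixed>
 */
function extractData(object $object): array
{
    $data = [];

    foreach ((new ReflectionObject($object))->getProperties() as $property) {
        // The property is private, yet reflection hands over its value anyway
        $data[$property->getName()] = $property->getValue($object);
    }

    return $data;
}

$data = extractData(new User('Matthias'));
// $data is now ['name' => 'Matthias']
```

Nothing in User indicates that its state is being read from the outside, and the return type can't be more specific than array<string, mixed>, which is what I mean by implicit and not type-safe.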

I think the last option makes the most sense, at least in our current example. We'll add one method to User:

final class User
{
    public function __construct(
        private string $name
    ) {
    }

    /**
     * @return array<string,string>
     */
    public function asDatabaseRecord(): array
    {
        return [
            'name' => $this->name
        ];
    }
}

The repository uses this to build up the SQL query:

final class UserRepository
{
    public function __construct(
        private Connection $connection
    ) {
    }

    public function save(User $user): void
    {
        $data = $user->asDatabaseRecord();

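        // Turn ['name' => '...'] into "name = ?" plus a list of values to bind
        $columnsAndValues = implode(
            ', ',
            array_map(
                fn (string $column): string => $column . ' = ?',
                array_keys($data)
            )
        );
        $values = array_values($data);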

        $this->connection->executeQuery(
            'INSERT INTO users SET ' . $columnsAndValues,
            $values
        );
    }
}
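Loading an object works the other way around. As discussed in the comments on this article, the counterpart of asDatabaseRecord() would be a static method like fromDatabaseRecord() that the repository calls with a row from the database. The sketch below assumes the record has the same shape as the one returned by asDatabaseRecord(); the sample row is made up for the example.

```php
<?php

declare(strict_types=1);

final class User
{
    public function __construct(
        private string $name
    ) {
    }

    /**
     * @param array<string,string> $record
     */
    public static function fromDatabaseRecord(array $record): self
    {
        return new self($record['name']);
    }

    /**
     * @return array<string,string>
     */
    public function asDatabaseRecord(): array
    {
        return [
            'name' => $this->name
        ];
    }
}

// A row as the repository might receive it from a SELECT query (made-up data)
$record = ['name' => 'Matthias'];

$user = User::fromDatabaseRecord($record);
```

Both directions of the mapping now live in the model itself, and the repository only has to run the queries.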

When you approach the model object with the DM pattern like this, there is no immediate need to extract common functionality into a package, or to introduce a third-party package to the project, besides the one that contains the Connection class. If you still want to do that, the solution will become less simple again, losing some simplicity points. Compared to AR, introducing a DM-support package doesn't change the API of the model class itself by adding a lot of methods (or code in general) to it that aren't needed, but it will certainly introduce what's known as accidental complexity. This is complexity you didn't want to deal with, but that you inherit because this DM-support package (ORM) has to solve any potential persistence-related problem, not just your problems.

Conclusion

In this article I've tried to analyze the "simplicity" of two competing patterns for object persistence: Active Record and Data Mapper. With regard to dependency management and ease of use, both had some positive and negative points, resulting in similar scores. However, AR introduces many more code elements than needed, mostly because it relies on inheritance to give the model the powers it needs to persist itself. When using DM, the model objects don't inherit anything. They are plain old objects. A complicating issue for DM is how to get the data out of the object, which is not a problem for AR. You can solve this with a simple state-exposing method as we've seen, but many projects may introduce an additional DM-support package, which complicates the solution a lot. In the end, importing an additional ORM package for your persistence needs is what complicates both AR and DM solutions.

Comments
Brett Santore

I’m assuming the hydration method would be a ‘Entity::fromDatabaseRecord’ that the repository would use to return a valid entity?

Matthias Noback

Exactly!

Liked the article.

Was hoping for a generalization from UserRepository to Repository which can be inherited from.

Would also love to see actual implementations of the construction of the sql queries.

Matthias Noback

Thanks, you can find an example of how the INSERT query is built in Doctrine DBAL's Connection::insert() function.

You can definitely generalize some things in a Repository class, but watch out for the problem of expanding the API unnecessarily. The solution for me is not to extend from a base repository, but to inject such a repository as a constructor argument to the concrete repository and keep the repository interface really small (e.g. only save(), getById(), and nextId()). I did this for the TalisORM library.