This post was originally published on Kam’s Idea Log by Kam Figy on November 16, 2013.
There are many times where you need to extend the Sitecore data architecture in some way. One of the most obvious, and most used, is attaching an event handler to one of the item events (e.g. item:saved, item:renamed) but this is not by any means the only way to manipulate items before they get to the database.
This blog post came out of a discussion I had with Alex Shyba about the DataEngine, a few hours on a plane, and a lot of decompilation. I’m going to attempt to show what happens when an item gets saved and where you can plug into that.
The entry point
The most high level API is of course the
Item class. Saving an item using this API is very simple:
item["MyField"] = "newValue";
EditContext goes out of scope, this causes
item.Editing.AcceptChanges() to be invoked, and our journey down the rabbit-hole begins there.
The Item Manager
Our next step is the highest level item data API in Sitecore: the
ItemManager. At first blush
ItemManager appears to be a rather ugly giant static class, with tons of manipulaton methods:
AddVersion, etc. However, it’s not as bad as it looks.
ItemManager is basically a static facade around the current
The Item Provider
Here’s where things get interesting. You can register multiple item providers in the config under
itemManager/providers. While this appears to provide a lot of power as an extension point – being able to plugin your own high level provider – it unfortunately seems to be rather ungainly. If multiple providers are registered you can access them directly using
ItemManager.Providers, but the other
ItemManager methods only execute against the default provider. You could replace the default provider with your own, but this is less than ideal as it can lead to contention for whose item provider gets used if more than one module wants to extend using it.
Extension points aside, the
ItemProvider is the most broad data API Sitecore provides. It deals in high level objects and abstracts lower level concepts like databases and item definitions away. Most all methods in the
ItemProvider result in invocations of the next lower level construct, the
UPDATE: Nick Wesselman pointed out that some queries will bypass the Item Provider and go directly to the Data Engine. This gist has some examples of the types of things that bypass the Item Provider. In light of this I’d say the item provider is generally not a good candidate for extending things.
The Data Engine
DataEngine (generally accessed by
databaseObject.DataManager.DataEngine) is a very interesting component for extension. Unlike the item provider, the
DataEngine is database-specific. This is also the layer at which item event handlers (e.g. item:saved) are processed, but more interestingly is also a place where you can hook events in an uninterruptible fashion. Regular event handlers, such as
item:saved, are vulnerable to being disabled when someone scopes an
EventDisabler. Generally this is good, as events are disabled for performance reasons during things like serialization bulk load operations. But occasionally, such as in Unicorn, you need an event handler that cannot be disabled. Having a Unicorn event not fire would likely mean data being lost.
The internal architecture of
DataEngine works something like this: – An engine method gets invoked – The engine creates and initializes an
DataEngineCommand<> object, using a stored prototype of the command – The
DataEngineCommand has its
Execute method invoked.
EngineCommand is a generic type, and each action that the engine can take has its own implementation – for example, there is a
Sitecore.Data.Engines.DataCommands.SaveItemCommand class. Each of these command types is bootstrapped by a prototype system on the
DataEngine itself. The engine, by way of its
Commandsproperty, stores copies of each command type. When a new command is needed, the existing prototype command is cloned (
Clone()), initialized (
Initialize()), and then used to execute the action. These prototypes are settable, including via configuration, so this allows you to inject your own derived implementation of each command and thus inject ‘event handler like’ functionality into the
Item Buckets utilizes this type of functionality injection, replacing the
AddFromTemplatePrototypewith its own extended implementation. This is the relevant section of Sitecore.Buckets.config:
<database id="master" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
<obj type="Sitecore.Buckets.Commands.AddFromTemplateCommand, Sitecore.Buckets" />
This same functionality could be used to inject non-disableable event handler type functionality into other events, for example by replacing the
SaveItemPrototype. Unfortunately this method also suffers from possible contention issues if multiple modules wished to patch these commands, so be careful when using them.
DataEngine creates and executes a command, the command generally does two things: * Pass its task down to the Nexus
DataApi to be handled * Fire off any normal item event handlers (item:saved), as long as events are not disabled
Most of the methods on
DataEngineCommand are virtual so you could extend them. The most obvious candidate, of course, is the
DoExecute method that performs the basic action of the command.
Further down the rabbit-hole we arrive to the Nexus APIs.
The Nexus assembly, due to it containing licensing code I believe, is the only obfuscated assembly in Sitecore. This makes following what the data API is doing rather difficult, but I’m pretty sure I traced it back out as a call to the
databaseItem.DataManager.DataSource APIs, which thankfully are not obfuscated. It’s worth noting that the Nexus APIs appear to also be using a separatecommand-class based architecture internally. Given the sensitive nature of the Nexus assembly however, I would look elsewhere for extension points.
Sitecore.Data.DataSource appears to largely be a translation layer between slightly higher level APIs, where things like
Item are used, and the low level API of the data providers (which use constructs like IDs and
ItemDefinition instead). The data source is very generic and does not appear to be designed for extension, which is fine because we can extend both above and below it.
DataSource eventually make calls down to static methods on the
The Data Provider
There are two faces to the
DataProvider class. The internal static methods that the
DataSourceis invoking are helper methods that invoke non-static methods on all of the data providers attached to the database. (Yes, you can have multiple data providers within the same database, which may not even look at the same backend database at all…hehe) The actual data provider instances are the lowest level data APIs in Sitecore. They deal with primitive objects and have a lot of unspoken rules about them that make them tougher to implement than most other extension points. However they are also the most powerful extension point there is. Using a data provider you can manipulate the content tree in nearly any fashion for example Rhino, a data provider that makes serialized item files on disk appear to be real content items in Sitecore.
You’re still reading this?
Hopefully this is a useful post for some crazy nuts like myself who like to bend the guts of Sitecore. It’s definitely a long and byzantine road from saving an item to it getting to the database. Rather amazing that Sitecore is as fast as it is with all of these layers. Part of me suspects that some are baggage from Sitecore 4 for backwards compatibility.