All the wonderful loose coupling on the service boundary doesn't help you in the least if you tightly couple a set of services on a common store. The temptation is simply too great: sooner or later some developer will write a database join across the "data domains" of services and create co-location and schema dependencies between them. If you share data stores, you break the autonomy rule and you simply don't have a service.

Separating out data stores means at least that every service has its own "tablespace" or "database" and that in-store joins between those stores are absolutely forbidden. If you have a service managing customers and a service managing invoices, the invoice service must go through the service front for anything that has to do with customer data.
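Here is a minimal sketch of that rule, assuming an in-process stand-in for what would normally be a remote call. All names (Customer, CustomerService, InMemoryCustomerService, InvoiceService) are made up for illustration. The invoice side stores only a customer id and asks the customer service for everything else.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    billing_address: str


class CustomerService(Protocol):
    """The only way other services may reach customer data."""

    def get_customer(self, customer_id: str) -> Customer: ...


class InMemoryCustomerService:
    """Hypothetical stand-in for the customer service and its private store."""

    def __init__(self) -> None:
        self._customers = {
            "c-1": Customer("c-1", "Contoso Ltd.", "1 Main Street"),
        }

    def get_customer(self, customer_id: str) -> Customer:
        return self._customers[customer_id]


class InvoiceService:
    """Owns invoice data only; keeps customer ids, never customer rows."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers
        self._invoices = {"inv-7": {"customer_id": "c-1", "total": 120.0}}

    def render_invoice(self, invoice_id: str) -> str:
        invoice = self._invoices[invoice_id]
        # No join: customer details come through the service interface.
        customer = self._customers.get_customer(invoice["customer_id"])
        return f"{invoice_id} for {customer.name}: {invoice['total']:.2f}"
```

The invoice code never sees how or where customers are stored, which is the whole point.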

If you want to do reporting across data owned by several services, you must have a reporting service that pulls the data through service interfaces, consolidates it and creates the reports from there.
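A rough sketch of such a reporting service follows. The two callables stand in for requests to the invoice and customer services; their names and shapes are assumptions made for this example only. The consolidation (what a join and GROUP BY would have done in the store) happens inside the reporting service.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical service calls; in practice these would be remote requests.
ListInvoices = Callable[[], Iterable[Tuple[str, float]]]  # (customer_id, amount)
GetCustomerName = Callable[[str], str]


def revenue_by_customer(list_invoices: ListInvoices,
                        get_customer_name: GetCustomerName) -> Dict[str, float]:
    """Consolidate invoice totals per customer inside the reporting service."""
    totals: Dict[str, float] = defaultdict(float)
    for customer_id, amount in list_invoices():
        totals[customer_id] += amount
    # Resolve names through the customer service, once per customer.
    return {get_customer_name(cid): total for cid, total in totals.items()}


def fake_list_invoices():
    """Made-up data standing in for the invoice service."""
    return [("c-1", 100.0), ("c-2", 40.0), ("c-1", 25.0)]


def fake_get_customer_name(customer_id):
    """Made-up data standing in for the customer service."""
    return {"c-1": "Contoso", "c-2": "Fabrikam"}[customer_id]
```

Calling revenue_by_customer(fake_list_invoices, fake_get_customer_name) gives one total per customer name without the two stores ever touching each other.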

Will this all be a bit slower than coupling in the store? Sure. It will make your architecture infinitely more agile, though, and allow you to implement a lot of clustering scalability patterns. In that way, autonomy is not about making everything a Porsche 911; it's about making the roads wider so that nobody (including the Porsche) ends up in a traffic jam all the time. It's also about paving roads that not only let you get from A to B in one stretch, but also have something useful called "exits" that let you get off or on that road at any other place between those two points.

If you decide to throw out your own customer service and replace it with a wrapper around Siebel, your invoice service will never learn about that change. If the invoice service were reaching over into co-located tables owned by the (former) customer service, you'd have a lot of work to do to untangle things. You don't need that untangling and all that complication. As an architect you should keep things separate from the start and make it insanely difficult for developers to break those rules. Having different databases and, better yet, scattering them over several machines, at least at development time, makes breaking the rules hard enough that the discipline holds.
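To make the swap concrete, here is a small sketch. FakePackagedCrm is an invented stand-in and does not reflect the real Siebel API; the other names are hypothetical as well. What matters is that invoice_header depends only on get_customer_name and cannot tell which implementation sits behind it.

```python
class HomeGrownCustomerService:
    """The original, self-built customer service (hypothetical)."""

    def __init__(self):
        self._rows = {"42": "Contoso Ltd."}

    def get_customer_name(self, customer_id: str) -> str:
        return self._rows[customer_id]


class FakePackagedCrm:
    """Made-up stand-in for a packaged CRM; NOT the real Siebel API."""

    def __init__(self):
        self._accounts = {"42": {"AccountName": "Contoso Ltd."}}

    def fetch_account(self, account_key: str) -> dict:
        return self._accounts[account_key]


class PackagedCrmCustomerService:
    """Wrapper exposing the same contract on top of the packaged CRM."""

    def __init__(self, crm: FakePackagedCrm):
        self._crm = crm

    def get_customer_name(self, customer_id: str) -> str:
        return self._crm.fetch_account(customer_id)["AccountName"]


def invoice_header(customers, invoice_id: str, customer_id: str) -> str:
    """Invoice-side code: depends only on get_customer_name()."""
    return f"Invoice {invoice_id} / {customers.get_customer_name(customer_id)}"
```

Passing either HomeGrownCustomerService() or PackagedCrmCustomerService(FakePackagedCrm()) to invoice_header produces the same header, with no change on the invoice side.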

Thursday, May 27, 2004 3:45:34 AM UTC
Nice post, especially the Porsche 911 vs. wider roads & exits metaphor! :)

I think this is one of the basic rules about SOA and/or your services, and the first thing most developers will complain about. The performance issue is really something they can't see beyond. When defining services, they want A LOT in THE LEAST possible number of services, just because of a performance issue that isn't even there yet ;)

Anyway, I've got two questions.

1 - Forgot the first, never mind this one! ;)
2 - How about deleting? Referential integrity is completely gone when you, for example, delete a product while there's an order referencing that product. Assume, for the sake of the example, that the order lives in a different service.

Now we have two problems.
- In the order service, we can never assume that the product is still alive. We might not be responsible for the product service, or its implementation might be replaced by an external application or something, like the Siebel example you gave.
- In both services, we can never just delete something, only mark it as deleted or something. Not even in the order service. We presume the orders are ours and no one will reference them, because what's the point? But you never know what the future brings, so again, we can't delete rows.
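The "only mark it as deleted" idea from the second point could look roughly like the sketch below. It is an assumption about one way to do it, not a prescribed design, and Product and ProductService are made-up names. Deleting flips a flag, listings hide the product, and lookups by id still resolve so that an old order in another service does not end up pointing at nothing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Product:
    product_id: str
    name: str
    deleted: bool = False


class ProductService:
    """Hypothetical product service that never removes rows."""

    def __init__(self) -> None:
        self._products: Dict[str, Product] = {}

    def add(self, product_id: str, name: str) -> None:
        self._products[product_id] = Product(product_id, name)

    def delete(self, product_id: str) -> None:
        # Mark only; the row stays so old orders can still resolve it.
        self._products[product_id].deleted = True

    def list_active(self) -> List[Product]:
        return [p for p in self._products.values() if not p.deleted]

    def get(self, product_id: str) -> Product:
        # Lookups by id still succeed for deleted products.
        return self._products[product_id]
```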

Any ideas, suggestions, anything? I'd like to hear much, much more about SOA! Everyone has definitions of what it is supposed to be, but I haven't yet seen real implementations (with the complete story of how they arrived at the services and everything)! :(
Thursday, May 27, 2004 3:49:36 AM UTC
Never delete. Buy more disks. You'll want to data-mine in 5 years.
Clemens Vasters
Thursday, May 27, 2004 8:57:57 PM UTC
To start with, I believe in SOA, as we need to change how modern enterprise applications are built! For the update process, I completely agree with holding to service boundaries.

Hard and fast rules, however, rarely hold when you take the example to an extreme. The absurdity quickly becomes apparent and you can fall back from the extreme to find the point where you get the best bang for the buck. Reports and lookups that cross boundaries are delicate decisions, too delicate for an extreme rule that you may never cheat in some way.

Refusing to cache with a background sync for refresh, or to do a database join where appropriate, would be like having to go to Saudi Arabia for each tank of gas in that Porsche or, taken to an extreme, for each rotation of the cylinder that pulls in more gas from the injectors/carburetor. Of course we 'cache' the gasoline at gas stations, refineries, etc. along the supply chain. If you want to keep going with analogies and state that the gas station is a local service that caches data, etc... sure. At some point, though, the speed of having the data right there for you (an in-flight refueling of a jet by a fuel tanker?) is a necessity, and an architect has to weigh the flexibility vs. performance vs. complexity trade-offs and make a decision.
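As a rough illustration of the "gas station", here is a small read-through cache in front of a remote lookup. Note that it swaps the background sync for simpler time-based expiry, and that CachedLookup, fetch and ttl_seconds are names invented for this sketch.

```python
import time
from typing import Callable, Dict, Tuple


class CachedLookup:
    """Read-through cache in front of a call to the owning service."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical remote call
        self._ttl = ttl_seconds      # how long a cached value counts as fresh
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no trip to the owning service
        value = self._fetch(key)  # stale or missing: ask the owner again
        self._entries[key] = (now, value)
        return value
```

The data still has exactly one owner; the cache only decides how often you drive back to it, and the expiry time is where the flexibility vs. performance trade-off gets made.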

I guess what I am saying is that we need to look at the use cases where local caching is needed and solve those scenarios, to make SOA practical as something more than the way mainframes, FTP and proprietary flat file formats handled SOA 20 years ago. I like any platform, SOAP, XML and Schema much better than those days, and we can take SOA to another level IF we have practical guidelines for those looking for direction and leadership.
Friday, May 28, 2004 12:47:25 AM UTC
I've read a ton of papers on SOA (and I agree with Dennis about real implementations) and this concept of application data isolation is a very popular one, but it has raised a few questions for me (performance has never been one of them).

1. If you currently live in a unified data scenario (as I do), is it absolutely necessary to migrate from this architecture in order to get any benefit from SOA? Seems like I could enforce a strict policy about accessing data via designated services only, even though the data is in the same store (maybe I hide the database from everyone and pretend.. lol). It just seems like a huge task to do this, but maybe it's a matter of now or later?

2. How do you handle referential integrity? Not for deletes, mind you, but for actually pointing to a record. For instance you said, "If you decide to throw out your own customer service and replace it with a wrapper around Siebel, your invoice service will never learn about that change". To be perfectly honest, I don't think this is as easy as you make it sound. If I make an invoice for a customer, what is stored in the invoice to indicate to me which customer it's for? Some sort of identity for the customer? Where is this identity generated? My guess would be in the CRM service (which is my own, home-grown system). So here we are with a pointer in the invoice system that can be used by the CRM service to find the customer.

All is good.

But now we wipe out our CRM implementation and replace it with CRM-X. What just happened to the pointer sitting in the invoice system? It has become invalid because CRM-X assigns keys to records in a different way (say with a GUID rather than an int). Unless your contract defined the customer key in a VERY general way (say a string?) you could have issues here, right? Maybe this is rule #1 when defining a contract? But my guess is that the tendency of most people will be to define the contract in terms of what they have at the moment.

No problem (assuming we defined our contract right).

When we import the data into CRM-X we just preserve the old key as some meta-data that can be used to map from the key the invoice system has. But now I have to decipher the key and determine which system it was from in my service and then load the data based on a different mechanism (this is the wrapper you mention)... seems like a lot of work, but manageable. And I suppose if CRM-X defines a key in similar terms (say an int) I could just force my old keys into the new system and forget about the mapping altogether. Hrmmm, not bad. But I just think that there is a ton of potential for just as much entangling either way... but I guess the good thing is that you're only untangling in one place.
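The mapping described above might be sketched like this. CRM-X is the hypothetical product from this comment, and CrmXCustomerService, import_customer and get_customer_name are invented names. The contract treats customer keys as opaque strings, and the wrapper keeps a table from old keys to the GUIDs the new system assigns.

```python
import uuid
from typing import Dict


class CrmXCustomerService:
    """Wrapper over a hypothetical CRM-X that assigns GUID keys."""

    def __init__(self) -> None:
        self._by_guid: Dict[str, str] = {}         # GUID -> customer name
        self._legacy_to_guid: Dict[str, str] = {}  # old int key (as text) -> GUID

    def import_customer(self, legacy_key: str, name: str) -> str:
        guid = str(uuid.uuid4())
        self._by_guid[guid] = name
        self._legacy_to_guid[legacy_key] = guid  # preserve the old key as metadata
        return guid

    def get_customer_name(self, customer_key: str) -> str:
        # Keys are opaque strings in the contract; map old ones if needed.
        guid = self._legacy_to_guid.get(customer_key, customer_key)
        return self._by_guid[guid]
```

Invoices created before the switch keep their old key and still resolve, new invoices can store the GUID, and the untangling lives in this one wrapper.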

There is also the small issue that if you display the key anywhere in the consuming application you would end up with a noticeable change, which may cause confusion. A list of int-based record IDs all of a sudden starts counting in GUIDs... so be sure never to display keys anywhere in an application.

There is a lot to think through and to un-learn before you can do this for real... which is where an example implementation would come in handy, because it would help you avoid making any of these mistakes.

I think I just talked myself down off the ledge here... but since I took all this time writing this ramble... I’ll post it. :)

Maybe I’ll think of something else later... work to do now.