I’ll write a few more parts of my little Indigo series next weekend (I am too busy during the week) and will move from “throw arbitrary XML on the wire” to typed messages. Before I do so, I am curious about your opinion, and I am asking you to comment (on the blog site) on which of the following two declarations you would prefer.

I should probably quickly explain a few things before I let you look at the code snippets: The [DataContract] attribute essentially replaces [Serializable] for Indigo and is used to label classes that can be serialized by the System.Runtime.Serialization infrastructure into XML or into a binary representation. Serialization control through attributes is therefore unified and independent of the actual output flavor you choose at runtime. The [DataMember] attribute labels fields or properties that are part of the data contract and should be (de)serialized. Unlike the current serialization models of Remoting (System.Runtime.Remoting.Formatters) and the XML Serializer (System.Xml.Serialization), where the serializers grab anything public, this model is strictly opt-in, meaning that public fields and properties do not get serialized unless you explicitly label them with [DataMember]. Even more surprising, the new serialization infrastructure also works with private fields.
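
As an aside, separate from the two declarations I am asking about: the opt-in behavior described above can be sketched roughly as follows. This is only an illustration under assumptions. It uses DataContractSerializer and the [DataMember(Name = "...")] form from the System.Runtime.Serialization API as it exists in released versions of .NET, which is not the same syntax as the preview snippets in this post. The Parcel class, its members, and the OptInDemo helper are hypothetical names made up for the sketch.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical type for illustration only; it is not one of the two Address versions.
[DataContract]
public class Parcel
{
    // Private, but opted in: this field is written to the wire.
    [DataMember(Name = "TrackingCode")]
    private string trackingCode;

    // Public, but not opted in: the serializer skips it.
    public string InternalNote { get; set; }

    public Parcel(string trackingCode)
    {
        this.trackingCode = trackingCode;
    }

    // Read-only view onto the private state; not itself part of the contract.
    public string TrackingCode
    {
        get { return trackingCode; }
    }
}

public static class OptInDemo
{
    // Serializes a Parcel to an XML string using its data contract.
    public static string ToXml(Parcel parcel)
    {
        var serializer = new DataContractSerializer(typeof(Parcel));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, parcel);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        var parcel = new Parcel("ABC-123") { InternalNote = "fragile" };
        string xml = ToXml(parcel);

        Console.WriteLine(xml.Contains("TrackingCode")); // True: private field, opted in
        Console.WriteLine(xml.Contains("InternalNote")); // False: public property, not opted in
    }
}
```

The point of the sketch is only that visibility and contract membership are independent: the private field travels because it carries the attribute, and the public property stays home because it does not.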

I have a clear preference for one of these two declarations and also have what I think is a solid explanation for why I prefer it, but before I elaborate, I am interested in your opinion.

Version A

[DataContract]
public partial class Address
{
    [DataMember("Company")]
    private string company;
    [DataMember("RecipientName")]
    private string recipientName;
    [DataMember("AddressLine1")]
    private string addressLine1;

    ... more fields ...

    public string Company
    {
        get { return company; }
        set { company = value; }
    }
   
    public string RecipientName
    {
        get { return recipientName; }
        set { recipientName = value; }
    }
   
    public string AddressLine1
    {
        get { return addressLine1; }
        set { addressLine1 = value; }
    }

    ... more properties and methods and stuff ...
}

Version B

[DataContract]
public partial class Address
{
    private string company;
    private string recipientName;
    private string addressLine1;

    ... more fields ...

    [DataMember("Company")]
    public string Company
    {
        get { return company; }
        set { company = value; }
    }

    [DataMember("RecipientName")]
    public string RecipientName
    {
        get { return recipientName; }
        set { recipientName = value; }
    }

    [DataMember("AddressLine1")]
    public string AddressLine1
    {
        get { return addressLine1; }
        set { addressLine1 = value; }
    }

    ... more properties and methods and stuff ...
}

Consider this obvious statement: The class is declared in this way to provide programmatic access to and encapsulation of data that will eventually be serialized into some wire format or deserialized from a wire format.

Tuesday, March 01, 2005 10:58:25 PM UTC
I think either is just fine. Isn't the whole point of the DataContract to produce a schema that defines what data is serialized? How you represent that in one piece of code compared to another shouldn't matter. You might use one version in one place and the other in a different case. Because the contract is the same, the data works with both.

As an example, you might later migrate from a field to a property so that you can do some mapping of the data before you serialize/after you deserialize but you don't need it to be a property in the first version to support this.
Wednesday, March 02, 2005 2:14:36 AM UTC
I think if the goal of this exercise is to demonstrate an Address instance that will be serialized onto some wire format and/or deserialized on the other end, then version (A) better demonstrates that goal. It is clearer that the current "state" data of an Address instance will be serialized, as opposed to its public interface.

Generally speaking, version (A) just scared me for some reason when I first looked at it. Too many years of the OO mantra "my private stuff is my private stuff". But, looking at your disclaimer of:
"The class is declared in this way to provide programmatic access to and encapsulation of data that will eventually be serialized into some wire format or deserialized from a wire format." Then I think (A) better demonstrates this.
Scott P Stewart
Wednesday, March 02, 2005 2:37:30 AM UTC
(A) better reflects the idea that the object model (including properties) used by the app can be quite different from the wire contract. But this is only a comparison at the single-element level.

The more interesting question is: should your angle bracket hierarchy match the object hierarchy used by business logic?
Gia
Wednesday, March 02, 2005 2:42:46 AM UTC
I agree with Scott; in addition, I actually prefer "public sealed class" or "public struct". It also depends on the usage of the class; decoupling the message from the type is preferred. What is your prescription for "DataContract" versioning in the Indigo world?
Marlon Smith
Wednesday, March 02, 2005 9:49:28 PM UTC
#1 definitely. This becomes more obvious if you ever want to provide higher-level programmatic access to your data. For example:

[DataMember("TotalMilliseconds")]
private int millis;

public TimeSpan TimeSpan
{
get { return TimeSpan.FromMilliseconds(this.millis); }
set { this.millis = (int)value.TotalMilliseconds; }
}

I think it makes sense to keep your state and wire format aligned. Getters and setters are a view onto that state, not a place to enforce business logic.
Wednesday, March 02, 2005 11:01:37 PM UTC
I concur, definitely A. How would B work if I didn't have public properties on serializable content? Sample B reminds me of the cryptic ATL C++ macros used for IPersistStream implementation, and no one wants to go back there :-)
brad king
Friday, March 04, 2005 8:14:12 PM UTC
I like #1 for serialisation but not for deserialisation.

A class has the responsibility to make sure that its data makes sense (guarantee semantics/maintain invariant). That is the whole reason to encapsulate state. Deserialising data from the wire into the private fields breaks the encapsulation principle and allows for state that does not match the class invariant.

If it were possible to declaratively define constraints on the private field as well and get these constraints checked regardless of how the field is updated, then that would make me a lot happier about #1. That would allow for the specification of the semantic aspects (allowed value ranges) of a data contract as well as the syntactic aspects (value names).

Thinking more about it, I am becoming more hesitant to bind the data contract to the private fields. It would make it harder to change the class internals. One example is that in data contracts I always define identifiers as opaque strings, but tend to use more optimised data types such as integers internally. The internal representation may change, but this should not affect the external representation and semantics on the wire.

So #2 is for me the way to go. Not the most elegant solution, but definitely the more resilient one. Contracts are supposed to be pretty static, implementations flexible.
Tuesday, March 08, 2005 8:47:58 PM UTC
Hi

1 - The concept of messages as classes is simply fantastic! At YDreams we have used the concept for about two years in our messaging system, and I can say it is clean and quite simple to use. Attributes, attributes, thank you very much!

2 - My vote goes for both. Private fields could be useful to prevent the user of a class from accessing the field without reflection. It prevents bugs! Performance is a drawback, though. To serialize private fields you need reflection, and therefore your formatter will be slower, which can be a serious shortcoming.
If you have both, the decision is in your hands.

Manuel
Manuel Costa
Thursday, March 10, 2005 10:51:45 AM UTC
I have to go with B. DataContracts need to be treated as public access, and giving essentially public access to private members is never a good idea. Data coming into classes from external sources needs to be treated as suspect and therefore needs to be validated, authenticated, or otherwise checked before it can be allowed into the private trusted space. In many ways, this may have to do not only with trust but also with synchronization: if something can write directly to private space, can I control when it writes? I suspect not.
Doug Nelson
Thursday, March 17, 2005 6:22:43 AM UTC
Presumably, we're talking about best practices here. Indigo will not limit us to one or the other.

In general, I prefer to (de)serialize the internal state (option A), and leave state validation to a method executed post-deserialization. I'm not particularly bothered by the fact that it makes the internal state in some sense "public". It was always public in this sense using reflection anyways. Information hiding is a fair-use guideline, not security.
Michael Kelly
Thursday, March 24, 2005 5:10:28 PM UTC
Good thread. I'm late to the party, but looks like Clemens hasn't spilled the beans yet.

For me, the mechanism looks like a way to move state, period. In this light, (A) is the obvious convention. The class interface is for clients of an object; that is something different.

I might disagree with Jamie's last sentence, depending on his use of 'enforce': I think getters/setters/props are *exactly* the place to implement business logic. But if he means enforcing some state consistency, Michael's post-deserialization method is my preference.

I admit to some vague concern about trusting the wire: when do we need this kind of enforcement, and whose job is it?
Saturday, March 26, 2005 10:09:27 PM UTC
"The class is declared in this way to provide programmatic access to and encapsulation of data that will eventually be serialized into some wire format or deserialized from a wire format."

That being the case, I would choose B. It gives more control over what you're actually providing in the contract.

Since private members are serialized if the [DataMember] attribute is provided, putting the [DataMember] attribute on the property, and thereby forcing the contract through the property get/set, would seem like the best practice to me.
Saturday, April 09, 2005 12:18:29 AM UTC
Definitely C:

[DataContract]
public struct Address
{
[DataMember("Company")]
public string Company;
[DataMember("RecipientName")]
public string RecipientName;
[DataMember("AddressLine1")]
public string AddressLine1;
...
}

Classes are for implementing interfaces. Structs are for organizing data in a "just the data, ma'am," kind of way that most directly maps to messages and serialization. We are, after all, dealing with a copy of the data passed across the boundary, in line with the copy semantics of structs. Previously we sometimes were forced to use classes instead of structs just to get nullability, but no more in Whidbey.
Thursday, September 22, 2005 5:43:21 PM UTC
OK seems to work now with this information
Tuesday, December 06, 2005 12:16:35 AM UTC
B
Trevor
Tuesday, July 04, 2006 11:46:51 PM UTC
OK seems to work now with this information