Serialization/Deserialization Series: Custom Serialization

For the last post in this series, we build a custom serializer/deserializer with POCO (plain old CLR object) classes.

May 25th, 2015 • Develop

Have you ever had to deal with a file format that was a little weird, but that you knew could be used in future projects?

"We have a custom file format that we use. Can you read that file format?"

Yeah, I've had those requests as well.

I discussed how to deserialize XML and JSON into objects in previous posts; today we'll go over the steps to create a custom serializer/deserializer for those "unusual" file formats.

How Many Serializers Are There?

As we decide which base serializer to use, we need to know what's available to us.

JavaScriptSerializer - Used to convert objects to and from JSON.
XmlSerializer - Used to convert objects to and from XML streams.
BinaryFormatter - Used for storage or socket-based network streams.
Each one of these has a particular purpose. However, none of them reads or writes an arbitrary custom format, which is exactly what we need for custom serialization.

Building our own Serializer

Since we want to read a CSV file and don't have a base class to build on, we'll start by implementing the IFormatter interface.

The BinaryFormatter implements IFormatter, which defines the following members:

Binder (property) - Gets or sets the SerializationBinder that performs type lookups during deserialization.
Context (property) - Gets or sets the StreamingContext used for serialization and deserialization.
SurrogateSelector (property) - Gets or sets the SurrogateSelector used by the current formatter.
Deserialize() (method) - Deserializes the data on the provided stream and reconstitutes the graph of objects.
Serialize() (method) - Serializes an object, or graph of objects with the given root, to the provided stream.

Our skeleton formatter will look like this:

using System;
using System.IO;
using System.Runtime.Serialization;

public class CsvFormatter<T> : IFormatter where T : class
{
    public object Deserialize(Stream serializationStream)
    {
        throw new NotImplementedException();
    }
    public void Serialize(Stream serializationStream, object graph)
    {
        throw new NotImplementedException();
    }
    public ISurrogateSelector SurrogateSelector { get; set; }
    public SerializationBinder Binder { get; set; }
    public StreamingContext Context { get; set; }
}

We made the class generic because we need to know which type of object to create when building our list of objects.

Our constructor takes two initial parameters: a delimiter and a flag indicating whether the file has a header line. Trust me, we'll need these later.

private readonly char _delimiter;
private readonly bool _firstLineIsHeaders;
public CsvFormatter(char delimiter, bool firstLineIsHeaders = false)
{
    _delimiter = delimiter;
    _firstLineIsHeaders = firstLineIsHeaders;
}

With these fields added to our CsvFormatter class, we can move on to the serialization process.

Serializing the Object

The serialization process is nothing special. Streams and reflection are the essential components here.

First, we make sure that an IEnumerable is being passed into the Serialize method. Since a CSV file is a list of records, this formatter only serializes collections of objects.

if (!(graph is IEnumerable))
    throw new ArgumentException("This serializer only works on IEnumerable.", nameof(graph));

Then, we create the headings from the type's property names and String.Join them together with the delimiter.

var headings = typeof(T).GetProperties();
var headerNames = headings.Select(e => e.Name.ToString());
var headers = String.Join(new String(_delimiter, 1), headerNames);
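One caveat worth knowing: the documentation for Type.GetProperties says it does not return properties in a particular order, such as declaration order, so nothing strictly guarantees that this header order matches the order of the values written later. Below is a minimal sketch of a header builder that makes the ordering explicit, assuming that sorting by MemberInfo.MetadataToken follows declaration order (a common workaround in practice, not a documented guarantee). HeaderBuilder and Sample are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical sample type used only for this illustration.
public class Sample
{
    public int UserId { get; set; }
    public string FirstName { get; set; }
    public string Title { get; set; }
}

public static class HeaderBuilder
{
    // Builds a delimited header line from T's public instance properties.
    // Ordering by MetadataToken is an assumption: with the C# compiler it
    // follows declaration order, but the runtime does not document that.
    public static string BuildHeader<T>(char delimiter)
    {
        var names = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.MetadataToken)
            .Select(p => p.Name);
        return String.Join(delimiter.ToString(), names);
    }
}
```

For the Sample class above, BuildHeader<Sample>(',') should produce UserId,FirstName,Title.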

Next, we create a StreamWriter over the stream and, if the formatter was told to use headers, write them out.

using (var stream = new StreamWriter(serializationStream))
{
    if (_firstLineIsHeaders)
    {
        stream.WriteLine(headers);
        stream.Flush();
    }

One thing to pay attention to: when this StreamWriter is disposed, it also closes the underlying stream (the MemoryStream in our example). There's a simple workaround for that, which I'll explain later.

Finally, we get the serializable members of T, loop through each item in the collection to retrieve its member values with FormatterServices.GetObjectData, convert those values to strings, and String.Join them with the delimiter we assigned through the constructor.

    var members = FormatterServices.GetSerializableMembers(typeof(T), Context);
    foreach (var item in (IEnumerable)graph)
    {
        var objs = FormatterServices.GetObjectData(item, members);
        var valueList = objs.Select(e => Convert.ToString(e)); // null values become ""
        var values = String.Join(new String(_delimiter, 1), valueList);
        stream.WriteLine(values);
    }
    stream.Flush();
}
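As written, each value goes out exactly as its string conversion produces it, so a value that itself contains the delimiter (a title such as "VP, Sales", for example) would shift every following column. Below is a minimal sketch of quoting fields the way RFC 4180 describes: wrap the field in double quotes and double any embedded quotes. CsvText, EscapeField, and JoinRow are hypothetical names, and a matching change to Deserialize would also be needed, since line.Split doesn't understand quoted fields.

```csharp
using System;
using System.Linq;

public static class CsvText
{
    // Hypothetical helper: quotes a field when it contains the delimiter,
    // a double quote, or a line break; embedded quotes are doubled.
    public static string EscapeField(object value, char delimiter)
    {
        // Convert.ToString returns String.Empty for null.
        var text = Convert.ToString(value) ?? String.Empty;
        bool needsQuotes = text.IndexOf(delimiter) >= 0
            || text.IndexOfAny(new[] { '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
            return text;
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }

    // Joins one row of values, escaping each one first.
    public static string JoinRow(object[] values, char delimiter)
    {
        return String.Join(delimiter.ToString(),
            values.Select(v => EscapeField(v, delimiter)));
    }
}
```

In the serializer, JoinRow(objs, _delimiter) could stand in for the String.Join call on the values.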

Now that we have our Serializer, the Deserializer should be a breeze.

Deserializing our Object

Deserialization reads the text from a stream and builds our objects from it. When we read the stream, we need to know whether the first line is a header.

If you remember, we passed the firstLineIsHeaders parameter into the constructor; here it tells us whether to skip a header line before reading the data.

IList list;
using (var sr = new StreamReader(serializationStream))
{
    // Optional if reading headers! Example: UserId, FirstName, Title
    if (_firstLineIsHeaders)
    {
        string[] headers = GetHeader(sr); // GetHeader appears in the complete class below
    }

You can use the headers later if you wish but, to be honest, this call merely skips the header line so we can get to the meat of the data.

Next, we build a generic list of the type passed in through our class's type parameter (T).
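If you do want to use the headers, one option is to map each column name to its position so values can be looked up by name rather than by index. Below is a minimal sketch, assuming case-insensitive header names; HeaderMap, Build, and GetValue are hypothetical names and not part of the CsvFormatter in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderMap
{
    // Hypothetical helper: maps each trimmed header name to its column index.
    public static Dictionary<string, int> Build(string headerLine, char delimiter)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var names = headerLine.Split(delimiter).Select(h => h.Trim()).ToArray();
        for (int i = 0; i < names.Length; i++)
            map[names[i]] = i;
        return map;
    }

    // Returns the trimmed value in the named column of a data line.
    // Throws KeyNotFoundException if the column name is unknown.
    public static string GetValue(string line, char delimiter,
                                  Dictionary<string, int> map, string column)
    {
        var fields = line.Split(delimiter);
        return fields[map[column]].Trim();
    }
}
```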

var listType = typeof (List<>);
var constructedListType = listType.MakeGenericType(typeof(T));
list = (IList)Activator.CreateInstance(constructedListType);

After creating the list, we read through the stream line by line and split each line on our delimiter with line.Split.
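Because T is already known inside CsvFormatter<T>, the MakeGenericType/Activator.CreateInstance pair produces the same type as new List<T>(). The sketch below shows both routes side by side; ListFactory is a hypothetical name. The reflection route mainly pays off when the element type is only available at run time as a Type object.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ListFactory
{
    // Reflection route: close List<> over a Type known only at run time.
    public static IList CreateList(Type elementType)
    {
        var constructedListType = typeof(List<>).MakeGenericType(elementType);
        return (IList)Activator.CreateInstance(constructedListType);
    }

    // Compile-time route: equivalent when T is a generic parameter.
    public static List<T> CreateList<T>()
    {
        return new List<T>();
    }
}
```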

while (sr.Peek() >= 0)
{
    var line = sr.ReadLine();
    // Skip blank lines (e.g., stray empty lines in the file)
    if (String.IsNullOrWhiteSpace(line))
        continue;
    var fieldData = line.Split(_delimiter);
    var obj = FormatterServices.GetUninitializedObject(typeof (T));
    var members = FormatterServices.GetSerializableMembers(obj.GetType(), Context);
    object[] data = new object[members.Length];
    for (int i = 0; i < members.Length; ++i)
    {
        FieldInfo fi = ((FieldInfo)members[i]);
        // Trim so values like " Jeff" don't keep their leading space
        data[i] = Convert.ChangeType(fieldData[i].Trim(), fi.FieldType);
    }
    list.Add((T)FormatterServices.PopulateObjectMembers(obj, members, data));
}

For each line, we use FormatterServices to create an uninitialized instance of T, get its serializable members, and create an array to hold the data for each field.

Finally, we loop through each member, convert the matching piece of text to that field's type with Convert.ChangeType, populate the object with FormatterServices.PopulateObjectMembers, and add it to the list.
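The conversion step rests on Convert.ChangeType, which turns each piece of text into the target field's type. Trimming matters with data like the sample later in this post, which has a space after each comma: a numeric field such as " 1" still parses, because integer parsing allows surrounding whitespace, but a string field would keep its leading space. Below is a minimal sketch of that conversion step on its own. FieldConverter is a hypothetical name, and passing CultureInfo.InvariantCulture is an assumption so that numbers and dates parse the same on every machine.

```csharp
using System;
using System.Globalization;

public static class FieldConverter
{
    // Splits a line and converts each trimmed piece to the matching target type.
    public static object[] ConvertLine(string line, char delimiter, Type[] targetTypes)
    {
        var fieldData = line.Split(delimiter);
        if (fieldData.Length < targetTypes.Length)
            throw new FormatException(
                $"Expected {targetTypes.Length} fields but found {fieldData.Length}.");

        var data = new object[targetTypes.Length];
        for (int i = 0; i < targetTypes.Length; i++)
        {
            // Trim so " Jeff" becomes "Jeff" before conversion.
            data[i] = Convert.ChangeType(fieldData[i].Trim(), targetTypes[i],
                                         CultureInfo.InvariantCulture);
        }
        return data;
    }
}
```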

Now we have a complete CsvFormatter Serializer/Deserializer.

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

public class CsvFormatter<T> : IFormatter where T: class
{
    private readonly char _delimiter;
    private readonly bool _firstLineIsHeaders;
    public CsvFormatter(char delimiter, bool firstLineIsHeaders = false)
    {
        _delimiter = delimiter;
        _firstLineIsHeaders = firstLineIsHeaders;
    }
    public ISurrogateSelector SurrogateSelector { get; set; }
    public SerializationBinder Binder { get; set; }
    public StreamingContext Context { get; set; }
    public object Deserialize(Stream serializationStream)
    {
        IList list;
        using (var sr = new StreamReader(serializationStream))
        {
            // Optional if reading headers! Example: UserId, FirstName, Title
            if (_firstLineIsHeaders)
            {
                string[] headers = GetHeader(sr);
            }
            var listType = typeof (List<>);
            var constructedListType = listType.MakeGenericType(typeof(T));
            list = (IList)Activator.CreateInstance(constructedListType);
            while (sr.Peek() >= 0)
            {
                var line = sr.ReadLine();
                // Skip blank lines (e.g., stray empty lines in the file)
                if (String.IsNullOrWhiteSpace(line))
                    continue;
                var fieldData = line.Split(_delimiter);
                var obj = FormatterServices.GetUninitializedObject(typeof (T));
                var members = FormatterServices.GetSerializableMembers(obj.GetType(), Context);
                object[] data = new object[members.Length];
                for (int i = 0; i < members.Length; ++i)
                {
                    FieldInfo fi = ((FieldInfo)members[i]);
                    // Trim so values like " Jeff" don't keep their leading space
                    data[i] = Convert.ChangeType(fieldData[i].Trim(), fi.FieldType);
                }
                list.Add((T)FormatterServices.PopulateObjectMembers(obj, members, data));
            }
        }
        return list;
    }
    // Reads the first line and returns the trimmed column names
    private string[] GetHeader(StreamReader sr)
    {
        string line = sr.ReadLine();
        return line.Split(_delimiter)
            .Select(e => e.Trim())
            .ToArray();
    }
    public void Serialize(Stream serializationStream, object graph)
    {
        if (!(graph is IEnumerable))
            throw new ArgumentException("This serializer only works on IEnumerable.", nameof(graph));
        var headings = typeof(T).GetProperties();
        var headerNames = headings.Select(e => e.Name.ToString());
        var headers = String.Join(new String(_delimiter, 1), headerNames);
        using (var stream = new StreamWriter(serializationStream))
        {
            if (_firstLineIsHeaders)
            {
                stream.WriteLine(headers);
                stream.Flush();
            }
            var members = FormatterServices.GetSerializableMembers(typeof(T), Context);
            foreach (var item in (IEnumerable)graph)
            {
                var objs = FormatterServices.GetObjectData(item, members);
                var valueList = objs.Select(e => Convert.ToString(e)); // null values become ""
                var values = String.Join(new String(_delimiter, 1), valueList);
                stream.WriteLine(values);
            }
            stream.Flush();
        }
    }
}
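The usage examples that follow rely on a User class that isn't shown in the post. Here's a minimal sketch of what it presumably looks like, inferred from the property names in the examples. One detail matters: FormatterServices.GetSerializableMembers throws a SerializationException for types that aren't marked [Serializable], so the attribute is needed. With auto-implemented properties, the serializable members are the compiler-generated backing fields, which is why Deserialize can cast each member to FieldInfo.

```csharp
using System;

// Assumed shape of the User class used in the examples below.
// [Serializable] is required because CsvFormatter<T> calls
// FormatterServices.GetSerializableMembers(typeof(T), ...).
[Serializable]
public class User
{
    public int UserId { get; set; }
    public string FirstName { get; set; }
    public string Title { get; set; }
}
```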

Ok, now how do you use it? Like this:

To serialize an IEnumerable, use this code:

var names = new List<User>
{
    new User {UserId = 1, FirstName = "Jeff", Title = "CEO"},
    new User {UserId = 2, FirstName = "Mark", Title = "CTO"},
    new User {UserId = 3, FirstName = "Jennifer", Title = "CIO"},
    new User {UserId = 4, FirstName = "Marcia", Title = "CMO"}
};
var ms = new MemoryStream();
var serializer = new CsvFormatter<User>(',', true);
serializer.Serialize(ms, names);
var newStream = new MemoryStream(ms.ToArray());
var nameString = newStream.ToStringContents();

You may have noticed that I created a second MemoryStream. Let me explain why.

The first one is closed when the StreamWriter is disposed at the end of the Serialize method. A closed MemoryStream can no longer be read or written, but its ToArray() method still works, so you can pass those bytes into a new MemoryStream and retrieve the contents from there.

It's a simple and neat little trick for closed MemoryStreams.
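Another way around the closed-stream problem, assuming .NET Framework 4.5 or later, is the StreamWriter constructor overload that takes a leaveOpen parameter: (Stream, Encoding, int bufferSize, bool leaveOpen). Below is a minimal sketch, assuming you're willing to change Serialize to use that overload; LeaveOpenDemo, WriteLines, and ReadAll are hypothetical names, and UTF-8 without a BOM plus a 1024-byte buffer are arbitrary choices.

```csharp
using System;
using System.IO;
using System.Text;

public static class LeaveOpenDemo
{
    // Writes lines to a stream without closing it, so the caller can keep using it.
    public static void WriteLines(Stream target, params string[] lines)
    {
        using (var writer = new StreamWriter(target, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            foreach (var line in lines)
                writer.WriteLine(line);
        } // Dispose flushes the writer but leaves 'target' open.
    }

    // Rewinds the stream and reads it back as text, again leaving it open.
    public static string ReadAll(Stream source)
    {
        source.Position = 0;
        using (var reader = new StreamReader(source, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this approach, the original MemoryStream stays readable after serialization, so the second MemoryStream isn't needed.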

To deserialize that "nameString" into objects, use this code:

var names = @"UserId, FirstName, Title
                1, Jeff, CEO
                2, Mark, CTO
                3, Jennifer, CIO
                4, Marcia, CMO";
var serializer = new CsvFormatter<User>(',', true); // true: the first line holds headers
List<User> result;
using (var stream = names.ToStream())
{
    result = (List<User>)serializer.Deserialize(stream);
}

If you're wondering about the ToStream() in the deserialization example and the ToStringContents() in the serialization example, they're extension methods from my post entitled "10 Extremely Useful .NET Extension Methods."
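Since those extension methods live in another post, here's a minimal sketch of what they might look like so the examples can be tried on their own. These are hypothetical stand-ins that assume UTF-8 text, not necessarily the author's implementations.

```csharp
using System.IO;
using System.Text;

// Hypothetical stand-ins for the ToStream()/ToStringContents() extensions;
// the versions in the referenced post may differ.
public static class StreamExtensions
{
    // Wraps a string's UTF-8 bytes in a readable MemoryStream.
    public static Stream ToStream(this string text)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(text));
    }

    // Reads from the stream's current position to the end as UTF-8 text.
    // Note: disposing the reader also closes the stream.
    public static string ToStringContents(this Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```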

Conclusion

Today, I've shown you how to create your own serializer/deserializer by implementing IFormatter to build a CSV formatter.

If there's no existing class you can use, like the BinaryFormatter or XmlSerializer, implement IFormatter to build your own.

It's probably the easiest way to read and write data to objects without writing a ton of code.

Have you created a Serializer or Deserializer? What was the file format? Post your comments below.


Jonathan Danylko is a freelance web architect and avid programmer who has been programming for over 20 years. He has developed various systems in numerous industries including e-commerce, biotechnology, real estate, health, insurance, and utility companies.

When asked what he likes to do in his spare time, he replies, "Programming."
