Serialization

Data structures and types can be serialized to text or binary formats using Pt's serialization. This is used within the framework to load and store data or to implement remote procedure calls. It is extensible to work with all kinds of types, including STL containers, PODs (plain old data types), built-in language types or custom data types. The framework separates the process of composing and decomposing types from the formatting stage, resulting in a two-phase serialization process. This separation also makes it possible to resolve and fix up shared pointers or references.

A type is serializable if two operators are implemented to compose it from and decompose it to a SerializationInfo. The SerializationContext provides improved memory management, a mechanism to generate IDs for shared pointers and a way to further customize or override serialization for a type. Alternatively, performance can be increased by implementing a Composer or Decomposer for the type, although this is more complicated.

Various formats are supported by implementing Formatters. Other modules of the framework also implement Formatters, for example to support serialization to XML. The Serializer and Deserializer combine a Formatter and a SerializationContext, manage composition and decomposition and thus form the high-level interface for the serialization of a set of types.
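
The separation between decomposing a type and formatting it can be illustrated with a small stand-alone sketch. It does not use Pt: Node, Formatter, TextFormatter, format() and decomposePoint() are hypothetical names chosen for this illustration, and the output syntax is made up. The point is only that the formatting phase sees nothing but the intermediate tree, so a different format can be supported by adding another formatter class.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node of the intermediate tree (not Pt's SerializationInfo).
struct Node
{
    std::string name;
    std::string value;            // used by leaf nodes only
    std::vector<Node> children;   // used by composite nodes only
};

// Hypothetical formatter interface: phase two only ever sees the tree.
class Formatter
{
public:
    virtual ~Formatter() {}
    virtual void beginObject(const std::string& name) = 0;
    virtual void value(const std::string& name, const std::string& v) = 0;
    virtual void endObject() = 0;
};

// One possible output format, collected in a string stream.
class TextFormatter : public Formatter
{
public:
    void beginObject(const std::string& name) { _os << name << "{"; }
    void value(const std::string& name, const std::string& v) { _os << name << "=" << v << ";"; }
    void endObject() { _os << "}"; }
    std::string str() const { return _os.str(); }
private:
    std::ostringstream _os;
};

// Phase two: walk the tree and emit format events.
// For simplicity, a node without children is treated as a leaf.
void format(const Node& node, Formatter& out)
{
    if (node.children.empty())
    {
        out.value(node.name, node.value);
        return;
    }
    out.beginObject(node.name);
    for (std::size_t i = 0; i < node.children.size(); ++i)
        format(node.children[i], out);
    out.endObject();
}

// Phase one: decompose a concrete value into the tree.
Node decomposePoint(int x, int y)
{
    Node nx; nx.name = "x"; nx.value = std::to_string(x);
    Node ny; ny.name = "y"; ny.value = std::to_string(y);
    Node n;
    n.name = "point";
    n.children.push_back(nx);
    n.children.push_back(ny);
    return n;
}

// Runs both phases in sequence.
std::string pointToText(int x, int y)
{
    TextFormatter fmt;
    format(decomposePoint(x, y), fmt);
    return fmt.str();
}

inline void examplePointToText()
{
    assert(pointToText(1, 2) == "point{x=1;y=2;}");
}
```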

Serialization of Value Types

The Pt framework already provides serialization support for the C++ builtin types and the types provided by the C++ standard library, like std::string or std::vector. To make custom types serializable, the serialization operators have to be implemented. The next example shows a simple data type and the declarations of the serialization operators:

struct Address
{
    Address()
    : code(0)
    {}

    std::string country;
    std::string city;
    std::string street;
    unsigned code;
};

void operator >>=(const Pt::SerializationInfo& si, Address& address);
void operator <<=(Pt::SerializationInfo& si, const Address& address);

Similar to the insertion and extraction operators for standard C++ iostreams, one operator has to be overloaded to serialize a type and another one to deserialize it. The choice of operators (<<= and >>=) indicates that this is an assignment operation. Each type has to be composed from or decomposed to Pt::SerializationInfo objects, which form a tree representing the object graph. The tree contains all meta information, so composition and decomposition can be separated from formatting and parsing. Building up the tree is highly optimized and, for example, requires only very few allocations. The next example shows the definition of the serialization operator:

void operator<<=(Pt::SerializationInfo& si, const Address& address)
{
    si.addMember("country") <<= address.country;
    si.addMember("city") <<= address.city;
    si.addMember("street") <<= address.street;
    si.addMember("code").setUInt32(address.code);
    si.setTypeName("Address");
}

The SerializationInfo passed by reference to the operator is meant to represent an Address object in the object graph. SerializationInfo child nodes are added for each member variable using addMember(), which also assigns the name. Member types can be serialized to the returned SerializationInfo using their specific overload of the operator. For built-in integer types it is recommended to use a setter (here setUInt32()) instead of the serialization operator, to be explicit about the type, because integer types could be serialized differently depending on the platform. For example, a long could be serialized as a 32-bit or a 64-bit integer type. Finally, the type name is set for the parent node, representing the Address object.
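
The platform dependence of integer widths can be checked with standard C++ alone. The following sketch does not involve Pt; bitWidth() and exampleIntegerWidths() are helper names made up for this illustration. It shows why a type such as long is a poor basis for a portable format, while the fixed-width typedefs from <cstdint> are not affected.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Width of a type in bits on the current platform.
template <typename T>
unsigned bitWidth()
{
    return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}

// long is 32 bits wide on some platforms (for example 64-bit Windows) and
// 64 bits wide on others (for example 64-bit Linux), so only a lower bound
// can be relied on. The fixed-width typedefs have the same width on every
// platform that provides them.
inline void exampleIntegerWidths()
{
    assert(bitWidth<long>() >= 32);
    assert(bitWidth<std::uint32_t>() == 32);
    assert(bitWidth<std::uint64_t>() == 64);
}
```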

The deserialization operator performs the same process, just in reverse. The SerializationInfo object passed to it contains the meta information for all members. SerializationInfo child nodes can be obtained by name using getMember() and members can be deserialized with their overloads of the deserialization operators. The following example illustrates this:

void operator>>=(const Pt::SerializationInfo& si, Address& address)
{
    si.getMember("country") >>= address.country;
    si.getMember("city") >>= address.city;
    si.getMember("street") >>= address.street;
    si.getMember("code") >>= address.code;
}
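
To see both operators working together without the framework, the round trip can be sketched against a toy tree class. ToyInfo below is a hypothetical stand-in that only mimics the addMember() and getMember() calls used in the text; it is not Pt::SerializationInfo, stores every leaf as text and knows nothing about type names. The Address type is repeated so that the sketch compiles on its own, and roundTrip() is a helper name invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a tree node; NOT Pt::SerializationInfo.
class ToyInfo
{
public:
    // Appends a named child. The returned reference is only valid until the
    // next addMember() call on the same node, which suffices for this sketch.
    ToyInfo& addMember(const std::string& name)
    {
        _names.push_back(name);
        _children.push_back(ToyInfo());
        return _children.back();
    }

    // Looks up a child by name and throws if it does not exist.
    const ToyInfo& getMember(const std::string& name) const
    {
        for (std::size_t i = 0; i < _names.size(); ++i)
            if (_names[i] == name)
                return _children[i];
        throw std::runtime_error("missing member: " + name);
    }

    std::string value;   // textual payload of a leaf node

private:
    std::vector<std::string> _names;
    std::vector<ToyInfo> _children;
};

// Leaf overloads for the member types used by Address.
inline void operator<<=(ToyInfo& si, const std::string& s) { si.value = s; }
inline void operator>>=(const ToyInfo& si, std::string& s) { s = si.value; }

inline void operator<<=(ToyInfo& si, unsigned n)
{
    std::ostringstream os;
    os << n;
    si.value = os.str();
}

inline void operator>>=(const ToyInfo& si, unsigned& n)
{
    std::istringstream is(si.value);
    if (!(is >> n))
        throw std::runtime_error("not an unsigned integer: " + si.value);
}

struct Address
{
    Address() : code(0) {}
    std::string country, city, street;
    unsigned code;
};

inline void operator<<=(ToyInfo& si, const Address& a)
{
    si.addMember("country") <<= a.country;
    si.addMember("city") <<= a.city;
    si.addMember("street") <<= a.street;
    si.addMember("code") <<= a.code;
}

inline void operator>>=(const ToyInfo& si, Address& a)
{
    si.getMember("country") >>= a.country;
    si.getMember("city") >>= a.city;
    si.getMember("street") >>= a.street;
    si.getMember("code") >>= a.code;
}

// Decomposes an Address into a tree and composes a new one from it.
inline Address roundTrip(const Address& in)
{
    ToyInfo si;
    si <<= in;
    Address out;
    si >>= out;
    return out;
}

inline void exampleRoundTrip()
{
    Address in;
    in.city = "Berlin";
    in.code = 10117;
    Address out = roundTrip(in);
    assert(out.city == "Berlin" && out.code == 10117);
}
```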

Serialization of Pointers

The serialization of pointers is identical to the serialization of values if the object pointed to is owned by the pointer. The pointer can simply be dereferenced and the object it points to passed to the serialization operator. However, there are rare cases in which pointers reference other objects in the object set to be serialized. These weak pointers can also be serialized using the serialization operators provided by the framework:

namespace Pt {

    template <typename T>
    void operator<<=(SerializationInfo& si, const T* ptr);

    template <typename T>
    void operator >>=(const SerializationInfo& si, T*& ptr);

}

Any pointer which is serialized or deserialized will be treated as a weak pointer. Depending on the format, this will format or parse a reference ID pointing to another object in the object stream. The order in which weak pointers and the objects pointed to are serialized does not matter; both forward and backward references work.
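
How forward and backward references can both be resolved is easiest to see in a small model. The sketch below is not how Pt implements it; FixupTable, registerObject(), resolveLater() and fixup() are hypothetical names, and string IDs are an assumption made for the illustration. A pointer whose target is already known is set immediately, while a pointer whose target has not been seen yet is queued and patched once all objects are registered.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resolver, not part of Pt: maps reference IDs to objects and
// patches pointers whose target had not been seen yet.
template <typename T>
class FixupTable
{
public:
    // Called when an object carrying an ID has been composed.
    void registerObject(const std::string& id, T* obj)
    {
        _objects[id] = obj;
    }

    // Called when a pointer referring to 'id' is deserialized.
    // Backward references resolve at once, forward references are queued.
    void resolveLater(const std::string& id, T*& target)
    {
        typename std::map<std::string, T*>::const_iterator it = _objects.find(id);
        if (it != _objects.end())
            target = it->second;
        else
            _pending.push_back(std::make_pair(id, &target));
    }

    // Called once at the end; returns the number of unresolved references.
    std::size_t fixup()
    {
        std::size_t unresolved = 0;
        for (std::size_t i = 0; i < _pending.size(); ++i)
        {
            typename std::map<std::string, T*>::const_iterator it =
                _objects.find(_pending[i].first);
            if (it != _objects.end())
                *_pending[i].second = it->second;
            else
                ++unresolved;
        }
        _pending.clear();
        return unresolved;
    }

private:
    std::map<std::string, T*> _objects;
    std::vector<std::pair<std::string, T**> > _pending;
};

struct Person
{
    Person() : partner(0) {}
    Person* partner;   // weak pointer to another object in the same set
};

inline void exampleFixup()
{
    Person alice, bob;
    FixupTable<Person> table;

    // Forward reference: bob's pointer is read before its target "1" exists.
    table.resolveLater("1", bob.partner);
    assert(bob.partner == 0);

    table.registerObject("1", &alice);
    table.registerObject("2", &bob);

    // Backward reference: the target "2" is already known.
    table.resolveLater("2", alice.partner);
    assert(alice.partner == &bob);

    std::size_t unresolved = table.fixup();
    assert(unresolved == 0);
    assert(bob.partner == &alice);
}
```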

Serialization of Containers

The framework already implements operators to serialize the containers of the standard C++ library. Container types from other libraries can be serialized in a similar fashion. The next example shows the serialization operator for std::vector:

template <typename T, typename A>
inline void operator <<=(SerializationInfo& si, const std::vector<T, A>& vec)
{
    typename std::vector<T, A>::const_iterator it;
    for(it = vec.begin(); it != vec.end(); ++it)
    {
        si.addElement() << Pt::save() <<= *it;
    }
    si.setTypeName("std::vector");
}

Each element of the vector is added to the parent SerializationInfo using addElement(). The modifier Pt::save() marks the element as a type that is potentially referenceable by a weak pointer in another object. This improves performance, because the serializer only has to consider 'reachable' objects when pointers are also serialized. At the end, a type name is set and the type of the parent SerializationInfo node is set to 'sequence' (the call that does this is not part of the listing above). This is necessary to allow empty vectors. Deserialization is somewhat similar:

template <typename T, typename A>
inline void operator >>=(const SerializationInfo& si, std::vector<T, A>& vec)
{
    T elem = T();
    vec.clear();
    vec.reserve( si.memberCount() );

    SerializationInfo::ConstIterator end = si.end();
    for(SerializationInfo::ConstIterator it = si.begin(); it != end; ++it)
    {
        vec.push_back(elem);
        // Dereference the iterator to reach the element's SerializationInfo.
        *it >> Pt::load() >>= vec.back();
    }
}

As an optimization, the vector reserves the memory for its elements first. Then all elements in the parent SerializationInfo are deserialized into the back of the vector. The modifier Pt::load() marks the deserialized object as being referenceable by a pointer in another object, similar to Pt::save().

Considering the implementations of the serialization operators for std::vector, one significant problem becomes apparent. This approach consumes a lot of memory for containers with many elements, because it requires the complete tree of SerializationInfo objects representing the container to be in memory. It cannot work incrementally.

An alternative to implementing the serialization operators is to specialize Pt::BasicComposer and Pt::BasicDecomposer. The default implementations use the serialization operators and a tree of SerializationInfo objects, which is fine for small and medium-sized objects. The parsing and formatting layers interact with these two classes to compose or decompose the actual types, and this layer can be completely customized. The following examples show the specializations for std::vector:

namespace Pt {

template <typename T>
class BasicComposer< std::vector<T> > : public Composer
{
public:
    BasicComposer(SerializationContext* context = 0)
    : _type(0)
    {
        _elemComposer.setParent(this);
    }

    void begin(std::vector<T>& type)
    {
        _type = &type;
        _type->clear();
    }

protected:
    virtual void onSetId(const char* id, std::size_t len)
    { }

    // Called when the beginning of a vector element was parsed.
    // Return type assumed to be Composer*, matching the returned element composer.
    virtual Composer* onBeginElement()
    {
        _type->push_back( T() );
        _elemComposer.begin( _type->back() );
        return &_elemComposer;
    }

private:
    std::vector<T>* _type;
    BasicComposer<T> _elemComposer;
};

}

The specialized BasicComposer needs to inherit from Pt::Composer, which is the interface used by the deserializer to report parse events. The virtual functions onBeginElement() and onSetId() are overridden to build the vector from the parse events. The function onBeginElement() is called when the beginning of a vector element has been parsed. It should set up and return a composer for the new element in the vector. A constructor is required that optionally takes a pointer to a Pt::SerializationContext. The context can be used in onSetId() to map a reference ID to the vector being composed, if another object has a weak reference to it (not shown here). The begin() function is also required; it sets up the composer to start composing a type.
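
The way parse events drive such a composer can be tried out with a self-contained model. The classes below are hypothetical and deliberately simplified; ToyComposer is not Pt::Composer, and onValue() and onFinish() are event names assumed for this sketch only. A hand-written sequence of events stands in for a parser. Elements are appended to the target vector as the events arrive, so no intermediate tree is built, and the nested case works because the element composer of a vector of vectors is itself the vector specialization.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event interface, loosely modelled on the description in the
// text; it is NOT Pt::Composer.
class ToyComposer
{
public:
    ToyComposer() : _parent(0) {}
    virtual ~ToyComposer() {}
    void setParent(ToyComposer* p) { _parent = p; }

    // A scalar value was parsed for the current object.
    virtual void onValue(const std::string&) {}
    // A child element starts; returns the composer that handles it.
    virtual ToyComposer* onBeginElement() { return this; }
    // The current object ends; returns the composer to continue with.
    virtual ToyComposer* onFinish() { return _parent; }
private:
    ToyComposer* _parent;
};

// Primary template: composes a scalar from its text representation.
template <typename T>
class ToyBasicComposer : public ToyComposer
{
public:
    ToyBasicComposer() : _type(0) {}
    void begin(T& type) { _type = &type; }
    virtual void onValue(const std::string& text)
    {
        std::istringstream is(text);
        is >> *_type;
    }
private:
    T* _type;
};

// Specialization for vectors: elements are appended as events arrive.
template <typename T>
class ToyBasicComposer< std::vector<T> > : public ToyComposer
{
public:
    ToyBasicComposer() : _type(0) { _elem.setParent(this); }
    void begin(std::vector<T>& type) { _type = &type; _type->clear(); }
    virtual ToyComposer* onBeginElement()
    {
        _type->push_back(T());
        _elem.begin(_type->back());   // re-targeted for every new element
        return &_elem;
    }
private:
    std::vector<T>* _type;
    ToyBasicComposer<T> _elem;
};

// Feeds the events for the nested sequence [[1, 2], [3]] by hand.
inline std::vector< std::vector<int> > composeNested()
{
    std::vector< std::vector<int> > result;
    ToyBasicComposer< std::vector< std::vector<int> > > root;
    root.begin(result);

    ToyComposer* c = &root;
    c = c->onBeginElement();                                      // first inner vector
    c = c->onBeginElement(); c->onValue("1"); c = c->onFinish();
    c = c->onBeginElement(); c->onValue("2"); c = c->onFinish();
    c = c->onFinish();                                            // back to the outer vector
    c = c->onBeginElement();                                      // second inner vector
    c = c->onBeginElement(); c->onValue("3"); c = c->onFinish();
    c = c->onFinish();
    return result;
}

inline void exampleComposeNested()
{
    std::vector< std::vector<int> > r = composeNested();
    assert(r.size() == 2);
    assert(r[0].size() == 2 && r[0][0] == 1 && r[0][1] == 2);
    assert(r[1].size() == 1 && r[1][0] == 3);
}
```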

The specialization of a BasicDecomposer is somewhat similar. It needs to derive from Pt::Decomposer, which is the interface used by the serializer to drive output to a formatter.

namespace Pt {

template <typename T>
class BasicDecomposer< std::vector<T> > : public Decomposer
{
public:
    BasicDecomposer(SerializationContext* context = 0)
    : _type(0)
    {
        _elemDecomposer.setParent(this);
    }

    void begin(const std::vector<T>& type, const char* name)
    {
        _type = &type;
        _name = name;
    }

protected:
    void onBeginFormat(Formatter& formatter)
    {
        formatter.beginSequence(_name.c_str(), "std::vector", "");
        _it = _type->begin();
    }

    Decomposer* onAdvanceFormat(Formatter& formatter)
    {
        if( _it != _type->begin() )
        {
            formatter.finishElement();
        }

        if( _it == _type->end() )
        {
            formatter.finishSequence();
            return this->parent();
        }

        formatter.beginElement();
        _elemDecomposer.begin(*_it, "");
        _elemDecomposer.beginFormat(formatter);
        ++_it;
        return &_elemDecomposer;
    }

private:
    std::string _name;
    const std::vector<T>* _type;
    BasicDecomposer<T> _elemDecomposer;
    typename std::vector<T>::const_iterator _it;
};

}

It also requires a constructor that optionally takes a pointer to a SerializationContext and a begin() function that sets up the decomposer to start decomposing a vector under a given name. The virtual functions onBeginFormat() and onAdvanceFormat() are overridden to react to formatting events. The first is called when formatting of the vector starts. It therefore calls beginSequence() to format the beginning of the sequence and stores an iterator to the beginning of the vector. The function onAdvanceFormat() is called when the next element can be formatted. It calls finishElement() to finish the current element, beginElement() to start the next element and finishSequence() when the end of the vector is reached. A decomposer is returned: either the one for the current vector element or, if all elements were decomposed, the parent decomposer.
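
The order of formatter calls produced by this stepping scheme can be traced with a reduced model. ToyFormatter and ToyVectorDecomposer below are hypothetical classes written for this sketch; they are not Pt::Formatter or Pt::Decomposer, handle only a vector of int and merge the element decomposer into a single value() call. The bracket characters written to the log are arbitrary markers for the begin and finish calls.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter that records the calls it receives.
class ToyFormatter
{
public:
    void beginSequence()  { _log += "["; }
    void beginElement()   { _log += "<"; }
    void value(int v)     { std::ostringstream os; os << v; _log += os.str(); }
    void finishElement()  { _log += ">"; }
    void finishSequence() { _log += "]"; }
    const std::string& log() const { return _log; }
private:
    std::string _log;
};

// Decomposes a vector<int> one step at a time. Each call to advance()
// formats at most one element, so no intermediate tree is needed.
class ToyVectorDecomposer
{
public:
    ToyVectorDecomposer() : _type(0) {}

    void begin(const std::vector<int>& type, ToyFormatter& formatter)
    {
        _type = &type;
        _it = type.begin();
        formatter.beginSequence();
    }

    // Returns false once the end of the sequence has been formatted.
    bool advance(ToyFormatter& formatter)
    {
        if (_it != _type->begin())
            formatter.finishElement();   // close the previous element

        if (_it == _type->end())
        {
            formatter.finishSequence();
            return false;
        }

        formatter.beginElement();
        formatter.value(*_it);
        ++_it;
        return true;
    }

private:
    const std::vector<int>* _type;
    std::vector<int>::const_iterator _it;
};

// Drives the decomposer until the whole vector has been formatted.
inline std::string formatStepwise(const std::vector<int>& v)
{
    ToyFormatter formatter;
    ToyVectorDecomposer decomposer;
    decomposer.begin(v, formatter);
    while (decomposer.advance(formatter)) {}
    return formatter.log();
}

inline void exampleFormatStepwise()
{
    std::vector<int> v;
    v.push_back(7);
    v.push_back(8);
    assert(formatStepwise(v) == "[<7><8>]");
}
```

The check against begin() before finishElement() is what keeps an empty vector from emitting a stray element end: in that case only the sequence begin and end are formatted.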

Both the BasicComposer and the BasicDecomposer may delegate to other composers and decomposers, which might or might not be specialized. In the example for std::vector, composers and decomposers for the element type are used. Note that this works for a vector of vectors right away. It is usually a good compromise to rely on the serialization operators for small or medium-sized data types, but to specialize the composer and decomposer for container types.