Pages

Monday, February 9, 2015

TMP Powered C++ Pretty Printer

Being able to pretty print STL containers is an important task for any application that uses the STL. Since standard containers don't overload operator << nor provide a printing function, having a modular,generic pretty printing class allows one to print containers with a programmer-friendly interface that makes it easy to override default behavior and customize printing, provide printing functionality for groups of containers that behave similarly semantically or otherwise, and introduce custom types.



The solution I came up with borrows heavily from Stephan T. Lavavej's solution, which he generously posted and even made a video about it on Channel 9 as a part of his Advanced STL series. Please check out his solution:

http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Advanced-STL/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-6-of-n

His code is also available publicly.
https://onedrive.live.com/?cid=E66E02DC83EFB165&id=E66E02DC83EFB165%21293


My code is currently available on Github along with a demo program:

https://github.com/Hincoin/pretty_print



There are two primary differences between my solution and that of Stephan's, as well as a few smaller differences.  The first way in which it differs is how my code checks for containers. The only change comes within the non-specialized is_container struct that accepts arbitrary type T and determines if it is a container. T in this case, can be any type except std::basic_string, std::tuple, or std::pair, since is_container is specialized to handle those types.

Here is my non-specialized is_container:

template<typename T>
struct is_container
{

private:
template<typename U>

static std::true_type test(typename U::const_iterator *);
template<typename U>

static std::false_type test(...);

template<typename U>
static U makeU();

template<typename U, class = void>
struct CheckBeginEnd : public prettyprint::FalseType{};

template<typename U>
struct CheckBeginEnd<U, prettyprint::void_t<decltype(std::begin(std::declval<U&>())), decltype(std::end(std::declval<U&>()))>>
{
using type = typename std::is_same<decltype(std::begin(std::declval<U&>())),decltype(std::end(std::declval<U&>()))>::type;

};

public:
using type =  typename prettyprint::_Or< decltype(test<T>(nullptr)),typename CheckBeginEnd<T>::type >::type ;
};
Yes, it looks complicated and littered, but the only change I made was the addition of the CheckBeginEnd struct and its corresponding specialization, as well as changing the conditions for being deemed a container. Stephan's original mechanisms remain the same - make two test() functions, one taking U::const_iterator * and the other taking the last-resort ellipsis. The test() function will be called  nullptr. If the type U has a const_iterator, then nullptr initialization  to a const_iterator* is possible, and will be preferred over the ellipsis. However, if type U does not have a const_iterator, SFINAE will kick in and remove that function from overload resolution, leaving the ellipsis as the only viable candidate. These functions are never defined but their return types, std::true_type and std::false_type,  will be deduced within the decltype() -- this is the essence of Stephan's code.

 The part I added was a CheckBeginEnd struct which uses the void_t technique highlighted by Walter E. Brown in his CppCon2014 talk. void_t has yet to become part of the Standard, but implementing it isn't very difficult. The partially specialized CheckBeginEnd is only used whenever the type U has a valid type deduced whenever nonmember begin() and nonmember end() are invoked upon it. If that is the case, the using-alias then becomes the result of calling is_same on the types deduced for nonmember begin() and end().

This makes the process of determining whether or not a type is a container a bit more general -- it will deem container-ness to types that either have a const_iterator or types that can produce valid and equal types whenever begin() and end() are invoked upon them. You can perhaps imagine a type that itself does not provide an iterator, but rather has member variables such such as a std::vector yet it delegates begin() and end() to the internal std::vector. Such a type can now be printed as a container. Initially, this may appear to be a bad idea because someone may define begin() and end(), yet those functions, for the purpose of their type, has nothing to do with containers. I would argue, however, that if one wishes to incorporate pretty-printing functionality into their code base, they are probably well aware of STL containers and the begin() and end() functions ; misusing them will end up pointing out poor design choices in the aforementioned type. Further elaboration could be done to see if they even return iterator types, but such granularity is not required.

The determination process is handled by an internal _Or structure which uses TMP to determine at compile time, if any of the given expressions are std::true_type. If so, is_container<T>::type will be std::true_type, otherwise it will be std::false_type.

Perhaps the main difference, however, comes from the default printing class akin to Stephan's default_formatter. Stephan's approach overloads 3 functions - prefix(), separator(), and suffix() for each type that should be pretty printed. Each function takes an ostream& and its accompanying type. Thus, calling prefix(),separator(), and suffix()  on certain types will invoke the proper overloads. For example, the 3 functions are provided for an arbitrary type T but are also overloaded for std::set taking arbitrary types, compartors, and allocators (in other words, std::set<T,C,A>). Whenever prefix,separator, or suffix are called and passed a std::set, the overloaded versions will be called.

The only drawback I saw to this approach was that prefix,separator, and suffix had to be overloaded for every type that I wanted to be handled differently from the default "arbitrary T" behavior. What if I wanted all set-like types, that is, all types that mathematically behave like a Set, to be printed in a certain way? There are several of these types within the STL ; std::set, std::multiset, std::unordered_set, and std::unordered_multiset just to name a few,  and this doesn't even account for user-defined set types. With Stephan's approach, I would have to implement 3 functions for each of those types and consequentially repeat various lines of code.

My approach uses a form of tag-dispatching that categorizes a given type T and calls the appropriate overload for that tag. The prefix(), separator(), and suffix() functions are only written once but call the overloaded prefix_impl, separator_impl, and suffix_impl functions that are discriminated by using tags. As a concrete example, all of the set-types that I mentioned above would be given the tag SetType, and the overloaded prefix_impl(), separator_impl(), and suffix_impl() functions that take a SetType-tag will be called. This means that only a single overload of the 3 _impl() functions must be done per tag as opposed to per type.  It is easy to override the categorization process as well  if , for example, you want the unordered/hash versions of std::set to be handled differently. Let's see the code that performs this categorization ( this is only part of the default_formatter, but it is arguably the most important part):

public:


 template<typename T>
 void prefix(std::ostream& os, const T& t) const
 {
  prefix_impl(os, typename TypeCategorizer<T>::category());
 }
 template<typename T>
 void separator(std::ostream& os, const T&) const
 {
  separator_impl(os, typename TypeCategorizer<T>::category());
 }

 template<typename T>
 void suffix(std::ostream& os, const T&) const
 {
  suffix_impl(os, typename TypeCategorizer<T>::category());
 }
Notice how there is only a single prefix, separator, and suffix function. These functions will each call their respective _impl's passing along a tag generated from the TypeCategorizer struct. The TypeCategorizer makes use of extensive template machinery and provides default behavior, but can be easily overridden as we'll see later. The TypeCategorizer performs a series of checks on the given type to determine which category it should belong in. The scheme it uses follows a policy of "most inclusive" meaning it will check the type for unique characteristics that would surely distinguish it as a particular type,then it moves on to more general checks if the previous ones had failed. For the default_formatter that I wrote, this is the TypeCategorizer:



 template<typename T>
 struct TypeCategorizer
 {
  using category =
   std::conditional_t< has_map_types<T>::type::value, MapType,
   std::conditional_t< has_hash<T>::type::value, SetType,
   std::conditional_t< has_key_compare<T>::type::value,SetType,
   std::conditional_t< has_const_iter<T>::type::value, ArbitraryType,
   std::conditional_t< has_pair_types<T>::type::value, PairType, TupleType>>>>>;

 };


std::conditional_t is a shorthand for typename std::conditional<...>::type. It offers compile-time if statements. As with run-time if statements, compile-time ones can be chained as well; which is what was done here. It first checks to see if the type looks like a Map, i.e anything mapping key to value (std::map, std::multimap, and its unordered variants) and will assign category to MapType. Otherwise, if the type has a hasher struct, but is not a map, we can deduce this to be either std::unordered_set or std::unordered_multiset, and thus we tag it as SetType. Ditto for the key_compare in std::set and std::multiset. If our type is not a map nor a set, we can then check to see if it has a const_iterator, making it the "ArbitraryType". Then we have special handling for std::pair and std::tuple which we would like to consider as containers, yet they do not have iterators, begin()/end() functions, etc. I happened to use structs along with SFINAE to build my has_...<T>:: checks (e.g has_hash<T>, has_key_compare<T>, etc,), but overloaded constexpr functions could be used as well. These Type-tags are simple empty structs, very similar to what std::iterator_traits<> does.

struct ArbitraryType{ };
struct SetType{ };
struct TupleType{ };
struct PairType{ };
struct MapType { };



So when prefix()/suffix()/separator() is called with an std::unordered_map<int,int> , for example, TypeCategorizer<T> will deduce it to be MapType and prefix/separator/suffix_impl( ostream&, MapType) is called. The good thing is that this same functionality and behavior will be exhibited whenever either of the functions are called with std::map, std::multimap, std::unordered_multimap, or even user-defined types that have map-like qualities -- all without having to overload the _impl's for all possible map types. Same goes for set-types ; anything that behaves like a set will have the _impl function that takes SetType be called, no additional overloading needed. This also applies to user defined types! Here's an example of the _impl functions:



protected:
void prefix_impl(std::ostream& os, SetType) const
{
 os << "{";
}

void separator_impl(std::ostream& os, SetType) const
{
 os << ", ";
}

void suffix_impl(std::ostream& os, SetType) const
{
 os << "}";
}


void prefix_impl(std::ostream& os, MapType) const
{
 os << "{\n\t";
}

void separator_impl(std::ostream& os, MapType) const
{
 os << ",\n\t";
}

void suffix_impl(std::ostream& os, MapType) const
{
 os << "\n}";
}
The overloads exist on the Tags themselves as opposed to the types!

These functions shouldn't be visible to the user, but we want to be able to inherit them and override them; so we wrap them inside of a protected: scope.

The rest of the code that actually handles printing out containers, and specializing for tuples and pairs is essentially the same as Stephan's, so I won't dwell on that here ; Stephan explains it in a wonderful manner in his video.

I've been stating that changing the default behavior is easy. I will now demonstrate another formatting class, that inherits from default_formatter, yet introduces custom behavior. I'll start with changing default printing behavior.

The default printer prints ArbitraryType-tags with "[" as the prefix, ", " as the separator, and "]", as the suffix. It also does something similar with SetType-tags but instead of "[]" it prints "{}", which represents the mathematical nature of sets.Since the real behavior comes from the _impl functions, our derived class will just need to provide the proper overload.





class special_format : public default_formatting

protected:
       //...
 void prefix_impl(std::ostream& os, ArbitraryType) const
 {
   os << "<";
 }
 void suffix_impl(std::ostream& os, ArbitraryType) const
 {
  os << ">";
 }
     //...
};

The special_format class inherits from the default_formatting class discussed earlier and provides overloads for prefix_impl() and separator_impl() that take ArbitraryTypes. Instead of printing "[,]" for ArbitraryTypes, this will now print "<,>". But wait, where is separator_impl? It's inherited! Since that behavior of printing ", " is what I'd like to continue to do, there is no need to overload it; that function will simply be inherited.

 One subtlety  though is that because all the functions have similar declaration, namely an ostream& followed by a struct/tag, overriding functionality or introducing any new functionality must be done so carefully due to overload resolution. If I only overload prefix_impl and suffix_impl (each of them taking an ArbitraryType-tag, for argument's sake) in the derived class, then calling my derived class functions with any other type (i.e, it isn't tagged as ArbitraryType), will try to convert the differing tag into an ArbitraryType. Clearly, this is not possible. Why is this overload resolution's fault? It first scans the derived class for suitable functions -- it finds one within the derived class and therefore it need not search up the base class. This behavior is undesirable. We want functionality provided in the Base class to be inherited (and ultimately usable) in our derived class should we not provide an override for it. The solution is to include the Base class's functions within this scope, so that overload resolution will consider them. How to go about doing this? 3 lines should suffice.


class special_format : public default_formatting
{
protected:

 using default_formatting::prefix_impl;
 using default_formatting::separator_impl;
 using default_formatting::suffix_impl;
       
        //...

 void prefix_impl(std::ostream& os, ArbitraryType) const
 {
  os << "<";
 }

 void suffix_impl(std::ostream& os, ArbitraryType) const
 {
  os << ">";
 }
       //...
};

3 using-directives will grant us exactly what we're looking for.

The most interesting features however, come from being able to modify the TypeCategorizer and introducing custom types. Let's say we have a type called network_switch that maintains a mapping of MAC addresses to ports.


class network_switch
{

public:
 network_switch(const std::string& str) : switch_name(str){}
 void associate(const std::string& mac_addr, int port)
 {
  mac_address_to_ports[mac_addr] = port;
 }
 const std::string& name() const
 {
  return switch_name;
 }
 const std::unordered_map<std::string,int>& table()
 {
  return mac_address_to_ports;
 }
private:
 std::unordered_map<std::string,int> mac_address_to_ports;
 std::string switch_name;

};

Easy enough.  The network_switch class maintains along with a switch_name, an unordered_map from mac addresses to ports. I would like to pretty-print this type AND maintain some data from the network_switch. The beauty of having a class take care of formatting is, as Stephan mentioned, it can be stateless or stateful -- I can retain information about objects, program state, time, what-have-you,  and simply use this information within the _impl functions. Let's walk through creating a custom formatter that specializes in printing network_switches, but also inherits from default_formatter so that we can also use it for printing other types.


class network_switch_formatting : public default_formatting
{
};


Ok, so we inherit default behavior, but now lets introduce our custom formatting for network_switch types.


class network_switch_formatting : public default_formatting
{
  private:
   std::string switch_name; // lets maintain some info for the switch we use
  protected:
    using default_formatting::prefix_impl;
    using default_formatting::separator_impl;
    using default_formatting::suffix_impl;

    struct NetworkSwitchType{}; // create a custom tag for network_switch
    void prefix_impl(std::ostream& os, NetworkSwitchType) const
    {
        os << "Table for switch " << switch_name << "\n-------------\n\t";
    }

    void separator_impl(std::ostream& os, NetworkSwitchType) const
    {
        os << ",\n\t";
    }

    void suffix_impl(std::ostream& os, NetworkSwitchType) const
    {
        os << "\n-------------\n";
    }

  public:

    template<typename T>
 void prefix(std::ostream& os, const T& t) const
 {
  prefix_impl(os, typename TypeCategorizer2<T>::category()); // TypeCategorizer2... this will be our own categorizer!
 }

 template<typename T>
 void separator(std::ostream& os, const T& t) const
 {
  separator_impl(os, typename TypeCategorizer2<T>::category()); // TypeCategorizer2... this will be our own categorizer!
 }

 template<typename T>
 void suffix(std::ostream& os, const T& t) const
 {
  suffix_impl(os, typename TypeCategorizer2<T>::category()); // TypeCategorizer2... this will be our own categorizer!
 }
    
};

By keeping a variable for switch_name we can provide an interface to users of the network_switch_formatting class to maintain the switch name of some network_switch and then use that within the prefix_impl function that prints NetworkSwitchTypes. We also created a custom tag so that we can discriminate our custom types from default types. The using-statements make sure that the functions that are inherited can properly participate in overload resolution. Then we define _impl functions for NetworkSwitchType-tags. Finally, the public functions use a yet-to-be-defined TypeCategorizer2 struct that will take into consideration network_switches. Let's do that now.


template<typename T>
struct TypeCategorizer2
{
 using is_switch = typename is_network_type<T>::type;
 using category = std::conditional_t<is_switch::value,NetworkSwitchType,typename TypeCategorizer<T>::category>;

};

This bit is interesting; we first check to see if T is a network_switch and either designate NetworkSwitchType or the result of TypeCategorizer<T>. Since this class inherits default_formatter's TypeCategorizer<T> struct, we can provide custom checks, designate custom tags and then delegate the rest of the work to the base class if certain conditions were not met. The is_network_type struct is fairly trivial to do with some help from the type_traits header.


template<typename T>
struct is_network_type
{
 using type = typename std::is_same<std::decay_t<T>,network_switch>::type;
};

As should be evident, "type" will be defined as the result of comparing a decay on T to network_switch. If they're the same type, we get std::true_type, std::false_type otherwise.

Let's wrap it altogether:



class network_switch_formatting : public default_formatting
{
  private:
   std::string switch_name; // lets maintain some info for the switch we use
  protected:
    using default_formatting::prefix_impl;
    using default_formatting::separator_impl;
    using default_formatting::suffix_impl;

    struct NetworkSwitchType{}; // create a custom tag for network_switch

    template<typename T>
    struct TypeCategorizer2
    {
          using is_switch = typename is_network_type<T>::type;
          using category = std::conditional_t<is_switch::value,NetworkSwitchType,typename TypeCategorizer<T>::category>;

    };

    void prefix_impl(std::ostream& os, NetworkSwitchType) const
    {
        os << "Table for switch " << switch_name << "\n-------------\n\t";
    }

    void separator_impl(std::ostream& os, NetworkSwitchType) const
    {
        os << ",\n\t";
    }

    void suffix_impl(std::ostream& os, NetworkSwitchType) const
    {
        os << "\n-------------\n";
    }

  public:

    network_switch_formatting(std::string str) : switch_name(str) {} // constructor

    template<typename T>
 void prefix(std::ostream& os, const T& t) const
 {
  prefix_impl(os, typename TypeCategorizer2<T>::category()); // TypeCategorizer2... this will be our own categorizer!
 }

 template<typename T>
 void separator(std::ostream& os, const T& t) const
 {
  separator_impl(os, typename TypeCategorizer2<T>::category()); // TypeCategorizer2... this will be our own categorizer!
 }

 template<typename T>
 void suffix(std::ostream& os, const T& t) const
 {
  suffix_impl(os, typename TypeCategorizer2<T>::category()); // TypeCategorizer2... this will be our own categorizer!
 }
        
    void set_switch_name(std::string new_name) // provide interface for setting switch names
    {
            switch_name = new_name;
    }
   
        
       
    
};

The basis of the entire code base revolves around TMP ( mainly type-traits) and overriding. By exploiting the power and modularity of compile-time mechanisms such as SFINAE, type_traits, decltype, declval, etc, you can not only leverage this pretty-printer to perform robust type checking while still remaining highly generic, but can also transfer this power to design high performance,generic, and scalable libraries.

To finish this off, let's see an example of this pretty printer in action:




#include "pretty_print.hpp"
int main()
{
 default_formatting df;
 special2 sp2;
 std::vector<std::vector<int>> vvi;
 for (int i = 0; i < 10; ++i)
 {
  std::vector<int> vtor_sub{ i, i + 1, i + 2 };
  vvi.push_back(vtor_sub);
 }
 std::cout << "vector<vector<int>>: " << std::endl;
 print_line(vvi, std::cout, df);
 std::cout << std::endl;

 std::tuple< int, char, std::vector<std::vector<double>>, double > tuple_1{ 1, 'a', { { 4.0, 5.0 }, { 6.0, 7.0 } }, 5.6 };

 std::pair<std::vector<std::vector<int>>, std::vector<double>> pair_1{ { { 5, 6, 7 }, { 8, 9, 10 } }, { 5.6, 7.8, 9.45 } };

 std::pair<int, int> p_int_int{ 5, 7 };
 std::unordered_map<int, std::string> my_map;
 my_map[1] = "Hello!";
 my_map[2] = "goodbye!";
 std::map<int, std::vector<int>> map_int_vec;
 map_int_vec[0] = { 4, 5, 6 };
 map_int_vec[1] = { 7, 8, 9 };

 std::cout << "map<int,vector<int>>: " << std::endl;
 print_line(map_int_vec, std::cout, sp2);
 std::cout << std::endl;


 std::cout << "vector<vector<int>>:" << std::endl;
 print_line(vvi, std::cout, sp2);
 std::cout << std::endl;

 std::cout << "std::string: " << std::endl;
 print_line(std::string("hi!"), std::cout, sp2);
 std::cout << std::endl;

 std::cout << "std::tuple< int, char, std::vector<std::vector<double>>, double >: " << std::endl;
 print_line(tuple_1, std::cout, sp2);
 std::cout << std::endl;

 std::cout << "std::pair<int, int>:\n";
 print_line(p_int_int, std::cout, sp2);
 std::cout << "\n";

 std::cout << "std::pair<std::vector<std::vector<int>>, std::vector<double>>:\n";
 print_line(pair_1, std::cout, sp2);
 std::cout << std::endl; 

 std::cout << "C-style array: " << std::endl;
 int a[5] = { 1, 2, 3, 4, 5 };
 print_line(a, std::cout, sp2);
 std::cout << std::endl;

 std::cout << "forward_list<int>: " << std::endl;
 std::forward_list<int> flist = { 1, 2, 3, 4, 5 };
 print_line(flist, std::cout, sp2);
 std::cout << std::endl;

 std::cout << "list<int>: " << std::endl;
 std::list<int> mylist{2,4,6,8,10};
 print_line(mylist,std::cout,sp2);
 std::cout << std::endl;

 std::cout << "[*] Testing network_switch\n";
 network_switch ns1("SW1-MIAMI");
 ns1.associate("ab-de-3f-45-6b-7f",2);
 ns1.associate("ab-de-3f-39-47-21",10);
 network_switch_formatting ns_formatter(ns1.name());
 print_line(ns1,std::cout,ns_formatter);

}

The "special2" class provides some overridden functions for certain types, to demonstrate the ease at which a custom formatter can be built. Additionally, the categorizer can be changed from a struct to a constexpr function that can be overloaded on several types, instead of having to create "is_a_..." structs and usually relying on constructs such as void_t or SFINAE. Overloading a constexpr function may prove to be a cleaner solution to the categorization problem than a struct.

Here's the output from the code above:


vector<vector<int>>: 
[[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 9], [8, 9, 10], [9, 10, 11]]

map<int,vector<int>>: 
{{
 (0, <4, 5, 6>),
 (1, <7, 8, 9>)
}}

vector<vector<int>>:
<<0, 1, 2>, <1, 2, 3>, <2, 3, 4>, <3, 4, 5>, <4, 5, 6>, <5, 6, 7>, <6, 7, 8>, <7, 8, 9>, <8, 9, 10>, <9, 10, 11>>

std::string: 
"hi!"

std::tuple< int, char, std::vector<std::vector<double>>, double >: 
(1, a, <<4, 5>, <6, 7>>, 5.6)

std::pair<int, int>:
(5, 7)

std::pair<std::vector<std::vector<int>>, std::vector<double>>:
(<<5, 6, 7>, <8, 9, 10>>, <5.6, 7.8, 9.45>)

C-style array: 
<1, 2, 3, 4, 5>

forward_list<int>: 
[[ --> 1 
 --> 2 
 --> 3 
 --> 4 
 --> 5
]]

list<int>: 
<2, 4, 6, 8, 10>

[*] Testing network_switch
Table for switch SW1-MIAMI
--------------
 ("ab-de-3f-39-47-21", 10),
 ("ab-de-3f-45-6b-7f", 2)
--------------

Several types were tested and pretty printed. One neat feature is the printing of C-Style arrays which originally default to TupleType since neither of the type-traits would apply to it. Via partial specialization, the TypeCategorizer struct assigns ArbitraryType to arrays to print them as if they behaved like std::vector.

No comments:

Post a Comment