Added the LAZY_STACK_PROF #define to Lazy. When it is defined, Lazy prints the approximate maximum stack usage reached by any OpenMP thread over the course of the session.
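
For illustration only, here is a minimal sketch of one way a compile-time switch like this could work. It is not Lazy's actual implementation. Only LAZY_STACK_PROF comes from this change; the helper names (prof_probe, prof_report, PROF_PROBE, PROF_REPORT, work, run_session) and the output format are hypothetical. The sketch assumes depth is estimated as the address distance between a marker local at the start of each thread's work and a local in the current frame, which is why the result is only approximate.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LAZY_STACK_PROF 1   /* the macro named above; all helpers below are hypothetical */

#ifdef LAZY_STACK_PROF
static size_t prof_max_bytes = 0;   /* largest depth seen on any thread */

/* Measure the distance from a thread's base marker to a local in this frame. */
static void prof_probe(const volatile char *base)
{
    volatile char here = 0;
    uintptr_t b = (uintptr_t)base;
    uintptr_t h = (uintptr_t)&here;
    size_t depth = (b > h) ? (size_t)(b - h) : (size_t)(h - b);

    /* Serialize updates; the pragma is ignored when OpenMP is disabled. */
    #pragma omp critical(lazy_stack_prof)
    {
        if (depth > prof_max_bytes)
            prof_max_bytes = depth;
    }
}

/* Print the recorded maximum; the message format is an assumption. */
static void prof_report(void)
{
    fprintf(stderr, "max stack used (approx): %zu bytes\n", prof_max_bytes);
}
#define PROF_PROBE(base) prof_probe(base)
#define PROF_REPORT()    prof_report()
#else
#define PROF_PROBE(base) ((void)(base))
#define PROF_REPORT()    ((void)0)
#endif

/* Stand-in for real work: each level keeps a small buffer on the stack. */
static int work(int n, const volatile char *base)
{
    volatile char pad[128];
    pad[0] = (char)n;
    PROF_PROBE(base);
    return (n == 0) ? pad[0] : work(n - 1, base) + 1;
}

/* Run the stand-in work on every thread, then report once at the end. */
static int run_session(int depth)
{
    int total = 0;
    #pragma omp parallel reduction(+:total)
    {
        volatile char base = 0;   /* marks roughly where this thread's work begins */
        total += work(depth, &base);
    }
    PROF_REPORT();
    return total;
}
```

Calling run_session(8) from main would exercise the probes and print the estimate to stderr. With the macro undefined, the probe and report macros expand to no-ops, so the profiling costs nothing in normal builds.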