I think the default solver when running in serial is UMFPACK, which is notorious for its very low default memory limit. Running in parallel selects MUMPS by default, which should be able to use all available memory.
With regard to solving larger problems, you should consider iterative solvers, e.g. here cf. here.
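As a rough illustration of switching between a direct and an iterative solver, here is a minimal sketch assuming legacy DOLFIN (2018/2019) and a PETSc build that includes MUMPS and Hypre. The Poisson model problem and the helper names `solver_parameters` and `solve_poisson` are hypothetical, chosen only for illustration; they are not taken from the original question.

```python
def solver_parameters(iterative=True, symmetric=True):
    """Build a solver_parameters dict in the form legacy DOLFIN's solve() accepts."""
    if not iterative:
        # Direct solve with MUMPS, requested explicitly instead of relying on the default.
        return {"linear_solver": "mumps"}
    # Assumption: CG suits symmetric positive definite systems, GMRES otherwise;
    # "hypre_amg" is only available if PETSc was built with Hypre.
    return {
        "linear_solver": "cg" if symmetric else "gmres",
        "preconditioner": "hypre_amg",
    }


def solve_poisson(n=16, iterative=True):
    """Solve a model Poisson problem on a unit cube (illustrative only)."""
    # Imported lazily so the helper above can be used without DOLFIN installed.
    from dolfin import (Constant, DirichletBC, Function, FunctionSpace,
                        TestFunction, TrialFunction, UnitCubeMesh,
                        dx, grad, inner, solve)

    mesh = UnitCubeMesh(n, n, n)
    V = FunctionSpace(mesh, "CG", 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v)) * dx
    L = Constant(1.0) * v * dx
    bc = DirichletBC(V, Constant(0.0), "on_boundary")

    uh = Function(V)
    solve(a == L, uh, bc, solver_parameters=solver_parameters(iterative))
    return uh
```

Which Krylov method and preconditioner work well depends strongly on the PDE, so treat the choices above as a starting point rather than a recommendation.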
BoxMesh is generated on a single process and then distributed to all other processes. It looks like you’re generating an extremely large mesh (100^3 in terms of number of elements) on a single process.
You should either:
Generate a coarser mesh, distribute it to all processes, then refine that mesh.
Here the redistribute option of the refinement call indicates whether to rebalance the mesh across all processes. Use or don’t use that as you require. If you’re performing uniform refinement everywhere, you probably don’t need to redistribute. You can also refine the mesh as many times as you like. Figures 3 & 4 of this give some indication of what to expect in terms of mesh quality.
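The coarse-then-refine option might look like the following sketch, assuming legacy DOLFIN's `BoxMesh` and `refine(mesh, redistribute=...)`. The helper names, the unit-cube geometry, and the choice of 25 coarse divisions with two refinement levels are illustrative assumptions. The cell-count arithmetic assumes 6 tetrahedra per box cell and 8 children per tetrahedron under uniform refinement.

```python
def cells_after_refinement(n_coarse, levels):
    """Estimate the global tetrahedron count after uniform refinement.

    Assumptions: BoxMesh splits each of the n_coarse^3 box cells into
    6 tetrahedra, and every uniform refinement splits each tetrahedron into 8.
    """
    return 6 * n_coarse**3 * 8**levels


def build_refined_box(n_coarse=25, levels=2, redistribute=False):
    """Build a coarse BoxMesh, then refine it after it has been distributed."""
    # Imported lazily so the estimate above can be used without DOLFIN installed.
    from dolfin import BoxMesh, Point, refine

    # The coarse mesh is small enough to be generated on a single process.
    mesh = BoxMesh(Point(0.0, 0.0, 0.0), Point(1.0, 1.0, 1.0),
                   n_coarse, n_coarse, n_coarse)
    for _ in range(levels):
        # Each process refines the part of the mesh it owns.
        mesh = refine(mesh, redistribute=redistribute)
    return mesh
```

Under those assumptions, 25 divisions per side refined twice give the same resolution as 100 divisions per side, without ever building the fine mesh on one process.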
Read in a mesh from a scalable I/O scheme, e.g. XDMFFile/HDF5.
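For the second option, a minimal sketch of a parallel read, assuming legacy DOLFIN's `XDMFFile` and a mesh that has already been written to XDMF/HDF5. The function name `read_mesh` and any file name you pass to it are hypothetical.

```python
import os


def read_mesh(path):
    """Read a mesh from an XDMF file so that each process loads only its share."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)

    # Imported lazily so the path check works without DOLFIN installed.
    from dolfin import MPI, Mesh, XDMFFile

    mesh = Mesh(MPI.comm_world)
    with XDMFFile(MPI.comm_world, path) as xdmf:
        xdmf.read(mesh)
    return mesh
```

The heavy data sits in the HDF5 file that accompanies the `.xdmf` file, so both files need to be visible to every process.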