I’ve made a version that works in parallel:
The only remaining task is inserting the contributions from a cell into the matrix/vector.
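One common, race-free way to handle that step is to give each worker its own buffer. Each worker records the (row, col, value) and (row, value) contributions of its cells, and a single serial pass then sums all the buffers into the global matrix and vector. Below is a minimal sketch of that idea, assuming 1D linear elements as a stand-in cell kernel. `element_contributions`, `assemble_chunk` and `assemble` are hypothetical names and are not from the original code. The threads only illustrate the structure: CPython's GIL prevents a real speedup for these pure-Python loops. Another option is graph coloring, where cells of the same color share no dofs and can write into the global arrays directly.

```python
from concurrent.futures import ThreadPoolExecutor


def element_contributions(cell, h):
    # Hypothetical cell kernel: 1D linear element for -u'' = 1 on a uniform mesh.
    # Returns the cell's global dofs, its local matrix ke and local vector fe.
    dofs = (cell, cell + 1)
    ke = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    fe = [h / 2.0, h / 2.0]
    return dofs, ke, fe


def assemble_chunk(cells, h):
    # Each worker writes only to its own buffers, so no locking is needed.
    triplets = []  # (row, col, value) contributions to the global matrix
    rhs = []       # (row, value) contributions to the global vector
    for cell in cells:
        dofs, ke, fe = element_contributions(cell, h)
        for a, i in enumerate(dofs):
            rhs.append((i, fe[a]))
            for b, j in enumerate(dofs):
                triplets.append((i, j, ke[a][b]))
    return triplets, rhs


def assemble(n_cells, length=1.0, n_workers=4):
    h = length / n_cells
    n_dofs = n_cells + 1
    # Split the cells into contiguous chunks, roughly one per worker.
    size = -(-n_cells // n_workers)  # ceiling division
    chunks = [range(s, min(s + size, n_cells)) for s in range(0, n_cells, size)]

    # Parallel phase: compute every cell's contributions into per-chunk buffers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(assemble_chunk, chunks, [h] * len(chunks)))

    # Serial phase: sum the buffers into the global (dict-of-keys) matrix and vector.
    K = {}
    f = [0.0] * n_dofs
    for triplets, rhs in results:
        for i, j, v in triplets:
            K[(i, j)] = K.get((i, j), 0.0) + v
        for i, v in rhs:
            f[i] += v
    return K, f


K, f = assemble(8, n_workers=3)
print(K[(0, 0)], K[(1, 1)], K[(0, 1)])  # 8.0 16.0 -8.0
```

Because each cell's entries are only appended during the parallel phase, the result does not depend on how the cells are split across workers. All accumulation into shared storage happens in the serial reduction.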