Broadcasting error when running in parallel

I am receiving the following error when running with the command `mpirun -n 3 python3 script.py`:

ValueError: operands could not be broadcast together with shapes (152510,4) (53972,4)

I understand the nature of the error: the shapes of the two numpy arrays do not match. The computations themselves run correctly in parallel; the question is how I make sure the array "conc" includes the data from all 3 processes in the last line of the script. I've attached a significantly reduced version of the script that just highlights the error.

from dolfin import *
import numpy as np

# Load the MeshFunction
mesh = UnitSquareMesh(10,10)
nv = mesh.num_vertices()
C = FunctionSpace(mesh, 'Lagrange', 2)

# Parameters
DBT = np.array([0.2, 0.24])
DSPAS = np.array([50, 100, 140, 200, 250])
DPPAS = np.array([200, 500, 1000])
len_DBT = DBT.shape[0]
len_DSPAS = DSPAS.shape[0]
len_DPPAS = DPPAS.shape[0]
dt = 1.04
T = 83.2
time_pts = 8

# Load concentration data (placeholder; in the full script this is read from file)
conc_m = np.zeros((121, 10))

for k in range(len_DBT):
    for l in range(len_DSPAS):
        for m in range(len_DPPAS):
            c = Function(C)
            t = dt
            time_pt = 0
            conc = np.zeros((nv, time_pts))

            # Time stepping loop
            while t <= T + 1e-10:
                if 10.3 < t < 10.5:
                    vertex_values_c = c.compute_vertex_values(mesh)
                    conc[:, time_pt] = vertex_values_c[:]
                    time_pt += 1
                t += dt

            len_conc = conc_m.shape
            # Last line of the script: this comparison raises the broadcasting error in parallel
            print('Comp: ', np.sqrt((conc_m[:,0:4]-conc[:,0:4])**2/len_conc[0]))

Please reduce this to a minimal example that is reproducible. That means anyone should be able to copy-paste the code and run it without modification. To me, it seems like you are trying to do the following:

  • Create a distributed mesh
  • Create a function c
  • Compute vertex values of the function
  • Do some linear operation on the vertex data

All these steps should be possible to show without having custom input data.
For starters, use a built-in mesh, e.g. mesh = UnitSquareMesh(10, 10).
Secondly, remove all for-loops that are not required to reproduce the error. As this error should not be time dependent, or dependent on your len_DPPAS, I suggest you remove these loops.

@dokken I’ve updated the script so that it is self-contained. It runs successfully in serial; however, when running in parallel, we get the broadcasting error.

Please note that you can reduce your problem even further:

from dolfin import *
import numpy as np
mesh = UnitSquareMesh(10,10)
nv = mesh.num_vertices()
C = FunctionSpace(mesh, 'Lagrange', 2)

conc_m = np.zeros((121, 10))
c = Function(C)
time_pt = 0
conc = np.zeros((nv, 1))

vertex_values_c = c.compute_vertex_values(mesh)
conc[:, time_pt] = vertex_values_c[:]
time_pt += 1

len_conc = conc_m.shape
print(MPI.comm_world.rank, conc_m.shape, nv)
print('Comp: ', np.sqrt((conc_m[:,0:4]-conc[:,0:4])**2/len_conc[0]))

The key here is that the mesh is distributed when you are using multiple processes. This means that no single process has all the data of the mesh, the function space, and the function values, as you can observe by printing nv in serial and with, for instance, 2 processes (see the sketch below). If you want to store all data on a single process, you need to use MPI operations to gather the data.
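
For example, the following minimal sketch (my own example; it assumes legacy dolfin, where mesh.num_entities_global(0) returns the global number of vertices) shows that num_vertices() only reports the vertices of the local part of the mesh:

from dolfin import *

mesh = UnitSquareMesh(10, 10)
# Vertices of the local mesh partition vs. vertices of the whole mesh
nv_local = mesh.num_vertices()
nv_global = mesh.num_entities_global(0)
print("Rank:", MPI.comm_world.rank, "local vertices:", nv_local, "global vertices:", nv_global)

Running this with mpirun -n 2 python3 script.py, the local counts sum to more than 121 (vertices on the partition boundary are stored on both processes), while the global count stays at 121.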

Thanks for the help. For example, are you saying that if the rank is not 0, I need to gather nv to perform the computations properly?

It depends on whether you can perform the computations locally on each process or not.
If you can perform the operation locally, it will be faster (and then you only gather the results when writing them as output); see the sketch below.
If you need information about vertex values from other processes, you have to communicate the required data.
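
As an illustration of the first option, here is a minimal sketch (my own example, not your actual error metric) that computes a partial sum locally and then combines the contributions with an allreduce. It assumes MPI.comm_world is an mpi4py communicator (dolfin 2018.1 and newer). Note that vertices shared between processes contribute once per process holding them, so this only demonstrates the communication pattern:

from dolfin import *
import numpy as np

mesh = UnitSquareMesh(10, 10)
C = FunctionSpace(mesh, 'Lagrange', 2)
c = interpolate(Expression("x[0]*x[1]", degree=2), C)

# Values at the vertices of the local mesh partition (shared vertices included)
local_vals = c.compute_vertex_values(mesh)

# Local partial sum, then a global reduction (the default reduction op is SUM)
local_sq = np.sum(local_vals**2)
global_sq = MPI.comm_world.allreduce(local_sq)
print("Rank:", MPI.comm_world.rank, "global sum of squares:", global_sq)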

Understood. The conc_m array is constant and populated with known data, while the conc array is generated by the simulations; the two arrays are then compared to evaluate the results. Around 30 simulations are run, and each time a new conc is generated, populated with the vertex values, compared against conc_m, and written to output.

Essentially, the computations need to be run in parallel, since running in serial can take upwards of 30 hours. Once the computation is done, I need all the data in conc across all processes to compare the results with conc_m.

Please note that you are making an assumption about the vertex ordering when you read conc_m from file.
When running in parallel, the data is distributed, and the vertices might not be ordered in the same way as in serial.
This is easily illustrated with the following:

from dolfin import *
mesh = UnitSquareMesh(3,3)
coordinates = mesh.coordinates()
global_indices = mesh.topology().global_indices(0)
for i in range(len(global_indices)):
    print("Rank:", MPI.comm_world.rank, "Local index: ", i, " Global index: ", global_indices[i], " Coordinate: ", coordinates[i])

Run this in serial and with two processes, and you will see the local ordering, the global ordering, and the coordinates.

Makes sense. Is there a way to make sure the vertices are ordered correctly, or will I have to stick to running in serial? Thanks. Also, out of curiosity, what is the proper way to gather the data even if the array is out of order?

I would rather put the focus on gathering the parallel data so that you can post-process it in serial. You simply have to build a map between your input data and the local indices on each process, which is not very complicated. You can use MPI operations such as allgather to collect the data (the global indices from each process together with the function values) and combine them; see the sketch below.
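
A minimal sketch of that pattern could look as follows. It is my own example: it assumes MPI.comm_world is an mpi4py communicator (dolfin 2018.1 and newer) and that mesh.num_entities_global(0) gives the global vertex count; vertices shared between processes are simply written more than once with identical values:

from dolfin import *
import numpy as np

mesh = UnitSquareMesh(10, 10)
C = FunctionSpace(mesh, 'Lagrange', 2)
c = interpolate(Expression("x[0]*x[1]", degree=2), C)

comm = MPI.comm_world
local_vals = c.compute_vertex_values(mesh)     # values at the local vertices
local_idx = mesh.topology().global_indices(0)  # their global vertex numbers

# Gather the global indices and the corresponding values from every process
all_idx = comm.allgather(local_idx)
all_vals = comm.allgather(local_vals)

# Assemble the full, globally ordered vertex array on every process
nv_global = mesh.num_entities_global(0)
global_vals = np.zeros(nv_global)
for idx, vals in zip(all_idx, all_vals):
    global_vals[idx] = vals

if comm.rank == 0:
    print(global_vals.shape)  # (121,) regardless of the number of processes

The resulting global_vals is ordered by global vertex index, so it can be compared against conc_m row by row, provided conc_m was stored in that same global ordering (for example, written from a serial run).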