Remove many zero entries from assembled sparse matrix or avoid them

Dear PETSc experts,

The following question is related to Removing zeros for matrix - FEniCS Q&A

For calculating explicit Jacobians for geophysical inverse modeling applications, I need to assemble the derivative of the system matrix after hundrets of model parameters (see code below) - the point is I need these matrices more than one time, and re-assembling requires quite some time compared to using the cached ones.

My problem has “n_domains” model parameters, whose derivatives are expressed by the discontinuous parameter function “dg_func”. This “dg_func” contains only zeros except for the cells which belong to the specified domain in each iteration. Accordingly, my assembled matrix dA contains like 99% zero values and only a few thousand non-zero values, but requies a HUGE amount of memory (up to more than 1 GB per matrix).

for di in range(n_domains):
            dg_func = df.Function(S_dg)
            np_vals = np.zeros(len(self.PP.MP.sigma))

            np_vals[di] = 1.
            dg_func.vector().set_local(np_vals[domain_func.array()])
            dg_func.vector().apply('insert')

            dA_form = df.inner(dg_func * self.PP.FE.u_i,
                               self.PP.FE.v_r) * self.PP.FS.DOM.dx_0 + \
                      df.inner(dg_func * self.PP.FE.u_r,
                               self.PP.FE.v_i) * self.PP.FS.DOM.dx_0
            dA.append(df.assemble(dA_form))

I have tested different assembly options, and I assume the sparsitiy pattern of the underlying PETScMatrix is set at the beginning of the assembly. So at the beginning of the assembly process, it is unknown that almost all integrals will results in zero values and afterwards it’s too late to remove the zero entries as the sparsity pattern was already set?

Is that True or are there assembly options to avoid zero entries? I have already tried things like “eliminate zeros” as described in https://fenics.readthedocs.io/projects/ffc/en/latest/_modules/ffc/quadrature/parameters.html, but without any difference.

Alternatively, is there any fast option to extract the non-zero values of the assembled matrices and build a new sparse Matrix with them in PARALLEL?

I have also used the alternative option for incorporating different model domains during assembly (see code below), but this requires quite some time for the initial assembly and the assembled matrices require less, but still a huge amount of memory:

for di in range(n_domains):
            dA_form = df.inner(self.PP.FE.u_i,
                                           self.PP.FE.v_r) * self.PP.FS.DOM.dx(di) ...