I may have made some mistakes regarding the history of finite elements here, but here are my impressions. Please feel free to correct me. Tom Hughes’ book will cover this with far more precision and detail.
I believe the mass and stiffness matrix names come from original work using finite elements for engineering problems discretising the elastodynamics problem,
\rho \frac{\partial^2 \vec{u}}{\partial t^2} - \nabla \cdot \sigma = 0.
Here \vec{u} is the unknown displacement field, \rho is the density and \sigma is a stress tensor.
The discretisation of the time harmonic or \theta-scheme formulations of \rho \frac{\partial^2 \vec{u}}{\partial t^2} yields something along the lines of \int_\Omega \rho \vec{u} \cdot \vec{v} \, \mathrm{d}\vec{x}, i.e. the mass component. If \rho is constant this simplifies to \rho \int_\Omega \vec{u} \cdot \vec{v} \, \mathrm{d} \vec{x}, such that the matrix assembled from the finite element formulation, \mathrm{M} = \int_\Omega \vec{u}_h \cdot \vec{v}_h \, \mathrm{d}\vec{x}, is called the mass matrix.
Similarly, in the simple case that \sigma = k \nabla \vec{u}, where we assume k to be constant, the “stiffness” matrix comes from the FE formulation \mathrm{K} = \int_\Omega \nabla \vec{u}_h : \nabla \vec{v}_h \, \mathrm{d} \vec{x}. Here “stiffness” is perhaps best thought of as a resistance to deformation.
It was popular to distinguish the two because the underlying matrices need only be assembled once if one were to parametrise over constant material coefficients (\rho, k and possibly a time step \Delta t).
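As a rough illustration of that "assemble once, reuse" idea, here is a minimal NumPy sketch for piecewise linear elements on a uniform 1D mesh. The function names `assemble_p1_1d` and `system_matrix` are made up for this example, and the combination (\rho / \Delta t^2) \mathrm{M} + k \mathrm{K} assumes a fully implicit second-order backward difference in time, which is only one of many possible schemes.

```python
import numpy as np


def assemble_p1_1d(n_cells, length=1.0):
    """Assemble P1 mass and stiffness matrices on a uniform mesh of [0, length]."""
    h = length / n_cells
    n_nodes = n_cells + 1
    M = np.zeros((n_nodes, n_nodes))
    K = np.zeros((n_nodes, n_nodes))
    # Exact element matrices for linear hat functions on a cell of width h
    M_e = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
    K_e = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    for c in range(n_cells):
        dofs = [c, c + 1]  # global node indices of cell c
        M[np.ix_(dofs, dofs)] += M_e
        K[np.ix_(dofs, dofs)] += K_e
    return M, K


# Assemble once; neither matrix depends on rho, k or dt
M, K = assemble_p1_1d(n_cells=8)


def system_matrix(rho, k, dt):
    """Combine the pre-assembled matrices for given constant coefficients.

    Assumed time discretisation (illustrative only): backward difference
    for the second time derivative with the stiffness term taken implicitly.
    """
    return (rho / dt**2) * M + k * K


# Sweeping over material parameters needs no re-assembly
A_soft = system_matrix(rho=1.0, k=0.1, dt=0.01)
A_stiff = system_matrix(rho=1.0, k=10.0, dt=0.01)
```

A real code would use sparse storage and apply boundary conditions, but the point is the same: only the scalar weights change between parameter choices.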
Somehow this terminology has made its way into FEM discussions even for problems unrelated (although similar) to the original elastodynamics application. E.g. in electromagnetism \int_\Omega (\nabla \times \vec{E}_h) \cdot (\nabla \times \vec{F}^*_h) \, \mathrm{d} \vec{x} could be called a “stiffness” matrix in common parlance.
With regard to physical interpretations of the weak/finite element formulations: this depends on who you ask. My mind doesn’t really see finite elements as more than a discretisation of the weak formulation, which is a projection of the original problem onto a “more convenient” space.
However, some interpretations cast the weak formulation as the minimisation of an energy functional over the space of admissible configurations. E.g. in elasticity the solution would be the configuration which minimises the potential energy. In fact (I believe) this motivated the naming of the “energy norm”, which is commonplace in finite element analysis. Sometimes the test function which arises in such a derivation of the FE formulation is called a “virtual displacement”, and the resulting statement the “principle of virtual work”, relating to this minimisation of potential energy. You can see an example of this idea in the old hyperelasticity demo.
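To make the energy picture a bit more concrete, here is a sketch for a generic linear problem with a symmetric, coercive bilinear form. The symbols a, \ell, \Pi and V are my own notation for this illustration and are not taken from the demo. The solution minimises the potential energy
\Pi(\vec{u}) = \frac{1}{2} a(\vec{u}, \vec{u}) - \ell(\vec{u}), \qquad \vec{u} = \arg\min_{\vec{w} \in V} \Pi(\vec{w}).
Perturbing \vec{u} in the direction of a test function \vec{v} and requiring the derivative to vanish gives back the weak formulation,
\left. \frac{\mathrm{d}}{\mathrm{d}\epsilon} \Pi(\vec{u} + \epsilon \vec{v}) \right|_{\epsilon = 0} = a(\vec{u}, \vec{v}) - \ell(\vec{v}) = 0 \quad \forall \vec{v} \in V,
and the energy norm is the norm induced by the same bilinear form,
\| \vec{v} \|_E = \sqrt{a(\vec{v}, \vec{v})}.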