Hi, good question.
I actually measured testpmd performance in a previous part, where it was running on the Bluefield itself (i.e., in Embedded CPU mode).
Overall, the performance was similar to what I measured here: with 64-byte packets, RSS enabled, and as many cores assigned as possible (i.e., the maximum of 3), the throughput was around 16 Gbps.
Here, I used Separated mode to avoid one more layer of abstraction and to investigate the scaling without hardware offload. Using the host CPU cores should be better than using the ARM cores on the Bluefield.
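To put the ~16 Gbps figure in terms of packet rate, here is a rough back-of-the-envelope sketch. It is not something I measured: the function name is made up for illustration, and whether the 16 Gbps figure counts the 20 bytes of per-frame Ethernet overhead on the wire (preamble, start-of-frame delimiter, inter-frame gap) is an assumption, so both interpretations are shown.

```python
# Rough conversion from throughput to packet rate for fixed-size frames.
# Assumption: 20 bytes of per-frame wire overhead
# (7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(throughput_bps: float, frame_bytes: int,
                       count_wire_overhead: bool) -> float:
    """Return the packet rate implied by a throughput and a frame size."""
    bytes_per_packet = frame_bytes + (WIRE_OVERHEAD_BYTES if count_wire_overhead else 0)
    return throughput_bps / (bytes_per_packet * 8)


throughput = 16e9  # ~16 Gbps, the approximate figure quoted above
pps_wire = packets_per_second(throughput, 64, count_wire_overhead=True)
pps_l2 = packets_per_second(throughput, 64, count_wire_overhead=False)

print(f"counting wire overhead: {pps_wire / 1e6:.2f} Mpps")
print(f"frame bytes only:       {pps_l2 / 1e6:.2f} Mpps")
```

Either way, this is in the range of a few tens of millions of packets per second, which is why small-packet tests stress the CPU cores so much more than large-packet ones at the same bit rate.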
Regarding your first question:
OvS, when offloaded, gives the best performance, then comes testpmd, and then OvS-DPDK, which is somewhat expected. When something is offloaded to an ASIC, a CPU cannot compete with it. When it comes to the CPU, a huge portion of the performance depends on the complexity of the software, and in this case testpmd is much lighter than OvS.
Regarding your second question: I don't think anyone would run testpmd in any production environment, as it is just a dumb app for testing. On the other hand, running OvS in Embedded mode, i.e., on the Bluefield, and offloaded to the hardware will give you the best performance.
I know it's a long response, but there is something to note: it might be possible that, with beefy Xeon cores, running OvS-DPDK on the host (in Separated mode) can achieve better overall performance than OvS offloaded to the Bluefield (in Embedded mode). However, the aim of SmartNICs is to free up resources on the host without significantly compromising performance (and sometimes even improving it). So, it will always be a trade-off, and you have to optimize for your own needs.
Thanks for the question, and let me know if you have more :)