Multi-Cores

When More Cores Means Less Speed: Debugging PyTorch with Valgrind on ARM

If you’ve ever tried to debug a PyTorch program on an ARM64 system using Valgrind, you might have stumbled on something really odd and wondered: “Why does it take so long?” And if you’re like us, you would probably try to run it locally, on a Raspberry Pi, to see what’s going on… And the madness begins!