
If you’ve ever tried to debug a PyTorch program on an ARM64 system using
Valgrind, you might have found yourself
wondering: “Why does it take so long?” And if you’re like us, you would probably try
to run it locally, on a Raspberry Pi, to see what’s going on… and that is where the madness
begins!
That’s exactly what happened to us while setting up a BERT-based hate speech classifier.
This was part of a broader experiment using
vAccel, our hardware acceleration
abstraction for AI inference across the Cloud-Edge-IoT continuum.

Kubernetes is the de facto platform for automating the deployment, scaling, and management of containerized applications. One of its strongest features is its extensibility, which allows users to customize the system according to their needs. A key mechanism for this extensibility is Custom Resource Definitions (CRDs).