1 writing found
Meta's RCCLX brings Direct Data Access and low-precision collectives to AMD GPUs, delivering 10-50% speedups for LLM inference on MI300X hardware.