distributed-systems.
4 writings found
Latest Archives
Meta's Adaptive Ranking Model: The Real Cost of Serving Trillion-Parameter Ads
Meta scaled ads recommendations to LLM complexity while keeping latency under a second. Here's why their inference trilemma solution matters beyond advertising.
Meta's RCCLX: Why AMD's GPU Communication Stack Just Got Interesting
Meta open-sources RCCLX with Direct Data Access and FP8 collectives for AMD GPUs. A deep look at what this means for multi-GPU AI workloads.
Meta's RCCLX: Why AMD GPU Communication Just Got Interesting
Meta open-sources RCCLX with Direct Data Access and low-precision collectives, potentially reshaping distributed AI workloads on AMD hardware.
Meta Open Sources RCCLX: AMD Gets Serious Performance Boosts for AI Workloads
Meta's RCCLX brings Direct Data Access and low-precision collectives to AMD GPUs, delivering 10-50% speedups for LLM inference on MI300X hardware.