"Cloud" is a boring answer. User base of interest is somewhere between hobbyists with a budget and companies with a couple of self-hosted racks.
News:
* people eyeballing Mac Studio Thunderbolt clusters on X
* yet more progress on clustering in llama.cpp
My bet is local inference on Apple CPU/GPU (by whatever name it ends up being called).
And since this will still be expensive, the rest will run in a datacenter on server-class GPUs/inference chips (not sure what those will look like yet).
* Apple will find a way to compress/store weights in firmware/flash such that you can work with, say, 64 GB of RAM.
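
Rough napkin math on why that bullet hinges on compression plus streaming from flash rather than quantization alone (my own numbers, weights only, ignoring KV cache and activations):

```python
# Napkin math: can 405B weights sit resident in 64 GB of unified memory?
# Assumptions (mine, not from the thread): weights only, no KV cache
# or activations, decimal GB.
PARAMS = 405e9
RAM_GB = 64

for bits in (16, 8, 4, 2):
    gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if gb <= RAM_GB else "needs streaming/offload"
    print(f"{bits:>2}-bit: ~{gb:.0f} GB -> {verdict}")
```

Even 2-bit lands around ~100 GB, so the bet only works if most of the weights live in flash and get paged in on demand.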
@MingweiSamuel Whatever looks Pareto-dominant based on vibes from Twitter, /g/, and r/LocalLLaMa. For example, the current 70B meta looks like multiple gaming GPUs or Apple unified memory, with the very occasional DIY-adaptered A100 frankenrack.
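
Same napkin math for the 70B meta (my numbers, weights only), which is roughly why the answer today is gaming GPUs or unified memory:

```python
# Weights-only footprint for a 70B model vs. common local boxes.
# Assumptions (mine): decimal GB, ignores KV cache and activations.
PARAMS = 70e9
BOXES = {
    "2x 24 GB gaming GPUs": 48,
    "64 GB Mac unified memory": 64,
    "128 GB Mac unified memory": 128,
    "1x 80 GB A100": 80,
}

for bits in (4, 8, 16):
    need = PARAMS * bits / 8 / 1e9
    fits = [name for name, cap in BOXES.items() if need <= cap]
    print(f"{bits}-bit: ~{need:.0f} GB -> fits: {', '.join(fits) or 'none of the above'}")
```

At 4-bit everything on the list works; at 8-bit you already need big unified memory or datacenter cards, which matches the observed meta.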
If the community doesn't settle on viable 405B solutions by EoY, everything gets a NO.