Will an LLM trained with FP4 have competitive performance in 2 years' time?
13
Ṁ1381
2025
22% chance

"Currently, the technology for 4-bit training does not exists, but research looks promising and I expect the first high performance FP4 Large Language Model (LLM) with competitive predictive performance to be trained in 1-2 years time." (see: https://timdettmers.com/2023/01/16/which-gpu-for-deep-learning/)

Granted, the model must be open source for us to know, so the market will resolve based on publicly available information.

predicts NO

Exclusively in FP4? Or does partially in FP4 count? What if the model is on average 60% FP4 over the course of training?

I guess you covered this with "trained in 4-bit (to some extent)"

predicts NO

@NoaNabeshima This is about post-training precision adjustments

Competitive with what? SOTA with fp16?

predicts NO

This seems important @typedfemale
Will this resolve YES if scaling laws suggest a 4-bit model would be competitive if compute-matched to a SOTA 16-bit model?

predicts NO

(but there isn't a trained SOTA 4-bit model)

@NoaNabeshima Yes, you need to be better than everything else, but be trained in 4-bit (to some extent)

@typedfemale Would finetuning with 4-bit trigger Yes? Would 80% of parameters in 4-bit trigger Yes?