Multi-modal with any input other than plain UTF-8 text, regardless of output modes. Model must also be 405B in size, not 400B.
3.2 dropped. "The two largest models of the Llama 3.2 collection, 11B and 90B,". Seems likely that if they release a larger multimodal model, it'll be in line with these and larger than the text-only model.
Description requires "Model must also be 405B in size not 400B".
@Kearm20 how would you resolve a >405B model? No, because it's not exactly 405B?
@Lun Well, what is said is said, I suppose. I didn't anticipate a 100-layer, multimodal, Llama-3-class model with 90B parameters when the question was created. The 400B provision was based on early leaks, but in the spirit of the original question it would have to be a ~405B Llama-3-class model (parameter counts are rarely exact whole numbers) to resolve YES before the end of 2024.
Full transparency: I even have a YES position on this, but the model simply isn't multimodal... yet. Hence the wait. @Bayesian
I do, as the weights are on Hugging Face in .safetensors format, and the paper is out with all the technical details about how this model came into being. Inference code was provided as an example as well. Sure, it was "gated," but reuploads are not being taken down, and since we consider MIT or Apache 2.0 software to be "open source," this is about as open as a model can be, period, with an even less restrictive license this time around.
@Daniel_MC Considering they had no issue with me jailbreaking their model at the Meta Hackathon this weekend, exceedingly so. We also got information about the model, so I think this is a really strong bet.