https://arxiv.org/abs/2404.19756
https://github.com/KindXiaoming/pykan?tab=readme-ov-file
https://news.ycombinator.com/item?id=40219205
This sounds big to me.
I'm not sure exactly how to set the criteria. Perhaps based on https://paperswithcode.com/sota?
Someone who knows more about ML benchmarks please chime in.
These KANs are "just" rebranded symbolic regressors, with nonparametric learned functions taking the place of a bag of fixed functions like "exp", "cos", and so on. The paper is a master class in doing this well, and it is super fun to read. So the results are not super surprising: we get very high expressive power and good interpretability, but also the usual pitfall that this is much more complex than a bunch of matmuls, i.e., slow.
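To make the "rebranded symbolic regressor" framing concrete, here's a toy sketch (my own illustration, not pykan's code): where an MLP layer applies one fixed nonlinearity to a linear map, a KAN-style layer puts a separate univariate function on every edge and just sums at the node. In the demo the edge functions are hand-picked formulas standing in for what would really be learnable splines.

```python
import numpy as np

def mlp_layer(x, W):
    # Standard MLP layer: linear map, then one fixed nonlinearity (ReLU).
    return np.maximum(0.0, W @ x)

def kan_layer(x, phis):
    # KAN-style layer: each edge (i, j) carries its own univariate function
    # phis[i][j]; the node simply sums the transformed inputs.
    return np.array([sum(phi(xj) for phi, xj in zip(row, x)) for row in phis])

# Toy demo: 2 inputs -> 1 output with hand-picked "symbolic" edge functions.
# In a real KAN these would be learnable splines, not np.sin / np.log.
x = np.array([0.5, 2.0])
phis = [[np.sin, np.log]]          # output_0 = sin(x_0) + log(x_1)
print(kan_layer(x, phis))
```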
They approximate the functions with B-splines, which are built from piecewise polynomial bases defined by a recursion of non-linear combinations. Stable evaluation algorithms like De Boor's run in O(k^2), where k is the degree of the spline. So the action of just one neuron already hides a lot of complexity, and you pay that cost at both inference and training time. It is not clear that this can scale easily.
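To see where the cost lives, here's a textbook version of De Boor's recursion (a generic sketch, not lifted from pykan): the nested loop is O(k^2) in the degree, and it has to run for every edge, for every sample, in both the forward and backward pass.

```python
def de_boor(k, x, t, c, p):
    # Evaluate a degree-p B-spline with knot vector t and control points c
    # at x, where k is the knot span index with t[k] <= x < t[k+1].
    d = [c[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):            # p passes...
        for j in range(p, r - 1, -1):    # ...each with O(p) work => O(p^2)
            alpha = (x - t[j + k - p]) / (t[j + 1 + k - r] - t[j + k - p])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# Tiny demo: a clamped cubic spline (p = 3) evaluated at x = 1.5,
# which lies in the knot span with index k = 4 (t[4] = 1 <= 1.5 < t[5] = 2).
t = [0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4]
c = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
print(de_boor(k=4, x=1.5, t=t, c=c, p=3))
```

Compare that with the single multiply-add an ordinary weight costs in an MLP.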
On the bright side, there's lots of work on using splines for computer graphics, so perhaps something can be adapted relatively easily. Or maybe a huge lookup table might do the trick.
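The lookup-table idea is easy to prototype once training is done: freeze each learned activation, tabulate it on a dense grid, and answer inference-time queries with linear interpolation. A rough sketch (function names are mine, purely illustrative):

```python
import numpy as np

def tabulate(phi, lo, hi, n=1024):
    # Sample a frozen activation phi once on a dense grid after training.
    xs = np.linspace(lo, hi, n)
    return xs, phi(xs)

def lookup(xq, xs, ys):
    # Cheap inference: clamp to the table's range and linearly interpolate.
    return np.interp(np.clip(xq, xs[0], xs[-1]), xs, ys)

# Demo with np.tanh standing in for a trained spline activation.
xs, ys = tabulate(np.tanh, -3.0, 3.0)
q = np.array([-0.5, 0.0, 2.2])
print(lookup(q, xs, ys))   # close to np.tanh(q), without the spline recursion
```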
Alternatively, someone could swap out splines for something better behaved, like piecewise linear approximations.
PS: I created a market operationalizing ubiquity/SOTA differently, as significant adoption at NeurIPS by 2027 https://manifold.markets/jgyou/will-kolmogorovarnold-networks-kan
Case in point: here's someone pointing out how a piecewise linear approximation gives you an MLP back, and thus good scaling: https://twitter.com/bozavlado/status/1787376558484709691
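The tweet's point can be spelled out: a continuous piecewise-linear function is exactly a constant plus a linear term plus a weighted sum of ReLUs (one per interior knot), so a KAN whose edge functions are piecewise linear collapses into a reparametrized two-layer MLP. A small check of that identity (my sketch, not the tweet author's code):

```python
import numpy as np

def pwl_as_relus(knots, values):
    # The continuous piecewise-linear interpolant of (knots, values) equals
    #   f(x) = c + m0 * x + sum_i a_i * relu(x - knots[i])   (interior knots i)
    # where each a_i is the jump in slope at knot i.
    slopes = np.diff(values) / np.diff(knots)
    m0 = slopes[0]                       # slope of the first segment
    a = np.diff(slopes)                  # slope changes at interior knots
    c = values[0] - m0 * knots[0]        # intercept of the first segment
    return c, m0, a, knots[1:-1]

def eval_relu_form(x, c, m0, a, kinks):
    return c + m0 * x + sum(ai * np.maximum(0.0, x - ki) for ai, ki in zip(a, kinks))

knots  = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
values = np.array([ 1.0,  0.5, 0.0, 2.0, 1.0])
c, m0, a, kinks = pwl_as_relus(knots, values)
xq = np.linspace(-2.0, 2.0, 9)
print(eval_relu_form(xq, c, m0, a, kinks))
print(np.interp(xq, knots, values))      # identical inside [-2, 2]
```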
My first impression is: "the Universal Approximation Theorem means the fancy learnable function can be replaced by a bunch of simple non-linearity nodes (e.g. ReLU) to give the same result, possibly more efficiently in real-life applications once you account for the additional complexity of the spline thingy". But I may be missing something.
From the author: "although we have to be honest that KANs are slower to train due to their learnable activation functions"
https://twitter.com/ZimingLiu11/status/1785489312563487072?t=oDCxKpId2MY3pfe7O84NBA&s=19