Would you be willing to speculate a bit on their first proposal to use Josephson-junction fluxonium qubits to implement p-bits? They seem to have dropped this idea for now, but it would have been great to find a better use for superconducting circuits than NISQ quantum computers.
IMO if you're going to use superconducting electronics, it's probably easier to just use adiabatic flux parametrons (AFPs), which are quasi-classical. There's a great deal of work out there on superconducting digital logic architectures that would probably be easier to bend to these purposes than qubits, which you really don't want to use unless you expect to benefit from their coherence properties.
"Instead, their GPU estimates measure the power consumption of very small models on large, powerful hardware; this likely results in extremely poor utilization of the GPU and a lot of wasted energy"
This is false. The GPU estimates were based on a theoretical model that underestimates true energy consumption (it basically assumes ideal utilization). This is made extremely clear in appendix E of our paper: https://arxiv.org/pdf/2510.23972
> The empirical estimates of energy were conducted by drawing a batch of samples from the model and measuring the GPU energy consumption and time via Zeus [5].
This will result in an overestimate, for the reasons above -- which is actually in line with the data you showed comparing empirical to theoretical efficiency!
Similarly, dividing the total FLOPs of the algorithm by the GPU's FLOPS/W rating will result in a (smaller) overestimate, because all of the high-power, heavy-duty I/O circuits, support circuits, and shared control circuits get counted in the total wattage even though they wouldn't be necessary for running small models.
Unfortunately, it's just hard to get "fair" GPU baselines for very small models.
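To make the arithmetic concrete, here is a minimal sketch of the FLOPs-based energy estimate described above. All numbers are illustrative assumptions (a hypothetical H100-class accelerator and a made-up overhead fraction), not figures from the paper.

```python
# Illustrative sketch of a theoretical GPU energy estimate.
# All numbers are assumptions for illustration, not measurements.

def theoretical_energy_j(algorithm_flops: float,
                         peak_flops_per_s: float,
                         watts: float) -> float:
    """Energy estimate that charges the given power at ideal utilization:
    energy = FLOPs / (FLOPS per watt) = FLOPs * W / FLOPS."""
    flops_per_joule = peak_flops_per_s / watts
    return algorithm_flops / flops_per_joule

# Hypothetical accelerator: 1e15 FLOP/s peak at 700 W (roughly H100-class).
PEAK, TDP = 1.0e15, 700.0

small_model_flops = 1e9  # a tiny sampling workload

# Full-chip estimate: charges the entire TDP, including I/O, memory,
# and shared control circuits, to the workload.
e_full_chip = theoretical_energy_j(small_model_flops, PEAK, TDP)

# Suppose (assumption) only 60% of TDP is compute; a dedicated
# small-model chip could omit much of the rest, so a compute-only
# estimate is proportionally smaller.
e_compute_only = theoretical_energy_j(small_model_flops, PEAK, 0.6 * TDP)

print(f"full-chip estimate:    {e_full_chip * 1e6:.1f} uJ")
print(f"compute-only estimate: {e_compute_only * 1e6:.1f} uJ")
```

The gap between the two numbers is the sense in which even the "ideal utilization" full-chip figure is an overestimate of what small-model-specific hardware would need.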
"These architectures use digital pseudorandom number generators (PRNGs), which offer an efficient way to generate numbers with sufficient randomness"
By what definition of "efficient"?
https://youtu.be/dRuhl6MLC78?si=-m_wWvY95RWjVLub&t=2359
Actually, I make most of your points in this talk. It might be worth a watch for anyone who found this post interesting.