Let's consider a dice with 1 or -1 on each of its faces, opposite faces adding up to 0.
The point of this post is to relate the quantum coherence of a single qubit to something "weird", i.e., incapable of being reproduced by classical probability.
All the faces of the dice are covered, and we are repeatedly going to uncover and read the top face, and then cover it again. A $\sigma_z$ measurement does simply this, while a $\sigma_x$ measurement turns the dice 90 degrees to the right, uncovers the top face, reads it, covers it again, and finally rotates the dice back. Assume, however, that a device (maybe yourself) MUST spin the dice about its vertical axis, keeping that axis fixed, right after covering the face, much like spinning a top. Note that this protocol tries to resemble the incompatibility of the two measurements in QM.
The state preparation is what we do to the dice before giving it to someone else.
By simply rolling the dice, a $\sigma_z$ or a $\sigma_x$ measurement will give -1 or 1 with probability 1/2 each, as for the quantum state {{1/2,0},{0,1/2}}.
By rolling the dice and then making sure that the left face has a "1", you will get the same output as if I gave you the quantum state {{1/2,1/2},{1/2,1/2}} for both $\sigma_z$ and $\sigma_x$ measurements!
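For reference, the quantum side of this comparison can be checked numerically. The following is a minimal sketch, assuming NumPy and the standard Born rule $p(s) = \mathrm{Tr}[\rho\,(I + s\,\sigma)/2]$ for a Pauli observable $\sigma$ and outcome $s = \pm 1$; the names `outcome_probabilities`, `rho_mixed` and `rho_plus` are illustrative and not from the post.

```python
import numpy as np

# Pauli observables written in the sigma_z eigenbasis.
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)


def outcome_probabilities(rho, observable):
    """Born-rule probabilities of the outcomes +1 and -1 for a Pauli observable.

    For a Pauli matrix P, the projector onto eigenvalue s is (I + s P) / 2,
    so p(s) = Tr[rho (I + s P) / 2].
    """
    identity = np.eye(2, dtype=complex)
    probabilities = {}
    for s in (+1, -1):
        projector = (identity + s * observable) / 2
        probabilities[s] = float(np.real(np.trace(rho @ projector)))
    return probabilities


# The two density matrices quoted in the question.
rho_mixed = np.array([[0.5, 0.0], [0.0, 0.5]], dtype=complex)  # "just roll the dice"
rho_plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)   # "left face shows 1"

for name, rho in (("mixed", rho_mixed), ("plus", rho_plus)):
    for label, obs in (("sigma_z", SIGMA_Z), ("sigma_x", SIGMA_X)):
        print(name, label, outcome_probabilities(rho, obs))
```

Both states give probability 1/2 for each $\sigma_z$ outcome; only the $\sigma_x$ statistics tell them apart, which is the behaviour the dice preparation is meant to imitate.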
After this analysis, I see that it is possible to reproduce the quantum superposition in Q information by classical means, which then makes it not "weird" at all. The only "special" feature I see is that nature itself automatically does the covering of the faces and the spinning at the measurements for you, but this is not my point, so...
...without going into non-locality, Bell inequalities and so on (which I understand are truly quantum and needed for Q cryptography), is there anything "weird" in the coherence alone of Q states in Q information that I can use to say "this is not classical" after looking at the measurement outcomes (which is all I can look at)? Am I missing something? If not, then I don't understand how a single qubit can be any better than a single bit, or in what sense a quantum superposition state can be more powerful than a single dice (maybe rigged, and in a state not prepared by you).
What you are describing is a hidden-variable theory of quantum mechanics, and I believe the answer is ultimately no: there is nothing fundamentally non-classical in a single qubit. You can construct hidden-variable theories of quantum mechanics as long as they are non-local, and since there is no notion of locality for a single qubit, that last caveat should not be a concern.
Bell wrote a paper in 1966 on this topic:
"The question at issue is whether the quantum mechanical states can be regarded as ensembles of states further specified by additional variables, such that given values of these variables together with the state vector determine precisely the results of individual measurements."
He concluded that indeed such models can be constructed, but that they have a non-local nature:
"The curious feature is that the trajectory equations for the hidden variables have in general a grossly nonlocal character."
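To make the idea concrete for one qubit, here is a toy sketch of such a model. It is an assumption-laden illustration, not Bell's own construction: the state is described by its Bloch vector `n`, the measurement by a unit vector `m`, and the hidden variable `lam` is taken to be uniform on [0, 1). Given `n`, `m` and `lam`, the outcome is fully determined, and averaging over `lam` reproduces the quantum probability $(1 + \vec n \cdot \vec m)/2$. All function names are hypothetical.

```python
import numpy as np


def born_probability_plus(n, m):
    """Quantum probability of outcome +1 for Bloch vector n measured along unit vector m."""
    return (1.0 + float(np.dot(n, m))) / 2.0


def hidden_variable_outcome(n, m, lam):
    """Deterministic outcome (+1 or -1) fixed by the state n, the setting m and lam in [0, 1)."""
    return 1 if lam < born_probability_plus(n, m) else -1


def ensemble_frequency_plus(n, m, samples=100_000):
    """Fraction of +1 outcomes over an ensemble of hidden variables.

    An evenly spaced grid on [0, 1) stands in for uniformly distributed lam,
    so the result is reproducible without a random seed.
    """
    lams = (np.arange(samples) + 0.5) / samples
    outcomes = np.array([hidden_variable_outcome(n, m, lam) for lam in lams])
    return float(np.mean(outcomes == 1))


x_axis = np.array([1.0, 0.0, 0.0])
z_axis = np.array([0.0, 0.0, 1.0])

# The state {{1/2,1/2},{1/2,1/2}} has Bloch vector along +x.
print("sigma_z on +x state:", ensemble_frequency_plus(x_axis, z_axis))
print("sigma_x on +x state:", ensemble_frequency_plus(x_axis, x_axis))
```

Each individual run is deterministic once `lam` is known; the apparent randomness comes only from not knowing it. The model says nothing about what happens to `lam` after a measurement, so it covers single measurements on a prepared state only.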
With regard to a qubit being more powerful than a bit, we must be careful about the comparison we are making. A hidden-variable model of a qubit has more states than a classical bit, so in that sense a qubit is already "more" than a bit, although that is not particularly compelling. Additionally, I do not know of any useful quantum computation that can be performed with a single qubit.
The more interesting question is why collections of qubits, which presumably as a group should have some non-local but classical hidden-variable description, are able to perform computations that a classical computer cannot. I do not have a comprehensive answer to that question, but there is a paper by Scott Aaronson which claims that, for any hidden-variable theory, having access to the hidden variables would give you even more computational power than a quantum computer. So it seems, in some sense, that computational power does not come exclusively from non-classicality. We must remember that computers are physical systems, and when you lose non-locality you change the physical rules under which they operate, and thus what they are capable of doing.