Suppose I have an atom in its ground state $|g⟩$, and it has an excited state $|e⟩$ sitting at an energy $E_a=\hbar\omega_0$ above it. To excite the atom, one generally uses a photon of frequency $\omega$ equal (or sufficiently close to) the transition frequency $\omega_0$, and this will stimulate a transition.
One thing that is often left by the roadside,* though, is the fact that the incoming photon has momentum as well as energy, and that if the atom wants to swallow the energy it also needs to swallow the momentum. So, in the nuts and bolts of states and operators,
how does one describe the transfer of momentum during an atomic transition?
In addition to this, the fact that this recoil momentum is rarely mentioned is a good indication that it is also rarely an issue. Why is it that in most circumstances we can safely ignore the photon's momentum when describing electronic transitions?
*Apart from treatments of Doppler cooling, which simply take the momentum transfers for granted and do not explain how and why they happen.
Introduction
The transfer of momentum gets included properly when one incorporates the motion centre-of-mass $\mathbf R$ of the atom as a dynamical variable. Performing the dipole approximation allows one to treat all the electrons as interacting with some field at the centre of the atom, $\mathbf F(\mathbf R,t)$, but now $\mathbf R$ is an operator on the centre-of-mass degrees of freedom, which means that transition probabilities need to take this into account.
In hand-waving terms, the interaction hamiltonian can be rephrased as $$ \hat H_\mathrm{int}=\mathbf d\cdot\mathbf F(\mathbf R,t), $$ where $\mathbf d$ is some dipole operator which acts on the internal, electronic degrees of freedom, and $\mathbf F(\mathbf R,t)$ is a field operator which depends on $\mathbf R$. Transition probabilities must be taken between an initial state $|\Psi_i⟩=|\chi_i⟩|\psi_i⟩$ which is a joint state of the internal degrees of freedom in state $|\psi_i⟩$ and the centre of mass motion in state $|\chi_i⟩$, and an analogous final state. The total transition probability then includes a spatial-matching factor $$\left\langle\chi_f|\mathbf F(\mathbf R,t)|\chi_i\right\rangle$$ which controls the momentum transfer. Thus, if both $|\chi_i⟩$ and $|\chi_f⟩$ have definite linear momentum and the field is monochromatic, then the field momentum $\hbar\mathbf k$ needs to match, exactly, the momentum difference between the two, or the transition amplitude will vanish.
I provide, below, a more detailed account of this calculation. References are relatively hard to find because they are drowned in a sea of Doppler-cooling papers and textbooks, but SJ van Enk's Selection rules and centre-of-mass motion of ultracold atoms (Quantum Opt. 6, 445 (1994), eprint) gives a good introduction, which I follow below.
Relevance
Before I get down to some nitty-gritty maths, I want to address why it's generally OK to not do any of what follows. Very few introductory textbooks include any of this, and it is rarely a consideration in day-to-day physics, but it is definitely required by energy and momentum conservation. So what gives?
There are two reasons for this.
The first is that the energy changes involved are really not that big to begin with. Consider, for instance, the Lyman-$\alpha$ line of hydrogen, which has a relatively high frequency (and hence photon momentum) and happens on a light atom, so the effect should be relatively strong. The photon momentum feels like it's significant, at $p=m_\mathrm{H}\times 3.3\:\mathrm{m/s}$, but the velocity change it imparts is tiny with respect to the atomic unit of velocity, $\alpha c=2.18\times 10^{6}\:\mathrm{m/s}$.
More importantly, the kinetic energy for the change is small, at $\tfrac1{2m_\mathrm{H}}p^2=55\:\mathrm{neV}$, so it accounts for a fractional detuning of the order of $5\times 10^{-9}$ with respect to the frequency the transition would have if the atom were fixed. This is doable with precision spectroscopy, but you need all of those nine significant figures in your detection apparatus to be able to detect it.
To add insult to injury, the tiny photon pushes are generally drowned out by the comparatively huge fluctuations in the atom's position from its thermal motion. At room temperature, $k_B T\approx 26\:\mathrm{meV}$, which means that the atom's motion, and its accompanying (uncontrolled) Doppler shift will cause a large Doppler broadening that will completely mask the photon recoil. (For hydrogen at room temperature, the effect is a fractional broadening of the order of $10^{-5}$, so the line is still looks narrow, but it's on the order of $30\:\mathrm{GHz}$, compared to the $530\:\mathrm{MHz}$ shift from the photon recoil.)
This is not a problem, though, if you can cool your atoms to a proper temperature. If you can get down to temperatures of the order of $p^2/2mk_B\approx0.64\:\mathrm{mK}$, then the effects will be clearly measurable. Indeed, typically you use the photon recoil to help you cool using Doppler cooling to get there (though that's typically not enough, and you need additional steps of sub-Doppler cooling such as Sisyphus or sideband cooling to finish the job).
On the other hand, all of these challenges have been overcome and observing photon recoil has been more or less routinely possible for forty years or so. Modern high-precision spectroscopy techniques can reach well past 15 or 16 significant figures, and photon recoil is an integral part of the theory and the experimental toolkit.
Nuts and bolts
Consider a bunch of particles of charge $q_i$ and mass $m_i$ at positions $\mathbf r_i$, which are exposed to a radiation field described by the vector potential $\mathbf A(\mathbf r,t)$ in the radiation gauge (so $\nabla\cdot\mathbf A(\mathbf r,t)=0$), and subject to a (translation-invariant) potential $\hat V=V(\mathbf r_0,\ldots,\mathbf r_N)$. The full hamiltonian for the system is given by \begin{align} \hat H &= \sum_i \frac1{2m_i}\left(\mathbf p_i-q_i\mathbf A(\mathbf r_i,t)\right)^2+\hat V \\&= \sum_i\left[\frac{\mathbf p_i^2}{2m_i}-\frac{q_i}{m_i}\mathbf p_i\cdot\mathbf A(\mathbf r_i,t)+\frac{\mathbf A(\mathbf r_i,t)^2}{2m_i}\right]+\hat V \\&= \sum_i\frac{\mathbf p_i^2}{2m_i}+\hat V-\sum_i\frac{q_i}{m_i}\mathbf p_i\cdot\mathbf A(\mathbf r_i,t) +\sum_i\frac{\mathbf A(\mathbf r_i,t)^2}{2m_i}. \end{align} The quadratic term $\sum_i\frac{\mathbf A(\mathbf r_i,t)^2}{2m_i}$ is known as the diamagnetic term and it is generally safe to ignore because it can be eliminated with a trivial gauge transformation within the dipole approximation. (Outside it, you do need to worry about it.)
The main interaction hamiltonian is then $$ \hat H_\mathrm{int}=-\sum_i\frac{q_i}{m_i}\mathbf p_i\cdot\mathbf A(\mathbf r_i,t). $$ (In most cases, this 'velocity gauge' interaction hamiltonian of the form $\mathbf p\cdot\mathbf A$ can be rephrased, via a gauge transformation, to a more familiar $\mathbf r\cdot\mathbf E$-style interaction in the length gauge. However, this isn't really necessary here so I'll stick with the velocity gauge.)
Coordinate transformations
To expose the role of the centre of mass, we transform to the variables $$ \mathbf R=\sum_{i=0}^N\frac{m_i}{M}\mathbf r_i \quad\text{and}\quad \newcommand{\rro}{\boldsymbol{\rho}} \rro_i=\mathbf r_i-\mathbf r_0 \quad\text{for }i=1,\ldots, N $$ with $M=\sum_im_i$, and where the position of the zeroth particle (i.e. the nucleus) drops out as a dynamical variable. The momenta transform as $$ \mathbf P=\sum_{i=0}^Np_i \quad\text{and}\quad \newcommand{\ppi}{\boldsymbol{\pi}} \ppi_i=\mathbf p_i-\frac{m_i}{M}\sum_{j=0}^N\mathbf p_j $$ and the inverse relations read \begin{align} \mathbf r_0&=\mathbf R-\sum_{j=1}^N\frac{m_j\rro_j}{M} & & \mathbf r_i=\mathbf R+\rro_i-\sum_{j=1}^N\frac{m_j\rro_j}{M} \\ \mathbf p_0&=\frac{m_0}{M}\mathbf P-\sum_{j=1}^N\ppi_j & & \mathbf p_i=\frac{m_i}{M}\mathbf P+\ppi_i .\end{align}
The vector potential, finally, can simply be approximated at the centre of mass, so $$\mathbf A(\mathbf r_0,t)\approx\mathbf A(\mathbf r_i,t)\approx\mathbf A(\mathbf R,t).$$ The interaction hamiltonian, then, reads \begin{align} \hat H_\mathrm{int} &= -\frac{q_0}{m_0}\mathbf p_0\cdot\mathbf A(\mathbf r_0,t) -\sum_{i>0}\frac{q_i}{m_i}\mathbf p_i\cdot\mathbf A(\mathbf r_i,t) \\&= -\frac{q_0}{m_0}\left( \frac{m_0}{M}\mathbf P-\sum_{i>0}\ppi_i \right)\cdot\mathbf A(\mathbf R,t) -\sum_{i>0}\frac{q_i}{m_i}\left( \frac{m_i}{M}\mathbf P+\ppi_i \right)\cdot\mathbf A(\mathbf R,t) \\&= \sum_{i>0} \left(\frac{q_0}{m_0}-\frac{q_i}{m_i}\right)\ppi_i\cdot\mathbf A(\mathbf R,t) \end{align} for a neutral system.
Transition amplitudes
This is really all that one needs. The transition probability from an initial state $|\Psi_i⟩$ to a possible final state $|\Psi_f⟩$ can be simply read as $$ ⟨\Psi_f|\hat H_\mathrm{int}|\Psi_i⟩, $$ with some more subtleties if one wants to be rigorous with the time evolution, and derive e.g. Fermi's golden rule.
If the centre of mass is being held fixed in space, then all that matters is the atomic dipole moment, which for this interaction hamiltonian reads $$ \sum_{i>0}\left(\frac{q_0}{m_0}-\frac{q_i}{m_i}\right)⟨\psi_f|\ppi_i|\psi_i⟩, $$ taken between internal states $|\psi_i⟩$ and $|\psi_f⟩$; this is then dotted with the fixed vector potential $\mathbf A(\mathbf R,t)$ to give the transition rate.
For a dynamical centre of mass, though, which starts off in the state $|\chi_i⟩$ and which we're probing for at the state $|\chi_f⟩$, the full transition probability reads $$ \sum_{i>0}\left(\frac{q_0}{m_0}-\frac{q_i}{m_i}\right)⟨\psi_f|\ppi_i|\psi_i⟩ \cdot ⟨\chi_f|\mathbf A(\mathbf R,t)|\chi_i⟩. $$
Here the matrix element $⟨\chi_f|\mathbf A(\mathbf R,t)|\chi_i⟩$ directly controls the absorption of one quantum of momentum into the centre-of-mass state. To get full momentum conservation, you should really consider an example with a monochromatic field, $$\mathbf A(\mathbf R,t)=\mathbf A_0\cos(\mathbf k\cdot\mathbf R-\omega t),$$ so the field gives a well-defined momentum contribution, and with initial and final states that have definite momenta $\mathbf k_i$ and $\mathbf k_f$ respectively - i.e. plane waves with those wavevectors. The matrix element then reads \begin{align} ⟨\chi_f|\mathbf A(\mathbf R,t)|\chi_i⟩ &= \mathbf A_0 \int\frac{\mathrm d\mathbf R}{(2\pi\hbar)^3} e^{i(\mathbf k_i-\mathbf k_f)\cdot\mathbf R/\hbar}\cos(\mathbf k\cdot\mathbf R-\omega t) \\&= \frac12\mathbf A_0\left( \delta(\mathbf k_i-\mathbf k_f+\mathbf k)e^{-i\omega t} + \delta(\mathbf k_i-\mathbf k_f-\mathbf k)e^{+i\omega t} \right). \end{align} In a quantized-field picture, the first, positive-frequency term becomes an annihilation operator which subtracts one photon from the field and adds $\hbar\mathbf k$ momentum to the centre-of-mass motion, and the second term becomes a creation operator which emits one photon while eliminating $\hbar\mathbf k$ momentum from the atom's motion. If you're using a classical field with quantized matter, the rotating-wave approximation will typically require you to keep only the first term for absorption and only the second term for emission, with the corresponding effects on the centre-of-mass momentum.
Energy
Finally, what about the kinetic energy? Naively, the photon energy should ideally be slightly higher than the transition energy to account for the increase in the centre-of-mass kinetic energy (this forgets that the laser can also slow the atom down if it's flying into the laser and the laser is redshifted, but it's all the same, really). How does one account for this?
In fact, you'll notice that I haven't spoken at all about energy considerations, and I certainly haven't imposed any relation between the initial and final internal states and the atomic hamiltonian. As it turns out, the external motion gets treated in exactly the same way.
At the start, I split up the hamiltonian into an atomic and an interaction part: $$ \hat H = \sum_i\frac{\mathbf p_i^2}{2m_i}+V(\mathbf r_0,\ldots,\mathbf r_N)-\sum_i\frac{q_i}{m_i}\mathbf p_i\cdot\mathbf A(\mathbf r_i,t) =\hat H_\mathrm{at}+\hat H_\mathrm{int} $$ (For a quantized field, you'd also need to include a field hamiltonian, of course.) Now the atomic hamiltonian as stated is a function of the individual coordinates, but ideally we want to rephrase it in terms of the internal plus centre-of-mass coordinates. This then gives $$ \hat H_\mathrm{at} =\frac{\mathbf P^2}{2M} +\left[ \sum_{i>0}\frac{\ppi_i^2}{2\mu_i}+\sum_{i\neq j>0}\frac{\ppi_i\cdot\ppi_j}{2m_0} + V(\mathbf 0,\rro_1,\ldots ,\rro_N) \right] =\hat H_\mathrm{COM}+\hat H_\mathrm{el}. $$ The kinetic energy of the centre of mass is directly accounted for, and the internal hamiltonian $\hat H_\mathrm{el}$ is what we actually diagonalize when we find the electronic eigenstates. (Here $\mu_i=(m_i^{-1}+m_0^{-1})^{-1}$ is the $i$th reduced mass, and the cross kinetic terms are generally suppressed by the large nuclear mass $m_0$.)
More importantly, though, if we want to say that the system went from a state of definite energy to another state of definite energy by absorbing a photon, then it needs to go from one eigenstate to another of the full atomic hamiltonian $\hat H_\mathrm{at}$, and this includes the centre-of-mass degree of freedom. The photon energy then needs to account for the change in energy in the whole thing, not just the electronic transition.