I assume that entanglement emerges from quantum mechanics because the idea was around before experimental verification (e.g. the EPR paper). How, then, does entanglement emerge from the theory? (Please provide a less technical answer if possible.)
Entanglement is simply a particular kind of quantum multiparticle state. It happens to be the "most common" kind of state, in the sense that if you choose a random quantum superposition from a multiparticle state space, it will almost surely (in the measure-theoretic sense) be entangled. It is therefore a little curious that entanglement takes some effort to observe in the laboratory.
Here are the technical details, through a simple example. We think of several quantum "particles", each with a three-dimensional quantum state space: let's take two of them. Let's number each particle's basis states $1,\,2,\,3$, so a general superposition for one particle is the vector $\alpha\left.\left|1\right.\right>+\beta\left.\left|2\right.\right>+\gamma\left.\left|3\right.\right>$.
The quantum state space of the combined system has nine, not six, basis states. Let $\left.\left|j,\,k\right.\right>$ stand for the basis state where the first particle is in basis state $j$, the second in basis state $k$. You should be able to see that there are nine such basis states: $\left.\left|1,\,1\right.\right>,\,\left.\left|1,\,2\right.\right>,\,\left.\left|1,\,3\right.\right>,\,\left.\left|2,\,1\right.\right>,\,\left.\left|2,\,2\right.\right>,\,\cdots,\,\left.\left|3,\,3\right.\right>$.
Some states are factorisable, that is, they can be written in the form $\psi_1\otimes\psi_2$ where $\psi_1$ and $\psi_2$ are individual particle quantum states. So, let $\psi_1=\alpha\left.\left|1\right.\right>+\beta\left.\left|2\right.\right>+\gamma\left.\left|3\right.\right>$ and $\psi_2 = a\left.\left|1\right.\right>+b\left.\left|2\right.\right>+c\left.\left|3\right.\right>$. Then, on noting that in our notation above we have $\left.\left|j\right.\right>\otimes\left.\left|k\right.\right>\stackrel{def}{=}\left.\left|j,\,k\right.\right>$, we get:
$$\psi_1\otimes\psi_2 = \alpha\,a\,\left.\left|1,\,1\right.\right>+\alpha\,b\,\left.\left|1,\,2\right.\right>+\cdots+\gamma\,b\,\left.\left|3,\,2\right.\right>+\gamma\,c\,\left.\left|3,\,3\right.\right>\tag{1}$$
The point about this state is that if we measure particle 2 and force it into a basis state, say $\left.\left|2\right.\right>$, then the state of particle 1 must be determined by the part of the superposition in (1) that contains only basis vectors of the form $\left.\left|j,2\right.\right>$, because we know particle 2 is in state 2. Those terms sum to $b\,\psi_1\otimes\left.\left|2\right.\right>$, so, after normalisation, the system must be in state $\psi_1\otimes\left.\left|2\right.\right>$, i.e. our knowledge about particle 1 has not changed with our measurement. Our measurement tells us nothing about particle 1, so particle 1 is independent of particle 2.
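To make this concrete, here is a minimal numerical sketch of the product-state case in plain Python. The amplitudes chosen for $\psi_1$ and $\psi_2$ are arbitrary illustrative values (an assumption, not part of the argument), the helper names `normalise`, `tensor` and `particle1_given` are hypothetical, and the basis states are indexed 0, 1, 2 rather than 1, 2, 3.

```python
import math

def normalise(vec):
    """Rescale a list of amplitudes to unit length."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in vec))
    return [x / norm for x in vec]

def tensor(psi1, psi2):
    """Coefficient table c[j][k] of the product state in the |j,k> basis."""
    return [[x * y for y in psi2] for x in psi1]

def particle1_given(c, k):
    """State of particle 1 once particle 2 has been found in basis state k."""
    return normalise([row[k] for row in c])

# Arbitrary illustrative amplitudes (assumed values, not from the text).
psi1 = normalise([1, 2j, -1])   # alpha, beta, gamma
psi2 = normalise([3, 1, 2])     # a, b, c (all nonzero, so every outcome can occur)

c = tensor(psi1, psi2)

# Whatever outcome k we get for particle 2, particle 1 is left in psi1.
unchanged = [
    all(abs(p - q) < 1e-12 for p, q in zip(particle1_given(c, k), psi1))
    for k in range(3)
]
print(unchanged)
```

With these values the comparison should come out true for all three outcomes: picking out the column of coefficients for a given outcome of particle 2 just gives back $\psi_1$ times a constant.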
Now let's choose any old superposition, say:
$$\frac{1}{\sqrt{2}}\left.\left|1,\,1\right.\right>+\frac{1}{\sqrt{2}}\left.\left|2,\,2\right.\right>\tag{2}$$
and now let's measure particle 2. If our measurement forces particle 2 into state $\left.\left|2\right.\right>$, then from (2) we know particle 1 is in state $\left.\left|2\right.\right>$, because the only term in (2) with particle 2 in state 2 has particle 1 in state 2. Likewise, if our measurement forces particle 2 into state $\left.\left|1\right.\right>$, then we know for the same reasons that particle 1 must be in state $\left.\left|1\right.\right>$. Our measurement of particle 2 influences particle 1.
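The same bookkeeping can be sketched for state (2). This is only an illustration in plain Python: the helper names `outcome_probability` and `particle1_given` are hypothetical, and the list index `k` corresponds to basis state `k + 1` in the text.

```python
import math

s = 1 / math.sqrt(2)

# c[j][k] is the amplitude of |j+1, k+1>; only |1,1> and |2,2> are present.
c = [[s, 0, 0],
     [0, s, 0],
     [0, 0, 0]]

def outcome_probability(c, k):
    """Probability of finding particle 2 in basis state k."""
    return sum(abs(row[k]) ** 2 for row in c)

def particle1_given(c, k):
    """State of particle 1 after that outcome, or None if it cannot occur."""
    p = outcome_probability(c, k)
    if p == 0:
        return None
    norm = math.sqrt(p)
    return [row[k] / norm for row in c]

for k in range(3):
    print(k + 1, outcome_probability(c, k), particle1_given(c, k))
```

Each of the first two outcomes occurs with probability of about one half and leaves particle 1 in the matching basis state; the third outcome never occurs for this state, so there is nothing to report for it.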
I would advise you to work through this example yourself in detail. You will then understand the following: measurement of particle 2 influences the state of particle 1 if and only if the initial state of the two particle system is not factorisable in the sense above.
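If you would like to check your working, one way to test factorisability numerically is sketched below. It relies on a standard linear-algebra fact that is not spelt out above: a state with coefficients $c_{jk}$ is factorisable exactly when the matrix $(c_{jk})$ has rank one, i.e. when every $2\times 2$ minor $c_{jk}c_{mn}-c_{jn}c_{mk}$ vanishes. The function name `is_factorisable`, the tolerance and the example amplitudes are assumptions made for illustration.

```python
import math
import random
from itertools import combinations

def is_factorisable(c, tol=1e-12):
    """True if every 2x2 minor of the coefficient table c[j][k] vanishes."""
    for j, m in combinations(range(len(c)), 2):
        for k, n in combinations(range(len(c[0])), 2):
            if abs(c[j][k] * c[m][n] - c[j][n] * c[m][k]) > tol:
                return False
    return True

s = 1 / math.sqrt(2)

# A product state built from two arbitrary single-particle states.
product_state = [[x * y for y in (0.6, 0.0, 0.8)] for x in (s, s * 1j, 0.0)]

# State (2): equal superposition of |1,1> and |2,2>.
state_2 = [[s, 0, 0], [0, s, 0], [0, 0, 0]]

# A randomly chosen state: complex Gaussian amplitudes, then normalised.
rng = random.Random(0)
raw = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
       for _ in range(3)]
norm = math.sqrt(sum(abs(z) ** 2 for row in raw for z in row))
random_state = [[z / norm for z in row] for row in raw]

print(is_factorisable(product_state))
print(is_factorisable(state_2))
print(is_factorisable(random_state))
```

The product state passes the test and state (2) fails it. The randomly drawn state should also fail, which is the "most common" remark at the start of this answer in miniature: a generic choice of nine amplitudes does not make all the minors vanish.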
So you can see that entanglement is a natural theoretical consequence of the tensor product, which in turn is really the only plausible way one would expect many-particle systems to behave. Experiment has reproduced and confirmed this theoretical behaviour.
You are right insofar as entanglement was theoretically foretold and discussed in the EPR paper and also by Schrödinger shortly afterwards. Our word "entanglement" was Schrödinger's own translation of his name for the phenomenon, "Verschränkung".
NOTE: educators who use two two-dimensional spaces to illustrate the tensor product deserve to be cut up into teeny-tiny little bits and be forced to do, in purgatory, the total time that all their students have wasted in dead-end understanding, or at least in some way bad place if you don't believe in purgatory.