David Bohm in Section (4.5) of his wonderful monograph Quantum Theory after defining the usual density probability function $P(x,t)=\psi^{*} \psi$ for the Schrödinger equation for the free particle in one dimension: \begin{equation} i \hbar \frac{\partial \psi}{\partial t}= - \frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2}, \end{equation} states that $P(x,t)$ is the unique function of $\psi(x,t)$ and the partial derivatives of $\psi$ with respect to $x$ all computed in $(x,t)$ which satisfies the following properties:
P is never negative;
the probability is large when $|\psi|$ is large and small when $|\psi|$ is small;
the significance of $P$ does not depend in a critical way on any quantity which is known on general physical grounds to be irrelevant: in particular this implies (since we are dealing with a nonrelativistic theory) that $P$ must not depend on where the zero of energy is chosen;
$\int P(x,t) dx$ is conserved over time, so that by eventually normalizing $P$ we can choose $\int P(x,t) dx=1$ for all $t$.
Bohm gives no mathematical argument at all and actually the statement seems completely unjustified to me.
Does someone know some reason why it should be true?
NOTE (1). Since the Schrödinger equation is a first-order equation, the time evolution of $\psi$ is fixed given the initial state $\psi(x,0)$: this is the reason why we require that the probability $P(x,t)$ depends on the state at time $t$, that is $\psi(x,t)$, and its spatial derivatives. To be explicit, the requirement that $P(x,t)$ is a function only of $\psi(x,t)$ and the partial derivatives of $\psi$ with respect to $x$ all computed in $(x,t)$ means that there exists a function $p$ such that $P(x,t)=p\left(\psi(x,t),\frac{ \partial \psi}{\partial x}(x,t),...,\frac{\partial^m \psi}{\partial x^m}(x,t)\right)$.
NOTE (2). The one given above is not a mathematically rigorous formulation of the problem, but the one originally given by Bohm. So we can feel free to attach a rigorous mathematical meaning to the different properties. In particular, as for property (iv) we can formulate it in a different (and not equivalent) mathematical form, by requiring that a local conservation law holds, in the sense that there exists a function $\mathbf{j}$, such that, if we put $\mathbf{J}(x,t)=\mathbf{j}\left(\psi(x,t),\frac{ \partial \psi}{\partial x}(x,t),...,\frac{\partial^m \psi}{\partial x^m}(x,t)\right)$, we get \begin{equation} \frac{\partial P}{\partial t} + \nabla \cdot \mathbf{J} = 0. \end{equation}
NOTE (3). Similar questions are raised in the posts Nonexistence of a Probability for Real Wave Equations and Nonexistence of a Probability for the Klein-Gordon Equation. Presumably Bohm had in mind the same kind of mathematical argument to tackle these three problems, so the real question is: what mathematical tool did he envisage to use? Maybe some concept from classical field theory or the theory of partial differential equations?
Bohm's assumptions are not mathematically precise, so you must attach a mathematical interpretation to them (especially statements $2$ and $3$). Since you have not done so yourself, I will try to interpret them in a way that I find reasonable.
Definition for $P$: We will require that the probability density $P_\psi(x,t)$ of any smooth function $\psi$ be a local function of its partial derivatives at $(x,t)$.
More formally, let $\psi(x,t)$ be a smooth function and let $j^\infty_{x,t}\psi$ denote the infinite jet prolongation of $\psi$ at $(x,t)$, i.e., its formal Taylor series expansion about $(x,t)$. Then we can write $$P_\psi(x,t) = p(j_{x,t}^\infty\psi),$$ for some function $p$ defined on the jet bundle. In terms of regularity, we will require $p$ to be continuous.
This is essentially the definition you proposed for $P$. In fact, this is already problematic since we will necessarily have to work with wavefunctions which are not smooth. Therefore it is already problematic to require that $P$ depend on higher derivatives since they are not guaranteed to even exist. Nevertheless, we will allow arbitrary dependence on higher partials and apply the consistency conditions for $P$ evaluated on smooth functions. We can then recover $P$ uniquely for arbitrary $L^2$ functions by continuity.
Assumption 1: The function $p$ is a real-valued and non-negative.
Assumption 2: The probability density $P_\psi(x,t)$ is a non-decreasing function of $|\psi(x,t)|$, i.e., $$P_{\psi_1}(x,t) \ge P_{\psi_2}(x,t) \iff |\psi_1(x,t)| \ge |\psi_2(x,t)|.$$
Assumption 3: The function $p$ is invariant under global phase, i.e., $$p(e^{i\theta} j^\infty_{x,t}\psi) = p(j^\infty_{x,t}\psi).$$
Assumption 4: If $\psi(x,t)$ is a normalized function, then $P_\psi(x,t)$ is likewise a normalized function.
Let us go through the assumptions one by one. Assumption $1$ is relatively straightforward as probability densities must be real and non-negative.
Property $2$ is, in my view, the most difficult to properly interpret. The way I have interpreted the property in Assumption $2$ is to say that the magnitude of the probability density at a point is directly reflective of the magnitude of the wavefunction at that point. This is what I feel to be the most direct transcription of Bohm's second property.
This assumption is in fact extremely strong, and it necessarily implies that $p$ is independent of all derivatives of $\psi$. This is essentially because the value of a smooth function and all of its derivatives can be independent prescribed at any point. This was already pointed out by @Kostas.
Lemma: Suppose that $p(j^\infty_{x,t}\psi)$ is a continuous non-decreasing function of $|\psi(x,t)|$. Then $p$ is independent of all derivatives of $\psi$, i.e., $$p(j^\infty_{x,t}\psi) = p(j^0_{x,t}\psi) = p(\psi(x,t)).$$
Proof: By Borel's theorem, given any complex sequence $(a_{n,m})_{n,m=0}^\infty,$ and any point $(x,t)$, there exists a smooth function $\psi$ such that $$\frac{\partial^{n+m}}{\partial x^n \partial t^m}\psi(x,t) = a_{n,m}.$$ Therefore we are able vary the individual entries of the Taylor series completely independently.
Suppose that $p$ is not constant on some partial $\partial_i \psi$. Then by Borel's theorem we can find smooth functions $\psi_1$ and $\psi_2$ such that all Taylor coefficients of $\psi_1$ and $\psi_2$ agree at $(x,t)$ except for $\partial_i$. Then $$p(j_{x,t}^\infty\psi_1) \neq p(j_{x,t}^\infty\psi_2),$$ and without loss of generality we may assume that $$p(j_{x,t}^\infty\psi_1) > p(j_{x,t}^\infty\psi_2).$$ Next, we can find some other smooth function $\psi_3$ which agrees with $\psi_2$ for all Taylor coefficients at $(x,t)$ except for the constant term, $\psi_3(x,t)\neq \psi_2(x,t)$. By continuity, we can choose $\psi_3(x,t)$ slightly larger than $\psi_2(x,t)$ but still such that $$p(j_{x,t}^\infty\psi_1) > p(j_{x,t}^\infty\psi_3).$$ Therefore we have $$|\psi_1(x,t)| = |\psi_2(x,t)| < |\psi_3(x,t)|,\ \ \ \ \text{and}\ \ \ \ \ p(j_{x,t}^\infty\psi_1) > p(j_{x,t}^\infty\psi_3),$$ in contradiction to the monotonicity assumption. $\square$
Therefore we will assume that $P_\psi(x,t) = p(\psi(x,t))$ from now on. Note that at this point, we no longer need the assumption that $\psi$ is smooth.
Assumption $3$ is also essentially just a direct transcription of Property 3. If we change the energy, we effectively change the wavefunction by a global phase factor $e^{iEt}$. Since we can always make the wavefunction real and positive at any given point $(x,t)$ by an appropriate choice of a global phase factor, it follows that $p$ is independent of the phase of $\psi(x,t)$, i.e., $$p(\psi(x,t)) = p(|\psi(x,t)|).$$
Note that this assumption is actually completely unnecessary. We could've deduced the above equation by a slightly modification of our lemma using Assumption $2$. I will keep it just for the sake of completeness.
Finally, we come to Assumption $4$. Bohm's statement for his Property $4$ is that the probabilities should be normalized at all times, namely for all $t$ we should have $$1 = \int_\mathbb{R} P(x,t)\ dx.$$
This has certain ambiguities however. Which time-evolution should we use? Naively, any self-adjoint operator $H$ with a spectrum which is bounded below (so that there is a lowest energy level) should be able to serve as a valid Hamiltonian. If we require that the assignment $\psi \mapsto P_\psi$ be be universally valid, i.e., Hamiltonian independent, then we must require that $P(x,t)$ be normalized with respect to the unitary evolution generated by any Hamiltonian.
It can be shown that given any unitary $U$, there exists some admissible (self-adjoint, bounded below) Hamiltonian $H$ such that $U = e^{iH}$. In fact, we do not even need to consider the set of all admissible Hamiltonians, but rather just the set of all bounded Hamiltonians due to the following theorem.
Theorem: Let $\mathcal{H}$ be a Hilbert space and let $U$ be any unitary operator on $\mathcal{H}$. Then there exists a bounded self-adjoint operator $A$ (with norm at most $\pi$) such that $$U = e^{iA}.$$
Proof: This is a simple consequence of the Borel functional calculus for bounded operators applied to the principal branch of the logarithm. See here for a complete proof. $\square$
Now, let $\psi_1(x)$ be some normalized wavefunction. Let us assume without loss of generality that $P$ is normalized so that $$1 = \int_\mathbb{R} P_{\psi_1}(x)\ dx.$$ Let $\psi_2(x)$ be some other arbitrary normalized wavefunction. Let $U$ be any unitary such that $\psi_2 = U\psi_1$. Then there exists some bounded Hamiltonian such that the time-evolution brings the initial state $\psi(x,t=0) = \psi_1(x)$ to $\psi(x,t=1) = \psi_2(x)$. This means that we must have $$1 = \int_\mathbb{R} P_{\psi_1}(x)\ dx = \int_{\mathbb{R}} P_{\psi_2}(x)\ dx.$$ Since $\psi_2$ was an arbitrary normalized function, it follows that $P_\psi$ is normalized for all normalized $\psi$. We take this as our Assumption $4$.
Physically, this assumption is essentially saying that we should be able to vary the potential of the Hamiltonian so as to drive any normalized wavefunction arbitrarily close to any other normalized wavefunction. Since $P$ is conserved under this evolution, it must be normalized given any normalized wavefunction.
Note that this implies that we must have $p(0) = 0$. Otherwise $p(0) > 0$ will give a divergent integral for any normalized compactly supported function $\psi$.
Now let $y>0$. Define $\psi_y(x,t)$ to be equal to $1/\sqrt{y}$ for $x\in (0,y)$ and zero elsewhere. Then we have $$\int_\mathbb{R} |\psi_{y}(x,t)|^2\ dx = 1 = \int_\mathbb{R} p(|\psi_y(x,t)|)\ dx = \int_0^y p(1/\sqrt{y})\ dx = yp(1/\sqrt{y}).$$ Therefore we must have $$p(|\psi(x,t)|) = |\psi(x,t)|^2.$$
This is the desired claim. Of course, you might disagree with how I've interpreted some of Bohm's statements. But as you've said yourself in the question, some rigorous definitions must be assigned to these physical properties. These are simply what I felt to be the most faithful.