kinematics - Extracting the 3D coordinates of a moving object from a video

Sunday, 25 February 2018

Take a look at these two pictures, which are stills from a video demonstrating the Magnus effect in football:

I want to extract the coordinates of this ball in 3D space from this video. These are the steps I intend to use:

The ball is initially 1 m away from the camera. I can use this information, together with the ball's angular diameter, to calculate its distance from the camera in the later frames.

A football is 22 cm across. This can be used to calculate a quantity which I'm calling anglePerPixel(which is 22/100/. It can be used to calculate the angle of elevation of the ball from the horizon.

Imagine a plane perpendicular to the ground and the camera direction, which cuts the camera view in two equal parts. It will appear as a line in the camera view. We can measure perpendicular distance of the ball from this plane in ball units, by measuring how many footballs we can fit between this plane and our football.

These 3 independent coordinates could be used to calculate and plot the path of this ball, i.e., if this procedure was correct, which it isn't.

I'm confident that the first step is correct. The second step yields incorrect results (about half of the expected value). The third step also looks correct to me.

How do I fix the second step (and any mistakes in the other two steps, if there are any)?

Edit:

It's possible to use the method of second step to calculate the elevation as well, but it won't be very accurate since the camera is about 30 cm above ground and is aimed about 3-4 degrees above the horizon.

Maybe we could calculate the position of ball relative to the direction of the camera (instead of the ground) and try to translate it once it's done.

Answer

If you are looking for geometric accuracy, your approach is quite vague and possibly not completely accurate. I also have some doubts about your initial assumptions.

Let's say indeed, the ball's diameter is $22$ cm and the distance, along the straight line on the ground, connecting the camera's position to the place where the ball touches the ground, is $100$ cm. From what I am seeing on the first photo, the camera is not $30$ cm above ground. Otherwise it would be looking at the ball (which is $22$ cm in diameter) from above (because $22 < 30$ ), while in fact, we see that the camera is looking at it a bit from below. More precisely, from what I am seeing, the camera's lower edge is placed almost on the ground and the camera is slightly tilted upwards. So I am going to assume that.

Geometrically the situation is more complex than you assume, because the ball, being originally a sphere, projects onto the photo as an ellipse. I seem to be measuring the height of the ball in the first photo at something like $3.7$ cm or so, while the horizontal width seems to be something like $3.65$ cm or so. However, I am going to make a simplification, otherwise things are much more complex. I am going to assume that when we are making our calculations, the 3D ball can be represented by a disk of diameter $22$ cm always facing the camera fully, i.e. the plane of the disk representing the ball is parallel to the screen $s$ . I repeat, this is not completely true, this is more of an estimate. Consequently, the results you are going to get are reasonable (I hope :) ) estimates. Intuitively, it seems to me that this approximation leads to a very small error.

First of all, for the most standard camera models, the geometric representation of a camera is a pair, consisting of a point $O$ (point of observation) and a screen (a plane $s$ ) not passing through $O$ . Let $O_s$ be the orthogonal projection of $O$ onto $s$ .

Basically, the representation of the three-dimensional world on the two-dimensional screen $s$ is obtained by connecting the observation point $O$ to any other point $P$ of three-dimensional space. The intersection point $P_s$ of the straight line $OP$ with the screen $s$ is the two-dimensional projection (2D image) of $P$ onto the screen $s$.

To be able to calculate relationships between 3D objects and their 2D images, you would need several important parameters of the camera representation. First, you would need to know the location of the point $O_s$ , the orthogonal projection of $O$ onto $s$ . And second, you would definitely need the distance $d = |OO_s|$ between $O$ and the plane $s$ . So to be able to carry out calculations like the ones you want, you need to be able to find the location of point $O_s$ and to find somehow $d$ .

Part 1. By knowing the initial position of the ball and the camera on photo 1 and by measuring some lengths on the photo, you can deduce the location of $O_s$ and the distance $d=|OO_s|$.

Part 2. By measuring the coordinates of the center of the ball's image on photo 2 and by measuring/estimating the radius of the ball's image, calculate the 3D position of the ball on the second photo.

Part 1, Step 1. Fix the necessary coordinate systems on the photo and in three dimensions. According to my measurements, the frames of the first and the second photo are the same: identical rectangles with horizontal width $= 7.1$ cm and vertical height $= 14.3$ cm. This also allows me to assume that the photos have not been cropped. Consequently, since I believe that most cameras' screens are probably designed so that the projection $O_s$ of $O$ onto the screen coincides with the geometric center of the screen's rectangle $s$ , point $O_s$ should be the intersection point of the diagonals of $s$ .

Let us introduce the 2D coordinate system $Lwh$ with origin at the lower left corner $L$ of the photos, horizontal axis $L\vec{w}$ (width) along the lower horizontal edge of the photos, and vertical axis $L\vec{h}$ (height) along the left vertical edge of the photos. Next, let us introduce the coordinate system $O_suv$ with origin at the point $O_s$, horizontal axis $O_s\vec{u}$ parallel to the horizontal edge of the photos (and thus to $L\vec{w}$), and vertical axis $O_s\vec{v}$ parallel to the vertical edge of the photos (and thus to $L\vec{h}$). Finally, define the 3D orthonormal coordinate system $Oxyz$, with origin $O$, axis $O\vec{x}$ parallel to $O_s\vec{u}$, axis $O\vec{z}$ parallel to $O_s\vec{v}$ and axis $O\vec{y}$ perpendicular to the screen $s$.

Observe axis $O\vec{z}$ is not perpendicular to the ground and axis $O\vec{y}$ is not parallel to the ground! However, axis $O\vec{x}$ is parallel to the ground.
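In these coordinates the projection rule described earlier (connect $O$ to $P$ and intersect the line with $s$) takes a simple form: the screen is the plane $y = d$, so a point $(x, y, z)$ maps to $(u, v) = (d\,x/y,\; d\,z/y)$. Here is a minimal Python sketch of that rule, assuming this convention. The function name `project_to_screen` and the sample point are illustrative only and are not taken from the photos.

```python
def project_to_screen(point, d):
    """Project a 3D point (x, y, z), given in Oxyz, onto the screen plane y = d.

    Returns the (u, v) coordinates of the image point in the O_s u v system.
    """
    x, y, z = point
    if y <= 0:
        raise ValueError("point must lie in front of the camera (y > 0)")
    scale = d / y  # similar triangles: the ray from O through P meets y = d
    return (scale * x, scale * z)


# Hypothetical point: 200 cm ahead, 10 cm to the right, 30 cm up, with d = 20 cm
u, v = project_to_screen((10.0, 200.0, 30.0), d=20.0)
print(u, v)
```

Part 2 below runs this mapping in reverse: it starts from a measured $(u, v)$ and uses the known ball diameter to recover the missing depth.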

Part 1, Step 2. Find the location of $O_s$ in coordinate system $Lwh$ . Now, in the coordinate system $Lwh$ the point $O_s$ has coordinates $(7.1/2, 14.3/2) = (3.55, 7.15)$ . Thus

$\begin{align} &u = w - 3.55\\ &v = h - 7.15 \end{align}$

Part 1, Step 3. Carry out measurements and preliminary constructions on photo 1. Let $A$ be the midpoint of the lower edge of the photos and $B$ be the midpoint of the upper edge of the photos. Then line $AB \, || \, L\vec{h} \, || \, O_s\vec{v}$ and thus $O_s$ is the midpoint of $AB$ (as well as the intersection point of the diagonals of the photos). My measurements show (more or less) that the image of the ball on photo 1 is symmetric with respect to $AB$. Denote by $D$ the point where the actual ball touches the ground. By the simplifying assumption I spoke about earlier, the 3D ball is interpreted as a flat disk of diameter $22$ cm whose plane is parallel to $s$. Then, denote by $U$ the point of this disk diametrically opposite to $D$. Thus $|DU| = 22$ cm and $|AD| = 100$ cm. See the figure I have added below.

Then, if $D_s = s \cap OD$ and $U_s = s \cap OU$, then $D_s$ and $U_s$ are respectively the lowest and the highest intersection points of the ball's image with the vertical line $AB$ on the screen $s$. Thus, we are in the situation of the figure above.

Point $D'$ on $DU$ is such that $O_sD'$ is parallel to $AD$, i.e. $O_sD' \, || \, AD$, and since by assumption $AB \, || \, DU$, the quadrilateral $ADD'O_s$ is a parallelogram, so $|DD'| = |AO_s|$ and $|O_sD'| = |AD|$. Point $C$ is the intersection $C = OO_s \cap DU$. Therefore, triangle $O_sD'C$ is right-angled with $\angle \, O_sCD' = 90^{\circ}$, because $OO_s$ is orthogonal to $AB$ and $AB$ is parallel to $DU$.

I measured on photo 1 that

$\begin{align} &|AD_s| = 4.3 \text{ cm}\\ &|AO_s| = \frac{1}{2}\, |AB| = 7.15 \text{ cm}\\ &|AU_s| = 8 \text{ cm} \end{align}$

Consequently,

$\begin{align} &|D_sO_s| = 7.15 - 4.3 = 2.85 \text{ cm }\\ &|O_sU_s| = 8 - 7.15 = 0.85 \text{ cm }\\ &|D_sU_s| = 8 - 4.3 = 3.7 \text{ cm }\\ \end{align}$

Part 1, Step 4. Calculate $d=|OO_s|$ . By Thales' intercept theorem (or similarity of triangles if you prefer)

$\frac{|DC|}{|DU|} = \frac{|D_sO_s|}{|D_sU_s|}$

$\frac{|DC|}{22} = \frac{2.85}{3.7}$

$|DC| = \frac{2.85 \cdot 22}{3.7} = 16.95 \text{ cm}$

Thus

$|D'C| = |DC| - |DD'| = |DC| - |AO_s| = 16.95 - 7.15 = 9.8 \text{ cm}$

By Pythagoras' theorem for the right triangle $O_sD'C$ we find

$|O_sC| = \sqrt{|O_sD'|^2 - |D'C|^2} = \sqrt{|AD|^2 - |D'C|^2} = \sqrt{100^2 - 9.8^2} = 99.52 \text{ cm}$

and we can even calculate the angle

$\theta = \angle D'O_sC = \arcsin{\frac{|D'C|}{|O_sD'|}} = \arcsin{\frac{9.8}{100}} = 5.624^{\circ}$

which shows how much the camera is tilted relative to the ground. Finally, again by Thales' theorem or similarity,

$\frac{|OO_s|}{|OC|} = \frac{|OO_s|}{|OO_s| + |O_sC|} = \frac{d}{d+99.52} = \frac{|D_sO_s|}{|DC|} = \frac{2.85}{16.95}$

which, when we solve for $d$, gives

$d = |OO_s| = 20.12 \text{ cm}$
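The arithmetic of Part 1 can be checked with a short script. This is only a sketch that replays the formulas above with the measurements quoted in the text; the variable names (`AD_s`, `DprimeC`, and so on) are my own labels for the segments, and the closed form for $d$ is just the last proportion solved for $d$.

```python
import math

# Measurements on photo 1 (cm on the photo), as quoted in the text
AD_s, AO_s, AU_s = 4.3, 7.15, 8.0
# Real-world lengths (cm), as assumed in the text
DU, AD = 22.0, 100.0

D_sO_s = AO_s - AD_s  # 2.85 cm
D_sU_s = AU_s - AD_s  # 3.7 cm

DC = DU * D_sO_s / D_sU_s             # intercept theorem
DprimeC = DC - AO_s                   # |D'C| = |DC| - |DD'|, with |DD'| = |AO_s|
O_sC = math.sqrt(AD**2 - DprimeC**2)  # Pythagoras in triangle O_s D' C
theta = math.asin(DprimeC / AD)       # camera tilt in radians

# d / (d + |O_sC|) = |D_sO_s| / |DC|  =>  d = |O_sC| * |D_sO_s| / (|DC| - |D_sO_s|)
d = O_sC * D_sO_s / (DC - D_sO_s)

print(f"|DC| = {DC:.2f} cm, |D'C| = {DprimeC:.2f} cm, |O_sC| = {O_sC:.2f} cm")
print(f"theta = {math.degrees(theta):.2f} deg, d = {d:.2f} cm")
```

Because the script does not round intermediate values, the tilt comes out at about $5.62^{\circ}$ rather than $5.624^{\circ}$; the other quantities agree with the values above to two decimals.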

Part 2, Step 1. Measuring the location of the center and the radius of the ball's image on photo 2. Since the image of the ball on photo 2 is almost a circular disc (we assume that because of the earlier assumption that the real ball is represented by a disk parallel to $s$ ), I measured the distance between the lower edge of photo 2 and the lowest point from the ball's image and found that it is $h_l = 10$ cm. Similarly, the distance between the lower edge of photo 2 and the uppermost point from the ball's image is $h_u = 10.3$ cm. The horizontal distance between the left vertical edge of photo 2 (axis $L\vec{h}$ ) and either of the two points, mentioned in the previous sentence, is $w_2 = 3.87$ cm. Thus, the coordinates of the center $Q_2$ of the ball's image on photo 2 with respect to the coordinate system $Lwh$ are approximately

$Q_2 = \big(w_2, \, (h_u+h_l)/2\big) = \big(3.87, \, (10.3+10)/2\big) = \big( 3.87, \, 10.15\big)$

and the diameter of the ball's image on photo 2 is $h_u-h_l = 0.3$ cm.

For future reference, let us denote by $Q$ the 3D center of the real ball in the case of photo 2.

Part 2, Step 2. Calculating the 3D coordinates of $Q$ with respect to the coordinate system $Oxyz$ . To that end, we work only with photo 2. By the simplifying assumption from before, we assume that the image of the real ball on the screen $s$ is a circular disk (we call it image disk), and at the same time the real ball in 3D is represented by a circular disk, parallel to $s$ (we call it real disk). Therefore the center of the real disk, which is $Q$ , the center of the image disk $Q_2$ and the point $O$ are collinear. Moreover, the image disk and the real disk are (by assumption) homothetic to each other from the origin $O$ . In other words, there is a similarity transformation (a stretching of 3D space with respect to point $O$ ) which maps the image disk to the real disk, so in particular it maps the center $Q_2$ to the center $Q$ , while keeping the origin $O$ of $Oxyz$ fixed. Thus, the coefficient of similarity (the coefficient of stretching) is

$\lambda = \frac{\text{diameter of real disk}}{\text{diameter of image disk}} = \frac{22}{0.3} = 220/3 = 73.33$

The coordinates of $Q_2$ with respect to the coordinate system $O_suv$ are simply

$\begin{align} &u_2 = 3.87 - 3.55 = 0.32 \text{ cm}\\ &v_2 = 10.15 - 7.15 = 3 \text{ cm} \end{align}$

Consequently, the 3D coordinates of point $Q_2$ with respect to $Oxyz$ are

$Q_2 = \big(u_2,\, d,\, v_2\big) = \big(0.32,\, 20.12 ,\, 3 \big)$

Therefore, to obtain the coordinates of $Q$ we simply have to multiply the coordinates of $Q_2$ by the factor $\lambda$ and obtain

$Q = \big(\lambda u_2,\, \lambda d ,\, \lambda v_2\big) = \big(0.32 \cdot 220/3 ,\, 20.12\cdot 220/3 ,\, 3\cdot 220/3\big)$

Thus, we finally have the coordinates (in cm) of the center of the real 3D ball from picture 2,

$Q = \big( 23.47,\, 1475.47,\, 220\big)$

with respect to the coordinate system $Oxyz$.
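Since the same computation has to be repeated for every frame of the video, it may help to wrap Part 2 in a function. The sketch below assumes the flat-disk simplification and the measurements quoted above; the function name `ball_center_camera_frame` and its parameter names are hypothetical.

```python
def ball_center_camera_frame(h_l, h_u, w, w_c, h_c, d, ball_diameter):
    """Return (x, y, z) of the ball center in Oxyz, in cm.

    h_l, h_u : heights of the lowest and highest points of the ball's image (Lwh)
    w        : horizontal position of the ball's image center (Lwh)
    w_c, h_c : position of O_s in Lwh
    d        : distance |O O_s| from the observation point to the screen
    """
    image_diameter = h_u - h_l
    if image_diameter <= 0:
        raise ValueError("h_u must be greater than h_l")
    lam = ball_diameter / image_diameter  # similarity coefficient lambda
    u = w - w_c                           # Lwh -> O_s u v
    v = (h_u + h_l) / 2 - h_c
    return (lam * u, lam * d, lam * v)    # stretch (u, d, v) away from O


# Measurements on photo 2 and the value of d from Part 1, as quoted in the text
Q = ball_center_camera_frame(h_l=10.0, h_u=10.3, w=3.87,
                             w_c=3.55, h_c=7.15, d=20.12, ball_diameter=22.0)
print(tuple(round(c, 2) for c in Q))
```

Note that $\lambda$ is inversely proportional to the image diameter. If the $0.3$ cm measurement were off by $0.05$ cm (an assumed, not measured, uncertainty), $\lambda$ and all three coordinates would change by roughly 15 to 20 percent, so the small image of the ball on photo 2 is the main source of error.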

If, however, we would like to find the coordinates of $Q$ with respect to a coordinate system $Ox\tilde{y}\tilde{z}$ in which not only the axis $O\vec{x}$ but also the axis $O\vec{\tilde{y}}$ is parallel to the ground, while the axis $O\vec{\tilde{z}}$ is perpendicular to the ground, we take $Ox\tilde{y}\tilde{z}$ to be the rotation of $Oxyz$ around the axis $O\vec{x}$ by the angle $- \theta = - \, \angle\, D'O_sC = -\, 5.624^{\circ}$. To do that, we simply have to multiply the $Oxyz$-coordinates of $Q$ by the rotation matrix

$\text{ROT}(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos{\theta} & -\,\sin{\theta} \\ 0 & \sin{\theta} & \cos{\theta} \\ \end{pmatrix}= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.995 & -\,0.098 \\ 0 & 0.098 & 0.995 \\ \end{pmatrix}$

and obtain the $Ox\tilde{y}\tilde{z}$-coordinates of $Q$:

$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.995 & -\,0.098 \\ 0 & 0.098 & 0.995 \\ \end{pmatrix} \begin{pmatrix} 23.47 \\ 1475.47 \\ 220 \end{pmatrix} = \begin{pmatrix} 23.47\\ 1446.53 \\ 363.5 \end{pmatrix}$

Furthermore, if we want to calculate how high point $O$ is from the ground, we can go back to the geometric figure and see that the height of $O$ can be calculated as

$|AO_s| \cos{\theta} - d \sin{\theta} = 7.15 \cdot 0.995 - 20.12 \cdot 0.098 = 5.142 \text{ cm}$
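The rotation and the camera-height correction are easy to script as well. This is a sketch under the same assumptions as above, using the rounded inputs quoted in the text; the function name `camera_to_ground` is illustrative.

```python
import math


def camera_to_ground(point, theta):
    """Apply ROT(theta), a rotation about the x axis, to a point given in Oxyz.

    theta is the camera tilt in radians; the result is in the ground-aligned
    frame O x y~ z~ (still centered at the observation point O).
    """
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)


theta = math.radians(5.624)   # tilt found in Part 1
Q = (23.47, 1475.47, 220.0)   # ball center in Oxyz, from Part 2 (cm)
AO_s, d = 7.15, 20.12         # cm

Q_ground = camera_to_ground(Q, theta)
camera_height = AO_s * math.cos(theta) - d * math.sin(theta)
ball_height = Q_ground[2] + camera_height

print(tuple(round(c, 1) for c in Q_ground))
print(round(camera_height, 2), round(ball_height, 1))
```

With unrounded sine and cosine the forward coordinate comes out near $1446.8$ cm instead of $1446.53$ cm; the difference comes only from rounding $\cos\theta$ to $0.995$ in the matrix above and is far below the measurement error.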

Finally, we can conclude that in the situation depicted on photo 2, the ball has moved from its initial position on photo 1 by approximately $23.47$ centimeters to the right, and its height above the ground is approximately $363.5+5.142 = 368.642$ centimeters, which is, give or take, $3$ meters and $68$ centimeters. Horizontally, the ball is $14$ meters and $47$ centimeters from point $O$, which is roughly $13$ meters from its initial position.
