
Comprehensive Tracking Protocol

One Unified Standard for Virtual Production Tracking

Camera and object tracking are critical components of advanced virtual production workflows. Yet, the industry faces a major challenge: countless tracking devices use different communication protocols, creating complexity for hardware manufacturers, software developers, and production teams.

The Comprehensive Tracking Protocol (C-Tracking) solves this problem. Created by industry experts, C-Tracking is a free, unified, and future-proof protocol designed to streamline tracking across all platforms and devices. It’s accurate, flexible, simple to implement, and free to use without restrictions, making it the ideal standard for the entire industry.

Whether you're building hardware, developing virtual production platforms, or managing large-scale productions, C-Tracking is built to meet your needs, without licensing fees or usage restrictions.

Download Comprehensive Tracking Protocol Specification

Learn how to implement the protocol and join the movement toward seamless, standardized tracking.
No registration required.


Revision 43

Comprehensive Tracking Protocol Specification

1 General

This protocol is used to transmit tracking data about devices across a network. C-Tracking is designed for sending real physical values; see appendix D for more information. In addition to tracking information, it can include data about other measurements.

Messages of this protocol are sent over UDP. The maximum length of a message, not including the IP and UDP headers, is 1400 bytes.

Messages are generated by the source (“sender”) and are sent to a pre-configured destination IP address-UDP port pair (“receiver”). The messages may be generated at regular intervals or as soon as the data becomes available at the sender’s discretion. The destination IP address may be unicast, broadcast or multicast. There is no communication in the other direction (from the receiver to the sender).

A UDP packet should contain information about a single logical device, and all UDP packets sent to the same port should contain information about the same device. If the device is a camera, all information about it (lens information, position, etc.) is expected to be sent in a single packet. A device can also represent only the measurement of the position and/or orientation of a single object in space, in which case the packet will probably only contain position/orientation information.

Any destination port number may be used, although port numbers below 1024 are not recommended for default values because they may cause problems if the receiver system requires special permissions to listen on such ports. The recommended default destination port is 2001 for the first device, and subsequent port numbers for any additional devices.
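
As an illustration of this transport arrangement, here is a minimal POSIX sketch of a sender: one UDP socket and one pre-assembled message per tracking frame, sent to a pre-configured receiver on the recommended default port. The destination address and the packet-assembly step are placeholders, not part of the protocol.

    // Minimal POSIX sketch of a C-Tracking sender. The destination address
    // is a placeholder; packet assembly is covered in the sections below.
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdint>

    int main()
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) return 1;

        sockaddr_in dest{};
        dest.sin_family = AF_INET;
        dest.sin_port = htons(2001);                         // recommended default port
        inet_pton(AF_INET, "192.168.1.50", &dest.sin_addr);  // pre-configured receiver

        uint8_t buffer[1400];                                // maximum message length
        size_t length = 0;

        // ... assemble the header and elements into buffer, set length ...

        sendto(sock, buffer, length, 0,
               reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));
        close(sock);
        return 0;
    }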

The packet format is flexible, with all the transmitted data (except for a short header) contained in elements, each of which are optional. This allows the sender to omit data which is unknown, or which is not applicable to the device whose data is being reported (for example, lens distortion information for a tracking device). However, every packet must contain all the available data for the device, even if it did not change since the last packet.

All numerical fields in a packet spanning multiple bytes are stored in big endian byte order. Signed integer fields are stored in two’s complement notation. Floating point values are encoded according to IEEE 754-2008 in binary32 (“single precision”) format. Infinite and NaN values are only allowed where explicitly stated. All NaN values sent must be quiet (non-signaling), with the specified payload, or with a payload of 0 if it is not mentioned.

Description of the types used in the tables below:

  • N bytes: a fixed number of bytes.
  • uint8: an 8-bit unsigned integer.
  • uint16: a 16-bit unsigned integer in big endian byte order.
  • uint32: a 32-bit unsigned integer in big endian byte order.
  • uint64: a 64-bit unsigned integer in big endian byte order.
  • single: a single precision binary floating point value encoded according to IEEE 754-2008 in big endian byte order.
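
As a sketch of these encoding rules, a sender might use big-endian write helpers along the following lines; the helper names are illustrative, not part of the protocol.

    #include <cstdint>
    #include <cstring>

    // Writes v to p in big endian byte order.
    void PutUint16(uint8_t* p, uint16_t v)
    {
        p[0] = uint8_t(v >> 8);
        p[1] = uint8_t(v);
    }

    void PutUint32(uint8_t* p, uint32_t v)
    {
        p[0] = uint8_t(v >> 24);
        p[1] = uint8_t(v >> 16);
        p[2] = uint8_t(v >> 8);
        p[3] = uint8_t(v);
    }

    // Writes an IEEE 754 binary32 value: the bit pattern is taken as-is
    // and stored in big endian byte order.
    void PutSingle(uint8_t* p, float v)
    {
        uint32_t bits;
        memcpy(&bits, &v, sizeof(bits));
        PutUint32(p, bits);
    }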

2 Packet structure
2.1 Header

The header starts at address 0 in the packet.

The receiver discards packets with an unexpected protocol identifier.

The receiver discards packets with a header length less than 6. The receiver silently ignores the contents of the header beyond the first 6 bytes if the header length is greater than 6.

ProtocolIdentifier 4 bytes “CTrk” (the value 0x4354726B in big endian byte order).
HeaderLength uint16 6
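
A receiver-side sketch of these header rules follows; the function name and the return convention (offset of the first byte after the header, or -1 for a discarded packet) are illustrative.

    #include <cstdint>
    #include <cstring>

    // Returns the offset of the first byte after the header, or -1 if the
    // packet must be discarded.
    int ParseHeader(const uint8_t* packet, size_t size)
    {
        if (size < 6) return -1;
        if (memcmp(packet, "CTrk", 4) != 0) return -1;   // unexpected ProtocolIdentifier
        uint16_t headerLength = uint16_t((packet[4] << 8) | packet[5]);
        if (headerLength < 6 || headerLength > size) return -1;
        return headerLength;                             // bytes beyond 6 are ignored
    }
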
2.2 Elements

All elements are optional. However, it is recommended that the sender include the Frame rate element in the packets it sends. In the absence of this element, the receiver has to assume that packets may arrive from the sender irregularly. Each element may only be present once in a packet unless otherwise noted in the element description.

All elements start at the first address aligned to a multiple of 4 after the previous element or (in the case of the first element) after the header. Padding bytes (no more than 3) are inserted by the sender after the header, between elements, and optionally after the last element to enforce this alignment. The sender sets the padding bytes to 0; the receiver ignores the contents of the padding bytes. The packet may contain no data after the last element apart from optionally at most 3 padding bytes.

The receiver discards packets containing an element with an element length less than 4.

The receiver silently ignores any elements with an unknown element type.

If the receiver encounters an element with a length less than it expects, it silently ignores the element.

If the receiver encounters an element with a length greater than it expects, it parses the element as if it were of the expected length, ignoring the excess data. The address of the next element is still calculated based on the value of the ElementLength field.

ElementType uint16 Element type
ElementLength uint16 Length of the element, including the ElementType and ElementLength fields, excluding any padding bytes.
Payload Element type-specific payload
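
Putting the alignment and length rules together, a receiver’s element walk could be sketched as follows; HandleElement stands in for the type-specific parsing described in the following sections.

    #include <cstdint>

    // Placeholder for type-specific parsing; unknown types are ignored here.
    void HandleElement(uint16_t type, const uint8_t* payload, size_t length) {}

    // Walks the elements of a packet. Returns false if the packet must be
    // discarded (element length below 4 or an element overrunning the packet).
    bool ParseElements(const uint8_t* packet, size_t size, size_t headerLength)
    {
        size_t pos = headerLength;
        for (;;)
        {
            pos = (pos + 3) & ~size_t(3);       // skip padding to 4-byte alignment
            if (pos + 4 > size) return true;    // only trailing padding remains
            uint16_t type   = uint16_t((packet[pos] << 8) | packet[pos + 1]);
            uint16_t length = uint16_t((packet[pos + 2] << 8) | packet[pos + 3]);
            if (length < 4 || pos + length > size) return false;
            HandleElement(type, packet + pos + 4, length - 4);
            pos += length;                      // next element from ElementLength
        }
    }
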
2.2.1 Timecode

This element contains the timecode for the packet.

The four parts of the HH:MM:SS:FF timecode are sent in separate fields, with an additional optional field for a subframe index as described below.

The timecode base is normally equal to the frame rate of the sending system. The protocol allows any integral timecode base up to 255. However, only timecode bases up to 30 are compliant with the SMPTE standard. To send an SMPTE-compliant timecode for higher frame rates, set Base to the highest divisor of the frame rate that is less than or equal to 30. In this case, multiple consecutive frames will have the same HH:MM:SS:FF timecode. To differentiate between these, increase Subframe by one for every subsequent frame with the same frame number, and reset Subframe to zero when the frame number changes.

For example, to represent 50 frames per second, Base can simply be set to 50. In this case, Frame will increase from 0 to 49 within a single second, and Subframe will always contain zero. Alternatively, for SMPTE compliance, Base can be set to 25. In this case, Frame will only increase from 0 to 24 before jumping back to zero at the end of a second. Each HH:MM:SS:FF timecode will be repeated in two consecutive frames. Subframe should be set to 0 in the first one, and to 1 in the second.

In order to specify a timecode base for an NTSC-based frame rate (23.976, 29.97, 59.94, 119.88, 239.76), set Base to 24, 30, 60, 120 or 240 respectively, and set the NTSC bit (the least significant bit) in Flags. In an SMPTE-compliant representation, the maximum frame rate representable without subframes is 29.97. To specify, for example, 239.76 frames per second in SMPTE-compliant format, set Base to 30, set the NTSC bit, and use Subframe values from 0 to 7 cyclically to distinguish between the same HH:MM:SS:FF values.

Also note that timecodes for all NTSC-based frame rates, except for 23.976, are assumed to be drop-frame.

ElementType uint16 0
ElementLength uint16 11
Hours uint8 The hours part of the timecode
Minutes uint8 The minutes part of the timecode
Seconds uint8 The seconds part of the timecode
Frames uint8 The frames part of the timecode
Subframe uint8 The subframe index
Base uint8 The timecode base
Flags uint8 The least significant bit is set to 1 for NTSC frame rates, and to 0 otherwise. The other bits are currently unassigned and should be set to 0 when sending.
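
As a worked version of the 50 fps example above, the sketch below fills these fields from a running frame index using the SMPTE-compliant base of 25. The struct and function are illustrative, not part of the protocol.

    #include <cstdint>

    // Illustrative container for the element's fields.
    struct Timecode { uint8_t hours, minutes, seconds, frames, subframe, base, flags; };

    // Fills the timecode for a running frame index at 50 fps, using the
    // SMPTE-compliant base of 25 with two subframes per timecode frame.
    Timecode FromFrameIndex50(uint64_t index)
    {
        const uint64_t fps = 50, base = 25, subframes = fps / base;   // 2
        Timecode tc{};
        tc.base     = uint8_t(base);
        tc.flags    = 0;                          // not an NTSC-based rate
        tc.subframe = uint8_t(index % subframes);
        uint64_t frame = index / subframes;       // frame number at base rate
        tc.frames  = uint8_t(frame % base);
        uint64_t sec = frame / base;
        tc.seconds = uint8_t(sec % 60);
        tc.minutes = uint8_t(sec / 60 % 60);
        tc.hours   = uint8_t(sec / 3600 % 24);
        return tc;
    }
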
2.2.2 Field of view

This element contains the field of view and the frame aspect ratio for a camera.

ElementType uint16 1
ElementLength uint16 12
HorizontalFieldOfView single Horizontal field of view in degrees
FrameAspectRatio single Frame aspect ratio, width / height
2.2.3 Basic lens distortion

This element contains the K1/K2-based lens distortion parameters for a camera.

This element is only used to describe radial distortion in general-purpose lenses. In particular, it is not suitable to describe the distortion of fisheye, anamorphic, or tilt-shift lenses.

A packet containing this element should also contain a “Field of view” element. Without field of view information, the contents of this element are not useful to the receiver.

The fields of this element contain the coefficients according to the distortion model used by OpenCV. A description of the distortion model is provided in appendix B, with the assumption that all coefficients other than K1 and K2 are zero.

Fields with unknown (not measured/calibrated) values should contain zero.

For guidance on choosing between the basic and extended parameter sets, see appendix B.2.

ElementType uint16 2
ElementLength uint16 20
CenterX single X coordinate of the principal point, relative to the image center. The left edge of the image is -0.5, the right edge is 0.5.
CenterY single Y coordinate of the principal point, relative to the image center. The top edge of the image is -0.5, the bottom edge is 0.5.
K1 single First radial distortion coefficient
K2 single Second radial distortion coefficient
2.2.4 Extended lens distortion

This element contains an extended set of lens distortion parameters for a camera.

This element is only used to describe distortion in general-purpose lenses. In particular, it is not suitable to describe the distortion of fisheye, anamorphic, or tilt-shift lenses.

A packet containing this element should also contain a “Field of view” element. Without field of view information, the contents of this element are not useful to the receiver.

The fields of this element contain the coefficients according to the distortion model used by OpenCV. A description of the distortion model is provided in appendix B.

Fields with unknown (not measured/calibrated) values should contain zero.

For guidance on choosing between the basic and extended parameter sets, see appendix B.2.

ElementType uint16 3
ElementLength uint16 60
CenterX single X coordinate of the principal point, relative to the image center. The left edge of the image is -0.5, the right edge is 0.5.
CenterY single Y coordinate of the principal point, relative to the image center. The top edge of the image is -0.5, the bottom edge is 0.5.
K1 single First radial distortion coefficient
K2 single Second radial distortion coefficient
K3 single Third radial distortion coefficient
K4 single Fourth radial distortion coefficient
K5 single Fifth radial distortion coefficient
K6 single Sixth radial distortion coefficient
P1 single First tangential distortion coefficient
P2 single Second tangential distortion coefficient
S1 single First thin prism distortion coefficient
S2 single Second thin prism distortion coefficient
S3 single Third thin prism distortion coefficient
S4 single Fourth thin prism distortion coefficient
2.2.5 Focus distance

This element contains the distance to the object in focus for a camera. The distance is measured from the entrance pupil as described in appendix B. This distance may differ from the physical distance measured from the camera sensor. In most practical cases, however, the difference between these two values is negligible.

ElementType uint16 4
ElementLength uint16 8
FocusDistance single The focus distance in meters
2.2.6 Sensor information

This element contains information about the camera’s sensor.

ElementType uint16 5
ElementLength uint16 16
ActiveSensorWidth single Width of the part of the camera sensor used to produce the output image, in millimeters. May be zero if unknown.
ActiveSensorHeight single Height of the part of the camera sensor used to produce the output image, in millimeters. May be zero if unknown.
ActiveHorizontalResolution uint16 Horizontal resolution of the part of the camera sensor used to produce the output image, in pixels. May be zero if unknown.
ActiveVerticalResolution uint16 Vertical resolution of the part of the camera sensor used to produce the output image, in pixels. May be zero if unknown.
2.2.7 Aperture

This element contains the width of the aperture, represented as an f-number, for a camera.

ElementType uint16 6
ElementLength uint16 8
FNumber single The f-number. For example, an aperture of f/5.6 should be represented by the value 5.6.
2.2.8 Vignetting

This element contains information about the gradual reduction of the image’s brightness towards its periphery.

This element is only used to describe vignetting in general-purpose lenses with a circular vignetting effect. In particular, it may not be suitable to describe the effect in fisheye, anamorphic or tilt-shift lenses.

This element contains an arbitrary number N (≥ 1) of data fields. Each of these fields contains the ratio of brightness lost at a specific distance from the image center. Each field must contain a value between 0.0 and 1.0, with 0.0 representing no brightness loss and 1.0 representing a complete loss of brightness (a pixel that is always black).

The values are ordered from the image center to the corners and are at equal intervals. The first value represents the brightness loss at 1/N the way from the image center to a corner, the second value at 2/N the way, and so on. The last value is the brightness loss at the corners. The value for the center of the image is not transported and is always assumed to be zero (no loss of brightness).

ElementType uint16 7
ElementLength uint16 Length of the element. Equal to 8 + 4⋅RatioCount.
RatioCount uint16 The number of ratio fields in the packet (N).
Reserved uint16 Zero
Ratio1 single Brightness loss at 1/N the way from the image center to a corner
Ratio2 single Brightness loss at 2/N the way from the image center to a corner
...
RatioN single Brightness loss at the image corners
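
A receiver that needs the brightness loss at an arbitrary distance from the center has to interpolate between these samples. The sketch below interpolates linearly, which is an assumption of this sketch; the protocol only defines the sample positions.

    #include <vector>

    // Brightness loss at normalized distance t from the image center
    // (0.0 = center, 1.0 = corner), interpolated linearly between the
    // transmitted samples; sample i (0-based) sits at (i + 1) / N.
    float BrightnessLoss(const std::vector<float>& ratios, float t)
    {
        const int n = int(ratios.size());
        if (n == 0 || t <= 0.0f) return 0.0f;     // center value is always zero
        if (t >= 1.0f) return ratios[n - 1];
        float pos = t * n;
        int i = int(pos);                         // segment index, 0..n-1
        float lower = (i == 0) ? 0.0f : ratios[i - 1];
        return lower + (ratios[i] - lower) * (pos - i);
    }
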
2.2.9 Position

This element contains information about the position of a camera or a standalone tracking device.

If the device is a camera, the position reported is that of the entrance pupil as described in appendix B.

The position is represented as the combination of a translation and a rotation. These are applied in the usual order (rotation first, then translation) to get the transformation representing the position.

The translation is specified in meters in a right-handed coordinate system. The axes of the coordinate system represent the right (X), up (Y) and backward (Z) directions of the tracking system. If the tracking system is capable of ensuring it, Y = 0 should correspond to the floor level, with Y > 0 representing objects above the floor.

The rotation is represented as a unit quaternion (using Hamilton’s definition for multiplying basis elements) in the form (x, y, z, w), where x, y and z are the components corresponding to the respective coordinates, and w is the real component.

The error of the translation value is described in the TranslationError field. This field contains the maximum magnitude of the difference between the reported translation and the real translation vector, with a confidence of 99.73% (3σ, assuming a measurement error with normal distribution).

If the translation is unknown, the TranslationError field should be set to positive infinity. If the error of the translation is unknown, the TranslationError field should contain zero.

The error of the rotation value is described in the RotationError field. It contains the maximum angle of the rotation between the reported rotation (described by the RotationX, RotationY, RotationZ and RotationW fields) and the real rotation, with a confidence of 99.73%.

If the rotation is unknown, the rotation error field should be set to positive infinity. If the error of the rotation is unknown, the rotation error field should contain zero.

ElementType uint16 8
ElementLength uint16 40
TranslationX single X component of the translation vector in meters
TranslationY single Y component of the translation vector in meters
TranslationZ single Z component of the translation vector in meters
RotationX single X component of the rotation quaternion
RotationY single Y component of the rotation quaternion
RotationZ single Z component of the rotation quaternion
RotationW single Real component of the rotation quaternion
TranslationError single Error of the translation vector in meters. +∞ if the translation vector is unknown; 0.0 if the error is unknown.
RotationError single Error of the rotation component in radians. +∞ if the rotation is unknown; 0.0 if the error is unknown.
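
As a sketch of the order described above (rotation first, then translation, Hamilton convention), the following applies the reported pose to a point in device-local coordinates; the types are illustrative.

    struct Vec3 { float x, y, z; };
    struct Quat { float x, y, z, w; };            // (x, y, z, w) as transmitted

    // Rotates v by the unit quaternion q (Hamilton convention, q v q*),
    // using the expanded form v' = v + w*t + u x t with t = 2 (u x v).
    Vec3 Rotate(const Quat& q, const Vec3& v)
    {
        float tx = 2.0f * (q.y * v.z - q.z * v.y);
        float ty = 2.0f * (q.z * v.x - q.x * v.z);
        float tz = 2.0f * (q.x * v.y - q.y * v.x);
        return { v.x + q.w * tx + (q.y * tz - q.z * ty),
                 v.y + q.w * ty + (q.z * tx - q.x * tz),
                 v.z + q.w * tz + (q.x * ty - q.y * tx) };
    }

    // Applies the reported pose to a point: rotation first, then translation.
    Vec3 ApplyPose(const Quat& rotation, const Vec3& translation, const Vec3& point)
    {
        Vec3 r = Rotate(rotation, point);
        return { r.x + translation.x, r.y + translation.y, r.z + translation.z };
    }
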
2.2.10 Velocity

This element contains information about the velocity and angular velocity of a camera or standalone tracking device. It can help the receiver approximate the device’s position between packets.

The velocity is specified in the same coordinate system as the position in meters per second; the angular velocity (per second) is represented as a unit quaternion like the rotation.

If either the velocity or the angular velocity is unknown, the component fields of the unknown quantity should contain NaN.

ElementType uint16 9
ElementLength uint16 32
VelocityX single X component of the velocity vector in meters per second; NaN if the velocity is unknown
VelocityY single Y component of the velocity vector in meters per second; NaN if the velocity is unknown
VelocityZ single Z component of the velocity vector in meters per second; NaN if the velocity is unknown
AngularVelocityX single X component of the angular velocity quaternion (per second); NaN if the angular velocity is unknown
AngularVelocityY single Y component of the angular velocity quaternion (per second); NaN if the angular velocity is unknown
AngularVelocityZ single Z component of the angular velocity quaternion (per second); NaN if the angular velocity is unknown
AngularVelocityW single Real component of the angular velocity quaternion (per second); NaN if the angular velocity is unknown
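
A sketch of the extrapolation this element enables, reusing the Vec3 and Quat layouts from the position sketch above. The angular step is applied by left multiplication, which assumes a world-frame angular velocity; the protocol does not state this, so treat it as an assumption of the sketch.

    #include <cmath>

    struct Vec3 { float x, y, z; };               // as in the position sketch
    struct Quat { float x, y, z, w; };

    // Hamilton product a * b.
    Quat Mul(const Quat& a, const Quat& b)
    {
        return { a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
                 a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
                 a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
                 a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z };
    }

    // q raised to the power t, by scaling the rotation angle.
    Quat Pow(const Quat& q, float t)
    {
        float w = std::fmax(-1.0f, std::fmin(1.0f, q.w));
        float halfAngle = std::acos(w);
        float s = std::sin(halfAngle);
        if (s < 1e-6f) return { 0.0f, 0.0f, 0.0f, 1.0f };   // (near-)identity rotation
        float k = std::sin(halfAngle * t) / s;
        return { q.x * k, q.y * k, q.z * k, std::cos(halfAngle * t) };
    }

    // Extrapolates the pose dt seconds past the last packet.
    void Extrapolate(Vec3& translation, Quat& rotation,
                     const Vec3& velocity, const Quat& angularVelocity, float dt)
    {
        translation.x += velocity.x * dt;
        translation.y += velocity.y * dt;
        translation.z += velocity.z * dt;
        rotation = Mul(Pow(angularVelocity, dt), rotation);
    }
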
2.2.11 Frame rate

This element contains the frame rate of the tracking device.

Presence of this element indicates that the sender generates packets at regular intervals, with one packet being generated every 1/framerate seconds. If this element is present, the sender must send packets as close to this schedule as possible.

Frame rate is expressed in the form of a rational number (quotient of two integers). This is necessary to express NTSC-based frame rates precisely.

In non-NTSC cases where the frame rate is an integer, we recommend using 1 as the denominator; for example, 50 fps should be expressed as 50 / 1. For fractional frame rates, use a suitable denominator; for example, 12.5 fps would be 25 / 2.

In the NTSC cases it is mandatory to use 1001 as the denominator. For example 23.976 fps must be expressed as 24000 / 1001, 59.94 fps as 60000 / 1001 and so on.

ElementType uint16 10
ElementLength uint16 12
FrameRateNumerator uint32 The numerator part of the frames per second value
FrameRateDenominator uint32 The denominator part of the frames per second value
2.2.12 Raw encoder value and custom measurement

This element can be used to report the value of a measurement. The measurement can be one of the usual encoder values like zoom, focus, or aperture, or any custom measurement value, for example a temperature. Multiple elements of this type can be present in a packet, but each instance has to contain a different value in the MeasurementType field.

It is important to note that sending raw zoom and focus values should only be a supplementary addition, for example as debugging data. It is mandatory to also include the Field of view element in the packet when the raw zoom encoder value is present, the Focus distance element when the raw focus encoder value is present, and the Aperture element when the raw aperture encoder value is present.

ElementType uint16 11
ElementLength uint16 20
MeasurementType uint32 An identifier describing the type of value contained in the element. 0 for zoom encoder, 1 for focus encoder, 2 for aperture encoder; any other value designates a custom measurement.
Value single The value of the measurement
MinValue single The lowest possible value of the measurement or NaN if unknown. The value may be negative infinity if there is no lower bound.
MaxValue single The highest possible value of the measurement or NaN if unknown. The value may be positive infinity if there is no upper bound.
A Calculated information
A.1 Depth of field

The receiver can calculate the depth of field from the focal length of the lens, the focus distance, and the aperture width. The focal length itself is calculated from the field of view and the size of the part of the camera sensor used to produce the output image.

This means that a packet needs to contain the following elements for the receiver to be able to calculate depth of field information:

  • Field of view,
  • Focus distance,
  • Sensor information, with the ActiveSensorWidth and ActiveSensorHeight fields filled (not set to zero),
  • Aperture.
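
Given those four elements, the calculation could be sketched as follows using the standard thin-lens hyperfocal formulas. The circle-of-confusion choice (sensor diagonal divided by 1500) is a common convention, not something the protocol defines.

    #include <cmath>
    #include <limits>

    // Computes the near and far limits of the depth of field, in meters.
    void DepthOfField(float hFovDeg, float sensorWidthMm, float sensorHeightMm,
                      float focusDistanceM, float fNumber,
                      float& nearM, float& farM)
    {
        const float pi = 3.14159265358979f;
        // Focal length in millimeters, from the horizontal field of view.
        float f = sensorWidthMm / (2.0f * std::tan(hFovDeg * pi / 360.0f));
        // Circle of confusion: sensor diagonal / 1500 (a common convention).
        float c = std::sqrt(sensorWidthMm * sensorWidthMm +
                            sensorHeightMm * sensorHeightMm) / 1500.0f;
        float H = f * f / (fNumber * c) + f;     // hyperfocal distance, mm
        float s = focusDistanceM * 1000.0f;      // focus distance, mm
        nearM = H * s / (H + (s - f)) / 1000.0f;
        farM  = (s >= H) ? std::numeric_limits<float>::infinity()
                         : H * s / (H - (s - f)) / 1000.0f;
    }
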
A.2 Vertical Field of View

The transmitted data contains the “Horizontal Field of View” and “Frame Aspect Ratio” values. If you also need the Vertical Field of View value, you can calculate it this way:

\[ \textit{FOV}_y = 2 \cdot \textit{arctan}\left( \textit{tan}\left( \frac{\textit{FOV}_x}{2} \right) \cdot \frac{1}{\textit{AR}} \right) \]
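
A direct translation of this formula into code, with angles in degrees on both sides:

    #include <cmath>

    // FOVy from FOVx (degrees) and the frame aspect ratio (width / height).
    float VerticalFov(float hFovDeg, float aspectRatio)
    {
        const float pi = 3.14159265358979f;
        float halfX = hFovDeg * pi / 360.0f;                 // FOVx / 2, in radians
        return 2.0f * std::atan(std::tan(halfX) / aspectRatio) * 180.0f / pi;
    }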

B Camera model
B.1 Description of the model

To represent the projection and the distortion of the camera, a pinhole camera model is used with the addition of a distortion model similar to that used by OpenCV as described at https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html#details.

This model describes most general-purpose camera lenses well but is not suitable for describing the distortion of fisheye, anamorphic, or tilt-shift lenses.

The model is centered around the entrance pupil of the camera (the center of perspective, sometimes also incorrectly called the nodal point). The entrance pupil is not a physical point in the lens or the camera; it’s the point where a virtual pinhole camera must be placed for it to perfectly overlap the field of view of the real camera. The entrance pupil is located on the camera’s optical axis, somewhere behind the lens. In the case of a zoom lens, its position is not fixed; it moves forward and backward as the field of view changes.

The camera’s coordinate system is centered on the entrance pupil, with the X axis pointing right, the Y axis pointing up, and the Z axis pointing backward.

The following steps are used to find the coordinates (u, v) of the pixel at which the camera displays a point at coordinates (xc, yc, zc) in this coordinate system:

The coordinates are projected onto the z = 1 plane:

$$x_1 = \frac{x_c}{z_c}$$

$$y_1 = \frac{y_c}{z_c}$$

Distance from the optical axis is calculated:

$$r = \sqrt{x_1^2 + y_1^2}$$

Lens distortion is applied:

$$ x_d = x_1 \cdot \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2p_1 x_1 y_1 + p_2 (r^2 + 2x_1^2) + s_1 r^2 + s_2 r^4 $$

$$ y_d = y_1 \cdot \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y_1^2) + 2 p_2 x_1 y_1 + s_3 r^2 + s_4 r^4 $$

The distorted point is projected onto the image sensor:

$$u = f_x \cdot x_d + c_x$$

$$v = f_y \cdot y_d + c_y$$

Coefficients k1 through k6 represent radial distortion (barrel distortion, pincushion distortion, or a combination of the two). p1 and p2 represent tangential (de-centering) distortion. s1 through s4 represent thin prism distortion. The “Lens distortion” element contains these coefficients. If any of these coefficients is not measured, it should be passed as zero.

The most prevalent type of distortion on general-purpose lenses is radial distortion, and it can usually be described fairly accurately using only k1 and k2. If lens distortion is calculated in a model using only these two coefficients, the rest of the fields in the “Lens distortion” element should be zero.

In the final step, fx and fy contain the focal length measured in pixels, and (cx, cy) are the coordinates of the principal point (the point where the optical axis intersects the image sensor) in pixels. These values are not contained in the packets directly to avoid dependence on the resolution of the image transmitted by the camera. The following data is transmitted instead:

  • in the “Field of view” element:

$$\textit{FOV}_x = 2 \cdot \textit{arctan}\left(\frac{r_x}{2 f_x}\right)$$

\[ \textit{AR} = \frac{r_x}{r_y} \cdot \frac{f_y}{f_x} \]

(FOV is measured in degrees), and

  • in the “Lens distortion” element:

\[ C_x = \frac{c_x}{r_x} - 0.5 \]

\[ C_y = \frac{c_y}{r_y} - 0.5 \]

where rx and ry are the horizontal and vertical resolution of the image, respectively.
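
The sketch below puts this appendix together: it recovers fx, fy, cx and cy from the transmitted FOVx, AR, Cx and Cy and the known image resolution, then applies the projection and distortion steps above. The coefficient struct and function signature are illustrative.

    #include <cmath>

    struct Distortion { float k1, k2, k3, k4, k5, k6, p1, p2, s1, s2, s3, s4; };

    // Projects the camera-space point (xc, yc, zc) to pixel coordinates
    // (u, v), given the transmitted values and the image resolution rx, ry.
    bool Project(float xc, float yc, float zc,
                 float fovXDeg, float ar, float cXrel, float cYrel,
                 float rx, float ry, const Distortion& d,
                 float& u, float& v)
    {
        if (zc == 0.0f) return false;
        const float pi = 3.14159265358979f;

        // Invert the transmitted values back to pinhole intrinsics.
        float fx = rx / (2.0f * std::tan(fovXDeg * pi / 360.0f));
        float fy = ar * (ry / rx) * fx;          // from AR = (rx/ry) * (fy/fx)
        float cx = (cXrel + 0.5f) * rx;          // from Cx = cx/rx - 0.5
        float cy = (cYrel + 0.5f) * ry;

        // Projection onto the z = 1 plane.
        float x1 = xc / zc, y1 = yc / zc;
        float r2 = x1 * x1 + y1 * y1;
        float r4 = r2 * r2, r6 = r4 * r2;

        // Radial, tangential and thin prism distortion.
        float radial = (1.0f + d.k1 * r2 + d.k2 * r4 + d.k3 * r6)
                     / (1.0f + d.k4 * r2 + d.k5 * r4 + d.k6 * r6);
        float xd = x1 * radial + 2.0f * d.p1 * x1 * y1 + d.p2 * (r2 + 2.0f * x1 * x1)
                 + d.s1 * r2 + d.s2 * r4;
        float yd = y1 * radial + d.p1 * (r2 + 2.0f * y1 * y1) + 2.0f * d.p2 * x1 * y1
                 + d.s3 * r2 + d.s4 * r4;

        // Projection onto the sensor, in pixels.
        u = fx * xd + cx;
        v = fy * yd + cy;
        return true;
    }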

B.2 Choosing between basic and extended parameter sets

Though the extended model is a superset of the basic model, its K1 and K2 coefficients can differ from the K1 and K2 values of the basic model. The sender should send the model with which the lens calibration was performed. However, if the sender uses the extended model, it is highly recommended to also calculate and send the basic parameter set, or at least provide a selectable option on its UI to do so.

The receiver is allowed to support only one of the models, depending on which one is more suitable for its purposes.

C Conversion from other models

The protocol accepts lens distortion data in only one format. If the distortion of the lens is calculated using a different model, it needs to be converted to this representation. The exact way to do this depends on the distortion model used; some common differences are listed below.

C.1 Focal length

Camera models usually use focal length to describe the field of view; this protocol expects field of view angles instead. In general, the following equations can be used to calculate the field of view from the focal length (fx, fy) and the sensor size (sx, sy):

$$\textit{FOV}_x = 2 \cdot \textit{arctan}\left(\frac{s_x}{2 f_x}\right)$$

$$\textit{FOV}_y = 2 \cdot \textit{arctan}\left(\frac{s_y}{2 f_y}\right)$$

The two quantities have to be in the same unit of measurement. Calibration tools usually output fx and fy in pixels, in which case sx and sy are the horizontal and vertical resolution of the image, respectively. Some devices may output both in millimeters, in which case the same formulas can be used.

C.2 Principal point

Other models may describe the principal point using different units or in different directions. If the model’s description doesn’t explicitly mention what the principal point is, it’s often easiest to find it as the central point of the radial distortion.

C.3 Distortion applied after projection and coefficient scaling

Some models apply distortion and projection in the opposite order; that is, they first project the object coordinates onto the sensor (using fx, fy, cx and cy), and apply distortion after this. To convert the distortion coefficients from such a model to the one used in this protocol, one first has to reproject the image onto a plane at unit distance and transform the coefficients accordingly.

In addition, the values of the coefficients depend on the choice of unit for the coordinates. The coordinates are often specified in pixels, millimeters (on the camera sensor), or in multiples of the image width, height, or diagonal. As an example, if the coordinates are measured in millimeters, k1 is measured in mm⁻² and k2 is measured in mm⁻⁴. The unit of measurement needs to be taken into account when converting the coefficients to the representation used in this protocol.

In the model used in this protocol, the undistorted coordinates x1 and y1 are measured on a plane at unit distance from the entrance pupil, which results in what could be considered a natural representation. This means that the distortion coefficients are independent of both the focal length and of the unit used for measuring xc, yc and zc.

It should also be noted that technically distortion applied after projection using the focal lengths (fx and fy) cannot be represented using the model used in this protocol (and vice versa): a circularly symmetric radial distortion in one will generally result in a non-circular distortion in the other. However, as long as fx and fy are almost identical—as is the case in any lens which does not purposefully compress the image in one direction—the result should be visually indistinguishable.

C.4 Inverse direction

In addition to the above, some models describe distortion in the inverse direction; that is, they give the coordinates of the undistorted pixels as a function of the coordinates of the distorted ones. There is generally no exact way to calculate the inverse form of such a model, but the method below works well for distortions that are not too large. areaWidth refers to the width of the projection area, typically the used part of the camera sensor; its unit depends on the model you’re converting from.


    #include <cmath>
    #include <cstring>

    // Fits the inverse radial distortion coefficients (invK1, invK2) to the
    // forward coefficients (K1, K2) by linear least squares over N sample
    // points, solved through the normal equations. Returns false if the
    // normal matrix is numerically singular.
    bool InvertRadial(double K1, double K2, float areaWidth, double& invK1, double& invK2)
    {
        invK1 = invK2 = 0.0;
        const int N = 5;                    // number of sample points
        double pa = 0.5 * areaWidth;        // center-to-edge distance
        double dx = 1.0 / N;
        double x = dx;
        double D[N][2];                     // design matrix: columns r^3, r^5
        double d[N];                        // right-hand side: q - r

        for (int i = 0; i < N; i++, x += dx)
        {
            double undist = x;
            double p = x * pa;
            double p2 = p * p;
            p *= 1.0 + p2 * (K1 + p2 * K2); // apply the forward distortion
            double dist = p / pa;
            double r = dist * pa;           // distorted radius
            double q = undist * pa;         // undistorted radius
            double r2 = r * r;
            D[i][0] = r * r2;
            D[i][1] = r * r2 * r2;
            d[i] = q - r;                   // residual to fit with invK1, invK2
        }

        // A = D.transp() * D (symmetric 2x2)
        double A[2][2];
        memset(A, 0, sizeof(A));

        for (int i = 0; i < N; i++)
        {
            A[0][0] += D[i][0] * D[i][0];
            A[0][1] += D[i][0] * D[i][1];
            A[1][1] += D[i][1] * D[i][1];
        }
        A[1][0] = A[0][1];

        // B = (D.transp() * D).inv()
        double B[2][2];
        double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];

        if (fabs(det) < 1e-14) return false;

        B[0][0] = A[1][1] / det;
        B[1][1] = A[0][0] / det;
        B[0][1] = B[1][0] = -A[0][1] / det;

        // C = (D.transp() * D).inv() * D.transp()
        double C[2][N];
        for (int i = 0; i < N; i++)
        {
            C[0][i] = B[0][0] * D[i][0] + B[0][1] * D[i][1];
            C[1][i] = B[1][0] * D[i][0] + B[1][1] * D[i][1];
        }

        // (invK1, invK2) = C * d, the least-squares solution
        for (int i = 0; i < N; i++)
        {
            invK1 += C[0][i] * d[i];
            invK2 += C[1][i] * d[i];
        }

        return true;
    }
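
A hypothetical call, assuming the source model’s coordinates are already normalized to the image width (so areaWidth is 1); the coefficient values are examples only:

    double srcK1 = -0.21, srcK2 = 0.05;      // example values, not from a real lens
    double invK1 = 0.0, invK2 = 0.0;
    if (InvertRadial(srcK1, srcK2, 1.0f, invK1, invK2))
    {
        // invK1 and invK2 now describe the distortion in the opposite
        // direction and can be converted onward as described above.
    }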
                            
D Sending physical data

C-Tracking is designed for sending real physical values. For example, instead of sending the raw encoder value of a zoom sensor, it is recommended to send the real FOV and lens distortion data. This requires that the lens calibration happens on the tracking device's side. It might need a little bit more involvement, but in return the tracking device becomes a real plug-and-play device. In some cases, like for example PTZ cameras, this can happen in the factory. For a batch of cameras with the same building parameters, it is enough to happen once, and the calibration profile can be stored in the firmware.