Making lossy compression practically lossless

Much as our eyes look to the stars and our brains recognise the constellations, capturing reality through cameras and sensors requires computer intelligence to recognise objects. Evolution has equipped the human visual system and brain to detect and recognise patterns almost flawlessly. Doing the same with a computer comes with limitations such as bandwidth, transfer latency, processing capacity and speed.

To overcome these limitations, data compression is used to make the data more compact. This immediately raises the question: “What is the quality loss with compression?”. The answer depends strongly on the techniques used and on the quality the application needs. Every sensor data stream - be it position, speed, camera or 3D visual - is essentially a measured reality represented by digital numbers. Algorithms provide methods to represent these arrays of numbers in a more compact form (encoding), and the compact representation can - once again - be expanded back to the original data (decoding). Compression falls into two categories, whose round-trip behaviour is sketched just after the list:

  1. Lossless: compression that decodes back to exactly the original data. The achievable reduction is typically limited.

  2. Lossy: compression that decodes back with a controlled deviation from the original. Here the reduction is significant (a factor of 10x and more).
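As a minimal sketch of the two round-trip contracts (the encode and decode functions below are hypothetical placeholders, not any particular library's API): a lossless codec must return exactly the original data, while a lossy codec only has to stay within a configured maximum deviation.

    import numpy as np

    def check_lossless(x, encode, decode):
        """Lossless contract: the decoded data equals the original exactly."""
        return np.array_equal(decode(encode(x)), x)

    def check_lossy(x, encode, decode, max_dev):
        """Lossy contract: every decoded value deviates by at most max_dev."""
        return float(np.max(np.abs(decode(encode(x)) - x))) <= max_dev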

Lossless compression

The algorithms used to compress the data can reproduce the original data completely and without change. As an analogy: if the original data is an inflated balloon, compression takes the air out and reconstruction blows the balloon back up to its original state. Examples in everyday computing are ZIP archives and lossless image formats such as PNG or lossless WebP. Lossless compression relies on algorithms such as Lempel–Ziv–Welch, arithmetic coding, Huffman coding and Shannon–Fano coding. These algorithms build a dictionary of symbols and group input symbols according to how frequently they repeat. Lossless compression is applied to text or other exact data where every data point has a purpose and function. Its compression rate is relatively low and depends strongly on how much repetition the input contains. Because exact reconstruction is required, statistical approximation cannot be used, which limits how far the data can be reduced.
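As a small illustration of exact reconstruction, using Python's standard zlib module (its DEFLATE format combines LZ77 dictionary matching with Huffman coding; the sample data below is made up for the example):

    import zlib
    import numpy as np

    # A repetitive, sensor-like integer stream compresses well losslessly.
    original = np.tile(np.arange(0, 100, dtype=np.int16), 1000).tobytes()

    compressed = zlib.compress(original, level=9)   # encoding
    restored = zlib.decompress(compressed)          # decoding

    assert restored == original                     # bit-exact reconstruction
    print(f"reduction factor: {len(original) / len(compressed):.1f}x")

The reduction factor here is high only because the example stream is extremely repetitive; on real sensor data, lossless reduction is far more modest.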

Lossless compression schematic

Lossy compression

Lossy algorithms compress to a far larger extent and recreate a very accurate representation of the input data within defined error limits. The analogy here: if the original data is a full-size car, the compressed version is an accurate, small clay model of it. That model can be used to reproduce the full car with almost perfect accuracy. Most images (JPEG), video (MP4/H.264) and audio (MP3) are compressed with lossy compression so that they can be processed and transmitted quickly and at low hardware, network and storage costs. Well-known techniques that segment and transform the data include transform coding (such as the discrete cosine transform and the discrete wavelet transform), fractal compression, rectangle segmentation and sparse-matrix storage. Prioritising the most important parts of the data supports the reconstruction guarantee, i.e. the bound on the deviation between the original data and the reconstructed data. A major application of this type of compression is image processing. The compression rate of lossy compression is very high compared to lossless, and the exact level depends on the algorithm and the targeted reconstruction guarantee. Almost all digital video and image data is processed through a lossy algorithm to reach a workable performance at acceptable cost.
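A minimal sketch of transform coding with the discrete cosine transform (using scipy.fft; the test signal and the number of retained coefficients are arbitrary choices for illustration): the transform concentrates the signal's energy in a few coefficients, the small ones are discarded, and the reconstruction deviates only slightly from the input.

    import numpy as np
    from scipy.fft import dct, idct

    # Smooth, sensor-like test signal with a little noise (illustrative only).
    t = np.linspace(0.0, 1.0, 1024)
    signal = np.cos(2 * np.pi * 3 * t) + 0.01 * np.random.randn(t.size)

    coeffs = dct(signal, norm="ortho")      # transform: energy compaction
    coeffs[64:] = 0.0                       # keep 64 of 1024 coefficients
    restored = idct(coeffs, norm="ortho")   # inverse transform: decoding

    print("max deviation:", np.max(np.abs(restored - signal)))

For this smooth signal the printed deviation is a tiny fraction of the signal amplitude, even though only one sixteenth of the coefficients were kept.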

Lossy compression example

Sensor noise and very efficient, near-lossless data reduction

Every sensor has a natural limitation in its physical accuracy, due to various factors including electronic noise and environmental noise. This is referred to as sensor noise, sensor inaccuracy or measurement uncertainty.

With Teraki technology (www.teraki.com) one can set the maximum acceptable deviation for the reconstructed data at the same level as the sensor inaccuracy. The allowed maximum deviation (hard bound) is fully configurable and determined by the customer. For instance, with a sensor inaccuracy of 1% (telematics) or 2 cm (LiDAR), each reconstructed data point would be allowed to deviate by at most 1% or 2 cm respectively.

Demanding an accuracy finer than the sensor inaccuracy (noise) is therefore meaningless: the data cannot be more accurate than the sensor that produced it. By operating within this sensor noise, Teraki technology delivers a factor 10x or more of data reduction without any loss of meaningful accuracy in the physical information.

Although theoretically a lossy scheme, Teraki technology lets the customer control the deviation level its use case can accept. The maximum allowed deviation is fully configurable and can be set to exceed the sensor noise by only a tiny margin - for instance, taking a sensor noise of 2 cm to (at most!) 2.1 cm while achieving a factor 6x reduction. With Teraki, this number can be controlled to a level where it has no adverse effect on the quality of the AI models.
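The effect of a configurable hard bound can be illustrated with a plain uniform quantiser (only a generic sketch of the principle, not a description of Teraki's proprietary algorithm): choosing a quantisation step of twice the allowed deviation guarantees that every reconstructed point lies within that deviation.

    import numpy as np

    MAX_DEV_CM = 1.0   # configured hard bound on the reconstruction error

    def encode(x_cm, max_dev):
        # Rounding to multiples of 2*max_dev keeps the error at or below max_dev;
        # the resulting small integers are cheap to pack or entropy-code.
        step = 2.0 * max_dev
        return np.round(x_cm / step).astype(np.int32)

    def decode(codes, max_dev):
        return codes * (2.0 * max_dev)

    # Illustrative LiDAR-like range readings in cm.
    readings = np.random.uniform(0.0, 3000.0, size=100_000)
    restored = decode(encode(readings, MAX_DEV_CM), MAX_DEV_CM)

    assert np.max(np.abs(restored - readings)) <= MAX_DEV_CM   # hard bound holds

Widening the allowed deviation enlarges the quantisation step, which makes the codes smaller and more repetitive and therefore easier to reduce further - the trade-off between reduction factor and accuracy described above.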

With this, one reaches the best efficiency and latency levels without meaningfully impacting the relevant outcomes of neural networks or machine-learning algorithms.

In some use cases where latency is more important than accuracy, the Teraki product allows the maximum deviation to be configured higher than the sensor accuracy. If one does so, data-reduction factors of 30x to 50x are easily reached. In contrast to allowed deviations below the sensor noise, inaccuracy levels exceeding the sensor noise do have an impact on the accuracy of AI models (while delivering high operational efficiency).

Overall, the benefits of high(er) reduction are clear: low latency, low hardware requirements, low energy consumption. While delivering these benefits, the deterministic Teraki technology guarantees that:

  • Teraki data reduction does not drop any data points.

  • Teraki data reconstruction produces no outliers: every reconstructed point stays within the configured maximum deviation.

Physical error analysis

Comparison of the measured physical error [σ(M)] and the combined physical + compression error [σ(M+C)], relative to the actual value, for customer-defined hard-bound maximum deviations [Max. Dev.].

Consider a front LiDAR in a car, used for autonomous driving. If the LiDAR is limited to 2 cm precision, Teraki's reduction techniques can be configured to keep the compression error within the range of the sensor's own natural error.

Given a 2 cm sensor error, adding compression with a 1 cm maximum deviation leads to a cumulative error of 2.1 cm, while achieving a factor 9x data reduction. A minor increase in total error of 0.1 cm is acceptable for a LiDAR-driven application. Based on the application requirements, Teraki can provide a customised, tailor-made solution that delivers a range of size reductions within the customer-defined limits of meaningful accuracy.
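One plausible way to arrive at the 2.1 cm figure (an assumption for illustration, not a statement of Teraki's internal error model) is to treat the compression error as roughly uniform within the 1 cm hard bound and independent of the 2 cm sensor noise, so that the two contributions add in quadrature:

    import math

    sigma_sensor = 2.0   # cm, LiDAR measurement noise (from the text)
    max_dev = 1.0        # cm, configured hard bound on the compression error

    # Standard deviation of an error uniformly distributed in [-max_dev, +max_dev].
    sigma_comp = max_dev / math.sqrt(3)              # about 0.58 cm

    sigma_total = math.hypot(sigma_sensor, sigma_comp)
    print(f"combined error: {sigma_total:.2f} cm")   # about 2.08 cm, i.e. ~2.1 cm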

Conclusion

For any physical application of autonomous driving, optimised pre-processing is essential, without compromising the quality of information. Since the data from sensors and cameras carries intrinsic physical inaccuracy, we can use this fact to apply lossy compression within the sensor error limits and achieve - in practice - lossless quality. Teraki provides data reduction rates of a factor 10x to 20x that are compatible with efficient operation of neural networks. In other words: when the allowed deviation is lower than the sensor error, Teraki's pre-processing of sensor data has no or negligible impact on the functional behaviour of an NN or AI model. This turns a lossy technology into a highly efficient and practically lossless technology.
