Silicon Valley: The Weissman Score And A Struggle For Lossless Compression




Watching HBO’s show ‘Silicon Valley’, you must have come across the terms mentioned in the headline of this article. If you are unfamiliar with basic computer science, those terms are nothing but mumbo jumbo to you. That will no longer be a problem: this article will walk you through everything you need to know, not just to laugh but also to appreciate the struggle of Richard Hendricks, CEO and later CTO of Pied Piper.

Disclaimer: Considering the comedy genre of the show, there is probably information in this article that will ruin your fun while watching it. Thus, MILD SPOILERS AHEAD!

The basic storyline of Silicon Valley follows Richard Hendricks, who designs a lossless compression algorithm to store media files in comparatively less space while avoiding any form of data loss. As he continues to work on it, he realizes that the same algorithm can be used to compress various other types of files, such as photos and videos. He then lays the foundation of a company named Pied Piper and works hard to get venture capitalists to invest in his firm. Along the way, we come across a term called the ‘Weissman Score’, which is used as a parameter to test efficiency and quantify the level of compression.

This leaves us with the following questions:

  • What is Lossless Compression? 

  • Are there any types of compression that are not ‘lossless’?

  • What is the Weissman Score? And is it really used to benchmark compression algorithms?

So psych up to learn some basic Computer Science.


What is Compression?

The method of representing data in a compact form is called Data Compression. This definition probably adds no value to your current understanding, so let us attempt to understand it through a different lens.

As the amount of data stored on storage devices grew rapidly, there was a need to arrange data so that it takes up less space without diminishing in quality. In other words, squeeze the information stored on a storage device in a way that it can still be used or edited later without any data loss. Data compression also aids efficient data transfer across the internet with no significant loss, which is how WhatsApp compresses image and video files. Many techniques are used across various file formats like mp3, wav, and jpeg to compress data. Some of the basic compression techniques are:

  1. Run-Length Coding

  2. Huffman Coding 

  3. LZ77 Compression

Let us understand one of these Compression Techniques.


Run-Length Coding

A lot of the data we come across contains repetitive information. In this simple compression technique, we replace a run of repeated data with the data itself preceded by a count of how many times it repeats. This works especially well with binary digits, as the computer stores everything in the form of 0s and 1s.

Take for example this data:

 heeyyyy howw yoouuu ddoooinnggg  

The above data has multiple repetitions. They may make sense to you, but there is no need to store them as-is, so with Run-Length Coding the encoded form becomes:

1h2e4y 1h1o2w 1y2o3u 2d3o1i2n3g

Practically, both take up the same space here because they have the same number of characters, but this gives you an idea of how the encoding is done.
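The encoding described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not a production codec (for instance, it assumes the input text contains no digits of its own):

```python
def rle_encode(text):
    """Run-length encode: each run of a repeated character
    becomes <count><character>, e.g. 'yyyy' -> '4y'."""
    if not text:
        return ""
    out = []
    prev, count = text[0], 1
    for ch in text[1:]:
        if ch == prev:
            count += 1
        else:
            out.append(f"{count}{prev}")
            prev, count = ch, 1
    out.append(f"{count}{prev}")
    return "".join(out)

def rle_decode(encoded):
    """Reverse the encoding: read digits as the run length,
    then repeat the character that follows that many times."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

# Encoding each word from the example above:
for word in ["heeyyyy", "howw", "yoouuu", "ddoooinnggg"]:
    print(rle_encode(word))
```

Because decoding reverses encoding exactly, no information is lost, which is precisely what makes this a lossless technique.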

The other two techniques (and many more) use different logic to compress, but everything we have discussed so far is a lossless compression technique. Besides lossless compression, there is also lossy compression. Let us break these down individually:


Lossless Compression

As the name suggests, with this type of compression there is no loss of data while encoding or decoding a file. This is the type of algorithm Richard Hendricks develops in HBO’s Silicon Valley. Another feature of his algorithm is that it is based on a ‘middle-out’ compression technique, which we will keep for a future article; still, if you wish to explore it yourself, I have linked some good sources below.

This is usually used for text files, spreadsheets, and other important documents as you want to get the same text or numbers when you read that file every time.
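The defining property of lossless compression is that a round trip gives back exactly the original bytes. A quick demonstration using Python’s built-in zlib module (a real-world DEFLATE/LZ77-family compressor, not the show’s fictional middle-out algorithm):

```python
import zlib

# Repetitive data compresses well, just like in the RLE example.
original = b"heeyyyy howw yoouuu ddoooinnggg " * 100
compressed = zlib.compress(original)
print(len(original), "bytes before,", len(compressed), "bytes after")

# Lossless: decompressing returns the exact original data.
restored = zlib.decompress(compressed)
assert restored == original
```

This bit-for-bit guarantee is why lossless methods are mandatory for text files, spreadsheets, and program binaries.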






Lossy Compression

Lossy compression, by contrast, deliberately throws away data that we are unlikely to notice. Suppose you are recording music: the file may contain sounds that are not audible to human ears, like ultrasonic frequencies. Even though we lose that data, it is still a win-win situation. The same is done with video files, images, etc. Following is a great example that explains compression for images.


This is an uncompressed JPEG image of a dog. If we zoom in enough, we can see the pixels arranged in a grid.



We cannot perceive each of these pixels individually at that level of detail, much as in the case of audio or video. Below is the same 8×8-pixel grid with the individual pixels replaced by blocks of one color. This is what happens when you compress a JPEG image.

  

Though the image is a little rough, the size is reduced to 1/3rd of the original. Thus, lossy compression may not be a perfect option, but it serves the purpose of storing and transferring data in less space.
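The block-replacement idea above can be sketched in plain Python. This is a toy simplification of JPEG (real JPEG uses a discrete cosine transform and quantization per 8×8 block, not a simple average), and the grayscale gradient image here is made-up data:

```python
def blockify(pixels, block=8):
    """Lossy step: replace each block x block tile of a grayscale
    image (a list of rows of 0-255 ints) with the tile's average.
    Detail inside each tile is discarded and cannot be recovered."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [pixels[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            avg = sum(tile) // len(tile)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = avg
    return out

# A 16x16 gradient has many distinct pixel values; after blockify,
# each 8x8 tile collapses to a single flat value.
image = [[(x + y) * 8 for x in range(16)] for y in range(16)]
flat = blockify(image)
print(sorted({v for row in flat for v in row}))  # -> [56, 120, 184]
```

Since the flattened image holds far fewer distinct values, it can be stored in far less space, but the original gradient is gone for good: that is the lossy trade-off.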


The Weissman Score

Sadly, there is no such thing as a Weissman Score in reality. It is a completely fictional term made up for the show, similar to the Dothraki and High Valyrian languages in HBO’s Game of Thrones. Fictionally, it is an efficiency metric for lossless compression applications. The show hired two technical advisers to devise the term: Tsachy Weissman, a professor at Stanford University, and Vinith Misra, then a graduate student. It compares both the compression ratio and the time required by the measured application with those of a de facto standard compressor for that data type.


The formula is the following:

W = α × (r / r̄) × (log T̄ / log T)

Here is the meaning of each term:

  • α (alpha) is a scaling constant.

  • r is the compression ratio of the algorithm being measured.

  • T is the time required for compression.

  • r̄ and T̄ (the overlined ones) are the same metrics for a standard compressor.
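Plugging the terms above into code makes the behavior of the metric easy to see. Below is a small sketch with made-up example numbers (any real benchmark would measure r and T on actual data):

```python
import math

def weissman(r, T, r_bar, T_bar, alpha=1.0):
    """Fictional Weissman score: W = alpha * (r / r_bar) * (log T_bar / log T).
    r, T:         compression ratio and time of the algorithm under test.
    r_bar, T_bar: the same metrics for a standard compressor.
    alpha is a scaling constant; times are assumed > 1 so logs are positive."""
    return alpha * (r / r_bar) * (math.log(T_bar) / math.log(T))

# A compressor with a better ratio (2.9 vs 2.0) and half the time
# of the standard baseline scores well above 1:
print(weissman(r=2.9, T=2.0, r_bar=2.0, T_bar=4.0))  # -> 2.9
```

By construction, the standard compressor scores exactly 1.0 against itself, so a score above 1 means the tested algorithm beats the baseline on ratio, speed, or both.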

Even though the term is fictional, attempts have been made to bring it to reality; one such article, published by IEEE, is linked below. I sincerely hope you learned something interesting. You can also check out my article on understanding the chemistry of making methamphetamine from Breaking Bad with this link.


A fictional compression metric moves to the real world

https://spectrum.ieee.org/view-from-the-valley/computing/software/a-madefortv-compression-metric-moves-to-the-real-world

Middle-Out Compression Algorithm

 https://techcrunch.com/2016/07/14/dropboxs-lepton-lossless-image-compression-really-uses-a-middle-out-algorithm/#:~:text=out'%20algorithm%20%7C%20TechCrunch-,Dropbox's%20Lepton%20lossless%20image%20compression,a%20'middle%2Dout'%20algorithm&text=The%20middle%2Dout%20bit%20comes,the%20decoding%20is%20already%20done.


-Submitted by Chirag Jain, via CollegeTime

