Weissman score

{{Short description|Performance metric for lossless compression}}

The Weissman score is a performance metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Misra, a graduate student, at the request of producers for HBO's television series Silicon Valley, a television show about a fictional tech start-up working on a data compression algorithm.{{Cite web|url=https://spectrum.ieee.org/a-madefortv-compression-metric-moves-to-the-real-world|title=A Fictional Compression Metric Moves Into the Real World|last=Perry|first=Tekla|date=July 28, 2014|access-date=January 25, 2016}}{{cite web|url=https://spectrum.ieee.org/a-madefortv-compression-algorithm|title=A Made-For-TV Compression Algorithm|last=Perry|first=Tekla|date=July 25, 2014|access-date=January 25, 2016}}{{cite web|url=http://www.hollywoodreporter.com/live-feed/hbos-silicon-valley-tech-advisor-695513|title=HBO's 'Silicon Valley' Tech Advisor on Realism, Possible Elon Musk Cameo|last=Sandberg|first=Elise|date=April 12, 2014|work=The Hollywood Reporter|accessdate=June 10, 2014}}{{cite web|url=https://www.wsj.com/articles/SB10001424052702303987004579479244213599118|title=There's a New Geek in Town: HBO's 'Silicon Valley'|last1=Jurgensen|first1=John|last2=Rusli|first2=Evelyn M.|date=April 3, 2014|work=The Wall Street Journal|accessdate=June 10, 2014}} It compares both required time and compression ratio of measured applications, with those of a de facto standard according to the data type.

The formula is the following; where r is the compression ratio, T is the time required to compress, the overlined ones are the same metrics for a standard compressor, and alpha is a scaling constant.

W = \alpha {r \over \overline{r}} {\log{\overline{T}} \over \log{T}}

The Weissman score has been used by Daniel Reiter Horn and Mehant Baid of Dropbox to explain real-world work on lossless compression. According to the authors it "favors compression speed over ratio in most cases."{{Cite web |title=Lossless compression with Brotli in Rust for a bit of Pied Piper on the backend |url=https://dropbox.tech/infrastructure/lossless-compression-with-brotli |access-date=2017-06-24 |website=Dropbox Tech Blog |language=en}}

Example

This example shows the score for the data of the Hutter Prize,{{Cite web|last=Hutter|first=Marcus|title=Contestants|date=July 2016|url=http://prize.hutter1.net/#contestants|accessdate=January 25, 2016}} using the paq8f as a standard and 1 as the scaling constant.

class="wikitable sortable"

!Application

!Compression ratio

!Compression time [min]

!Weissman score

paq8f

|5.467600

|300

|1.000000

raq8g

|5.514990

|420

|0.720477

paq8hkcc

|5.682593

|300

|1.039321

paq8hp1

|5.692566

|300

|1.041145

paq8hp2

|5.750279

|300

|1.051701

paq8hp3

|5.800033

|300

|1.060801

paq8hp4

|5.868829

|300

|1.073826

paq8hp5

|5.917719

|300

|1.082325

paq8hp6

|5.976643

|300

|1.093102

paq8hp12

|6.104276

|540

|0.620247

decomp8

|6.261574

|540

|0.63623

decomp8

|6.276295

|540

|0.637726

Limitations

Although the value is relative to the standards against which it is compared, the unit used to measure the times changes the score (see examples 1 and 2). This is a consequence of the requirement that the argument of the logarithmic function must be dimensionless. The multiplier also can't have a numeric value of 1 or less, because the logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6); that would result in scores of value 0 (even with changes), undefined, or negative (even if better than positive).

class="wikitable"

!rowspan="2"|#

colspan="3"|Standard compressorcolspan="3"|Scored compressorrowspan="2"|Weissman scorerowspan="2"|Observations
Compression ratio||Compression time||Log (compression time)||Compression ratio||Compression time||Log (compression time)
1

|2.1||2 min||0.30103||3.4||3 min||0.477121||1×{{sfrac|3.4|2.1}}×{{sfrac|0.30103|0.477121}}=1.021506

|rowspan="2"|Change in unit or scale, changes the result.

2

|2.1||120 sec||2.079181||3.4||180 sec||2.255273||1×{{sfrac|3.4|2.1}}×{{sfrac|2.079181|2.255273}}=1.492632

3

|2.2||1 min||0||3.3||1.5 min||0.176091||1×{{sfrac|3.3|2.2}}×{{sfrac|0|0.176091}}=0

|rowspan="2"|If time is 1, its log is 0; then the score can be 0 or ±.

4

|2.2||0.667 min||−0.176091||3.3||1 min||0||1×{{sfrac|3.3|2.2}}×{{sfrac|−0.176091|0}}=−

5

|1.6||0.5 h||−0.30103||2.9||1.1 h||0.041393||1×{{sfrac|2.9|1.6}}×{{sfrac|−0.30103|0.041393}}=−13.18138

|rowspan="2"|If time is less than 1, its log is negative; then the score can be negative.

6

|1.6||1.1 h||0.041393||1.6||0.9 h||−0.045757||1×{{sfrac|1.6|1.6}}×{{sfrac|0.041393|−0.045757}}=−0.904627

See also

References