stochastic computing

Stochastic computing is a collection of techniques that represent continuous values by streams of random bits. Complex computations can then be computed by simple bit-wise operations on the streams. Stochastic computing is distinct from the study of randomized algorithms.

Motivation and a simple example

Suppose that $p,q \in [0,1]$ is given, and we wish to compute $p \times q$ . Stochastic computing performs this operation using probability instead of arithmetic.

Specifically, suppose that there are two random, independent bit streams called stochastic numbers (i.e. Bernoulli processes), where the probability of a 1 in the first stream is $p$ , and the probability in the second stream is $q$ . We can take the logical AND of the two streams.

class="wikitable" style="text-align:left;" ! $a_i$ \| 1	0	1	1	0	1	...
$b_i$ \| 1 \|\| 1 \|\| 0 \|\| 1 \|\| 1 \|\| 0 \|\| ...
$a_i \land b_i$ \| 1 \|\| 0 \|\| 0 \|\| 1 \|\| 0 \|\| 0 \|\| ...

class="wikitable" style="text-align:left;"

! $a_i$

| 1

...

b_i

| 1 || 1 || 0 || 1 || 1 || 0 || ...

a_i \land b_i

| 1 || 0 || 0 || 1 || 0 || 0 || ...

The probability of a 1 in the output stream is $pq$ . By observing enough output bits and measuring the frequency of 1s, it is possible to estimate $pq$ to arbitrary accuracy.

The operation above converts a fairly complicated computation (multiplication of $p$ and $q$ ) into a series of very simple operations (evaluation of $a_i \land b_i$ ) on random bits.

To put in another perspective, assuming the truth table of an AND gate. Conventional interpretation is that the output is true if and only if input A and B are true. However, if the table is interpreted vertically, (0011) AND (0101) is (0001), i.e., 1/2 x 1/2 = 1/4, which is exactly an arithmetic multiplication. As the information is presented in probability distribution, probability multiplication is literally an AND operation.

class="wikitable"

!Out

More generally speaking, stochastic computing represents numbers as streams of random bits and reconstructs numbers by calculating frequencies. The computations are performed on the streams and translate complicated operations on $p$ and $q$ into simple operations on their stream representations. (Because of the method of reconstruction, devices that perform these operations are sometimes called stochastic averaging processors.) In modern terms, stochastic computing can be viewed as an interpretation of calculations in probabilistic terms, which are then evaluated with a Gibbs sampler. It can also be interpreted as a hybrid analog/digital computer.

History

Image:RASCEL stochastic computer 1969.png

Stochastic computing was first introduced in a pioneering paper by John von Neumann in 1953.{{cite conference |last = von Neumann

|first = J.

|title = Probabilistic logics and the synthesis of reliable organisms from unreliable components

|book-title = The Collected Works of John von Neumann

|publisher = Macmillan

|year = 1963

|isbn = 978-0-393-05169-8

}} However, the

theory could not be fully developed until advances in computing of the 1960s,{{cite conference

| last1 = Petrovic | first1= R. | last2=Siljak | first2=D. | title=Multiplication by means of coincidence | year = 1962 | book-title =ACTES Proc. of 3rd Int. Analog Comp. Meeting |url=https://books.google.com/books?id=94BQAAAAMAAJ}}

{{citation

| last=Afuso

| first=C.

| title=Quart. Tech. Prog. Rept.

| location=Department of Computer Science, University of Illinois, Urbana, Illinois

| year=1964

}}

mostly through a series of simultaneous and parallel efforts in the US

{{cite book

| last1=Poppelbaum

| first1=W.

| last2=Afuso

| first2=C.

| last3=Esch

| first3=J.

| title=Proceedings of the November 14-16, 1967, fall joint computer conference on - AFIPS '67 (Fall)

| chapter=Stochastic computing elements and systems

| volume=31

| pages=635–644 |doi=10.1145/1465611.1465696 |isbn=9781450378963

| year=1967

| s2cid=8504153

}}

and the UK.

{{cite book

| last=Gaines

| first=B.

| title=Proceedings of the April 18-20, 1967, spring joint computer conference on - AFIPS '67 (Spring)

| chapter=Stochastic computing

| year=1967

| volume=30

| pages=149–156 |doi=10.1145/1465482.1465505 |isbn=9781450378956

| s2cid=832296

}}

By the late 1960s, attention turned to the design of

special-purpose hardware to perform stochastic computation. A host

{{cite book

| last1=Mars

| first1=P.

| last2=Poppelbaum

| first2=W.

| title=Stochastic and deterministic averaging processors

| year=1981 |publisher=P. Peregrinus |isbn=978-0-906048-44-3

}}

of these machines were constructed between 1969 and 1974; RASCEL

{{cite thesis

| last=Esch

| first=John W.

| title=RASCEL, a programmable analog computer based on a regular array of stochastic computing element logic

| year=1969 |id=AAI700084 |type=PhD |url=https://dl.acm.org/doi/book/10.5555/904878

| publisher=University of Illinois, Urbana, Illinois

}}

is pictured in this article.

Despite the intense interest in the 1960s and 1970s, stochastic

computing ultimately failed to compete with more traditional digital

logic, for reasons outlined below. The first (and last)

International Symposium on Stochastic Computing

{{cite conference

| title=Proceedings of the first International Symposium on Stochastic Computing and its Applications

| location= Toulouse, France

| year=1978 |oclc=499229066

}}

took place in 1978; active research in the area dwindled over the next

few years.

Although stochastic computing declined as a general method of

computing, it has shown promise in several applications. Research has

traditionally focused on certain tasks in machine learning and

control.

{{cite conference

| book-title=Advances in Information Systems Science

| title=Stochastic Computing Systems

| last=Gaines

| first= B. R.

| editor-last=Tou

| editor-first=Julius

| volume=2

| orig-year=1969 |publisher=Springer |year=2013 |isbn=9781489958433

}}

{{cite book

|s2cid=14929278 }}

Somewhat recently, interest has turned towards stochastic

decoding, which applies stochastic computing to the decoding of error

correcting codes.

{{cite journal

| title=Iterative decoding using stochastic computation

| last1=Gaudet

| first1=Vincent

| last2=Rapley

| first2=Anthony

| journal=Electronics Letters

| volume=39

| number=3

| pages=299–301

|date=February 2003

| doi=10.1049/el:20030217

| bibcode=2003ElL....39..299G

}}

More recently, stochastic circuits have been successfully used in image processing tasks such as edge detection

{{Cite book | last1 = Alaghi | first1 = A. | last2 = Li | first2 = C. | last3 = Hayes | first3 = J. P. | doi = 10.1145/2463209.2488901 | chapter = Stochastic circuits for real-time image-processing applications | title = Proceedings of the 50th Annual Design Automation Conference on - DAC '13 | pages = 1 | year = 2013 | isbn = 9781450320719 | s2cid = 18174415 }} and image thresholding.{{Cite journal | last1 = Najafi| first1 = M. H. | last2 = Salehi | first2 = M. E. | title = A Fast Fault-Tolerant Architecture for Sauvola Local Image Thresholding Algorithm Using Stochastic Computing | journal = IEEE Transactions on Very Large Scale Integration (VLSI) Systems | doi = 10.1109/TVLSI.2015.2415932 | volume = 24 | issue = 2 | pages = 808–812 | year = 2016 | s2cid = 6591306 }} Recent advancement in stochastic circuits also shows promising speed and energy efficiency advantages in artificial intelligence (AI) hardware acceleration on edge computing.

Strengths and weaknesses

Although stochastic computing was a historical failure, it may still remain relevant for

solving certain problems. To understand when it remains relevant, it is useful to

compare stochastic computing with more traditional methods of digital computing.

= Strengths =

Suppose we wish to multiply

two numbers each with $n$ bits of precision.

Using the typical [[Multiplication_algorithm#Long_multiplication|long

multiplication]] method, we need to perform

$n^2$ operations. With stochastic computing, we can

AND together any number of bits and the expected value will always

be correct. (However, with a small number of samples the variance will

render the actual result highly inaccurate).

Moreover, the underlying operations in a digital multiplier are

full adders, whereas a stochastic

computer only requires an AND gate. Additionally,

a digital multiplier would naively require $2n$ input wires,

whereas a stochastic multiplier would only require two input wires{{citation needed|date=October 2014}}.

(If the digital multiplier serialized its output, however, it would also

require only two input wires.)

Additionally, stochastic computing is robust against noise; if a few

bits in a stream are flipped, those errors will have no significant impact

on the solution.

Furthermore, stochastic computing elements can tolerate skew in the arrival time of the inputs.

Circuits work properly even when the inputs are misaligned temporally. As a result, stochastic

systems can be designed to work with inexpensive locally generated clocks instead of using a global clock and

an expensive clock distribution network.{{Cite book | last1 = Najafi | first1 = M. H. | last2 = Lilja | first2 = D. J. | last3 = Riedel| first3 = M. D. | last4 = Bazargan | first4 = K. | title = 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) | chapter = Polysynchronous stochastic circuits | doi = 10.1109/ASPDAC.2016.7428060 | pages = 492–498 | year = 2016 | isbn = 978-1-4673-9569-4 | s2cid = 8973285 }}

Finally, stochastic computing provides an estimate of the solution

that grows more accurate as we extend the bit stream. In particular,

it provides a rough estimate very rapidly. This property is usually

referred to as progressive precision, which suggests that the precision

of stochastic numbers (bit streams) increases as computation proceeds.

{{Cite journal | last1 = Alaghi | first1 = A. | last2 = Hayes | first2 = J. P. | doi = 10.1145/2465787.2465794 | title = Survey of Stochastic Computing | journal = ACM Transactions on Embedded Computing Systems | volume = 12 | issue = 2s | pages = 1 | year = 2013 | citeseerx = 10.1.1.296.4448 | s2cid = 4689958 }}

It is as if the most significant bits of the number

arrive before its least significant bits; unlike the

conventional arithmetic circuits where the most

significant bits usually arrive last. In some

iterative systems the partial solutions obtained through progressive precision can provide faster feedback

than through traditional computing methods, leading to faster

convergence.

= Weaknesses =

Stochastic computing is, by its very nature, random. When we examine

a random bit stream and try to reconstruct the underlying value, the effective precision

can be measured by the variance of our sample. In the example above, the digital multiplier

computes a number to $2n$ bits of accuracy, so the

precision is $2^{-2n}$ . If we are using a random bit

stream to estimate a number and want the standard deviation of our

estimate of the solution to be at least $2^{-2n}$ , we

would need $O(2^{4n})$ samples. This represents an

exponential increase in work. In certain applications, however, the

progressive precision property of stochastic computing can be exploited

to compensate this exponential loss.

Second, stochastic computing requires a method of generating random

biased bit streams. In practice, these streams are generated with

pseudo-random number generators. Unfortunately, generating

(pseudo-)random bits is fairly costly (compared to the expense of,

e.g., a full adder). Therefore, the gate-level advantage of

stochastic computing is typically lost.

Third, the analysis of stochastic computing assumes that the bit

streams are independent (uncorrelated). If this assumption does not

hold, stochastic computing can fail dramatically. For instance, if we

try to compute $p^2$ by multiplying a bit stream for

$p$ by itself, the process fails: since $a_i \land
a_i=a_i$ , the stochastic computation would yield $p \times p
= p$ , which is not generally true (unless $p=$ 0 or 1).

In systems with feedback, the problem of decorrelation can manifest in

more complicated ways. Systems of stochastic processors are prone to

latching, where feedback between different components can achieve

a deadlocked state.

{{cite book

| last1=Winstead

| first1=C.

| last2=Rapley

| first2=A.

| last3=Gaudet

| first3=V.

| last4=Schlegel

| first4=C.

| title=Proceedings. International Symposium on Information Theory, 2005. ISIT 2005

| chapter=Stochastic iterative decoders

| location=Adelaide Australia

|date=September 2005

| pages=1116–1120

| arxiv=cs/0501090 |doi=10.1109/ISIT.2005.1523513 |isbn=0-7803-9151-9

| s2cid=16390484

}}

A great deal of effort must be spent decorrelating the system to

attempt to remediate latching.

Fourth, although some digital functions have very simple stochastic

counterparts (such as the translation between multiplication and the

AND gate), many do not. Trying to express these functions stochastically

may cause various pathologies. For instance, stochastic decoding requires

the computation of the function $f(p,q)\rightarrow pq/(pq + (1-p)(1-q))$ .

There is no single bit operation that can compute this function; the usual solution

involves producing correlated output bits, which, as we have seen above, can cause

a host of problems.

Other functions (such as the averaging operator $f(p,q)\rightarrow (p+q)/2$ require

either stream decimation or inflation. Tradeoffs between precision and memory

can be challenging.

Stochastic decoding

Although stochastic computing has a number of defects when considered

as a method of general computation, there are certain applications

that highlight its strengths. One notable case occurs in the

decoding of certain error correcting codes.

In developments unrelated to stochastic computing, highly effective

methods of decoding LDPC codes using

the belief propagation algorithm were

developed. Belief propagation in this context involves iteratively

reestimating certain parameters using two basic operations

(essentially, a probabilistic XOR operation and an averaging

operation).

In 2003, researchers realized that these two operations could be

modeled very simply with stochastic computing.

{{cite journal

| title=Iterative decoding using stochastic computation

| last1=Gaudet

| first1=Vincent

| last2=Rapley

| first2=Anthony

| journal=Electronics Letters

| volume=39

| number=3

| pages=299–301

|date=February 2003

| doi=10.1049/el:20030217

| bibcode=2003ElL....39..299G

}}

Moreover, since the

belief propagation algorithm is iterative, stochastic computing provides partial

solutions that may lead to faster convergence.

Hardware implementations of stochastic decoders have been built on FPGAs.

{{cite conference

| title=Stochastic implementation of LDPC decoders

| last1=Gross

| first1=W.

| last2=Gaudet

| first2=V.

| last3=Milner

| first3=A.

| book-title=Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers

| year=2006

}}

The proponents of these methods argue that the performance of stochastic decoding is

competitive with digital alternatives.

Deterministic Methods to Stochastic Computing

Deterministic methods of SC has been developed to perform completely accurate computation with SC circuits.{{Cite journal|last1=Najafi|first1=M. Hassan|last2=Jenson|first2=Devon|last3=Lilja|first3=David J.|last4=Riedel|first4=Marc D.|date=December 2019|title=Performing Stochastic Computation Deterministically|journal=IEEE Transactions on Very Large Scale Integration (VLSI) Systems|volume=27|issue=12|pages=2925–2938|doi=10.1109/tvlsi.2019.2929354|s2cid=201888463|issn=1063-8210|doi-access=free}} The essential principle of these methods is that every bit of one bit-streams interacts with every bit of the other bit-streams exactly once. To produce completely accurate result with these methods, the operation must run for the product of the length of input bit-streams. Deterministic methods are developed based on unary bit-streams,{{Cite book|last1=Jenson|first1=Devon|last2=Riedel|first2=Marc|title=Proceedings of the 35th International Conference on Computer-Aided Design |chapter=A deterministic approach to stochastic computation |date=2016-11-07|pages=1–8|location=New York, NY, USA|publisher=ACM|doi=10.1145/2966986.2966988|isbn=978-1-4503-4466-1|s2cid=11281124}}{{Cite journal|last1=Najafi|first1=M. Hassan|last2=Jamali-Zavareh|first2=Shiva|last3=Lilja|first3=David J.|last4=Riedel|first4=Marc D.|last5=Bazargan|first5=Kia|last6=Harjani|first6=Ramesh|date=May 2017|title=Time-Encoded Values for Highly Efficient Stochastic Circuits|journal=IEEE Transactions on Very Large Scale Integration (VLSI) Systems|volume=25|issue=5|pages=1644–1657|doi=10.1109/tvlsi.2016.2645902|s2cid=5672761|issn=1063-8210|doi-access=free}} pseudo-random bit-streams,{{Cite journal|last1=Najafi|first1=M. Hassan|last2=Lilja|first2=David|date=2018|title=High Quality Down-Sampling for Deterministic Approaches to Stochastic Computing|journal=IEEE Transactions on Emerging Topics in Computing|volume=9|pages=7–14|doi=10.1109/tetc.2017.2789243|issn=2168-6750|doi-access=free}} and low-discrepancy bit-streams.{{Cite book|last1=Najafi|first1=M. Hassan|last2=Lilja|first2=David J.|last3=Riedel|first3=Marc|title=Proceedings of the International Conference on Computer-Aided Design |chapter=Deterministic methods for stochastic computing using low-discrepancy sequences |date=2018-11-05|pages=1–8|location=New York, NY, USA|publisher=ACM|doi=10.1145/3240765.3240797|isbn=978-1-4503-5950-4|s2cid=53236540|doi-access=free}}

Variants of stochastic computing

There are a number of variants of the basic stochastic computing

paradigm. Further information can be found in the referenced book by

Mars and Poppelbaum.

Bundle Processing involves sending a fixed number of

bits instead of a stream. One of the advantages of this approach is

that the precision is improved. To see why, suppose we transmit

$s$ bits. In regular stochastic computing, we can

represent a precision of roughly $O(1/\sqrt{s})$ different

values, because of the variance of the estimate. In bundle

processing, we can represent a precision of $1/s$ .

However, bundle processing retains the same robustness to error of

regular stochastic processing.

Ergodic Processing involves sending a stream of bundles, which

captures the benefits of regular stochastic and bundle processing.

Burst Processing encodes a number by a higher base increasing

stream. For instance, we would encode 4.3 with ten decimal digits as

::: 4444444555

since the average value of the preceding stream is 4.3. This

representation offers various advantages: there is no randomization

since the numbers appear in increasing order,

so the PRNG issues are avoided, but many of the advantages of

stochastic computing are retained (such as partial estimates of the

solution). Additionally, it retains the linear precision of bundle

and ergodic processing.

stochastic computing

Motivation and a simple example

History

Strengths and weaknesses

= Strengths =

= Weaknesses =

Stochastic decoding

Deterministic Methods to Stochastic Computing

Variants of stochastic computing

See also

References

Further reading