Probability and Statistics with R

Maria Dolores Ugarte, Ana F. Militino, Alan T. Arnholt

Chapter 1

A Brief Introduction to S - all with Video Answers

Chapter Questions

Problem 1

Calculate the following numerical results to three decimal places with S:
(a) $(7-8)+5^3-5 \div 6+\sqrt{62}$
(b) $\ln 3+\sqrt{2} \sin (\pi)-e^3$
(c) $2 \times(5+3)-\sqrt{6}+9^2$
(d) $\ln (5)-\exp (2)+2^3$
(e) $(9 \div 2) \times 4-\sqrt{10}+\ln (6)-\exp (1)$
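
As a minimal sketch of how such expressions translate into code, parts (a) and (b) might be entered as follows in R (used here as the S dialect; this is an assumption, not the textbook's own solution). Note that `log()` is the natural logarithm and `round(x, 3)` rounds to three decimal places.

```r
# Part (a): (7 - 8) + 5^3 - 5 / 6 + sqrt(62)
a <- (7 - 8) + 5^3 - 5 / 6 + sqrt(62)
round(a, 3)  # 131.041

# Part (b): ln(3) + sqrt(2) * sin(pi) - e^3
b <- log(3) + sqrt(2) * sin(pi) - exp(3)
round(b, 3)
```

The remaining parts follow the same pattern, with `/` for division and `exp()` for the exponential.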

James Kiss
Numerade Educator

Problem 2

Create a vector named countby5 that is a sequence from 5 to 100 in steps of 5.

Anas Venkitta
Numerade Educator

Problem 3

Create a vector named Treatment with the entries "Treatment One" appearing 20 times, "Treatment Two" appearing 18 times, and "Treatment Three" appearing 22 times.

Samantha Lincroft
Numerade Educator

Problem 4

Provide the missing values in $\operatorname{rep}(\operatorname{seq}(\_\_, \ldots), \ldots)$ to create the sequence $20, 15, 15, 10, 10, 10, 5, 5, 5, 5$.
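
One possible way to fill in the blanks, sketched in R: `seq()` produces the distinct values in descending order, and the `times` argument of `rep()` repeats each value a different number of times. The variable name `x` is only illustrative.

```r
# Distinct values 20, 15, 10, 5, repeated 1, 2, 3 and 4 times respectively
x <- rep(seq(20, 5, by = -5), times = 1:4)
x  # 20 15 15 10 10 10 5 5 5 5
```

Other argument combinations may give the same result; this is one candidate, not necessarily the intended one.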

Yujie Wang
College of San Mateo

Problem 5

Vectors, sequences, and logical operators
(a) Assign the names $\mathrm{x}$ and $\mathrm{y}$ to the values 5 and 7, respectively. Find $x^y$ and assign the result to $z$. What is the value stored in $z$?
(b) Create the vectors $\mathrm{u}=(1,2,5,4)$ and $\mathrm{v}=(2,2,1,1)$ using the $\mathrm{c}()$ and $\operatorname{scan}()$ functions.
(c) Provide $\mathrm{S}$ code to find which component of $\mathrm{u}$ is equal to 5.
(d) Provide $\mathrm{S}$ code to give the components of $\mathrm{v}$ greater than or equal to 2.
(e) Find the product $\mathrm{u} \times \mathrm{v}$. How does $\mathrm{S}$ perform the operation?
(f) Explain what $\mathrm{S}$ does when two vectors of unequal length are multiplied together. Specifically, what is $\mathrm{u} \times \mathrm{c}(\mathrm{u}, \mathrm{v})$?
(g) Provide $\mathrm{S}$ code to define a sequence from 1 to 10 called $\mathrm{G}$ and subsequently to select the first three components of G.
(h) Use $\mathrm{S}$ to define a sequence from 1 to 30 named $\mathrm{J}$ with an increment of 2 and subsequently to choose the first, third, and eighth values of $\mathrm{J}$.
(i) Calculate the scalar product (dot product) of $q=(3,0,1,6)$ by $r=(1,0,2,4)$.
(j) Define the matrix $\mathbf{X}$ whose rows are the $\mathrm{u}$ and $\mathrm{v}$ vectors from part (b).
(k) Define the matrix $\mathbf{Y}$ whose columns are the $\mathrm{u}$ and $\mathrm{v}$ vectors from part (b).
(l) Find the matrix product of $\mathbf{X}$ by $\mathbf{Y}$ and name it $\mathbf{W}$.
(m) Provide $\mathrm{S}$ code that computes the inverse matrix of $\mathbf{W}$ and the transpose of that inverse.
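
A short sketch in R of several of the parts above, assuming the vectors are built with `c()` only (`scan()` reads values interactively and is omitted here). It illustrates element-wise multiplication, the recycling of the shorter vector, and the matrix operations in parts (i) through (m).

```r
u <- c(1, 2, 5, 4)
v <- c(2, 2, 1, 1)

which(u == 5)   # position of the component equal to 5: 3
v[v >= 2]       # components of v that are >= 2: 2 2
u * v           # element-wise product: 2 4 5 4
u * c(u, v)     # u is recycled to length 8: 1 4 25 16 2 4 5 4

q <- c(3, 0, 1, 6)
r <- c(1, 0, 2, 4)
sum(q * r)      # dot product: 29

X <- rbind(u, v)    # 2 x 4 matrix with u and v as rows
Y <- cbind(u, v)    # 4 x 2 matrix with u and v as columns
W <- X %*% Y        # 2 x 2 matrix product
W.inv <- solve(W)   # inverse of W
t(W.inv)            # transpose of the inverse
```

Because the length of `c(u, v)` is a multiple of the length of `u`, the recycling in `u * c(u, v)` happens without a warning.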

Varsha Aggarwal
Numerade Educator

Problem 6

Wheat harvested surface in Spain in 2004: Figure 1.4 on the next page, made with $\mathrm{R}$, depicts the autonomous communities in Spain. The Wheat Table that follows gives the wheat harvested surfaces in 2004 by autonomous communities in Spain measured in hectares. Provide $\mathrm{S}$ code to answer all the questions.
$$
\begin{array}{|rr|rr|}
\hline \multicolumn{4}{|c|}{\text { Wheat Table }} \\
\hline \text { community } & \text { wheat.surface } & \text { community } & \text { wheat.surface } \\
\hline \text { Galicia } & 18817 & \text { Castilla y León } & 619858 \\
\text { Asturias } & 65 & \text { Madrid } & 13118 \\
\text { Cantabria } & 440 & \text { Castilla-La Mancha } & 263424 \\
\text { País Vasco } & 25143 & \text { C. Valenciana } & 6111 \\
\text { Navarra } & 66326 & \text { Región de Murcia } & 9500 \\
\text { La Rioja } & 34214 & \text { Extremadura } & 143250 \\
\text { Aragón } & 311479 & \text { Andalucía } & 558292 \\
\text { Cataluña } & 74206 & \text { Islas Canarias } & 100 \\
\text { Islas Baleares } & 7203 & & \\
\hline
\end{array}
$$
(a) Create the variables community and wheat.surface from the Wheat Table in this problem. Store both variables in a data.frame named wheatspain.
(b) Find the maximum, the minimum, and the range for the variable wheat.surface.
(c) Which community has the largest harvested wheat surface?
(d) Sort the autonomous communities by harvested surface in ascending order.
(e) Sort the autonomous communities by harvested surfaces in descending order.
(f) Create a new file called wheat.c where Asturias has been removed.
(g) Add Asturias back to the file wheat.c.
(h) Create in wheat.c a new variable called acre indicating the harvested surface in acres (1 acre $= 0.40468564224$ hectares).
(i) What is the total harvested surface in hectares and in acres in Spain in 2004?
(j) Define in wheat.c the row.names() using the names of the communities. Remove the community variable from wheat.c.
(k) What percent of the autonomous communities have a harvested wheat surface greater than the mean wheat surface area?
(l) Sort wheat.c by autonomous communities' names (row.names()).
(m) Determine the communities with less than 40,000 acres of harvested surface and find their total harvested surface in hectares and acres.
(n) Create a new file called wheat.sum where the autonomous communities that have less than 40,000 acres of harvested surface have their actual names replaced by "less than 40,000."
(o) Use the function dump() on wheat.c, storing the results in a new file named wheat.txt. Remove wheat.c from your path and check that you can recover it from wheat.txt.
(p) Create a text file called wheat.dat from the wheat.sum file using the command write.table(). Explain the differences between wheat.txt and wheat.dat.
(q) Use the command read.table() to read the file wheat.dat.
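
The first few parts could be approached along the following lines in R. This is a sketch under the assumption that the values are typed in by hand from the Wheat Table; it covers parts (a) through (h) only and is not the textbook's own solution.

```r
community <- c("Galicia", "Asturias", "Cantabria", "País Vasco", "Navarra",
               "La Rioja", "Aragón", "Cataluña", "Islas Baleares",
               "Castilla y León", "Madrid", "Castilla-La Mancha",
               "C. Valenciana", "Región de Murcia", "Extremadura",
               "Andalucía", "Islas Canarias")
wheat.surface <- c(18817, 65, 440, 25143, 66326, 34214, 311479, 74206, 7203,
                   619858, 13118, 263424, 6111, 9500, 143250, 558292, 100)
wheatspain <- data.frame(community, wheat.surface)

# (b) minimum and maximum; their difference is the range
range(wheatspain$wheat.surface)
# (c) community with the largest harvested surface
wheatspain$community[which.max(wheatspain$wheat.surface)]
# (d), (e) sort in ascending and descending order of surface
wheatspain[order(wheatspain$wheat.surface), ]
wheatspain[order(wheatspain$wheat.surface, decreasing = TRUE), ]

# (f), (g) drop Asturias, then add it back
wheat.c <- wheatspain[wheatspain$community != "Asturias", ]
wheat.c <- rbind(wheat.c, wheatspain[wheatspain$community == "Asturias", ])
# (h) hectares to acres, using 1 acre = 0.40468564224 hectares
wheat.c$acre <- wheat.c$wheat.surface / 0.40468564224
```

Dividing by the hectares-per-acre factor converts the hectare values to acres; multiplying would go the wrong way.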

Victor Salazar
Numerade Educator

Problem 7

The data frame wheatUSA2004 from the PASWR package has the USA wheat harvested crop surfaces in 2004 by states. It has two variables, STATE for the state and ACRES for thousands of acres.
(a) Attach the data frame wheatUSA2004 and use the function row.names() to define the states as the row names.
(b) Define a new variable called ha for the surface area given in hectares where 1 acre $= 0.40468564224$ hectares.
(c) Sort the file according to the harvested surface area in acres.
(d) Which states fall in the top $10 \%$ of states for harvested surface area?
(e) Save the contents of wheatUSA2004 in a new file called wheatUSA.txt in your favorite directory. Then, remove wheatUSA2004 from your workspace, and check that the contents of wheatUSA2004 can be recovered from wheatUSA.txt.
(f) Use the command write.table() to store the contents of wheatUSA2004 in a file with the name wheatUSA.dat. Explain the differences between storing wheatUSA2004 using dump() and using write.table().
(g) Find the total harvested surface area in acres for the bottom $10 \%$ of the states.

Problem 8

Use the data frame vit2005 in the PASWR package, which contains data on the 218 used flats sold in Vitoria (Spain) in 2005 to answer the following questions. A description of the variables can be obtained from the help file for this data frame.
(a) Create a table of the number of flats according to the number of garages.
(b) Find the mean of totalprice according to the number of garages.
(c) Create a frequency table of flats using the categories: number of garages and number of elevators.
(d) Find the mean flat price (total price) for each of the cells of the table created in part (c).
(e) What command will select only the flats having at least one garage?
(f) Define a new file called data.c with the flats that have category = "3B" and have an elevator.
(g) Find the mean of totalprice and the mean of area using the information in data.c.

Jorge Villanueva
Numerade Educator

Problem 9

Use the data frame EPIDURALf to answer the following questions:
(a) How many patients have been treated with the Hamstring Stretch?
(b) What proportion of the patients treated with Hamstring Stretch were classified as each of Easy, Difficult, and Impossible?
(c) What proportion of the patients classified as Easy to palpate were assigned to the Traditional Sitting position?
(d) What is the mean weight for each cell in a contingency table created with the variables Ease and Treatment?
(e) What proportion of the patients have a body mass index (BMI $= \mathrm{kg} /(\mathrm{cm} / 100)^2$) less than 25 and are classified as Easy to palpate?

Samuel Goyette
Numerade Educator

Problem 10

The millions of tourists visiting Spain in 2003, 2004, and 2005 according to their nationalities are given in the following table:
$$
\begin{array}{|l|rrr|}
\hline \text { Nationality } & 2003 & 2004 & 2005 \\
\hline \text { Germany } & 9.303 & 9.536 & 9.918 \\
\text { France } & 7.959 & 7.736 & 8.875 \\
\text { Great Britain } & 15.224 & 15.629 & 16.090 \\
\text { USA } & 0.905 & 0.894 & 0.883 \\
\text { Rest of the world } & 17.463 & 18.635 & 20.148 \\
\hline
\end{array}
$$
(a) Store the values in this table in a matrix with the name tourists.
(b) Calculate the totals of the rows.
(c) Calculate the totals of the columns.
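
A possible R sketch for this problem: the table is entered row by row with `byrow = TRUE`, and `rowSums()` and `colSums()` give the totals. The row and column labels passed to `dimnames` are an optional addition for readability.

```r
tourists <- matrix(
  c( 9.303,  9.536,  9.918,
     7.959,  7.736,  8.875,
    15.224, 15.629, 16.090,
     0.905,  0.894,  0.883,
    17.463, 18.635, 20.148),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Germany", "France", "Great Britain", "USA", "Rest of the world"),
    c("2003", "2004", "2005")
  )
)

rowSums(tourists)  # total per nationality over the three years
colSums(tourists)  # total per year over all nationalities
```

The same totals could also be obtained with `apply(tourists, 1, sum)` and `apply(tourists, 2, sum)`.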

James Kiss
Numerade Educator

Problem 11

Use a for loop to convert a sequence of temperatures (18 to 28 by 2) from degrees centigrade to degrees Fahrenheit.
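
A minimal sketch of such a loop in R, using the standard conversion $F = \frac{9}{5} C + 32$. The variable names `celsius` and `fahrenheit` are illustrative choices.

```r
celsius <- seq(18, 28, by = 2)
fahrenheit <- numeric(length(celsius))  # preallocate the result vector

for (i in seq_along(celsius)) {
  fahrenheit[i] <- celsius[i] * 9 / 5 + 32
}

fahrenheit  # 64.4 68.0 71.6 75.2 78.8 82.4
```

Since arithmetic in R is vectorized, `celsius * 9 / 5 + 32` gives the same result without a loop; the loop is used here only because the problem asks for one.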

Emily Himsel
Numerade Educator

Problem 12

If $1 \mathrm{~km}=0.6214$ miles, 1 hectare $=2.471$ acres, and $1 \mathrm{~L}=0.22$ gallons, write a function that converts kilometers, hectares, and liters into miles, acres, and gallons, respectively. Use the function to convert $10.2 \mathrm{~km}$, $22.4$ hectares, and $13.5 \mathrm{~L}$.
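
One way such a function might look in R, using the conversion factors stated in the problem. The function name `convert` and its argument names are hypothetical choices for this sketch.

```r
# Convert kilometers, hectares and liters to miles, acres and gallons
convert <- function(km, hectares, liters) {
  c(miles   = km * 0.6214,
    acres   = hectares * 2.471,
    gallons = liters * 0.22)
}

result <- convert(km = 10.2, hectares = 22.4, liters = 13.5)
result
```

Returning a named vector keeps the three converted quantities together and labels each one by its unit.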

Zachary Warner
Numerade Educator