##================================================================
##
##
## ---- SOME BASICS: HISTORY, NUMBERS, OPERATIONS, FUNCTIONS ----
##
##
## WHY R?
##
## . We need standards -- R is one of them.
## . Huge developer community
## . New stats algorithms appear first as R packages.
## . Growing user community, also in industry
## . Powerful language -- quite different from Python
## . Probably the best language for 'one-shot programming'
## (which comprises much of data analysis)
##
## HISTORY OF R: C --> S --> R
##
## . Where: Bell Labs (C, S)
## . Related: Unix --> Linux, MacOS
##
## R AND OTHER LANGUAGES:
## . R, Python, Matlab are ...
## . C, C++, Java, Fortran are ...
##
## GETTING RSTUDIO:
##
## . Download and install RStudio right now:
##   see the syllabus for the URL, or search the web for 'RStudio'.
##   https://www.rstudio.com/products/rstudio/download/
##
## . NOTE! The instructor does not use RStudio.
## He uses the so-called 'emacs' environment.
## He does not recommend it to you because
## its learning curve is too steep.
##
## GETTING THESE NOTES:
##
## . For now, find the instructor's webpage (search 'buja wharton').
## In the section on Stat 470/503/770, click on "Chapter 1".
##
## ONLINE SOURCES FOR LEARNING R:
##
## . The 'swirl' package in R
## . Videos on 'youtube'
## . Coursera offerings by Johns Hopkins biostatisticians
## . ...
##
##
##================================================================
##
##
## SOME ANSWERS TO NATURAL QUESTIONS:
##
## - The basic work cycle in RStudio:
##
## . Edit an R code file with extension ".R" in the upper left
##   pane of RStudio (File > Open File..., or Ctrl-O).
## The present chapter file is indeed an R code file,
## but it also contains a lot of non-executable text called 'comments'.
##
## . Copy/paste code lines from the editor pane (upper left pane)
## into the R console (lower left pane) and execute them.
## Instructions for doing this efficiently are given below.
##
## - Q: Why is this file a crude text file?
## What do the hash marks do at the beginning of the lines?
##
## A: All R code files are plain text files without formatting.
## The .R extension tells RStudio and some editors to use
## syntax highlighting.
##
## Any content in a line that follows a hash sign is
## NOT interpreted as code but as mere comment.
## If you type or copy lines with hashes at the beginning
## into an R interpreter/console, nothing gets computed; the lines are just echoed back.
## Anything before a hash is interpreted as code,
## and when typed or copied into an R interpreter/console,
## R will try to compute something.
## AYT? What is code and what is comment in the following line?
10+20 # Some calculation...
## Syntax highlighting shows the difference between code and comment.
## The color scheme in RStudio will be different from the instructor's.
## You can choose your own scheme as follows:
## Tools > Global Options... > Appearance
## Then play with fonts, font sizes and editor themes.
## Finally, click "Apply" or "Cancel".
##
## - Q: Isn't there a more convenient way to copy/paste a line of code
## into the R interpreter/console?
##
## A: There is!
## Place the cursor on the code line and hit
## ---------------------
## | Ctrl-Enter | Windows !!!!!!!!!!!!!!!
## | Command-Enter | MacOS !!!!!!!!!!!!!!!
## ---------------------
## This sends the line to the R console, executes it, and
## moves the cursor to the next line.
## Example: Send the following line to the R console this way.
10 + 20
## If you wish to execute multiple consecutive lines,
## hit Ctrl/Command-Enter multiple times, or select the lines
## and hit Ctrl/Command-Enter once.
##
## - Q: And where are the solutions to '...'?
##
## A: The text below has no answers to the questions/problems.
## We give the solutions here in class.
## If you miss a class, get your answers from a fellow student or TA.
## Knowing how to get human help is a fundamental skill in life.
## [No, solutions to '...' will NOT be posted ever.]
##
##
##================================================================
##
##
## WHAT YOU CAN DO WITH R RIGHT AWAY:
##
## - Use R as a pocket calculator using math notation.
## (Every R class starts this way.)
##
##
## EXAMPLES:
##
## - What is 1.25% interest on a $1,213.85 bank account balance?
...
##
## . Syntax Rule: Do not use commas as thousands separators in R!
##   Commas in numbers are illegal.
##
## . Strange: What is the meaning of '[1]'?
## Reason: R considers a single number as a vector of length 1;
##   '[1]' is the position (index) of the first element shown on that output line.
## ==> All numbers are collected in vectors.
##
## - Calculate an 18% tip on a $28.50 bill:
...
##
## - What is the 10th power of 2?
...
## What do computer geeks call this number?
## ...
##
##
## ----------------------------------------------------
## | First bit of new syntax, actually, an operation: |
## | m:n |
## | generates a 'vector' or 'sequence' of numbers |
## | spaced by 1, starting at m, ending at/before n. |
## | We call them 'ladders'. |
## ----------------------------------------------------
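##
## A small illustration of the ladder operator with arbitrary endpoints
## (not one of the exercises below). 'length()' is a built-in function,
## used here only to count the elements of the ladder.
4:9            # the ladder 4, 5, 6, 7, 8, 9
length(4:9)    # 6 elements: both endpoints are included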
##
## - Show all positive integers up to 100 (create the ladder 1,2,3,...,100):
...
##
## - Show all integers from 1,001 to 1,100:
...
##
## - Show all numbers divisible by 3 below 100:
...
##
## - Calculate the powers of 2 for 0 up to 20:
...
##
## - Explorations for ladders:
##
## . What do you expect to see when you generate
## a ladder of numbers starting at 1.3 ending below 10?
## ...
##
## . What do you expect if you start the sequence with
## a negative integer such as -3?
## Or should it be (-3)?
-3:10
## Is it:
(-3):10
## Or is it:
-(3:10)
## ???
## Do you know a technical term for this general issue?
## You might know it from high school math:
## ...
## So which operation binds stronger, '-' or ':'?
## ...
##
## . What do you expect if you try a sequence that ends lower
## than its starting value, such as from 10 to 5?
...
## Do you expect the following to work?
5:-3
##
##
## ---------------------------------------------------------------
## | |
## | R SYNTAX: |
## | |
## | - Just like in math, computer languages require so-called |
## | "order of operations" or "operator precedence". |
## | |
## | - Again like in math, use ROUND parentheses "()" to force |
## | the order of operations according to your intentions. |
## | Do NOT use brackets "[]" or curly braces "{}" !!! |
## | |
## | - Even if the default precedences agree with your intentions, |
## | avoid ambiguity for the human reader by using parens |
## | even where they may not be needed. |
## | |
## | - You can insert blanks liberally for clarity, but not inside |
## | numbers. |
## | |
## | - You can learn a lot of syntax by experimentation in R. |
## | |
## ---------------------------------------------------------------
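##
## A minimal example of parentheses overriding the default order:
## multiplication binds more strongly than addition, so these two
## lines give different results.
1 + 2 * 3      # multiplication first: 1 + 6 = 7
(1 + 2) * 3    # parentheses first: 3 * 3 = 9
## Square brackets cannot be used for grouping:
## typing [1 + 2] * 3 produces a syntax error.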
##
## - What is the number 'pi'?
## There is a symbol in the language for this number:
## Just type 'pi'!
...
##
## - What is the 'sin' of pi? of pi/4?
...
##
## - What is half of the square root of 2?
## There is a function sqrt(...) that you can use,
## or you can use the power with exponent 0.5.
## Write several versions, using 'sqrt()' and
## ways of writing the exponent:
...
...
...
##
## - What is the reciprocal of the square root of 2?
...
##
## - Compare the previous results and explain!
## ...
## - What is the number 'e'?
## You will need to use the function 'exp()'.
## There is no fixed symbol for this one.
...
##
## - What is the natural exponential of 10?
...
##
## - Write the natural exponential of 10 as 'e' to the power 10.
...
##
## - What is the justification for identical results in the
## previous two questions?
## ...
## - What is the natural logarithm of the previous two results?
## You know the answer, but 'confirm' with actual R code.
...
##
## - What are the 10-based logarithms of 1,000 and 5,000 and 10,000?
...
##
## - Why would the 10-based log of 5,000 be higher than 3.5?
## ...
##
## - What is the interest on a $1,000 initial investment
## after 4 years with the following annual financial returns
##
## +7%, +9%, -10%, -6%
##
## Recall: These are yearly percentage gains and losses.
##
## Biologists: Translate this to cultures of unicellular organisms
## starting with 1,000 cells and the above percentages
## interpreted as minute-to-minute changes.
##
...
##
## - Comprehension question: Do you expect to be back to 1,000?
## ...
##
## - Same question for the following returns:
##
## -6%, -10%, +9%, +7%
##
...
##
## - Are you surprised?
## ...
##
## - Quantitative literacy, side remark:
##
## A percentage change is a multiplicative change!
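##
## A worked illustration with made-up numbers (not the exercise above):
## a 50% gain multiplies by 1.5 and a 50% loss multiplies by 0.5,
## so together they multiply the starting amount by 0.75.
100 * (1 + 0.50) * (1 - 0.50)    # 75, not back to 100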
##
##
## ------------------------------------------
## | ARITHMETIC OPERATIONS:                 |
## |                                        |
## | Power:                        ^        | 2^10
## | Unary sign operations:        -, +     | -(2); -(1/2)
## | Sequence/ladder:              :        | 2:5
## | Integer division, remainder:  %/%, %%  | to be explained
## | Multiplication/division:      *, /     | 5/2
## | Addition/subtraction:         +, -     | 10-12
## ------------------------------------------
##
## - Can you guess why the operations are listed in this order?
## ...
##
## - Try to guess what the following does before you execute:
-2^0.5
##
## FURTHER EXPLANATIONS:
##
## - Distinguish between unary and binary - and +:
-(1/4) # unary
2-(1/4) # binary
## Are you able to explain the difference between the following?
-2
-(2)
## You have to look at these expressions through 'the eyes of R':
## R tries to first identify the numbers, then the operations.
## Accordingly, what does R see in
-2 # ... ???
## And what does it see in
-(2) # ... ???
##
## - Not yet seen: integer division %/% and remainder %% operation:
10 %/% 3 # integer division
10 %% 3 # remainder operation
(-10) %/% 3 # strange, isn't it?
(-10) %% 3 # consistent with the strangeness of (-10)%/%3
2.6 %/% 0.5 # why not try this...
2.6 %% 0.5 #
## Both operations work for arbitrary decimal numbers, but
## %/% always produces a whole number and %% the associated remainder.
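##
## One way to see how the two operations fit together: R's help page
## ?Arithmetic states that, for y not 0,
##     x == y * (x %/% y) + x %% y      (up to rounding error)
## The lines below check this for the three pairs used above.
3 * (10 %/% 3) + 10 %% 3             # gives back 10
3 * ((-10) %/% 3) + (-10) %% 3       # gives back -10
0.5 * (2.6 %/% 0.5) + 2.6 %% 0.5     # gives back 2.6, up to rounding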
##
## - Patterned sequences using '%/%' and '%%':
## Apply ...%/%3 and ...%%3 to the ladder 0:20
...
...
## What kinds of sequence patterns can you generate this way?
## ...
## ...
##
##
## ---------------------------------------
## | MATH FUNCTIONS: |
## | |
## | Trig: sin(), cos(), tan() |
## | Trig inverses: asin(), acos(), atan() |
## | Square root: sqrt() |
## | Exp, log: exp(), log(), log10() |
## ---------------------------------------
##
## Notes:
## - Yes, we will have occasion to use logs and trig functions!
## - Trig functions take 'arc' (i.e., radians) as an argument, not degrees.
## Reminder: arc(degree) = degree / 180 * pi
## Obtain the sin of 30 degrees by translating to arc first:
sin(30 / 180 * pi)
## Obtain the sin of 60 degrees:
...
## - Accordingly, inverse trig functions produce 'arc', not degrees.
## Translate sin = 1/sqrt(2) to corresponding arc and degrees
asin(1/sqrt(2)) # arc
asin(1/sqrt(2)) / pi * 180 # degrees -- surprise?
## Translate sin = 1/2 to corresponding arc and degrees:
... # arc
... # degrees -- surprise?
## Joys of high school math!
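##
## A round trip with a different angle, 90 degrees, shows both
## conversions in one place (degrees -> arc -> degrees):
sin(90 / 180 * pi)     # sin of 90 degrees: 1
asin(1)                # arc whose sin is 1: pi/2, about 1.570796
asin(1) / pi * 180     # back to degrees: 90
cos(90 / 180 * pi)     # a tiny number, not exactly 0 (finite precision)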
##
##
## MISSING NUMERIC VALUES: They are the results of undefined operations.
##
## - R has 'values' for three kinds of 'missing numbers':
Inf; -Inf; NaN
## Can you intuit how they will be used?
##
## - Examples: Guess in each case which of the three 'values'
## or what actual number will result!
1/0 # ...
0/0 # ...
-1/0 # ...
1+1/0 # ...
1+Inf # ...
1/Inf # ...
1/(-Inf) # ...
1/NaN # ...
log(0) # ...
log(-1) # ...
sqrt(-1) # ...
##
## Note:
##
## - R may issue a 'Warning' when missing values arise.
## This does not mean the computation didn't go through!
## In fact, it did, but you are warned about missing values.
## Actual errors and aborted computations result, for example,
## from bad syntax:
/(2+3)
##
## - The values Inf, -Inf and NaN can be used like numbers.
## If you type them into an R console, or make them part
## of computations, no error will occur. Instead,
## results will be produced as in the above examples.
##
## - Fun experiment:
## Is it possible to write a number large enough to be turned into Inf by R?
## Background fact: Numbers are stored in 64-bit 'words'.
## So something has to happen if we type a number with too many digits.
##
##
## -------------------------------------------------
## | MISSING NUMERIC VALUES IN R: |
## | |
## | Inf : 1/0 |
## | -Inf : -1/0; log(0) |
## | NaN : 0/0; log(-1); sqrt(-1); Inf-Inf; NaN+3 |
## | |
## | Do not expect illegal math operations to crash. |
## | They usually generate a form of missing value. |
## -------------------------------------------------
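##
## R also has built-in functions that test for these values.
## A short sketch using entries from the box above:
is.infinite(1/0)      # TRUE: 1/0 is Inf
is.infinite(log(0))   # TRUE: log(0) is -Inf
is.nan(0/0)           # TRUE
is.nan(Inf - Inf)     # TRUE
is.finite(1/0)        # FALSE: Inf is not a finite number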
##
##
## MISSING DATA:
##
## - When DATA are missing, R represents them with the following value:
NA
## This value can appear in data files, or R may generate it
## when a field in a data file is empty.
##
## - You can type
NA
## into the R console, and R will happily copy it back
## as a vector containing one element, NA,
## as if it had done a computation for you.
##
## - Distinguish:
##
## ---------------------------------------------------------------
## | . Missing values resulting from computations: Inf; -Inf; NaN |
## | . Missing values resulting from absent data: NA |
## ---------------------------------------------------------------
##
## We will encounter NA values later when we deal with data files.
## Another situation arises when we need to allocate a data table
## but the values of the table will be filled in later.
## In this case, NA makes a good default value.
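##
## A sketch of the 'allocate now, fill in later' idea, assuming a small
## 2 x 3 table. 'matrix()', assignment with '<-', and cell selection
## with '[ , ]' are introduced here only for illustration.
tab <- matrix(NA, nrow = 2, ncol = 3)   # every cell starts out as NA
tab[1, 2] <- 7                          # fill in one cell
tab
sum(is.na(tab))                         # 5 cells are still missing
## The test functions also reflect the distinction in the box above:
is.na(NaN)     # TRUE: is.na() flags NaN as well as NA
is.nan(NA)     # FALSE: is.nan() flags only NaN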
##
##
## REPRESENTATION OF NUMBERS -- DEALING WITH FINITE PRECISION
##
## - Examples of EXTREME NUMBERS:
0.000000001
1000000000
-999999999999
## In all cases R rendered the values in exponential form,
## also called 'scientific notation'.
## What does the symbol 'e' stand for?
## ...
## Instead of 'e' we can also use 'E' when writing numbers:
1e6
1E6
1e+6
1E+6
1000000
10^6
## Why are you not surprised about the results?
## ...
## Which of the six does an actual arithmetic calculation?
## ...
## Hint: R sees all but one as simple numbers.
##
## - Example of a VERY SMALL NUMBER: one millionth can be written as
0.000001
1e-6
1E-6
1.0000E-6
10E-7
## Why are you not surprised about the results?
## ...
## Does R see any arithmetic operations, or just simple numbers?
## ...
##
## - CAUTION: Syntax!
## The following is incorrect:
1e(-6)
## No parens in number syntax!
## The following is correct, though:
1*10^(-6)
## What is the difference between this and the following?
1e-6
## ...
##
## - R by default shows decimal numbers to 7 significant digits:
pi # 7 digits
0.999999999 # 9 digits, hence rounded to 7 digits
0.000000999 # full precision due to exponential notation
0.0000009999999 # 7 digits, still full precision
0.00000099999999 # 8 digits, hence rounded to 7 digits
##
## Numbers are represented internally to about 15 to 16 significant digits.
##
## --------------------------------------------------------------------------
## | There is a difference between machine precision and printed precision! |
## --------------------------------------------------------------------------
##
## If you want to see more precision, do the following:
print(pi, 20)
## This is a call to a function print(), asking to print 'pi' to 20 digits.
## Only about the first 16 significant digits are meaningful; any digits
## beyond that reflect the binary representation, not the true value of pi.
## Another example:
print(sqrt(2), 12)
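##
## A classic illustration of printed versus machine precision:
## 0.1 and 0.2 have no exact binary (base-2) representation, so their
## computed sum differs from the stored value of 0.3 by a tiny amount.
0.1 + 0.2                           # prints 0.3 at the default 7 digits
print(0.1 + 0.2, 17)                # 17 digits reveal the discrepancy
0.1 + 0.2 == 0.3                    # FALSE: exact comparison
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: comparison with a small tolerance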
## - It is possible to write numbers so extreme that R can't represent them.
## Examples:
9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
2e308
## However, the following still works:
1e308
## Hence somewhere between 1e308 and 2e308 the computer runs
## out of bits to represent the number, in which case it is
## rendered as 'Inf'.
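##
## R reports the limits of its 64-bit numbers in the built-in list
## '.Machine' (the '$' picks out one entry of the list):
.Machine$double.xmax    # largest finite number, about 1.797693e+308
1.7e308                 # below the limit: an ordinary number
1.8e308                 # above the limit: Inf
.Machine$double.eps     # about 2.2e-16, the gap between 1 and the next
                        # larger number; hence only ~16 digits are meaningful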
## - [Hint for the future:
##   The default printing precision of 7 significant digits
##   can be changed using the function 'options()'.
## Example:
## options(digits=10)
## We haven't talked about R's functions yet, so don't worry.
## ]
##
##
## ------------------------------------------------------------
## | DECIMAL NUMBERS IN R:                                    |
## |                                                          |
## | - Most general form, using base-10 exponential notation: |
## |     123.4567E30; -123.4567e-10                           |
## |                                                          |
## | - The exponential part can be omitted;                   |
## |   the leading number (mantissa) cannot:                  |
## |     10; -5; 0.01234; -12.345   # no exponential part     |
## |     1e10                       # mantissa 1              |
## |                                                          |
## | - Default printing precision in R: 7 significant digits  |
## |   Internal precision: about 15 to 16 significant digits  |
## ------------------------------------------------------------
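##
## A few checks of the rules in the box: different spellings of the
## same number are read as the same value.
1e10 == 10000000000    # TRUE: with or without the exponential part
1E+6 == 1000000        # TRUE: 'E' works like 'e', and the '+' is optional
## Without a leading number, 'e10' is not a number at all:
## R reads it as a name and reports "object 'e10' not found".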
##
##
##================================================================