Multivariate Statistical Analysis

MSAN 623

Multivariate Statistics

Instructor: Jeff Hamrick, Ph.D., CFA, FRM

Course Syllabus

Spring 2014

SUMMARY INFORMATION

Instructor: Jeff Hamrick, Ph.D., CFA, FRM

Office: Masonic 211

Office Hours: After class for one hour at the Presidio, or by appointment.

Cell Phone: 617/943-4619

Office Phone: 415/422-6810

Email Address: jhamrick@usfca.edu

Class Location: Presidio Campus

Class Time: 10:00 a.m. - 12:30 p.m., Tuesdays and Thursdays


ON COURSE GOALS. Any student who successfully completes this course should:

  • Understand the concepts of principal components analysis (PCA) and factor analysis (FA), and be able to use both to facilitate exploratory data analysis;
  • Understand how to use the eigenvalues of the variance-covariance matrix and scree plots to choose a fundamental underlying dimension for the multivariate data set;
  • Understand how orthogonal rotations like varimax and oblique rotations like promax can be used to make factors more interpretable;
  • Be able to use PCA and factor analysis to create latent variables of interest or to partially replicate results promulgated in famous examples in the literature (e.g., the work of Fama and French);
  • Define and explain Mahalanobis distance and its connection to both multivariate two-sample Student's t-tests and to linear discriminant analysis;
  • Be able to conduct in R, and interpret the outputs related to, Bartlett's test for sphericity;
  • Be able to solve the classical (i.e., Gaussian) univariate and multivariate linear discriminant analysis problems;
  • Understand common extensions of linear discriminant analysis, e.g., quadratic discriminant analysis and nonparametric discriminant analysis;
  • Understand the connections between naive Bayes and discriminant analysis;
  • Be able to state types of, and uses of, cluster analysis;
  • Understand the most common types of hierarchic methods of cluster analysis, as well as limitations of the usefulness of cluster analysis;
  • Be able to use principal components analysis (or factor analysis) to augment cluster analysis;
  • Understand the challenges associated with performing cluster analysis on categorical data, and the notion of a similarity measure for categorical data;
  • Understand the definition of, and basic properties related to, graphs (i.e., vertices and edge relationships between those vertices);
  • Be able to perform spectral clustering on graph-type data;
  • Be able to convert a data set with categorical variables into a graph and then use spectral clustering to partition the graph into subgraphs; and
  • Apply clustering techniques to interesting data sets from business and the social sciences.


ABOUT ME. My name is Jeff Hamrick. I'm a term assistant professor of finance and business analytics and I am affiliated with both the Master of Science in Analytics (MSAN) and Master of Science in Financial Analysis (MSFA) programs at the University of San Francisco. Please call me Jeff. My office is located in room 211 of the Masonic (MA) building at the corner of Masonic and Turk. My email address is jhamrick@usfca.edu. My cell phone number is 617/943-4619 and my office number is 415/422-6810. If you're unable to discuss academic issues with me at the Presidio campus before or after class, let me know and we may be able to schedule an appointment (possibly over Google Hangout) at an alternate time.


ABOUT YOU. You should be hard-working and enthusiastic about learning and, in most cases, you are a candidate for the Master of Science in Analytics at the University of San Francisco. Linear Regression Analysis (MSAN 601) is a prerequisite for this course.


ABOUT US. We will meet to talk about multivariate statistics from Tuesday, March 19, 2013 until Tuesday, May 14, 2013. We will meet at the Presidio Campus. We will primarily use the sixth, seventh, eighth, and ninth chapters of the third edition of Multivariate Statistical Methods: A Primer by Bryan F.J. Manly (ISBN 1-58488-414-2) and chapters 2, 5, 7, 8, and 9 of the second edition of Analysis of Multivariate Social Science Data by David J. Bartholomew, Fiona Steele, Irini Moustaki, and Jane Galbraith (ISBN 978-1-58488-960-1). You are responsible for the material in all readings assigned for this course, regardless of whether the material from those readings is included in my in-class lectures.


ON R. R is a powerful open-source programming language and software environment for statistical computing and graphics. The R language is used by many professional statisticians and is making deep inroads in industry as well. R is equipped with a wide variety of statistical and graphical techniques. It supports linear and nonlinear modeling, classical statistical tests, time series analysis, classification analysis, clustering, and much more. It will be extensively used in the MSAN program. A set of screencast tutorials related to R will be available on my YouTube channel and Blackboard.


ON ATTENDANCE. This course meets for only seven weeks. It will be short and intense. Consequently, you may only miss class under the most dire of circumstances. These circumstances should be both unusual and documentable. For example, having a bad cold is documentable but not unusual. On the other hand, being kidnapped by aliens is unusual but is most likely not documentable. A single absence, for any reason, is acceptable and will not be penalized. Each absence in excess of one will cause your final letter grade in this class to be lowered by one level (e.g., an A- will become a B+).


ON LAPTOPS. In general, I want you to have a laptop in class and I want you to install R on that laptop before the course begins. You will be expected to use R on quizzes and on the final examination, and sometimes we will use R in class. I would ask you to be respectful of your classmates and to refrain from surfing the web, checking out Facebook, tweeting people your various tweets, etc. during the middle of my lectures.


ON HOMEWORK. Every week, there will be a collection of homework problems (generally including tasks for you to perform in R) that I will make up myself or assign from the Manly or Bartholomew textbooks. You must turn in your own write-up of each assignment, though you may work with colleagues on the homework problems up until 48 hours before the homework assignment is due. To reiterate: during the 48 hours prior to the start of the class during which the homework assignment is due, you may not confer, work with, or write up your homework assignments with anybody from the class. You also may not confer with any third party, in fact. I will grade a random subset of the problems you turn in each week, and sometimes I might grade all of them. To facilitate efficient grading, your weekly homework should have the following properties:


  1.  Each problem should be started on a separate piece of paper.
  2.  Different parts of the same problem do not need to be started on separate pieces of paper.
  3.  Turn in the problems in the same order in which they were assigned.
  4.  Staple your homework assignment in the upper left-hand corner.
  5.  In general, do not print out entire data sets.
  6.  In general, do not print out reams and reams of R outputs. Everything should be orderly and easy for me to read.


Unfortunately, we won't have enough time to do homework problems in class or to discuss homework problems in great detail. Instead, feel free to come to my office hours or to schedule an individual appointment with me to discuss the homework. I will not accept late homework assignments under any circumstances.


ON QUIZZES. Every week, there will be a quiz in this class. You will have approximately a five-day period in which to schedule an appointment with Kirsten Keihl, the assistant for the M.S. in Analytics program, to take each quiz. You will have as much time as you like to take each quiz. While you can use your laptop during the quiz, you are on your honor not to consult any resources on the Internet. You also may not use your textbook, class notes, cheat sheets, etc. The weekly quiz will focus on material that we have recently discussed in class -- generally, the topics from the past few lectures. The weekly quizzes will be centered on definitions, concepts, and simple computations, as well as interpretation of statistical output. At the end of the course, I will drop your lowest quiz grade.


ON THE FINAL EXAMINATION. There will be a final written comprehensive examination in this course on May 14, 2013 during the regular course time, with some possibility for extra time (say, 10:00 a.m. - 2:00 p.m.). The final examination will focus on concepts, i.e., you will not be expected to engage in tons of routine calculations, but you will be expected to know certain formulas and relationships and you will be expected to interpret the outputs of various multivariate statistical analyses. In addition, you will be expected to use R to assist you with various statistical analyses.


ON GRADING. I've noticed that students are often too focused on grades, to the great detriment of their own learning. If students put as much effort into actually learning material as they put into worrying about their grades, their performance would be much better. Nevertheless, part of my job is to assign grades fairly and in a manner that reflects the high academic standards at the University of San Francisco and in the MSAN program. In this class, we will use the standard ten-point scale. "Plus" or "minus" grades will be assigned to students with grades close to the extremes of each ten-point bracket (that is, within three points of the boundary of each bracket). Your grade in this course will be computed according to the following weights:


Component            Weight
Homework Sets        25%
Quizzes              35%
Final Examination    40%
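
As a rough illustration of how these weights combine, here is a minimal Python sketch. It applies the 25/35/40 weights and drops the lowest quiz grade, as described in this syllabus. The bracket cutoffs (90 and above for an A, 80-89 for a B, and so on), the lack of an A+ and of plus/minus grades on an F, and all function names are assumptions made for illustration. It is not an official grade calculator, and it ignores the attendance penalty.

```python
# Illustrative sketch only. The weights come from the syllabus table; the
# letter cutoffs and plus/minus handling below are assumptions.

WEIGHTS = {"homework": 0.25, "quizzes": 0.35, "final": 0.40}


def quiz_average(quiz_scores):
    """Average the quiz scores after dropping the single lowest one."""
    if len(quiz_scores) < 2:
        raise ValueError("need at least two quiz scores to drop the lowest")
    kept = sorted(quiz_scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def course_score(homework_avg, quiz_scores, final_exam):
    """Weighted course score on a 0-100 scale."""
    return (WEIGHTS["homework"] * homework_avg
            + WEIGHTS["quizzes"] * quiz_average(quiz_scores)
            + WEIGHTS["final"] * final_exam)


def letter_grade(score):
    """Map a 0-100 score to a letter grade.

    Assumed brackets: 90+ is A, 80s B, 70s C, 60s D, below 60 F.
    Scores within three points of the bottom of a bracket get a minus;
    scores within three points of the top get a plus (assumed: no A+).
    """
    if score < 60:
        return "F"
    capped = min(score, 99.999)          # treat 100 as the top of the A bracket
    bracket_floor = int(capped // 10) * 10
    letter = {90: "A", 80: "B", 70: "C", 60: "D"}[bracket_floor]
    offset = capped - bracket_floor      # position within the ten-point bracket
    if offset < 3:
        return letter + "-"
    if offset >= 7 and letter != "A":
        return letter + "+"
    return letter


if __name__ == "__main__":
    # Hypothetical student: 90 homework average, four quizzes, 84 on the final.
    score = course_score(90, [70, 85, 95, 80], 84)
    print(round(score, 2), letter_grade(score))
```

In this sketch the lowest quiz is dropped before the 35% quiz weight is applied, so one bad quiz does not affect the weighted score.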


ON CHEATING. As a Jesuit institution committed to cura personalis -- the care and education of the whole person -- the University of San Francisco has an obligation to embody and foster the values of honesty and integrity. The university upholds standards of honesty and integrity from all members of the academic community, including faculty, students, and staff. All students are expected to know and to adhere to the university's honor code. You can find the full text of the code online at http://www.usfca.edu/catalog/policies/honor/. Specifically, while you are permitted to work with other students on the homework assignments, you should not allow your name to be placed on a group write-up if it does not reflect your own understanding of the material and if you have not made an honest, equitable contribution to the group effort. Copying answers from other students or sources during a quiz or examination is a violation of the university's honor code and will be treated as such. You are also, of course, bound to the terms of the MSAN Code of Conduct that you signed prior to matriculating in the analytics program. All incidents of cheating or other academic misconduct will be reported to the director of the MSAN program.


ON DISABILITIES. If you are a student with a disability or disabling condition, or if you think you may have a disability, please contact USF Student Disability Services (SDS) at 415/422-2613 within the first week of class, or immediately upon onset of the disability, to speak with a disability specialist. If you are determined eligible for reasonable accommodations, please meet with your disability specialist so they can arrange to have your accommodation letter sent to me, and we will discuss your needs for this course. For more information, please visit http://www.usfca.edu/sds/ or call 415/422-2613.