Factor Analysis in the stock market (in the wild)

Well, I’m done with my qualifying exam. I’ll know if I passed by late this week/early next week.

Anyway, here is a short project that I did on factor analysis in November.


A major market index in the United States is the Dow Jones Industrial Average. The stock prices of thirty large companies contribute to the calculation of the Dow Jones Industrial Average. These companies are Boeing, Caterpillar, Chevron, Citigroup, Coca Cola, DuPont, Exxon Mobil, General Electric, General Motors, Hewlett Packard, Home Depot, IBM, Intel, Johnson and Johnson, JP Morgan Chase, Kraft Foods, McDonalds, Merck, Microsoft, Pfizer, Procter and Gamble, United Tech, Verizon, WalMart, Walt Disney, Bank of America, AT and T, American Express, Alcoa, and 3M Company.

The changes in the prices of these stocks will be highly correlated, as they are all part of the larger market. Factor analysis will be used to reduce the dimensionality of the 30 stocks in the Dow Jones average, because I am interested in seeing which stocks’ prices move together.

Data were collected from the website finance.yahoo.com. The data consist of the high, low, opening, and closing prices of each of the thirty stocks, as well as the volume of each stock for each day. Stocks vary in the length of their available historical data, as some companies have been public longer than others. As such, only the last 1000 trading days are considered in the analysis. This includes all data dating back to November 19, 2004. Rather than consider the actual price of the stock (since some stock prices are much higher or lower than others), the change in stock price from one closing bell to the next is considered for all thirty stocks.
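As a rough illustration of the preprocessing step, here is a minimal Python sketch of turning closing prices into day-to-day changes. The prices and the helper names `last_n_days` and `daily_changes` are made up for the example; the actual analysis was done in SAS on the downloaded data.

```python
# Hypothetical closing prices, oldest first. Real values would come from
# the downloaded daily price history for each ticker.
closes = {
    "XOM": [74.10, 75.02, 74.55, 76.30],
    "IBM": [80.25, 79.90, 81.10, 81.00],
}


def last_n_days(prices, n):
    """Keep only the most recent n closing prices."""
    return prices[-n:]


def daily_changes(prices):
    """Change in closing price from one closing bell to the next."""
    return [later - earlier for earlier, later in zip(prices, prices[1:])]


# n closing prices yield n - 1 day-to-day changes.
changes = {
    ticker: daily_changes(last_n_days(prices, 1000))
    for ticker, prices in closes.items()
}
```

Working with changes rather than price levels keeps a high-priced stock from dominating simply because its numbers are larger.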

Using SAS 9.2, a factor analysis was carried out on the differences in closing prices for the 30 Dow Jones stocks over the last 1000 days. The number of factors was chosen using a scree plot \cite{scree} and by examining the eigenvalues of the correlation matrix. After finding the principal components, the varimax \cite{Johnson} method was used to find a final rotated factor solution. The rotated loadings are shown in the table below.
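For readers without SAS, the same sequence of steps can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the code behind the table: the data are simulated, the choice of two factors is arbitrary, and this `varimax` function is a common textbook-style implementation without the Kaiser row normalization that statistical packages often apply.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)  # column sums of squares
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag(col_ss) / p)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # criterion has stopped improving
        criterion = new_criterion
    return loadings @ rotation


rng = np.random.default_rng(0)
# Simulated stand-in for the price changes: 500 days, 6 "stocks"
# driven by 2 hidden factors plus noise.
hidden = rng.normal(size=(500, 2))
weights = np.array([[1.0, 0.9, 0.8, 0.0, 0.1, 0.0],
                    [0.0, 0.1, 0.0, 1.0, 0.9, 0.8]])
changes = hidden @ weights + 0.5 * rng.normal(size=(500, 6))

# Eigen-decomposition of the correlation matrix, largest eigenvalue first.
# Plotting eigvals against their rank gives the scree plot.
corr = np.corrcoef(changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_factors = 2  # assumed here; chosen by inspecting the scree plot
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
rotated = varimax(loadings)
```

Because the rotation is orthogonal, it changes how the explained variance is shared among the factors but not how much of each stock's variance is explained in total.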

Stock Factor 1 Factor 2 Factor 3 Factor 4 Factor 5
AA 0.23127 0.09893 0.23084 0.73216 0.04921
AXP 0.67590 0.27178 0.30486 0.21506 0.10536
BA 0.22446 0.24467 0.32912 0.32123 0.33817
BAC 0.82352 0.25896 0.14310 0.14884 0.13932
C 0.79975 0.24980 0.14993 0.15357 0.09632
CAT 0.19406 -0.06364 0.12838 0.45805 0.38608
CVX 0.13066 0.40635 0.20625 0.76029 0.08143
DD 0.39500 0.30897 0.27833 0.43173 0.30426
DIS 0.37520 0.43313 0.42601 0.27117 0.13823
GE 0.61010 0.27886 0.31333 0.21191 0.14694
GM 0.50475 0.06398 0.15834 0.19736 0.00832
HD 0.53510 0.24265 0.37274 0.02793 0.25287
HPQ 0.24719 0.19816 0.70506 0.24809 0.02544
IBM 0.33248 0.19050 0.68158 0.21392 0.10399
INTC 0.32603 0.20892 0.61192 0.21961 0.11821
JNJ 0.15935 0.71031 0.23464 0.10390 0.17792
JPM 0.80200 0.25962 0.19675 0.08128 0.15698
KFT 0.28979 0.45453 0.18234 0.20696 0.14068
KO 0.10981 0.60928 0.39598 0.08037 0.18114
MCD 0.26140 0.40935 0.36831 0.15274 0.36240
MMM 0.35237 0.31052 0.31527 0.35182 0.22999
MRK 0.18019 0.67967 0.06238 0.17023 -0.06161
MSFT 0.14925 0.35621 0.65338 0.20696 0.12790
PFE 0.37601 0.57298 0.08865 0.15824 -0.00815
PG 0.20504 0.69431 0.20156 0.17265 0.25142
T 0.36670 0.53525 0.32948 0.27273 0.00919
UTX 0.13055 0.15316 0.07721 0.11207 0.79017
VZ 0.37186 0.52181 0.37188 0.19101 0.01283
WMT 0.37782 0.46428 0.35919 0.06918 0.23774
XOM 0.13787 0.44943 0.21108 0.74470 0.09553

Keeping five factors, we can see which stocks load heavily onto which factors by looking at the table. The variables that load heavily onto the first factor include American Express (AXP), Bank of America (BAC), Citigroup (C), General Electric (GE), General Motors (GM), Home Depot (HD), and JP Morgan (JPM). With the exception of Home Depot and General Motors, all of these companies are financial institutions, and General Motors and Home Depot are heavily affected by the availability of credit from these institutions: GM sells large-ticket items (cars), and HD is heavily tied to people buying houses and is thus affected by the mortgage market. It appears that this first factor explains variation related to the financial sector.

The companies that load heavily onto the second factor include Chevron (CVX), Disney (DIS), Johnson and Johnson (JNJ), Kraft Foods (KFT), Coca Cola (KO), McDonalds (MCD), Merck (MRK), Pfizer (PFE), Procter and Gamble (PG), AT and T (T), Verizon (VZ), Wal-Mart (WMT), and Exxon-Mobil (XOM). All of these companies sell items directly to consumers, and the costs involved in each of these transactions with consumers are relatively small. So, it appears this second factor is explaining the variation due to the individual consumer.

The third factor includes Disney, Hewlett-Packard, IBM, Intel, and Microsoft. These companies, with the glaring exception of Disney, are all tied to computers. Thus, it appears that the third factor explains variation due to the computer industry. Factor four includes companies such as Alcoa, Caterpillar, Chevron, DuPont, and Exxon-Mobil, and appears to explain variation in the manufacturing market. Both Chevron and Exxon-Mobil load heavily on both factor 2 and factor 4. This makes sense, since both companies can essentially break down their earnings into two components: individual consumer sales and sales to other businesses.

Factor five includes United Technologies by itself, which is interesting because UTX holds such a large variety of companies, including Carrier, Hamilton-Sundstrand, Otis elevators, Pratt and Whitney, and Sikorsky Helicopter.
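The grouping described above can be reproduced mechanically by listing, for each factor, the stocks whose loading exceeds some cutoff. The sketch below uses a handful of rows copied from the loading table and a cutoff of 0.4. That cutoff is my assumption for illustration (it seems consistent with the groupings described, but no explicit threshold was used in the write-up), and `stocks_by_factor` is a hypothetical helper name.

```python
# A few rows copied from the rotated loading table (Factor 1 .. Factor 5).
loadings = {
    "BAC": [0.82352, 0.25896, 0.14310, 0.14884, 0.13932],
    "JNJ": [0.15935, 0.71031, 0.23464, 0.10390, 0.17792],
    "IBM": [0.33248, 0.19050, 0.68158, 0.21392, 0.10399],
    "XOM": [0.13787, 0.44943, 0.21108, 0.74470, 0.09553],
    "UTX": [0.13055, 0.15316, 0.07721, 0.11207, 0.79017],
    "DIS": [0.37520, 0.43313, 0.42601, 0.27117, 0.13823],
}

CUTOFF = 0.4  # assumed threshold for "loads heavily"


def stocks_by_factor(loadings, cutoff):
    """Map each factor number to the tickers whose |loading| >= cutoff."""
    n_factors = len(next(iter(loadings.values())))
    groups = {factor: [] for factor in range(1, n_factors + 1)}
    for ticker, row in loadings.items():
        for factor, value in enumerate(row, start=1):
            if abs(value) >= cutoff:
                groups[factor].append(ticker)
    return groups


groups = stocks_by_factor(loadings, CUTOFF)
for factor, tickers in groups.items():
    print(f"Factor {factor}: {', '.join(tickers)}")
```

A stock can land in more than one group, which is how Exxon-Mobil shows up under both factor 2 and factor 4, and Disney under both factor 2 and factor 3.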


The movements in the prices of the 30 stocks that comprise the Dow Jones Industrial Average are highly correlated. As such, they are a prime candidate for factor analysis and dimensionality reduction. Using five factors, we can group the variability in the stock market into categories. Roughly speaking, the three categories that explain the most variation are financial, consumer goods, and technology. The fourth and fifth factors seem to represent approximately the same dimension, namely manufacturing and industry.

Using this factor analysis, we can now view fluctuations in the stock market based on groups rather than individual stocks. We have reduced the dimensionality of the stocks in the Dow Jones from 30 down to 5 while still explaining 60 percent of the variability, greatly simplifying analysis of this stock data.
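For anyone wanting to check a figure like this from a loading table, the arithmetic is short. When the analysis is run on the correlation matrix, each standardized variable has variance 1, so the proportion of total variance explained is the sum of all squared loadings divided by the number of variables. A minimal sketch, with hypothetical function names and a toy loading matrix:

```python
def communality(row):
    """Share of one variable's variance explained by the retained factors."""
    return sum(value * value for value in row)


def proportion_explained(loading_rows):
    """Sum of squared loadings divided by the number of variables.

    Assumes the factors were extracted from a correlation matrix,
    so every variable contributes a variance of exactly 1.
    """
    return sum(communality(row) for row in loading_rows) / len(loading_rows)


# Toy example with two variables and two factors.
toy = [[0.6, 0.0],
       [0.0, 0.8]]
print(proportion_explained(toy))

# One row from the loading table: Bank of America (BAC).
bac = [0.82352, 0.25896, 0.14310, 0.14884, 0.13932]
print(communality(bac))
```

Applying `proportion_explained` to all thirty rows of the table would give the overall share of variability captured by the five factors.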

Future work in this direction could include using more than the past 1000 days of data and possibly including more than 30 stocks in the factor analysis.


Posted on January 20, 2009.

  1. This analysis is interesting. Now, on to prediction, heh heh!

  2. Strangely the browser I have does not show your page as it should… It appears that a whole chunk of if is not showing and the skin of the article does not appear to be right. Are you sure this page has been set up for Google Chrome?

  3. I never thought of it that way, well put!

  4. Since factor analysis is cross-sectional, do the results posted above come from the latest data, meaning the latest day in the 1000 days you studied? or are these an average of the 1000 days?

  5. Since factor analysis is cross-sectional, are the results above based on the last data point, meaning the last day of the 1000 days you studied? or is this an average of the 1000 days?

  6. I believe I used closing price for each day.
