


























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Correcting Expense Ratios, Management Fees, Returns, and Assets. 6. Identifying Index Funds ... manager_tenure (tenure of a fund manager, in days),.
Typology: Exercises
1 / 34
This page cannot be seen from the preview
Don't miss anything!



























Data Appendix for Scale and Skill in Active Management
Lubos Pastor, Robert F. Stambaugh, Lucian A. Taylor
August, 2013
Table of Contents
First, we merge the raw CRSP data files to create a final dataset that contains all the relevant variables for our study. The raw SAS files are downloaded directly from the WRDS server, under wrds/crsp/sasdata/q_mutualfunds. Table 1 lays out the structure of each raw dataset. We start the merging process from the return dataset. In each step of the merge, we merge in only observations of the year/month when return data exists. If a variable is missing, we keep the year/month observation and record a missing value for that variable.
After the merge, we follow pages 2-3 of Berk & Binsbergen’s (2011) Data Appendix 1 to clean up the ticker variable. We reproduce these pages of their Appendix verbatim at the end of this document.
In the raw CRSP fund_hdr and fund_hdr_hist files, ticker is named nasdaq.
Next, we merge in the total US stock market capitalization for each year/month observation. We use the WRDS web interface to download this data. In particular, we download monthly price and shares outstanding data for all US stocks (from 1960 to present) with shrcd =10 or 11. Stock market capitalization is then calculated as the multiple of price and shares outstanding. We then sum all the stocks' market capitalization to generate the total US stock market capitalization (variable sum_me).
Next, we create additional variables, including one-month-lag variables of
_- mtna (total net asset),
Finally, we rename the variables so that each variable from CRSP dataset has “ _CRSP ” in its name. Table 2 shows the content of our final CRSP dataset. The final dataset contains 4,784,162 observations, covers 50,536 funds, and spans the time period from
(^1) DOCUMENTATION ‐ Measuring Managerial Skill in the Mutual Fund Industry, by Jonathan B. Berk and
Jules H. van Binsbergen (version dated 06/09/2011).
Raw CRSP File
Variables
Variable Label
DatasetDuration
# of Obs
# of Funds ( crsp_fundno
monthly_returns
crsp_fundno
Fund Identifier
caldt
Date
mret
Total Return per Share as of Month End
monthly_tna
mtna
Total Net Assets as of Month End
dividends
dividend_CRSP
Monthly Dividends, created from summing differenttypes of dividend per fund/year/month.
monthly_nav
mnav
Monthly Net Asset Value Per Share
fund_style
si_obj_cd
Strategic Insight Objective Code
wbrger_obj_cd
iesenberger Fund Type Code, Identifying Fund Strategy
lipper_class
Lipper Classification Code
lipper_obj_cd
Lipper Objective Code
lipper_class_name
Lipper Classification Name
policy
Type of Securities Mainly Held by Fund
fund_summary
per_com
Amount of fund invested in Common Stocks
per_pref
Amount of fund invested in Preferred Stocks
per_conv
Amount of fund invested in Convertible Bonds
per_corp
Amount of fund invested in Corporate Bonds
per_bond
Amount of fund invested in all Bonds
summary_period
Frequency
Table 1: CRSP Raw File Structure
Raw CRSP File
Variables
Variable Label
DatasetDuration
# of Obs
# of Funds ( crsp_fundno
fund_fees
actual_12b
12b-1 Fee
max_12b
Maximum 12b-1 Fee
ex
p_ratio
Expense Ratio as of Fiscal Year-End
mgmt_fee
Management Fee
turn_ratio
Fund Turnover Ratio
fiscal_yearend
Effective Date for Fees
fr
ont_load
front_load
Max Defer & Rear Load Charges
re
ar_load
rear_load
Maximum Rear-End Load
cr
sp_portno
Portfolio Identifier
fund_hdr & fund_hdr_hist
ncusip
Fund CUSIP
nasdaq
NASDAQ Ticker Symbol
first_offer_dt
Date the Fund Was First Offered
mgr_name
Manager Name
mgr_dt
Date Current Portfolio Mgr Took Control
fund_name
Fund Name
mgmt_name
Management Company Name
mgmt_cd
Management Company Number
open_to_inv
Open to Investors
re
tail_fund
Retail Fund Indicator
inst_fund
Institutional Fund Indicator
index_fund_flag
Index Fund Indicator
dead_flag
Dead Fund Indicator
delist_cd
Identifies the reason for the fund delisting
Table 1: CRSP Raw File Structure, Continued
Variable Variable Label
month Month in CRSP
nav_CRSP Monthly Net Asset Value per Share in CRSP
nav_lag_CRSP One Month Lagged Net Asset Value per Share in CRSP
open_to_inv_CRSP Open to Investors in CRSP
per_bond_CRSP Amount of fund invested in all Bonds in CRSP
per_com_CRSP Amount of fund invested in Common Stocks in CRSP
per_conv_CRSP Amount of fund invested in Convertible Bonds in CRSP
per_corp_CRSP Amount of fund invested in Corporate Bonds in CRSP
per_pref_CRSP Amount of fund invested in Preferred Stocks in CRSP
policy_CRSP Type of Securities Mainly Held by Fund in CRSP
portno_CRSP Portfolio Identifier in CRSP
rear_load_flag_CRSP Flag for a Fund with Positive Rear Load in CRSP
retail_fund_CRSP Retail Fund Indicator in CRSP
return_CRSP Monthly Return in CRSP
si_obj_cd_CRSP Strategic Insight Objective Code in CRSP
sum_me_CRSP Monthly Aggregate US Stock Market Capitalization in CRSP
sum_me_lag_CRSP One Month Lagged Aggregate US Stock Market Capitalization in CRSP
summary_period_CRSP Frequency of Fund Summary Reporting in CRSP
ticker_CRSP Ticker in CRSP
turn_ratio_CRSP Fund Turnover Ratio in CRSP
turn_ratio_lag_CRSP One Month Lagged Fund Turnover Ratio in CRSP
wbrger_obj_cd_CRSP Wiesenberger Fund Type Code, Identifying Fund Strategy in CRSP
year Year in CRSP
Table 2: Final CRSP Dataset Contents, Continued
First, we assign the following six category dummies to each fund in the dataset fund_ops , which contains summary information for 42,575 funds. A fund can be assigned zero, one, or multiple flags. The six categories are:
_- Bond_fund_MS (bond fund)
We also make use of the variable primary_prospectus_benchmark in order to further assign the fund style dummy. In particular, we check if primary_prospectus_benchmark contains any of the words listed in the first column of the following table. If so, we assign the correspondingly style dummy (second column) to the fund.
Bond funds Int'l funds Sector funds Target funds Real estate funds Other non‐equity Bank Loan China Region Commodities Broad Basket Target‐Date 2000 ‐2010 Global Real Estate Currency Convertibles Diversified Emerging Mkts Communications Target‐Date 2011 ‐2015 Real Estate Long/Short Equity Emerging Markets Bond Diversified Pacific/Asia Consumer Cyclical Target‐Date 2016 ‐ 2020 Managed Futures High Yield Bond Emerging Markets Bond Consumer Defensive Target‐Date 2021 ‐ 2025 Market Neutral High Yield Muni Europe Stock Equity Energy Target‐Date 2026 ‐ 2030 Multialternative Inflation‐Protected Bond Foreign Large Blend Equity Precious Metals Target‐Date 2031 ‐ 2035 Trading‐Inverse Commodities Intermediate Government Foreign Large Growth Financial Target‐Date 2036 ‐ 2040 Trading‐Inverse Debt Intermediate‐Term Bond Foreign Large Value Health Target‐Date 2041 ‐ 2045 Trading‐Miscellaneous Long Government Foreign Small/Mid Blend Industrials Target‐Date 2046 ‐ 2050 Long‐Term Bond Foreign Small/Mid Growth Miscellaneous Sector Target‐Date 2051+ Multisector Bond Foreign Small/Mid Value Natural Resources Muni California Intermediate Global Real Estate Technology Muni California Long India Equity Utilities Muni Massachusetts Japan Stock Muni Minnesota Latin America Stock Muni National Interm Pacific/Asia ex‐Japan Stk Muni National Long World Allocation Muni National Short World Bond Muni New Jersey World Stock Muni New York Intermediate Muni New York Long Muni Ohio Muni Pennsylvania Muni Single State Interm Muni Single State Long Muni Single State Short Nontraditional Bond Short Government Short‐Term Bond Ultrashort Bond World Bond
In Morningstar’s monthly asset dataset, we discover that there are instances of extreme reversal patterns that likely reflect decimal-place mistakes. We perform the following procedure to remove fix these extreme reversals. First, we create a variable for the fraction change from last month to the current month,
dassets=(assets-lag_assets)/lag_assets.
Second, we create a reversal variable to capture the reversal pattern,
rev_next=(lead_assets-assets)/(assets-lag_assets).
This variable will be approximately –1 if it is a reversal (e.g. 20m, 2m, 20m). Finally, if abs(dassets)>=0.5, –0.75>rev_next>–1.25, and lag_assets>=10m , then we assign missing value to both assets and dassets. As a result of this procedure, 0.026% of monthly asset observations are changed to missing.
Next, we create additional variables, including one-month-lag variables of
_- asset (total net asset),
Benchmark return:
The Morningstar file cat_and_index_mapping maps each Morningstar Category into a benchmark index. For example, the most common Morningstar Category is “Large Growth,” which maps into the benchmark index “Russell 1000 Growth TR USD.” We use this mapping to assign a benchmark index to each fund. The Morningstar file idx_mth_ret provides monthly returns on all Morningstar benchmark indexes.
Finally, we rename the variables so that each variable from Morningstar dataset has “ _MS ” in its name. Table 4 shows the content of our final Morningstar dataset. The final dataset contains 5,112,629 observations, covers 42,575 fund share classes (reflected by secid ), 14,765 funds (reflected by fundid ), and spans the time period from August/1924 to August/2012.
Table 3: Morningstar Raw File Structure^ Raw Morningstar File
Variables
Variable Label
DatasetDuration
# of Obs
# of Funds
(secid)
# of Funds
(fundid)
fund_ops
Actual_Management_Fee
Actual Management Fee in MS
Cusip
CUSIP in MS
Deferred_Load
Deferred Load in MS
End_Date
End Date of Fund in MS
Enhanced_Index
Enhanced Index (F or T) in MS
Family
Fund Family (e.g. 13DManagement/AQR Funds/CitizensFunds) in MS
Fiscal
Fiscal Year End (1,2,…,12) in MS
Front_Load
Front Load in MS
Fund_of_Fund
Fund of Fund (F or T) in MS
Fundid
Fund Identifier (May IncludingMultiple Share Classes) in MS
Fundname
Fund Name in MS
Inception
Fund Inception Date in MS
Index
Index (F or T) in MS
Primary_Prospectus_Benchmark
Primary Prospectus Benchmark inMS
Prospectus_Objective
Prospectus Objective in MS
Reason
Reason (L or M) in MS
Redemption_Fee
Redemption Fee in MS
SRI (F or T) in MS
Secid
Security Identifier (Unique perShare Class Level) in MS
Share_Type
Share Type/Share Class in MS
Status
Fund Status (A or O) in MS
Ticker
Ticker in MS
morningstar_category
Fund Category (94 total) in MS
v12B_
12b_1 Free in MS
Variable Variable Label
Actual_Management_Fee_MS Actual Management Fee in MS
Bond_fund_MS Bond Fund Dummy in MS
Category_MS Fund Category (94 total) in MS
Cusip_MS CUSIP in MS
Deferred_Load_MS Deferred Load in MS
Enhanced_Index_MS Enhanced Index (F or T) in MS
Family_MS Family (Many Values, e.g. 13D Management/AQR Funds/Citizens Funds) in MS
Fiscal_MS Fiscal Year End (Values 0,1,2,…,12, very few 0) in MS
Front_Load_MS Front Load in MS
Fund_of_Fund_MS Fund of Fund (F or T) in MS
Fundid Fund Identifier (May Including Multiple Share Classes) in MS
Fundname_MS Fund Name in MS
Inception_MS Fund Inception Date in MS
Index_MS Index (F or T) in MS
Index_name_MS Index Name in MS
Intl_fund_MS International Fund Dummy in MS
Mgr_End_Date_MS End Date of Manager in MS
PPB_MS Primary Prospectus Benchmark in MS
Prospectus_Obj_MS Prospectus Objective in MS
RealEstate_fund_MS Real Estate Fund Dummy in MS
Reason_MS Reason (L or M) in MS
Redemption_Fee_MS Redemption Fee in MS
SRI_MS SRI (F or T) in MS
Secid Security Identifier (Unique per Share Class Level) in MS
Sector_fund_MS Sector Fund Dummy in MS
Share_Type_MS Share Type/Class in MS
Status_MS Fund Status (A or O) in MS
Strange_funds_MS Strange Fund Dummy in MS
Target_fund_MS Target Fund Dummy in MS
Ticker_MS Ticker in MS
Table 4: Final Morningstar Dataset Contents
Variable Variable Label
assets_MS Monthly Fund Assets in MS
assets_fundid_MSavg Monthly Fundid (Multiple Share Classes) Level Assets in MS
assets_fundid_MSavg_lag 1-Month Lagged Monthly Fundid (Multiple Share Classes) Level Assets in MS
assets_lag_MS Lagged Monthly Fund Assets in MS
date Date in MS
deferred_load_flag_MS Flag for a Fund with Positive Deferred Load in MS (snapshot from fund_ops)
dividend_MS Monthly Fund Dividend in MS
expense_ratio_MSavg Annual Expense Ratio in MS
expense_ratio_MSavg_lag 1-Month Lagged Annual Expense Ratio in MS
first_MS_dt First Date a Fund Appears in MS in MS
front_load_flag_MS Flag for a Fund with Positive Front Load in MS (snapshot from fund_ops)
fund_age_MS Fund Age (in days) in MS
month Month in MS
nav_MS Monthly Fund Net Assets in MS
nav_lag_MS Lagged Monthly Fund Net Assets in MS
num_cat_MS Sum of the Six Fund Style Dummy in MS
ratings_MS Monthly Fund Ratings in MS
return_MS Monthly Fund Return in MS
return_idx_MS Return of Index in MS
secid_idx_MS Secid of Index in MS
share_classes_fundid_MS Number of Share Classes in MS
turnover_MSavg Annual Turnover Ratio in MS
turnover_MSavg_lag 1-Month Lagged Annual Turnover Ratio in MS
v12B_1_MS 12b_1 Free in MS
year Year in MS
Table 4: Final Morningstar Dataset Contents, Continued
(B) Algorithm for Merging CRSP & Morningstar Data
Step 1: Merge CRSP and MS by Ticker at the share-class level. Both CRSP and MS datasets should have a unique Ticker per share class.
Step 2: Check matching quality by comparing CRSP’s and Morningstar’s data on each fund’s returns and total net assets. We say that a fund is “well matched” if and only if: (1) the 60th^ percentile of the absolute value of the difference between CRSP and MS monthly return is less than 5 basis points, and (2) the 60th^ percentile of the absolute difference between CRSP and MS monthly total net assets is less than $100,000.
If a fund is well matched, we accept its secid—crsp_fundno mapping as valid. Otherwise, we proceed to the Step 3.
We find that 24,288 share classes^6 (48.1% of 50,536 CRSP fund share classes, 57.0% of 42, MS fund share classes) are well matched by Ticker.
Step 3: Merge CRSP and MS using the CUSIP ( latest per crsp_fundno ) at the share-class level. Perform Step 2 to validate matches.
If well matched, accept the secid—crsp_fundno mapping. Otherwise proceed to Step 4.
We find that 9,798 additional share classes (19.4% of 50,536 CRSP share classes, 23.0% of 42,575 MS share classes) are well matched by the CUSIP ( latest per crsp_fundno ).
Step 4: Merge CRSP and MS using the CUSIP ( second latest ) per crsp_fundno at the share-class level. Perform Step 2 to validate matches.
If well matched, accept the secid—crsp_fundno mapping. Otherwise, keep the remaining unmatched fund share classes till Step 6.
Only 56 additional share classes are well matched by the earlier CUSIP.
Step 5:
(^6) BB apply thresholds of 2bp and $20,000. If we use those same thresholds, we find only 1,086 valid matches. We
relax the return and asset matching threshold to 5bp/$100,000 to obtain more matches. Although looser than the BB thresholds, our thresholds are still quite demanding.
We say that a fund (identified by fundid in Morningstar) is “completely matched” if all the MS secid' s belonging to the fundid are well matched to crsp_fundno' s. We say a fund is “partially matched” if at least one share class is well matched and at least one share class (in either CRSP or MS) is not.
At this step we find that 9,711 funds (65.8% of 14,765 MS fundid ’s) are completely matched. 1,572 funds (10.6%) are partially matched. 3,482 funds (23.6%) are not matched at all.
Step 6: The goal of this step is to identify additional matches by comparing funds’ names between CRSP and Morningstar. Only 212 additional share classes are well matched by this text-based approach_._
The steps are as follows:
a. Using the full CRSP database, create variables crsp_group_id and crsp_subclass from fund_name_CRSP , following pages 19-20 of BB’s documentation.^7 The variable crsp_group_id assigns a shared identifier to all share classes belonging to the same fund. The variable crsp_subclass assigns a share type to different share classes. b. For each crsp_group_id, check whether at least one share class in the group is well matched to a MS share class. If so, assign that share class’s fundid to all members of the crsp_group_id. For example, if fund share class A (well matched) and fund share class B (not yet matched) share the same crsp_group_id , then B will be assigned A’s fundid. c. Remove the following funds, which would either only yield partially matched cases or be useless for text-based matching: (c1) A crsp_group_id is associated with more funds in CRSP than its associated fundid indicates in MS. For example, if a crsp_group_id is associated with 3 crsp_fundno 's while the fundid is only associated with 2 secid 's. We remove this fund because we presume Morningstar is missing data on at least one share class. (c2) A crsp_group_id is associated with insufficient number of non-matched fund share classes. For example, a crsp_group_id is associated with 4 crsp_fundno ’s – 3 well matched and 1 not yet matched. The associated MS fundid is related to 5 fund share classes. In this case, even if we match the one fund through text-based approach, we would still end up with only a partially matched case. We presume in this case that CRSP is missing data on at least one share class. (c3) A crsp_group_id is associated with only well-matched fund share classes. They offer no incremental benefits for our text-based matching.
(^7) The relevant excerpt from their documentation is at the end of this document. We deviate slightly from BB when
we break the fund name by “/” to assign crsp_group_id and crsp_subclass. We notice some fund names such as “Vanguard equity/bond balance fund” should not be assigned a crsp_subclass as “bond balance fund”, but rather the whole fund name should be assigned crsp_group_id. After compressing the typical words (e.g. ‘class’, ‘share’) in share class, we decided to not break the fund name by “/” if what follows contains more than 11 letters. 11 is the 99 th^ percentile for number of letters that come after “/”.
Table 5: Share Type Processing
crsp_subclass with keywords crsp_subclass compressed Processed to MS share_type
Ashares / (A) 1 A
BShares / (B) / B(1) 2 B
CShares / (C) / C- 3 C
DShares / (D) 4 D
MShares / (M) M
NShares / (N) N
SShares / (S) S
TShares / (T) T
Investor / Invest / Consumer / Inv
Institutional / Instl / IB / Initial / I2 / IA / II I / Y Inst
Retire / R1 / R2 / R3 / R4 / R5 / R 5 / R6 / RetA / RetB R / K Retireme
Advisor / Adviser / Admin / Administrative / Admiral / Consultant / Service / Retail
Adv
No Load / NoLoad NoLoad
Nav / BlackRock E / F / H / L / W / Z Other
All else All else NA
Using the concordance file created in Section 3, we then perform the following three steps to merge CRSP and Morningstar data:
(1) Create a CRSP dataset that only contains completely matched funds. The resulting smaller CRSP dataset contains 2,936,897 observations compared to the 4,784,162 observations in the larger CRSP dataset created in Section 1.
(2) Create a Morningstar dataset that only contains completely matched funds. The resulting smaller Morningstar dataset contains 3,722,775 observations compared to the 5,112, observations in the larger Morningstar dataset created in Section 1.
(3) Merge the CRSP dataset and Morningstar dataset created in (1) and (2) by share class/year/month. We use the CRSP dataset as the master file for the merge. The resulting dataset contains 2,936,897 share class / month observations for 8,807 funds and 27,414 share classes.