1 Data Appendix for Scale and Skill in Active Management, Exercises of Investment Management and Portfolio Theory

Correcting Expense Ratios, Management Fees, Returns, and Assets. 6. Identifying Index Funds ... manager_tenure (tenure of a fund manager, in days),.

Typology: Exercises

2022/2023

Uploaded on 05/11/2023

ryangosling
ryangosling 🇺🇸

4.8

(24)

249 documents

1 / 34

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Data Appendix for
Scale and Skill in Active Management
Lubos Pastor, Robert F. Stambaugh, Lucian A. Taylor
August, 2013
Table of Contents
1. Raw CRSP Database Clean-up and Merge
2. Raw Morningstar Database Clean-up and Merge
3. Creating Completely Matched Sample Between CRSP and Morningstar
4. Merging CRSP and Morningstar
5. Correcting Expense Ratios, Management Fees, Returns, and Assets
6. Identifying Index Funds
7. Grouping Subclasses
8. Computing SectorSize
9. Excerpts from Berk and Binsbergen’s (2011) Data Appendix
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22

Partial preview of the text

Download 1 Data Appendix for Scale and Skill in Active Management and more Exercises Investment Management and Portfolio Theory in PDF only on Docsity!

Data Appendix for Scale and Skill in Active Management

Lubos Pastor, Robert F. Stambaugh, Lucian A. Taylor

August, 2013

Table of Contents

  1. Raw CRSP Database Clean-up and Merge
  2. Raw Morningstar Database Clean-up and Merge
  3. Creating Completely Matched Sample Between CRSP and Morningstar
  4. Merging CRSP and Morningstar
  5. Correcting Expense Ratios, Management Fees, Returns, and Assets
  6. Identifying Index Funds
  7. Grouping Subclasses
  8. Computing SectorSize
  9. Excerpts from Berk and Binsbergen’s (2011) Data Appendix

1. Raw CRSP Database Cleanup and Merge

First, we merge the raw CRSP data files to create a final dataset that contains all the relevant variables for our study. The raw SAS files are downloaded directly from the WRDS server, under wrds/crsp/sasdata/q_mutualfunds. Table 1 lays out the structure of each raw dataset. We start the merging process from the return dataset. In each step of the merge, we merge in only observations of the year/month when return data exists. If a variable is missing, we keep the year/month observation and record a missing value for that variable.

After the merge, we follow pages 2-3 of Berk & Binsbergen’s (2011) Data Appendix 1 to clean up the ticker variable. We reproduce these pages of their Appendix verbatim at the end of this document.

In the raw CRSP fund_hdr and fund_hdr_hist files, ticker is named nasdaq.

Next, we merge in the total US stock market capitalization for each year/month observation. We use the WRDS web interface to download this data. In particular, we download monthly price and shares outstanding data for all US stocks (from 1960 to present) with shrcd =10 or 11. Stock market capitalization is then calculated as the multiple of price and shares outstanding. We then sum all the stocks' market capitalization to generate the total US stock market capitalization (variable sum_me).

Next, we create additional variables, including one-month-lag variables of

_- mtna (total net asset),

  • nav (net asset value per share),
  • exp_ratio (expense ratio),
  • turn_ratio (turnover ratio),
  • sum_me (total US stock market capitalization),_ and _- first_offer_dt (date the fund was first offered)
  • fund_age (age of a fund at month end, in days)
  • manager_tenure (tenure of a fund manager, in days),
  • first_CRSP_dt (date when a fund first shows up in CRSP)._

Finally, we rename the variables so that each variable from CRSP dataset has “ _CRSP ” in its name. Table 2 shows the content of our final CRSP dataset. The final dataset contains 4,784,162 observations, covers 50,536 funds, and spans the time period from

(^1) DOCUMENTATION ‐ Measuring Managerial Skill in the Mutual Fund Industry, by Jonathan B. Berk and

Jules H. van Binsbergen (version dated 06/09/2011).

Raw CRSP File

Variables

Variable Label

DatasetDuration

# of Obs

# of Funds ( crsp_fundno

monthly_returns

crsp_fundno

Fund Identifier

caldt

Date

mret

Total Return per Share as of Month End

monthly_tna

mtna

Total Net Assets as of Month End

dividends

dividend_CRSP

Monthly Dividends, created from summing differenttypes of dividend per fund/year/month.

monthly_nav

mnav

Monthly Net Asset Value Per Share

fund_style

si_obj_cd

Strategic Insight Objective Code

wbrger_obj_cd

W

iesenberger Fund Type Code, Identifying Fund Strategy

lipper_class

Lipper Classification Code

lipper_obj_cd

Lipper Objective Code

lipper_class_name

Lipper Classification Name

policy

Type of Securities Mainly Held by Fund

fund_summary

per_com

Amount of fund invested in Common Stocks

per_pref

Amount of fund invested in Preferred Stocks

per_conv

Amount of fund invested in Convertible Bonds

per_corp

Amount of fund invested in Corporate Bonds

per_bond

Amount of fund invested in all Bonds

summary_period

Frequency

Table 1: CRSP Raw File Structure

Raw CRSP File

Variables

Variable Label

DatasetDuration

# of Obs

# of Funds ( crsp_fundno

fund_fees

actual_12b

12b-1 Fee

max_12b

Maximum 12b-1 Fee

ex

p_ratio

Expense Ratio as of Fiscal Year-End

mgmt_fee

Management Fee

turn_ratio

Fund Turnover Ratio

fiscal_yearend

Effective Date for Fees

fr

ont_load

front_load

Max Defer & Rear Load Charges

re

ar_load

rear_load

Maximum Rear-End Load

cr

sp_portno

Portfolio Identifier

fund_hdr & fund_hdr_hist

ncusip

Fund CUSIP

nasdaq

NASDAQ Ticker Symbol

first_offer_dt

Date the Fund Was First Offered

mgr_name

Manager Name

mgr_dt

Date Current Portfolio Mgr Took Control

fund_name

Fund Name

mgmt_name

Management Company Name

mgmt_cd

Management Company Number

open_to_inv

Open to Investors

re

tail_fund

Retail Fund Indicator

inst_fund

Institutional Fund Indicator

index_fund_flag

Index Fund Indicator

dead_flag

Dead Fund Indicator

delist_cd

Identifies the reason for the fund delisting

Table 1: CRSP Raw File Structure, Continued

Variable Variable Label

month Month in CRSP

nav_CRSP Monthly Net Asset Value per Share in CRSP

nav_lag_CRSP One Month Lagged Net Asset Value per Share in CRSP

open_to_inv_CRSP Open to Investors in CRSP

per_bond_CRSP Amount of fund invested in all Bonds in CRSP

per_com_CRSP Amount of fund invested in Common Stocks in CRSP

per_conv_CRSP Amount of fund invested in Convertible Bonds in CRSP

per_corp_CRSP Amount of fund invested in Corporate Bonds in CRSP

per_pref_CRSP Amount of fund invested in Preferred Stocks in CRSP

policy_CRSP Type of Securities Mainly Held by Fund in CRSP

portno_CRSP Portfolio Identifier in CRSP

rear_load_flag_CRSP Flag for a Fund with Positive Rear Load in CRSP

retail_fund_CRSP Retail Fund Indicator in CRSP

return_CRSP Monthly Return in CRSP

si_obj_cd_CRSP Strategic Insight Objective Code in CRSP

sum_me_CRSP Monthly Aggregate US Stock Market Capitalization in CRSP

sum_me_lag_CRSP One Month Lagged Aggregate US Stock Market Capitalization in CRSP

summary_period_CRSP Frequency of Fund Summary Reporting in CRSP

ticker_CRSP Ticker in CRSP

turn_ratio_CRSP Fund Turnover Ratio in CRSP

turn_ratio_lag_CRSP One Month Lagged Fund Turnover Ratio in CRSP

wbrger_obj_cd_CRSP Wiesenberger Fund Type Code, Identifying Fund Strategy in CRSP

year Year in CRSP

Table 2: Final CRSP Dataset Contents, Continued

2. Raw Morningstar Database Clean-up and Merge

First, we assign the following six category dummies to each fund in the dataset fund_ops , which contains summary information for 42,575 funds. A fund can be assigned zero, one, or multiple flags. The six categories are:

_- Bond_fund_MS (bond fund)

  • Intl_fund_MS (international fund)
  • Sector_fund_MS (sector fund)
  • Target_fund_MS (target fund)
  • RealEstate_fund_MS (real estate fund)
  • Other_nonequity_funds_MS (other non-equity fund)_ We set a flag to one if the fund’s Morningstar_category variable’s value belongs to the corresponding list below:

We also make use of the variable primary_prospectus_benchmark in order to further assign the fund style dummy. In particular, we check if primary_prospectus_benchmark contains any of the words listed in the first column of the following table. If so, we assign the correspondingly style dummy (second column) to the fund.

Bond funds Int'l funds Sector funds Target funds Real estate funds Other non‐equity Bank Loan China Region Commodities Broad Basket Target‐Date 2000 ‐2010 Global Real Estate Currency Convertibles Diversified Emerging Mkts Communications Target‐Date 2011 ‐2015 Real Estate Long/Short Equity Emerging Markets Bond Diversified Pacific/Asia Consumer Cyclical Target‐Date 2016 ‐ 2020 Managed Futures High Yield Bond Emerging Markets Bond Consumer Defensive Target‐Date 2021 ‐ 2025 Market Neutral High Yield Muni Europe Stock Equity Energy Target‐Date 2026 ‐ 2030 Multialternative Inflation‐Protected Bond Foreign Large Blend Equity Precious Metals Target‐Date 2031 ‐ 2035 Trading‐Inverse Commodities Intermediate Government Foreign Large Growth Financial Target‐Date 2036 ‐ 2040 Trading‐Inverse Debt Intermediate‐Term Bond Foreign Large Value Health Target‐Date 2041 ‐ 2045 Trading‐Miscellaneous Long Government Foreign Small/Mid Blend Industrials Target‐Date 2046 ‐ 2050 Long‐Term Bond Foreign Small/Mid Growth Miscellaneous Sector Target‐Date 2051+ Multisector Bond Foreign Small/Mid Value Natural Resources Muni California Intermediate Global Real Estate Technology Muni California Long India Equity Utilities Muni Massachusetts Japan Stock Muni Minnesota Latin America Stock Muni National Interm Pacific/Asia ex‐Japan Stk Muni National Long World Allocation Muni National Short World Bond Muni New Jersey World Stock Muni New York Intermediate Muni New York Long Muni Ohio Muni Pennsylvania Muni Single State Interm Muni Single State Long Muni Single State Short Nontraditional Bond Short Government Short‐Term Bond Ultrashort Bond World Bond

In Morningstar’s monthly asset dataset, we discover that there are instances of extreme reversal patterns that likely reflect decimal-place mistakes. We perform the following procedure to remove fix these extreme reversals. First, we create a variable for the fraction change from last month to the current month,

dassets=(assets-lag_assets)/lag_assets.

Second, we create a reversal variable to capture the reversal pattern,

rev_next=(lead_assets-assets)/(assets-lag_assets).

This variable will be approximately –1 if it is a reversal (e.g. 20m, 2m, 20m). Finally, if abs(dassets)>=0.5, –0.75>rev_next>–1.25, and lag_assets>=10m , then we assign missing value to both assets and dassets. As a result of this procedure, 0.026% of monthly asset observations are changed to missing.

Next, we create additional variables, including one-month-lag variables of

_- asset (total net asset),

  • nav (net asset value per share),
  • asset_fundid (combined total net asset of sub share classes of a fund),
  • asset (total net asset),
  • expense_ratio (expense ratio),
  • turnover (turnover ratio),_ and _- inception (inception date of a fund),
  • fund_age (fund age since inception, in days),
  • first_ms_dt (date the fund first appeared in Morningstar)._

Benchmark return:

The Morningstar file cat_and_index_mapping maps each Morningstar Category into a benchmark index. For example, the most common Morningstar Category is “Large Growth,” which maps into the benchmark index “Russell 1000 Growth TR USD.” We use this mapping to assign a benchmark index to each fund. The Morningstar file idx_mth_ret provides monthly returns on all Morningstar benchmark indexes.

Finally, we rename the variables so that each variable from Morningstar dataset has “ _MS ” in its name. Table 4 shows the content of our final Morningstar dataset. The final dataset contains 5,112,629 observations, covers 42,575 fund share classes (reflected by secid ), 14,765 funds (reflected by fundid ), and spans the time period from August/1924 to August/2012.

Table 3: Morningstar Raw File Structure^ Raw Morningstar File

Variables

Variable Label

DatasetDuration

# of Obs

# of Funds

(secid)

# of Funds

(fundid)

fund_ops

Actual_Management_Fee

Actual Management Fee in MS

N/A

Cusip

CUSIP in MS

Deferred_Load

Deferred Load in MS

End_Date

End Date of Fund in MS

Enhanced_Index

Enhanced Index (F or T) in MS

Family

Fund Family (e.g. 13DManagement/AQR Funds/CitizensFunds) in MS

Fiscal

Fiscal Year End (1,2,…,12) in MS

Front_Load

Front Load in MS

Fund_of_Fund

Fund of Fund (F or T) in MS

Fundid

Fund Identifier (May IncludingMultiple Share Classes) in MS

Fundname

Fund Name in MS

Inception

Fund Inception Date in MS

Index

Index (F or T) in MS

Primary_Prospectus_Benchmark

Primary Prospectus Benchmark inMS

Prospectus_Objective

Prospectus Objective in MS

Reason

Reason (L or M) in MS

Redemption_Fee

Redemption Fee in MS

SRI

SRI (F or T) in MS

Secid

Security Identifier (Unique perShare Class Level) in MS

Share_Type

Share Type/Share Class in MS

Status

Fund Status (A or O) in MS

Ticker

Ticker in MS

morningstar_category

Fund Category (94 total) in MS

v12B_

12b_1 Free in MS

Variable Variable Label

Actual_Management_Fee_MS Actual Management Fee in MS

Bond_fund_MS Bond Fund Dummy in MS

Category_MS Fund Category (94 total) in MS

Cusip_MS CUSIP in MS

Deferred_Load_MS Deferred Load in MS

Enhanced_Index_MS Enhanced Index (F or T) in MS

Family_MS Family (Many Values, e.g. 13D Management/AQR Funds/Citizens Funds) in MS

Fiscal_MS Fiscal Year End (Values 0,1,2,…,12, very few 0) in MS

Front_Load_MS Front Load in MS

Fund_of_Fund_MS Fund of Fund (F or T) in MS

Fundid Fund Identifier (May Including Multiple Share Classes) in MS

Fundname_MS Fund Name in MS

Inception_MS Fund Inception Date in MS

Index_MS Index (F or T) in MS

Index_name_MS Index Name in MS

Intl_fund_MS International Fund Dummy in MS

Mgr_End_Date_MS End Date of Manager in MS

PPB_MS Primary Prospectus Benchmark in MS

Prospectus_Obj_MS Prospectus Objective in MS

RealEstate_fund_MS Real Estate Fund Dummy in MS

Reason_MS Reason (L or M) in MS

Redemption_Fee_MS Redemption Fee in MS

SRI_MS SRI (F or T) in MS

Secid Security Identifier (Unique per Share Class Level) in MS

Sector_fund_MS Sector Fund Dummy in MS

Share_Type_MS Share Type/Class in MS

Status_MS Fund Status (A or O) in MS

Strange_funds_MS Strange Fund Dummy in MS

Target_fund_MS Target Fund Dummy in MS

Ticker_MS Ticker in MS

Table 4: Final Morningstar Dataset Contents

Variable Variable Label

assets_MS Monthly Fund Assets in MS

assets_fundid_MSavg Monthly Fundid (Multiple Share Classes) Level Assets in MS

assets_fundid_MSavg_lag 1-Month Lagged Monthly Fundid (Multiple Share Classes) Level Assets in MS

assets_lag_MS Lagged Monthly Fund Assets in MS

date Date in MS

deferred_load_flag_MS Flag for a Fund with Positive Deferred Load in MS (snapshot from fund_ops)

dividend_MS Monthly Fund Dividend in MS

expense_ratio_MSavg Annual Expense Ratio in MS

expense_ratio_MSavg_lag 1-Month Lagged Annual Expense Ratio in MS

first_MS_dt First Date a Fund Appears in MS in MS

front_load_flag_MS Flag for a Fund with Positive Front Load in MS (snapshot from fund_ops)

fund_age_MS Fund Age (in days) in MS

month Month in MS

nav_MS Monthly Fund Net Assets in MS

nav_lag_MS Lagged Monthly Fund Net Assets in MS

num_cat_MS Sum of the Six Fund Style Dummy in MS

ratings_MS Monthly Fund Ratings in MS

return_MS Monthly Fund Return in MS

return_idx_MS Return of Index in MS

secid_idx_MS Secid of Index in MS

share_classes_fundid_MS Number of Share Classes in MS

turnover_MSavg Annual Turnover Ratio in MS

turnover_MSavg_lag 1-Month Lagged Annual Turnover Ratio in MS

v12B_1_MS 12b_1 Free in MS

year Year in MS

Table 4: Final Morningstar Dataset Contents, Continued

(B) Algorithm for Merging CRSP & Morningstar Data

Step 1: Merge CRSP and MS by Ticker at the share-class level. Both CRSP and MS datasets should have a unique Ticker per share class.

Step 2: Check matching quality by comparing CRSP’s and Morningstar’s data on each fund’s returns and total net assets. We say that a fund is “well matched” if and only if: (1) the 60th^ percentile of the absolute value of the difference between CRSP and MS monthly return is less than 5 basis points, and (2) the 60th^ percentile of the absolute difference between CRSP and MS monthly total net assets is less than $100,000.

If a fund is well matched, we accept its secid—crsp_fundno mapping as valid. Otherwise, we proceed to the Step 3.

We find that 24,288 share classes^6 (48.1% of 50,536 CRSP fund share classes, 57.0% of 42, MS fund share classes) are well matched by Ticker.

Step 3: Merge CRSP and MS using the CUSIP ( latest per crsp_fundno ) at the share-class level. Perform Step 2 to validate matches.

If well matched, accept the secid—crsp_fundno mapping. Otherwise proceed to Step 4.

We find that 9,798 additional share classes (19.4% of 50,536 CRSP share classes, 23.0% of 42,575 MS share classes) are well matched by the CUSIP ( latest per crsp_fundno ).

Step 4: Merge CRSP and MS using the CUSIP ( second latest ) per crsp_fundno at the share-class level. Perform Step 2 to validate matches.

If well matched, accept the secid—crsp_fundno mapping. Otherwise, keep the remaining unmatched fund share classes till Step 6.

Only 56 additional share classes are well matched by the earlier CUSIP.

Step 5:

(^6) BB apply thresholds of 2bp and $20,000. If we use those same thresholds, we find only 1,086 valid matches. We

relax the return and asset matching threshold to 5bp/$100,000 to obtain more matches. Although looser than the BB thresholds, our thresholds are still quite demanding.

We say that a fund (identified by fundid in Morningstar) is “completely matched” if all the MS secid' s belonging to the fundid are well matched to crsp_fundno' s. We say a fund is “partially matched” if at least one share class is well matched and at least one share class (in either CRSP or MS) is not.

At this step we find that 9,711 funds (65.8% of 14,765 MS fundid ’s) are completely matched. 1,572 funds (10.6%) are partially matched. 3,482 funds (23.6%) are not matched at all.

Step 6: The goal of this step is to identify additional matches by comparing funds’ names between CRSP and Morningstar. Only 212 additional share classes are well matched by this text-based approach_._

The steps are as follows:

a. Using the full CRSP database, create variables crsp_group_id and crsp_subclass from fund_name_CRSP , following pages 19-20 of BB’s documentation.^7 The variable crsp_group_id assigns a shared identifier to all share classes belonging to the same fund. The variable crsp_subclass assigns a share type to different share classes. b. For each crsp_group_id, check whether at least one share class in the group is well matched to a MS share class. If so, assign that share class’s fundid to all members of the crsp_group_id. For example, if fund share class A (well matched) and fund share class B (not yet matched) share the same crsp_group_id , then B will be assigned A’s fundid. c. Remove the following funds, which would either only yield partially matched cases or be useless for text-based matching: (c1) A crsp_group_id is associated with more funds in CRSP than its associated fundid indicates in MS. For example, if a crsp_group_id is associated with 3 crsp_fundno 's while the fundid is only associated with 2 secid 's. We remove this fund because we presume Morningstar is missing data on at least one share class. (c2) A crsp_group_id is associated with insufficient number of non-matched fund share classes. For example, a crsp_group_id is associated with 4 crsp_fundno ’s – 3 well matched and 1 not yet matched. The associated MS fundid is related to 5 fund share classes. In this case, even if we match the one fund through text-based approach, we would still end up with only a partially matched case. We presume in this case that CRSP is missing data on at least one share class. (c3) A crsp_group_id is associated with only well-matched fund share classes. They offer no incremental benefits for our text-based matching.

(^7) The relevant excerpt from their documentation is at the end of this document. We deviate slightly from BB when

we break the fund name by “/” to assign crsp_group_id and crsp_subclass. We notice some fund names such as “Vanguard equity/bond balance fund” should not be assigned a crsp_subclass as “bond balance fund”, but rather the whole fund name should be assigned crsp_group_id. After compressing the typical words (e.g. ‘class’, ‘share’) in share class, we decided to not break the fund name by “/” if what follows contains more than 11 letters. 11 is the 99 th^ percentile for number of letters that come after “/”.

Table 5: Share Type Processing

crsp_subclass with keywords crsp_subclass compressed Processed to MS share_type

Ashares / (A) 1 A

BShares / (B) / B(1) 2 B

CShares / (C) / C- 3 C

DShares / (D) 4 D

MShares / (M) M

NShares / (N) N

SShares / (S) S

TShares / (T) T

Investor / Invest / Consumer / Inv

Institutional / Instl / IB / Initial / I2 / IA / II I / Y Inst

Retire / R1 / R2 / R3 / R4 / R5 / R 5 / R6 / RetA / RetB R / K Retireme

Advisor / Adviser / Admin / Administrative / Admiral / Consultant / Service / Retail

Adv

No Load / NoLoad NoLoad

Nav / BlackRock E / F / H / L / W / Z Other

All else All else NA

4. Merging CRSP and Morningstar Databases

Using the concordance file created in Section 3, we then perform the following three steps to merge CRSP and Morningstar data:

(1) Create a CRSP dataset that only contains completely matched funds. The resulting smaller CRSP dataset contains 2,936,897 observations compared to the 4,784,162 observations in the larger CRSP dataset created in Section 1.

(2) Create a Morningstar dataset that only contains completely matched funds. The resulting smaller Morningstar dataset contains 3,722,775 observations compared to the 5,112, observations in the larger Morningstar dataset created in Section 1.

(3) Merge the CRSP dataset and Morningstar dataset created in (1) and (2) by share class/year/month. We use the CRSP dataset as the master file for the merge. The resulting dataset contains 2,936,897 share class / month observations for 8,807 funds and 27,414 share classes.