







































































Abstract How should an investor value financial data? The answer is complicated as it depends not only on the investor himself, but also on the characteristics of all other investors. Portfolio size, risk aversion, trading horizon, and investment style affect an investor’s willingness to pay for data and the equilibrium value of data. Directly measuring all these characteristics of all investors is hopeless. Thus, we outline a simple model that gives rise to sufficient statistics that make an investor’s private value of data measurable. Our approach can value data that is public or private, about one or many assets, relevant for dividends or for sentiment. We find that investor characteristics always matter. What tempers the heterogeneity in how investors value data is market illiquidity. When investors’ trades move prices, the value of data falls, especially for the investors who value data most. The high sensitivity of the value of data to market liquidity, for high-data investors, suggests that modest fluctuations in market liquidity can eviscerate the value of financial firms whose main asset is financial data.
∗ Thanks to Adrien Matray for valuable conversations and suggestions. Keywords: Data valuation, portfolio theory, information choice.
†MIT Sloan, NBER and CEPR; [email protected] ‡Columbia Business School; [email protected] Business School, NBER, and CEPR; [email protected] ¶ §NYU Stern School of Business and NBER; [email protected]
Investment management firms are gradually transforming themselves from users of small data and simple asset pricing models to users of big data and computer-generated statistical models. Amidst this transformation, investors’ strategic focus is shifting from the choice of pricing model to the choice of data they acquire. A key question for modern financial firms is: How much should they be willing to pay for a stream of financial data? This project devises and puts to use a methodology to estimate this dollar value, based on the investor’s own characteristics, but without needing to know the characteristics of others.

From information-based theories, we know many qualitative features of firms that make data valuable: large firms, growth stocks, firms with risky payoffs, assets that are sensitive to news, assets that others are uninformed about. After all, data is simply a stream of digitized information. But for an investor who is considering purchasing a data set, knowing the representative investor’s theoretical value for the data is not very useful. An investor with a large portfolio values data more, while an investor who invests in a restricted set of assets values data less. An investor with lots of other data is less willing to pay for additional data, while an investor who trades more frequently might value data more or less. All these effects depend on the asset market equilibrium, which in turn depends on the characteristics of every other investor. Data value also depends on which other investors buy that same data. To make matters more complex, we also know that illiquidity, or the price impact of a trade, makes information less valuable (Kacperczyk, Nosal, and Sundaresan, 2021), but how this interacts with investor heterogeneity, quantitatively, is less understood.

Our simple procedure to estimate the value of any data series, to an investor with specific characteristics, reveals enormous dispersion in how different investors value the same data. Unlike financial assets, data assets are not equally valuable to all. The dispersion in private valuations for data matters for our understanding of data markets because it suggests a low price elasticity of aggregate data demand. It is important to point out that our procedure leads to an estimate of private value to an investor, which could be different from a transaction price that one might observe when data
Our first exercise explores the role of investor wealth and risk preferences. We consider an investor who has a relative risk aversion of 2 and an initial wealth of either $1 million or $100 million. The latter case is equivalent to considering an investor with lower absolute risk aversion. Of course, such an investor values information more, but the extent depends greatly on market structure, i.e. on whether their trades have price impact or not. When markets are competitive and a trade has no impact on the market price, data values increase almost linearly with wealth: an investor with 100 times more wealth values data almost 100 times more. But when trades do move the market price, in line with empirical estimates of price impact, the value of data falls by an order of magnitude. A trader with 100 times larger wealth values data less than 10 times more. This illustrates the general pattern we see of enormous heterogeneity in willingness to pay for data, which is substantially tempered by a modest degree of market illiquidity.

The high sensitivity of data to changes in market liquidity is interesting in its own right. It suggests that market liquidity is crucial for the value of financial data. Small changes in market liquidity can lead to large variation in data value. For firms whose main asset is financial data, these small market liquidity changes could represent high volatility in the value of such firms. This suggests a new avenue of liquidity effects in asset markets. As data becomes a more important asset for financial firms, the prices of financial firms may become increasingly sensitive to market liquidity.

Our second exercise considers investors with different investment styles. Specifically, we analyze data value for investors who trade the market (the S&P 500) portfolio, only small firms, only large firms, only growth stocks or only value stocks. The final type of investor trades all five of the previous portfolios. Because each of these types uses a piece of data differently, they value the same piece of information differently. Unsurprisingly, the investor who actively trades all the portfolios values data most. We also find that investors in large firms and growth stocks value data substantially more than a value or small-firm investor.

Our third exercise quantifies how much the value of analyst forecast data depends on what other data is in an investor’s database. We find considerable variation in data values when we vary the other data variables used. In general, the more series we add to the investor’s information set, the lower is the value ascribed to additional data. The extent of this change in value is sizable. This intuitive result illustrates the importance of accounting for many facets of investor heterogeneity. It also suggests that this dimension of heterogeneity can induce sizable heterogeneity in data valuations, and in turn, a low price elasticity of data demand.

Our fourth exercise considers investors with a shorter trading horizon. Such differences are easy to accommodate with higher frequency observations on the data series and asset returns. We illustrate this by computing the value of data to an investor who trades over a quarterly horizon. We find that a shorter horizon makes data slightly less valuable. Intuitively, our data are less useful in forecasting returns over a shorter horizon. Of course, it is possible that an investor who trades or rebalances his portfolio more frequently might ascribe a higher value to the data. We do not investigate this conjecture in this paper, in part due to data limitations, but our procedure can be extended for this purpose as well.

In exploring these examples, we also gain new insights about financial asset markets. We learn that the value of data assets is very sensitive to market liquidity. We typically think of market liquidity as something that affects only the value of financial assets, not the real value of a firm. But if illiquidity makes it harder or more expensive to execute profitable trades, the real value of financial data that informs such trades declines. The value of firms whose main asset is such data declines as well. As the importance of data assets grows, this channel through which market liquidity can affect the real value of firm assets grows in importance.

Why do we need to estimate the value of data?
Why not look at prices for data directly? One reason is that not all data prices are observed, either because the data is not traded, or it is traded privately. In other words, the data is an asset, and if it is owned by a firm but never traded, it does affect the value of the firm while its price is unknown. But even if all prices were observed, just like assets can be mispriced, data can be mispriced. Finally,
among investors, data types and equilibrium effects, there is a simple procedure to compute a value for data. Measures of the information content of prices, like those in Bai, Philippon, and Savov (2016) and Davila and Parlatore (2021) are used to infer how much the average investor in an asset knows. Such measures are related, in that they arise from a similar noisy rational expectations framework. But they answer a question about the quantity of information, not its value. Farboodi, Matray, Veldkamp, and Venkateswaran (2019)’s “initial value” of a unit of precision is not the value a firm would pay, is only valid for private signals about orthogonal assets, and does not account for any particular firm’s preferences, portfolio, existing data set or price impact. Our sufficient statistics approach is more relevant for demand estimation, much simpler to estimate and more robust to heterogeneity.
1 A Framework for Valuing Data
Since data is information, we build on the standard workhorse model of information in financial markets, the noisy rational expectations framework. To the framework, we add long-lived assets, imperfect competition, heterogeneity of preferences, wealth effects, investment styles, public, private or partly public signals and arbitrary correlation between assets and between various signals. We include these features because each one affects the value of information. Model extensions consider data about sentiment or order flow. Our contribution is not the modeling. Our contribution lies in showing how to estimate data valuations in such a rich and flexible model. The goal of the model is to show how, despite all the heterogeneity, the value of data can be reduced to a few sufficient statistics that are easy to compute. Later, we justify this rich modeling structure by showing that heterogeneity matters for data valuations.
equilibrium considerations about what others know.
Assets We have N distinct risky assets in the economy indexed by j, with net supply given by $\bar{x}$. Each of these assets is a claim to a stream of dividends $\{d_{jt}\}_{t=0}^{\infty}$, where the vector $d_t$ is assumed to follow the auto-regressive process

$$d_{t+1} = \mu + G(d_t - \mu) + y_{t+1}.$$

Here, the exogenous dividend innovation shock $y_{t+1} \sim N(0, \Sigma_d)$ is assumed to be i.i.d. across time. We use subscript t for variables that are known before the end of period t. Thus, the dividend $d_{t+1}$ and its innovation shock $y_{t+1}$ both pertain to assets that are purchased in period t; both these shocks are observed at the end of period t.
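To make the dividend process concrete, the following is a minimal simulation sketch of the auto-regressive process above. The function name `simulate_dividends`, the starting value $d_0 = \mu$, and all parameter values are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def simulate_dividends(mu, G, Sigma_d, T, seed=0):
    """Simulate d_{t+1} = mu + G (d_t - mu) + y_{t+1}, with y_{t+1} ~ N(0, Sigma_d) i.i.d.

    Returns the (T+1) x N dividend path and the T x N innovation shocks.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    n_assets = mu.shape[0]
    d = np.empty((T + 1, n_assets))
    d[0] = mu  # assumption: start the path at the long-run mean
    shocks = rng.multivariate_normal(np.zeros(n_assets), Sigma_d, size=T)
    for t in range(T):
        d[t + 1] = mu + G @ (d[t] - mu) + shocks[t]
    return d, shocks

# Hypothetical two-asset parameterization, chosen only for illustration.
mu = np.array([1.0, 2.0])
G = np.array([[0.9, 0.0], [0.0, 0.5]])
Sigma_d = np.array([[0.04, 0.01], [0.01, 0.09]])
d, y = simulate_dividends(mu, G, Sigma_d, T=500)
```

Each row of `y` plays the role of the innovation $y_{t+1}$ that investors' data are informative about.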
Investors and investment styles In each period t, n overlapping generations investors, i ∈ [0, 1], are born, observe data, and make portfolio choices. The number of investors may be finite, which implies that markets are imperfectly competitive. We will also consider the limiting economy as n becomes infinite. In the following period t + 1, investors sell their assets, consume the dividends and the proceeds of their asset sale and exit the model. Each investor i born at date t has initial endowment $\bar{w}_{it}$ and utility over total, end-of-life consumption $c_{it+1}$. At date t, investors choose their portfolio of risky assets, which is a vector $q_{it}$ of the number of shares held of each asset. They also choose holdings of one riskless asset with return r, subject to the budget constraint

$$c_{it+1} = r\,(w_{it} - q_{it}' p_t) + q_{it}'\,(p_{t+1} + d_{t+1}). \tag{1}$$

An investor i may also be subject to an investment style constraint, which limits the set of risky assets they purchase. We denote this set of investable assets as $Q_i$. Following Koijen and Yogo (2019), we do not model the source of the constraint. However, many investors do describe their strategy as small-firm investing or value investing, which limits the assets they hold. We consider sets $Q_i$ that either set the holdings of some assets to
Equilibrium Solution To solve the model and derive the value of data, we first apply Bayes’ law to investors’ prior beliefs and data to form posterior beliefs about asset payoffs. Appendix A shows that investor i can aggregate her data. Getting this combination of private, public and price information is equivalent to getting an unbiased signal $s_{it}$ about the dividend innovation $y_{t+1}$, with private signal noise $\xi_{it}$ and public signal noise $z_{t+1}$:

$$s_{it} = y_{t+1} + \zeta_{it} z_{t+1} + \xi_{it}$$

The term $z_{t+1} \sim N(0, \Sigma_z)$ comes from the noise in the public component of any data. It is i.i.d. across time, with precision $\Sigma_z^{-1}$. This public signal noise $z_{t+1}$ pertains to assets that are purchased in period t and is observed at the end of period t. If investor i learned nothing from any public sources of information at date t, then $\zeta_{it} = 0$ and this becomes a standard private signal. Similarly, $\xi_{it} \sim N(0, K_{it}^{-1})$ is the noise in the private component of the signal (i.i.d. across individuals and time), which has precision $K_{it}$ and is orthogonal to the noise of the public component.
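As a rough illustration of how such a signal reduces uncertainty, the sketch below applies textbook Gaussian updating to the signal structure above. It rests on simplifying assumptions that are not stated in this section: $\zeta_{it}$ is treated as a scalar, the shocks $y_{t+1}$, $z_{t+1}$ and $\xi_{it}$ are taken to be mutually independent, and the investor conditions on $s_{it}$ alone. The function name `posterior_variance` is hypothetical.

```python
import numpy as np

def posterior_variance(Sigma_d, Sigma_z, K, zeta):
    """Variance of y given s = y + zeta * z + xi under joint normality.

    Sigma_d : prior variance of the dividend innovation y
    Sigma_z : variance of the public signal noise z
    K       : precision of the private signal noise xi
    zeta    : loading on public noise (assumed scalar here)
    """
    Sigma_d = np.atleast_2d(Sigma_d).astype(float)
    Sigma_z = np.atleast_2d(Sigma_z).astype(float)
    K = np.atleast_2d(K).astype(float)
    noise_var = zeta**2 * Sigma_z + np.linalg.inv(K)  # total signal noise variance
    signal_var = Sigma_d + noise_var                  # variance of the signal s
    gain = Sigma_d @ np.linalg.inv(signal_var)        # projection coefficient of y on s
    return Sigma_d - gain @ Sigma_d

# Purely private signal (zeta = 0) versus a signal contaminated by public noise.
v_private = posterior_variance(1.0, 2.0, 1.0, zeta=0.0)
v_mixed = posterior_variance(1.0, 2.0, 1.0, zeta=1.0)
```

In this one-asset example the signal with added public noise leaves more residual uncertainty than the purely private one, as expected.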
Next, we take a second-order approximation to the utility function. This allows us to write the unconditional and conditional expected utility at time t as

$$E[U(c_{it+1})] = \rho_i E[c_{it+1}] - \frac{\rho_i^2}{2} V[c_{it+1}] \tag{3}$$

$$E[U(c_{it+1}) \mid I_{it}] = \rho_i E[c_{it+1} \mid I_{it}] - \frac{\rho_i^2}{2} V[c_{it+1} \mid I_{it}]. \tag{4}$$
Here, $\rho_i$ denotes the coefficient of absolute risk aversion for investor i, which can be an arbitrary function of their endowment $w_{it}$. Finally, we show in the appendix that there exists an equilibrium price schedule that is linear in the current dividend $d_t$, future dividend innovations $y_{t+1}$ that investors learn about through data, demand shocks $x_{t+1}$ and the noise in public data $z_{t+1}$:

$$p_t = A_t + B(d_t - \mu) + C_t y_{t+1} + D_t x_{t+1} + F_t z_{t+1} \tag{5}$$
Mapping Data Utility to Sufficient Statistics Our first result uses the law of iterated expectations to compute the unconditional expectation (3) in terms of means and variances of the vector of asset returns $R_t$, defined below. Since we have substituted out the optimal consumption, we replace the direct utility function, which takes consumption as its argument, with an indirect expected utility function $\tilde{U}$, which takes an information set $I_{it}$ as its argument. In order to state the main result we need to define $R_t$, the vector of returns from buying each asset in investor i’s feasible investment set, at time t,

$$R_{it} := \zeta_i \left( (p_{t+1} + d_{t+1}) ./ \bar{p}_t - r \right), \tag{6}$$

where ./ represents the element-by-element division of two vectors and $\bar{p}_t$ is a reference price for computing returns that has already been realized. The matrix $\zeta_i$ is an $m_i \times N$ matrix of zeros and ones, where $m_i$ is the number of investable assets for investor i. Each row of $\zeta_i$
impact-adjusted variances.
Lemma 2. Unconditional expected utility, for an investor with price impact $dp/dq_i$, is

$$\tilde{U}(I_{it}) = E[R_t]'\, \hat{V}_i^{-1} E[R_t] + \mathrm{Tr}\left[ \left( V(R_t) - V(R_t \mid I_{it}) \right) \hat{V}_i^{-1} \right] + r \rho_i \bar{w}_{it}, \tag{8}$$

where

$$\hat{V}_i^{-1} := \tilde{V}_i^{-1} \left( 1 - \tfrac{1}{2} V(R_t \mid I_{it})\, \tilde{V}_i^{-1} \right) \quad \text{and} \quad \tilde{V}_i := V(R_t \mid I_{it}) + \frac{1}{\rho_i} \frac{dp}{dq_i} (\bar{p}_t \bar{p}_t')^{-2}.$$

Notice that if $dp/dq_i = 0$, then $\hat{V}_i / 2 = \tilde{V}_i = V(R_t \mid I_{it})$. The result becomes the same as Proposition 1.

This formula explains another important feature of our results. Multiplying $dp/dq_i$ is an investor’s risk tolerance $1/\rho_i$. Since this is absolute risk aversion and we know that absolute risk aversion declines in wealth, one can interpret this as a proxy for investor wealth. A wealthier/larger investor has more price impact. An investor with a portfolio that is ten times larger faces ten times the price impact, per share of an asset sold.

The price impact of all investors’ trades would seem to matter for the value of data. It does. But once again, it is captured by the variances. Other investors’ price impact enters this expression through the equilibrium price coefficient C. This, in turn, shows up in the mean and variance of $R_t$. Since we measure the mean and variance of R directly, we do not need to know what other firms’ market power is or work out its effect. That effect is already incorporated in our sufficient statistics.^4 As long as we can measure these sufficient statistics, and we know investor i’s market power, we can accurately compute the value of investor i’s data.

As before, we value data as the difference between expected utility with and without the data. When we make this calculation, we are calculating the value of a firm doing a one-time, surprise deviation to a marginally higher level of data. What we are not doing is asking: If all the other firms know that this one firm will acquire slightly more data, how will their own data choices react? We are taking as given the best responses of all other firms.

^4 Market power does change the interpretation of C as a measure of price informativeness. But how one interprets the price coefficient C, in this case, does not affect its use in assessing data value.
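The expression in Lemma 2 can be illustrated for the simplest case of a single asset whose reference price is normalized to one, so that every matrix in equation (8) collapses to a scalar and the term $(\bar{p}_t \bar{p}_t')^{-2}$ equals one. The sketch below is an illustration under those assumptions; the function names and all numerical inputs are hypothetical, and the mapping from relative risk aversion and wealth to absolute risk aversion ($\rho_i$ equal to relative risk aversion divided by wealth) is an assumption rather than something stated here.

```python
def expected_utility(mean_R, var_R, cond_var_R, rho, wealth, r, dp_dq=0.0):
    """Scalar version of equation (8), with the reference price normalized to one."""
    v_tilde = cond_var_R + dp_dq / rho                            # impact-adjusted variance
    v_hat_inv = (1.0 / v_tilde) * (1.0 - 0.5 * cond_var_R / v_tilde)
    return (mean_R**2 * v_hat_inv
            + (var_R - cond_var_R) * v_hat_inv
            + r * rho * wealth)

def data_value(mean_R, var_R, cond_var_with, cond_var_without,
               rho, wealth, r, dp_dq=0.0):
    """Value of data: expected utility with the data minus expected utility without it."""
    u_with = expected_utility(mean_R, var_R, cond_var_with, rho, wealth, r, dp_dq)
    u_without = expected_utility(mean_R, var_R, cond_var_without, rho, wealth, r, dp_dq)
    return u_with - u_without

# Hypothetical inputs: the data halve the conditional variance of returns.
wealth = 1e6
rho = 2.0 / wealth  # assumption: relative risk aversion of 2 divided by wealth
value_competitive = data_value(0.06, 0.04, 0.02, 0.04, rho, wealth, r=1.02)
value_with_impact = data_value(0.06, 0.04, 0.02, 0.04, rho, wealth, r=1.02,
                               dp_dq=0.003 / 20000)
```

With these made-up inputs, the value computed with price impact is lower than the competitive value, which is the qualitative pattern the text describes.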
The two key assumptions behind both the competitive and market power results are that price can be approximated as a linear function of innovations as in equation (5), and that individual i maximizes risk-adjusted return. In other words, this calculation is accurate as long as investors use linear factor models and maximize risk-adjusted return, even with potentially heterogeneous prices of risk.
Private, Public and Correlated Information At first pass, this result is unsurprising. This type of expected utility expression shows up in many noisy rational expectations models, dating back to Grossman and Stiglitz (1980). But what is surprising are all the heterogeneous model features that did not complicate this answer. In particular, this answer suggests that there is no real difference between the value of public and private information. Regardless of who else knows the data, it is valuable only for its ability to change the conditional forecast errors. But that conclusion flies in the face of what we know about information value (Glode, Green, and Lowery, 2012). The reason both can be correct is that the publicity of the data matters for the conditional variance. Private information is typically more valuable. That is picked up by our measure because private information is less likely to be impounded into price. In other words, information that everyone knows is less correlated with $(p_{t+1} + d_{t+1}) ./ \bar{p}_t$. Public information about $(p_{t+1} + d_{t+1})$ is already impounded in $\bar{p}_t$. In their ratio, it cancels out. Therefore, public information will be less correlated with and less predictive of returns $R_t$.

In short, who else knows a piece of data matters. But knowing the forecast errors captures the way in which this public knowledge matters. This is an incredibly helpful property because it relieves the econometrician of having to figure out who knows what. Conditional variances, or in other words, the size of forecast errors, are sufficient statistics. Similarly, the risk preferences of all market participants matter. However, the expected payoff $E[R_t]$ captures the way in which risk preferences and investment mandates matter.
premium and choose the value that matches a preferred estimate of the equity premium. We do not follow that approach for two main reasons. First, this would reveal how the market values data, not how an individual investor with particular characteristics should value data. It is the answer to a different question. Our question is about the individual’s value of data and how investor heterogeneity matters for data valuation. Second, it requires estimating most of the structural parameters of the model. As such, the estimates become much more sensitive to the exact model structure and choices of how to estimate each object, which counteracts the advantage of our simple sufficient statistics approach.
Data About Order Flow or Sentiment Many new data sources teach us about how other investors feel about an asset. For example, analyzing a Twitter feed is unlikely to turn up new dividend information. But it might well correlate with the current price because it detects sentiment. Sentiment is something unrelated to the fundamental asset value that affects current demand. In our model, the variable that moves the current price in a way that is orthogonal to value is $x_{t+1}$. So, we interpret sentiment as something that shows up in x; thus sentiment data are time-t signals about price noise $x_{t+1}$. Put differently, our base model is set up to value data which are signals about future cash flows of a firm. But this tool can also be used to value data series about sentiment, order flow, or aspects of demand that are orthogonal to future cash flows but may affect the current price. In fact, Appendix C shows that such data can be valued using (7) and (9), just as if this were cash flow data.

Of course, many structural aspects of this model with sentiment data change. If we were to estimate the underlying parameters from order flow data, many adjustments would be necessary. But the essence of Farboodi and Veldkamp (2020) is to show that such data can be used to remove the noise from the price signal and thus better forecast earnings. Doing this is functionally equivalent to trading against dumb money, a common practice for sophisticated traders with access to retail order flow. The fact that such trading activity can be formally represented as if sentiment/order flow data were being used in a linear combination with current prices to forecast cash flows means that estimating cash flows conditional on prices and sentiment data yields a valid estimate of data value.
2 Data and Estimation Procedure
First, we describe the estimation procedure. Then, we describe the data series used in the procedure and how exactly we arrive at data valuations.
Estimation Procedure The first step is to compute asset j’s returns, $R_{jt}$. The $R_{jt}$ for each asset at each date t is an element of the vector $R_t$. To get the unconditional expected return $E[R_t]$, we then average this time series: $E[R_t] = \frac{1}{T} \sum_{t=1}^{T} R_t$. Next, compute the variance of returns $V[R_t]$. Next, regress the sequence of returns $R_t$ on any already-owned data and the data being valued. Implementing this in practice requires the investor to have access to the historical series of the data set they are considering buying. The regression is a simple, linear, ordinary least squares regression of returns $R_t$ on all the variables, already owned and new, in the data set. The variance of the OLS residual represents $V[R_t \mid I_{it}]$. Finally, combining these elements, compute $E[U(c_{it+1})]$. We then repeat this procedure, excluding the data series of interest. In our empirical implementation, we use a set of observable controls as a proxy for existing data. The difference between the expected utility with and without this data is the value of that data source.

Formally, given new data, denoted $X_t$, and existing data, denoted $Z_t$, we can estimate the precisions $V(R_t \mid X_t, Z_t)^{-1}$ and $V(R_t \mid Z_t)^{-1}$, with and without the new data, by estimating the following two regressions:

$$R_t = \beta_1 X_t + \beta_2 Z_t + \varepsilon_t^{XZ} \tag{10}$$

$$R_t = \gamma_2 Z_t + \varepsilon_t^{Z} \tag{11}$$
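The regression step can be sketched in a few lines. The code below is a minimal illustration on synthetic data, not the paper's implementation: the function names are hypothetical, the simulated series stand in for returns and data, and the inclusion of a constant in each regression is an assumption, since regressions (10) and (11) are written without one.

```python
import numpy as np

def residual_variance(R, regressors):
    """Residual covariance from an OLS regression of returns R (T x m) on regressors (T x k).

    A constant is added to the regressors (an assumption of this sketch).
    """
    T = R.shape[0]
    design = np.column_stack([np.ones(T), regressors])
    beta, *_ = np.linalg.lstsq(design, R, rcond=None)
    resid = R - design @ beta
    dof = T - design.shape[1]  # degrees-of-freedom correction
    return resid.T @ resid / dof

def sufficient_statistics(R, X_new, Z_owned):
    """Return E[R], V(R), V(R | X, Z) and V(R | Z) estimated from the sample."""
    mean_R = R.mean(axis=0)
    var_R = np.atleast_2d(np.cov(R, rowvar=False))
    cond_var_with = residual_variance(R, np.column_stack([X_new, Z_owned]))  # regression (10)
    cond_var_without = residual_variance(R, Z_owned)                         # regression (11)
    return mean_R, var_R, cond_var_with, cond_var_without

# Synthetic example: one asset, one new series X, one already-owned series Z.
rng = np.random.default_rng(0)
T = 200
Z = rng.normal(size=(T, 1))
X = rng.normal(size=(T, 1))
R = 0.05 + 0.5 * X + 0.2 * Z + 0.1 * rng.normal(size=(T, 1))
mean_R, var_R, v_with, v_without = sufficient_statistics(R, X, Z)
```

The four outputs are the inputs needed to evaluate expected utility with and without the data series of interest.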
pertains to exiting firms. Our preferred solution is to only consider periods during which a firm has non-missing information. Next, we winsorize the deflated values for assets, market capitalization and total dividends at the 0.01% level. Henceforth, we refer to the market capitalization at the end of year for stock j divided by the assets in that year for stock j as the price $p_{jt}$, and the total dividends normalized by assets in that year as $d_{jt}$. We calculate the excess returns as $R_{jt} = \frac{p_{jt+1} + d_{jt+1} - p_{jt}}{p_{jt}} - r_{ft}$, where we use the yield on Treasury bills (constant maturity rate, hereafter CMT) with one year maturity as the risk-free rate.
Forming Asset Portfolios The procedure described above can be used for any number and type of assets, including individual stocks. However, for expositional purposes, and to show more clearly the patterns in data value, we group assets into a small number of commonly-used portfolios, rather than work with a large number of individual stocks/assets. We then consider information portfolio choice between these portfolios and data about the payoff of each portfolio. As a result, we will have a smaller number of data values to consider. We group firms into Large and Small, based on whether they are above or below the median value of market capitalization for all firms in our sample, in that year. Next, we classify firms into Growth and Value based on their book-to-market ratio (defined as the difference between total assets and long-term debt, divided by the firm’s market capitalization). Firms above the median value of book-to-market in a year are value firms, while those below the median are our growth firms. This gives us four portfolios: Small, Large, Growth and Value. The fifth portfolio is a market index (S&P 500). We use value-weighted averages for excess returns for each portfolio as the return measure, where we weight each firm’s return by its market capitalization.
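As an illustration of the grouping step, the sketch below forms the Large and Small portfolios for a single year by splitting firms at the median market capitalization and value-weighting their excess returns. The function names and the four-firm example are hypothetical, and assigning firms exactly at the median to Small is an assumption. The Growth and Value split would follow the same pattern with book-to-market in place of market capitalization.

```python
import numpy as np

def value_weighted_return(returns, market_caps, mask):
    """Market-cap-weighted average return over the firms selected by mask."""
    weights = market_caps[mask]
    return float(np.sum(weights * returns[mask]) / np.sum(weights))

def size_portfolios(returns, market_caps):
    """Split one year's cross-section at the median market cap into Large and Small."""
    returns = np.asarray(returns, dtype=float)
    market_caps = np.asarray(market_caps, dtype=float)
    large = market_caps > np.median(market_caps)
    small = ~large  # assumption: firms exactly at the median are assigned to Small
    return {
        "Large": value_weighted_return(returns, market_caps, large),
        "Small": value_weighted_return(returns, market_caps, small),
    }

# Hypothetical cross-section of four firms in one year.
portfolio_returns = size_portfolios(returns=[0.1, 0.2, 0.3, 0.4],
                                    market_caps=[10.0, 20.0, 30.0, 40.0])
```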
Measuring Price Impact If an investor uses our data valuation tool to measure their own value of data, then presumably, that investor knows how much the price moves when they trade, on average. But for the purpose of illustrating the use of our tool, we need a
reasonable price impact estimate. Appendix B explores estimates of price impact from the literature. Hasbrouck (1991) finds that a $20,000 trade moved prices by 0.3% on average. Since the reference price of one share of an asset is normalized to one in the model, a 0.3% price increase corresponds to a price that is 0.003 units higher. Therefore, we explore imperfectly competitive markets where $dp/dq_i = 0.003/20000$. While this is a small number, it is large enough to illustrate a substantial effect.
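A short calculation shows why this small number can still matter. In Lemma 2, price impact enters multiplied by the investor's risk tolerance $1/\rho_i$. The sketch below evaluates that product for the two wealth levels considered in the first exercise, under the assumption (not stated in this section) that absolute risk aversion equals relative risk aversion divided by wealth. The function names are hypothetical.

```python
def price_impact(price_move, trade_size):
    """Price change per unit traded, e.g. a 0.003 move for a 20000 trade."""
    return price_move / trade_size

def impact_adjustment(dp_dq, relative_risk_aversion, wealth):
    """Risk tolerance times price impact, (1 / rho_i) * dp/dq_i."""
    rho = relative_risk_aversion / wealth  # assumed mapping to absolute risk aversion
    return dp_dq / rho

dp_dq = price_impact(0.003, 20000)                      # 1.5e-07
small_investor = impact_adjustment(dp_dq, 2.0, 1e6)     # $1 million of wealth
large_investor = impact_adjustment(dp_dq, 2.0, 1e8)     # $100 million of wealth
```

Under these assumptions the adjustment scales one-for-one with wealth, so it is 100 times larger for the larger investor, which is consistent with price impact weighing most heavily on the investors who would otherwise value data most.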
Publicly Available Information When we value a stream of data, we need to take a stand on what else an investor already knows. Obviously, we as econometricians have no way of knowing that. But this is a tool designed from the investor’s perspective, for the investor to value a stream of data. That investor should know what other data they themselves regularly use. For the purposes of illustrating the use of the tool, we endow our hypothetical investor with some commonly-used and publicly-available data series. Specifically, we assume that they already observe the dividend yield (D/P ratio) for the S&P 500.^5 In additional results, we also consider an investor who also has access to one or more of the following pieces of data: the yield on a 1-year Treasury bill (constant maturity rate),^6 the consumption-wealth ratio (CAY) from Lettau and Ludvigson (2001) and a sentiment index from Baker and Wurgler (2006).
The Data Stream We Value: IBES forecasts One could use this tool to value any finance-relevant data stream or bundle of data streams. To explore how variable investors’ valuations can be for a very standard data series, we consider the value of the earnings forecasts provided by the Institutional Brokers Estimate System (IBES).^7 Our data contains
^5 Obtained from NASDAQ Quandl, https://data.nasdaq.com/data/MULTPL/SP500_DIV_YIELD_MONTH-sp-500-dividend-yield-by-month
^6 Obtained from FRED series DGS
^7 We use the Summary Statistics series from IBES, accessed through WRDS, https://wrds-www.wharton.upenn.edu/pages/get-data/ibes-thomson-reuters/ibes-academic/summary-history/summary-