Advances in Financial Machine Learning

As it relates to finance, this is the most exciting time to adopt a disruptive technology that will transform how everyone invests for generations. (Location 698)

Books about investments largely fall in one of two categories. On one hand we find books written by authors who have not practiced what they teach. They contain extremely elegant mathematics that describes a world that does not exist. (Location 701)

On the other hand we find books written by authors who offer explanations absent of any rigorous academic theory. They misuse mathematical tools to describe actual observations. Their models are overfit and fail when implemented. (Location 703)

Beating the wisdom of the crowds is harder than recognizing faces or driving cars. (Location 719)

The rate of failure in quantitative finance is high, particularly so in financial ML. (Location 723)

Discretionary portfolio managers (PMs) make investment decisions that do not follow a particular theory or rationale (if there were one, they would be systematic PMs). (Location 728)

They may rationalize those decisions based on some story, but there is always a story for every decision. (Location 730)

If you have ever attended a meeting of discretionary PMs, you probably noticed how long and aimless they can be. (Location 732)

If you have been asked to develop ML strategies on your own, the odds are stacked against you. (Location 745)

Every successful quantitative firm I am aware of applies the meta-strategy paradigm (López de Prado [2014]). (Location 751)

No particular individual is responsible for these discoveries, as they are the outcome of team efforts where everyone contributes. (Location 757)

This is the station responsible for transforming raw data into informative signals. These informative signals have some predictive power over financial variables. Team members are experts in information theory, signal extraction and processing, visualization, labeling, weighting, classifiers, and feature importance techniques. (Location 795)

Such a finding is not an investment strategy on its own, and can be used in alternative ways: execution, monitoring of liquidity risk, market making, position taking, etc. (Location 799)

A strategist will parse through the libraries of features looking for ideas to develop an investment strategy. (Location 805)

The goal of the strategist is to make sense of all these observations and to formulate a general theory that explains them. (Location 807)

Team members are data scientists with a deep knowledge of financial markets and the economy. Remember, the theory needs to explain a large collection of important features. (Location 808)

Initially, the strategy is run on data observed after the end date of the backtest. Such a period may have been reserved by the backtesters, or it may be the result of implementation delays. (Location 836)

At this point, the strategy is run on a live, real-time feed. In this way, performance will account for data parsing latencies, calculation latencies, execution delays, and other time lapses between observation and positioning. (Location 839)

Many investment managers believe that the secret to riches is to implement an extremely complex ML algorithm. They are setting themselves up for a disappointment. (Location 854)

Amateurs develop individual strategies, believing that there is such a thing as a magical formula for riches. In contrast, professionals develop methods to mass-produce strategies. The money is not in making a car, it is in making a car factory. (Location 980)

Think like a business. Your goal is to run a research lab like a factory, where true discoveries are not born out of inspiration, but out of methodic hard work. (Location 982)

Most discoveries in finance are false, due to multiple testing and selection bias. (Location 989)
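
Note to self: the multiple-testing point can be illustrated with a small simulation. This is only a sketch under assumed settings, not code from the book. The number of strategies, the one-year horizon, and the 1% daily volatility are all hypothetical choices. Every simulated strategy has zero true edge, yet picking the best of many backtests still yields an impressive-looking Sharpe ratio.

```python
import numpy as np

# Assumed, illustrative settings (not from the book).
n_strategies, n_days = 1000, 252
rng = np.random.default_rng(0)

# Daily returns drawn with zero true mean: no strategy has any real edge.
returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# Annualized Sharpe ratio of each simulated backtest.
sharpe = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)

# Selection bias: reporting only the best backtest out of many trials.
best = sharpe.max()
print(f"average Sharpe across trials: {sharpe.mean():.2f}")
print(f"best Sharpe after {n_strategies} trials: {best:.2f}")
```

The average Sharpe sits near zero, as it should, while the selected maximum does not. That gap comes entirely from the number of trials.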

Whatever you do, always ask yourself in what way you may be overfitting. Be skeptical about your own work, and constantly challenge yourself to prove that you are adding value. (Location 991)

The flexibility and power of ML techniques have a dark side. When misused, ML algorithms will confuse statistical flukes with patterns. (Location 1002)

The core audience of this book is investment professionals with a strong ML background. My goals are that you monetize what you learn in this book, help us modernize finance, and deliver actual value for investors. (Location 1048)

Once you have managed an investment portfolio long enough, the rules of the game will become clearer to you, along with the meaning of these chapters. (Location 1054)

Investment management is one of the most multi-disciplinary areas of research, and this book reflects that fact. (Location 1057)

Python has become the de facto standard language for ML, and I have to assume that you are an experienced developer. You must be familiar with scikit-learn (sklearn), pandas, numpy, scipy, multiprocessing, matplotlib and a few other libraries. (Location 1060)

This is where the bulk of automation has taken place so far, transforming the financial markets into ultra-fast, hyper-connected networks for exchanging information. (Location 1069)

An ML algorithm can spot patterns in a 100-dimensional world as easily as in our familiar 3-dimensional one. And while we all laugh when we see an algorithm make a silly mistake, keep in mind, algorithms have been around only a fraction of our millions of years. (Location 1085)

Not at all. No human is better at chess than a computer. And no computer is better at chess than a human supported by a computer. Discretionary PMs are at a disadvantage when betting against an ML algorithm, but it is possible that the best results are achieved by combining discretionary PMs with ML algorithms. (Location 1090)

In particular, Chapter 3 introduces a new technique called meta-labeling, which allows you to add an ML layer on top of a discretionary one. (Location 1094)

Financial ML methods do not replace theory. They guide it. An ML algorithm learns complex patterns in a high-dimensional space without being specifically directed. (Location 1110)

People mistrust what they do not understand. Their prejudices are rooted in ignorance, for which the Socratic remedy is simple: education. Besides, some of us enjoy using our brains, even though neuroscientists still have not figured out exactly how they work (a black box in itself). (Location 1118)

there are many shared generic problems you will face: data structuring, labeling, weighting, stationary transformations, cross-validation, feature selection, feature importance, overfitting, backtesting, etc. (Location 1128)

My advice is that you start by reading the references listed at the end of the chapter. When I wrote the book, I had to assume the reader was familiar with the existing literature, or this book would lose its focus. (Location 1141)

There are two reasons. First, backtest overfitting is arguably the most important open problem in all of mathematical finance. (Location 1148)

However, the reader may be surprised to learn that, in fact, U.S. National Laboratories are among the research centers with the longest track record and experience in using ML. (Location 1170)

In Chapter 22, Drs. Horst Simon and Kesheng Wu offer the perspective of a deputy director and a project leader at a major U.S. National Laboratory specializing in large-scale scientific research involving big data, high-performance computing, and ML. (Location 1176)

Snippet 3.1 computes the daily volatility at intraday estimation points, applying a span of span0 days to an exponentially weighted moving standard deviation. (Location 1889)
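
Note to self: a minimal sketch of the computation this highlight describes, written from the description and not copied from the book's Snippet 3.1. The function name `daily_vol`, the synthetic hourly price series, and the choice to pair each timestamp with the last observation at or before one day earlier are assumptions. Only `span0` comes from the highlight.

```python
import numpy as np
import pandas as pd

def daily_vol(close: pd.Series, span0: int = 100) -> pd.Series:
    """Exponentially weighted std of one-day returns at each intraday timestamp."""
    # Position of the last observation at or before (t - 1 day) for every t.
    prev_pos = close.index.searchsorted(
        close.index - pd.Timedelta(days=1), side="right"
    ) - 1
    # Drop timestamps that have no observation at least one day earlier.
    valid = prev_pos >= 0
    prices = close.to_numpy()
    one_day_ret = pd.Series(
        prices[valid] / prices[prev_pos[valid]] - 1.0,
        index=close.index[valid],
    )
    # Exponentially weighted moving standard deviation with span span0.
    return one_day_ret.ewm(span=span0).std()

# Synthetic hourly prices over ten days (illustrative data only).
rng = np.random.default_rng(42)
idx = pd.date_range("2020-01-01", periods=240, freq=pd.Timedelta(hours=1))
close = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.001, len(idx)))), index=idx)

vol = daily_vol(close, span0=50)
print(vol.tail())
```

The first day of observations has no one-day-earlier price, so the output starts 24 hours into the series, and its first value is NaN because a standard deviation needs at least two returns.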