Advances in Financial Machine Learning

As it relates to finance, this is the most exciting time to adopt a disruptive technology that will transform how everyone invests for generations. (Location 698)

Books about investments largely fall in one of two categories. On one hand we find books written by authors who have not practiced what they teach. They contain extremely elegant mathematics that describes a world that does not exist. (Location 701)

On the other hand we find books written by authors who offer explanations absent of any rigorous academic theory. They misuse mathematical tools to describe actual observations. Their models are overfit and fail when implemented. (Location 703)

Beating the wisdom of the crowds is harder than recognizing faces or driving cars. (Location 719)

The rate of failure in quantitative finance is high, particularly so in financial ML. (Location 723)

Discretionary portfolio managers (PMs) make investment decisions that do not follow a particular theory or rationale (if there were one, they would be systematic PMs). (Location 728)

They may rationalize those decisions based on some story, but there is always a story for every decision. (Location 730)

If you have ever attended a meeting of discretionary PMs, you probably noticed how long and aimless they can be. (Location 732)

If you have been asked to develop ML strategies on your own, the odds are stacked against you. (Location 745)

Every successful quantitative firm I am aware of applies the meta-strategy paradigm (López de Prado [2014]). (Location 751)

No particular individual is responsible for these discoveries, as they are the outcome of team efforts where everyone contributes. (Location 757)

This is the station responsible for transforming raw data into informative signals. These informative signals have some predictive power over financial variables. Team members are experts in information theory, signal extraction and processing, visualization, labeling, weighting, classifiers, and feature importance techniques. (Location 795)

Such a finding is not an investment strategy on its own, and can be used in alternative ways: execution, monitoring of liquidity risk, market making, position taking, etc. (Location 799)

A strategist will parse through the libraries of features looking for ideas to develop an investment strategy. (Location 805)

The goal of the strategist is to make sense of all these observations and to formulate a general theory that explains them. (Location 807)

Team members are data scientists with a deep knowledge of financial markets and the economy. Remember, the theory needs to explain a large collection of important features. (Location 808)

Initially, the strategy is run on data observed after the end date of the backtest. Such a period may have been reserved by the backtesters, or it may be the result of implementation delays. (Location 836)

At this point, the strategy is run on a live, real-time feed. In this way, performance will account for data parsing latencies, calculation latencies, execution delays, and other time lapses between observation and positioning. (Location 839)

Many investment managers believe that the secret to riches is to implement an extremely complex ML algorithm. They are setting themselves up for a disappointment. (Location 854)

Amateurs develop individual strategies, believing that there is such a thing as a magical formula for riches. In contrast, professionals develop methods to mass-produce strategies. The money is not in making a car, it is in making a car factory. (Location 980)

Think like a business. Your goal is to run a research lab like a factory, where true discoveries are not born out of inspiration, but out of methodic hard work. (Location 982)

Most discoveries in finance are false, due to multiple testing and selection bias. (Location 989)
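The selection-bias effect can be illustrated with a small simulation. This is a minimal sketch, not taken from the book: it assumes every candidate strategy has a true Sharpe ratio of zero (i.i.d. Gaussian daily returns) and reports the best in-sample annualized Sharpe ratio among the trials. The function name `best_sharpe_under_null` and all parameter values are illustrative assumptions.

```python
import numpy as np

def best_sharpe_under_null(n_trials, n_obs=252, seed=0):
    """Best in-sample annualized Sharpe ratio among n_trials skill-less strategies."""
    rng = np.random.default_rng(seed)
    # Each row is one strategy's daily returns; the true mean is zero by construction.
    rets = rng.normal(0.0, 0.01, size=(n_trials, n_obs))
    sharpe = rets.mean(axis=1) / rets.std(axis=1, ddof=1) * np.sqrt(252)
    # Reporting only the maximum mimics selecting the best backtest.
    return float(sharpe.max())

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(best_sharpe_under_null(n), 2))
```

Under these assumptions the best observed Sharpe ratio tends to rise as more strategies are tried, even though none has any skill. A reported result therefore says little unless the number of trials behind it is known.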

Whatever you do, always ask yourself in what way you may be overfitting. Be skeptical about your own work, and constantly challenge yourself to prove that you are adding value. (Location 991)

The flexibility and power of ML techniques have a dark side. When misused, ML algorithms will confuse statistical flukes with patterns. (Location 1002)

The core audience of this book is investment professionals with a strong ML background. My goals are that you monetize what you learn in this book, help us modernize finance, and deliver actual value for investors. (Location 1048)

Once you have managed an investment portfolio long enough, the rules of the game will become clearer to you, along with the meaning of these chapters. (Location 1054)

Investment management is one of the most multi-disciplinary areas of research, and this book reflects that fact. (Location 1057)

Python has become the de facto standard language for ML, and I have to assume that you are an experienced developer. You must be familiar with scikit-learn (sklearn), pandas, numpy, scipy, multiprocessing, matplotlib and a few other libraries. (Location 1060)

This is where the bulk of automation has taken place so far, transforming the financial markets into ultra-fast, hyper-connected networks for exchanging information. (Location 1069)

An ML algorithm can spot patterns in a 100-dimensional world as easily as in our familiar 3-dimensional one. And while we all laugh when we see an algorithm make a silly mistake, keep in mind, algorithms have been around only a fraction of our millions of years. (Location 1085)

Not at all. No human is better at chess than a computer. And no computer is better at chess than a human supported by a computer. Discretionary PMs are at a disadvantage when betting against an ML algorithm, but it is possible that the best results are achieved by combining discretionary PMs with ML algorithms. (Location 1090)

In particular, Chapter 3 introduces a new technique called meta-labeling, which allows you to add an ML layer on top of a discretionary one. (Location 1094)
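As a rough illustration of the idea, the sketch below builds binary meta-labels from the sides chosen by a primary (for example, discretionary) decision maker. It is a simplified assumption-laden example, not the book's procedure: the function name `meta_labels`, the use of a plain forward return, and the sample numbers are all hypothetical.

```python
import numpy as np

def meta_labels(side, fwd_ret):
    """Return 1 where the primary call was profitable, 0 otherwise.

    side    : +1 (long) or -1 (short) chosen by the primary decision maker
    fwd_ret : realized return over the holding period (hypothetical input)
    """
    side = np.asarray(side, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    # A bet is profitable when its side and the realized return share a sign.
    return (side * fwd_ret > 0).astype(int)

if __name__ == "__main__":
    sides = [1, -1, 1, -1]
    returns = [0.02, 0.01, -0.03, -0.04]
    print(meta_labels(sides, returns))
```

A secondary classifier could then be trained on these 0/1 labels to decide whether to act on each primary call, leaving the choice of side to the primary layer.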

Financial ML methods do not replace theory. They guide it. An ML algorithm learns complex patterns in a high-dimensional space without being specifically directed. (Location 1110)

People mistrust what they do not understand. Their prejudices are rooted in ignorance, for which the Socratic remedy is simple: education. Besides, some of us enjoy using our brains, even though neuroscientists still have not figured out exactly how they work (a black box in itself). (Location 1118)

There are many shared generic problems you will face: data structuring, labeling, weighting, stationary transformations, cross-validation, feature selection, feature importance, overfitting, backtesting, etc. (Location 1128)

My advice is that you start by reading the references listed at the end of the chapter. When I wrote the book, I had to assume the reader was familiar with the existing literature, or this book would lose its focus. (Location 1141)

There are two reasons. First, backtest overfitting is arguably the most important open problem in all of mathematical finance. (Location 1148)

However, the reader may be surprised to learn that, in fact, U.S. National Laboratories are among the research centers with the longest track record and experience in using ML. (Location 1170)

In Chapter 22, Drs. Horst Simon and Kesheng Wu offer the perspective of a deputy director and a project leader at a major U.S. National Laboratory specializing in large-scale scientific research involving big data, high-performance computing, and ML. (Location 1176)

Snippet 3.1 computes the daily volatility at intraday estimation points, applying a span of span0 days to an exponentially weighted moving standard deviation. (Location 1889)
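The computation described can be sketched as follows. This is not the book's Snippet 3.1 verbatim but a minimal reconstruction from the description above. It assumes `close` is a pandas Series of prices on a sorted `DatetimeIndex`; the function name `daily_vol` and the default `span0=100` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def daily_vol(close: pd.Series, span0: int = 100) -> pd.Series:
    """EWM standard deviation of roughly one-day returns at each intraday timestamp."""
    # For each timestamp, find the position of the last bar at least one day earlier.
    one_day_ago = close.index - pd.Timedelta(days=1)
    prev_pos = close.index.searchsorted(one_day_ago, side="right") - 1
    # Timestamps in the first day have no earlier bar to compare against; drop them.
    valid = prev_pos >= 0
    prices = close.to_numpy()
    ret = pd.Series(
        prices[valid] / prices[prev_pos[valid]] - 1.0,
        index=close.index[valid],
    )
    # Exponentially weighted moving standard deviation with a span of span0.
    return ret.ewm(span=span0).std()
```

The result is indexed by the intraday timestamps that have a bar at least one day earlier, so it can be used to scale quantities such as barrier widths at the same estimation points.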