Delta hedging is an options strategy that uses delta to reduce the risk associated with price movements in the underlying asset by taking offsetting long or short positions, while minimizing trading costs. We employed Deep Reinforcement Learning (DRL) to address this hedging problem in a realistic setting, including discrete-time trading with a high level of market friction. (Page 1)
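As an illustration of the baseline being described, the sketch below computes a delta and the share trade needed to offset it. It assumes a European call priced under Black-Scholes and a long position of 100-share contracts; neither the pricing model nor the names `bs_call_delta` and `hedge_trade` come from the source, and rounding to whole shares is only one possible way to model discrete trading.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot: float, strike: float, vol: float,
                  t_years: float, rate: float = 0.0) -> float:
    """Black-Scholes delta of a European call (assumed model, not from the source)."""
    if t_years <= 0.0:
        # At expiry the delta is 1 if in the money, else 0.
        return 1.0 if spot > strike else 0.0
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (
        vol * math.sqrt(t_years)
    )
    return norm_cdf(d1)


def hedge_trade(current_shares: int, spot: float, strike: float, vol: float,
                t_years: float, contracts: int = 1, multiplier: int = 100) -> int:
    """Shares to buy (+) or sell (-) so a long call position is delta neutral.

    The target is rounded to whole shares, a hypothetical discretization.
    """
    delta = bs_call_delta(spot, strike, vol, t_years)
    target_shares = -round(contracts * multiplier * delta)
    return target_shares - current_shares


# Long one at-the-money call, no existing hedge: short roughly delta * 100 shares.
trade = hedge_trade(current_shares=0, spot=100.0, strike=100.0, vol=0.2, t_years=1.0)
```

Rebalancing with this rule at every step is what makes pure delta hedging expensive under friction: each call to `hedge_trade` that returns a non-zero value incurs a transaction cost.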

We show that PPO performs best among the DRL algorithms considered. Moreover, PPO has a significantly shorter training time and generates a more financially sensible policy than the other DRL methods. (Page 2)

DQL and PPO with reward clipping both see their reward collapse if training runs too long. DQL with Pop-art fixes this issue. In general, PPO converges faster than all other methods. (Page 9)
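To make the contrast between the two techniques concrete, here is a minimal sketch of reward clipping next to a scalar version of the Pop-art update. It follows the general Pop-art idea (track running statistics of the targets, and rescale the output layer so unnormalized predictions are preserved); the class name, the single-weight output layer, and the step size `beta` are assumptions for illustration and are not taken from the source.

```python
import math


def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Reward clipping: discards magnitude information outside [low, high]."""
    return max(low, min(high, reward))


class PopArtScalar:
    """Scalar Pop-art sketch: adaptive target normalization with output preservation."""

    def __init__(self, beta: float = 0.01) -> None:
        self.beta = beta  # step size for the running statistics (assumed value)
        self.mu = 0.0     # running mean of targets
        self.nu = 1.0     # running second moment of targets
        self.w = 1.0      # weight of a one-unit linear output layer
        self.b = 0.0      # bias of that output layer

    @property
    def sigma(self) -> float:
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def unnormalized(self, h: float) -> float:
        """Prediction on the original target scale for feature h."""
        return self.sigma * (self.w * h + self.b) + self.mu

    def normalize(self, target: float) -> float:
        """Target on the normalized scale used for the loss."""
        return (target - self.mu) / self.sigma

    def update_stats(self, target: float) -> None:
        """Update mu and sigma, then rescale w and b so predictions are unchanged."""
        mu_old, sigma_old = self.mu, self.sigma
        self.mu = (1.0 - self.beta) * self.mu + self.beta * target
        self.nu = (1.0 - self.beta) * self.nu + self.beta * target ** 2
        sigma_new = self.sigma
        self.w *= sigma_old / sigma_new
        self.b = (sigma_old * self.b + mu_old - self.mu) / sigma_new


pop_art = PopArtScalar()
before = pop_art.unnormalized(0.7)
pop_art.update_stats(-250.0)  # a large-magnitude target, e.g. a costly hedging episode
after = pop_art.unnormalized(0.7)
```

In this sketch `clip_reward(-250.0)` collapses to `-1.0`, while the Pop-art path keeps the full target and only changes the scale on which the loss is computed; `before` and `after` agree up to floating-point error.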

All DRL agents have a similar policy to baseline delta hedging. (Page 9)

All DRL agents find a more optimal strategy: their average realized volatility is much lower than that of baseline delta hedging, but slightly larger than zero because, with discrete-time trading, the agents tend to be slightly off between hedging times. (Page 9)

All DRL agents realize a much lower average cost while maintaining the hedge, showing their ability to balance trading error against costs. (Page 10)

Overall, PPO performs better than both DQL and DQL with Pop-art, with a lower average cost of 54.87 and a standard deviation of 12.70, at the expense of slightly higher volatility of total P&L. This reflects a more cost-conscious trade-off between costs and trading errors. (Page 10)

All DRL agents outperform the delta baseline, as their P&L t-statistics are much more often close to zero and insignificant. (Page 10)
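One way to read this metric is as a one-sample t-statistic testing whether the mean P&L differs from zero. The sketch below assumes that definition and a list of per-period P&L values; the source does not spell out the exact formula, and `pnl_t_statistic` is a hypothetical helper name.

```python
import math
import statistics


def pnl_t_statistic(pnl: list[float]) -> float:
    """One-sample t-statistic of the mean P&L against zero (assumed definition)."""
    n = len(pnl)
    if n < 2:
        raise ValueError("need at least two P&L observations")
    mean = statistics.mean(pnl)
    std = statistics.stdev(pnl)  # sample standard deviation (n - 1 denominator)
    if std == 0.0:
        raise ValueError("P&L has zero variance; t-statistic is undefined")
    return mean / (std / math.sqrt(n))


# A hedge whose P&L fluctuates around zero gives a t-statistic near zero.
balanced = pnl_t_statistic([1.0, -1.0, 2.0, -2.0])
# A hedge that drifts in one direction gives a larger value.
drifting = pnl_t_statistic([1.0, 2.0, 3.0])
```

Under this reading, a t-statistic close to zero means the hedged position shows no systematic gain or loss, which is the behavior expected of a good replication strategy.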

All agents trade less when costs are introduced, and the number of random actions (individual dots deviating from the piecewise segments) decreases. (Page 10)