## Agent-based modeling and simulation in US equity markets – Part 3

This is the final post of an introduction to agent-based models in US equity markets. The first post provided a definition of a model and a brief overview of how economists use models to simplify reality. The second post introduced agent-based models, simulation, and how they are related. We will conclude with an introduction to the polar types of agents: zero-intelligence and learning agents.

Zero intelligence versus learning agents

Gode and Sunder (1993) define a zero intelligence (ZI) trader as a trader that “has no intelligence, does not seek or maximize profits, and does not observe, remember, or learn.” Farmer et al. (2005) also provide a description: “The model makes the simple assumption that agents place orders to buy or sell at random, subject to constraints imposed by current prices.” They go on to explain that their ZI traders do observe current prices – a deviation from the previous definition. The constraints are essentially dynamic bounds on limit order prices.

The agents in my Tick Pilot paper are ZI as characterized in Farmer et al. (2005): they observe the top of book (price and size) and place orders to buy or sell consistent with their pricing heuristic. The buying and selling choice is random subject to pricing constraints.
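To make the Farmer et al. (2005) flavor of zero intelligence concrete, here is a minimal sketch of a trader that picks a side at random and then picks a limit price at random within bounds set by the current top of book. The function name `zi_order`, the `max_offset` parameter and the uniform price offset are illustrative assumptions; this is not the pricing heuristic used in the Tick Pilot paper.

```python
import random

def zi_order(best_bid, best_ask, max_offset=5, rng=random):
    """Return a random limit order constrained by the current top of book."""
    side = rng.choice(["buy", "sell"])       # buy/sell choice is random
    offset = rng.randint(1, max_offset)      # random distance from the opposite quote
    if side == "buy":
        price = best_ask - offset            # buy limit stays below the best ask
    else:
        price = best_bid + offset            # sell limit stays above the best bid
    return {"side": side, "price": price, "quantity": 1}

rng = random.Random(42)                      # seeded for reproducibility
orders = [zi_order(best_bid=999, best_ask=1001, rng=rng) for _ in range(5)]
```

The trader never looks at its own history or profits; the only thing it observes is the pair of prices passed in, which is the sense in which the price constraints are dynamic.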

Holland et al. (1986) provide a description of learning agents. This description is summarized in Beinhocker (2006):

1. Agents interact with other agents and the environment.
2. The agent has a goal or set of goals and can perceive the gap between its current state and desired state.
3. The agent has a set of heuristics (rules of thumb) that map the current state into decisions. This is called the agent’s mental model.
4. The agent’s mental model tracks which rules have helped it achieve its goals. Historically successful rules are used more often than less successful rules. Feedback from the environment causes the agent to learn over time.
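The four points above can be sketched as a small reinforcement scheme. This is only one of many ways to read the description: the class name `LearningAgent`, the additive weight update and the toy payoff rule are assumptions made for illustration, not something taken from Holland et al. (1986) or Beinhocker (2006).

```python
import random

class LearningAgent:
    """Hypothetical agent that uses historically successful rules more often."""

    def __init__(self, rules, seed=None):
        self.rules = rules                             # rule name -> function(state) -> action
        self.weights = {name: 1.0 for name in rules}   # the "mental model"
        self.rng = random.Random(seed)

    def act(self, state):
        names = list(self.weights)
        weights = [self.weights[n] for n in names]
        name = self.rng.choices(names, weights=weights)[0]  # favor successful rules
        return name, self.rules[name](state)

    def learn(self, name, payoff, rate=0.1):
        # feedback from the environment; keep weights positive so no rule dies out
        self.weights[name] = max(0.01, self.weights[name] + rate * payoff)


rules = {"buy_dips": lambda price: "buy" if price < 100 else "hold",
         "sell_rallies": lambda price: "sell" if price > 100 else "hold"}
agent = LearningAgent(rules, seed=1)
for _ in range(50):
    name, action = agent.act(95)                  # the agent perceives a price of 95
    payoff = 1.0 if action == "buy" else -1.0     # toy environment rewards buying
    agent.learn(name, payoff)
```

After a few rounds the weight on `buy_dips` exceeds the weight on `sell_rallies`, so the agent selects it more often.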

Holland and Miller (1991) define complex (1, 2, 3) adaptive (4, 5) systems:

1. A network of interacting agents.
2. Exhibits dynamic behavior that emerges from the individual agent activities.
3. Aggregate behavior can be described without detailed knowledge of the individual agents.
4. Agent actions can be assigned a value (payoff, fitness, utility).
5. Agent behaves so as to increase this value over time.

Beinhocker (2006) summarizes the evolutionary (learning) approach: differentiate, select, amplify. The evolutionary approach is implemented on a computer with genetic algorithms.

Genetic algorithm

Wikipedia gets the final word: In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection.
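A toy example may help readers who have never seen one. The sketch below evolves bit strings toward all ones using tournament selection, one-point crossover and bit-flip mutation. The function name `evolve`, the fitness function and all parameter values are assumptions chosen for illustration; nothing here comes from the Tick Pilot model.

```python
import random

def evolve(bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm that maximizes the number of 1s in a bit string."""
    rng = random.Random(seed)
    fitness = sum                                  # fitness = count of 1 bits
    # differentiate: start from a random, varied population
    pop = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]      # elitism: carry the best forward
        while len(children) < pop_size:
            # select: each parent is the fitter of two randomly drawn candidates
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, bits - 1)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            # mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children                             # amplify: fitter material spreads
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

best, history = evolve()
```

Because the best individual is always copied into the next generation, the best fitness recorded in `history` never decreases from one generation to the next.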

There’s a lot going on in this definition. The links are helpful – especially to folks with a hard science background. But, I suspect it is all a bit of a mystery to those who never studied evolutionary biology, computer science, physics, etc. How would you explain zero intelligence, learning agents and genetic algorithms to an accountant, attorney or MBA?

In an upcoming post, I will provide some specific examples of failed attempts to communicate with a wider audience about simulation and agent-based modeling applied to US equity markets.

## Agent-based modeling and simulation in US equity markets – Part 2

This is the second of an occasional series of posts on the application of agent-based modeling to US equity markets. We left off with the definition of a model and a brief overview of how economists use models to simplify reality. Agent-based models can be used to simplify reality as well. This post will tackle the nuts and bolts of ABMs.

Simulation

According to Wikipedia:

1. Simulation is the imitation of the operation of a real-world process or system.
2. Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
3. Discrete-event simulation (DES) models the operation of a system as a discrete sequence of events in time.

In my agent-based model of the US equity Tick Pilot, I employ simulation to imitate the real-world process of order submission to a centralized limit order book and the system that book uses to match orders and generate prices and sizes for trades. Note: the US equity markets are not centralized – they are fragmented. The model simplification makes the simulation much easier to code and faster to run without harming any inference regarding market-maker profits or participation. I also employ Monte Carlo methods with pseudorandom number generators to produce distributions of results, thereby facilitating statistical tests for significance. And finally, the model employs discrete-event simulation. The first discrete event is the time step. For each step, a subset of the traders is chosen and randomly queued. Only one trader agent can interact with the order book at a particular instant in computer time. This is a very simple discrete-event simulation. In reality, many traders are attempting to interact with the order book at once. But there, too, only one event can be processed at any given instant. Otherwise, there is indeterminacy in the book – much like when a database is being updated and accessed at the same time.
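The combination of time steps, a randomly ordered queue and repeated seeded runs can be sketched in a few lines. This is a stripped-down illustration, not the Tick Pilot code: `run_once`, `monte_carlo` and the activation probability `p_active` are hypothetical names, and the "event" is just a record of which trader acted.

```python
import random

def run_once(seed, n_traders=10, steps=100, p_active=0.3):
    """One simulation: each step, queue a random subset of traders one at a time."""
    rng = random.Random(seed)               # one pseudorandom stream per run
    events = []
    for step in range(1, steps + 1):
        active = [t for t in range(n_traders) if rng.random() < p_active]
        rng.shuffle(active)                 # random queue order within the step
        for position, trader in enumerate(active):
            # only one trader touches the (imaginary) book at a time
            events.append((step, position, trader))
    return events

def monte_carlo(n_runs=20):
    """Repeat the simulation with different seeds and collect one statistic per run."""
    return [len(run_once(seed)) for seed in range(n_runs)]

event_counts = monte_carlo()
```

Each seed yields one draw from the distribution of outcomes, and rerunning with the same seed reproduces the same sequence of events.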

How would you explain simulation to an accountant, attorney or MBA?

Agent-based modeling

According to Wikipedia, an agent-based model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. This definition is simple enough but doesn’t really describe what an agent-based model is or even what an agent is. Bookstaber (2017) distills the essence of ABMs:

1. A set of agents that are typically heterogeneous and that can act with some degree of independence or autonomy. No centralized control.
2. At the start of each time period each agent observes its environment and acts according to its own heuristic. The agent’s environment is only a local view of the overall system.
3. Agents’ actions change the environment.
4. In the next period, each agent sees its new environment, altered based on the actions of the previous period, and takes action again. Thus there is an interaction between the agents and the environment, and between agents.
5. The components are the agents, the environment, heuristics, interactions, and dynamics.

Let’s apply the essence of ABMs to my Tick Pilot model to see if it satisfies Bookstaber’s view of an ABM.

1. The agents in my model are heterogeneous. There are a variety of agents, including generic liquidity providers, market makers, liquidity takers, and penny jumpers. Many of these agents have randomly generated arrival times (i.e., there is heterogeneity within agent classes in addition to among agent classes). Some agents choose limit order prices at random. The choice of their arrival times is independent and no agent is constrained by the activities of other agents.
2. All of the agents view the top of the book (i.e., the best bid and ask prices and associated sizes at those prices are the agent’s environment). The agents take actions based on that specific view of the book. The agents have no insight into what other agents have done or will do in the future.
3. When liquidity providers, market makers and penny jumpers arrive, they (usually) add orders to the book, thereby altering the state of the environment. These agents potentially cancel outstanding orders, which also alters the state of the environment. And finally, liquidity takers alter the book when they trade with standing limit orders.
4. After each discrete event in the queue during one time step, the selected agents view any changes to the top of the book and act accordingly. The takers and providers interact with each other via the book.
5. The components are the trader agents, the limit order book, the rules that the traders use to generate orders, the interactions with the limit order book, and the dynamics of price and size traded and available at each discrete step.

My model varies somewhat from Bookstaber’s essence of ABMs because of the artificiality of the time step and the fact that several events can occur in one time step. It is best to view each discrete event as occurring at a specific fraction of a time step. This is consistent with how real market data is generated: if two or more messages or transactions occur within the minimum time step (a microsecond in most US equity market feeds), then we do not know their exact time stamps, but we do know the order of the events within that microsecond. And, in my model, each selected (queued) agent receives an up-to-date view of the top of book before they act within each time step.

Tesfatsion provides seven requirements for agents and agent-based models (MP1 – MP7 are taken directly from the linked website without alteration):

(MP1) Agent Definition: An agent is a software entity within a computationally constructed world capable of acting over time on the basis of its own state, i.e., its own internal data, attributes, and methods.

(MP2) Agent Scope: Agents can represent individuals, social groupings, institutions, biological entities, and/or physical entities.

(MP3) Agent Local Constructivity: The action of an agent at any given time is determined as a function of the agent’s own state at that time.

(MP4) Agent Autonomy: Coordination of agent interactions cannot be externally imposed by means of free-floating restrictions, i.e., restrictions not embodied within agent states.

(MP5) System Constructivity: The state of the modeled system at any given time is determined by the ensemble of agent states at that time.

(MP6) System Historicity: Given initial agent states, all subsequent events in the modeled system are determined solely by agent interactions.

(MP7) Modeler as Culture-Dish Experimenter: The role of the modeler is limited to the setting of initial agent states and to the non-perturbational observation, analysis, and reporting of model outcomes.

Many of these are consistent with Bookstaber’s characterization of ABMs. But there doesn’t appear to be any role for the environment. I think if we re-characterize the limit order book as an institutional agent as defined in MP2, then there is some consistency.

How would you explain agents and agent-based modeling to an accountant, attorney or MBA?

## Agent-based modeling and simulation in US equity markets – Part 1

This is the first of an occasional series of posts on the application of agent-based modeling to US equity markets. I chose the US equity markets because they serve as the motivating paradigm for the vast majority of historical academic work on agent-based modeling of equity markets. Despite the appearance of such a narrow focus, this particular perspective is relevant because the history of US equity markets reflects and occasionally foretells the variety of market structures observed around the globe today. For example, prior to Regulation National Market System (Reg NMS), US equity markets were a combination of fragmented dealer-intermediated continuous two-sided auctions, call auctions, and over-the-counter networks. Nowadays, they are a combination of fragmented order-driven two-sided continuous auctions, electronically-intermediated call auctions, and a variety of dark pools.

There are three primary sources of agent-based models and/or simulation-based models of equity markets: academics (including academics in government), for-profit companies and not-for-profit government contractors. My discussion and examples will be focused on academic work because academics are the ones most likely to publish their results. It would come as no surprise to learn that hedge funds and other market participants employ simulations as a part of their R&D apparatus. For example, backtesting trading strategies with historical data requires a technical infrastructure that would also support some forms of ABM or simulation. However, for-profit trading firms are not very likely to publish their findings. Even not-for-profit government contractors (and academics who aspire to be) have an incentive to withhold critical model details, thereby making validation and replication difficult or impossible.

In the introduction to this website, I set out my true aims. One of these is increasing awareness of the utility of applying agent-based modeling to financial markets. Toward that end, I will begin with some definitions and provide some examples of how I think we can do better at communicating with market practitioners, regulators, and other professionals who are not receptive to deep and lengthy discussions of the scientific method. I will source many of the definitions from Wikipedia; not because I think it is the final arbiter of scientific rigor, but because I don’t necessarily want to argue about definitions. I just want a common source everyone can agree is the source – even if they don’t agree with the precision (or lack thereof) contained within.

A model

Wikipedia defines a conceptual model as a representation of a system using general rules and concepts and a scientific model as a simplified and idealized understanding of physical systems. A model is a simplification of reality. Economists use a variety of models to simplify our true complex economy. Theoretical models are mathematical frameworks. Simplifications are introduced to facilitate analytical tractability. In other words, the framework (i.e., the construction of the model) is simplified in order to get answers, right or wrong. Econometric models are empirical statistical models, typically applied to real economic data. Simplifications involve specification (choosing a type of model that can be estimated or that suits the available data), variable selection (choosing variables that are consistent with the available data), and imposing (or ignoring) untested assumptions regarding the unbiasedness or consistency of parameter estimates. Yet economists continue to use these models because they can be useful despite their shortcomings. George Box, a statistician, summarized the practical use of statistical models in a paper with a section entitled “All models are wrong but some are useful.”

Agent-based models are simplifications of reality, too. To maintain usefulness, the agent-based modeler abstracts away features of the real economy that are unnecessary and accentuates features that are necessary for generating insights into the proposed problem. If the agent-based modeler is successful, the model is useful. But the model is still a simplification, and, if Box is right, it is wrong, too!

How would you explain economic modeling to an accountant, attorney or MBA?

In the next post, I will introduce simulation and agent-based modeling and discuss the relationship between them.

## Collecting the agent-based model simulation output results with Python

This is the final post of the coding project in support of replicating the results in my Tick Pilot Agent-Based Modeling paper. The previous post developed the Runner class for running a single simulation. The final step is to write a wrapper that imports the Runner and runs a set of related simulations and then collects output data to be used as inputs for the charts and tables in the paper. First I will walk through the wrapper code. Then I will provide some details for running the entire set of 4,800 simulations on AWS EC2 Ubuntu servers.

The wrapper strategy is to generate a temporary hdf5 file to hold a set of tables containing results for one simulation. This data is then aggregated and summarized (i.e., munged) and stored in a bunch of lists at the end of each simulation. After the final simulation, these interim results are stored as tables in a summary hdf5 file. These summary tables contain the inputs to the charts and tables of the paper. This strategy will become clear as we walk through it. The full code is available on GitHub as runwrapper2017mpi_r4.py. First, the imports:

import random
import time
import numpy as np
import pandas as pd

from pyziabm.runner2017mpi_r4 import Runner

Pandas and numpy help with the munging and hdf5 file manipulations. Numpy and random are used to generate seeds for the random numbers. Time is an optional import for timing the individual simulations. The final import is the Runner class. Runner generates an hdf5 file with 4 tables: trades, mmp, tob, and orders. After each individual simulation, these tables are aggregated to summary results and stored to collector lists: participation_collector, position_collector, profit_collector, spread_collector, canceltrade_collector, by_mm_collector, and returns_collector.

Participation

The first function reads the trades table into a pandas DataFrame, creates the summaries, and appends the summary as a dict to the participation_collector.

def participation_to_list(h5in, outlist):
    ...
    if 'p999999' in lt_df.index:
        lt_df.drop('p999999', inplace=True)
    ...
    providers = ltsum_df.index.unique()
    market_makers = [x for x in providers if x.startswith('m')]
    market_makers.append('j0')
    # .ix was removed from pandas; reindex keeps a NaN row if 'j0' is absent
    ltsum_df = ltsum_df.reindex(market_makers)
    # j is the simulation run number set by the main loop
    part_dict = {'MCRun': j, 'MM_Participation': ltsum_df.loc['m0', 'Participation']}
    if 'j0' in providers:
        part_dict.update({'PJ_Participation': ltsum_df.loc['j0', 'Participation']})
    outlist.append(part_dict)

This strategy is repeated with varying levels of pandas munging in the functions to follow.

Position

def position_to_list(h5in, outlist):
    ...
    market_makers = mmcf_df.mmid.unique()
    for mm in market_makers:
        pos_dict = {}
        pos_dict['MCRun'] = j
        pos_dict['MarketMaker'] = mm
        pos_dict['Min'] = mmcf_df[mmcf_df.mmid == mm].position.min()
        pos_dict['Max'] = mmcf_df[mmcf_df.mmid == mm].position.max()
        outlist.append(pos_dict)

Profit

def profit_to_list(h5in, outlist):
    ...
    cash_flow = cash_flow.assign(NetCashFlow = cash_flow.CumulBuyCF + cash_flow.CumulSellCF)
    ...
    temp_df = temp_df[['NetCashFlow', 'NetCFPerShare']]
    outlist.append(temp_df)

def spread_to_list(h5in, outlist):
    ...
    last_df = indf.groupby('timestamp').last()
    last_df = last_df.loc[50:]
    ...
    outlist.append(spread_dict)

def tradesrets_to_list(h5in, outlist):
    ...
    minprice = indf.price.min()
    maxprice = indf.price.max()

    indf = indf.assign(ret = 100*indf.price.pct_change())
    indf = indf.assign(abs_ret = np.abs(indf.ret))
    lags = []
    autocorr = []
    abs_autocorr = []
    for i in range(1,51):
        ac = indf.ret.autocorr(lag = i)
        aac = indf.abs_ret.autocorr(lag = i)
        lags.append(i)
        autocorr.append(ac)
        abs_autocorr.append(aac)
    ar_df = pd.DataFrame({'lag': lags, 'autocorrelation': autocorr, 'autocorrelation_abs': abs_autocorr})
    ar_df.set_index('lag', inplace=True)
    clustering_constant = np.abs(ar_df.autocorrelation_abs.sum()/ar_df.autocorrelation.sum())

    ...
        'MeanRet': indf.ret.mean(), 'StdRet': indf.ret.std(), 'SkewRet': indf.ret.skew(),
        'KurtosisRet': indf.ret.kurtosis(), 'MCRun': j}
    outlist.append(returns_dict)
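For readers who want to see what the return and autocorrelation calls in the function above are doing, here is a dependency-free sketch. It assumes, as the pandas documentation states, that `Series.autocorr` is the Pearson correlation between a series and its lagged self. The helper names `pct_returns`, `pearson` and `autocorr` are mine and are not part of the wrapper.

```python
from statistics import mean

def pct_returns(prices):
    """Percentage returns between consecutive trade prices."""
    return [100 * (p1 / p0 - 1) for p0, p1 in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def autocorr(series, lag):
    """Correlation between the series and itself shifted by `lag` observations."""
    return pearson(series[lag:], series[:-lag])

zigzag = [1.0, -1.0] * 5                 # returns that flip sign every trade
lag1 = autocorr(zigzag, 1)               # strongly negative
lag2 = autocorr(zigzag, 2)               # strongly positive
```

Applying `autocorr` to the absolute returns instead of the raw returns gives the second series used in the clustering calculation. Note that `pearson` divides by zero when either input is constant, so a production version would need a guard.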

def canceltrade_to_list(h5in, outlist1, outlist2):
    ...
    both_sum = pd.merge(lpsum_df, ltsum_df, how='right', left_index=True, right_index=True)
    ...
    total_dict = {}
    total_dict['MCRun'] = j
    outlist1.append(total_dict)

    market_makers = [x for x in traders if (x.startswith('m') or x.startswith('j'))]
    for mm in market_makers:
        cto_dict = {}
        temp = both_sum.loc[mm, :]
        cto_dict['MCRun'] = j
        cto_dict['MarketMaker'] = mm
        outlist2.append(cto_dict)

The final function is called after all of the simulations have run. It loads the interim collector lists into pandas DataFrames and saves each DataFrame as a table in a summary hdf5 file. There are lots of ways to do this. Pandas is convenient. A final note regarding the future of the ABM test bed: the HDF5 library is written in C and ships with a C++ API. When the test bed code (i.e., all of the code in the previous posts up to and including the Runner) is re-written in C++, this last wrapper file will not require any changes to the bookkeeping functions.

def lists_to_h5(participation_list, position_list, profit_list, spread_list, canceltrade_list, by_mm_list, returns_list, h5out):
    participation_df = pd.DataFrame(participation_list)
    participation_df.set_index('MCRun', inplace=True)
    participation_df.to_hdf(h5out, 'participation', append=True, format='table', complevel=5, complib='blosc')

    position_df = pd.DataFrame(position_list)
    position_df.to_hdf(h5out, 'position', append=True, format='table', complevel=5, complib='blosc')

    profit_df = pd.concat(profit_list)
    profit_df.to_hdf(h5out, 'profit', append=True, format='table', complevel=5, complib='blosc')

    returns_df = pd.DataFrame(returns_list)
    returns_df.set_index('MCRun', inplace=True)
    returns_df.to_hdf(h5out, 'returns', append=True, format='table', complevel=5, complib='blosc')

    by_mm_df = pd.DataFrame(by_mm_list)
    by_mm_df.to_hdf(h5out, 'by_mm', append=True, format='table', complevel=5, complib='blosc')

User inputs are declared before running the loop. Most of these are fixed for the results portrayed in the paper. The empty collectors are created and the variable inputs are specified. In this case, whether the run includes a penny jumper (pj), the penny jumper alpha (alpha_pj), the trial number (trial_no) and the number of simulations are specified by the user. I chose to fix the mpi in this file and change it in the Runner, but mpi could be user-specified as well. The final input is the specification of the final summary hdf5 file. I alter this last input when I switch to the executable version of this file.

participation_collector = []
position_collector = []
profit_collector = []
spread_collector = []
canceltrade_collector = []
by_mm_collector = []
returns_collector = []

# User inputs
#num_mms=1
#mm_maxq=1
#mm_quotes=12
#mm_quote_range=60
#mm_delta=0.05
#num_takers=100
#taker_maxq=1
#num_providers=38
#provider_maxq=1
#q_provide=0.5
#alpha=0.0375
#mu=0.001
#delta=0.025
#lambda0=100
#wn=0.001
#c_lambda=5.0
#run_steps=100000
#mpi=1
#h5filename='test.h5'
alpha_pj = 0.001
pj = False
trial_no = 801
end = 101

h5_out = 'C:\\path\\to\\h5 files\\Trial %d\\ABMSmallCapSum.h5' % trial_no

The final step of the code is to run the simulations in a loop. For each simulation in the loop:

1. the random seeds are set,
2. the interim hdf5 file is specified (note the trial number and the simulation run number),
3. the simulation is run conditional on the value of pj,
4. the results are summarized and stored to lists,
5. the interim hdf5 file is removed (optional here, but not in the unix version),
6. the run time is reported

After the loop is run, the final hdf5 file is created.

start = time.time()
print(start)
for j in range(1, end):
    random.seed(j)
    np.random.seed(j)
    h5_file = 'C:\\Path\\to\\h5 files\\Trial %d\\smallcap_%d.h5' % (trial_no, j)
    if pj:
        # pj defaults to False in Runner, so it must be passed explicitly
        market1 = Runner(pj=True, alpha_pj=alpha_pj, h5filename=h5_file)
    else:
        market1 = Runner(h5filename=h5_file)

    participation_to_list(market1.h5filename, participation_collector)
    position_to_list(market1.h5filename, position_collector)
    profit_to_list(market1.h5filename, profit_collector)
    # os.remove(market1.h5filename)

    print('Run %d:  %.2f minutes' % (j, (time.time() - start)/60))
    start = time.time()

lists_to_h5(participation_collector, position_collector, profit_collector, spread_collector, canceltrade_collector, by_mm_collector, returns_collector, h5_out)
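The shape of this loop (seed, write an interim file, summarize, delete, collect) can be tried out without pyziabm or HDF5. In the sketch below, `fake_simulation` is a hypothetical stand-in for `Runner` that writes random numbers to a text file, and `summarize` stands in for the `*_to_list` functions; none of these names exist in the real wrapper.

```python
import os
import random
import tempfile

def fake_simulation(path, rng):
    """Hypothetical stand-in for Runner: write one interim results file."""
    with open(path, "w") as f:
        for _ in range(100):
            f.write(f"{rng.random()}\n")

def summarize(path):
    """Stand-in for the *_to_list functions: reduce the interim file to a dict."""
    with open(path) as f:
        values = [float(line) for line in f]
    return {"mean": sum(values) / len(values), "n": len(values)}

def run_batch(n_runs, workdir):
    collector = []
    for j in range(1, n_runs + 1):
        rng = random.Random(j)                    # 1. seed from the run number
        path = os.path.join(workdir, f"run_{j}.txt")  # 2. interim file per run
        fake_simulation(path, rng)                # 3. run the simulation
        summary = summarize(path)                 # 4. summarize and collect
        summary["MCRun"] = j
        collector.append(summary)
        os.remove(path)                           # 5. remove the interim file
    return collector

with tempfile.TemporaryDirectory() as workdir:
    results = run_batch(3, workdir)
    leftovers = os.listdir(workdir)
```

Seeding from the run number means any single run can be regenerated later without rerunning the whole batch.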

The unix version of this file requires making the file executable, telling the OS where to find the python interpreter, specifying unique output hdf5 filenames, and allowing for the user-specified inputs as arguments to the script. The differences are portrayed in this last code snippet:

Code Block 11: executable, sys & os, use of sys.argv

#!/home/username/anaconda3/bin/python3

import os
import random
import sys
import time
import numpy as np
import pandas as pd

from pyziabm.runner2017mpi_r4 import Runner

...

alpha_pj = float(sys.argv[3])
pj = int(sys.argv[2])
trial_no = int(sys.argv[1])
end = 101

h5_out = '/home/username/h5/ABMSmallCapSum_%d.h5' % trial_no
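The snippet above reads positional arguments straight from `sys.argv`. As an alternative (not what the wrapper on GitHub does), the standard library's `argparse` can do the same conversions and also reject malformed input. The function name `parse_args` and the help strings are my own; the argument order matches the `sys.argv` indices above.

```python
import argparse

def parse_args(argv):
    """Parse trial_no, pj and alpha_pj in the same order as sys.argv[1:4]."""
    parser = argparse.ArgumentParser(description="Run a batch of simulations")
    parser.add_argument("trial_no", type=int, help="trial number used in file names")
    parser.add_argument("pj", type=int, choices=[0, 1], help="1 to include a penny jumper")
    parser.add_argument("alpha_pj", type=float, help="penny jumper alpha")
    return parser.parse_args(argv)

# In a script this would be parse_args(sys.argv[1:]); a literal list is used here.
args = parse_args(["801", "1", "0.001"])
h5_out = "/home/username/h5/ABMSmallCapSum_%d.h5" % args.trial_no
```

Passing the argument list explicitly keeps the function easy to test without touching the real command line.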

You should be able to run the executable file (repeatedly) and use the final hdf5 files as inputs to a bunch of Jupyter Notebooks to generate the exact results portrayed in the paper. In the next post I will detail the steps for making a conda package from the pyziabm files and discuss the minor changes required to run the package import with the wrapper file.

## Coding the agent-based model simulation loop with Python

This post continues the coding project in support of replicating the results in my Tick Pilot Agent-Based Modeling paper. The first and second posts created and tested the limit order book. The third and fourth posts created and tested the traders. The next step is to pull the book and traders together and run a simulation. The strategy is designed to enable a user to install a package, import the package, and instantiate from a Runner class. For example, from the command line:

~$ conda install pyziabm

The user might have to specify the conda repo or download and install from local. See the Tick Pilot ABM project website for more details. Then from IPython or a Jupyter Notebook:

import pyziabm as pzi
pzi.Runner()

This command would run the simulation with a set of defaults and store some results in a table in an hdf5 file. The defaults are all keywords. The user can change the defaults by calling Runner with the keywords updated – in the spirit of how matplotlib gets things done:

pzi.Runner(mpi=1, h5filename='test2.h5', pj=True, alpha_pj=0.01)

The full code is available on GitHub as runner2017mpi_r4.py. As usual, the first step is to import some python packages. The traders and the orderbook were designed to be imported by the simulation module. We will import those as well.

import random
import numpy as np
import pandas as pd

from pyziabm.orderbook3 import Orderbook
from pyziabm.trader2017_r3 import Provider, Provider5, Taker, MarketMaker, MarketMaker5, PennyJumper

The __init__() method does all of the work in four major steps: create some useful attributes for later use, create the traders, orderbook and information environment, set up and run the simulation, and save some output. The first portion of __init__() demonstrates the keyword strategy and creates some attributes.

    def __init__(self, prime1=20, num_mms=1, mm_maxq=1, mm_quotes=12, mm_quote_range=60, mm_delta=0.025,
                 num_takers=50, taker_maxq=1, num_providers=38, provider_maxq=1, q_provide=0.5,
                 alpha=0.0375, mu=0.001, delta=0.025, lambda0=100, wn=0.001, c_lambda=1.0, run_steps=100000,
                 mpi=5, h5filename='test.h5', pj=False, alpha_pj=0):
        self.alpha_pj = alpha_pj
        self.q_provide = q_provide
        self.lambda0 = lambda0
        self.run_steps = run_steps+1
        self.h5filename = h5filename

The second portion creates the traders and their arrival intervals, the order book and the information environment.

        self.t_delta_t, self.taker_array = self.make_taker_array(taker_maxq, num_takers, mu)
        self.t_delta_p, self.provider_array = self.make_provider_array(provider_maxq, num_providers, delta, mpi, alpha)
        self.t_delta_m, self.marketmaker_array = self.make_marketmaker_array(mm_maxq, num_mms, mm_quotes, mm_quote_range, mm_delta, mpi)
        self.pennyjumper = self.make_pennyjumper(mpi)
        self.exchange = Orderbook()
        self.q_take, self.lambda_t = self.make_q_take(wn, c_lambda)
        self.trader_dict = self.make_traders(num_takers, num_providers, num_mms)

The final portion prepares and runs the simulation and then saves output.

        self.seed_orderbook()
        self.make_setup(prime1)
        if pj:
            self.run_mcsPJ(prime1)
        else:
            self.run_mcs(prime1)
        self.out_to_h5()

We will take each of these steps in order and I will provide a brief overview of what’s going on in each of the methods. See the Tick Pilot Agent-Based Modeling paper for further details and a full description of the agents and the simulation strategy.

The make_taker_array(…) method creates the taker agents and their arrival intervals. The first three lines of code determine the trade size (size = 1 in the paper). The fourth line creates the random arrival intervals and the fifth and sixth lines prepare and create the Taker instances and store them in a numpy array. The arrival intervals are permanently associated with specific Taker instances via numpy arrays. We will make use of this later.

    def make_taker_array(self, maxq, num_takers, mu):
        default_arr = np.array([1, 5, 10, 25, 50])
        actual_arr = default_arr[default_arr<=maxq]
        taker_size = np.random.choice(actual_arr, num_takers)
        t_delta_t = np.floor(np.random.exponential(1/mu, num_takers)+1)*taker_size
        takers_list = ['t%i' % i for i in range(num_takers)]
        takers = np.array([Taker(t,i) for t,i in zip(takers_list,taker_size)])
        return t_delta_t, takers
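The arrival-interval line is the least obvious part of this method, so here is a standard-library sketch of the same idea: draw an exponential waiting time, round it to a whole number of steps of at least one, and scale by trade size. The function name `arrival_intervals` is hypothetical, and `random.expovariate(rate)` is used in place of `np.random.exponential(1/rate)`; both have mean `1/rate`.

```python
import math
import random

def arrival_intervals(n_traders, rate, size=1, seed=None):
    """Integer arrival intervals (in time steps) for n_traders agents."""
    rng = random.Random(seed)
    # floor(x) + 1 guarantees every interval is at least one step
    return [(math.floor(rng.expovariate(rate)) + 1) * size for _ in range(n_traders)]

intervals = arrival_intervals(n_traders=50, rate=0.001, seed=7)
```

With a rate of 0.001 the average interval is on the order of a thousand steps, so any one taker is active only rarely, while the population as a whole trades fairly often.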

The make_provider_array(…) method follows a similar strategy while using an if block to specify whether the provider should use a unit (penny) pricing increment or a 5 unit increment.

    def make_provider_array(self, maxq, num_providers, delta, mpi, alpha):
        default_arr = np.array([1, 5, 10, 25, 50])
        actual_arr = default_arr[default_arr<=maxq]
        provider_size = np.random.choice(actual_arr, num_providers)
        t_delta_p = np.floor(np.random.exponential(1/alpha, num_providers)+1)*provider_size
        providers_list = ['p%i' % i for i in range(num_providers)]
        if mpi==1:
            providers = np.array([Provider(p,i,mpi,delta) for p,i in zip(providers_list,provider_size)])
        else:
            providers = np.array([Provider5(p,i,mpi,delta) for p,i in zip(providers_list,provider_size)])
        return t_delta_p, providers

The make_marketmaker_array(…) method also follows the same strategy. The market maker arrival interval is the same as the trade size. In the paper, the single market maker has a trade size of one and therefore appears once every simulation step.

    def make_marketmaker_array(self, maxq, num_mms, mm_quotes, mm_quote_range, mm_delta, mpi):
        default_arr = np.array([1, 5, 10, 25, 50])
        actual_arr = default_arr[default_arr<=maxq]
        provider_size = np.random.choice(actual_arr, num_mms)
        t_delta_m = maxq
        marketmakers_list = ['m%i' % i for i in range(num_mms)]
        if mpi==1:
            marketmakers = np.array([MarketMaker(p,i,mpi,mm_delta,mm_quotes,mm_quote_range) for p,i in zip(marketmakers_list,provider_size)])
        else:
            marketmakers = np.array([MarketMaker5(p,i,mpi,mm_delta,mm_quotes,mm_quote_range) for p,i in zip(marketmakers_list,provider_size)])
        return t_delta_m, marketmakers

The make_pennyjumper(…) method merely returns the single instance of the Penny Jumper.

    def make_pennyjumper(self, mpi):
        return PennyJumper('j0', 1, mpi)

The Information Environment

The information environment includes a vector, q_take, that determines the probability a taker will submit a buy order and a vector, lambda_t, that serves as a parameter for a method that modifies the exponential distribution from which the Providers choose their prices.

    def make_q_take(self, s, c_lambda):
        noise = np.random.rand(2,self.run_steps)
        qt_take = np.empty_like(noise)
        qt_take[:,0] = 0.5
        for i in range(1,self.run_steps):
            qt_take[:,i] = qt_take[:,i-1] + (noise[:,i-1]>qt_take[:,i-1])*s - (noise[:,i-1]<qt_take[:,i-1])*s
        lambda_t = -self.lambda0*(1 + (np.abs(qt_take[1] - 0.5)/np.sqrt(np.mean(np.square(qt_take[0] - 0.5))))*c_lambda)
        return qt_take[1], lambda_t
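The update inside the loop is a mean-reverting random walk: the buy probability steps up by `s` when a uniform draw exceeds it and down by `s` when the draw falls below it, so the further it drifts from the middle, the more likely it is to step back. A single-path, pure-Python sketch of just that update follows; the function name `q_take_path` is hypothetical and the `lambda_t` calculation is left out.

```python
import random

def q_take_path(steps, s, seed=None, start=0.5):
    """One path of the buy probability, stepping by s toward or away from 0.5."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(1, steps):
        q = path[-1]
        u = rng.random()
        if u > q:          # happens with probability 1 - q, so high q tends to fall
            q += s
        elif u < q:        # happens with probability q, so low q tends to rise
            q -= s
        path.append(q)
    return path

path = q_take_path(steps=1000, s=0.001, seed=11)
```

Because an upward step requires `q < 1` and a downward step requires `q > 0`, the path cannot wander more than one step outside the unit interval.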

Preparing the Orderbook

Preparing the order book for the simulation involves seeding the book with one ask order and one bid order and then priming the book for twenty steps with just the Providers participating. The seed_orderbook(…) method accomplishes the seeding. The make_setup(…) method calls make_providers(…) to prime the book. For each time step, make_setup(…) calls make_providers(…) and loops through the returned active Providers: the Provider processes the top-of-book signal; the Exchange (orderbook) processes the Provider order and then updates the top-of-book, which serves as an input for the next step through the list of active Providers. make_providers(…) uses np.remainder() on the arrival interval vector to determine which Providers are active in any particular step. We will re-use this strategy in the main simulation loop to follow.

    def seed_orderbook(self):
        seed_provider = Provider('p999999', 1, 5, 0.05)
        ba = random.choice(range(1000005, 1002001, 5))
        bb = random.choice(range(997995, 999996, 5))
        qask = {'order_id': 'p999999_a', 'timestamp': 0, 'type': 'add', 'quantity': 1, 'side': 'sell',
                'price': ba, 'exid': 99999999}
        qbid = {'order_id': 'p999999_b', 'timestamp': 0, 'type': 'add', 'quantity': 1, 'side': 'buy',
                'price': bb, 'exid': 99999999}
        seed_provider.local_book['p999999_b'] = qbid
        self.exchange.order_history.append(qbid)

    def make_setup(self, prime1):
        top_of_book = self.exchange.report_top_of_book(0)
        for current_time in range(1, prime1):
            for p in self.make_providers(current_time):
                p.process_signal(current_time, top_of_book, self.q_provide, -self.lambda0)
                self.exchange.process_order(p.quote_collector[-1])
                top_of_book = self.exchange.report_top_of_book(current_time)

    def make_providers(self, step):
        providers = self.provider_array[np.remainder(step, self.t_delta_p)==0]
        np.random.shuffle(providers)
        return providers
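
The np.remainder() selection can be tried in isolation. The sketch below is a minimal illustration, assuming made-up arrival intervals and string ids in place of Provider objects; active_providers and the seeded generator are hypothetical names, not part of the simulation code.

```python
import numpy as np

# Hypothetical arrival intervals: provider i acts every t_delta_p[i] steps.
t_delta_p = np.array([1, 2, 3, 5])
provider_ids = np.array(['p0', 'p1', 'p2', 'p3'])

def active_providers(step, rng):
    """Return the ids whose arrival interval divides `step`, in random order."""
    active = provider_ids[np.remainder(step, t_delta_p) == 0]
    rng.shuffle(active)   # boolean indexing returns a copy, so provider_ids is untouched
    return active

rng = np.random.default_rng(0)
step5 = active_providers(5, rng)   # intervals 1 and 5 divide 5
step6 = active_providers(6, rng)   # intervals 1, 2 and 3 divide 6
```

A provider with interval 1 is active on every step, while one with interval 5 shows up only on steps 5, 10, 15, and so on.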

Running the Simulation

The run_mcs(…) method steps through the remaining time, calling make_both(…) to determine which traders will participate in the time step and to randomize their order (the name is a misnomer; make_all(…) would be more accurate). The actions taken depend on trader type. Providers and MarketMakers add orders if their arrival interval matches the time step (that’s what “if row[1]:” determines) and may cancel orders whether or not their interval matches the time step. Takers add orders, too. A make_traders(…) method creates a dictionary of trader objects keyed by their ids, thereby enabling liquidity provider lookup when a taker takes liquidity. This facilitates sending confirm messages to the liquidity providers when one of their resting orders is hit. The final block of code stores some of the larger history objects to an HDF5 file and resets the containers to empty.

    def run_mcs(self, prime1):
top_of_book = self.exchange.report_top_of_book(prime1)
for current_time in range(prime1, self.run_steps):
for row in self.make_both(current_time):
if row[1]:
row[0].process_signal(current_time, top_of_book, self.q_provide, self.lambda_t[current_time])
self.exchange.process_order(row[0].quote_collector[-1])
top_of_book = self.exchange.report_top_of_book(current_time)
row[0].bulk_cancel(current_time)
if row[0].cancel_collector:
for c in row[0].cancel_collector:
self.exchange.process_order(c)
if self.exchange.confirm_modify_collector:
row[0].confirm_cancel_local(self.exchange.confirm_modify_collector[0])
top_of_book = self.exchange.report_top_of_book(current_time)
if row[1]:
row[0].process_signal(current_time, top_of_book, self.q_provide)
for q in row[0].quote_collector:
self.exchange.process_order(q)
top_of_book = self.exchange.report_top_of_book(current_time)
row[0].bulk_cancel(current_time)
if row[0].cancel_collector:
for c in row[0].cancel_collector:
self.exchange.process_order(c)
if self.exchange.confirm_modify_collector:
row[0].confirm_cancel_local(self.exchange.confirm_modify_collector[0])
top_of_book = self.exchange.report_top_of_book(current_time)
else:
row[0].process_signal(current_time, self.q_take[current_time])
self.exchange.process_order(row[0].quote_collector[-1])
top_of_book = self.exchange.report_top_of_book(current_time)
if not np.remainder(current_time, 2000):
self.exchange.order_history_to_h5(self.h5filename)
self.exchange.sip_to_h5(self.h5filename)

def make_both(self, step):

takers_dict = dict(zip(['t%i' % i for i in range(num_takers)], list(self.taker_array)))
providers_dict = dict(zip(['p%i' % i for i in range(num_providers)], list(self.provider_array)))
takers_dict.update(providers_dict)
marketmakers_dict = dict(zip(['m%i' % i for i in range(num_mms)], list(self.marketmaker_array)))
takers_dict.update(marketmakers_dict)
if self.alpha_pj > 0:
takers_dict.update({'j0': self.pennyjumper})
return takers_dict
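
As a rough illustration of why the id-to-trader dictionary is useful, the sketch below routes a trade confirmation to the right liquidity provider. StubProvider is a hypothetical stand-in for the real trader classes, and the assumption that the confirm message carries the resting trader's id under a 'trader' key is mine for the purpose of the example.

```python
class StubProvider:
    """Hypothetical stand-in for a liquidity provider; it records the confirms it receives."""
    def __init__(self, name):
        self.name = name
        self.confirms = []

    def confirm_trade_local(self, confirm):
        self.confirms.append(confirm)

providers = [StubProvider('p%i' % i) for i in range(3)]
traders = dict(zip(['p%i' % i for i in range(3)], providers))

# Assumption: the confirm message identifies the resting trader under 'trader'.
confirm = {'trader': 'p1', 'order_id': 'p1_7', 'timestamp': 12, 'quantity': 1,
           'side': 'buy', 'price': 999990}
traders[confirm['trader']].confirm_trade_local(confirm)
```

The lookup is a single dictionary access, so the cost of notifying a provider does not grow with the number of traders.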

The run_mcsPJ(…) method has an additional block of code at the end of each step through the traders. This code determines whether a PennyJumper will be active after a trader shows up. If so, the PennyJumper has an opportunity to add and/or cancel orders. Note that the PennyJumper can participate zero, one, or many times during each time step.

    def run_mcsPJ(self, prime1):

...

if random.uniform(0,1) < self.alpha_pj:
self.pennyjumper.process_signal(current_time, top_of_book, self.q_take[current_time])
if self.pennyjumper.cancel_collector:
for c in self.pennyjumper.cancel_collector:
self.exchange.process_order(c)
if self.pennyjumper.quote_collector:
for q in self.pennyjumper.quote_collector:
self.exchange.process_order(q)
top_of_book = self.exchange.report_top_of_book(current_time)

...

The final step saves some results.

    def qtake_to_h5(self):
        temp_df = pd.DataFrame({'qt_take': self.q_take, 'lambda_t': self.lambda_t})
        temp_df.to_hdf(self.h5filename, 'qtl', append=True, format='table', complevel=5, complib='blosc')

    def mm_profitability_to_h5(self):
        for m in self.marketmaker_array:
            temp_df = pd.DataFrame(m.cash_flow_collector)
            temp_df.to_hdf(self.h5filename, 'mmp', append=True, format='table', complevel=5, complib='blosc')

    def out_to_h5(self):
        self.qtake_to_h5()
        self.mm_profitability_to_h5()

If you have comments or suggestions, feel free to post them. Coming up are posts describing the wrapper file to replicate the results in the Working Paper and some notes on a simple Conda build process for creating a package.

## Coding some zero-intelligence traders with Python

This post continues the coding project in support of replicating the results in my Tick Pilot Agent-Based Modeling paper. The first post introduced the limit order book and the second described unit testing the order book. The next step is coding up some zero-intelligence traders – traders who follow simple rules and act randomly. The strategy employs object-oriented programming to enforce a single channel of communication from the trader to the order book and to reuse code whenever possible. There are several classes: a base class plus four basic trader types (liquidity providers, market makers, a taker, and a penny jumper):

1. The ZITrader class is the base class from which the others inherit. This class defines the communication mechanism (i.e., the order) and enforces the idea that this is the only way a trader can send messages to the order book.
2. Two simple liquidity provider classes: Provider and Provider5, with Provider inheriting from ZITrader and Provider5 inheriting from Provider.
3. Two market maker classes: MarketMaker and MarketMaker5, with MarketMaker inheriting from Provider and MarketMaker5 inheriting from MarketMaker.
4. A Taker class inherits from ZITrader.
5. A PennyJumper class inherits from ZITrader.

The full code is available on GitHub as trader2017_r3.py. As usual, the first step is to import some Python modules.

import random
import numpy as np

The base class is ZITrader. It contains the _make_add_quote() method and some supporting infrastructure for making and storing quotes.

class ZITrader(object):
'''
ZITrader generates quotes (dicts) based on mechanical probabilities.

A general base class for specific trader types.
Public attributes: quote_collector
Public methods: none
'''

def __init__(self, name, maxq):
'''
Initialize ZITrader with some base class attributes and a method

quote_collector is a public container for carrying quotes to the exchange
'''
self._max_quantity = maxq
self.quote_collector = []
self._quote_sequence = 0

def __repr__(self):

def _make_add_quote(self, time, quantity, side, price):
self._quote_sequence += 1
order_id = '%s_%d' % (self._trader_id, self._quote_sequence)
return {'order_id': order_id, 'timestamp': time, 'type': 'add', 'quantity': quantity,
'side': side, 'price': price}

In this simple model, I chose to make the quote a Python dict. In a more complex model, the actual quote could have been an instance of a separate class with ‘.’ access to the attributes.
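
For comparison, here is a minimal sketch of what a class-based quote might look like. The Quote dataclass is hypothetical and not part of the project code; its fields simply mirror the keys of the dict returned by _make_add_quote().

```python
from dataclasses import dataclass, asdict

@dataclass
class Quote:
    """Hypothetical class-based quote mirroring the keys of the dict version."""
    order_id: str
    timestamp: int
    quantity: int
    side: str
    price: int
    type: str = 'add'

q = Quote(order_id='p1_1', timestamp=3, quantity=1, side='buy', price=999995)
as_dict = asdict(q)   # convert back to the dict form used in this post
```

Attribute access (q.price) catches misspelled field names at the point of use, whereas a misspelled dict key only fails when that key is read.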

Instances of the Provider class must be capable of several activities: generating inputs to the _make_add_quote() method, receiving confirmation messages from the order book, and canceling outstanding orders.

class Provider(ZITrader):
'''
Provider generates quotes (dicts) based on make probability.

Public methods: confirm_cancel_local, confirm_trade_local, process_signal, bulk_cancel
'''

def __init__(self, name, maxq, mpi, delta):
'''Provider has own mpi and delta; a local_book to track outstanding orders and a
cancel_collector to convey cancel messages to the exchange.
'''
self._mpi = mpi
self._delta = delta
self.local_book = {}
self.cancel_collector = []

def __repr__(self):

def _make_cancel_quote(self, q, time):
return {'type': 'cancel', 'timestamp': time, 'order_id': q['order_id'], 'quantity': q['quantity'],
'side': q['side'], 'price': q['price']}

def confirm_cancel_local(self, cancel_dict):
del self.local_book[cancel_dict['order_id']]

to_modify = self.local_book.get(confirm['order_id'], "WTF???")
if confirm['quantity'] == to_modify['quantity']:
self.confirm_cancel_local(to_modify)
else:
self.local_book[confirm['order_id']]['quantity'] -= confirm['quantity']

def bulk_cancel(self, time):
'''bulk_cancel cancels _delta percent of outstanding orders'''
self.cancel_collector.clear()
lob = len(self.local_book)
if lob > 0:
order_keys = list(self.local_book.keys())
orders_to_delete = np.random.ranf(lob)
for idx in range(lob):
if orders_to_delete[idx] < self._delta:
self.cancel_collector.append(self._make_cancel_quote(self.local_book.get(order_keys[idx]), time))

def process_signal(self, time, qsignal, q_provider, lambda_t):
'''Provider buys or sells with probability related to q_provide'''
self.quote_collector.clear()
if np.random.uniform(0,1) < q_provider:
else:
side = 'sell'
q = self._make_add_quote(time, self._max_quantity, side, price)
self.local_book[q['order_id']] = q
self.quote_collector.append(q)

def _choose_price_from_exp(self, side, inside_price, lambda_t):
'''Prices chosen from an exponential distribution'''
# make pricing explicit for now. Logic scales for other mpi.
plug = int(lambda_t*np.log(np.random.rand()))
if side == 'bid':
#price = np.int(5*np.floor((inside_price-1-plug)/5))
price = inside_price-1-plug
else:
#price = np.int(5*np.ceil((inside_price+1+plug)/5))
price = inside_price+1+plug
return price

The process_signal() method randomly chooses whether to buy or sell and calls _choose_price_from_exp() to establish the price. Then it does some bookkeeping by adding the quote (dict) to the individual trader’s local book and to the quote_collector, a list that conveys the message to the order book. _choose_price_from_exp() randomly selects a price increment (plug) from an exponential distribution and then computes a price based on the distance from the best price on the opposite side of the market. This design has a purpose: Providers never cross the spread! bulk_cancel() and _make_cancel_quote() randomly select some quotes on the local book to be canceled and pass the cancel messages on to the exchange in the cancel_collector list. The two remaining methods, confirm_cancel_local() and confirm_trade_local(), receive messages from the order book and modify the local book appropriately.
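
The "never cross the spread" property can be checked with a small standalone sketch. It assumes a negative lambda_t (as passed by the simulation), illustrative top-of-book prices, and a seeded generator; it also draws from (0, 1] instead of calling np.random.rand() so that log(0) cannot occur, which is a deviation from the method above.

```python
import numpy as np

def choose_price_from_exp(side, inside_price, lambda_t, rng):
    """Sketch of exponential pricing; lambda_t is assumed to be negative."""
    u = 1.0 - rng.random()              # in (0, 1], which avoids log(0)
    plug = int(lambda_t * np.log(u))    # log(u) <= 0 and lambda_t < 0, so plug >= 0
    if side == 'bid':
        return inside_price - 1 - plug  # inside_price is the best ask
    return inside_price + 1 + plug      # inside_price is the best bid

rng = np.random.default_rng(7)
best_bid, best_ask = 999995, 1000005    # illustrative prices
bids = [choose_price_from_exp('bid', best_ask, -100, rng) for _ in range(1000)]
asks = [choose_price_from_exp('ask', best_bid, -100, rng) for _ in range(1000)]
```

Because plug is never negative, every bid lands at least one tick below the best ask and every ask at least one tick above the best bid.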

One of the contributions of the Working Paper is quantifying the impact of increasing the minimum pricing increment from one to five ticks on some market quality measures and market maker profitability. Provider5 enforces pricing on a five tick grid by employing its own _choose_price_from_exp() method.

class Provider5(Provider):
    '''
    Provider5 generates quotes (dicts) based on make probability.

    Subclass of Provider
    '''

    def __init__(self, name, maxq, mpi, delta):
        '''Provider has own mpi and delta; a local_book to track outstanding orders and a
        cancel_collector to convey cancel messages to the exchange.
        '''
        Provider.__init__(self, name, maxq, mpi, delta)

    def _choose_price_from_exp(self, side, inside_price, lambda_t):
        '''Prices chosen from an exponential distribution'''
        # make pricing explicit for now. Logic scales for other mpi.
        plug = int(lambda_t*np.log(np.random.rand()))
        if side == 'bid':
            price = int(5*np.floor((inside_price-1-plug)/5))
        else:
            price = int(5*np.ceil((inside_price+1+plug)/5))
        return price
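
The rounding rule is easy to see with a couple of numbers. The helper below is a hypothetical sketch that uses integer arithmetic instead of np.floor and np.ceil; for integer prices the two approaches give the same result.

```python
def snap_to_grid(price, side, mpi=5):
    """Round bids down and asks up to the nearest multiple of mpi."""
    if side == 'bid':
        return (price // mpi) * mpi
    return -((-price) // mpi) * mpi   # ceiling division for asks

bid = snap_to_grid(999993, 'bid')       # 999990
ask = snap_to_grid(1000002, 'ask')      # 1000005
on_grid = snap_to_grid(1000000, 'ask')  # already on the grid, unchanged
```

Bids round down and asks round up, so snapping to the grid always moves a quote away from the opposite side of the market and the Provider still never crosses the spread.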

The MarketMaker is a special type of liquidity provider that inherits most of its behavior from Provider. Instances of the MarketMaker class submit multiple orders with prices chosen from a uniform distribution. process_signal() accomplishes this by choosing a number of prices (_num_quotes) from a range of prices defined by the best price and the upper/lower limit (_quote_range). One of the research questions in the Working Paper focused on market maker profitability. Instances of MarketMaker track profitability by adding a few attributes (_position, _cash_flow, and cash_flow_collector), overriding the confirm_trade_local() method, and employing the _cumulate_cashflow() helper method.

class MarketMaker(Provider):
'''
MarketMaker generates a series of quotes near the inside (dicts) based on make probability.

Subclass of Provider
cash_flow_collector
Public methods: confirm_cancel_local (from Provider), confirm_trade_local, process_signal
'''

def __init__(self, name, maxq, mpi, delta, num_quotes, quote_range):
'''_num_quotes and _quote_range determine the depth of MM quoting;
_position and _cashflow are stored MM metrics
'''
Provider.__init__(self, name, maxq, mpi, delta)
self._num_quotes = num_quotes
self._quote_range = quote_range
self._position = 0
self._cash_flow = 0
self.cash_flow_collector = []

def __repr__(self):

'''Modify _cash_flow and _position; update the local_book'''
self._cash_flow -= confirm['price']*confirm['quantity']
self._position += confirm['quantity']
else:
self._cash_flow += confirm['price']*confirm['quantity']
self._position -= confirm['quantity']
to_modify = self.local_book.get(confirm['order_id'], "WTF???")
if confirm['quantity'] == to_modify['quantity']:
self.confirm_cancel_local(to_modify)
else:
self.local_book[confirm['order_id']]['quantity'] -= confirm['quantity']
self._cumulate_cashflow(confirm['timestamp'])

def _cumulate_cashflow(self, timestamp):
self.cash_flow_collector.append({'mmid': self._trader_id, 'timestamp': timestamp, 'cash_flow': self._cash_flow,
'position': self._position})

def process_signal(self, time, qsignal, q_provider):
'''
MM chooses prices from a grid determined by the best prevailing prices.
MM never joins the best price if it has size=1.
'''
# make pricing explicit for now. Logic scales for other mpi and quote ranges.
self.quote_collector.clear()
if random.uniform(0,1) < q_provider:
max_bid_price = qsignal['best_bid'] if qsignal['bid_size'] > 1 else qsignal['best_bid']-self._mpi
prices = np.random.choice(range(max_bid_price-self._quote_range+1, max_bid_price+1, self._mpi), size=self._num_quotes)
else:
side = 'sell'
for price in prices:
q = self._make_add_quote(time, self._max_quantity, side, price)
self.local_book[q['order_id']] = q
self.quote_collector.append(q)
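
The cash flow bookkeeping can be sketched outside the class. The function below is a simplified illustration, assuming that a confirm with side 'buy' means the market maker's resting bid was executed; apply_confirms and the sample prices are hypothetical.

```python
def apply_confirms(confirms):
    """Accumulate cash flow and position from a sequence of trade confirmations."""
    cash_flow, position = 0, 0
    history = []
    for c in confirms:
        if c['side'] == 'buy':   # resting bid executed: pay cash, gain shares
            cash_flow -= c['price'] * c['quantity']
            position += c['quantity']
        else:                    # resting ask executed: receive cash, give up shares
            cash_flow += c['price'] * c['quantity']
            position -= c['quantity']
        history.append({'timestamp': c['timestamp'], 'cash_flow': cash_flow,
                        'position': position})
    return history

confirms = [
    {'timestamp': 1, 'side': 'buy', 'price': 999990, 'quantity': 1},
    {'timestamp': 2, 'side': 'sell', 'price': 1000010, 'quantity': 1},
]
history = apply_confirms(confirms)
```

After buying at 999990 and selling at 1000010 the position is flat and the cumulative cash flow is 20, which is the spread the market maker earned on the round trip.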

MarketMaker5 enforces pricing on a five tick grid by overriding the MarketMaker process_signal() method. Two new attributes, _p5ask and _p5bid, assign probabilities to the discrete uniform price grid, thereby establishing the idea that the true reservation prices are still formulated on a one-tick grid. See the Working Paper for more details.

class MarketMaker5(MarketMaker):
'''
MarketMaker5 generates a series of quotes near the inside (dicts) based on make probability.

Subclass of MarketMaker
Public methods: process_signal
'''

def __init__(self, name, maxq, mpi, delta, num_quotes, quote_range):
'''
_num_quotes and _quote_range determine the depth of MM quoting;
_position and _cashflow are stored MM metrics
'''
MarketMaker.__init__(self, name, maxq, mpi, delta, num_quotes, quote_range)
self._p5ask = [1/20, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/30]
self._p5bid = [1/30, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/20]

def process_signal(self, time, qsignal, q_provider):
'''
MM chooses prices from a grid determined by the best prevailing prices.
MM never joins the best price if it has size=1.
'''
# make pricing explicit for now. Logic scales for other mpi and quote ranges.
self.quote_collector.clear()
if random.uniform(0,1) < q_provider:
max_bid_price = qsignal['best_bid'] if qsignal['bid_size'] > 1 else qsignal['best_bid']-self._mpi
prices = np.random.choice(range(max_bid_price-self._quote_range, max_bid_price+1, self._mpi), size=self._num_quotes, p=self._p5bid)
else:
side = 'sell'
for price in prices:
q = self._make_add_quote(time, self._max_quantity, side, price)
self.local_book[q['order_id']] = q
self.quote_collector.append(q)
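
A quick check shows how the probability vector lines up with the price grid. The sketch assumes a _quote_range of 60 ticks and an illustrative best bid; with a five-tick increment that range yields 13 grid points, matching the 13 entries in _p5bid.

```python
from fractions import Fraction
import numpy as np

mpi = 5
quote_range = 60         # assumption: a 60-tick range, which yields 13 grid points
max_bid_price = 999990   # illustrative value

grid = list(range(max_bid_price - quote_range, max_bid_price + 1, mpi))
# Same weights as _p5bid, held as exact fractions so the total can be verified
p5bid = [Fraction(1, 30)] + [Fraction(1, 12)] * 11 + [Fraction(1, 20)]

rng = np.random.default_rng(3)
prices = rng.choice(grid, size=12, p=[float(p) for p in p5bid])
```

The two end points carry less weight than the interior points (1/30 and 1/20 versus 1/12), and the weights sum to exactly one.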

The PennyJumper is also a liquidity provider, but with a simple rule: either be alone at the best (inside) price or leave the market. The PennyJumper can have a maximum of two quotes outstanding: _ask_quote and _bid_quote. The quoting rule is implemented in its own process_signal() method. After clearing out the collectors, the method checks for the existence of an available price at the inside. If there is one, then the PennyJumper randomly chooses the side of the quote. It then checks whether it is alone at the best price and cancels if not. If the PennyJumper has no quote (self._bid_quote is None, for example) then the PennyJumper adds a new quote to establish the best inside price. If there is no available price at the inside (i.e., the spread is equal to the minimum price increment), then the PennyJumper checks whether it is alone at the inside and cancels if not.

class PennyJumper(ZITrader):
'''
PennyJumper jumps in front of best quotes when possible

'''

def __init__(self, name, maxq, mpi):
'''
Initialize PennyJumper

cancel_collector is a public container for carrying cancel messages to the exchange
PennyJumper tracks private _ask_quote and _bid_quote to determine whether it is alone
at the inside or not.
'''
self._mpi = mpi
self.cancel_collector = []
self._bid_quote = None

def __repr__(self):

def _make_cancel_quote(self, q, time):
return {'type': 'cancel', 'timestamp': time, 'order_id': q['order_id'], 'quantity': q['quantity'],
'side': q['side'], 'price': q['price']}

'''PJ has at most one bid and one ask outstanding - if it executes, set price None'''
self._bid_quote = None
else:

def process_signal(self, time, qsignal, q_taker):
'''PJ determines if it is alone at the inside, cancels if not and replaces if there is an available price
point inside the current quotes.
'''
self.quote_collector.clear()
self.cancel_collector.clear()
if qsignal['best_ask'] - qsignal['best_bid'] > self._mpi:
# q_taker > 0.5 implies greater probability of a buy order; PJ jumps the bid
if random.uniform(0,1) < q_taker:
if self._bid_quote: # check if not alone at the bid
if self._bid_quote['price'] < qsignal['best_bid'] or self._bid_quote['quantity'] < qsignal['bid_size']:
self.cancel_collector.append(self._make_cancel_quote(self._bid_quote, time))
self._bid_quote = None
if not self._bid_quote:
price = qsignal['best_bid'] + self._mpi
q = self._make_add_quote(time, self._max_quantity, side, price)
self.quote_collector.append(q)
self._bid_quote = q
else:
side = 'sell'
q = self._make_add_quote(time, self._max_quantity, side, price)
self.quote_collector.append(q)
if self._bid_quote: # check if not alone at the bid
if self._bid_quote['price'] < qsignal['best_bid'] or self._bid_quote['quantity'] < qsignal['bid_size']:
self.cancel_collector.append(self._make_cancel_quote(self._bid_quote, time))
self._bid_quote = None
self._ask_quote = None
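
The two checks at the heart of the PennyJumper rule can be written as small predicates. This is a sketch with hypothetical function names and illustrative prices; the conditions mirror the comparisons in process_signal() for the bid side.

```python
def can_jump(top_of_book, mpi):
    """True when at least one price point is open strictly inside the spread."""
    return top_of_book['best_ask'] - top_of_book['best_bid'] > mpi

def alone_at_bid(bid_quote, top_of_book):
    """True when the quote sits at the best bid and accounts for all the size there."""
    return (bid_quote['price'] >= top_of_book['best_bid']
            and bid_quote['quantity'] >= top_of_book['bid_size'])

wide = {'best_bid': 999990, 'bid_size': 1, 'best_ask': 1000000, 'ask_size': 1}
tight = {'best_bid': 999995, 'bid_size': 2, 'best_ask': 1000000, 'ask_size': 1}
my_bid = {'price': 999995, 'quantity': 1}

jump_wide = can_jump(wide, mpi=5)          # spread of 10 leaves room inside
jump_tight = can_jump(tight, mpi=5)        # spread of 5 equals the increment
alone_wide = alone_at_bid(my_bid, wide)    # the quote is the best bid
alone_tight = alone_at_bid(my_bid, tight)  # someone else has joined at 999995
```

When alone_at_bid() is False the PennyJumper cancels, and when can_jump() is True it may post a new quote one increment inside the best price.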

The Taker is the only liquidity taker in this model. process_signal() randomly chooses whether to buy or sell then makes an add quote guaranteed to cross the spread and take liquidity by choosing a price equal to zero for sells and 2,000,000 for buys.

class Taker(ZITrader):
'''
Taker generates quotes (dicts) based on take probability.

Public methods: process_signal
'''

def __init__(self, name, maxq):

def __repr__(self):

def process_signal(self, time, q_taker):
'''Taker buys with probability q_taker; otherwise it sells.'''
self.quote_collector.clear()
if random.uniform(0,1) < q_taker: # q_taker > 0.5 implies greater probability of a buy order
price = 2000000 # agent buys at max price (or better)
else:
price = 0 # agent sells at min price (or better)
side = 'sell'
q = self._make_add_quote(time, self._max_quantity, side, price)
self.quote_collector.append(q)

## Unit testing a simple limit order book with Python

This post is a follow-up to the previous post on building a simple limit order book with Python. After that original post, I learned that there were some quirks in how the WordPress editor handles the “less than” symbol in code blocks. That post has been updated and I will continue to monitor it for any more corruption.

I will walk through unit testing the orderbook methods with the unittest module. There are a variety of Python testing alternatives. I chose unittest for two reasons: 1.) unittest is included in the standard library; and 2.) unittest works well with the Eclipse/PyDev Integrated Development Environment (IDE). Many Python aristocrats (Pythonistocrats?) have adopted Pytest because it is easier to implement with automated build processes. See for example the pandas documentation for a discussion of how they incorporate Pytest into their continuous integration services. I find the pandas documentation a very helpful resource for learning how to manage the test-build-ship process from GitHub.

Nearly every post on code testing will admonish you to implement test-driven development. But I admit that for me the combination of the orderbook and the tests is more like test-enhanced development: I would write the basics of the method, then test, and then repeat if necessary. To satisfy the rule on writing about testing, I recommend you use test-driven development!

Testing begins with creating a separate module, importing the module/class to be tested and unittest, and defining a test class that inherits from unittest.TestCase. The full code and directory structure is available in my GitHub repo. You might find it helpful to have the actual Orderbook code handy when walking through the tests.

from pyziabm.orderbook3 import Orderbook
import unittest

class TestOrderbook(unittest.TestCase):

There is a special method in unittest called setUp(). This method runs before each test method. We will use it to provide a clean Orderbook instance and a set of known orders to each test.

    def setUp(self):
'''
setUp creates the Orderbook instance and a set of orders
'''
self.ex1 = Orderbook()
'price': 50}
'price': 50}
'price': 49}
'price': 47}
self.q1_sell = {'order_id': 't1_3', 'timestamp': 2, 'type': 'add', 'quantity': 1, 'side': 'sell',
'price': 52}
self.q2_sell = {'order_id': 't1_4', 'timestamp': 3, 'type': 'add', 'quantity': 1, 'side': 'sell',
'price': 52}
self.q3_sell = {'order_id': 't10_2', 'timestamp': 4, 'type': 'add', 'quantity': 3, 'side': 'sell',
'price': 53}
self.q4_sell = {'order_id': 't11_2', 'timestamp': 5, 'type': 'add', 'quantity': 3, 'side': 'sell',
'price': 55}
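
To see the effect of setUp() in isolation, here is a small self-contained sketch. TinyBook is a hypothetical stand-in for the Orderbook that only stores orders in a list; the point is that each test method starts from a fresh instance.

```python
import unittest

class TinyBook:
    """Hypothetical stand-in for the Orderbook; it only stores orders in a list."""
    def __init__(self):
        self.orders = []

    def add(self, order):
        self.orders.append(order)

class TestTinyBook(unittest.TestCase):
    def setUp(self):
        # runs before each test method, so every test sees an empty book
        self.book = TinyBook()
        self.q1_buy = {'order_id': 't1_1', 'quantity': 1, 'side': 'buy', 'price': 50}

    def test_add_first(self):
        self.book.add(self.q1_buy)
        self.assertEqual(len(self.book.orders), 1)

    def test_book_starts_empty(self):
        # passes regardless of test order because setUp rebuilt the book
        self.assertFalse(self.book.orders)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTinyBook)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the book were created once at class level instead, the order added in test_add_first could leak into test_book_starts_empty.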

The testing strategy for the Orderbook instance (self.ex1) is similar for all of the tests: establish the state of the orderbook before calling the orderbook method (if necessary), create any needed inputs, call the method, and finally, test that the output matches what is expected. All test methods must begin with the word ‘test’. test_add_order_to_history() is a simple example of the strategy.

    def test_add_order_to_history(self):
'''
'''
h1 = {'order_id': 't1_5', 'timestamp': 4, 'type': 'add', 'quantity': 5, 'side': 'sell', 'price': 55}
self.assertFalse(self.ex1.order_history)
h1['exid'] = 1
self.assertDictEqual(h1, self.ex1.order_history[0])

In this test, h1 is the input dict. The next line asserts that the order history list is empty. After appending the exchange order id to the dict, we call the method with h1. Finally, we assert that the modified h1 dict from the test method matches the first (and only) dict in the exchange order history list. If no assertion fails (a failed assertion raises AssertionError), then the test passes. In PyDev, the tests are run from a menu. In the console area, it will return something like:

Finding files... done.
Importing test modules ... done.
----------------------------------------------------------------------
Ran 1 tests in 0.000s

OK

You can also run the test module from a shell: python -m unittest testOrderbook3.

test_add_order_to_book() follows a similar strategy. First we check that the price list and the book dict are both empty (because setUp() was just called). Then we add one order to the bid book with self.ex1.add_order_to_book(self.q1_buy) and test whether actual and expected are the same with simple assertions: assertTrue, assertEqual, assertDictEqual. We then add another order to check whether the incrementing portion of add_order_to_book is working correctly. Finally, we repeat the process for sell orders.

    def test_add_order_to_book(self):
'''
'''
self.assertFalse(self.ex1._bid_book_prices)
self.assertFalse(self.ex1._bid_book)
self.assertTrue(50 in self.ex1._bid_book_prices)
self.assertTrue(50 in self.ex1._bid_book.keys())
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 1)
self.assertEqual(self.ex1._bid_book[50]['size'], 1)
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 2)
self.assertEqual(self.ex1._bid_book[50]['size'], 2)
# 2 sell orders
self.assertDictEqual(self.ex1._ask_book[52]['orders'][self.q2_sell['order_id']], self.q2_sell)

test_remove_order() first adds two orders and checks the state of the book. Then it removes the two orders and checks the orderbook state after each removal. Finally, it also checks that removing an order that is not there causes no harm. The process is then repeated for sell orders.

    def test_remove_order(self):
'''
Add two  orders, remove the second order twice
'''
self.assertTrue(50 in self.ex1._bid_book_prices)
self.assertTrue(50 in self.ex1._bid_book.keys())
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 2)
self.assertEqual(self.ex1._bid_book[50]['size'], 2)
self.assertEqual(len(self.ex1._bid_book[50]['order_ids']), 2)
# remove first order
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 1)
self.assertEqual(self.ex1._bid_book[50]['size'], 1)
self.assertEqual(len(self.ex1._bid_book[50]['order_ids']), 1)
self.assertFalse('t1_1' in self.ex1._bid_book[50]['orders'].keys())
self.assertTrue(50 in self.ex1._bid_book_prices)
# remove second order
self.assertFalse(self.ex1._bid_book_prices)
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 0)
self.assertEqual(self.ex1._bid_book[50]['size'], 0)
self.assertEqual(len(self.ex1._bid_book[50]['order_ids']), 0)
self.assertFalse('t1_2' in self.ex1._bid_book[50]['orders'].keys())
self.assertFalse(50 in self.ex1._bid_book_prices)
# remove second order again
self.assertFalse(self.ex1._bid_book_prices)
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 0)
self.assertEqual(self.ex1._bid_book[50]['size'], 0)
self.assertEqual(len(self.ex1._bid_book[50]['order_ids']), 0)
self.assertFalse('t1_2' in self.ex1._bid_book[50]['orders'].keys())
# sell orders
# remove first order
self.ex1._remove_order('sell', 52, 't1_3')
# remove second order
self.ex1._remove_order('sell', 52, 't1_4')
# remove second order again
self.ex1._remove_order('sell', 52, 't1_4')
self.assertFalse('t1_2' in self.ex1._ask_book[52]['orders'].keys())

In modern limit order book markets, some order modifications do not generally result in loss of time priority. Reducing the limit order quantity is one example of this type of modification. test_modify_order() begins by adding an order with quantity of 2 to a clean orderbook and then tests for a reduction in quantity and finally tests for removal when quantity becomes zero. The tests are repeated for sell orders.

    def test_modify_order(self):
'''
_modify_order() primarily impacts _bid_book or _ask_book
_modify_order() could impact _bid_book_prices or _ask_book_prices if the order results
in removing the full quantity with a call to _remove_order()
Add 1 order, remove partial, then remainder
'''
q1 = {'order_id': 't1_1', 'timestamp': 5, 'type': 'add', 'quantity': 2, 'side': 'buy',
'price': 50}
self.assertEqual(self.ex1._bid_book[50]['size'], 2)
# remove 1
self.assertEqual(self.ex1._bid_book[50]['size'], 1)
self.assertEqual(self.ex1._bid_book[50]['orders']['t1_1']['quantity'], 1)
self.assertTrue(self.ex1._bid_book_prices)
# remove remainder
self.assertFalse(self.ex1._bid_book_prices)
self.assertEqual(self.ex1._bid_book[50]['num_orders'], 0)
self.assertEqual(self.ex1._bid_book[50]['size'], 0)
self.assertFalse('t1_1' in self.ex1._bid_book[50]['orders'].keys())
# Sell order
q2 = {'order_id': 't1_1', 'timestamp': 5, 'type': 'add', 'quantity': 2, 'side': 'sell',
'price': 50}
# remove 1
self.ex1._modify_order('sell', 1, 't1_1', 50)
# remove remainder
self.ex1._modify_order('sell', 1, 't1_1', 50)
self.assertFalse('t1_1' in self.ex1._ask_book[50]['orders'].keys())
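
The time-priority point can be illustrated with a toy price level. This is a hypothetical sketch, not the Orderbook implementation: a plain dict stands in for one price level, and its insertion order stands in for time priority.

```python
def modify_order(level, order_id, reduce_by):
    """Reduce an order's quantity in place; drop it once nothing remains."""
    order = level.get(order_id)
    if order is None:
        return
    if reduce_by < order['quantity']:
        order['quantity'] -= reduce_by   # same dict slot, so queue position is unchanged
    else:
        del level[order_id]              # full quantity removed: the order leaves the book

# Hypothetical price level: dict insertion order stands in for time priority.
level = {'t1_1': {'quantity': 2}, 't2_1': {'quantity': 1}}
modify_order(level, 't1_1', 1)
queue_after_reduce = list(level)
modify_order(level, 't1_1', 1)
queue_after_remove = list(level)
```

After the first reduction t1_1 is still ahead of t2_1 in the queue; only when its remaining quantity is removed does it leave the level.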

    def test_add_trade_to_book(self):
'''
'''
t1 = dict(resting_order_id='t1_1', resting_timestamp=2, incoming_order_id='t2_1',

'''
'''
t2 = dict(timestamp=5, trader='t3', order_id='t3_1', quantity=1,
side='sell', price=50)

def test_confirm_modify(self):
'''
confirm_modify() impacts confirm_modify_collector
Check confirm modify collector empty, add a trade, check non-empty, verify dict equality
'''
self.assertFalse(self.ex1.confirm_modify_collector)
self.assertTrue(self.ex1.confirm_modify_collector)
self.assertDictEqual(m1, self.ex1.confirm_modify_collector[0])

In the Orderbook instance, process_order() potentially relies upon _match_trade(). Testing these independently is difficult. I decided to test process_order() with a simple trade quantity of 1 and then test for proper matching (i.e., “walking the book”) with quantities > 1 in _match_trade(). test_process_order() seeds each side of the orderbook with 2 orders with the same price, then tests the impact of marketable buy and sell orders with quantity 1. It then tests for adding, canceling and modifying some orders. See the docstring and inline comments for more details.

    def test_process_order(self):
        '''
        process_order() impacts confirm_modify_collector, the traded indicator, order_history
        and both sides of the book.
        process_order() is a traffic manager. An order is either an add order or not. If it is an add order,
        it is either priced to go directly to the book or is sent to match_trade (which is tested below). If it
        is not an add order, it is either modified or cancelled. To test, we will add some buy and sell orders,
        then test for trades, cancels and modifies. process_order() also resets some object collectors.
        '''
        self.q2_buy['quantity'] = 2
        self.q2_sell['quantity'] = 2

        self.assertEqual(len(self.ex1._bid_book_prices), 0)
        self.assertFalse(self.ex1.confirm_modify_collector)
        self.assertFalse(self.ex1.order_history)
        # seed order book
        self.ex1.add_order_to_book(self.q1_buy)
        self.ex1.add_order_to_book(self.q1_sell)
        # process new orders
        self.ex1.process_order(self.q2_buy)
        self.ex1.process_order(self.q2_sell)
        self.assertEqual(len(self.ex1._bid_book_prices), 1)
        self.assertEqual(len(self.ex1.order_history), 2)
        # marketable sell takes out 1 share
        q3_sell = {'order_id': 't3_1', 'timestamp': 5, 'type': 'add', 'quantity': 1, 'side': 'sell',
                   'price': 0}
        self.ex1.process_order(q3_sell)
        self.assertEqual(len(self.ex1.order_history), 3)
        self.assertEqual(self.ex1._bid_book[50]['num_orders'], 1)
        self.assertEqual(self.ex1._bid_book[50]['size'], 2)
        # marketable buy takes out 1 share
        q3_buy = {'order_id': 't3_2', 'timestamp': 5, 'type': 'add', 'quantity': 1, 'side': 'buy',
                  'price': 10000}
        self.ex1.process_order(q3_buy)
        self.assertEqual(len(self.ex1.order_history), 4)
        # add a buy order, then cancel it
        q4_buy = {'order_id': 't4_1', 'timestamp': 10, 'type': 'add', 'quantity': 1, 'side': 'buy',
                  'price': 48}
        self.ex1.process_order(q4_buy)
        self.assertEqual(len(self.ex1.order_history), 5)
        self.assertEqual(len(self.ex1._bid_book_prices), 2)
        self.assertEqual(self.ex1._bid_book[48]['num_orders'], 1)
        self.assertEqual(self.ex1._bid_book[48]['size'], 1)
        q4_cancel1 = {'order_id': 't4_1', 'timestamp': 10, 'type': 'cancel', 'quantity': 1, 'side': 'buy',
                      'price': 48}
        self.ex1.process_order(q4_cancel1)
        self.assertEqual(len(self.ex1.order_history), 6)
        self.assertEqual(len(self.ex1._bid_book_prices), 1)
        # add a sell order, then cancel it
        q4_sell = {'order_id': 't4_2', 'timestamp': 10, 'type': 'add', 'quantity': 1, 'side': 'sell',
                   'price': 54}
        self.ex1.process_order(q4_sell)
        self.assertEqual(len(self.ex1.order_history), 7)
        q4_cancel2 = {'order_id': 't4_2', 'timestamp': 10, 'type': 'cancel', 'quantity': 1, 'side': 'sell',
                      'price': 54}
        self.ex1.process_order(q4_cancel2)
        self.assertEqual(len(self.ex1.order_history), 8)
        # add a buy order, then modify it (the modify quantity is the reduction)
        q5_buy = {'order_id': 't5_1', 'timestamp': 10, 'type': 'add', 'quantity': 5, 'side': 'buy',
                  'price': 48}
        self.ex1.process_order(q5_buy)
        self.assertEqual(len(self.ex1.order_history), 9)
        self.assertEqual(len(self.ex1._bid_book_prices), 2)
        self.assertEqual(self.ex1._bid_book[48]['num_orders'], 1)
        self.assertEqual(self.ex1._bid_book[48]['size'], 5)
        q5_modify1 = {'order_id': 't5_1', 'timestamp': 10, 'type': 'modify', 'quantity': 2, 'side': 'buy',
                      'price': 48}
        self.ex1.process_order(q5_modify1)
        self.assertEqual(len(self.ex1.order_history), 10)
        self.assertEqual(len(self.ex1._bid_book_prices), 2)
        self.assertEqual(self.ex1._bid_book[48]['size'], 3)
        self.assertEqual(self.ex1._bid_book[48]['orders']['t5_1']['quantity'], 3)
        self.assertEqual(len(self.ex1.confirm_modify_collector), 1)
        # add a sell order, then modify it
        q5_sell = {'order_id': 't5_1', 'timestamp': 10, 'type': 'add', 'quantity': 5, 'side': 'sell',
                   'price': 54}
        self.ex1.process_order(q5_sell)
        self.assertEqual(len(self.ex1.order_history), 11)
        q5_modify2 = {'order_id': 't5_1', 'timestamp': 10, 'type': 'modify', 'quantity': 2, 'side': 'sell',
                      'price': 54}
        self.ex1.process_order(q5_modify2)
        self.assertEqual(len(self.ex1.order_history), 12)
        self.assertEqual(len(self.ex1.confirm_modify_collector), 1)
        self.assertFalse(self.ex1.traded)

For _match_trade(), we will test buys and sells separately. The logic is the same and is documented in the comments. In general, the tests check for partial executions (one incoming order with a quantity less than the quantity available at that price), walking the book (incoming order priced to remove more than one order from the book – possibly at different prices), and making a new market. First some sell orders:

    def test_match_trade_sell(self):
        '''
        An incoming order can:
        1. take out part of an order,
        2. take out an entire price level,
        3. if priced, take out a price level and make a new inside market.
        '''
        # seed order book
        self.ex1.add_order_to_book(self.q1_buy)
        self.ex1.add_order_to_book(self.q1_sell)
        # process new orders
        self.ex1.process_order(self.q2_buy)
        self.ex1.process_order(self.q2_sell)
        self.ex1.process_order(self.q3_buy)
        self.ex1.process_order(self.q3_sell)
        self.ex1.process_order(self.q4_buy)
        self.ex1.process_order(self.q4_sell)
        # The book: bids: 2@50, 3@49, 3@47 ; asks: 2@52, 3@53, 3@55
        self.assertEqual(self.ex1._bid_book[47]['size'], 3)
        self.assertEqual(self.ex1._bid_book[49]['size'], 3)
        self.assertEqual(self.ex1._bid_book[50]['size'], 2)
        #self.assertFalse(self.ex1.sip_collector)
        # market sell order takes out part of first best bid
        q1 = {'order_id': 't100_1', 'timestamp': 10, 'type': 'add', 'quantity': 1, 'side': 'sell',
              'price': 0}
        self.ex1.process_order(q1)
        self.assertEqual(self.ex1._bid_book[50]['size'], 1)
        self.assertTrue(50 in self.ex1._bid_book_prices)
        self.assertEqual(self.ex1._bid_book[49]['size'], 3)
        self.assertEqual(self.ex1._bid_book[47]['size'], 3)
        self.assertEqual(self.ex1._bid_book[50]['orders'][self.ex1._bid_book[50]['order_ids'][0]]['quantity'], 1)
        #self.assertEqual(len(self.ex1.sip_collector), 1)
        # market sell order takes out remainder of first best bid and all of the next level
        self.assertEqual(len(self.ex1._bid_book_prices), 3)
        q2 = {'order_id': 't100_2', 'timestamp': 11, 'type': 'add', 'quantity': 4, 'side': 'sell',
              'price': 0}
        self.ex1.process_order(q2)
        self.assertEqual(len(self.ex1._bid_book_prices), 1)
        self.assertFalse(50 in self.ex1._bid_book_prices)
        self.assertFalse(49 in self.ex1._bid_book_prices)
        self.assertTrue(47 in self.ex1._bid_book_prices)
        #self.assertEqual(len(self.ex1.sip_collector), 3)
        # make new market
        q3 = {'order_id': 't101_1', 'timestamp': 12, 'type': 'add', 'quantity': 2, 'side': 'buy',
              'price': 48}
        q4 = {'order_id': 't102_1', 'timestamp': 13, 'type': 'add', 'quantity': 3, 'side': 'sell',
              'price': 48}
        self.ex1.process_order(q3)
        self.assertEqual(len(self.ex1._bid_book_prices), 2)
        self.assertTrue(48 in self.ex1._bid_book_prices)
        self.assertTrue(47 in self.ex1._bid_book_prices)
        self.assertEqual(self.ex1._bid_book_prices[-1], 48)
        self.assertEqual(self.ex1._bid_book_prices[-2], 47)
        # sip_collector does not reset until new trade at new time
        #self.assertEqual(len(self.ex1.sip_collector), 3)
        self.ex1.process_order(q4)
        self.assertEqual(len(self.ex1._bid_book_prices), 1)
        self.assertFalse(48 in self.ex1._bid_book_prices)
        self.assertTrue(47 in self.ex1._bid_book_prices)
        self.assertEqual(self.ex1._bid_book_prices[-1], 47)
        #self.assertEqual(len(self.ex1.sip_collector), 1)

    def test_match_trade_buy(self):
        '''
        An incoming order can:
        1. take out part of an order,
        2. take out an entire price level,
        3. if priced, take out a price level and make a new inside market.
        '''
        # seed order book
        self.ex1.add_order_to_book(self.q1_buy)
        self.ex1.add_order_to_book(self.q1_sell)
        # process new orders
        self.ex1.process_order(self.q2_buy)
        self.ex1.process_order(self.q2_sell)
        self.ex1.process_order(self.q3_buy)
        self.ex1.process_order(self.q3_sell)
        self.ex1.process_order(self.q4_buy)
        self.ex1.process_order(self.q4_sell)
        # The book: bids: 2@50, 3@49, 3@47 ; asks: 2@52, 3@53, 3@55
        self.assertEqual(self.ex1._bid_book[47]['size'], 3)
        self.assertEqual(self.ex1._bid_book[49]['size'], 3)
        self.assertEqual(self.ex1._bid_book[50]['size'], 2)
        # market buy order takes out part of first best ask
        q1 = {'order_id': 't100_1', 'timestamp': 10, 'type': 'add', 'quantity': 1, 'side': 'buy',
              'price': 100000}
        self.ex1.process_order(q1)
        # market buy order takes out remainder of first best ask and all of the next level
        q2 = {'order_id': 't100_2', 'timestamp': 11, 'type': 'add', 'quantity': 4, 'side': 'buy',
              'price': 100000}
        self.ex1.process_order(q2)
        # make new market
        q3 = {'order_id': 't101_1', 'timestamp': 12, 'type': 'add', 'quantity': 2, 'side': 'sell',
              'price': 54}
        q4 = {'order_id': 't102_1', 'timestamp': 13, 'type': 'add', 'quantity': 3, 'side': 'buy',
              'price': 54}
        self.ex1.process_order(q3)
        self.ex1.process_order(q4)
        # the unfilled share of q4 rests as the new best bid at 54
        self.assertEqual(len(self.ex1._bid_book_prices), 4)
        self.assertTrue(54 in self.ex1._bid_book_prices)
        self.assertEqual(self.ex1._bid_book_prices[-1], 54)

There is also an additional test for market collapse:

    def test_market_collapse(self):
        '''
        At setup(), there is 8 total bid size and 8 total ask size
        A trade for 8 or more should collapse the market
        '''
        print('Market Collapse Tests to stdout:\n')
        # seed order book
        self.ex1.add_order_to_book(self.q1_buy)
        self.ex1.add_order_to_book(self.q1_sell)
        # process new orders
        self.ex1.process_order(self.q2_buy)
        self.ex1.process_order(self.q2_sell)
        self.ex1.process_order(self.q3_buy)
        self.ex1.process_order(self.q3_sell)
        self.ex1.process_order(self.q4_buy)
        self.ex1.process_order(self.q4_sell)
        # The book: bids: 2@50, 3@49, 3@47 ; asks: 2@52, 3@53, 3@55
        # market buy order takes out part of the asks: no collapse
        q1 = {'order_id': 't100_1', 'timestamp': 10, 'type': 'add', 'quantity': 4, 'side': 'buy',
              'price': 100000}
        self.ex1.process_order(q1)
        # next market buy order takes out the asks: market collapse
        q2 = {'order_id': 't100_2', 'timestamp': 10, 'type': 'add', 'quantity': 5, 'side': 'buy',
              'price': 100000}
        self.ex1.process_order(q2)
        # market sell order takes out part of the bids: no collapse
        q3 = {'order_id': 't100_3', 'timestamp': 10, 'type': 'add', 'quantity': 4, 'side': 'sell',
              'price': 0}
        self.ex1.process_order(q3)
        # next market sell order takes out the bids: market collapse
        q4 = {'order_id': 't100_4', 'timestamp': 10, 'type': 'add', 'quantity': 5, 'side': 'sell',
              'price': 0}
        self.ex1.process_order(q4)

This test is mostly for internal checking. Many combinations of inputs will necessarily result in market collapse (i.e., exhaustion of all orders on one side of the book). Check out the Preis et al. (2006) and Preis et al. (2007) references in the Bibliography for more details. The final test checks for posting the top-of-book:

    def test_report_top_of_book(self):
        '''
        At setup(), top of book has 2 to sell at 52 and 2 to buy at 50
        at time = 3
        '''
        tob_check = {'timestamp': 5, 'best_bid': 50, 'best_ask': 52, 'bid_size': 2, 'ask_size': 2}
        self.ex1.report_top_of_book(5)
        self.assertDictEqual(self.ex1._sip_collector[0], tob_check)

That’s it for testing the Orderbook class and associated methods. As usual, the tests are more than twice as long as the actual code to be tested. Future posts will cover the @unittest.skip() decorator and how to run tests in a loop. If you have comments or suggestions, feel free to post them. The current WordPress settings require me to approve the first comment from a specific source. After that, you are free to comment without further approval. Coming up are posts describing the Trader classes and associated tests followed by a post or two on the simulation loop.

## Coding a simple limit order book with Python

I will walk through designing and coding a simple two-sided continuous auction limit order book using an object-oriented approach with Python 3. The full code is available on GitHub as orderbook3.py. The limit order book will have attributes and methods. Private attributes (things the order book has) and methods (things the order book does) are not used outside of the class declaration. These are denoted with a leading underscore. Public attributes and methods are called from other modules that import the orderbook. These have no leading underscore.

The first step is to import the Python modules we will need within the class: bisect and pandas. We will use bisect.insort to maintain two ordered lists of prices, which preserves price priority in both the bid and ask queues, and pandas to store results permanently in hdf5 files.

import bisect
import pandas as pd 

Next, we declare the Orderbook class and initialize the attributes. For now, we will skip the documentation comments.

    class Orderbook(object):

        def __init__(self):
            self.order_history = []
            self._bid_book = {}
            self._bid_book_prices = []
            self._ask_book = {}
            self._ask_book_prices = []
            self.confirm_modify_collector = []
            self.confirm_trade_collector = []
            self._sip_collector = []
            self.trade_book = []
            self._order_index = 0
            self.traded = False

First, Orderbook inherits from object – the default in Python 3. order_history is a list of all of the orders sequenced by arrival time. This is used to reconstruct the orderbook after the simulation is run. _bid_book_prices and _ask_book_prices are lists of existing prices in ascending order. The sorted order is established by using bisect.insort. The prices act as pointers to the two books: _bid_book and _ask_book. An example of the _ask_book_prices:

[998, 999, 1000, 1001, 1005, … , 1010]

And an example of the _ask_book:

{998: {'num_orders': 2, 'size': 5, 'order_ids': [id1, id2], 'orders': {id1: {order for id1}, id2: {order for id2}}}, 999: …}
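
As a minimal sketch of how the sorted price lists behave, the snippet below keeps two standalone lists in ascending order with bisect.insort and reads the best prices from the ends. The names bid_prices and ask_prices are illustrative stand-ins for _bid_book_prices and _ask_book_prices, and the prices are made up.

```python
import bisect

# Hypothetical standalone lists standing in for _bid_book_prices and _ask_book_prices
bid_prices = []
ask_prices = []

# Prices arrive in arbitrary order; insort keeps each list ascending
for price in (50, 47, 49):
    bisect.insort(bid_prices, price)
for price in (55, 52, 53):
    bisect.insort(ask_prices, price)

best_bid = bid_prices[-1]  # the highest bid sits at the end of the list
best_ask = ask_prices[0]   # the lowest ask sits at the front of the list
print(bid_prices, ask_prices, best_bid, best_ask)
```

Because both lists are ascending, the best bid is always the last element and the best ask is always the first, so no search is needed to find the inside market.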

confirm_modify_collector and confirm_trade_collector are public lists that carry messages (dictionaries) to the traders. _sip_collector is a private list of dictionaries containing best bid and ask prices along with their associated sizes for each discrete event. This top-of-book information is provided to traders via a public Orderbook method called in the looping logic contained in a separate module. trade_book is a list of dictionaries containing details for each trade. _order_index is used to generate unique incremented order ids and traded is a public boolean attribute used in the simulation looping logic to determine if a trade occurred or not. Much of this will become clearer as we introduce the Orderbook methods.

There are three major types of methods in Orderbook. The actual order processing and matching is done by a public process_order method and a private _match_trade method, respectively. Several methods are helper functions called from these two main order processing methods. The remaining methods prepare and save some important data for processing after the simulation is run.

_add_order_to_history adds a unique order index to an existing order (dict) and appends the modified order to the order_history list.

    def _add_order_to_history(self, order):
        '''Add an order (dict) to order_history'''
        hist_order = {'order_id': order['order_id'], 'timestamp': order['timestamp'], 'type': order['type'],
                      'quantity': order['quantity'], 'side': order['side'], 'price': order['price']}
        self._order_index += 1
        hist_order['exid'] = self._order_index
        self.order_history.append(hist_order)

Here we can see how simple the order book really is. To extend this orderbook, we could add reserve or hidden features to the order. We would also have to modify the bookkeeping and matching logic that follows. Note that hist_order is just a hand-written copy of the order parameter. This is much faster than using copy.deepcopy().
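
To see the difference for yourself, a rough comparison might look like the sketch below. The helper manual_copy is a hypothetical stand-in for the field-by-field copy above, and the sample order is made up. Timings depend on the machine, so the snippet prints them rather than claiming particular numbers.

```python
import copy
import timeit

order = {'order_id': 't1_1', 'timestamp': 1, 'type': 'add',
         'quantity': 1, 'side': 'buy', 'price': 50}

def manual_copy(o):
    '''Field-by-field copy, in the style of _add_order_to_history (hypothetical helper).'''
    return {'order_id': o['order_id'], 'timestamp': o['timestamp'], 'type': o['type'],
            'quantity': o['quantity'], 'side': o['side'], 'price': o['price']}

hand = manual_copy(order)
deep = copy.deepcopy(order)
hand['quantity'] = 99  # changing the copy leaves the original order untouched

# Time both approaches over the same number of repetitions
t_hand = timeit.timeit(lambda: manual_copy(order), number=10000)
t_deep = timeit.timeit(lambda: copy.deepcopy(order), number=10000)
print('manual: {0:.4f}s, deepcopy: {1:.4f}s'.format(t_hand, t_deep))
```

The hand-written copy is safe here because every value in the order is an immutable scalar or string. An order carrying nested containers would need more care.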

add_order_to_book performs all of the book maintenance for incoming orders that do not result in a full trade (i.e., either the order is not priced to trade or the size is not fully exhausted if it is priced to trade).

    def add_order_to_book(self, order):
        '''
        Use insort to maintain an ordered list of prices which serve as pointers
        to the orders.
        '''
        book_order = {'order_id': order['order_id'], 'timestamp': order['timestamp'], 'type': order['type'],
                      'quantity': order['quantity'], 'side': order['side'], 'price': order['price']}
        if order['side'] == 'buy':
            book_prices = self._bid_book_prices
            book = self._bid_book
        else:
            book_prices = self._ask_book_prices
            book = self._ask_book
        if order['price'] in book_prices:
            book[order['price']]['num_orders'] += 1
            book[order['price']]['size'] += order['quantity']
            book[order['price']]['order_ids'].append(order['order_id'])
            book[order['price']]['orders'][order['order_id']] = book_order
        else:
            bisect.insort(book_prices, order['price'])
            book[order['price']] = {'num_orders': 1, 'size': order['quantity'], 'order_ids': [order['order_id']],
                                    'orders': {order['order_id']: book_order}}

Again, the incoming order is copied to a new dict object. The order side determines which book we are using: bid or ask. Then we check if the order price is in the list of book_prices – a very expensive task that would take longer if we were to check for “not in” prices. If the order price is already in the book, the order book is updated with the new information. If not, a new price is inserted in the proper sorted slot and the book is (re-)established for the new price.

_remove_order removes an order from the order book and removes the price from the price list if removal results in an empty book for that price. Maintaining a list of valid prices instead of merely keeping all of the prices (with some prices pointing to empty books) speeds up the trade matching algorithm.

    def _remove_order(self, order_side, order_price, order_id):
        '''Pop the order_id; if order_id exists, updates the book.'''
        if order_side == 'buy':
            book_prices = self._bid_book_prices
            book = self._bid_book
        else:
            book_prices = self._ask_book_prices
            book = self._ask_book
        is_order = book[order_price]['orders'].pop(order_id, None)
        if is_order:
            book[order_price]['num_orders'] -= 1
            book[order_price]['size'] -= is_order['quantity']
            book[order_price]['order_ids'].remove(is_order['order_id'])
            if book[order_price]['num_orders'] == 0:
                book_prices.remove(order_price)
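
The add and remove bookkeeping can be tried outside the class. The sketch below uses two hypothetical free functions, add_order and remove_order, that take the book and price list as arguments instead of choosing them from the order side. The order ids and sizes are made up, and the orders carry only the fields the sketch needs.

```python
import bisect

def add_order(book, book_prices, order):
    '''Add an order (dict) to one side of the book.'''
    price = order['price']
    if price in book_prices:
        level = book[price]
        level['num_orders'] += 1
        level['size'] += order['quantity']
        level['order_ids'].append(order['order_id'])
        level['orders'][order['order_id']] = dict(order)
    else:
        bisect.insort(book_prices, price)  # the new price goes into its sorted slot
        book[price] = {'num_orders': 1, 'size': order['quantity'],
                       'order_ids': [order['order_id']],
                       'orders': {order['order_id']: dict(order)}}

def remove_order(book, book_prices, price, order_id):
    '''Remove an order; drop the price pointer when the level is empty.'''
    resting = book[price]['orders'].pop(order_id, None)
    if resting:
        level = book[price]
        level['num_orders'] -= 1
        level['size'] -= resting['quantity']
        level['order_ids'].remove(order_id)
        if level['num_orders'] == 0:
            book_prices.remove(price)

bids, bid_prices = {}, []
add_order(bids, bid_prices, {'order_id': 'a_1', 'quantity': 1, 'price': 50})
add_order(bids, bid_prices, {'order_id': 'b_1', 'quantity': 1, 'price': 50})
add_order(bids, bid_prices, {'order_id': 'c_1', 'quantity': 3, 'price': 49})
remove_order(bids, bid_prices, 50, 'a_1')
remove_order(bids, bid_prices, 50, 'b_1')
prices_after_removal = list(bid_prices)  # 50 is no longer a valid price pointer
add_order(bids, bid_prices, {'order_id': 'd_1', 'quantity': 2, 'price': 50})
print(prices_after_removal, bid_prices, bids[50]['size'])
```

Note that an emptied level stays in the dict with zero orders. Only the price list decides whether a price is live, which is why the membership test runs against book_prices and the level is rebuilt from scratch when the price returns.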

_modify_order behaves similarly, but also checks if the modify actually results in removal.

    def _modify_order(self, order_side, order_quantity, order_id, order_price):
        '''Modify order quantity; if quantity is 0, removes the order.'''
        book = self._bid_book if order_side == 'buy' else self._ask_book
        if order_quantity < book[order_price]['orders'][order_id]['quantity']:
            book[order_price]['size'] -= order_quantity
            book[order_price]['orders'][order_id]['quantity'] -= order_quantity
        else:
            self._remove_order(order_side, order_price, order_id)
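
One detail worth spelling out: the modify quantity is the amount to take off the resting order, not the new size. A small sketch of that rule on a single price level follows. The function reduce_order and the sample level are hypothetical, and the function returns a flag where the class method would call _remove_order.

```python
def reduce_order(level, order_id, reduce_by):
    '''Reduce a resting order by reduce_by shares.

    Returns True when the reduction would exhaust the order, which is the
    case where the class removes the order instead.
    '''
    resting = level['orders'][order_id]
    if reduce_by < resting['quantity']:
        level['size'] -= reduce_by
        resting['quantity'] -= reduce_by
        return False
    return True

# One price level holding a single 5-share order (made-up data)
level = {'num_orders': 1, 'size': 5, 'order_ids': ['t5_1'],
         'orders': {'t5_1': {'order_id': 't5_1', 'quantity': 5}}}

first = reduce_order(level, 't5_1', 2)   # 5 shares reduced by 2 leaves 3
second = reduce_order(level, 't5_1', 3)  # reducing by the full 3 means removal
print(first, second, level['size'])
```

This matches the modify check in test_process_order, where a 5-share order modified with quantity 2 ends up with 3 shares.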

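    def _add_trade_to_book(self, resting_order_id, resting_timestamp, incoming_order_id, timestamp, price, quantity, side):
        '''Add trades (dicts) to the trade_book list.'''
        self.trade_book.append({'resting_order_id': resting_order_id, 'resting_timestamp': resting_timestamp,
                                'incoming_order_id': incoming_order_id, 'timestamp': timestamp, 'price': price,
                                'quantity': quantity, 'side': side})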

_confirm_trade and _confirm_modify are helper functions that append trade or modify messages to a list that is conveyed to the traders.

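    def _confirm_trade(self, timestamp, order_side, order_quantity, order_id, order_price):
        '''Add trade confirmation to confirm_trade_collector list.'''
        self.confirm_trade_collector.append({'timestamp': timestamp, 'order_id': order_id,
                                             'quantity': order_quantity, 'side': order_side, 'price': order_price})

    def _confirm_modify(self, timestamp, order_side, order_quantity, order_id):
        '''Add modify confirmation to confirm_modify_collector list.'''
        self.confirm_modify_collector.append({'timestamp': timestamp, 'order_id': order_id,
                                              'quantity': order_quantity, 'side': order_side})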

process_order determines whether an incoming order results in a match with a resting order or not.

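    def process_order(self, order):
        '''Check for a trade (match); if so call _match_trade, otherwise modify book(s).'''
        self.confirm_modify_collector.clear()
        self.traded = False
        self._add_order_to_history(order)
        if order['type'] == 'add':
            if order['side'] == 'buy':
                if order['price'] >= self._ask_book_prices[0]:
                    self._match_trade(order)
                else:
                    self.add_order_to_book(order)
            else: #order['side'] == 'sell'
                if order['price'] <= self._bid_book_prices[-1]:
                    self._match_trade(order)
                else:
                    self.add_order_to_book(order)
        else:
            book_prices = self._bid_book_prices if order['side'] == 'buy' else self._ask_book_prices
            book = self._bid_book if order['side'] == 'buy' else self._ask_book
            if order['price'] in book_prices:
                if order['order_id'] in book[order['price']]['orders']:
                    self._confirm_modify(order['timestamp'], order['side'], order['quantity'], order['order_id'])
                    if order['type'] == 'cancel':
                        self._remove_order(order['side'], order['price'], order['order_id'])
                    else: #order['type'] == 'modify'
                        self._modify_order(order['side'], order['quantity'], order['order_id'], order['price'])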

It does some bookkeeping then checks the type of order. If it is an add order, it results in a trade if it is priced to match an existing order. This is assessed by checking the order price against the best bid (_bid_book_prices[-1]) or best ask (_ask_book_prices[0]). If it is not an add order, then it must be a cancel or modify and the order book is updated and messages are created for the trader.
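
The pricing check can be isolated into a few lines. The helper is_marketable below is a hypothetical name, and the guard against an empty side is an assumption added for the sketch: the class indexes the price lists directly and relies on the book being seeded first.

```python
def is_marketable(order, bid_prices, ask_prices):
    '''Return True if an add order is priced to trade against the resting book.'''
    if order['side'] == 'buy':
        # A buy trades when it is priced at or above the best (lowest) ask
        return bool(ask_prices) and order['price'] >= ask_prices[0]
    # A sell trades when it is priced at or below the best (highest) bid
    return bool(bid_prices) and order['price'] <= bid_prices[-1]

bid_prices = [47, 49, 50]
ask_prices = [52, 53, 55]

results = [
    is_marketable({'side': 'buy', 'price': 52}, bid_prices, ask_prices),
    is_marketable({'side': 'buy', 'price': 51}, bid_prices, ask_prices),
    is_marketable({'side': 'sell', 'price': 0}, bid_prices, ask_prices),
    is_marketable({'side': 'sell', 'price': 51}, bid_prices, ask_prices),
    is_marketable({'side': 'buy', 'price': 52}, bid_prices, []),
]
print(results)
```

A sell priced at 0 or a buy priced far above the book, as used in the tests, behaves like a market order because it always passes this check while the other side has orders.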

_match_trade enforces price-time priority for matching incoming orders against resting orders.

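    def _match_trade(self, order):
        '''Match orders to generate trades, update books.'''
        self.traded = True
        self.confirm_trade_collector.clear()
        if order['side'] == 'buy':
            book_prices = self._ask_book_prices
            book = self._ask_book
            remainder = order['quantity']
            while remainder > 0:
                if book_prices:
                    price = book_prices[0]
                    if order['price'] >= price:
                        book_order_id = book[price]['order_ids'][0]
                        book_order = book[price]['orders'][book_order_id]
                        if remainder >= book_order['quantity']:
                            self._confirm_trade(order['timestamp'], book_order['side'], book_order['quantity'],
                                                book_order['order_id'], book_order['price'])
                            self._add_trade_to_book(book_order['order_id'], book_order['timestamp'],
                                                    order['order_id'], order['timestamp'], book_order['price'],
                                                    book_order['quantity'], order['side'])
                            self._remove_order(book_order['side'], book_order['price'], book_order['order_id'])
                            remainder -= book_order['quantity']
                        else:
                            self._confirm_trade(order['timestamp'], book_order['side'], remainder,
                                                book_order['order_id'], book_order['price'])
                            self._add_trade_to_book(book_order['order_id'], book_order['timestamp'],
                                                    order['order_id'], order['timestamp'], book_order['price'],
                                                    remainder, order['side'])
                            self._modify_order(book_order['side'], remainder, book_order['order_id'], book_order['price'])
                            break
                    else:
                        order['quantity'] = remainder
                        self.add_order_to_book(order)
                        break
                else:
                    print('Ask Market Collapse with order {0}'.format(order))
                    break
        else: #order['side'] =='sell'
            book_prices = self._bid_book_prices
            book = self._bid_book
            remainder = order['quantity']
            while remainder > 0:
                if book_prices:
                    price = book_prices[-1]
                    if order['price'] <= price:
                        book_order_id = book[price]['order_ids'][0]
                        book_order = book[price]['orders'][book_order_id]
                        if remainder >= book_order['quantity']:
                            self._confirm_trade(order['timestamp'], book_order['side'], book_order['quantity'],
                                                book_order['order_id'], book_order['price'])
                            self._add_trade_to_book(book_order['order_id'], book_order['timestamp'],
                                                    order['order_id'], order['timestamp'], book_order['price'],
                                                    book_order['quantity'], order['side'])
                            self._remove_order(book_order['side'], book_order['price'], book_order['order_id'])
                            remainder -= book_order['quantity']
                        else:
                            self._confirm_trade(order['timestamp'], book_order['side'], remainder,
                                                book_order['order_id'], book_order['price'])
                            self._add_trade_to_book(book_order['order_id'], book_order['timestamp'],
                                                    order['order_id'], order['timestamp'], book_order['price'],
                                                    remainder, order['side'])
                            self._modify_order(book_order['side'], remainder, book_order['order_id'], book_order['price'])
                            break
                    else:
                        order['quantity'] = remainder
                        self.add_order_to_book(order)
                        break
                else:
                    print('Bid Market Collapse with order {0}'.format(order))
                    break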

It does a little bookkeeping then checks whether the incoming order is a buy or sell. The while loops enforce price priority by checking for the best price pointer (price = book_prices[0], for example), then enforce time priority by walking through the resting orders in the order of arrival for each price (book_order_id = book[price]['order_ids'][0]; book_order = book[price]['orders'][book_order_id]). The rest of each while loop compares the remaining order size with the quantity of the first resting order at the current best price: if the remainder is at least as large, the resting order trades in full and is removed; otherwise the resting order is reduced by the remainder and the loop ends. If the incoming order is no longer priced to trade, whatever is left is added to the book.
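
Price-time priority is easier to see in a stripped-down form. The sketch below is a simplification under stated assumptions, not the Orderbook implementation: each price level is a deque of [order_id, quantity] pairs instead of the order_ids list plus orders dict, and the hypothetical function match_sell handles only an incoming sell. It returns the fills and any unfilled remainder instead of posting the remainder to the book.

```python
from collections import deque

def match_sell(bid_prices, bids, quantity, limit_price=0):
    '''Match an incoming sell against resting bids in price-time priority.

    bid_prices: ascending list of live bid prices
    bids: {price: deque of [order_id, quantity]} in arrival order
    Returns (fills, remainder), where fills are (order_id, price, quantity).
    '''
    fills = []
    remainder = quantity
    while remainder > 0 and bid_prices:
        price = bid_prices[-1]           # price priority: best bid first
        if limit_price > price:
            break                        # no longer priced to trade
        queue = bids[price]
        order_id, resting_qty = queue[0]  # time priority: oldest order first
        traded = min(remainder, resting_qty)
        fills.append((order_id, price, traded))
        remainder -= traded
        if traded == resting_qty:
            queue.popleft()              # resting order fully filled
            if not queue:
                bid_prices.pop()         # level exhausted: drop the price pointer
                del bids[price]
        else:
            queue[0][1] = resting_qty - traded  # partial fill of the resting order
    return fills, remainder

# The bid side used in the tests: 2@50 (two 1-share orders), 3@49, 3@47
bid_prices = [47, 49, 50]
bids = {50: deque([['a_1', 1], ['b_1', 1]]),
        49: deque([['c_1', 3]]),
        47: deque([['d_1', 3]])}

first = match_sell(bid_prices, bids, 1)       # takes part of the best bid level
second = match_sell(bid_prices, bids, 4)      # walks the book through 50 and 49
third = match_sell(bid_prices, bids, 3, 48)   # priced above the remaining bid at 47
print(first, second, third, bid_prices)
```

The three calls mirror the cases in test_match_trade_sell: a partial take of the best bid level, walking the book, and a priced order that does not trade.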

Three helper functions facilitate saving data to an hdf5 for use after the simulation has ended.

    def order_history_to_h5(self, filename):
        '''Append order history to an h5 file, clear the order_history'''
        temp_df = pd.DataFrame(self.order_history)
        temp_df.to_hdf(filename, key='orders', append=True, format='table', complevel=5, complib='blosc',
                       min_itemsize={'order_id': 12})
        self.order_history.clear()

    def trade_book_to_h5(self, filename):
        '''Append trade_book to an h5 file, clear the trade_book'''
        temp_df = pd.DataFrame(self.trade_book)
        temp_df.to_hdf(filename, key='trades', append=True, format='table', complevel=5, complib='blosc',
                       min_itemsize={'resting_order_id': 12, 'incoming_order_id': 12})
        self.trade_book.clear()

    def sip_to_h5(self, filename):
        '''Append _sip_collector to an h5 file, clear the _sip_collector'''
        temp_df = pd.DataFrame(self._sip_collector)
        temp_df.to_hdf(filename, key='tob', append=True, format='table', complevel=5, complib='blosc')
        self._sip_collector.clear()

The final function is a public method for conveying the top of book information to the traders.

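    def report_top_of_book(self, now_time):
        '''Update the top-of-book prices and sizes'''
        best_bid_price = self._bid_book_prices[-1]
        best_bid_size = self._bid_book[best_bid_price]['size']
        best_ask_price = self._ask_book_prices[0]
        best_ask_size = self._ask_book[best_ask_price]['size']
        tob = {'timestamp': now_time, 'best_bid': best_bid_price, 'best_ask': best_ask_price,
               'bid_size': best_bid_size, 'ask_size': best_ask_size}
        self._sip_collector.append(tob)
        return tob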

That is it! Easy? Maybe not the first time. Creating order book code is an iterative process, even when a lot of planning and forethought is applied and even with a lot of prior knowledge about how order books are actually created by professional trading firms. The logic here can be extended to include more order information like hidden or iceberg orders. The order processing, trade matching and bookkeeping would have to be updated as well. Adding more functionality like pegged or sliding features would require considerable modification to the order processing and trade matching algorithms. But it can be done! And finally, the basic organization of this order book module can be applied to other matching mechanisms like auctions or dealer markets. Simulations will always require a module or set of functions to determine which agents traded and the prices the agents received.

Next posts will cover unit testing and designing various trader agents with Python.

[2/12/2018: Updated to properly format “less than” in code blocks.]

## Why would anyone write about agent-based modeling?

Agent-based modeling offers an opportunity to experiment with and learn about how markets work without interfering with real markets. To this very day, experimenting with our financial markets means altering the actual market structure, typically via some mixture of policy and incentives designed to influence or curtail the activities of market participants. These experiments impose real costs on a variety of participants, sometimes for years. Why not try agent-based modeling before experimenting with actual markets? A variety of excuses are proffered by the people who stand to gain the most: regulators and market participants. These folks regularly claim that agent-based models are untested, difficult to implement, and too unrealistic. They are wrong about the models being untested, wrong about implementation, and wrong about the necessity for complete realism in any model. The truth is agent-based modeling is not well understood. We fear what we do not understand. And we distrust what we fear.

radicalmarketsimulation.com is devoted to increasing awareness of the utility of applying agent-based modeling to financial markets. To increase awareness, skilled agent-based modelers must engage market practitioners, academics, and regulators in an open discussion of the pros and cons of agent-based models. Not only do we need to build better models, target important policy issues and publicize our results more extensively, we need to learn how to communicate effectively with each member of the tripartite audience. I say “we” because I want to invite other agent-based modelers to contribute to this site in the spirit of an open-source community of concerned practitioners. More details below. But first, let’s get some obligatory preliminaries out of the way.

Who am I?

My name is Charles Collver and I am a financial economist at the US Securities and Exchange Commission where I have spent the last five years working with financial market big data. My LinkedIn profile provides more historical details and a way to contact me (for now). While I have been noodling around with agent-based models and genetic algorithms since my grad school days, I have only recently begun focusing on applying the models to financial market policy.

In the blogs to follow, I will write about:
1. Agent-based modeling for financial markets
2. Simulation
3. Python/Cython for building an API (aka a “test bed”)
4. Genetic algorithms
5. Packaging, Conda, GitHub
6. Large scale and distributed computing

I will write about what works as if it took a few hours that morning. I will also write about some of the things I tried that didn’t work (and took a lot more than a few hours).

What will I not write about?

I will not write about the US Securities and Exchange Commission or opine on any policy-related issues. However, if a policy issue becomes a matter of public discussion, then I might write about how to think about designing models to address the issue.

Why I write (and why you should, too)?

Or, what’s in it for me? I want to establish expertise and super-credibility, to become a better modeler and have other like-minded modelers notice. Of course, as an ex-academic, I believe you haven’t really fully understood a topic/subject/skill/dirty trick until you have taught it to others. More importantly, I want to engage with a like-minded community and build a network for myself, my colleagues and for you. We can learn from each other – which means I will get better, you will get better, and, most importantly, the models will get better.

When?

I have a day job! At first, I will shoot for weekly posts, maybe more frequently and occasionally less. When others begin contributing, publication frequency will necessarily increase.

Where?