Example: Modeling Seismic Activity¶
One field where Hawkes processes have traditionally been popular is seismology. In this example, we look at fitting a univariate Hawkes model to earthquakes.
%matplotlib inline
import requests
from lxml import html
from datetime import datetime as dt
import numpy as np
from matplotlib import pyplot as plt
Let’s start by scraping some data on recent earthquakes in and around
Istanbul – where this tutorial was written. The following short script
uses requests
, lxml
and pandas
to scrape some data from the
“recent earthquakes” report maintained by Bogazici University’s
Kandilli Observatory, and whip it
into shape for use.
res = requests.get("http://www.koeri.boun.edu.tr/scripts/lst9.asp")
tx = html.fromstring(res.content).xpath("//pre/text()")[0]
lines = tx.splitlines()[7:] # get rid of the headers
# take out timestamps and convert them to "hours since first event"
timestamps = [dt.strptime(l[:19], "%Y.%m.%d %H:%M:%S") for l in lines if "ISTANBUL" in l]
t = np.array([(x - ts[-1]).total_seconds() / 3600 for x in timestamps])[::-1]
Estimating Hawkes process parameters¶
Hawkes processes model self-excitement, systems where point (or event) occurrences excite, or increase the probability of occurrence for, future points. Earthquakes fit this description. Indeed, they are known to trigger after-shocks.
Plotting occurrence times on the timeline, one thing we can expect of aftershock sequences is that they appear “clustered” in time. This is intuitive, a “main” earthquake would occur stochastically (as it would occur in a Poisson process), and the others in the cluster would follow closely after. We plot our data below to verify that this appears to be the case. (We add some “jitter” on the y axis to make viewing easier).
plt.figure(figsize=(15,2))
plt.ylim([-5, 5])
plt.yticks([])
_ = plt.plot(t, np.random.rand(len(t)), 'k.')
We now move to fitting a (univariate) Hawkes process, using the
hawkeslib.UnivariateExpHawkesProcess
class. Before we move
on, let’s recap the interpretation of the parameters.
mu
is the background intensity rate, i.e. the intensity rate for the earthquakes that occur exogenously.alpha
is the infectivity factor. It can be interpreted as the number of aftershocks, in expectation, to be triggered by each earthquake.theta
is the rate parameter of the exponential delay density. For example,theta
equaling to 0.5 would mean that on average 2 hours pass between the main earthquake and the aftershock.
Let’s fit the model.
%%time
from hawkeslib import UnivariateExpHawkesProcess as UVHP
uv = UVHP()
uv.fit(t)
print uv.get_params()
(0.04930555306217892, 0.30548369162341404, 5.150339498582191)
CPU times: user 1.48 ms, sys: 0 ns, total: 1.48 ms
Wall time: 1.44 ms
Interpreting the parameters, we conclude that earthquakes occur
exogenously once every ~20 hours in Istanbul (here, we call any
registered seismic activity an “earthquake”). Each main shock results in
0.3 aftershocks on average, and aftershocks occur with
1 / 5.15 = 0.194
delay or 12 minutes on average.
Having fit a model, hawkeslib
allows sampling (unconditional) from
the model, as well as evaluate likelihood (e.g. for out-of-sample cross
validation) for other data sets. Here, let’s take a few samples from the
“earthquake timeline” and use it to approximate the histogram for the
number of tremors during a 24 hour time span in the city.
nr_shocks_sample = [len(uv.sample(24)) for x in range(100000)]
_ = plt.hist(nr_shocks_sample, bins=20)
Bayesian inference¶
Having fit the model, we now move to quantifying uncertainty in the
parameter estimates. In hawkeslib
, we do this by Bayesian inference
in the univariate Hawkes model. Related functionality is implemented in
hawkeslib.BayesianUVExpHawkesProcess
.
Below, we use hawkeslib
to sample from the posterior
distribution of parameters mu
, alpha
, and theta
. We then
present “Bayesian credible intervals” for the parameters.
from hawkeslib import BayesianUVExpHawkesProcess as BUVHP
buv = BUVHP(mu_hyp=(1., 10.), alpha_hyp=(1., 1.), theta_hyp=(1., 10.))
trace = buv.sample_posterior(t, T=t[-1], n_samp=50000)
# compute the BCIs
print pm.stats.quantiles(trace["alpha"], [2.5, 97.5])
print pm.stats.quantiles(trace["theta"], [2.5, 97.5])
{2.5: 0.19881998542628126, 97.5: 0.45850986170997604}
{2.5: 2.9138895786443646, 97.5: 8.605057427283178}
We observe that, under small data the credible intervals around our parameters are relatively wide.
Let us end by noting that a more expressive model, one that takes into account earthquake magnitudes, would be required for more realistic scenarios. Traditionally, this is a marked Hawkes process that’s known as ETAS, the Epidemic-type Aftershock Sequence Model [1].
References
[1] | Ogata, Yosihiko, Ritsuko S. Matsu’ura, and Koichi Katsura. “Fast likelihood computation of epidemic type aftershock‐sequence model.” Geophysical research letters 20.19 (1993): 2143-2146. |