Pairwise distance with PBC

Pair distances with PBC
Nested For Loop
pdist from Scipy
Allen & Tildesley
Compare

If we want to simulate a system, either with molecular dynamics or Monte-Carlo sampling, we often use the periodic boundary condition (PBC) to fix the volume fraction of the system. There are many ways to implement the PBC especially in Python, and this note introduced 3 different implementations.

Pair distances with PBC

When we simulate a soft matter system, the particles often interact with different forms of pairwise potentials like the Lennard-Jones potential. One crucial step for the calculation of the potential energy, hence the forces on each particles, is determing the pairwise distances.

With PBC, we have the following situation

┌───────────────────────┳━━━━━━━━━━━━━━━━━━━━━━━┳───────────────────────┐
│   i'              j'  ┃   i               j   ┃   i"              j"  │
│   ○               ○   ┃   ●               ●   ┃   ○               ○   │
└───────────────────┼───┻━━━╋━━━━━━━━━━━━━━━╋━━━┻───────────────────────┘
                    └ d_ij'─┴──── d_ij ─────┘   


    ( box size: L = d_ij' + d_ij )

where the distance of i and j are d_ij.

The particles labeled with i', i", j', j" are the equivalences of i and j in the neighbour boxes, as the result of the PBC. Here, since the value of d_ij' is smaller than that of d_ij, we should select d_ij' as the distance instead of d_ij, for the calculation of short-ranged interaction like LJ. ¹

Nested For Loop

The straightforward way is to perform the literal calculation inside a for loop. Me and Fergus exhibited this method in our pedagogical code. Here is a “minimal” code for the relevant part.

import numpy as np

N, dim = 1000, 3  # 100 particles in 3D
box = 1  # size of box
positions = np.random.random((N, dim))

pair_distances = np.empty((N, N))
for i, p1 in enumerate(positions):
    for j, p2 in enumerate(positions):
        dist_nd_sq = 0
        for d in range(dim):
            dist_1d = abs(p2[d] - p1[d])
            if dist_1d > (box / 2):  # check if d_ij > box_size / 2
                dist_1d = box - dist_1d
            dist_nd_sq += dist_1d ** 2
        pair_distances[i, j] = dist_nd_sq
pair_distances = np.sqrt(pair_distances)

However, this is quite slow because we are using Python, which is infamously slow for nested for loops. The above code takes about 5000 ms to execute on my laptop.

`pdist` from Scipy

Inspired by Francesco’s post, we can use the very fast function pdist from package scipy to calculate the pair distances. The “minimal” code is presented here.

import numpy as np
from scipy.spatial.distance import pdist

N, dim = 1000, 3  # 100 particles in 3D
box = 1
positions = np.random.random((N, dim))

dist_nd_sq = np.zeros(N * (N - 1) // 2)  # to match the result of pdist
for d in range(dim):
    pos_1d = positions[:, d][:, np.newaxis]  # shape (N, 1)
    dist_1d = pdist(pos_1d)  # shape (N * (N - 1) // 2, )
    dist_1d[dist_1d > box * 0.5] -= box
    dist_nd_sq += dist_1d ** 2  # d^2 = dx^2 + dy^2 + dz^2
dist_nd = np.sqrt(dist_nd_sq)

It is significantly faster than the straightforward way, and it takes about 800 ms on my laptop.

Allen & Tildesley

There is also another good implementation from the example python code from Allen & Tildesley’s book. The “minimal” code is presented here, where I

import numpy as np

N, dim = 1000, 3  # 1000 particles in 3D
box = 1
positions = np.random.random((N, dim))

pair_shift = np.empty((dim, N, N))
pos_in_box = positions / box  # convert the unit to box

for d in range(dim):
    pos_1d = pos_in_box[:, d][:, np.newaxis]  # shape (N, 1)
    pair_shift_1d = pos_1d - pos_1d.T  # shape (N, N)
    pair_shift_1d = pair_shift_1d - np.rint(pair_shift_1d)
    pair_shift[d] = pair_shift_1d

dist_nd = np.linalg.norm(pair_shift, axis=0) * box  # conver the unit back

It is less efficient comparing with the previous solution, taking about 900 ms on my computer. However, we have access to the pairwise shift during the calculation, which is a plus for implementing strange agent-based models.

We could further transform the code into a more compact form thanks to the broadcasting rules of numpy,

import numpy as np

N, dim = 1000, 3  # 1000 particles in 3D
box = 1
positions = np.random.random((N, dim))

pos_in_box = positions.T / box  # convert the unit to box, shape (dim, N)
pair_shift = pos_in_box[:, :, np.newaxis] - pos_in_box[:, np.newaxis, :] # broadcasting
pair_shift = pair_shift - np.rint(pair_shift)
dist_nd = np.linalg.norm(pair_shift, axis=0) * box  # conver the unit back

(What a beautiful, elegant and annoying way to write Python code!)

Compare

I compared the three different methods with the code on my gist, and they give identical results. The efficiency varies a lot. And the second method (using pdist in Scipy) is the fastest version.

8901 s spent by method 1
0230 s spent by method 2
0747 s spent by method 3
Getting identical results? True

Actually, in a straightforward calculation, we should take all the distances from all the image boxes created by the PBC. This would lead to infinite amount of distances, which can nevertheless be handled with methods such as the Ewald sum.

However the calculation is much easier for short ranged interactions such as LJ interaction. This is because we only consider particles whose distance is smaller than a cutoff value, r_cut, and r_cut is typically much smaller than half of the box length, L / 2.

Going back to the diagram, this means d_ij itself will definitely be ignored because it is larger that L / 2. However, the existence of PBC forces us to especially take care of the situation where d_ij > L / 2 ↩

Pair distances with PBC

Nested For Loop

pdist from Scipy

Allen & Tildesley

Compare

`pdist` from Scipy