MPI
Work distribution and shared memory.
- class elphmod.MPI.Buffer(buf=None)[source]
Wrapper for pickle in parallel context.
- Parameters:
- elphmod.MPI.distribute(size, bounds=False, comm=<elphmod.MPI.Communicator object>, chunks=None)[source]
Distribute work among processes.
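A hedged usage sketch, assuming that the module-level default communicator is available as elphmod.MPI.comm and that distribute returns the number of tasks per process (plus, with bounds=True, the cumulative task bounds):

    from elphmod import MPI

    comm = MPI.comm

    # Split 100 tasks as evenly as possible among all processes
    # (return values as assumed above):
    sizes, bounds = MPI.distribute(100, bounds=True)

    # Each process handles the tasks between its bounds:
    for task in range(bounds[comm.rank], bounds[comm.rank + 1]):
        pass  # do work on this task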
- elphmod.MPI.elphmodenv(num_threads=1)[source]
Print commands to change number of threads.
Run eval $(elphmodenv) before your Python scripts to prevent NumPy from using multiple processors for linear algebra. This is advantageous when already parallelizing with MPI (mpirun python3 script.py) or on shared servers, where CPU resources must be used with restraint.
Combine eval $(elphmodenv X) with mpirun -n Y python3 script.py, where the product of X and Y is the number of available CPUs, for hybrid parallelization.
- elphmod.MPI.info(message, error=False, comm=<elphmod.MPI.Communicator object>)[source]
Print status message from first process.
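A minimal usage sketch; since only the first process prints, the message appears once even when the script runs under mpirun:

    from elphmod import MPI

    MPI.info('Setting up model')  # printed once, by the first process only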
- elphmod.MPI.load(filename, shared_memory=False, comm=<elphmod.MPI.Communicator object>)[source]
Read and broadcast NumPy data.
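A hedged usage sketch, assuming the file is a NumPy file (e.g., written with numpy.save); the filename 'data.npy' is only a placeholder:

    from elphmod import MPI

    # The first process reads the file; the data is broadcast to all processes:
    data = MPI.load('data.npy')

    # Alternatively, place the array in node-local shared memory:
    data = MPI.load('data.npy', shared_memory=True)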
- elphmod.MPI.shared_array(shape, dtype=<class 'float'>, shared_memory=True, single_memory=False, comm=<elphmod.MPI.Communicator object>)[source]
Create array whose memory is shared among all processes on same node.
With shared_memory=False (single_memory=True) a conventional array is created on each (only one) processor, which nevertheless allows for the same broadcasting syntax as shown below.

    # Set up huge array:
    node, images, array = shared_array(2 ** 30, dtype=np.uint8)

    # Write data on one node:
    if comm.rank == 0:
        array[:] = 0

    # Broadcast data to other nodes:
    if node.rank == 0:
        images.Bcast(array)

    # Wait if node.rank != 0:
    comm.Barrier()
- elphmod.MPI.shm_split(comm=<elphmod.MPI.Communicator object>, shared_memory=True)[source]
Create communicators for use with shared memory.
- Parameters:
- comm : MPI.Intracomm
Overarching communicator.
- shared_memory : bool, default True
Use shared memory? Provided for convenience.
- Returns:
- node : MPI.Intracomm
Communicator between processes that share memory (on the same node).
- images : MPI.Intracomm
Communicator between processes that have the same node.rank.
Warning
If shared memory is not implemented, each process shares memory only with itself. A warning is issued.
Notes
Visualization for a machine with 2 nodes with 4 processors each:
 ________________ ________________ ________________ ________________
| comm.rank:   0 | comm.rank:   1 | comm.rank:   2 | comm.rank:   3 |
| node.rank:   0 | node.rank:   1 | node.rank:   2 | node.rank:   3 |
| images.rank: 0 | images.rank: 0 | images.rank: 0 | images.rank: 0 |
|________________|________________|________________|________________|
 ________________ ________________ ________________ ________________
| comm.rank:   4 | comm.rank:   5 | comm.rank:   6 | comm.rank:   7 |
| node.rank:   0 | node.rank:   1 | node.rank:   2 | node.rank:   3 |
| images.rank: 1 | images.rank: 1 | images.rank: 1 | images.rank: 1 |
|________________|________________|________________|________________|
Since both node.rank and images.rank are sorted by comm.rank, comm.rank == 0 is equivalent to node.rank == images.rank == 0.
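A hedged usage sketch, assuming the module-level default communicator is available as elphmod.MPI.comm; it only splits the communicator and reports the layout shown in the visualization above:

    from elphmod import MPI

    comm = MPI.comm

    # Split the overarching communicator into node-local and cross-node parts:
    node, images = MPI.shm_split(comm)

    # For the layout above, node.size is the number of processes per node and
    # images.size the number of nodes (assuming equally sized nodes):
    MPI.info('Processes per node: %d, number of nodes: %d'
        % (node.size, images.size))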