Validation of Data in cythonarrays¶
The pxd-file example_cython.pxd imports shortcuts for memoryviews and the base class ArrayShapes:
from cythonarrays.numpy_types cimport *
from cythonarrays.array_shapes cimport ArrayShapes
Then it defines the cdef-class _Example, that inherits from ArrayShapes.
It will have the coordinates groups, origins, and destinations.
Paralell loops will use n_threads.
It defines different memoryviews of 1 and 2 dimensions and dtypes i4 (4-byte integer) und d (double).
These memoryviews start with a leading underscore.
In the Python-Class these arrays will be exposed later without the leading underscore as numpy arrays.
The suffix g and i in _persons_gi help later to use the right variables (self._persons_gi[g, i] = …)
The cdef class will define a method calc_model which can be called from python and cython that returns -1 if an exeption occurs.
The cdef class defines two methods _calc_weight_destination and _calc_weight_destination which can be called only from cython.
nogil means that these methods can be called in parallel threads where the global intepreter lock (gil) has been released:
cdef class _Example(ArrayShapes):
"""
Cython CDefClass for Example model
with coordinates groups, origins, and destinations
"""
cdef public long32 groups
cdef public long32 origins
cdef public long32 destinations
cdef public char n_threads
cdef public ARRAY_1D_i4 _zonenumbers_i
# parameter of groups
cdef public ARRAY_1D_d _param_g
# distance matrix (km from a to b)
cdef public ARRAY_2D_d _km_ij
# persons per group and zone
cdef public ARRAY_2D_d _persons_gi
# jobs per zone
cdef public ARRAY_1D_d _jobs_j
# resulting trip matrix
cdef public ARRAY_2D_d _trips_ij
# not initialized matrix
cdef public ARRAY_2D_i4 _not_initialized_ij
cpdef char calc_model(self) except -1
cdef double _calc_weight_destination(self, double param,
double minutes, double jobs) nogil
cdef ARRAY_1D_d _calc_p_destination(self, long32 g) nogil
The cython-module example_cython.pyx implements these methods.
It uses the __cinit__-method to search for all memoryviews defined in its .pxd-file and expose all of them, which start with a leading underscore, as numpy arrays:
cdef class _Example(ArrayShapes):
@cython.initializedcheck(False)
cpdef char calc_model(self) except -1:
"""
Calc the daily trips for all groups and zones
"""
cdef char t
cdef long32 g
self.reset_array('trips_ij')
with nogil, parallel(num_threads=self.n_threads):
t = threadid()
# loop over groups
for g in prange(self.groups, schedule='guided'):
with gil:
self.logger.info('calculate group {}'.format(g))
# calc destination choice
self._calc_p_destination(g)
Now the cython and cpdef methods are implemented.
The example uses parallel threads to calculate the model for different groups.
Within the parallel sessions, pure python-methods like a logging function can be called by temporarily using the gil (global interpreter lock) in a block introduced by with gil:.
The ArrayShape-class defines a logger which is bound to the cdef-class.
To reset the memoryview to its default values, the method self.reset_array(‘trips_ij’) can be used.:
import numpy as np
from cythonarrays.tests.example_python import Example
km_ij = np.array([[1, 4, 5],
[2, 1, 7],
[5, 7, 2]])
jobs = np.array([100, 200, 300])
params_groups = np.array([-0.2, -0.1])
persons_gi = np.array([[100, 0, 200],
[0, 250, 50]])
groups, zones = persons_gi.shape
example = Example(groups, zones)
example.set_array('km_ij', km_ij)
example.set_array('jobs_j', jobs)
example.set_array('param_g', params_groups)
example.set_array('persons_gi', persons_gi)
return example
tempfile_h5 = tempfile.mktemp(suffix='h5')
The following test shows how the code works
class Test01_ExampleCDefClass:
"""Test the Example CDefClass"""
def test_01_test_init_array(self, persons_gi):
"""Test the Example CDefClass creation"""
groups, zones = persons_gi.shape
example = Example(groups, zones)
# set correct shape
values = np.arange(zones)
example.set_array('jobs_j', np.arange((zones)))
np.testing.assert_array_equal(example.jobs_j, values)
np.testing.assert_array_equal(example._jobs_j, values)
# init the array with a default value
example.init_array('jobs_j', default=0)
np.testing.assert_array_equal(example.jobs_j, np.zeros(zones))
# init the array with the other default value
example.init_array('jobs_j', default=2)
np.testing.assert_array_equal(example.jobs_j, np.full(zones, 2))
# new default value will be stored
example.reset_array('jobs_j')
np.testing.assert_array_equal(example.jobs_j, np.full(zones, 2))
# set new shape, keep old default value
example.init_array('jobs_j', shape='groups')
np.testing.assert_array_equal(example.jobs_j, np.full(groups, 2))
def test_02_test_shape(self, persons_gi):
"""
Test if the shapes of array are controlled correctly
"""
groups, zones = persons_gi.shape
example = Example(groups, zones)
# jobs_j should have the shape (zones)
# correct shape
arr = np.ones((zones))
example.jobs_j = arr
# try to set values with the wrong shape
arr = np.ones((groups))
message = """
Arrays are not equal
jobs_j: shape soll: [3], ist: (2,)
(mismatch 100.0%)
x: array([2])
y: array([3])
"""
with pytest.raises(AssertionError,
message=message):
example.jobs_j = arr
# change zones and try again
example.destinations = groups
# now the array fits to the shape defined
example.jobs_j = arr
def test_03_test_dimensions(self, persons_gi):
"""Test the dimensions"""
groups, zones = persons_gi.shape
example = Example(groups, zones)
# try to set shape with the wrong number of dimensions
msg = "builtins.ValueError: 1 Dimensions required, shape ['groups', 'zones'] has 2 dimensions"
with pytest.raises(ValueError,
message=msg):
example.init_array('jobs_j', shape='groups, zones')
# access a non-initialized array
target = np.empty((0, 0))
actual = example.not_initialized_ij
np.testing.assert_array_equal(actual, target)
# set arbitrary values to non-initialized 2D-Array
a = np.array([2, 3, 4])
# with less dimensions than required ...
example.not_initialized_ij = a
b = example.not_initialized_ij
# another dimension is added
assert b.ndim == 2
assert b.shape == (1, 3)
# change value in cython-memoryview
example._not_initialized_ij[0, 1] = 99
target = np.array([[2, 99, 4]])
# the value will be changed in the numpy-view of the array
np.testing.assert_array_equal(example.not_initialized_ij, target)
# as this is a view on the input-data, it will be changed there, too
np.testing.assert_array_equal(a[1], 99)
# the right number of dimensions should be fine
c = np.array([[0, 1],
[2, 3]])
example.not_initialized_ij = c
d = example.not_initialized_ij
np.testing.assert_array_equal(c, d)
# too many dimensions will raise an error
e = np.array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
with pytest.raises(AssertionError,
message="not_initialized_ij: ndim soll: 2, ist: 3"):
example.not_initialized_ij = e
# when the dimensions can be reduced to the target dimensions,
# it should be fine
f = np.array([[[0, 1],
[2, 3]]])
example.not_initialized_ij = f
g = example.not_initialized_ij
np.testing.assert_array_equal(g, f[0])
# change value in cython-memoryview
example._not_initialized_ij[1, 1] = 88
target = np.array([[0, 1],
[2, 88]])
# the value will be changed in the numpy-view of the array
np.testing.assert_array_equal(example.not_initialized_ij, target)
# as this is a view on the input-data, it will be changed there, too
np.testing.assert_array_equal(f[0, 1, 1], 88)
def test_04_test_dtype(self, persons_gi):
"""Test the dimensions"""
groups, zones = persons_gi.shape
example = Example(groups, zones)
# dtype of jobs_j: f8
# dtype of not_initialized_ij: i4
# dtype of n_threads: char
# test if n_threads is really a signed char
example.n_threads = 127
assert example.n_threads == 127
with pytest.raises(OverflowError,
message='value too large to convert to char'):
example.n_threads = 128
# test if jobs_j is really an array of double precision
arr_i4 = np.array([2, 3, 4], dtype='i4')
example.jobs_j = arr_i4
assert example.jobs_j.dtype == np.dtype('f8')
# because the array set to jobs_j is of different dtype,
# it had to be converted to double. So no view is used,
# but a copy of the data
arr_i4[1] = -1
# the data in jobs_j stays untouched
assert example.jobs_j[1] == 3
# test if not_initialized_ij is really an array of 32bit integer
arr_i4 = np.array([[2, 3, 4], [5, 6, 7]], dtype='i4')
example.not_initialized_ij = arr_i4
assert example.not_initialized_ij.dtype == np.dtype('i4')
# because the array set to not_initialized_ij is of the same dtype,
# a view on the data could have been used
arr_i4[1, 2] = -1
# the data in not_initialized_ij changes
assert example.not_initialized_ij[1, 2] == -1
def test_10_test_model(self, example):
"""Test the Example CDefClass model"""
res = example.calc_model()
print(example.trips_ij)
total_trips_target = example.persons_gi.sum()
total_trips_actual = example.trips_ij.sum()
np.testing.assert_almost_equal(total_trips_target, total_trips_actual)
def test_11_dataset(self, example):
"""Test the creation of an xarrays Dataset linked to the Cythonclass"""
example.zonenumbers_i = np.array([100, 200, 300])
example.groupnames_g = np.array(['Female', 'Male'], dtype='O')
example.create_ds()
print(example.ds)
def test_20_save_and_read_ds(self, example, tempfile_h5):
"""Test that saving and re-reading the data works"""
example.zonenumbers_i = np.array([100, 200, 300])
example.groupnames_g = np.array(['Female', 'Male'], dtype='O')
example.create_ds()
example.save_dataset_to_netcdf(tempfile_h5)
# read the data into new class
new_example = Example.from_netcdf(tempfile_h5)
# assert that the data match
np.testing.assert_array_equal(example.jobs_j, new_example.jobs_j)
np.testing.assert_array_equal(example.groupnames_g,
new_example.groupnames_g)
# assert, that the array are correctly linked to the Dataset
new_example.calc_model()
old_value = new_example.ds.trips_ij.sum()
# calculate the model with 77 persons more
new_example.persons_gi[1, 0] += 77
new_example.calc_model()
new_value = new_example.ds.trips_ij.sum()
np.testing.assert_allclose(old_value + 77, new_value)
print(new_example.ds)
if __name__ == '__main__':
pytest.main()
How to work with Cythonarrays CDef-Classes¶
link numpy and openmp sources¶
In order to use Cython with numpy, the numpy sources have to be linked to the Cython code.
One way is to use for each Cython module 4 files:
mymodule_cython.pyx : The actual Cython code
mymodule_cython.pxd : The header files with the definition of all Classes and its attributes and methods
mymodule_cython.pyxbld : the compiler directives
mymodule.py : a python module, that imports the cython code and wraps it to normal python modules
the .pyxbld file will import helpers from cythoninstallhelpers:
from cythoninstallhelpers.build_config import (extra_compile_args,
extra_link_args,
make_ext,)
This tells the compiler to use the platform specific openmp-libraries and to include the numpy dirs.
Cython cdef classes¶
Cython cdef classes are extension types for cython.
All methods and class attributes, that should be used with C-Speed have to by defined. This can be done in the declaration .pxd file:
cdef class MyCythonClass(object):
"""A Cython class"""
cdef unsigned long i, j
cdef public float U
cdef readonly double result
cpdef double calc_result(self, float U)
cdef calc_p(self, unsigned long i, unsigned long j) nogil
Class attributes defined as public are also readable and writable from Python.
Class attributes defined as readonly are readable from Python but not writable.
Class attributes defined without a public or writable flag are only accessible from within a cdef-Cython function.
Methods defined as cdef are only accessible from other class methods.
Methods defined as cpdef are also usable as python classes.
If the method is defined as nogil, then they can be used within a parallel calculation at C-speed.
This method does not need the “Global Interpreter Lock” (GIL). So a method can only be nogil, if no python attributes or methods are used within the function and only other nogil functions are called.
A cpdef method cannot be nogil at the same time.
A nogil function cannot return nothing, so specify at least a return type like char.
memoryviews and numpy arrays¶
A comfortable way to work with numpy arrays in cdef classes is the following:
the cdef class holds an memoryview on the data as an attribute the memoryview can be accessed in C-Speed from within cython functions
from python the data is accessible as a normal numpy array
To facilitate this, the cython-module array_shapes.pyx and the python-module array_types have been written.
Let your Cython class inherit from ArrayShapes:
from cythonarrays.array_shapes cimport ArrayShapes
from cythonarrays.array_shapes import ArrayShapes
import the numpy array types:
from cythonarrays.numpy_types cimport *
specify all arrays that you need in the mymodule_cython.pxd-file with a leading underscore:
cdef class _Example(ArrayShapes):
cdef public ARRAY_2D_f _myfloatarr
cdef public ARRAY_3D_i4 _myintarr
cdef public ARRAY_1D_d _mydoublevector
cdef public int n_rows, n_blocks, n_cols
The method search_memview(cls), that is called from the __cinit__(self)-method of ArrayShapes, searches all memoryviews in the class and the base class.
Create a wrapper Python class in a python module mymodule.py, that inherits from _MyCythonClass and from the Python-Class _ArrayProperties:
import pyximport
pyximport.install()
from mymodule_cython import _MyCythonClass
from cythonarrays.array_properties import _ArrayProperties
class MyClass(_MyCythonClass, _ArrayProperties):
def __init__(self, n_rows, n_cols, n_blocks, *args, **kwargs):
super().__init__(*args, **kwargs)
self.n_rows = n_rows
self.n_cols = n_cols
self.n_blocks = n_blocks
This creates automatically properties for a comfortable access to the array:
>>> myinstance = MyClass(n_rows=4, n_cols=5, n_blocks=6)
The array can be initialised by:
>>> shape = ('n_blocks', 'n_rows', 'n_cols')
>>> myinstance.init_array('myintarr', shape, default=-1)
>>> shape = (6, )
>>> myinstance.init_array('mydoublevector', shape)
or with some data:
>>> arr = np.random.random((4, 5)).astype('f8')
>>> shape = ('n_rows', 'n_cols')
>>> myinstance.set_array('myfloatarr', arr, shape)
In this case, the dtype is automatically casted to the target type of the class attribute (in this case: f4). And the shape is checked. If the shape does not match, an error is raised. The Data is accessible form Python via:
>>> intarr = myinstance.myintarr
>>> intarr.dtype
int32
>>> intarr.shape
(6, 4, 5)
>>> intarr[2, 2:4, 1]
array([-1, -1])
>>> intarr[0] *= 2
and from within a cython function:
cdef class _MyCythonClass(_ArrayShapes):
cpdef sum_mult_by_block(self):
cdef int block, row, col
cdef double res
for block in range(self.n_blocks):
res = 0
for row in range(self.n_rows):
for col in range(self.n_cols):
res += self._myintarr[block, row, col] * self._myfloatarr[row, col]
self._mydoublevector[block] = res
>>> myinstance.sum_mult_by_block()
>>> myinstance._mydoublevector
array([-40., -20., -20., -20., -20., -20.])
You can define an Array within a cdef function:
cdef class _MyCythonClass(_ArrayShapes):
cpdef sum_mult_by_block(self):
cdef int block, row, col
cdef ARRAY_1D_d vec = self._mydoublevector
cdef double res
for block in range(self.n_blocks):
res = 0
for row in range(self.n_rows):
for col in range(self.n_cols):
res += self._myintarr[block, row, col] * self._myfloatarr[row, col]
vec[block] = res
but don’t do that in a subfunction, that is called many times, because assigning memory to the variable vec a costly operation.
Link Cythonarrays-Class to xarray-Dataset¶
You can create an xarray-Dataset which infers the dimensions, coordinates, and data variables from the cdef-class.
>>> example = Example()
>>> example.create_ds()
>>> print(example.ds)
<xarray.Dataset>
Dimensions: (destinations: 3, dim_0: 0, dim_1: 0, groups: 2, origins: 3)
Coordinates:
* destinations (destinations) int32 100 200 300
* groups (groups) object 'Female' 'Male'
* origins (origins) int32 100 200 300
Dimensions without coordinates: dim_0, dim_1
Data variables:
param_g (groups) float64 -0.2 -0.1
trips_ij (origins, destinations) float64 29.02 31.86 39.12 ...
groupnames_g (groups) object 'Female' 'Male'
not_initialized_ij (dim_0, dim_1) int32
persons_gi (groups, origins) float64 100.0 0.0 200.0 0.0 250.0 50.0
zonenumbers_i (origins) int32 100 200 300
jobs_j (destinations) float64 100.0 200.0 300.0
km_ij (origins, destinations) float64 1.0 4.0 5.0 2.0 1.0 ...
xarray-Dataset is linked to cython class¶
The Data variables of the xarray-Dataset share the same memory with the attributes of the cdef-Cython-Class. So when a cdef function modifies a value in example._trips_ij
>>> self._trips_ij[1, 2] = 99
then the value is changed directly in the xarray-Dataset
>>> print(self.ds.trips_ij.values[1, 2])
99.0