Validation of Data in cythonarrays ================================== The pxd-file example_cython.pxd imports shortcuts for memoryviews and the base class ArrayShapes:: from cythonarrays.numpy_types cimport * from cythonarrays.array_shapes cimport ArrayShapes Then it defines the cdef-class _Example, that inherits from ArrayShapes. It will have the coordinates `groups`, `origins`, and `destinations`. Paralell loops will use `n_threads`. It defines different memoryviews of 1 and 2 dimensions and dtypes i4 (4-byte integer) und d (double). These memoryviews start with a leading underscore. In the Python-Class these arrays will be exposed later without the leading underscore as numpy arrays. The suffix *g* and *i* in *_persons_gi* help later to use the right variables (self._persons_gi[g, i] = ...) The cdef class will define a method *calc_model* which can be called from python and cython that returns -1 if an exeption occurs. The cdef class defines two methods *_calc_weight_destination* and *_calc_weight_destination* which can be called only from cython. `nogil` means that these methods can be called in parallel threads where the global intepreter lock (gil) has been released:: cdef class _Example(ArrayShapes): """ Cython CDefClass for Example model with coordinates groups, origins, and destinations """ cdef public long32 groups cdef public long32 origins cdef public long32 destinations cdef public char n_threads cdef public ARRAY_1D_i4 _zonenumbers_i # parameter of groups cdef public ARRAY_1D_d _param_g # distance matrix (km from a to b) cdef public ARRAY_2D_d _km_ij # persons per group and zone cdef public ARRAY_2D_d _persons_gi # jobs per zone cdef public ARRAY_1D_d _jobs_j # resulting trip matrix cdef public ARRAY_2D_d _trips_ij # not initialized matrix cdef public ARRAY_2D_i4 _not_initialized_ij cpdef char calc_model(self) except -1 cdef double _calc_weight_destination(self, double param, double minutes, double jobs) nogil cdef ARRAY_1D_d _calc_p_destination(self, long32 g) nogil The cython-module `example_cython.pyx` implements these methods. It uses the `__cinit__`-method to search for all memoryviews defined in its .pxd-file and expose all of them, which start with a leading underscore, as numpy arrays:: cdef class _Example(ArrayShapes): @cython.initializedcheck(False) cpdef char calc_model(self) except -1: """ Calc the daily trips for all groups and zones """ cdef char t cdef long32 g self.reset_array('trips_ij') with nogil, parallel(num_threads=self.n_threads): t = threadid() # loop over groups for g in prange(self.groups, schedule='guided'): with gil: self.logger.info('calculate group {}'.format(g)) # calc destination choice self._calc_p_destination(g) Now the cython and cpdef methods are implemented. The example uses parallel threads to calculate the model for different groups. Within the parallel sessions, pure python-methods like a logging function can be called by temporarily using the gil (global interpreter lock) in a block introduced by `with gil:`. The `ArrayShape`-class defines a logger which is bound to the cdef-class. To reset the memoryview to its default values, the method `self.reset_array('trips_ij')` can be used.:: import numpy as np from cythonarrays.tests.example_python import Example km_ij = np.array([[1, 4, 5], [2, 1, 7], [5, 7, 2]]) jobs = np.array([100, 200, 300]) params_groups = np.array([-0.2, -0.1]) persons_gi = np.array([[100, 0, 200], [0, 250, 50]]) groups, zones = persons_gi.shape example = Example(groups, zones) example.set_array('km_ij', km_ij) example.set_array('jobs_j', jobs) example.set_array('param_g', params_groups) example.set_array('persons_gi', persons_gi) return example tempfile_h5 = tempfile.mktemp(suffix='h5') The following test shows how the code works :: class Test01_ExampleCDefClass: """Test the Example CDefClass""" def test_01_test_init_array(self, persons_gi): """Test the Example CDefClass creation""" groups, zones = persons_gi.shape example = Example(groups, zones) # set correct shape values = np.arange(zones) example.set_array('jobs_j', np.arange((zones))) np.testing.assert_array_equal(example.jobs_j, values) np.testing.assert_array_equal(example._jobs_j, values) # init the array with a default value example.init_array('jobs_j', default=0) np.testing.assert_array_equal(example.jobs_j, np.zeros(zones)) # init the array with the other default value example.init_array('jobs_j', default=2) np.testing.assert_array_equal(example.jobs_j, np.full(zones, 2)) # new default value will be stored example.reset_array('jobs_j') np.testing.assert_array_equal(example.jobs_j, np.full(zones, 2)) # set new shape, keep old default value example.init_array('jobs_j', shape='groups') np.testing.assert_array_equal(example.jobs_j, np.full(groups, 2)) def test_02_test_shape(self, persons_gi): """ Test if the shapes of array are controlled correctly """ groups, zones = persons_gi.shape example = Example(groups, zones) # jobs_j should have the shape (zones) # correct shape arr = np.ones((zones)) example.jobs_j = arr # try to set values with the wrong shape arr = np.ones((groups)) message = """ Arrays are not equal jobs_j: shape soll: [3], ist: (2,) (mismatch 100.0%) x: array([2]) y: array([3]) """ with pytest.raises(AssertionError, message=message): example.jobs_j = arr # change zones and try again example.destinations = groups # now the array fits to the shape defined example.jobs_j = arr def test_03_test_dimensions(self, persons_gi): """Test the dimensions""" groups, zones = persons_gi.shape example = Example(groups, zones) # try to set shape with the wrong number of dimensions msg = "builtins.ValueError: 1 Dimensions required, shape ['groups', 'zones'] has 2 dimensions" with pytest.raises(ValueError, message=msg): example.init_array('jobs_j', shape='groups, zones') # access a non-initialized array target = np.empty((0, 0)) actual = example.not_initialized_ij np.testing.assert_array_equal(actual, target) # set arbitrary values to non-initialized 2D-Array a = np.array([2, 3, 4]) # with less dimensions than required ... example.not_initialized_ij = a b = example.not_initialized_ij # another dimension is added assert b.ndim == 2 assert b.shape == (1, 3) # change value in cython-memoryview example._not_initialized_ij[0, 1] = 99 target = np.array([[2, 99, 4]]) # the value will be changed in the numpy-view of the array np.testing.assert_array_equal(example.not_initialized_ij, target) # as this is a view on the input-data, it will be changed there, too np.testing.assert_array_equal(a[1], 99) # the right number of dimensions should be fine c = np.array([[0, 1], [2, 3]]) example.not_initialized_ij = c d = example.not_initialized_ij np.testing.assert_array_equal(c, d) # too many dimensions will raise an error e = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) with pytest.raises(AssertionError, message="not_initialized_ij: ndim soll: 2, ist: 3"): example.not_initialized_ij = e # when the dimensions can be reduced to the target dimensions, # it should be fine f = np.array([[[0, 1], [2, 3]]]) example.not_initialized_ij = f g = example.not_initialized_ij np.testing.assert_array_equal(g, f[0]) # change value in cython-memoryview example._not_initialized_ij[1, 1] = 88 target = np.array([[0, 1], [2, 88]]) # the value will be changed in the numpy-view of the array np.testing.assert_array_equal(example.not_initialized_ij, target) # as this is a view on the input-data, it will be changed there, too np.testing.assert_array_equal(f[0, 1, 1], 88) def test_04_test_dtype(self, persons_gi): """Test the dimensions""" groups, zones = persons_gi.shape example = Example(groups, zones) # dtype of jobs_j: f8 # dtype of not_initialized_ij: i4 # dtype of n_threads: char # test if n_threads is really a signed char example.n_threads = 127 assert example.n_threads == 127 with pytest.raises(OverflowError, message='value too large to convert to char'): example.n_threads = 128 # test if jobs_j is really an array of double precision arr_i4 = np.array([2, 3, 4], dtype='i4') example.jobs_j = arr_i4 assert example.jobs_j.dtype == np.dtype('f8') # because the array set to jobs_j is of different dtype, # it had to be converted to double. So no view is used, # but a copy of the data arr_i4[1] = -1 # the data in jobs_j stays untouched assert example.jobs_j[1] == 3 # test if not_initialized_ij is really an array of 32bit integer arr_i4 = np.array([[2, 3, 4], [5, 6, 7]], dtype='i4') example.not_initialized_ij = arr_i4 assert example.not_initialized_ij.dtype == np.dtype('i4') # because the array set to not_initialized_ij is of the same dtype, # a view on the data could have been used arr_i4[1, 2] = -1 # the data in not_initialized_ij changes assert example.not_initialized_ij[1, 2] == -1 def test_10_test_model(self, example): """Test the Example CDefClass model""" res = example.calc_model() print(example.trips_ij) total_trips_target = example.persons_gi.sum() total_trips_actual = example.trips_ij.sum() np.testing.assert_almost_equal(total_trips_target, total_trips_actual) def test_11_dataset(self, example): """Test the creation of an xarrays Dataset linked to the Cythonclass""" example.zonenumbers_i = np.array([100, 200, 300]) example.groupnames_g = np.array(['Female', 'Male'], dtype='O') example.create_ds() print(example.ds) def test_20_save_and_read_ds(self, example, tempfile_h5): """Test that saving and re-reading the data works""" example.zonenumbers_i = np.array([100, 200, 300]) example.groupnames_g = np.array(['Female', 'Male'], dtype='O') example.create_ds() example.save_dataset_to_netcdf(tempfile_h5) # read the data into new class new_example = Example.from_netcdf(tempfile_h5) # assert that the data match np.testing.assert_array_equal(example.jobs_j, new_example.jobs_j) np.testing.assert_array_equal(example.groupnames_g, new_example.groupnames_g) # assert, that the array are correctly linked to the Dataset new_example.calc_model() old_value = new_example.ds.trips_ij.sum() # calculate the model with 77 persons more new_example.persons_gi[1, 0] += 77 new_example.calc_model() new_value = new_example.ds.trips_ij.sum() np.testing.assert_allclose(old_value + 77, new_value) print(new_example.ds) if __name__ == '__main__': pytest.main() How to work with Cythonarrays CDef-Classes ========================================== link numpy and openmp sources ----------------------------- In order to use Cython with numpy, the numpy sources have to be linked to the Cython code. One way is to use for each Cython module 4 files: * mymodule_cython.pyx : The actual Cython code * mymodule_cython.pxd : The header files with the definition of all Classes and its attributes and methods * mymodule_cython.pyxbld : the compiler directives * mymodule.py : a python module, that imports the cython code and wraps it to normal python modules the .pyxbld file will import helpers from cythoninstallhelpers:: from cythoninstallhelpers.build_config import (extra_compile_args, extra_link_args, make_ext,) This tells the compiler to use the platform specific openmp-libraries and to include the numpy dirs. Cython cdef classes ------------------- Cython cdef classes are extension types for cython. All methods and class attributes, that should be used with C-Speed have to by defined. This can be done in the declaration .pxd file:: cdef class MyCythonClass(object): """A Cython class""" cdef unsigned long i, j cdef public float U cdef readonly double result cpdef double calc_result(self, float U) cdef calc_p(self, unsigned long i, unsigned long j) nogil Class attributes defined as **public** are also readable and writable from Python. Class attributes defined as **readonly** are readable from Python but not writable. Class attributes defined without a **public** or **writable** flag are only accessible from within a cdef-Cython function. Methods defined as cdef are only accessible from other class methods. Methods defined as cpdef are also usable as python classes. If the method is defined as **nogil**, then they can be used within a parallel calculation at C-speed. This method does not need the "Global Interpreter Lock" (GIL). So a method can only be **nogil**, if no python attributes or methods are used within the function and only other **nogil** functions are called. A cpdef method cannot be **nogil** at the same time. A **nogil** function cannot return nothing, so specify at least a return type like char. memoryviews and numpy arrays ---------------------------- A comfortable way to work with numpy arrays in cdef classes is the following: * the cdef class holds an memoryview on the data as an attribute the memoryview can be accessed in C-Speed from within cython functions * from python the data is accessible as a normal numpy array To facilitate this, the cython-module array_shapes.pyx and the python-module array_types have been written. Let your Cython class inherit from ArrayShapes:: from cythonarrays.array_shapes cimport ArrayShapes from cythonarrays.array_shapes import ArrayShapes import the numpy array types:: from cythonarrays.numpy_types cimport * specify all arrays that you need in the mymodule_cython.pxd-file with a leading underscore:: cdef class _Example(ArrayShapes): cdef public ARRAY_2D_f _myfloatarr cdef public ARRAY_3D_i4 _myintarr cdef public ARRAY_1D_d _mydoublevector cdef public int n_rows, n_blocks, n_cols The method `search_memview(cls)`, that is called from the `__cinit__(self)`-method of ArrayShapes, searches all memoryviews in the class and the base class. Create a wrapper Python class in a python module mymodule.py, that inherits from _MyCythonClass and from the Python-Class _ArrayProperties:: import pyximport pyximport.install() from mymodule_cython import _MyCythonClass from cythonarrays.array_properties import _ArrayProperties class MyClass(_MyCythonClass, _ArrayProperties): def __init__(self, n_rows, n_cols, n_blocks, *args, **kwargs): super().__init__(*args, **kwargs) self.n_rows = n_rows self.n_cols = n_cols self.n_blocks = n_blocks This creates automatically properties for a comfortable access to the array:: >>> myinstance = MyClass(n_rows=4, n_cols=5, n_blocks=6) The array can be initialised by:: >>> shape = ('n_blocks', 'n_rows', 'n_cols') >>> myinstance.init_array('myintarr', shape, default=-1) >>> shape = (6, ) >>> myinstance.init_array('mydoublevector', shape) or with some data:: >>> arr = np.random.random((4, 5)).astype('f8') >>> shape = ('n_rows', 'n_cols') >>> myinstance.set_array('myfloatarr', arr, shape) In this case, the dtype is automatically casted to the target type of the class attribute (in this case: f4). And the shape is checked. If the shape does not match, an error is raised. The Data is accessible form Python via:: >>> intarr = myinstance.myintarr >>> intarr.dtype int32 >>> intarr.shape (6, 4, 5) >>> intarr[2, 2:4, 1] array([-1, -1]) >>> intarr[0] *= 2 and from within a cython function:: cdef class _MyCythonClass(_ArrayShapes): cpdef sum_mult_by_block(self): cdef int block, row, col cdef double res for block in range(self.n_blocks): res = 0 for row in range(self.n_rows): for col in range(self.n_cols): res += self._myintarr[block, row, col] * self._myfloatarr[row, col] self._mydoublevector[block] = res >>> myinstance.sum_mult_by_block() >>> myinstance._mydoublevector array([-40., -20., -20., -20., -20., -20.]) You can define an Array within a cdef function:: cdef class _MyCythonClass(_ArrayShapes): cpdef sum_mult_by_block(self): cdef int block, row, col cdef ARRAY_1D_d vec = self._mydoublevector cdef double res for block in range(self.n_blocks): res = 0 for row in range(self.n_rows): for col in range(self.n_cols): res += self._myintarr[block, row, col] * self._myfloatarr[row, col] vec[block] = res but don't do that in a subfunction, that is called many times, because assigning memory to the variable *vec* a costly operation. Link Cythonarrays-Class to xarray-Dataset ========================================= You can create an `xarray-Dataset `_ which infers the dimensions, coordinates, and data variables from the cdef-class. >>> example = Example() >>> example.create_ds() >>> print(example.ds) Dimensions: (destinations: 3, dim_0: 0, dim_1: 0, groups: 2, origins: 3) Coordinates: * destinations (destinations) int32 100 200 300 * groups (groups) object 'Female' 'Male' * origins (origins) int32 100 200 300 Dimensions without coordinates: dim_0, dim_1 Data variables: param_g (groups) float64 -0.2 -0.1 trips_ij (origins, destinations) float64 29.02 31.86 39.12 ... groupnames_g (groups) object 'Female' 'Male' not_initialized_ij (dim_0, dim_1) int32 persons_gi (groups, origins) float64 100.0 0.0 200.0 0.0 250.0 50.0 zonenumbers_i (origins) int32 100 200 300 jobs_j (destinations) float64 100.0 200.0 300.0 km_ij (origins, destinations) float64 1.0 4.0 5.0 2.0 1.0 ... xarray-Dataset is linked to cython class ---------------------------------------- The Data variables of the xarray-Dataset share the same memory with the attributes of the cdef-Cython-Class. So when a cdef function modifies a value in example._trips_ij >>> self._trips_ij[1, 2] = 99 then the value is changed directly in the xarray-Dataset >>> print(self.ds.trips_ij.values[1, 2]) 99.0