Using the Training Pipeline¶

This page gives further details about the steps described in the basic recipe and describes all configurable options of the training pipeline.

How it works
Data Format
Configuration of Parameters
Running elektronn-train
CNN Command Line

How it works ¶

After you set up the configuration and run elektronn-train:

The configuration file is parsed and some consistency checks are made (but not every possible combination of options is checked). From the architecture parameters an elektronn.net.netutils.CNNCalculator() object is created (this checks the architecture and gives a list of valid input sizes, from which the closest is chosen automatically).

A save directory is created, the cwd is set to this directory, and the configuration file is copied to a Backup sub-directory in the save directory.

An elektronn.training.CNNData.CNNData object is created. All CNN relevant architecture parameters (input size, offsets, strides etc.) are passed over, such that suitable patches for training can be created. The data is read from disk into (CPU-)RAM. The required RAM is approx. (number of training image pixels) * 32 bit + (number of training label pixels) * 16 bit. The sub-processes used for background batch creation do not copy the data, so using them does not increase the physical RAM usage much.
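The RAM estimate above is simple arithmetic. The following is a minimal sketch of it; the function name and the example data set are hypothetical, and the 16-bit label size applies to classification labels.

```python
def estimate_ram_bytes(n_image_pixels, n_label_pixels):
    """Approximate RAM needed to hold the training data, in bytes."""
    bytes_per_image_pixel = 4  # images are held as 32-bit floats
    bytes_per_label_pixel = 2  # classification labels are held as 16-bit integers
    return (n_image_pixels * bytes_per_image_pixel
            + n_label_pixels * bytes_per_label_pixel)


# Hypothetical data set: four cubes of 512 x 512 x 256 voxels,
# each with a label cube of the same size.
n_voxels = 4 * 512 * 512 * 256
print("%.2f GiB" % (estimate_ram_bytes(n_voxels, n_voxels) / 2.0 ** 30))  # 1.50 GiB
```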

The CNN is created and the training functions are compiled. Compilation of the gradient can take up to several minutes the first time. For subsequent runs, parts of the binaries are cached and compilation becomes significantly faster.

The training loop starts. In each iteration:

From the whole training data set a batch is created by sampling random locations in the image arrays and “cutting” patches from these locations. The patches are augmented randomly according to the configuration.

The CNN makes one update step on the batch (or several steps for CG and l-BFGS)

Every history_freq steps: the current performance is estimated on a larger number of monitor patches. Some numbers are printed, plots are made, and a log file is dumped into Backup.

A backup of the current parameters is overwritten every history_freq steps

Every hour a persistent snapshot of the CNN parameters/weights is saved

Every 3 hours the training is paused and a sample image is predicted as a preview (firstly after 1 hour)

During this iteration you can use the CNN console accessible via ctrl+c.

After the maximal number of steps or the maximal runtime is reached, everything is shut down and the latest parameters are written to disk.
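The batch creation step of the training loop ("cutting" patches at random locations) can be pictured with the following NumPy sketch. It is not the pipeline's implementation: the function name is hypothetical, only a single image/label pair is used, labels are assumed to have the same shape as the image, and offsets and augmentation are left out.

```python
import numpy as np


def sample_batch(image, labels, patch_shape, batch_size, rng):
    """Cut `batch_size` patches at random locations from one image/label pair."""
    img_patches, lab_patches = [], []
    for _ in range(batch_size):
        # Random corner chosen so that the patch fits inside the array.
        corner = [rng.randint(0, s - p + 1)
                  for s, p in zip(image.shape, patch_shape)]
        window = tuple(slice(c, c + p) for c, p in zip(corner, patch_shape))
        img_patches.append(image[window])
        lab_patches.append(labels[window])
    return np.stack(img_patches), np.stack(lab_patches)


rng = np.random.RandomState(0)
image = rng.rand(64, 64).astype(np.float32)
labels = (image > 0.5).astype(np.int16)
x, y = sample_batch(image, labels, (16, 16), batch_size=4, rng=rng)
print(x.shape, y.shape)  # (4, 16, 16) (4, 16, 16)
```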

Data Format ¶

Transform your data arrays to h5 data sets in separate files for images and labels.

images: shape (x,y,z) or (channel,x,y,z), either float (0,1) or uint8 (0,255)

labels: shape (x,y,z)

for classification: labels contain integer numbers encoding the class membership, starting from 0, consecutively to (#classes-1)

for regression: labels contain float numbers

for 2D images the dimension z can be viewed as the axis along which the instances of the training set are stacked

for whole image classification the labels must be 1d

The z-axis is the axis which has different units than x and y (e.g. different spatial resolution, or time).

Note

For classification tasks you must think of a suitable way to present the training images to the CNN. E.g. if you want to do neurite segmentation in EM images and you have ground truth images in which each pixel carries the ID of the neurite it belongs to, using this ground truth directly as labels will not work: the IDs are arbitrary (they could also be permuted) and there may be thousands of distinct IDs, so a CNN cannot learn this representation. The task must be formulated in such a way that there are just a few general classes, in this case membrane boundaries and background (or other objects such as mitochondria). The conversion of ground truth is a step that you must carry out yourself, depending on the particular problem you want to solve.

Conversion of ID-labels to a binary boundary map.
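One possible way to do such a conversion is to mark every pixel that touches a pixel with a different ID. The sketch below is only an illustration under that assumption; the function name and the choice of class 1 for boundary and class 0 for background are not prescribed by the pipeline.

```python
import numpy as np


def ids_to_boundary(ids):
    """Return 1 where a pixel touches a pixel with a different ID, else 0."""
    boundary = np.zeros(ids.shape, dtype=bool)
    for axis in range(ids.ndim):
        # Index tuples selecting each pixel and its successor along this axis.
        first = [slice(None)] * ids.ndim
        second = [slice(None)] * ids.ndim
        first[axis] = slice(0, -1)
        second[axis] = slice(1, None)
        differs = ids[tuple(first)] != ids[tuple(second)]
        # Both pixels of a differing pair are marked as boundary.
        boundary[tuple(first)] |= differs
        boundary[tuple(second)] |= differs
    return boundary.astype(np.int16)


ids = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2]])
print(ids_to_boundary(ids).tolist())  # [[0, 1, 1, 0], [0, 1, 1, 0]]
```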

Note

For 2D data you can stack the set of 2D images (and likewise the labels) to a cube and train by iterating over the z-slices (then the option anisotropic_data must be set to True).

Note

For img-scalar training the labels which correspond to a stack of 2D images (see note above) are a vector with an entry for every image.

In the configuration file two lists must be specified consisting of tuples of the form (<file name>, <h5 data set name/key in that file>). One list for the images and one for the labels, both must have the same order.

The data types are preferably uint8 which makes files small and loading fast (255 gray values for images). For the labels it pays off to use the compression option of h5 (they might be compressible by a great factor). Note that in elektronn.training.CNNData.CNNData the integer image data is internally converted to float32 and divided by 255 (to normalise it to [0,1]); the labels to int16 for classification or float32 for regression.
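Writing such files can be done with h5py. The following is a minimal sketch; the file names, the data set keys "raw" and "labels", and the random toy data are all placeholders, not names required by the pipeline.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy data: a raw volume (x, y, z) and class labels of the same shape.
raw = np.random.randint(0, 256, size=(64, 64, 16)).astype(np.uint8)
labels = (raw > 127).astype(np.int16)  # two classes: 0 and 1

out_dir = tempfile.mkdtemp()
img_path = os.path.join(out_dir, "img_0.h5")
lab_path = os.path.join(out_dir, "lab_0.h5")

with h5py.File(img_path, "w") as f:
    f.create_dataset("raw", data=raw)
with h5py.File(lab_path, "w") as f:
    # Labels often compress well, so gzip compression is enabled here.
    f.create_dataset("labels", data=labels, compression="gzip")

# The corresponding config entries would then be lists of (file name, key) tuples.
d_files = [("img_0.h5", "raw")]
l_files = [("lab_0.h5", "labels")]
```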

Offsets for img-img Training¶

CNNs can only make predictions with offsets from the image border (see the image below). The offset is caused by convolutions with boundary mode “valid” (the size of the offset can be calculated using elektronn.net.netutils.CNNCalculator()). This implies that for a given labelled image area the required raw image area is larger. So if possible, provide images that are larger than the labels by at least the offset, to make full use of your labelled data. Or conversely, never label your data in the offset stripes! The only important condition is that the labels and images must be symmetrically registered to their center. The images are then cropped or the labels are 0-padded automatically, depending on the offset. A 1d example: label.shape=(5) and image.shape=(7) –> the offset is 1 (on either side) and image[i+1] corresponds to label[i]; in particular image[3] corresponds to label[2], the centers of both arrays.
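The 1d example can be checked with a few lines of plain Python. The helper name is hypothetical, and the requirement that the size difference is even is an assumption that follows from the symmetric registration.

```python
def centre_offset(image_len, label_len):
    """Offset on either side when a label array is centred in an image array."""
    diff = image_len - label_len
    if diff < 0 or diff % 2 != 0:
        raise ValueError("image must exceed the label by an even, non-negative size")
    return diff // 2


image = [10, 11, 12, 13, 14, 15, 16]  # image.shape = (7)
label = [0, 1, 0, 1, 0]               # label.shape = (5)
offset = centre_offset(len(image), len(label))

# image[i + offset] is the pixel that label[i] describes.
pairs = [(image[i + offset], label[i]) for i in range(len(label))]
print(offset)    # 1
print(pairs[2])  # (13, 0): image[3] paired with label[2], the two centres
```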

Configuration of Parameters ¶

There are three levels of parameter configuration, higher levels override previous levels:

The master default values are hardcoded into python code in elektronn.training.config.MasterConfig.
Users can set their own default values by editing the file examples/config_template.py (which is just a template and otherwise ignored). The user file must be put into the home directory as ~/.elektronn.config, from where it is automatically read and overrides the master defaults. The default values found in the template are intended to provide guidance on some meta-parameters (e.g. learning rate, momentum) and to define certain behaviour of the pipeline (e.g. default save path, save intervals), see section Pipeline Setup.
Specific values for training a particular CNN/NN configuration should be set in a different file (again created by editing a copy of config_template.py). The path of this file is given as the config argument to the elektronn-train script. Specific values can override any default values and are mainly used to specify the CNN architecture and the training data options. Some values must be provided specifically for each training (e.g. network architecture, data files, save name); if such a value is not provided, a warning is shown.
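The precedence of the three levels behaves like successive dictionary updates. The sketch below only illustrates this ordering and is not how the package reads its files; the function name and the example paths are hypothetical.

```python
def resolve_config(master_defaults, user_defaults, specific):
    """Merge the three levels; later levels override earlier ones."""
    config = dict(master_defaults)
    config.update(user_defaults)  # values from ~/.elektronn.config
    config.update(specific)       # values from the per-training config file
    return config


master = {"save_path": "/tmp/trainings/", "history_freq": [2000], "batch_size": 1}
user = {"save_path": "/data/runs/"}
specific = {"batch_size": 10, "save_name": "my_cnn"}

config = resolve_config(master, user, specific)
print(config["save_path"], config["batch_size"], config["history_freq"])
# /data/runs/ 10 [2000]
```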

The configuration file is basically a python file that contains assignments of values to variables. You can even use list comprehensions to create lists of file names, but then you must del the iteration variable (because this variable would also be read in, but it is not a valid config value), e.g.:

d_files          = [('imgs_padd_%i.h5' % ii, 'raw') for ii in range(8)]
l_files          = [('mem_z_%i.h5' % ii, 'labels') for ii in range(8)]
del ii

This page only describes what the values do, for advice on how to find good settings refer to the section Practical Introduction to Neural Networks.

The “mode” column indicates whether a parameter should be set by the user (!) or whether it is only needed in special cases and need not be configured normally ($). No indication means it can be left at default initially, but tweaking might improve results. Default undefined means that there might be some value as default, but you should not assume that the default value is sufficient for your situation.

Note

These configurations do not configure theano (except for the option to set a default device). Theano must be configured via .theanorc, also see Installation.

General Setup¶

Name	Mode	Type	Default	Explanation
save_path		`string` (trailing ‘/’!)	`undefined`	At this location a new directory with the name `save_name` is created. If this directory does not exist, it is created.
plot_on	$	`bool`	`True`	If `True` frequently plots of the training progress are created and saved to the save directory. As plotting is done in a sub-process this does not slow down training.
print_status	$	`bool`	`True`	If `True` frequently several values (loss, training error, validation error if available etc.) are printed to the console
device		`False`/`int`	False	Default device to initialise, if not given as commandline arg. False –> use .theanorc value or int –> use gpu<i>
param_save_h		`float`	1.0	hours: frequency to save a permanent parameter snapshot
initial_prev_h		`float`	1.0	hours: time after which first preview is made
prev_save_h		`float`	3.0	hours: frequency to create previews

Paths and General¶

Name	Mode	Type	Default	Explanation
save_name	!	`string`	`undefined`	The name of the save directory and the prefix for all created files
overwrite		`bool`	`True`	If set to `False` and the Training script finds an existing directory of same name, it terminates before overwriting any files. Use as a safeguard.
param_file	$	`string`	`None`	Absolute path of a parameter file. A new network can be initialised with parameters of another (already trained) network.

Network Architecture¶

Note

The output layer is added automatically (with n_lab outputs). I.e. the total number of layers is len(nof_filters)+ len(MLP_layers) + 1.

General¶

Name	Mode	Type	Default	Explanation
activation_func		`string` or list thereof	‘relu’	Global value or entry per layer. Possible values are: tanh, abs, linear, sig, relu. If list, length must equal number of specified layers (conv+mlp).
batch_size	!	`int`	1	Number of patches (i.e. training examples sliced from different locations in the training data) to use for an update step.
dropout_rates		list of `float` (0,1)	`[]`	The “fail”-rates per layer or globally. Empty list disables dropout. The last layer never has dropout. If list, length must equal number of specified layers (conv+mlp).
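The length rule for per-layer settings can be sketched as follows for activation_func. The helper is hypothetical and only mirrors the rule stated in the table: a scalar is used for all specified layers, while a list must have one entry per conv and MLP layer (the automatically added output layer is not counted).

```python
def expand_per_layer(value, nof_filters, MLP_layers, name="activation_func"):
    """Expand a global setting to one entry per specified layer (conv + mlp)."""
    n_specified = len(nof_filters) + len(MLP_layers)
    if isinstance(value, (list, tuple)):
        if len(value) != n_specified:
            raise ValueError("%s needs %i entries, got %i"
                             % (name, n_specified, len(value)))
        return list(value)
    return [value] * n_specified


nof_filters = [16, 32, 64]
MLP_layers = [100]
print(expand_per_layer("relu", nof_filters, MLP_layers))
# ['relu', 'relu', 'relu', 'relu']

# Total number of layers including the automatically added output layer.
print(len(nof_filters) + len(MLP_layers) + 1)  # 5
```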

Convolutional¶

Name	Mode	Type	Default	Explanation
n_dim	!	`int`	2	Spatial dimensionality of CNN (2 or 3). Channels of multi-channels input images (e.g. RGB) are not counted as a dimension as they are not spatial.
desired_input	!	`int` or 2/3-tuple	200	Desired input size. This must be smaller than the size of the training images. If this is a scalar, the size is used in all dimensions; if a tuple is used, each dimension has its own size (only the z-dimension should be smaller for “flat” CNNs or anisotropic data). These sizes are not used directly; instead the next size that gives a valid architecture is selected automatically.
filters	!	see note 2	`undefined`	List of filter size in each layer
pool	!	see note 2	`undefined`	List of maxpooling factor for each layer
nof_filters	!	list of `int`	`undefined`	List of number of filters for each layer
pooling_mode	$	`string` or list thereof	‘max’	Select pooling function (globally or per layer). Available: ‘max’, ‘maxabs’. Maxabs takes maximum over absolute values, but then adds sign again to the value.
MFP	!	list of `bool` or 0/1	`[]`	List whether to apply max fragment pooling for each layer. MFP is only intended for prediction, so for training the empty list disables MFP.

Note

The parameters filters and pool can either be lists of ints or lists of 2/3-tuples of ints. For simple lists of ints the scalar values are used in all 2/3 CNN dimensions, for tuples each dimension has its own value. E.g. [[2,2,2], [3,3,3], [2,2,2],...] is identical to [2,3,2,...], in contrast anisotropic filters are declared like [[2,2,1],[3,3,2],...].
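The equivalence between the scalar and the tuple notation can be made explicit with a small helper. This is an illustrative sketch with a hypothetical function name, not the package's own parsing code.

```python
def expand_to_dims(values, n_dim):
    """Turn a list of ints or n_dim-tuples into a list of n_dim-tuples."""
    expanded = []
    for v in values:
        if isinstance(v, int):
            # A scalar is used in every spatial dimension.
            expanded.append((v,) * n_dim)
        else:
            if len(v) != n_dim:
                raise ValueError("expected %i values per layer, got %r" % (n_dim, v))
            expanded.append(tuple(v))
    return expanded


print(expand_to_dims([2, 3, 2], n_dim=3))
# [(2, 2, 2), (3, 3, 3), (2, 2, 2)]
print(expand_to_dims([[2, 2, 1], [3, 3, 2]], n_dim=3))
# [(2, 2, 1), (3, 3, 2)]
```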

Multi Layer Perceptron (MLP) and Others¶

Name	Mode	Type	Default	Explanation
rnn_layer_kwargs	$	`dict`/`None`	`None`	This can install a recurrent layer before MLP-layers, the kwargs are passed to `elektronn.net.convnet.MixedConvNN.addRecurrentLayer()`. It does not work together with conv-layers.
MLP_layers	!	list of `int`	`[]`	Numbers of neurons for fully connected layers after conv layers. Empty for img-img training and required for img-scalar training
target		`string`	‘nll’	Loss function, ‘nll’ or ‘regression’

Data Options¶

General¶

Name	Mode	Type	Default	Explanation
mode	!	`string`	“img-img”	Mode of data and label types: `img-img`, `img-scalar`, `vect-scalar`, see Modes.
background_processes	!	`bool`/`int`	`False`	Whether to “pre-fetch” batches in separate background processes. It is advisable to set this to `True` or to specify a number of processes in order to speed up training, especially when warping is used. `bool` or number of processes (True –> 2).
n_lab	!	`int` or `None`	`undefined`	Number of distinct labels i.e. different classes. If `None` this is detected automatically, but that is very slow.

Warning

When using background processes, the main process should not be killed from outside. Instead abort using the CNN console via ctrl+c and kill or abort, otherwise the sub-processes become zombies and clutter the RAM.

Images/CNN¶

This block is ignored for mode vect-scalar

Name	Mode	Type	Default	Explanation
data_path	!	`string` (trailing ‘/’!)	`undefined`	Path to directory of training data files (raw images)
label_path	!	`string` (trailing ‘/’!)	`undefined`	Path to directory of training label files
d_files	!	list of tuples	`undefined`	The data files to use from the directory. Tuples contain (<file name>(`string`), <h5 data set name/key in that file>(`string`)). E.g. `[('img1.h5', 'raw'),('img2.h5', 'raw')]`
l_files	!	list of tuples	`undefined`	The label files to use from the directory. As above and in the same order!
cube_prios		`None` or list of `float`	`None`	List of SGD-sampling priorities for cubes (it must be in the same order as `d_files`! ). The priorities are relative and need not be normalised. If `None` sampling probability ~ cube size
valid_cubes		list of `int`	[]	List of cube indices (corresponding to `d_files`) to use as validation data. May be empty, then validation performances are shown as `nan`.
example_ignore_threshold	$	`float`	0.0	If the fraction of negative (this denotes unlabelled pixels) in an example patch exceeds this value, the patch is discarded and a new patch is fetched. Only needed if there are negative / unlabelled labels at all.
grey_augment_channels	!	list of `int`	[0]	Channel-indices to apply random grey value augmentation (GA) to. Use `[]` to disable. GA distorts the histogram of the raw images (darker, lighter, more/less contrast).
use_example_weights	$	`Bool`	False	Whether to use weights for the examples (e.g. for Boosting-like training). Not documented atm
flip_data		`bool`	`True`	Whether to randomly flip/rotate/mirror data for augmentation.
anisotropic_data	!	`bool`	`True`	If `True` 2D slices are only cut in z-direction, otherwise all 3 alignments are used. This can be used to train on 2D images stored as 3D arrays.
lazy_labels	$	`bool`	`False`	`True` activates special Training with lazy annotations (see lazy labels).
warp_on	!	`bool` / `float`	`False`	`True` activates random warping deformations of training examples for augmentation. Alternatively a `float` (0,1) can be used to warp only a fraction of examples randomly. If this option is used, background processes should be used at the same time. Note: to apply warping, the images/cubes must be larger than the CNN patch size: the warping parameters are sampled randomly and the required patch size (to arrive at the CNN patch size after warping) is determined. If the required size is larger than the training images, the example is not warped (without a message). You can inspect whether it works by using the CNN console and comparing `data.n_failed_warp` to `data.n_successful_warp` (you must not use background processes for this inspection, because their attribute values are not visible in the main process).
pre_process	$	`string`/`None`	None	Currently implemented: “standardise” –> 0-mean, 1-std (over all pixels)
zchxy_order		`bool`	`False`	Set to `True` if data is in (z, (ch,) x, y) order, otherwise (ch, x, y, z) is assumed. z as first axis is slightly faster when loading data, but for the actual training it makes no difference.
border_mode	!	`string`	“crop”	Only applicable for img-scalar. If the CNN does not allow the original size of the images the following options are available: “crop”: cut the images to the next smaller valid input size, “0-pad” pad to the next bigger valid input with zeros, “c-pad” pad to the next bigger input with the average value of the border, “mirror” and “reject” which throws an exception.
upright_x		`Bool`	False	If true, mirroring is only applied horizontally (e.g. for outdoor images or handwriting)
downsample_xy		`Bool`/`int`	False	Down-sample training data in x/y direction by this factor (or not at all if False)
preview_data_path		`string`/`None`	None	Path to a h5-file that contains data to make preview prediction images. The h5 data set must contain one or more image cubes (normalised between 0 and 255) in the shape ((ch,) x,y,z). If `None` no previews are made.
preview_kwargs		`dict`		Specification of preview to create, see `elektronn.training.trainer.Trainer.previewSlice()`, only specify `export_class` and `max_z_pred`, the `number` is generated automatically.

A series of 3 preview predictions and the corresponding raw image. Note that the prediction of previews might consume a lot of time; for complex CNNs (with a lot of poolings) only use small preview images.
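The cube_prios option in the table above describes relative sampling priorities. How they translate into probabilities can be sketched as below; the function is hypothetical and only reflects the two statements in the table (priorities need not be normalised, and `None` means probability proportional to cube size).

```python
def sampling_probabilities(cube_sizes, cube_prios=None):
    """Probability of drawing a training example from each cube."""
    weights = cube_sizes if cube_prios is None else cube_prios
    if len(weights) != len(cube_sizes):
        raise ValueError("cube_prios must have one entry per cube in d_files")
    total = float(sum(weights))
    return [w / total for w in weights]


cube_sizes = [100, 300]                                      # number of voxels per cube
print(sampling_probabilities(cube_sizes))                    # [0.25, 0.75]
print(sampling_probabilities(cube_sizes, cube_prios=[1, 1])) # [0.5, 0.5]
```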

Alternative / vect-scalar Data Options¶

These replace the options from the image section, and import a data class from elektronn.training.TrainData.

Name	Type	Default	Explanation
data_class_name	`string`	`None`	Name of data class in `elektronn.training.TrainData` or `tuple` for implementation in user file (<file_path>, <class_name_in_file>) e.g. `('~/MyData.py', 'MyClass')`
data_load_kwargs	`dict`	`dict()`	Arguments to init data class
data_batch_kwargs	`dict`	`dict()`	Arguments for `getbatch` method of data class (e.g. special augmentations). The batch_size argument is added automatically and needn’t be specified here

Optimisation Options¶

Name	Mode	Type	Default	Explanation
n_steps	!	`int`	`undefined`	Number of maximal update steps
max_runtime	!	`int`	`undefined`	Maximal training time in seconds, may lead to termination before `n_steps`. Measured is the total time including batch creation and performance estimates
history_freq	!	list of 1 `int` (!)	[2000]	Every `history_freq` training steps several values (NLL, training error, validation error if available etc.) are calculated and stored in an internal history file. If the corresponding options are activated these values are also printed and plots are created.
monitor_batch_size		`int`	10	Number of patches to test model for online performance estimation (on training set and if available on validation set)
weight_decay	$	`bool` or `float`	`False`	L2-penalty on weights with this weight relative to the gradient of the loss. `False` is equal to 0.0
class_weights		list of `float`/`None`	`None`	Importance weights for the classes (must have length `n_lab`), will be normalised internally. Weighting disabled by `None`.
label_prop_thresh		`None` or `float` (0.5,1)	`None`	This threshold allows unsupervised label propagation (only for examples with negative/ignore labels). If the predictive probability of the most likely class exceeds the threshold, this class is assumed to be the correct label and the training is pushed in this direction. Should only be used with pre-trained networks, and values <= 0.5 are disabled. `None` disables this option.
optimizer		`string`	‘SGD’	Select ‘SGD’/‘CG’/‘RPROP’/‘LBFGS’ as optimiser method for training
LR_decay		`float`	0.993	Decay multiplier for SGD learning rate w.r.t. an interval of 1000 update steps
LR_schedule		List of tuples /`None`	None	At the specified iteration steps the LR is set to the specified value. This is independent of the decay. Each entry in the list is of format (#iteration, new_LR) e.g. `[(80000, 0.04), (50000, 0.001)]`
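One possible reading of LR_decay and LR_schedule is sketched below. It is an assumption, not the package's code: the decay is applied smoothly so that the rate is multiplied by LR_decay once per 1000 steps, and a scheduled value replaces the current rate, from which the decay then continues. The function name is hypothetical.

```python
def learning_rate(step, initial_lr, LR_decay=0.993, LR_schedule=None):
    """Learning rate after `step` updates under the assumptions stated above."""
    lr, last_change = initial_lr, 0
    # Apply every scheduled change that has already been reached.
    for at_step, new_lr in sorted(LR_schedule or []):
        if at_step <= step:
            lr, last_change = new_lr, at_step
    # Decay by a factor of LR_decay per 1000 steps since the last change.
    return lr * LR_decay ** ((step - last_change) / 1000.0)


print(learning_rate(0, 0.01))  # 0.01
print(learning_rate(50000, 0.01, LR_schedule=[(50000, 0.001)]))  # 0.001
```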

Note

Regarding history_freq: If the training or validation errors are estimated on many examples (monitor_batch_size) this might take a while, therefore if you plan to train for 24 hours you should not create an output every 10 seconds but rather every 30 minutes (values 2000 to 5000). But for debugging and checking if a new training case works, it might be useful to get several plots per minute (values 20 to 200) and use fewer monitor examples. If you know it works, you can raise the value online using the CNN console via ctrl+c. Although this parameter is scalar it is a list for internal reasons.

Optimiser Hyperparameters¶

Name	Mode	Type	Default	Explanation
SGD_params	!	dict	see file	Initial learning rate and momentum for SGD
RPROP_params	$	dict	see file	see code
CG_params	$	dict	see file	Keys: ‘n_steps’: update steps per same batch 3 <–> 6, ‘alpha’: termination criterion of line search, must be <= 0.35, ‘beta’: precision of line search, imprecise 0.5 <–> 0.9 precise, ‘max_step’/’min_step’: similar to learning rate in SGD 0.1 <–> 0.001.
LBFGS_params	$	dict	see file	see code and here
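As an orientation, a specific configuration file for a 2D img-img training might look like the following sketch. It only uses parameter names from the tables above; all values, paths and file names are placeholders, and the architecture has not been checked for validity. SGD_params is omitted because its keys are not listed on this page.

```python
# Hypothetical minimal configuration; every value is an illustrative placeholder.
save_name = "my_first_cnn"
save_path = "/path/to/trainings/"  # trailing '/' required

# Network architecture
mode = "img-img"
n_dim = 2
n_lab = 2
batch_size = 10
desired_input = 200
filters = [6, 5, 4]
pool = [2, 2, 1]
nof_filters = [16, 32, 64]
MLP_layers = []  # empty for img-img training

# Data
data_path = "/path/to/data/"   # trailing '/' required
label_path = "/path/to/data/"
d_files = [("img_0.h5", "raw"), ("img_1.h5", "raw")]
l_files = [("lab_0.h5", "labels"), ("lab_1.h5", "labels")]  # same order as d_files
valid_cubes = []
anisotropic_data = True  # 2D images stacked along z

# Optimisation
n_steps = 100000
max_runtime = 24 * 3600  # seconds
history_freq = [2000]    # a list containing one int
```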

Running elektronn-train ¶

Once the parameter file is set up, the training script can be started. Run the script elektronn-train from command line:

elektronn-train </path/to_config_file> [ --gpu={auto|false|<int>}]

or from an existing python interpreter (e.g. within spyder).

Note

Using false as the gpu argument means a fallback to the configured device in .theanorc (which might be CPU). Otherwise it is advisable to give the number of the target GPU directly, as the automatic selection of a free GPU might not work for all drivers (it looks up the power state using nvidia-smi). If the system has only one GPU, its number is 0.

CNN Command Line ¶

During training various changes to the setup can be made using the console which is accessible via ctrl+c:

ELEKTRONN MENU
==============

    >> MNIST <<
    Shortcuts:
    'q' (leave menu),                   'abort' (saving params),
    'kill'(no saving),                  'save'/'load' (opt:filename),
    'sf'/' (show filters)',     'smooth' (smooth filters),
    'sethist <int>',                    'setlr <float>',
    'setmom <float>' ,                  'params' print info,
    Change Training Optimizer :('SGD','CG', 'RPROP', 'LBFGS')
    For everything else enter a command in the command line

mfk@ELEKTRONN:

The following manipulations are possible:

Type any of the above keywords (with optional arguments) and press Enter
“Free” input without parentheses is translated to printing the value of the variable by that name, e.g.:

>>> mfk@ELEKTRONN: cnn.input_shape
(50, 1, 26, 26)

“Free” input with parentheses is translated to executing that command literally, e.g.:

>>> mfk@ELEKTRONN: cnn.setDropoutRates([0.5, 0.5, 0.5, 1.0])

If the return value of a function/method is to be printed, print must be added explicitly (otherwise it is just executed):

>>> mfk@ELEKTRONN: print cnn.SGD_LR.get_value()
>>> 0.00995

Value assignments and variable instantiation are possible, too
The command line resides within the scope of the training loop (run method) of elektronn.training.trainer.Trainer and has access to:
- The trainer object by self
- An instance of elektronn.net.convnet.MixedConvNN() by cnn
- An instance of elektronn.training.config.Config() by config
- An instance of elektronn.training.CNNData.CNNData() by data

The purpose of the command line is to allow the change of meta-parameters during training and to allow the inspection of the state of variables/parameters.

A particularly useful function for debugging gives a visual output of the training examples presented to the CNN:

>>> data, label, info1, info2 = self.debugGetCNNBatch()

This fetches a batch in exactly the same way as is done for every training iteration. The info variables are for internal use when lazy labels are active, and not of interest here. Besides returning the data and label arrays, this function also saves an image named debugGetCNNBatch.png into the save directory:

Left: the input data. Centre: the labels; note the large offset: this CNN has a very large field of view and needs 63 pixels on either side to make a prediction for the central pixel. Right: overlay of data with labels; here you can check whether they are properly registered.

For 3D CNNs the image shows a slice along the z-axis of the data.

Note

Some parameters cannot be changed or their change has no effect. This is mainly true for all properties that are hard-compiled into the theano functions like the network architecture (e.g. number of neurons per layer).
