zero.stream

Smart Python loops.

Stream

class zero.stream.Stream(loader)[source]

Smart wrapper for iterables.

Stream simplifies managing loops, especially in typical deep learning scenarios (it is usually used to wrap train_dataloader or any other data source).

Stream:

  • simplifies management of the “epoch” and “iteration” variables

  • allows customizing the epoch size

  • allows changing the underlying data loader on the fly

  • enables useful patterns

  • (not implemented: issue) allows dumping and restoring the loop’s state: epoch, iteration, etc.

Parameters

loader – any kind of iterable (DataLoader, list, iterator, generator, …)

Raises

AssertionError – if loader is not an iterator and is empty

Examples

stream = Stream([0, 1, 2, 3])
stream = Stream(range(10))
import itertools
stream = Stream(itertools.repeat(0))

import torch
from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(torch.randn(10, 2))
stream = Stream(DataLoader(dataset, batch_size=3, shuffle=True))
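
As described in “Raises” above, the emptiness check applies only to non-iterators, so an empty container is rejected at construction time, while an (even empty) iterator is accepted as-is:

Stream([])        # AssertionError: an empty non-iterator is rejected
Stream(iter([]))  # accepted: emptiness of an iterator is not checked upfront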

Tutorial

Let’s review the conventional approach without Stream:

loader = DataLoader(...)
iteration = 0
for epoch in range(n_epochs):
    if need_custom_epoch_size():
        assert False, 'It is possible, but not convenient'

    for x in loader:
        iteration += 1
        print('Epoch:', epoch, 'Iteration:', iteration)
        ...

    if need_new_loader():
        assert False, 'It is possible, but not convenient'

There are several ways to use Stream to enhance this loop. Let’s start by creating a stream:

stream = Stream(DataLoader(...))

The dataloader is accessible via Stream.loader. Now, let’s reproduce the loop above:

for epoch in range(n_epochs):
    for x in stream.data():
        print('Epoch:', epoch, 'Iteration:', stream.iteration)

# or

while stream.increment_epoch(n_epochs):
    for x in stream.data():
        print('Epoch:', stream.epoch, 'Iteration:', stream.iteration)

First, note that Stream.iteration is created and incremented automatically. Also, a while loop can be used instead of the more “conventional” for loop, which brings the following differences:

  • restoring the stream’s state via the state_dict mechanism becomes possible

  • terminating the loop by adding more conditions to the while statement becomes possible; for example, with zero.training.ProgressTracker early stopping can look like this:

    while not progress.fail and stream.increment_epoch(n_epochs):
    
  • epoch numbering effectively starts from 1, which is consistent with iteration numbering (it also starts from 1)

To customize the epoch size, pass it to Stream.data:

while stream.increment_epoch(n_epochs):
    for x in stream.data(custom_epoch_size):
        ...
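
For example (assuming n_items bounds the number of yielded items; see Stream.data for the exact semantics), an “epoch” of 100 items, independent of the length of the underlying loader, could look like this:

while stream.increment_epoch(n_epochs):
    for x in stream.data(100):  # one “epoch” = 100 items
        ...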

Changing the underlying loader on the fly is possible at any moment (even in the middle of an epoch) via Stream.set_loader. For example:

while stream.increment_epoch(n_epochs):
    for x in stream.data(custom_epoch_size):
        ...
        if need_new_loader():
            stream.set_loader(new_loader)

Additionally, two new forms of infinite loop become possible:

import math

for x in stream.data(math.inf):
    ...
    if stream.iteration % frequency == 0:
        ...

while True:
    x = stream.next()
    ...
    if stream.iteration % frequency == 0:
        ...
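
Such infinite loops are typically terminated from within; a minimal sketch of the first form (train_step, evaluate, frequency and max_iterations are placeholders, not part of the library):

for x in stream.data(math.inf):
    train_step(x)                           # placeholder training logic
    if stream.iteration % frequency == 0:
        evaluate()                          # placeholder evaluation logic
    if stream.iteration >= max_iterations:  # terminate from within the loop
        break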

Note

For better technical understanding, keep in mind that Stream simply encapsulates an “infinite iterator” that constantly moves forward. The behavior is exactly the same for finite and infinite iterables and can be expressed with the following loop:

while True:
    for item in loader:  # loader which is passed in the constructor
        ...

The documentation for Stream.next and Stream.data provides helpful examples.

See also

ManualStream: like Stream, but for cases when one logical step (e.g. training step) does not correspond to one iteration.

Stream.iteration

Current iteration.

Stream.epoch

Current epoch.

Stream.loader

The underlying loader.

Stream.increment_epoch([max])

(Try to) increment epoch.

Stream.data([n_items])

Iterate over the loader.

Stream.next()

Get the next item and increment iteration.

Stream.reload_iterator()

Set the underlying iterator to iter(self.loader).

Stream.set_loader(loader)

Set new loader.

ManualStream

class zero.stream.ManualStream(*args, **kwargs)[source]

Like Stream, but with additional fine-grained control.

ManualStream can be useful when one logical step does not correspond to one iteration (for example, when data from several iterations is collected to build one training batch). The class inherits from Stream and adds a few members (see their documentation below for details).
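
For example (a hedged sketch rather than verbatim library usage; compute_loss, optimizer, accumulation_steps and n_epochs are placeholders), gradient accumulation could be organized so that several iterations form one manual step:

stream = ManualStream(DataLoader(...))
while stream.increment_epoch(n_epochs):
    for batch in stream.data():
        loss = compute_loss(batch) / accumulation_steps
        loss.backward()
        if stream.iteration % accumulation_steps == 0:
            optimizer.step()          # one logical (training) step
            optimizer.zero_grad()
            stream.increment_mstep()  # mstep counts optimizer updates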

ManualStream.mstep

Current manual step.

ManualStream.increment_mstep()

Increment manual step.

ManualStream.data(*[, n_iterations, n_msteps])

Iterate over the loader.