armory-library

Evaluation Configuration

An evaluation completely defines how Armory will evaluate a model. An evaluation is composed of chains. Each chain is independent and identifies the input dataset, the perturbations applied to the dataset samples, the model, the metrics calculated on the model outputs, and any exports of samples or predictions.

Chains are defined on an evaluation object using the add_chain context manager.

```python
from armory.evaluation import Evaluation

evaluation = Evaluation(name="...", description="...", author="...")

with evaluation.add_chain("benign") as chain:
    chain.use_dataset(...)
    chain.use_model(...)
    chain.use_metrics(...)
    chain.use_exporters(...)

with evaluation.add_chain("attack") as chain:
    chain.use_dataset(...)
    chain.use_perturbations(...)  # this chain has input perturbations
    chain.use_model(...)
    chain.use_metrics(...)
    chain.use_exporters(...)
```
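To illustrate the context-manager pattern above, here is a minimal plain-Python sketch of how a chain could be configured inside a with-block and registered on the evaluation when the block exits. The Chain and Evaluation classes and the argument strings below are illustrative stand-ins, not armory-library's actual implementation.

```python
from contextlib import contextmanager

class Chain:
    """Illustrative stand-in for an evaluation chain (not armory's real class)."""
    def __init__(self, name):
        self.name = name
        self.components = {}

    def use_dataset(self, dataset):
        self.components["dataset"] = dataset

class Evaluation:
    """Illustrative stand-in showing the add_chain context-manager pattern."""
    def __init__(self, name):
        self.name = name
        self.chains = {}

    @contextmanager
    def add_chain(self, name):
        chain = Chain(name)
        yield chain                 # the with-block body configures the chain
        self.chains[name] = chain   # the chain is registered when the block exits

evaluation = Evaluation("demo")
with evaluation.add_chain("benign") as chain:
    chain.use_dataset("test-set")  # hypothetical dataset identifier
```

After the with-block exits, `evaluation.chains["benign"]` holds the fully configured chain.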

When a component is shared between chains, it can be declared as a default on the evaluation. If a chain does not specify or override that component inside its add_chain block, the evaluation-level default is applied to the chain.

```python
from armory.evaluation import Evaluation

evaluation = Evaluation(name="...", description="...", author="...")

# Default components shared by all chains
evaluation.use_dataset(...)
evaluation.use_metrics(...)
evaluation.use_exporters(...)
evaluation.use_model(...)

with evaluation.add_chain("benign") as chain:
    pass  # uses only the evaluation-level defaults

with evaluation.add_chain("attack") as chain:
    chain.use_perturbations(...)
```
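The default-resolution behavior described above can be sketched in plain Python: a component set directly on a chain wins, and otherwise the chain falls back to the evaluation-level default. This is a hedged sketch of the concept only; the Chain class, the resolve method, and the model/dataset names here are hypothetical, not armory-library's API.

```python
class Chain:
    """Illustrative stand-in that resolves components against evaluation defaults."""
    def __init__(self, name, defaults):
        self.name = name
        self._defaults = defaults   # evaluation-level default components
        self._overrides = {}        # components set inside this chain's add_chain block

    def use_model(self, model):
        self._overrides["model"] = model

    def resolve(self, component):
        # A component specified on the chain takes precedence;
        # otherwise fall back to the evaluation-level default.
        return self._overrides.get(component, self._defaults.get(component))

# Hypothetical shared defaults, as if set via evaluation.use_model(...) etc.
defaults = {"model": "baseline-model", "dataset": "shared-dataset"}

benign = Chain("benign", defaults)          # specifies nothing itself
attack = Chain("attack", defaults)
attack.use_model("model-under-attack")      # overrides the default model
```

Here `benign.resolve("model")` yields the default, while `attack.resolve("model")` yields the override; both chains share the default dataset.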

See the tracking documentation for additional information about how to automatically track evaluation parameters when defining evaluation chains.

::: armory.evaluation