Vision data helpers

Module containing functionality for vision tasks such as face recognition.

Classes:

DatasetWrapper

Generic dataset wrapper.

LdColorizer

Transform that colorizes images.

LdTransformation

Base class for label-dependent augmentations.

LdTransformedDataset

Dataset applying label-dependent transformations.

NoisyDequantize

Callable class for injecting noise into binned (e.g. image) data.

Quantize

Callable class that quantizes image data.

TorchImageDataset

Image dataset for pytorch.

Functions:

create_celeba_dataset

Create a CelebA dataset object.

create_cmnist_datasets

Create and return colourised MNIST train/test pair.

create_genfaces_dataset

Create a generated-faces (GenFaces) dataset object.

set_transform

Set the transform of a dataset to the specified transform.

train_test_split

Split a dataset into train and test splits, of sizes dictated by the train percentage.

class DatasetWrapper(*args, **kwargs)

Bases: torch.utils.data.Dataset

Generic dataset wrapper.

Parameters
  • dataset (ethicml.vision.data.dataset_wrappers.SizedItemGetter) –

  • transform (Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]) –

__len__()

Get the length of the wrapped dataset.

Return type

int

property transform: Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]

The transformation(s) to be applied to the data.
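The wrapping pattern described above can be sketched in plain Python, so that it runs without torch. `SimpleWrapper` is a hypothetical stand-in, not the library's class, and the way it applies a single callable or a sequence of callables to each item is an assumption made for illustration.

```python
from typing import Any, Callable, Optional, Sequence, Union

Transform = Union[Callable[[Any], Any], Sequence[Callable[[Any], Any]]]


class SimpleWrapper:
    """Hypothetical stand-in for a dataset wrapper (not ethicml's DatasetWrapper)."""

    def __init__(self, dataset: Sequence[Any], transform: Optional[Transform] = None) -> None:
        self.dataset = dataset
        self.transform = transform

    def __len__(self) -> int:
        # The length is delegated to the wrapped dataset.
        return len(self.dataset)

    def __getitem__(self, index: int) -> Any:
        item = self.dataset[index]
        if self.transform is None:
            return item
        # Assumption: a single callable is treated as a one-step pipeline,
        # and a sequence of callables is applied in order.
        steps = [self.transform] if callable(self.transform) else list(self.transform)
        for step in steps:
            item = step(item)
        return item


wrapped = SimpleWrapper([1, 2, 3], transform=[lambda x: x + 1, lambda x: x * 10])
```

Because the transform is stored as an attribute, it can be swapped or cleared after construction, which is the kind of behaviour the `transform` property suggests.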

class LdColorizer(scale, min_val=0.0, max_val=1.0, binarize=False, background=False, black=True, seed=42, greyscale=False, color_indices=None)

Bases: ethicml.vision.data.label_dependent_transforms.LdTransformation

Transform that colorizes images.

Colorizes a grayscale image by sampling colors from multivariate normal distributions.

Each distribution is centered on a predefined mean, with its standard deviation determined by the scale argument.

Parameters
  • min_val (float) – Minimum value the input data can take (needed for clamping). Defaults to 0.

  • max_val (float) – Maximum value the input data can take (needed for clamping). Defaults to 1.

  • scale (float) – Standard deviation of the multivariate normal distributions from which the colors are drawn. Lower values correspond to higher bias.

  • binarize (bool) – Whether to binarize the grayscale data before colorisation. Defaults to False.

  • background (bool) – Whether to color the background instead of the foreground. Defaults to False.

  • black (bool) – Whether to keep black as it is rather than inverting it. Defaults to True.

  • seed (int) – Random seed used for sampling colors. Defaults to 42.

  • greyscale (bool) – Whether to greyscale the colorised images. Defaults to False.

  • color_indices (Optional[List[int]]) – Indices of the specific colors to use if you don't need all 10.
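As a rough illustration of the sampling idea described above, the following standard-library sketch draws a colour around a per-label mean and clamps it to the valid range. The palette `COLOR_MEANS`, the helper names, and the use of independent per-channel Gaussians (a multivariate normal with covariance scale² · I) are assumptions for this example and may differ from what LdColorizer actually does.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical per-label colour means (RGB in [0, 1]); not the library's palette.
COLOR_MEANS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def sample_color(
    label: int, scale: float, rng: random.Random, min_val: float = 0.0, max_val: float = 1.0
) -> Tuple[float, ...]:
    """Draw one colour around the label's mean and clamp it to [min_val, max_val]."""
    mean = COLOR_MEANS[label]
    return tuple(min(max(rng.gauss(m, scale), min_val), max_val) for m in mean)


def colorize(gray: Sequence[float], label: int, scale: float, seed: int = 42) -> List[Tuple[float, ...]]:
    """Colour the foreground of a flat grayscale image according to its label."""
    rng = random.Random(seed)
    r, g, b = sample_color(label, scale, rng)
    # Each grayscale pixel is multiplied by the sampled colour, channel by channel.
    return [(p * r, p * g, p * b) for p in gray]


pixels = colorize([0.0, 0.5, 1.0], label=0, scale=0.02)
```

With a small `scale`, the sampled colour stays close to the label's mean, so colour predicts the label almost perfectly; a larger `scale` blurs that relationship, which matches the note that lower values correspond to higher bias.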

class LdTransformation

Bases: abc.ABC

Base class for label-dependent augmentations.

class LdTransformedDataset(*args, **kwargs)

Bases: ethicml.vision.data.dataset_wrappers.DatasetWrapper

Dataset applying label-dependent transformations.

Parameters
  • dataset (torch.utils.data.Dataset) –

  • target_dim (int) –

  • discrete_labels (bool) –

  • ld_transform (Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]) –

  • label_independent (bool) –

  • correlation (float) –

__len__()

Get the length of the wrapped dataset.

Return type

int

property transform: Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]

The transformation(s) to be applied to the data.
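The idea of a label-dependent transformation can be sketched as a dataset whose transform receives both the sample and its label. This is a plain-Python illustration under stated assumptions: `LabelDependentDataset` and `shift_by_label` are hypothetical names, and the `label_independent` and `correlation` parameters listed above are not modelled because this page does not describe their behaviour.

```python
from typing import Callable, Sequence, Tuple


class LabelDependentDataset:
    """Hypothetical sketch: applies a transform that depends on each sample's label."""

    def __init__(
        self,
        dataset: Sequence[Tuple[float, int]],
        ld_transform: Callable[[float, int], float],
    ) -> None:
        self.dataset = dataset
        self.ld_transform = ld_transform

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[float, int]:
        x, y = self.dataset[index]
        # The transform sees the label, so the augmentation can differ per class.
        return self.ld_transform(x, y), y


def shift_by_label(x: float, y: int) -> float:
    """Toy label-dependent augmentation: offset the input by its label."""
    return x + y


base = [(0.5, 0), (0.5, 1), (0.5, 2)]
ds = LabelDependentDataset(base, shift_by_label)
```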

class NoisyDequantize(n_bits_x=8)

Bases: ethicml.vision.data.transforms.Transformation

Callable class for injecting noise into binned (e.g. image) data.

Create a NoisyDequantize object.

Parameters

n_bits_x (int) –
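For intuition, one common way to inject noise into binned data is uniform dequantization: add noise drawn from [0, 1) to each integer bin index and rescale by the number of bins. The sketch below assumes that scheme and assumes the input is a list of bin indices in [0, 2^n_bits_x); whether NoisyDequantize works exactly this way is not confirmed by this page, and `noisy_dequantize` is a hypothetical helper.

```python
import random
from typing import List, Sequence


def noisy_dequantize(bins: Sequence[int], n_bits_x: int = 8, seed: int = 0) -> List[float]:
    """Map integer bin indices to continuous values in [0, 1) using uniform noise."""
    rng = random.Random(seed)
    n_bins = 2 ** n_bits_x
    # Each value lands somewhere inside its own bin [b / n_bins, (b + 1) / n_bins).
    return [(b + rng.random()) / n_bins for b in bins]


values = noisy_dequantize([0, 128, 255])
```

Flooring `value * 2 ** n_bits_x` recovers the original bin index, so the noise spreads each value across its bin without moving it into a neighbouring one.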

class Quantize(n_bits_x=8)

Bases: ethicml.vision.data.transforms.Transformation

Callable class that quantizes image data.

Create a Quantize object.

Parameters

n_bits_x (int) –
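A minimal sketch of what quantizing to `n_bits_x` bits can look like, assuming inputs in [0, 1] and rounding to the nearest of 2^n_bits_x evenly spaced levels. The helper `quantize` is hypothetical, and the library's exact rounding and scaling may differ.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], n_bits_x: int = 8) -> List[float]:
    """Round each value in [0, 1] to the nearest of 2**n_bits_x evenly spaced levels."""
    levels = 2 ** n_bits_x - 1
    # Clamp first so out-of-range inputs map to the nearest valid level.
    return [round(min(max(v, 0.0), 1.0) * levels) / levels for v in values]


coarse = quantize([0.0, 0.2, 0.49, 1.0], n_bits_x=1)
```

With `n_bits_x=1` only two levels remain (0 and 1), which makes the loss of resolution easy to see; the default of 8 bits corresponds to the usual 256 intensity levels of image data.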

class TorchImageDataset(*args, **kwargs)

Bases: torchvision.datasets.VisionDataset

Image dataset for pytorch.

Large-scale CelebFaces Attributes (CelebA) Dataset.

<http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html> Adapted from torchvision.datasets to enable the loading of data triplets and biased/unbiased subsets while removing superfluous (for our purposes) elements of the dataset (e.g. facial landmarks).

Parameters
  • data (ethicml.utility.data_structures.DataTuple) – Data tuple with x containing the file paths to the generated face images.

  • root (pathlib.Path) – Root directory where images are downloaded to.

  • transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.

  • target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.

__len__()

Length (sample count) of the dataset.

Returns

Integer indicating the length of the dataset.

Return type

int

new_sensitive(label)

Update a dataset and switch to a new sensitive (s) label.

Parameters

label (str) –

Return type

None

new_task(label)

Update a dataset and switch to a new task (y) label.

Parameters

label (str) –

Return type

None

create_celeba_dataset(root, biased, mixing_factor, unbiased_pcnt, sens_attr_name, target_attr_name, transform=None, target_transform=None, download=False, seed=42, check_integrity=True)

Create a CelebA dataset object.

Parameters
  • root (str) – Root directory where images are downloaded to.

  • biased (bool) – Whether to artificially bias the dataset according to the mixing factor. See get_biased_subset() for more details.

  • mixing_factor (float) – Mixing factor used to generate the biased subset of the data.

  • unbiased_pcnt (float) – Percentage of the dataset to set aside as the ‘unbiased’ split.

  • sens_attr_name (Union[Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young'], typing.Dict[str, typing.List[typing.Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']]]]) – Attribute(s) to set as the sensitive attribute. Biased sampling cannot be performed if multiple sensitive attributes are specified.

  • target_attr_name (Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']) – Attribute to set as the target attribute.

  • transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.

  • target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.

  • download (bool) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • seed (int) – Random seed used to sample biased subset.

  • check_integrity (bool) – If True, check whether the data has been downloaded correctly.

Return type

ethicml.vision.data.image_dataset.TorchImageDataset

create_cmnist_datasets(*, root, scale, train_pcnt, download=False, seed=42, rotate_data=False, shift_data=False, padding=False, quant_level=8, input_noise=False, classes_to_keep=None)

Create and return colourised MNIST train/test pair.

Parameters
  • root (str) – Where the images are downloaded to.

  • scale (float) – The amount of ‘bias’ in the colour. Lower is more biased.

  • train_pcnt (float) – The percentage of data to use for the training set; the remainder forms the test set.

  • download (bool) – Whether or not to download the data.

  • seed (int) – Random seed for reproducing results.

  • rotate_data (bool) – Whether or not to rotate the training images.

  • shift_data (bool) – Whether or not to shift the training images.

  • padding (bool) – Whether or not to pad the training images.

  • quant_level (int) – The number of bins to quantize the data into.

  • input_noise (bool) – Whether or not to add noise to the training images.

  • classes_to_keep (Optional[Sequence[Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]]) – Which digit classes to keep. If None or empty then all classes will be kept.

Returns

Tuple of the train and test datasets.

Return type

Tuple[ethicml.vision.data.dataset_wrappers.LdTransformedDataset, ethicml.vision.data.dataset_wrappers.LdTransformedDataset]

create_genfaces_dataset(root, biased, mixing_factor, unbiased_pcnt, sens_attr_name, target_attr_name, transform=None, target_transform=None, download=False, seed=42, check_integrity=True)

Create a generated-faces (GenFaces) dataset object.

Parameters
  • root (str) – Root directory where images are downloaded to.

  • biased (bool) – Whether to artificially bias the dataset according to the mixing factor. See get_biased_subset() for more details.

  • mixing_factor (float) – Mixing factor used to generate the biased subset of the data.

  • sens_attr_name (Literal['gender', 'age', 'ethnicity', 'eye_color', 'hair_color', 'hair_length', 'emotion']) – Attribute(s) to set as the sensitive attribute. Biased sampling cannot be performed if multiple sensitive attributes are specified.

  • unbiased_pcnt (float) – Percentage of the dataset to set aside as the ‘unbiased’ split.

  • target_attr_name (Literal['gender', 'age', 'ethnicity', 'eye_color', 'hair_color', 'hair_length', 'emotion']) – Attribute to set as the target attribute.

  • transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.

  • target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.

  • download (bool) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • seed (int) – Random seed used to sample biased subset.

  • check_integrity (bool) – If True, check whether the data has been downloaded correctly.

Return type

ethicml.vision.data.image_dataset.TorchImageDataset

set_transform(dataset, transform)

Set the transform of a dataset to the specified transform.

Parameters
  • dataset (torch.utils.data.Dataset) –

  • transform (Any) –

Return type

None

train_test_split(dataset, train_pcnt)

Split a dataset into train and test splits, of sizes dictated by the train percentage.

Parameters
  • dataset (torch.utils.data.Dataset) –

  • train_pcnt (float) –

Return type

List[torch.utils.data.Subset]
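The splitting logic can be sketched on indices alone, without torch. The helper `split_indices` is hypothetical; it assumes `train_pcnt` is a fraction in [0, 1] and that the items are shuffled before splitting, neither of which is stated on this page.

```python
import random
from typing import List, Tuple


def split_indices(n_items: int, train_pcnt: float, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n_items-1 and split them by the train fraction."""
    if not 0.0 <= train_pcnt <= 1.0:
        raise ValueError("train_pcnt is assumed to be a fraction between 0 and 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(train_pcnt * n_items)
    # The first n_train shuffled indices form the train split, the rest the test split.
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(10, 0.8)
```

Index lists like these are what one would pass to `torch.utils.data.Subset` to obtain the two splits of a real dataset.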