Vision data helpers#

Module containing functionality for vision tasks such as face recognition.

Classes:

`DatasetWrapper`	Generic dataset wrapper.
`LdColorizer`	Transform that colorizes images.
`LdTransformation`	Base class for label-dependent augmentations.
`LdTransformedDataset`	Dataset applying label-dependent transformations.
`NoisyDequantize`	Callable class for injecting noise into binned (e.g. image) data.
`Quantize`	Callable class that quantizes image data.
`TorchImageDataset`	Image dataset for pytorch.

Functions:

`create_celeba_dataset`	Create a CelebA dataset object.
`create_cmnist_datasets`	Create and return colourised MNIST train/test pair.
`create_genfaces_dataset`	Create a generated-faces (GenFaces) dataset object.
`set_transform`	Set the transform of a dataset to the specified transform.
`train_test_split`	Split a dataset into train and test splits, of sizes dictated by the train percentage.

class DatasetWrapper(*args, **kwargs)#

Bases: torch.utils.data.Dataset

Generic dataset wrapper.

Parameters

dataset (ethicml.vision.data.dataset_wrappers.SizedItemGetter) –
transform (Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]) –

__len__()#

Get the length of the wrapped dataset.

Return type: int

property transform: Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]#: The transformation(s) to be applied to the data.
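As a rough illustration of the wrapper pattern described above, the following pure-Python sketch wraps any object that supports `len()` and indexing and applies an optional transform to each item. `SimpleWrapper` is a hypothetical name; it does not depend on torch or ethicml, and the library's actual behaviour (for example, how a sequence of callables is handled) may differ.

```python
from typing import Any, Callable, Optional, Sequence


class SimpleWrapper:
    """Hypothetical stand-in for a dataset wrapper (not ethicml's implementation)."""

    def __init__(
        self,
        dataset: Sequence[Any],
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.dataset = dataset
        self._transform = transform

    def __len__(self) -> int:
        # Delegate the length to the wrapped dataset.
        return len(self.dataset)

    def __getitem__(self, index: int) -> Any:
        item = self.dataset[index]
        # Apply the transform only if one has been set.
        return item if self._transform is None else self._transform(item)

    @property
    def transform(self) -> Optional[Callable[[Any], Any]]:
        return self._transform

    @transform.setter
    def transform(self, value: Optional[Callable[[Any], Any]]) -> None:
        self._transform = value


wrapped = SimpleWrapper([1, 2, 3], transform=lambda x: x * 10)
```

Exposing the transform as a settable property means it can be swapped after construction, for instance to use different augmentations for training and evaluation.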

class LdColorizer(scale, min_val=0.0, max_val=1.0, binarize=False, background=False, black=True, seed=42, greyscale=False, color_indices=None)#

Bases: ethicml.vision.data.label_dependent_transforms.LdTransformation

Transform that colorizes images.

Colorizes a grayscale image by sampling colors from multivariate normal distributions.

Each distribution is centered on a predefined mean, with a standard deviation determined by the scale argument.

Parameters

min_val (float) – Minimum value the input data can take (needed for clamping). Defaults to 0.
max_val (float) – Maximum value the input data can take (needed for clamping). Defaults to 1.
scale (float) – Standard deviation of the multivariate normal distributions from which the colors are drawn. Lower values correspond to higher bias. Required: the signature gives no default.
binarize (bool) – Whether to binarize the grayscale data before colorization. Defaults to False.
background (bool) – Whether to color the background instead of the foreground. Defaults to False.
black (bool) – If True, black is left as is rather than inverted. Defaults to True.
seed (int) – Random seed used for sampling colors. Defaults to 42.
greyscale (bool) – Whether to greyscale the colorised images. Defaults to False.
color_indices (Optional[List[int]]) – Indices of the specific colors to use if you don’t need all 10.
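The sampling idea behind these parameters can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: the mean colors in `MEAN_COLORS` are made up, the channels are drawn as independent normals (a multivariate normal with diagonal covariance), and `sample_color` and `colorize` are hypothetical helper names.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical per-label mean colors (RGB); the library's predefined means are not listed here.
MEAN_COLORS: List[Tuple[float, float, float]] = [
    (1.0, 0.0, 0.0),  # label 0 -> red
    (0.0, 1.0, 0.0),  # label 1 -> green
    (0.0, 0.0, 1.0),  # label 2 -> blue
]


def clamp(value: float, min_val: float = 0.0, max_val: float = 1.0) -> float:
    """Restrict a value to the range [min_val, max_val]."""
    return max(min_val, min(max_val, value))


def sample_color(label: int, scale: float, rng: random.Random) -> Tuple[float, float, float]:
    """Draw an RGB color from independent normals centered on the label's mean color."""
    mean = MEAN_COLORS[label]
    r, g, b = (clamp(rng.gauss(m, scale)) for m in mean)
    return (r, g, b)


def colorize(
    pixels: Sequence[float], label: int, scale: float, rng: random.Random
) -> List[Tuple[float, ...]]:
    """Tint the foreground: multiply each grayscale intensity by the sampled color."""
    color = sample_color(label, scale, rng)
    return [tuple(clamp(p * c) for c in color) for p in pixels]


rng = random.Random(42)
image = [0.0, 0.5, 1.0]  # a tiny three-pixel grayscale "image"
colored = colorize(image, label=1, scale=0.02, rng=rng)
```

With a small `scale`, every image of a given label receives nearly the same color, so color becomes a strong proxy for the label; a larger `scale` weakens that association.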

class LdTransformation#

Bases: abc.ABC

Base class for label-dependent augmentations.

class LdTransformedDataset(*args, **kwargs)#

Bases: ethicml.vision.data.dataset_wrappers.DatasetWrapper

Dataset applying label-dependent transformations.

Parameters

dataset (torch.utils.data.Dataset) –
target_dim (int) –
discrete_labels (bool) –
ld_transform (Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]) –
label_independent (bool) –
correlation (float) –

__len__()#

Get the length of the wrapped dataset.

Return type: int

property transform: Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]#: The transformation(s) to be applied to the data.

class NoisyDequantize(n_bits_x=8)#

Bases: ethicml.vision.data.transforms.Transformation

Callable class for injecting noise into binned (e.g. image) data.

Create a NoisyDequantize object.

Parameters: n_bits_x (int) –
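One common way to inject noise into binned data is to add uniform noise no wider than a single bin. The sketch below assumes that `n_bits_x` is a bit depth (so there are `2 ** n_bits_x` bins over `[0, 1]`); that reading, the `seed` argument, and the function name `noisy_dequantize` are assumptions for illustration, not the library's documented behaviour.

```python
import random
from typing import List, Sequence


def noisy_dequantize(values: Sequence[float], n_bits_x: int = 8, seed: int = 0) -> List[float]:
    """Add uniform noise of at most one bin width to each binned value."""
    rng = random.Random(seed)
    bin_width = 1.0 / (2 ** n_bits_x)
    # rng.random() lies in [0, 1), so the noise lies in [0, bin_width).
    return [v + rng.random() * bin_width for v in values]


binned = [0.0, 0.25, 0.5]
noisy = noisy_dequantize(binned, n_bits_x=8)
```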

class Quantize(n_bits_x=8)#

Bases: ethicml.vision.data.transforms.Transformation

Callable class that quantizes image data.

Create a Quantize object.

Parameters: n_bits_x (int) –
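Quantization to a given bit depth can be sketched as follows. This assumes inputs in `[0, 1]` and that `n_bits_x` is the number of bits per value; both are assumptions, and `quantize` is a hypothetical stand-alone function rather than the class documented here.

```python
import math
from typing import List, Sequence


def quantize(values: Sequence[float], n_bits_x: int = 8) -> List[float]:
    """Snap values in [0, 1] onto 2**n_bits_x evenly spaced bins."""
    n_bins = 2 ** n_bits_x
    out = []
    for v in values:
        # Map to a bin index in {0, ..., n_bins - 1}; v == 1.0 is clipped into the top bin.
        index = min(int(math.floor(v * n_bins)), n_bins - 1)
        out.append(index / n_bins)
    return out


quantized = quantize([0.0, 0.1, 0.5, 1.0], n_bits_x=2)
# With 2 bits there are 4 bins, so the result is [0.0, 0.0, 0.5, 0.75].
```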

class TorchImageDataset(*args, **kwargs)#

Bases: torchvision.datasets.VisionDataset

Image dataset for pytorch.

Large-scale CelebFaces Attributes (CelebA) Dataset.

<http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html> Adapted from torchvision.datasets to enable the loading of data triplets and biased/unbiased subsets while removing superfluous (for our purposes) elements of the dataset (e.g. facial landmarks).

Parameters

data (ethicml.utility.data_structures.DataTuple) – Data tuple with x containing the filepaths to the generated faces images.
root (pathlib.Path) – Root directory where images are downloaded to.
transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.
target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.

__len__()#

Length (sample count) of the dataset.

Returns: Integer indicating the length of the dataset.
Return type: int

new_sensitive(label)#

Update a dataset and switch to a new sensitive (s) label.

Parameters: label (str) –
Return type: None

new_task(label)#

Update a dataset and switch to a new task (y) label.

Parameters: label (str) –
Return type: None

create_celeba_dataset(root, biased, mixing_factor, unbiased_pcnt, sens_attr_name, target_attr_name, transform=None, target_transform=None, download=False, seed=42, check_integrity=True)#

Create a CelebA dataset object.

Parameters

root (str) – Root directory where images are downloaded to.
biased (bool) – Whether to artificially bias the dataset according to the mixing factor. See get_biased_subset() for more details.
mixing_factor (float) – Mixing factor used to generate the biased subset of the data.
unbiased_pcnt (float) – Percentage of the dataset to set aside as the ‘unbiased’ split.
sens_attr_name (Union[Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young'], typing.Dict[str, typing.List[typing.Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']]]]) – Attribute(s) to set as the sensitive attribute. Biased sampling cannot be performed if multiple sensitive attributes are specified.
target_attr_name (Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']) – Attribute to set as the target attribute.
transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.
target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.
download (bool) – If True, download the dataset from the internet and put it in the root directory. If the dataset has already been downloaded, it is not downloaded again.
seed (int) – Random seed used to sample biased subset.
check_integrity (bool) – If True, check whether the data has been downloaded correctly.

Return type

ethicml.vision.data.image_dataset.TorchImageDataset

create_cmnist_datasets(*, root, scale, train_pcnt, download=False, seed=42, rotate_data=False, shift_data=False, padding=False, quant_level=8, input_noise=False, classes_to_keep=None)#

Create and return colourised MNIST train/test pair.

Parameters

root (str) – Where the images are downloaded to.
scale (float) – The amount of ‘bias’ in the colour. Lower is more biased.
train_pcnt (float) – The percentage of the data to use for the training set; the remainder forms the test set.
download (bool) – Whether or not to download the data.
seed (int) – Random seed for reproducing results.
rotate_data (bool) – Whether or not to rotate the training images.
shift_data (bool) – Whether or not to shift the training images.
padding (bool) – Whether or not to pad the training images.
quant_level (int) – The number of bins to quantize the data into.
input_noise (bool) – Whether or not to add noise to the training images.
classes_to_keep (Optional[Sequence[Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]]) – Which digit classes to keep. If None or empty then all classes will be kept.

Returns

Tuple containing the train and test datasets.

Return type

Tuple[ethicml.vision.data.dataset_wrappers.LdTransformedDataset, ethicml.vision.data.dataset_wrappers.LdTransformedDataset]

create_genfaces_dataset(root, biased, mixing_factor, unbiased_pcnt, sens_attr_name, target_attr_name, transform=None, target_transform=None, download=False, seed=42, check_integrity=True)#

Create a generated-faces (GenFaces) dataset object.

Parameters

root (str) – Root directory where images are downloaded to.
biased (bool) – Whether to artificially bias the dataset according to the mixing factor. See get_biased_subset() for more details.
mixing_factor (float) – Mixing factor used to generate the biased subset of the data.
sens_attr_name (Literal['gender', 'age', 'ethnicity', 'eye_color', 'hair_color', 'hair_length', 'emotion']) – Attribute(s) to set as the sensitive attribute. Biased sampling cannot be performed if multiple sensitive attributes are specified.
unbiased_pcnt (float) – Percentage of the dataset to set aside as the ‘unbiased’ split.
target_attr_name (Literal['gender', 'age', 'ethnicity', 'eye_color', 'hair_color', 'hair_length', 'emotion']) – Attribute to set as the target attribute.
transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.
target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.
download (bool) – If True, download the dataset from the internet and put it in the root directory. If the dataset has already been downloaded, it is not downloaded again.
seed (int) – Random seed used to sample biased subset.
check_integrity (bool) – If True, check whether the data has been downloaded correctly.

Return type

ethicml.vision.data.image_dataset.TorchImageDataset

set_transform(dataset, transform)#

Set the transform of a dataset to the specified transform.

Parameters

dataset (torch.utils.data.Dataset) –
transform (Any) –

Return type

None

train_test_split(dataset, train_pcnt)#

Split a dataset into train and test splits, of sizes dictated by the train percentage.

Parameters

dataset (torch.utils.data.Dataset) –
train_pcnt (float) –

Return type

List[torch.utils.data.Subset]
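The splitting logic can be sketched on plain index lists. This is a minimal illustration, not the library's implementation: it assumes `train_pcnt` is a fraction in `[0, 1]`, that the split is shuffled, and that a `seed` argument is available. `split_indices` is a hypothetical helper that returns indices rather than `Subset` objects.

```python
import random
from typing import List, Tuple


def split_indices(n_samples: int, train_pcnt: float, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle the indices and cut them into train and test parts."""
    if not 0.0 <= train_pcnt <= 1.0:
        raise ValueError("train_pcnt is assumed to be a fraction in [0, 1]")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_train shuffled indices form the train split; the rest form the test split.
    n_train = round(train_pcnt * n_samples)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_samples=10, train_pcnt=0.8)
```

The two index lists are disjoint and together cover every sample, which is the property a train/test split needs regardless of how the subsets are represented.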