Vision data helpers
Module containing functionality for vision tasks such as face recognition.
Classes:
DatasetWrapper | Generic dataset wrapper.
LdColorizer | Transform that colorizes images.
LdTransformation | Base class for label-dependent augmentations.
LdTransformedDataset | Dataset applying label-dependent transformations.
NoisyDequantize | Callable class for injecting noise into binned (e.g. image) data.
Quantize | Callable class that quantizes image data.
TorchImageDataset | Image dataset for PyTorch.
Functions:
create_celeba_dataset | Create a CelebA dataset object.
create_cmnist_datasets | Create and return colourised MNIST train/test pair.
create_genfaces_dataset | Create a GenFaces (generated faces) dataset object.
set_transform | Set the transform of a dataset to the specified transform.
train_test_split | Split a dataset into train and test splits, of sizes dictated by the train percentage.
- class DatasetWrapper(*args, **kwargs)
Bases:
torch.utils.data.Dataset
Generic dataset wrapper.
- Parameters
dataset (ethicml.vision.data.dataset_wrappers.SizedItemGetter) –
transform (Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]) –
- __len__()
Get the length of the wrapped dataset.
- Return type
int
- property transform: Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]
The transformation(s) to be applied to the data.
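The wrapper pattern described above can be illustrated with a minimal, dependency-free sketch. This is not the library's implementation: `SketchDatasetWrapper` is a hypothetical stand-in, and the way a sequence of transforms is applied in order is an assumption based only on the `transform` type shown above.

```python
from typing import Any, Callable, Optional, Sequence, Union

Transform = Union[Callable[[Any], Any], Sequence[Callable[[Any], Any]]]


class SketchDatasetWrapper:
    """Hypothetical stand-in for a generic dataset wrapper (not ethicml's code)."""

    def __init__(self, dataset: Sequence[Any], transform: Optional[Transform] = None) -> None:
        self.dataset = dataset
        self._transform = transform

    def __len__(self) -> int:
        # The length is delegated to the wrapped dataset.
        return len(self.dataset)

    @property
    def transform(self) -> Optional[Transform]:
        return self._transform

    @transform.setter
    def transform(self, value: Optional[Transform]) -> None:
        self._transform = value

    def __getitem__(self, index: int) -> Any:
        item = self.dataset[index]
        if self._transform is None:
            return item
        # Accept either a single callable or a sequence applied in order (assumed).
        if callable(self._transform):
            transforms = [self._transform]
        else:
            transforms = list(self._transform)
        for fn in transforms:
            item = fn(item)
        return item


wrapped = SketchDatasetWrapper([1, 2, 3], transform=[lambda v: v + 1, lambda v: v * 10])
```

Exposing `transform` as a property lets callers swap the transformation after construction without rebuilding the wrapped dataset.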
- class LdColorizer(scale, min_val=0.0, max_val=1.0, binarize=False, background=False, black=True, seed=42, greyscale=False, color_indices=None)
Bases:
ethicml.vision.data.label_dependent_transforms.LdTransformation
Transform that colorizes images.
Colorizes a grayscale image by sampling colors from multivariate normal distributions.
Each distribution is centered on a predefined mean, with its standard deviation determined by the scale argument.
- Parameters
min_val (float) – Minimum value the input data can take (needed for clamping). Defaults to 0.
max_val (float) – Maximum value the input data can take (needed for clamping). Defaults to 1.
scale (float) – Standard deviation of the multivariate normal distributions from which the colors are drawn. Lower values correspond to higher bias.
binarize (bool) – Whether to binarize the grayscale data before colorisation. Defaults to False.
background (bool) – Whether to color the background instead of the foreground. Defaults to False.
black (bool) – If True, the black is not inverted. Defaults to True.
seed (int) – Random seed used for sampling colors. Defaults to 42.
greyscale (bool) – Whether to convert the colorised images to greyscale. Defaults to False.
color_indices (Optional[List[int]]) – Indices of the specific colors to use when not all 10 are needed.
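The sampling idea behind the colorizer can be sketched in plain Python. Everything named here is hypothetical: `COLOR_MEANS` is an invented three-colour palette (the library's predefined means are not shown on this page), and drawing each channel independently is an assumption equivalent to a multivariate normal with a diagonal covariance of `scale ** 2`.

```python
import random
from typing import List, Tuple

# Hypothetical colour means (RGB), one per label; the real palette is not shown here.
COLOR_MEANS: List[Tuple[float, float, float]] = [
    (1.0, 0.0, 0.0),  # label 0 -> red
    (0.0, 1.0, 0.0),  # label 1 -> green
    (0.0, 0.0, 1.0),  # label 2 -> blue
]


def sample_color(label, scale, rng, min_val=0.0, max_val=1.0):
    """Draw one RGB colour around the label's mean and clamp it to [min_val, max_val]."""
    return tuple(
        min(max(rng.gauss(mean, scale), min_val), max_val)
        for mean in COLOR_MEANS[label]
    )


def colorize(gray, label, scale, seed=42):
    """Tint a 2-D grayscale image (values in [0, 1]); returns channels x height x width."""
    rng = random.Random(seed)
    color = sample_color(label, scale, rng)
    # Bright (foreground) pixels take on the sampled colour; black pixels stay black.
    return [[[pixel * c for pixel in row] for row in gray] for c in color]


image = [[0.0, 1.0], [0.5, 0.0]]
colored = colorize(image, label=0, scale=0.02)
```

With a small `scale`, the sampled colour stays close to the label's mean, so colour almost perfectly predicts the label; this is the sense in which lower values correspond to higher bias.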
- class LdTransformation
Bases:
abc.ABC
Base class for label-dependent augmentations.
- class LdTransformedDataset(*args, **kwargs)
Bases:
ethicml.vision.data.dataset_wrappers.DatasetWrapper
Dataset applying label-dependent transformations.
- Parameters
dataset (torch.utils.data.Dataset) –
target_dim (int) –
discrete_labels (bool) –
ld_transform (Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]) –
label_independent (bool) –
correlation (float) –
- __len__()
Get the length of the wrapped dataset.
- Return type
int
- property transform: Optional[Union[Callable[[...], torch.Tensor], Sequence[Callable]]]
The transformation(s) to be applied to the data.
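The parameters above carry no descriptions, so the following is only one plausible reading of how a label-dependent transformation with a `correlation` setting might behave. The function names, the `(x, s, y)` output triplet, and the rule "keep the true label with probability `correlation`, otherwise draw a random one" are all assumptions made for illustration, not the library's documented behaviour.

```python
import random


def pick_transform_label(true_label, num_classes, correlation, rng):
    """Keep the true label with probability `correlation`, else draw a uniform label."""
    if rng.random() < correlation:
        return true_label
    return rng.randrange(num_classes)


def apply_label_dependent(items, labels, ld_transform, num_classes, correlation=1.0, seed=0):
    """Apply ld_transform(x, s) to each item, where s depends on the label y."""
    rng = random.Random(seed)
    out = []
    for x, y in zip(items, labels):
        s = pick_transform_label(y, num_classes, correlation, rng)
        out.append((ld_transform(x, s), s, y))
    return out


# Hypothetical transform: tag each item with the label that drove the transformation.
tagged = apply_label_dependent(
    items=["a", "b", "c"],
    labels=[0, 1, 2],
    ld_transform=lambda x, s: f"{x}-{s}",
    num_classes=3,
    correlation=1.0,
)
```

Under this reading, `correlation=1.0` ties the transformation fully to the label, while lower values weaken the link between the two.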
- class NoisyDequantize(n_bits_x=8)
Bases:
ethicml.vision.data.transforms.Transformation
Callable class for injecting noise into binned (e.g. image) data.
Create a NoisyDequantize object.
- Parameters
n_bits_x (int) –
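As a rough illustration of injecting noise into binned data, the sketch below spreads each value uniformly within its bin. The formula is a common dequantization recipe and is assumed here, not taken from the library; `noisy_dequantize` and its `seed` argument are hypothetical, and inputs are assumed to lie in [0, 1].

```python
import random


def noisy_dequantize(values, n_bits_x=8, seed=0):
    """Replace each value in [0, 1] with a uniform draw from the bin it falls in."""
    n_bins = 2 ** n_bits_x
    rng = random.Random(seed)
    out = []
    for v in values:
        # Index of the bin containing v; the value 1.0 belongs to the last bin.
        bin_index = min(int(v * n_bins), n_bins - 1)
        # Uniform noise places the result somewhere inside that bin.
        out.append((bin_index + rng.random()) / n_bins)
    return out


noisy = noisy_dequantize([0.0, 0.25, 0.5, 1.0], n_bits_x=8)
```

The added noise never moves a value by more than one bin width, so the discrete structure of the data is smoothed without being distorted.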
- class Quantize(n_bits_x=8)
Bases:
ethicml.vision.data.transforms.Transformation
Callable class that quantizes image data.
Create a Quantize object.
- Parameters
n_bits_x (int) –
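Quantization to a given bit depth can be sketched as snapping each value to the nearest of `2 ** n_bits_x` evenly spaced levels. The rounding rule and the assumption that inputs lie in [0, 1] are illustrative choices; the library's exact formula is not shown on this page.

```python
def quantize(values, n_bits_x=8):
    """Snap values in [0, 1] to the nearest of 2**n_bits_x evenly spaced levels."""
    top_level = 2 ** n_bits_x - 1
    return [round(v * top_level) / top_level for v in values]


one_bit = quantize([0.0, 0.2, 0.7, 1.0], n_bits_x=1)
two_bit_levels = sorted(set(quantize([i / 1000 for i in range(1001)], n_bits_x=2)))
```

With `n_bits_x=1` only the levels 0 and 1 remain, and with `n_bits_x=2` there are four levels; the default of 8 corresponds to the usual 256 intensity levels of an 8-bit image.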
- class TorchImageDataset(*args, **kwargs)
Bases:
torchvision.datasets.VisionDataset
Image dataset for PyTorch.
Large-scale CelebFaces Attributes (CelebA) Dataset.
<http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html> Adapted from torchvision.datasets to enable the loading of data triplets and biased/unbiased subsets while removing superfluous (for our purposes) elements of the dataset (e.g. facial landmarks).
- Parameters
data (ethicml.utility.data_structures.DataTuple) – Data tuple with x containing the filepaths to the generated face images.
root (pathlib.Path) – Root directory where images are downloaded to.
transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.
target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.
- __len__()
Length (sample count) of the dataset.
- Returns
Integer indicating the length of the dataset.
- Return type
int
- new_sensitive(label)
Update a dataset and switch to a new sensitive (s) label.
- Parameters
label (str) –
- Return type
None
- new_task(label)
Update a dataset and switch to a new task (y) label.
- Parameters
label (str) –
- Return type
None
- create_celeba_dataset(root, biased, mixing_factor, unbiased_pcnt, sens_attr_name, target_attr_name, transform=None, target_transform=None, download=False, seed=42, check_integrity=True)
Create a CelebA dataset object.
- Parameters
root (str) – Root directory where images are downloaded to.
biased (bool) – Whether to artificially bias the dataset according to the mixing factor. See get_biased_subset() for more details.
mixing_factor (float) – Mixing factor used to generate the biased subset of the data.
unbiased_pcnt (float) – Percentage of the dataset to set aside as the ‘unbiased’ split.
sens_attr_name (Union[Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young'], typing.Dict[str, typing.List[typing.Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']]]]) – Attribute(s) to set as the sensitive attribute. Biased sampling cannot be performed if multiple sensitive attributes are specified.
target_attr_name (Literal['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']) – Attribute to set as the target attribute.
transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.
target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.
download (bool) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
seed (int) – Random seed used to sample biased subset.
check_integrity (bool) – If True, check whether the data has been downloaded correctly.
- Return type
- create_cmnist_datasets(*, root, scale, train_pcnt, download=False, seed=42, rotate_data=False, shift_data=False, padding=False, quant_level=8, input_noise=False, classes_to_keep=None)
Create and return colourised MNIST train/test pair.
- Parameters
root (str) – Where the images are downloaded to.
scale (float) – The amount of ‘bias’ in the colour. Lower is more biased.
train_pcnt (float) – The percentage of data to use for the training set.
download (bool) – Whether or not to download the data.
seed (int) – Random seed for reproducing results.
rotate_data (bool) – Whether or not to rotate the training images.
shift_data (bool) – Whether or not to shift the training images.
padding (bool) – Whether or not to pad the training images.
quant_level (int) – The number of bins to quantize the data into.
input_noise (bool) – Whether or not to add noise to the training images.
classes_to_keep (Optional[Sequence[Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]]) – Which digit classes to keep. If None or empty then all classes will be kept.
- Returns
Tuple containing the train and test datasets.
- Return type
Tuple[ethicml.vision.data.dataset_wrappers.LdTransformedDataset, ethicml.vision.data.dataset_wrappers.LdTransformedDataset]
- create_genfaces_dataset(root, biased, mixing_factor, unbiased_pcnt, sens_attr_name, target_attr_name, transform=None, target_transform=None, download=False, seed=42, check_integrity=True)
Create a GenFaces (generated faces) dataset object.
- Parameters
root (str) – Root directory where images are downloaded to.
biased (bool) – Whether to artificially bias the dataset according to the mixing factor. See get_biased_subset() for more details.
mixing_factor (float) – Mixing factor used to generate the biased subset of the data.
sens_attr_name (Literal['gender', 'age', 'ethnicity', 'eye_color', 'hair_color', 'hair_length', 'emotion']) – Attribute(s) to set as the sensitive attribute. Biased sampling cannot be performed if multiple sensitive attributes are specified.
unbiased_pcnt (float) – Percentage of the dataset to set aside as the ‘unbiased’ split.
target_attr_name (Literal['gender', 'age', 'ethnicity', 'eye_color', 'hair_color', 'hair_length', 'emotion']) – Attribute to set as the target attribute.
transform (Optional[Callable]) – A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.ToTensor.
target_transform (Optional[Callable]) – A function/transform that takes in the target and transforms it.
download (bool) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
seed (int) – Random seed used to sample biased subset.
check_integrity (bool) – If True, check whether the data has been downloaded correctly.
- Return type
- set_transform(dataset, transform)
Set the transform of a dataset to the specified transform.
- Parameters
dataset (torch.utils.data.Dataset) –
transform (Any) –
- Return type
None
- train_test_split(dataset, train_pcnt)
Split a dataset into train and test splits, of sizes dictated by the train percentage.
- Parameters
dataset (torch.utils.data.Dataset) –
train_pcnt (float) –
- Return type
List[torch.utils.data.Subset]
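How a train percentage translates into split sizes can be shown with a small, dependency-free sketch. It assumes `train_pcnt` is a fraction in [0, 1] and that items are shuffled before splitting; neither detail is confirmed by this page, and `split_sizes` and `sketch_train_test_split` are hypothetical helpers that return plain lists where the library returns `Subset` objects.

```python
import random


def split_sizes(n_items, train_pcnt):
    """Return (train_len, test_len) for a fractional train percentage (assumed in [0, 1])."""
    if not 0.0 <= train_pcnt <= 1.0:
        raise ValueError("train_pcnt is assumed to be a fraction between 0 and 1")
    train_len = round(train_pcnt * n_items)
    return train_len, n_items - train_len


def sketch_train_test_split(items, train_pcnt, seed=42):
    """Shuffle the indices, then cut them into a train part and a test part."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    train_len, _ = split_sizes(len(items), train_pcnt)
    train = [items[i] for i in indices[:train_len]]
    test = [items[i] for i in indices[train_len:]]
    return [train, test]


train_part, test_part = sketch_train_test_split(list(range(10)), train_pcnt=0.8)
```

Computing the test size as the remainder guarantees that the two parts always cover the whole dataset, whatever rounding does to the train size.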