Image Processing (MNIST Example)
Installing and importing torchvision
along with other necessary libraries.
!pip install torchvision --index-url https://download.pytorch.org/whl/cpu
import turboml as tb
import pandas as pd
from torchvision import datasets, transforms
import io
from PIL import Image
class PILToBytes:
def __init__(self, format="JPEG"):
self.format = format
def __call__(self, img):
if not isinstance(img, Image.Image):
raise TypeError(f"Input should be a PIL Image, but got {type(img)}.")
buffer = io.BytesIO()
img.save(buffer, format=self.format)
return buffer.getvalue()
transform = transforms.Compose(
[
transforms.Resize((28, 28)),
PILToBytes(format="PNG"),
]
)
Data Inspection
Downloading the MNIST dataset to be used in ML modelling.
mnist_dataset_train = datasets.MNIST(
root="./data", train=True, download=True, transform=transform
)
mnist_dataset_test = datasets.MNIST(
root="./data", train=False, download=True, transform=transform
)
images_train = []
images_test = []
labels_train = []
labels_test = []
for image, label in mnist_dataset_train:
images_train.append(image)
labels_train.append(label)
for image, label in mnist_dataset_test:
images_test.append(image)
labels_test.append(label)
Transforming the lists into Pandas DataFrames.
image_dict_train = {"images": images_train}
label_dict_train = {"labels": labels_train}
image_df_train = pd.DataFrame(image_dict_train)
label_df_train = pd.DataFrame(label_dict_train)
image_dict_test = {"images": images_test}
label_dict_test = {"labels": labels_test}
image_df_test = pd.DataFrame(image_dict_test)
label_df_test = pd.DataFrame(label_dict_test)
Adding index columns to the DataFrames to act as primary keys for the datasets.
image_df_train.reset_index(inplace=True)
label_df_train.reset_index(inplace=True)
image_df_test.reset_index(inplace=True)
label_df_test.reset_index(inplace=True)
image_df_train.head()
label_df_train.head()
Using PandasDataset
class for compatibility with the TurboML platform.
images_train = tb.PandasDataset(
dataframe=image_df_train, key_field="index", streaming=False
)
labels_train = tb.PandasDataset(
dataframe=label_df_train, key_field="index", streaming=False
)
images_test = tb.PandasDataset(
dataframe=image_df_test, key_field="index", streaming=False
)
labels_test = tb.PandasDataset(
dataframe=label_df_test, key_field="index", streaming=False
)
Extracting the features and the targets from the TurboML-compatible datasets.
imaginal_fields = ["images"]
features_train = images_train.get_input_fields(imaginal_fields=imaginal_fields)
targets_train = labels_train.get_label_field(label_field="labels")
features_test = images_test.get_input_fields(imaginal_fields=imaginal_fields)
targets_test = labels_test.get_label_field(label_field="labels")
Model Initialization
Defining a Neural Network (NN) to be used on the MNIST data.
The output_size
of the final layer in the NN is 10
in the case of MNIST.
Since this is a classification task, Cross Entropy
loss is used with the Adam
optimizer.
final_layer = tb.NNLayer(output_size=10, activation="none")
model = tb.NeuralNetwork(
loss_function="cross_entropy", optimizer="adam", learning_rate=0.01
)
model.layers[-1] = final_layer
ImageToNumeric PreProcessor
Since we are dealing with images as input to the model, we select the ImageToNumeric PreProcessor
to accordingly convert the binary images into numerical data useful to the NN.
model = tb.ImageToNumericPreProcessor(base_model=model, image_sizes=[28, 28, 1])
Model Training
Setting the model combined with the ImageToNumeric PreProcessor
to learn on the training data.
model = model.learn(features_train, targets_train)
Model Inference
Performing inference on the trained model using the test data.
outputs_test = model.predict(features_test)
outputs_test
Model Testing
Testing the trained model's performance on the test data.
from sklearn import metrics
labels_test_list = labels_test.input_df["labels"].to_list()
print(
"Accuracy: ",
metrics.accuracy_score(labels_test_list, outputs_test["predicted_class"]),
)
print(
"F1: ",
metrics.f1_score(
labels_test_list, outputs_test["predicted_class"], average="macro"
),
)
print(
"Precision: ",
metrics.precision_score(
labels_test_list, outputs_test["predicted_class"], average="macro"
),
)