Handwritten Javanese Script Classification

Aksara Jawa, or the Javanese script, is the traditional script used to write the Javanese language and has influenced the scripts of several other regional languages such as Sundanese and Madurese. The script is now rarely used on a daily basis, but it is still sometimes taught in local schools in certain provinces of Indonesia.

Specific Form of Aksara

The Javanese script we will be classifying is specifically the Nglegena, or basic, characters of Aksara Wyanjana. The list consists of 20 basic characters, without their respective Pasangan characters.

Dataset

Since I was not able to find a handwritten Javanese script dataset on the internet, I decided to contact one of my high school English teachers, who had once shown my class her ability to write the Javanese script. The characters were written on paper, scanned, and edited manually. Credits to Mm. Martha Indrati for the help!

Image Classification

This project is very much inspired by datasets like MNIST and QMNIST, which contain handwritten digits and are a go-to starting point for learning image classification. The end goal of this project is to create a deep learning model that can classify handwritten Javanese script characters with a reasonable degree of accuracy.

Code

The main framework used is fastai v2, which sits on top of PyTorch. fastai v2 is still under development at the time of writing, but it is ready for basic image classification tasks.

from fastai2.vision.all import *
import torch

Load Data

The data is grouped into one folder per class, nested under train and val directories, which we'll load up and split into training (70%) and validation (30%) images.

path = Path("handwritten-javanese-script-dataset")
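GrandparentSplitter decides the split from each image's grandparent folder, so the dataset is assumed to be laid out roughly like this (the class and file names below are illustrative):

handwritten-javanese-script-dataset/
    train/
        ba/
            ba-01.png
            ...
        ca/
            ...
    val/
        ba/
            ...
        ca/
            ...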

Notice we’re using a small batch size of 5, mainly because we only have 200 images in total.

Since most of the characters do not fully occupy the image, we'll apply cropping and resizing as item transforms: first crop/pad to 90px, then resize to 128px.

dblock = DataBlock(blocks     = (ImageBlock(cls=PILImageBW), CategoryBlock),
                   get_items  = get_image_files,
                   splitter   = GrandparentSplitter(valid_name='val'),
                   get_y      = parent_label,
                   item_tfms  = [CropPad(90), Resize(128, method=ResizeMethod.Crop)])
dls = dblock.dataloaders(path, bs=5, num_workers=0)
dls.show_batch()

There are only 20 characters in the form of Aksara we'll be classifying.

dls.vocab
(#20) ['ba','ca','da','dha','ga','ha','ja','ka','la','ma'...]
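As a quick sanity check (not part of the original notebook; the exact counts depend on the folder contents), we can also confirm how many images ended up in each split:

# number of images in the training and validation sets
len(dls.train_ds), len(dls.valid_ds)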

Model

We'll be using XResNet50 as the model, which is based on the "Bag of Tricks" paper and is an "extension" of the ResNet50 architecture. We'll pass in our data, specify which metrics we'd like to observe, use LabelSmoothingCrossEntropy as the loss function, and add MixUp as a callback.

learn = Learner(dls, xresnet50(c_in=1, n_out=dls.c), metrics=accuracy, loss_func=LabelSmoothingCrossEntropy(), cbs=MixUp)
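For intuition, label smoothing softens the hard one-hot targets so the model is not pushed toward fully confident predictions. The sketch below (illustrative only, using made-up logits; not part of the original notebook) computes essentially the quantity fastai's LabelSmoothingCrossEntropy uses, with its default smoothing of eps = 0.1:

import torch
import torch.nn.functional as F

eps = 0.1
logits  = torch.randn(5, 20)            # fake batch: 5 samples, 20 classes
targets = torch.randint(0, 20, (5,))    # fake target class indices

log_preds = F.log_softmax(logits, dim=-1)
# "uniform" part: average negative log-probability over all classes
uniform_loss = -log_preds.mean(dim=-1).mean()
# standard cross-entropy part on the true classes
nll = F.nll_loss(log_preds, targets)
loss = eps * uniform_loss + (1 - eps) * nll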

Training Model

With all things in place, let's finally train the model to learn from the given dataset and predict which class each image belongs to. We first run the learning-rate finder, then train for 30 epochs with fit_one_cycle using a learning rate of 3e-4 (close to the suggested lr_min), a weight decay of 0.4, and a SaveModelCallback that keeps the weights with the best validation accuracy.

learn.lr_find()
SuggestedLRs(lr_min=0.0003019951749593019, lr_steep=6.309573450380412e-07)
learn.fit_one_cycle(30, 3e-4, cbs=SaveModelCallback(monitor='accuracy', fname='best_model'), wd=0.4)
epoch   train_loss   valid_loss   accuracy   time
0       3.067268     3.108827     0.050000   00:04
1       2.929908     2.669373     0.333333   00:04
2       2.769148     2.293764     0.383333   00:04
3       2.588481     2.215439     0.316667   00:04
4       2.416248     2.324036     0.283333   00:04
5       2.324458     1.983255     0.533333   00:04
6       2.189000     2.105889     0.383333   00:04
7       2.078479     2.350886     0.333333   00:04
8       1.922369     2.823610     0.216667   00:05
9       1.790820     1.584189     0.650000   00:05
10      1.683853     1.509675     0.583333   00:04
11      1.598790     1.570487     0.650000   00:04
12      1.528586     1.256149     0.833333   00:04
13      1.484508     1.623523     0.566667   00:04
14      1.437240     1.340925     0.750000   00:04
15      1.345987     1.138785     0.816667   00:05
16      1.350891     1.370259     0.716667   00:04
17      1.297572     1.453033     0.666667   00:04
18      1.318248     1.330522     0.750000   00:04
19      1.263931     1.023822     0.900000   00:04
20      1.247242     1.063768     0.900000   00:04
21      1.234829     1.009032     0.933333   00:05
22      1.203268     0.968369     0.950000   00:04
23      1.178766     0.965601     0.916667   00:04
24      1.156069     0.939599     0.933333   00:04
25      1.183693     0.943586     0.933333   00:04
26      1.166053     0.933629     0.933333   00:04
27      1.162939     0.936014     0.933333   00:04
28      1.132883     0.936722     0.933333   00:04
29      1.138776     0.946842     0.933333   00:04
Better model found at epoch 0 with accuracy value: 0.05000000074505806.
Better model found at epoch 1 with accuracy value: 0.3333333432674408.
Better model found at epoch 2 with accuracy value: 0.38333332538604736.
Better model found at epoch 5 with accuracy value: 0.5333333611488342.
Better model found at epoch 9 with accuracy value: 0.6499999761581421.
Better model found at epoch 12 with accuracy value: 0.8333333134651184.
Better model found at epoch 19 with accuracy value: 0.8999999761581421.
Better model found at epoch 21 with accuracy value: 0.9333333373069763.
Better model found at epoch 22 with accuracy value: 0.949999988079071.
learn.recorder.plot_loss()
learn.save('stage-1')

Analyze Results

After training, let’s see how well our model learned. Any incorrect prediction in a random batch will have its label colored red.

learn.show_results()

Instead of only viewing a batch, let’s analyze the results from the entire validation dataset.

interp = ClassificationInterpretation.from_learner(learn)

This confusion matrix lists all the actual versus predicted labels. The darker the blue along the diagonal, the better our model is at predicting.

interp.plot_confusion_matrix(figsize=(8,8), dpi=60)
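With 20 classes the full matrix can be hard to scan, so fastai's most_confused (not shown in the original post) gives a quicker summary of which actual/predicted pairs the model mixes up most often:

# list (actual, predicted, count) pairs that were confused at least twice
interp.most_confused(min_val=2)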

On the other hand, plot_top_losses shows the images our model was most wrong about, along with the predicted label, the actual label, the loss, and the probability assigned to the predicted class.

interp.plot_top_losses(9, figsize=(10,9))

Predicting External Images

To see how well our model generalizes beyond the training data, let's feed it some external images and see what it predicts.

from PIL import Image

# Open an external image, resize it to 128x128, and convert it to grayscale
# so it matches the single-channel images the model was trained on.
def open_image_bw_resize(source) -> PILImageBW:
    return PILImageBW(Image.open(source).resize((128,128)).convert('L'))

The following character is supposed to be ma and was picked randomly from available images on the internet.

test0 = open_image_bw_resize('test-image-0.jpg')
test0.show()
<matplotlib.axes._subplots.AxesSubplot at 0x1e8960ffaf0>

Feed it through the model and see its output.

learn.predict(test0)[0]
'ma'

Luckily, the model was able to predict the character correctly. To challenge the model even more, I tried writing Javanese script characters myself to see what the model predicts. Do note that I do not have any background in writing the Javanese script, so pardon my skills.

The following character is supposed to be ca.

test1 = open_image_bw_resize('test-image-1.jpg')
test1.show()
<matplotlib.axes._subplots.AxesSubplot at 0x1e895ef6610>
learn.predict(test1)[0]
'ca'

This character is supposed to be wa.

test2 = open_image_bw_resize('test-image-2.jpg')
test2.show()
<matplotlib.axes._subplots.AxesSubplot at 0x1e8c2a21580>
learn.predict(test2)[0]
'ca'

Well, that's an incorrect guess. It is understandable, though: first, because of my poor handwriting, and second, because the model was trained on a single person's style of handwriting, in this case my teacher's. Other factors could also have contributed to the incorrect guess, such as the model overfitting, the small dataset, and possibly more.

Closing Remarks

There are several possible improvements that could be made, one of which is to increase the variety and size of the dataset, since the model is currently trained on a single person's handwriting. Adding other people's handwriting into the mix would help the model generalize better.
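Until more writers can contribute samples, one cheap way to squeeze extra variety out of the existing images (a sketch of my own, not something the original notebook does) would be fastai's built-in augmentations, applied as batch transforms when building the DataLoaders:

dblock = DataBlock(blocks     = (ImageBlock(cls=PILImageBW), CategoryBlock),
                   get_items  = get_image_files,
                   splitter   = GrandparentSplitter(valid_name='val'),
                   get_y      = parent_label,
                   item_tfms  = [CropPad(90), Resize(128, method=ResizeMethod.Crop)],
                   # small rotations and zooms mimic natural handwriting variation;
                   # flipping is disabled since mirrored characters are not valid
                   batch_tfms = aug_transforms(do_flip=False, max_rotate=10.0, max_zoom=1.1))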

That’s it for this mini project of mine. Thanks for your time and I hope you’ve learned something!