Project I: DNA Storage

Efficient DNA-Based Image Coding and Storage

Cihan Ruan, Rongduo Han, Yixiao Li, Shan Gao, Haoyu Wu, Nam Ling

2023 IEEE International Symposium on Circuits and Systems (ISCAS 2023) (Accepted, Oral) | May 2023

As global data volume explodes, traditional storage systems face multiple challenges, including limited resources and low cost-efficiency. Deoxyribonucleic acid (DNA) has attracted researchers' attention as a way to address these issues because of its high storage density, low maintenance cost, and extremely long shelf life. DNA is also more environmentally friendly than traditional disk storage, since disks are non-degradable. In this paper, we propose a strategy for image coding and storage that compresses intra-predicted images from Versatile Video Coding (VVC) and draws on synthetic genomics theory to enable high-throughput storage of images on DNA. We first design the length and format of the DNA oligos for the storage system. Then we improve LT codes by optimizing seed creation and by replacing screening with a dynamic, context-based mapping of DNA bases, which reduces complexity. Finally, we design a voting mechanism that provides error correction for robustness. Experimental results show high compression efficiency while satisfying several strict biological constraints, such as GC-content balance and homopolymer control.
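The biological constraints and the voting idea above can be illustrated with a short screening sketch. The abstract does not state the exact limits, so the 40 to 60% GC window and the maximum homopolymer run of 3 used here are assumptions, and `passes_constraints` and `majority_vote` are hypothetical helpers rather than the paper's implementation. The vote is a plain position-wise majority over equal-length reads of one oligo, a simplified stand-in for the paper's voting mechanism.

```python
from collections import Counter

# Assumed limits for illustration; the paper's exact thresholds are not stated.
GC_MIN, GC_MAX = 0.40, 0.60
MAX_HOMOPOLYMER = 3


def gc_content(oligo: str) -> float:
    """Fraction of bases that are G or C."""
    if not oligo:
        return 0.0
    return sum(base in "GC" for base in oligo) / len(oligo)


def longest_homopolymer(oligo: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest, run, prev = 0, 0, None
    for base in oligo:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def passes_constraints(oligo: str) -> bool:
    """Hypothetical screen: GC content in [GC_MIN, GC_MAX] and no run longer than MAX_HOMOPOLYMER."""
    return (GC_MIN <= gc_content(oligo) <= GC_MAX
            and longest_homopolymer(oligo) <= MAX_HOMOPOLYMER)


def majority_vote(reads: list[str]) -> str:
    """Position-wise majority vote over equal-length noisy reads of one oligo.

    Ties go to the base seen first at that position (Counter keeps insertion order).
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need at least one read, all of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    print(passes_constraints("ACGTACGGTCAT"))             # True: 50% GC, longest run 2
    print(passes_constraints("AAAAGCGC"))                 # False: run of four A's
    print(majority_vote(["ACGTAC", "ACGTTC", "AGGTAC"]))  # ACGTAC
```

A single substitution in each noisy read is outvoted as long as the other reads agree at that position, which is why voting adds robustness without a full error-correcting code.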

Project II: Ancient Chinese Handwritten Character Recognition

ACCR: Auto-labeling for Ancient Chinese Handwritten Characters Recognition on CNN

Peikun Wu, Xin Yang, Fuhao Guo, Li Wang, Cihan Ruan

2022 IEEE Visual Communications and Image Processing Conference (VCIP 2022) (Oral) | Dec 2022

Chinese Character Recognition (CCR) is a critical application of Optical Character Recognition (OCR), a vital area of pattern recognition. Research on CCR in past decades has mainly focused on modern Chinese characters rather than ancient ones. Compared to modern Chinese characters, ancient characters are more diverse, and multiple ancient characters can correspond to a single modern character. These distinctive features make manual labeling for recognition very time-consuming. This paper proposes an automatic labeling algorithm based on a semi-supervised, dictionary-trained neural network that drastically reduces human effort. We first created an offline training set, used as a dictionary, containing 8,226 Chinese characters from ancient documents rendered in modern fonts, and trained the network on it. We then recursively retrained the network on an unlabeled dataset of about 1.3 million character images segmented from ancient documents, which resulted in a very high accuracy rate. This work is part of our broader project on recognizing ancient documents written in handwritten Chinese characters.
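Training on a labeled dictionary and then recursively retraining on unlabeled images is a form of self-training (pseudo-labeling). The sketch below is a minimal illustration under stated assumptions: it swaps the paper's CNN for a nearest-centroid classifier over feature vectors, and the acceptance rule (distance to the nearest class centroid at most `max_dist`) as well as the names `centroids` and `self_train` are hypothetical.

```python
import math

Vector = tuple[float, ...]


def centroids(labeled: list[tuple[Vector, str]]) -> dict[str, list[float]]:
    """Mean feature vector for each class label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def self_train(dictionary, unlabeled, max_dist=1.0, max_rounds=10):
    """Iteratively pseudo-label samples that fall close to a class centroid.

    Returns (pseudo_labeled, still_unlabeled).
    """
    labeled = list(dictionary)        # start from the font-rendered "dictionary"
    pending = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)    # "retrain" on everything labeled so far
        accepted, remaining = [], []
        for vec in pending:
            label, dist = min(((lab, math.dist(vec, c)) for lab, c in cents.items()),
                              key=lambda pair: pair[1])
            if dist <= max_dist:
                accepted.append((vec, label))
            else:
                remaining.append(vec)
        if not accepted:              # no new confident labels: stop
            break
        labeled.extend(accepted)
        pending = remaining
    return labeled[len(dictionary):], pending


if __name__ == "__main__":
    dictionary = [((0.0, 0.0), "A"), ((5.0, 5.0), "B")]
    unlabeled = [(0.5, 0.5), (0.9, 0.9), (4.6, 4.4), (9.0, 9.0)]
    new, rest = self_train(dictionary, unlabeled)
    print(new)   # [((0.5, 0.5), 'A'), ((4.6, 4.4), 'B'), ((0.9, 0.9), 'A')]
    print(rest)  # [(9.0, 9.0)]
```

In this toy run, (0.9, 0.9) is too far from the original "A" prototype in the first round and is only labeled in the second round, after the centroid shifts toward the newly accepted sample. That is the effect recursive retraining relies on; (9.0, 9.0) stays unlabeled because it never comes close to any class.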

ASAHmap: An Adaptive Chinese Handwritten Character Segmentation Algorithm for Large-Scale Ancient Handwritten Document Based on Histogram Projection and Gaussian Kernel Convolution Map

Ruiyang Song, Fuhao Guo, Yunchang Wang, Hongqi Han, Jishou Ruan, Cihan Ruan

2023 8th IEEE International Conference on Image, Vision and Computing (ICIVC 2023) (Submitted)

This paper presents a method for recognizing handwritten Chinese characters in ancient documents using optical character recognition (OCR). The method relies on a segmentation algorithm based on histogram projection and a Gaussian kernel convolution map, which accurately partitions document images into sub-images that each contain a single character. The algorithm was tested on a dataset of over one million Chinese handwritten characters and achieved, in the best case, a segmentation accuracy above 97.75% on four categories of ancient documents. The proposed method provides a lightweight preprocessing step for subsequent work on recognizing ancient Chinese handwritten documents with a neural network.
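Projection-based segmentation can be sketched in a few lines: count the ink pixels in each column, smooth that profile with a normalized Gaussian kernel, and cut wherever the smoothed profile drops below a threshold. This is a simplified 1-D illustration, not the paper's ASAHmap algorithm, which builds a Gaussian kernel convolution map; the `sigma` and relative-threshold defaults are assumptions. It segments along one axis of a binarized text line (1 = ink); for vertically written documents, apply it to the transposed image.

```python
import math


def gaussian_kernel(sigma: float) -> list[float]:
    """Normalized 1-D Gaussian weights covering +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile: list[float], kernel: list[float]) -> list[float]:
    """Convolve the profile with a symmetric kernel, clamping at the edges."""
    r = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += w * profile[j]
        out.append(acc)
    return out


def segment_columns(image: list[list[int]], sigma: float = 1.0,
                    rel_threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, one per character.

    image is a binarized text line with 1 = ink and 0 = background.
    """
    if not image or not image[0]:
        return []
    width = len(image[0])
    projection = [sum(row[c] for row in image) for c in range(width)]  # ink per column
    profile = smooth(projection, gaussian_kernel(sigma))
    threshold = rel_threshold * max(profile)
    spans, start = [], None
    for c, value in enumerate(profile):
        if value > threshold and start is None:
            start = c                         # entering a character
        elif value <= threshold and start is not None:
            spans.append((start, c))          # leaving a character (gap)
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


if __name__ == "__main__":
    # Two 3-column "characters" at columns 1-3 and 7-9 of a 12-column line.
    line = [[1 if c in (1, 2, 3, 7, 8, 9) else 0 for c in range(12)] for _ in range(3)]
    print(segment_columns(line))  # [(1, 4), (7, 10)]
```

Smoothing before thresholding keeps thin gaps inside a single character (for example, between separate radicals) from being mistaken for character boundaries, at the cost of slightly blurring span edges; the relative threshold compensates by cutting only where the profile falls well below its peak.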
