Thesis Details

Mining of soluble enzymes from genomic databases

Ph.D. Thesis
Student: Hon Jiří
Academic Year: 2021/2022
Supervisor: Zendulka Jaroslav, doc. Ing., CSc.
Czech title
Dolování rozpustných enzymů z genomických databází

Enzymes are proteins that accelerate chemical reactions, which makes them attractive targets for both pharmaceutical and industrial applications. Enzyme function is mediated by several essential amino acids that form the optimal chemical environment for catalysing the reaction. In this work, two integrated bioinformatics tools for mining and rational selection of novel soluble enzymes, EnzymeMiner and SoluProt, are presented.

EnzymeMiner takes one or more enzyme sequences as input, along with a description of their essential residues, and searches the protein database. The description of essential residues is used to increase the probability that the hits share a similar enzymatic function. The output of EnzymeMiner is a set of annotated database hits. EnzymeMiner integrates taxonomic, environmental, and protein domain annotations to facilitate the selection of promising targets for experiments. The main prioritization criterion is solubility, as predicted by the second tool presented, SoluProt.

SoluProt is a machine-learning method for predicting soluble protein expression in Escherichia coli. The input is a protein sequence and the output is the probability that the protein is soluble. SoluProt uses a gradient boosting machine to decide the predicted class. The tool was trained on the TargetTrack database. When evaluated against a balanced independent test set derived from the NESG database, SoluProt achieved an accuracy of 58.5% and an AUC of 0.62, slightly exceeding those of a suite of alternative solubility prediction tools.

Both EnzymeMiner and SoluProt are frequently used by the protein engineering community to find novel soluble biocatalysts for chemical reactions. Such biocatalysts have great potential to reduce the energy consumption and environmental burden of many industrial chemical processes.


Keywords
enzyme mining, protein solubility, protein engineering, machine learning

Degree Programme
22 March 2022
HON, Jiří. Mining of soluble enzymes from genomic databases. Brno, 2021. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2022-03-22. Supervised by Zendulka Jaroslav. Available from:
    author = "Ji\v{r}\'{i} Hon",
    type = "Ph.D. thesis",
    title = "Mining of soluble enzymes from genomic databases",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2022,
    location = "Brno, CZ",
    language = "english",
    url = ""