3D-QSARpy: Combining variable selection strategies and machine learning techniques to build QSAR models


  • Priscilla S. S. N. Silverio, Post-graduate Program in Bioinformatics, Bioinformatics Multidisciplinary Environment, Federal University of Rio Grande do Norte, Natal, RN, Brazil
  • Jéssika de Oliveira Viana, Post-graduate Program in Bioinformatics, Bioinformatics Multidisciplinary Environment, Federal University of Rio Grande do Norte, Natal, RN, Brazil, https://orcid.org/0000-0001-6027-1759
  • Euzébio B. Guimarães, Post-graduate Program in Bioinformatics, Bioinformatics Multidisciplinary Environment, Federal University of Rio Grande do Norte, Natal, RN, Brazil; Post-graduate Program in Pharmaceutical Sciences, Faculty of Pharmacy, Federal University of Rio Grande do Norte, UFRN, Natal, Brazil




Drug design; 3D-QSAR; Machine learning; Variable selection


Quantitative Structure-Activity Relationship (QSAR) is a computer-aided approach in medicinal chemistry that seeks to clarify the relationships between molecular structures and their biological activities. Such approaches can accelerate the development of new compounds and reduce drug design costs. This work presents 3D-QSARpy, a flexible, user-friendly and robust tool, freely available without registration, that supports the automated generation of 3D-QSAR models. The user only needs to provide aligned molecular structures and the respective dependent variable. The current version was developed in Python with packages such as scikit-learn and includes various machine learning techniques for regression. This diversity of techniques distinguishes the tool from established methodologies such as CoMFA and CoMSIA: it expands the search space of possible solutions and thereby increases the chances of obtaining relevant models. Variable selection (dimension reduction) approaches were also implemented in the tool. To evaluate its potential, results obtained with 3D-QSARpy were compared with those of previously published works. The results demonstrated the usefulness of 3D-QSARpy in the field, given the notable results it produced.




This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 with reference number 88882.375448/2019-01.






Original Article

How to Cite

3D-QSARpy: Combining variable selection strategies and machine learning techniques to build QSAR models. (2023). Brazilian Journal of Pharmaceutical Sciences, 59. https://doi.org/10.1590/s2175-97902023e22373
