Sunday, September 14, 2014

MolDifML: capturing and applying molecular similarity with respect to its molecular-pair difference correlating molecular structure and chemical property values

MolDifML is an XML-based standard, currently developed by Axeleratio, for the representation of differences between molecular structures and related properties.

The key concept for designing MolDifML is that a formally recognized and captured difference between two molecules can be associated with differences in respective physical and chemical property values. When considering a substructurally additive (atom or group additive) property, we expect pairs of molecules, assuming they exhibit the same structural difference, to share the same—or approximately the same—difference value for that property. Derivation of such difference values, based on the rapidly growing pool of experimental property data available today, is envisioned for in silico property estimation and virtual compound and materials design as well as informed intermolecular navigation and rational cross-validation of chemical property data.  
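The pair-difference idea above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a strictly additive property. The function names (`pair_difference`, `estimate_query_value`) and all numbers are made-up placeholders, not part of MolDifML and not experimental data.

```python
from statistics import mean

def pair_difference(value_a, value_b):
    """Property difference for a molecule pair A -> B."""
    return value_b - value_a

def estimate_query_value(candidate_value, reference_pairs):
    """Estimate a query compound's property value.

    Assumes the candidate -> query pair shows the same structural
    difference as every A -> B pair in reference_pairs, so the averaged
    property difference of those pairs is transferred to the candidate.
    """
    differences = [pair_difference(a, b) for a, b in reference_pairs]
    return candidate_value + mean(differences)

# Placeholder numbers, chosen only to make the arithmetic easy to follow.
reference_pairs = [(1.0, 1.5), (3.0, 3.5)]
query_estimate = estimate_query_value(2.0, reference_pairs)
print(query_estimate)
```

Averaging over several reference pairs is one simple choice for this sketch. A real workflow would probably also weight pairs by how similar they are to the query.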

Group additivity methods have been known for a diverse spectrum of physicochemical and environmental-risk-related properties for some time: a property value can be calculated or approximated by addition of group contributions (group increments) including all the groups (submolecular parts of a molecule) that constitute a molecule. Instead of addition, a more complex functional computation may apply to predict property values from contributions, including corrections for certain group arrangements and interactions. Group contributions are derived statistically—or via an artificial neural network approach—from a set of molecular compounds, for which experimental values of the property of interest have been published and compiled.
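As a rough sketch of the simple additive case described above, a property value can be computed as the count-weighted sum of group contributions plus an optional correction term. The group labels, contribution values and the function name `group_additive_estimate` are hypothetical placeholders, not taken from any published increment table.

```python
def group_additive_estimate(group_counts, contributions, correction=0.0):
    """Sum group contributions weighted by how often each group occurs.

    group_counts:  mapping of group label -> number of occurrences
    contributions: mapping of group label -> contribution (increment)
    correction:    optional term for group arrangements or interactions
    """
    total = sum(count * contributions[group]
                for group, count in group_counts.items())
    return total + correction

# Placeholder increments, for illustration only (not fitted values).
contributions = {"CH3": 0.5, "CH2": 0.25, "OH": -1.0}

# A hypothetical molecule made of one CH3, two CH2 and one OH group.
molecule = {"CH3": 1, "CH2": 2, "OH": 1}
estimate = group_additive_estimate(molecule, contributions)
print(estimate)
```

Under this additive model, exchanging one group for another changes the estimate by the difference of the two contributions, whatever the rest of the molecule looks like.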

In 1993, Axel Drefahl and Martin Reinhard explored the combination of group additivity and molecular similarity. They developed a linear notation to capture and represent molecular structure differences and applied their formal intermolecular group exchange method, the group interchange method (GIM) for short, to the systematic comparison and prediction of log Kow (the logarithm of the n-octanol/water partition coefficient) for organic compounds [1,2].

Axeleratio is now initiating MolDifML as a standard to store and exchange information on GIM-related chemical compounds and their properties. The present MolDifML Blog has been created to discuss the MolDifML implementation and to receive critical comments and constructive suggestions.

The primary purpose of MolDifML is to present and communicate property information for virtual (not yet synthesized) and data-deficient compounds. A compound- and property-specific MolDifML file will describe a compound of interest, called the query compound, along with a collection of molecularly similar compounds, called candidate compounds. A MolDifML file will encode experimental property data for the candidates and estimated property values for the query, including method and property reference information.
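To make the file layout described above more concrete, here is a small Python sketch that builds an XML document with one query compound and several candidates. It is not the MolDifML schema: every element and attribute name (`moldifml-sketch`, `query`, `candidate`, `estimated-value`, `experimental-value`), as well as the compound names, values and reference labels, is a hypothetical placeholder invented for this illustration.

```python
import xml.etree.ElementTree as ET

def build_sketch(query_name, estimated_value, candidates, prop="logKow"):
    """Build an illustrative XML tree for one query and its candidates.

    All tag and attribute names are hypothetical, not MolDifML vocabulary.
    candidates: iterable of (name, experimental_value, reference) tuples.
    """
    root = ET.Element("moldifml-sketch", {"property": prop})

    # Query compound: carries an estimated value and the estimation method.
    query = ET.SubElement(root, "query", {"name": query_name})
    est = ET.SubElement(query, "estimated-value", {"method": "GIM"})
    est.text = str(estimated_value)

    # Candidate compounds: carry experimental values with their references.
    for name, value, reference in candidates:
        cand = ET.SubElement(root, "candidate", {"name": name})
        exp = ET.SubElement(cand, "experimental-value",
                            {"reference": reference})
        exp.text = str(value)
    return root

# Placeholder content only; none of these values are real data.
root = build_sketch(
    "query-compound",
    2.5,
    [("candidate-1", 2.0, "placeholder-ref-1"),
     ("candidate-2", 3.0, "placeholder-ref-2")],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The sketch keeps estimated and experimental values in differently named elements so that a reader of the file cannot confuse the two kinds of data. The actual standard may handle this distinction differently.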

Appropriately generated and maintained MolDifML files are expected not only to enhance chemical search and fill property data gaps, but also to provide a robust network of unambiguously interrelated molecular structures for property-based screening and molecular design.

Keywords: cheminformatics, virtual chemistry, chemical property estimation, molecular difference, XML.

References and more to explore
[1] A. Drefahl and M. Reinhard: Similarity-based Search and Evaluation of Environmentally Relevant Properties for Organic Compounds in Combination with the Group Contribution Approach. J. Chem. Inf. Comput. Sci. 1993, 33, pp. 886-895. DOI: 10.1021/ci00016a011.
[2] Similarity-Based and Group Interchange Models, pp. 16-21, in M. Reinhard and A. Drefahl: Handbook for Estimating Physicochemical Properties of Organic Compounds. John Wiley & Sons, Inc., New York, 1999.
