CurlySMILES, a chemical language

Axel Drefahl · www.axeleratio.com
CurlySMILES is a chemical language for the formulation of linear notations that specify materials, chemical compounds, and complex architectures by compositional and molecular structure encoding. CurlySMILES is a versatile language addressing various applications:
  • Documentation and search of chemical information
  • Formulation of precise, yet grainable chemical queries
  • Computation of material descriptors and molecular descriptors
  • Composition- and structure-based property estimation
  • Measurement of material similarity and molecular similarity
  • Rational material design and molecular design
  • Supervised generation of virtual combinatorial libraries

CurlySMILES notation. A CurlySMILES notation is a string with dot-separated subnotations. For example, cobalt(II) nitrate hexahydrate, Co(NO3)2·6H2O, can be entered as:

[Co+2].[O-]N(=O)=O{2}.O{6}

Here, multipliers 2 and 6, enclosed in curly braces at the end of each species, are applied to account for the number of nitrate anions and water molecules, respectively. The corresponding notation based on the original SMILES language is:

[Co+2].[O-]N(=O)=O.[O-]N(=O)=O.O.O.O.O.O.O
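
The relation between the two notations above can be illustrated with a short Python sketch. It is not part of any CurlySMILES software; the function name expand_multipliers is hypothetical. The sketch assumes that a multiplier is always a trailing {n} with a positive integer n, and that no dot occurs inside curly braces. Other annotation types are left untouched.

```python
import re

# Assumption: a multiplier is a trailing annotation of the form {n},
# where n is a positive integer, at the end of a dot-separated subnotation.
_MULTIPLIER = re.compile(r"^(?P<body>.*)\{(?P<count>\d+)\}$")


def expand_multipliers(notation: str) -> str:
    """Repeat each subnotation as often as its trailing multiplier says.

    Hypothetical helper for illustration only. Splitting on "." is a
    simplification that assumes no dot appears inside an annotation.
    """
    expanded = []
    for subnotation in notation.split("."):
        match = _MULTIPLIER.match(subnotation)
        if match:
            # Drop the multiplier and repeat the bare species.
            expanded.extend([match.group("body")] * int(match.group("count")))
        else:
            # No multiplier: keep the subnotation unchanged.
            expanded.append(subnotation)
    return ".".join(expanded)


print(expand_multipliers("[Co+2].[O-]N(=O)=O{2}.O{6}"))
# prints: [Co+2].[O-]N(=O)=O.[O-]N(=O)=O.O.O.O.O.O.O
```

Applied to the cobalt(II) nitrate hexahydrate example, the sketch reproduces the nine dot-separated species of the original SMILES form.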

The CurlySMILES language introduces further formats that provide encoding shortcuts, mainly aliases for frequently occurring notations of cations, anions, and other chemical species.

A string with exactly one subnotation is called a unary CurlySMILES notation. For example, the aromatic tropylium cation, C7H7+, is encoded in CurlySMILES as

c1cccccc1{!re=+}

This example demonstrates the key approach of the CurlySMILES grammar: an annotation enclosed in curly braces. Here, the annotation consists of the ring marker !r, indicating that the following entry, e=+, which specifies a formal charge, applies to the entire ring. The annotation is formally anchored to the last atomic node in the notation. In general, an annotation can either be anchored at an atomic node or attached to a subnotation, to encode details of the respective atom, the structural environment of that atom, or the whole molecule.
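
As a rough illustration of how such annotations might be separated from the underlying SMILES string, the following Python sketch strips every curly-brace group and records where it was anchored. The function names split_annotations and describe are hypothetical, and the sketch covers only what is described here: braces are assumed not to nest, and the only marker recognized is the ring marker !r. The full annotation grammar is richer than this.

```python
from typing import Dict, List, Tuple, Union


def split_annotations(notation: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Separate a unary notation into plain SMILES text and annotations.

    Hypothetical helper. Each annotation is returned with the length of
    the SMILES text that precedes it, i.e. the position it is anchored at.
    Assumes curly braces are never nested.
    """
    smiles_chars: List[str] = []
    annotations: List[Tuple[int, str]] = []
    i = 0
    while i < len(notation):
        if notation[i] == "{":
            end = notation.find("}", i)
            if end == -1:
                raise ValueError("annotation is missing its closing brace")
            annotations.append((len(smiles_chars), notation[i + 1:end]))
            i = end + 1
        else:
            smiles_chars.append(notation[i])
            i += 1
    return "".join(smiles_chars), annotations


def describe(annotation: str) -> Dict[str, Union[bool, str]]:
    """Report whether an annotation starts with the ring marker !r."""
    has_ring_marker = annotation.startswith("!r")
    entry = annotation[2:] if has_ring_marker else annotation
    return {"ring_marker": has_ring_marker, "entry": entry}


smiles, found = split_annotations("c1cccccc1{!re=+}")
print(smiles)                # prints: c1cccccc1
print(found)                 # prints: [(9, '!re=+')]
print(describe(found[0][1])) # prints: {'ring_marker': True, 'entry': 'e=+'}
```

For the tropylium example, the annotation follows the ninth character of the SMILES text, which is the ring-closure digit after the last aromatic carbon, and the entry e=+ is reported separately from the ring marker.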

The CurlySMILES language includes a rich annotation grammar (see CurlySMILES: annotated SMILES notations) that covers a diverse set of structural, substructural, and extrastructural aspects of a molecule. Further, the annotation format is open to customized code, provided that it still adheres to the basic syntax of CurlySMILES.

Reference

A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1. doi: 10.1186/1758-2946-3-1

Please email comments and suggestions to axeleratio@yahoo.com.