CurlySMILES [1] provides a shortcut formalism for encoding a molecule that carries multiple occurrences of the same substituent. The approach is to encode the unsubstituted molecule and attach an annotation that encodes the substituent and the positions in the molecule at which substitution occurs. This approach is illustrated here for sym-pentasubstituted corannulenes. The page within the frame below presents a molecular sketch for sym-pentasubstituted corannulenes, and the associated publication [2] provides details on the synthesis and properties of diverse derivatives.

We start by comparing the SMILES notation with the corresponding CurlySMILES notation for sym-pentachlorocorannulene:
SMILES notation

CurlySMILES notation
In the CurlySMILES notation, the OPAM annotation at position 13 encodes the substituent, Cl{-}, and specifies positions 1, 4, 7 and 10 as the sites at which substitution takes place in addition to position 13. Since the substituent contains just one atom, the CurlySMILES notation is longer than the SMILES notation. This changes rapidly with increasing substituent size, for example with the substituent trimethylsilylethynyl:
(compound 11 in [2])
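The length argument can be made concrete with a rough sketch. It does not parse or generate CurlySMILES; it only models string length under stated assumptions: a plain SMILES string repeats the substituent as a parenthesized branch at each of the five sites, while an annotated string writes it once plus a fixed overhead for annotation delimiters and the position list. The function names and the values of `PARENT_LEN` and `OVERHEAD` are hypothetical and chosen only for illustration.

```python
def explicit_length(parent_len: int, substituent: str, n_sites: int) -> int:
    """Length when the substituent is spelled out as a branch at every site."""
    # Each site adds "(" + substituent + ")" to the parent string (approximation).
    return parent_len + n_sites * (len(substituent) + 2)


def annotated_length(parent_len: int, substituent: str, overhead: int) -> int:
    """Length when the substituent is spelled out once inside one annotation."""
    return parent_len + len(substituent) + overhead


PARENT_LEN = 40  # assumed length of the parent encoding (placeholder value)
OVERHEAD = 20    # assumed cost of annotation delimiters plus position list
N_SITES = 5      # sym-pentasubstitution

substituents = [
    ("chloro", "Cl"),
    ("trimethylsilylethynyl", "C#C[Si](C)(C)C"),
]

for name, sub in substituents:
    explicit = explicit_length(PARENT_LEN, sub, N_SITES)
    annotated = annotated_length(PARENT_LEN, sub, OVERHEAD)
    print(f"{name}: explicit={explicit}, annotated={annotated}")
```

With these assumed numbers the annotated form comes out slightly longer for the one-atom chloro substituent and clearly shorter for trimethylsilylethynyl, because the explicit form pays for the substituent five times and the annotated form only once.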
The annotation-based notation not only shortens the string but also clearly separates parent and substituent structure, which significantly simplifies algorithms that perform queries involving parent/derivative screening and filtering.
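As a minimal sketch of why this separation helps screening, assume each derivative has already been split into its parent encoding, its substituent encoding, and its substitution positions. The `AnnotatedDerivative` record and the `derivatives_of` helper are hypothetical names for illustration, not part of any CurlySMILES library, and the parent labels are placeholders rather than real encodings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AnnotatedDerivative:
    parent: str                 # encoding of the unsubstituted molecule (placeholder label here)
    substituent: str            # encoding of the repeated substituent
    positions: Tuple[int, ...]  # positions in the parent where substitution occurs


def derivatives_of(records: List[AnnotatedDerivative], parent: str) -> List[AnnotatedDerivative]:
    """Return all records that share the given parent encoding."""
    # Parent and substituent are stored separately, so screening by parent
    # is a plain string comparison with no substructure matching.
    return [r for r in records if r.parent == parent]


records = [
    AnnotatedDerivative("<corannulene>", "Cl", (1, 4, 7, 10, 13)),
    AnnotatedDerivative("<corannulene>", "C#C[Si](C)(C)C", (1, 4, 7, 10, 13)),
    AnnotatedDerivative("<other-parent>", "Cl", (1,)),
]

hits = derivatives_of(records, "<corannulene>")
print([r.substituent for r in hits])
```

Filtering by substituent instead of parent works the same way on the `substituent` field, whereas with plain SMILES strings both queries would require a substructure search.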


[1] A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1; doi: 10.1186/1758-2946-3-1.
[2] G. H. Grube, E. L. Elliot, R. J. Steffens, C. S. Jones, K. K. Baldridge and J. S. Siegel: Synthesis and Properties of sym-Pentasubstituted Derivatives of Corannulene. Org. Lett. 2003, 5 (5); doi: 10.1021/ol027565f.
