CurlySMILES notations can be recursive: an annotation may contain a dictionary key whose value is itself one or more CurlySMILES notations. For example, the keys slv, slt, cos and cot, which provide solvent, solute, cosolvent and cosolute information, respectively, expect a list of notations that encode the corresponding structures. Such a list is written in the conjunct CurlySMILES notation (ConjCN) format. Three list types are possible: list of combinatorial options, list of alternate selections and list of implied composition, which connect SMILES or CurlySMILES notations by “and/or”, “or” and “and”, respectively.

List of combinatorial options. This is a list of comma-connected SMILES or CurlySMILES notations. The list represents chemical systems that contain the listed compounds individually or in any possible combination. For example,

OCCCCC,C(O)CCCC,CCC(O)CC

represents 1-, 2-, and 3-pentanol as well as the binary and ternary combinations thereof.
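The expansion described above can be sketched in a few lines of Python. This is only an illustration, not part of any CurlySMILES library: the function name expand_combinatorial is hypothetical, and the sketch assumes plain SMILES entries with no commas inside annotations.

```python
from itertools import combinations


def expand_combinatorial(conjcn: str) -> list[tuple[str, ...]]:
    """Return every non-empty combination of the comma-separated entries."""
    # Assumption: entries are plain SMILES, so a naive split on "," is safe.
    entries = conjcn.split(",")
    systems = []
    for size in range(1, len(entries) + 1):
        # combinations() keeps the original entry order within each tuple.
        systems.extend(combinations(entries, size))
    return systems


systems = expand_combinatorial("OCCCCC,C(O)CCCC,CCC(O)CC")
for system in systems:
    print(" + ".join(system))
```

For the three pentanols this yields seven systems: three single compounds, three binary combinations and one ternary combination.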

List of alternate selections. This is a list of SMILES or CurlySMILES notations connected by vertical bar characters. For example,

OCCCCC|C(O)CCCC|CCC(O)CC

represents the pool containing 1-, 2-, and 3-pentanol. Within annotation context, this reads as “either with 1-pentanol or with 2-pentanol or with 3-pentanol.”

List of implied composition. This is a list of SMILES or CurlySMILES notations connected by dots. In fact, it is a single notation (already defined in the original SMILES language) consisting of subnotations for chemical species. No formal bond exists between the two atoms denoted on either side of a dot, but the species may nevertheless interact at the molecular level. For example,

OCCCCC.C(O)CCCC.CCC(O)CC

represents the ternary system 1-pentanol + 2-pentanol + 3-pentanol.
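To make the three list types concrete, the following Python sketch splits a ConjCN string on its connecting character and reports which logic it encodes. The helper names split_top_level and classify_conjcn are hypothetical. The sketch assumes that a list uses only one kind of list separator at the top level, and that separators inside curly braces or square brackets belong to an annotation or atom and must be ignored. A string without a comma or bar is treated as a single dot-connected notation.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split text on sep, skipping separators inside {...} or [...]."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def classify_conjcn(text: str) -> tuple[str, list[str]]:
    """Return the connecting logic ("and/or", "or", "and") and the parts."""
    # Assumption: comma and bar are never mixed at the top level.
    for sep, kind in ((",", "and/or"), ("|", "or")):
        parts = split_top_level(text, sep)
        if len(parts) > 1:
            return kind, parts
    # No list separator found: a single notation, possibly dot-connected.
    return "and", split_top_level(text, ".")


print(classify_conjcn("OCCCCC,C(O)CCCC,CCC(O)CC"))
print(classify_conjcn("OCCCCC|C(O)CCCC|CCC(O)CC"))
print(classify_conjcn("OCCCCC.C(O)CCCC.CCC(O)CC"))
```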

Context targeting example: The ConjCN format allows compact and contextual targeting of topics, concepts and data snippets in the chemical literature. Consider the inorganic salt KBr dissolved in aqueous solutions that contain either cationic tetradecyltrimethylammonium bromide (TTABr) or anionic sodium dodecyl sulfate (SDS), a system studied, for example, by Zhou and Hao (doi: 10.1021/je100905g). The following notation addresses it:

[K+].[Br-]{aqcos=\
CCCCCCCCCCCCCC[N+](C)(C)(C).[Br-]|[Na+].CCCCCCCCCCCCOS(=O)(=O)[O-]}


The SMILES notation of KBr is annotated with the marker aq to indicate that KBr occurs in aqueous solution. The two cosolvents of interest (in this case, surfactants) are formally supplied via the key cos as a ConjCN, here a pair of alternate surfactants.
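As a rough illustration, the notation above can be taken apart with plain string operations in Python. This sketch is not a CurlySMILES parser: the function parse_annotated is hypothetical, it handles only a single trailing annotation with one key, and it assumes that the marker is the first two characters inside the braces and that the backslash in the displayed notation merely continues the line.

```python
def parse_annotated(notation: str) -> dict:
    """Split 'SMILES{MMkey=a|b}' into base, marker, key and alternates."""
    if "{" not in notation or not notation.endswith("}"):
        raise ValueError("expected a single trailing {...} annotation")
    open_idx = notation.index("{")
    base = notation[:open_idx]
    body = notation[open_idx + 1:-1]
    # Assumption: a two-character marker precedes the key=value entry.
    marker, entry = body[:2], body[2:]
    key, _, value = entry.partition("=")
    return {
        "base": base,
        "marker": marker,
        "key": key,
        # Alternate selections are connected by vertical bars.
        "alternates": value.split("|"),
    }


# The two displayed lines joined at the backslash (assumed continuation).
notation = (
    "[K+].[Br-]{aqcos="
    "CCCCCCCCCCCCCC[N+](C)(C)(C).[Br-]|[Na+].CCCCCCCCCCCCOS(=O)(=O)[O-]}"
)
parsed = parse_annotated(notation)
print(parsed["base"], parsed["marker"], parsed["key"])
for surfactant in parsed["alternates"]:
    print(surfactant)
```

Each alternate is itself a dot-connected notation (cation plus anion), which is why the split is made on the vertical bar only.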

CurlySMILES Reference

A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1. doi: 10.1186/1758-2946-3-1

