The CurlySMILES language features predefined and user-defined keys in annotation dictionaries. User-defined keys start with a $ sign, followed by a user-chosen name. The predefined keys are listed and explained below. Note that a predefined (expected) value can also be overridden by starting the value string with a $ character whenever a different value is preferred. The Python CurlySMILES parser accepts customized keys and values and includes them in the data objects for a notation, but does not interpret them further.
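
The $ convention described above can be sketched in a few lines of Python. This is only an illustration, not the code of the Python CurlySMILES parser: the function name classify_pair and the labels "predefined", "custom", "unknown" and "standard" are hypothetical, and the key set is copied from the table below.

```python
# Predefined keys, copied from the key table of this page.
PREDEFINED_KEYS = frozenset(
    """a b c e f i j n p r aa cc id all axc box bra cha chc chn cos cot
    cpq cps csy ctr dpr emr enz esa exc ful hel ila ilu inc isp mac man
    min mps nuc par pdi pep pha phn plm pro psy pMm pMn pMp pMv pMz rcg
    sfd sfn slt slv spg srf tmp trd""".split()
)


def classify_pair(key, value):
    """Return (key_kind, value_kind) for one key/value pair.

    Hypothetical helper: a leading '$' marks a user-defined key or a
    user-defined value that replaces the predefined (expected) one.
    """
    if key.startswith("$"):
        key_kind = "custom"
    elif key in PREDEFINED_KEYS:
        key_kind = "predefined"
    else:
        key_kind = "unknown"
    value_kind = "custom" if value.startswith("$") else "standard"
    return key_kind, value_kind
```

For example, classify_pair("csy", "$quasi") reports a predefined key carrying a custom value, whereas classify_pair("$batch", "42") reports a custom key.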

Key (ki) Meaning and description of associated values (vi)
a atomic symbol
b bond specification such as da and dd for a dative bond at an accepting and donating atom, respectively.
c CurlySMILES notation
e formal electric charge: ..., -3, -2, -, +, +2, +3, ... (Note that this is compatible with the SMILES format but differs from the IUPAC standard, according to which the charge sign follows the number)
f fraction: two integers separated by forward slash
i pointer to atomic nodes (see for details)
j pointer to atomic nodes at next higher level (see, for example, the notation of the bidentate μ-H ligand in a heterocyclic complex)
n number or number range to specify size or length of a structural unit such as the number of C atoms in an alkyl group or the number of SRUs in an oligomer
p pointer to atom positions as substitution sites within component (format as for i, but excluding # appendix)
r ring index in an {!r} annotation anchored at an ANC with more than one ring digit; the value of r is the digit of the ring to which the annotation applies
aa atomic symbols as comma-separated list
cc CurlySMILES notations as comma-separated list
id identifier (may contain letters, digits and round braces) associated with the annotated subject
all name of an allotropic modification; can occur in annotations of chemical element encodings; alternatively or in addition, key psy can be used
axc stereodescriptor of axial chirality: R and S (for Ra and Sa) or P and M (see http://goldbook.iupac.org/A00547.html )
box boxdyl, representing a structural part collapsed into a metaterm (see ARX201/growth-hormone example)
bra 0: linear (unbranched) groups, 1: both branched and linear groups (default setting), 2: branched-only groups
cha chemical abbreviation (or acronym) for a chemical name associated with a structure, species or compound (see DMSO example)
chc chemical code for a chemical (see SCYX-7158 example)
chn chemical name associated with a structure, species or compound (see tiglic_acid example)
cos cosolvent(s) encoded in ConjCN format (within aq, dp and ds annotations)
cot cosolute(s) encoded in ConjCN format (within aq, dp and ds annotations)
cpq copolymer qualifier: a for alternating, b for block, c for co (generic), g for graft, p for periodic, r for random and s for statistical
cps coordination geometry polyhedral symbol: for example TBPY-5 for trigonal bipyramid of a mononuclear complex with coordination number 5 (see Table IR-9-2 on page 176 in Nomenclature of Inorganic Chemistry, RSC Publishing, IUPAC Recommendation 2005)
csy crystal system descriptor: a (triclinic), c (cubic), h (hexagonal), m (monoclinic), o (orthorhombic), r (rhombohedral, trigonal), t (tetragonal)
ctr stereochemical description based on the extended cis/trans formalism: c, r and t for cis-, reference-, and trans-substituent, respectively
dpr degree-of-polymerization range: an integer number range or an integer following “gt” (for example, gt250 when the degree of polymerization is greater than 250)
emr end member and range values to specify compositional series
enz short name for an enzyme
esa stereochemical description of cyclic systems with stereogenic centers using the endo/exo, syn/anti formalism: a for anti, n for endo, s for syn, and x for exo
exc position integers for atomic nodes that are excluded, for example, from a structural repeat unit (see homopolymers)
ful fullerene notation: C followed by a stoichiometric integer, optionally followed by a hyphen and a point group symbol
hel chiral helicity: P for plus and M for minus (see http://goldbook.iupac.org/H02763.html )
ila isotopical label notations: specific isotopes, sets and isotope-based compositions
ilu index range(s) to specify the node set of a ladder unit in polymer notation
inc position integers for atomic nodes that are included, for example, into a structural repeat unit (see homopolymers)
isp isomer due to spin of proton(s): ortho or para
mac material class name
man material name
min mineral name
mps multiphase system, represented as a sequence of slash-separated ConjCN notations
nuc nuclide specification of virtual atoms for which a two-letter atomic symbol does not (yet) exist: value is in nuclide encoding format based on either the atomic number or the temporary three-letter atomic symbol.
par partitioning system: comma-separated list of CurlySMILES notations of the solvent phases between which an annotated species is distributed
pdi polydispersity index
pep peptide notation based on three- or one-letter codes for amino acids
pha phase specification, in a liquid crystal (lc) annotation: nem, dis, smA, smB, and smC for nematic, discotic, smectic A, smectic B, and smectic C
phn phase name specifying polymorphs, used, for example, in {*TiO2}{crphn=rutile}, {*TiO2}{crphn=anatase}, and {*TiO2}{crphn=brookite} to encode the titanium dioxide polymorphs rutile, anatase and brookite; alternatively or in addition, key psy can be used
plm short name for a polymer product
pro short name for a protein
psy Pearson symbol for phase specification, a three-character notation: first, a lower-case letter (a, c, h, m, o or t) designating the crystal system; second, a capital letter (F, I, P, R or S) designating the lattice setting; third, a number designating the number of atoms or ions in the unit cell
pMm mass-average molar mass of polymer
pMn number-average molar mass of polymer
pMp peak molar mass of polymer
pMv viscosity-average molar mass of polymer
pMz z-average molar mass of polymer
rcg relative coordination geometry: cis, trans, mer, and fac (applies to square planar and octahedral mononuclear complexes with only two kinds of ligands, i.e. donor atoms)
sfd surface description: Miller indices or other surface notation
sfn stoichiometric formula notation
slt solute(s) encoded in ConjCN format (for example, within lq, sd and IM annotations)
slv solvent(s) encoded in ConjCN format (within dp and ds annotations)
spg space group symbol (Hermann-Mauguin notation): 230 space group notations
srf surface notation: stoichiometric formula notation followed by a crystallographic plane specification (Miller indices) enclosed in parentheses
tmp template specification; example: [Au]{nltmp=ss-DNA} for gold nanocluster (nl) templated by single-stranded DNA (ss-DNA)
trd trade name
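
Several of the value formats in the table above are simple enough to check with regular expressions. The following Python sketch covers only four keys (f, csy, hel and psy) and follows the descriptions given in the table; the names VALUE_PATTERNS and is_wellformed are hypothetical and are not part of the CurlySMILES parser. It assumes that the number in a Pearson symbol may have more than one digit and that any value starting with $ is a user-defined override that should not be checked.

```python
import re

# Patterns transcribed from the table descriptions (assumption: only
# these four keys are checked; every other key is accepted as-is).
VALUE_PATTERNS = {
    "f": re.compile(r"[0-9]+/[0-9]+"),             # two integers, slash-separated
    "csy": re.compile(r"[achmort]"),               # crystal system descriptor
    "hel": re.compile(r"[PM]"),                    # chiral helicity
    "psy": re.compile(r"[achmot][FIPRS][0-9]+"),   # Pearson symbol
}


def is_wellformed(key, value):
    """Return True if value matches the format sketched for key."""
    if value.startswith("$"):
        return True  # user-defined value, not interpreted
    pattern = VALUE_PATTERNS.get(key)
    if pattern is None:
        return True  # no check defined for this key in the sketch
    return pattern.fullmatch(value) is not None
```

For instance, is_wellformed("psy", "tP6") is accepted, while is_wellformed("psy", "xP6") is rejected because x is not one of the listed crystal-system letters.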

Reference

A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1; doi: 10.1186/1758-2946-3-1.

Format of an annotation:
{AMk1=v1;k2=v2;...;kn=vn}
where
AM is an annotation marker,
and
ki=vi is a key/value pair.
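
The format above can be split into its parts with a short Python function. This is a minimal sketch, not the parser of the CurlySMILES package: the name parse_annotation and the marker_len parameter are hypothetical, and the sketch assumes a two-character annotation marker, as in the examples {crphn=rutile} and {nltmp=ss-DNA} on this page. Each pair is split at its first = sign only, because a value may itself be a SMILES-like notation containing = as a double bond; values that contain a semicolon or nested curly braces are not handled.

```python
def parse_annotation(text, marker_len=2):
    """Split '{AMk1=v1;k2=v2;...}' into (marker, {key: value}).

    Assumption: the annotation marker AM has exactly marker_len
    characters (two by default).
    """
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("annotation must be enclosed in curly braces")
    body = text[1:-1]
    marker, rest = body[:marker_len], body[marker_len:]
    pairs = {}
    for item in rest.split(";"):
        if not item:
            continue  # marker-only annotation or trailing semicolon
        key, sep, value = item.partition("=")  # split at first '=' only
        if not sep or not key:
            raise ValueError("not a key=value pair: " + repr(item))
        pairs[key] = value
    return marker, pairs
```

Applied to {crphn=rutile}, the function yields the marker cr and a dictionary with the single key phn; an annotation consisting of a marker only yields an empty dictionary.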

