The CurlySMILES language features predefined and user-defined
keys in annotation dictionaries.
The latter start with a $ sign, followed
by a user-chosen name. The predefined keys are listed and explained
below. Note that the predefined (expected) values can also
be overridden by starting the value string with a
$ character whenever a different value is
preferred. Custom keys and values are accepted by the
Python CurlySMILES parser and included in the data objects for a notation,
but are not interpreted further.
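The distinction between predefined and $-prefixed entries can be illustrated with a minimal sketch. It is not part of the CurlySMILES package: the names PREDEFINED_KEYS and classify_entry are hypothetical, the key set is simply copied from the table in this section, and the "unknown" category is an assumption about how one might flag keys that are neither predefined nor $-prefixed.

```python
# Hypothetical helper for illustration only; not the CurlySMILES parser API.

# Predefined keys, copied from the key table in this section.
PREDEFINED_KEYS = frozenset("""
    a b c e f i j n p r aa cc id all axc box bra cha chc chn cos cot cpq cps
    csy ctr dpr emr enz esa exc ful hel ila ilu inc isp mac man min mps nuc
    par pdi pep pha phn plm pro psy pMm pMn pMp pMv pMz rcg sfd sfn slt slv
    spg srf tmp trd
""".split())


def classify_entry(key, value):
    """Return (key_kind, value_kind) for one key/value pair of an annotation.

    key_kind is "custom" for keys starting with "$", "predefined" for keys
    listed in the table, and "unknown" otherwise (an assumption of this sketch).
    value_kind is "custom" for values starting with "$", else "expected".
    """
    if key.startswith("$"):
        key_kind = "custom"
    elif key in PREDEFINED_KEYS:
        key_kind = "predefined"
    else:
        key_kind = "unknown"
    value_kind = "custom" if value.startswith("$") else "expected"
    return key_kind, value_kind
```

For example, classify_entry("csy", "$quasi") reports a predefined key carrying a custom value, while classify_entry("$lab", "x") reports a custom key.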
| Key (ki) |
Meaning and description of associated values
(vi)
|
| a |
atomic symbol |
| b |
bond specification such as da
and dd for a dative bond at an
accepting and donating atom, respectively.
|
| c |
CurlySMILES notation |
| e |
formal electric charge: ..., -3, -2, -, +, +2, +3, ...
(Note the compatibility with the SMILES format, in contrast to the IUPAC
standard, according to which the charge sign follows the number)
|
| f |
fraction: two integers separated by forward slash |
| i |
pointer to atomic nodes (see for
details)
|
| j |
pointer to atomic nodes at next higher
level (see, for example, the notation of a bidentate
μ-H ligand in a
heterocyclic complex)
|
| n |
number or number range to specify size or length of a structural unit
such as the number of C atoms in an alkyl group or
the number of SRUs in an oligomer |
| p |
pointer to atom positions as substitution sites within component
(format as for i, but excluding
# appendix) |
| r |
ring index in an {!r} annotation
anchored at an ANC with more than one ring digit;
the value of r is the digit
of that ring to which the annotation applies |
| aa |
atomic symbols as comma-separated list |
| cc |
CurlySMILES notations as comma-separated list |
| id |
identifier (may contain letters, digits and parentheses)
associated with the annotated subject |
| all |
name of an allotropic modification; can occur in annotations
of chemical element encodings. Alternatively or in addition, the key
psy can be used |
| axc |
stereodescriptor of axial chirality:
R and S
(for Ra and Sa)
or
P and M
(see
http://goldbook.iupac.org/A00547.html )
|
| box |
boxdyl,
representing a structural part collapsed into a metaterm (see
ARX201/growth-hormone example)
|
| bra |
0: linear (unbranched) groups,
1: both branched and linear groups
(default setting),
2: branched-only groups |
| cha |
chemical abbreviation (or acronym) for a
chemical name associated with a structure, species or compound
(see DMSO example)
|
| chc |
chemical code for a chemical (see
SCYX-7158 example) |
| chn |
chemical name associated with a structure,
species or compound (see
tiglic_acid example)
|
| cos |
cosolvent(s) encoded in
ConjCN format
(within aq, dp and
ds annotations)
|
| cot |
cosolute(s) encoded in
ConjCN format
(within aq, dp and
ds annotations)
|
| cpq |
copolymer qualifier:
a for alternating, b for block, c for co (generic), g for graft,
p for periodic, r for random and s for statistical |
| cps |
coordination geometry polyhedral symbol: for example
TBPY-5 for trigonal bipyramid of a
mononuclear complex with coordination number 5
(see Table IR-9-2 on page 176 in Nomenclature of Inorganic Chemistry, IUPAC Recommendations 2005, RSC Publishing) |
| csy |
crystal system descriptor:
a (triclinic),
c (cubic),
h (hexagonal),
m (monoclinic),
o (orthorhombic),
r (rhombohedral, trigonal),
t (tetragonal)
|
| ctr |
stereochemical description based on the extended
cis/trans formalism:
c, r and
t for cis-, reference-, and
trans-substituent, respectively
|
| dpr |
degree-of-polymerization range:
an integer number range or an integer following “gt”
(for example, gt250 when the degree of polymerization is greater than 250) |
| emr |
end member and range values
to specify compositional series |
| enz |
short name for an enzyme |
| esa |
stereochemical description of cyclic systems with stereogenic
centers using the endo/exo, syn/anti formalism:
a for anti,
n for endo,
s for syn, and
x for exo
|
| exc |
position integers for atomic nodes that are excluded, for example,
from a structural repeat unit (see
homopolymers)
|
| ful |
fullerene notation: C followed by a
stoichiometric integer, optionally followed by a hyphen and a
point group symbol |
| hel |
chiral helicity: P for plus and
M for minus (see
http://goldbook.iupac.org/H02763.html ) |
| ila |
isotopic label notations: specific
isotopes, sets and isotope-based compositions
|
| ilu |
index range(s) to specify the node set of a
ladder unit in polymer notation
|
| inc |
position integers for atomic nodes that are included, for example,
in a structural repeat unit (see
homopolymers)
|
| isp |
isomer due to spin of proton(s): ortho
or para |
| mac |
material class name |
| man |
material name |
| min |
mineral name |
| mps |
multiphase system, represented as
a sequence of
slash-separated ConjCN notations |
| nuc |
nuclide specification of virtual atoms for which
a two-letter atomic symbol does not (yet) exist: value is in
nuclide encoding format based on either
the atomic number or the temporary
three-letter atomic symbol.
|
| par |
partitioning system: comma-separated list of
CurlySMILES notations of the solvent phases between
which an annotated species is distributed |
| pdi |
polydispersity index |
| pep |
peptide notation based on three- or one-letter codes for amino acids |
| pha |
phase specification, in a liquid crystal (lc) annotation:
nem, dis,
smA, smB,
and smC for nematic, discotic,
smectic A, smectic B, and smectic C |
| phn |
phase name specifying polymorphs, used, for example, in
{*TiO2}{crphn=rutile},
{*TiO2}{crphn=anatase}, and
{*TiO2}{crphn=brookite} to
encode the titanium dioxide polymorphs rutile, anatase and brookite;
alternatively or in addition, the key
psy can be used
|
| plm |
short name for a polymer product |
| pro |
short name for a protein |
| psy |
Pearson symbol for phase specification, a three-character
notation: first, a lower-case letter (a, c, h, m, o or t)
designating the crystal system; second, a capital letter
(F, I, P, R or S) designating the lattice
setting; third, a number designating the number of atoms
or ions in the unit cell
|
| pMm |
mass-average molar mass of polymer |
| pMn |
number-average molar mass of polymer |
| pMp |
peak molar mass of polymer |
| pMv |
viscosity-average molar mass of polymer |
| pMz |
z-average molar mass of polymer |
| rcg |
relative coordination geometry:
cis, trans,
mer, and fac
(applies to square planar and octahedral mononuclear complexes
with only two kinds of ligands, i.e. donor atoms)
|
| sfd |
surface description: Miller indices or other surface notation |
| sfn |
stoichiometric formula notation |
| slt |
solute(s) encoded in
ConjCN format
(for example, within lq, sd
and IM annotations)
|
| slv |
solvent(s) encoded in
ConjCN format
(within dp and ds annotations)
|
| spg |
space group symbol (Hermann-Mauguin notation):
230 space group notations |
| srf |
surface notation: stoichiometric formula notation followed by
a crystallographic plane specification (Miller indices) enclosed
in parentheses |
| tmp |
template specification; example:
[Au]{nltmp=ss-DNA}
for gold nanocluster (nl) templated by
single-stranded DNA (ss-DNA) |
| trd |
trade name |
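
Several of the value formats described in the table are regular enough to be checked with simple patterns. The following sketch parses values for the keys e (formal charge), f (fraction) and psy (Pearson symbol) as they are described above. It does not reproduce the behaviour of the Python CurlySMILES parser: all function names are hypothetical, and the sketch assumes that an explicit magnitude of 1 (as in +1) is tolerated, which the table does not state.

```python
import re

# Patterns derived from the table descriptions (assumptions of this sketch).
_CHARGE = re.compile(r"([+-])([1-9]\d*)?")         # sign first, optional magnitude
_FRACTION = re.compile(r"(\d+)/(\d+)")             # two integers separated by "/"
_PEARSON = re.compile(r"([achmot])([FIPRS])(\d+)")  # system, lattice, atom count


def parse_charge(value):
    """Convert an e value such as "-", "+2" or "-3" to a signed integer."""
    match = _CHARGE.fullmatch(value)
    if match is None:
        raise ValueError("not a charge value: %r" % value)
    # A bare sign stands for a single charge.
    magnitude = int(match.group(2)) if match.group(2) else 1
    return magnitude if match.group(1) == "+" else -magnitude


def parse_fraction(value):
    """Convert an f value such as "1/3" to a (numerator, denominator) tuple."""
    match = _FRACTION.fullmatch(value)
    if match is None:
        raise ValueError("not a fraction value: %r" % value)
    return int(match.group(1)), int(match.group(2))


def parse_pearson(value):
    """Split a psy value such as "cF8" into (crystal system, lattice, count)."""
    match = _PEARSON.fullmatch(value)
    if match is None:
        raise ValueError("not a Pearson symbol: %r" % value)
    return match.group(1), match.group(2), int(match.group(3))
```

With these helpers, parse_charge("-") gives -1 and parse_charge("+2") gives 2, whereas the IUPAC-style string "2+" is rejected with a ValueError because the sign must precede the number.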