Group encoding in CurlySMILES uses the structural unit annotation format.
A structural unit annotation consists of a bond symbol (-, =, #, :, & or ~) enclosed in curly braces. CurlySMILES encoding of a terminal group requires exactly one such annotation, which is anchored at the atom of the group that carries the formally open bond.
For example, the methyl (-CH3), amino (-NH2), hydroxy (-OH), and fluoro (-F) groups have the respective notations C{-}, N{-}, O{-}, and F{-}.
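The annotation format described above can be recognized with a short pattern match. The following is a minimal Python sketch, not part of any CurlySMILES distribution: the names find_annotations and is_terminal_group are hypothetical, and the pattern only covers the bare bond-symbol form of annotation discussed here.

```python
import re

# Bond symbols permitted in a structural unit annotation, as listed in the text.
BOND_SYMBOLS = "-=#:&~"

# One annotation: a single bond symbol enclosed in curly braces.
ANNOTATION = re.compile(r"\{([" + re.escape(BOND_SYMBOLS) + r"])\}")


def find_annotations(notation):
    """Return (bond_symbol, index_of_opening_brace) for each annotation found."""
    return [(m.group(1), m.start()) for m in ANNOTATION.finditer(notation)]


def is_terminal_group(notation):
    """Assumption of this sketch: a terminal group has exactly one annotation."""
    return len(find_annotations(notation)) == 1
```

For instance, find_annotations("C{-}") reports a single "-" annotation placed directly after the carbon atom, which is the anchor atom of the methyl group.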
As in the original SMILES language, the number of hydrogen atoms attached to a non-hydrogen atom in CurlySMILES is derived from normal valence assumptions. An open single, double, or triple bond "substitutes" one, two, or three hydrogen atoms, respectively. The notation N{=}, for example, represents an imino group (=NH), in which each valence of the double bond formally replaces a hydrogen atom of the parent molecule ammonia (NH3).
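The hydrogen bookkeeping for single-atom groups can be written out as simple arithmetic: normal valence minus the order of the open bond. The Python sketch below is illustrative only. The table NORMAL_VALENCE and the function name are assumptions of this example, and P and S are left out because they have more than one normal valence.

```python
# Lowest normal valence for selected organic-subset atoms (illustrative table).
NORMAL_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}

# Number of valences taken up by an open single, double, or triple bond.
OPEN_BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def hydrogens_on_single_atom_group(element, bond_symbol):
    """Implicit hydrogen count for a one-atom group such as C{-} or N{=}."""
    valence = NORMAL_VALENCE[element]
    order = OPEN_BOND_ORDER[bond_symbol]
    if order > valence:
        raise ValueError(f"{element} cannot carry an open bond of order {order}")
    # Each valence of the open bond replaces one hydrogen atom.
    return valence - order
```

With these assumptions, N{-} gives two hydrogen atoms (amino, -NH2) and N{=} gives one (imino, =NH), matching the examples in the text.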
For atoms that do not belong to the organic subset (B, C, N, O, P, S, F, Cl, Br, and I), the number of hydrogen atoms is specified explicitly inside square brackets. For example, the silyl group is encoded as [SiH3]{-}. Formally charged atomic groups are encoded in the same manner: [NH3+]{-} represents the ammonium group in a mono-substituted ammonium cation.
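To make the bracket form concrete, the Python sketch below reads the element, explicit hydrogen count, formal charge, and bond symbol from a single bracketed atom followed by one annotation. It is a simplified illustration: the function parse_bracket_group is hypothetical, and the pattern ignores isotopes, chirality, and other bracket-atom features of SMILES.

```python
import re

BRACKET_GROUP = re.compile(
    r"^\[(?P<element>[A-Z][a-z]?)"   # element symbol, e.g. Si or N
    r"(?:H(?P<hcount>\d*))?"          # optional explicit hydrogens, e.g. H3
    r"(?P<charge>[+-]\d*)?"           # optional formal charge, e.g. + or 2-
    r"\]\{(?P<bond>[\-=#:&~])\}$"     # closing bracket, then the annotation
)


def parse_bracket_group(notation):
    """Split a notation such as [SiH3]{-} into its parts (simplified sketch)."""
    m = BRACKET_GROUP.match(notation)
    if m is None:
        raise ValueError(f"not a single bracketed atom with annotation: {notation!r}")
    hcount = m.group("hcount")
    if hcount is None:
        hydrogens = 0            # no H given inside the brackets
    elif hcount == "":
        hydrogens = 1            # a bare H means one hydrogen atom
    else:
        hydrogens = int(hcount)
    charge_text = m.group("charge") or ""
    if charge_text:
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * int(charge_text[1:] or "1")
    else:
        charge = 0
    return {
        "element": m.group("element"),
        "hydrogens": hydrogens,
        "charge": charge,
        "bond": m.group("bond"),
    }
```

Applied to [NH3+]{-}, this sketch yields nitrogen with three hydrogen atoms, a formal charge of +1, and an open single bond.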
There are no restrictions on group size. The following examples
illustrate encoding of terminal groups containing more than one
non-hydrogen atom:
C1=CCCC1{-} | Cyclopent-2-enyl group
n1c{-}cc2ccccc2c1 | 3-Isoquinolyl group
Non-terminal groups are bonded to more than one other structural unit. CurlySMILES encoding of such groups requires a correspondingly larger number of structural unit annotations, as demonstrated for multivalent groups.