A set of molecules can be finite or virtually infinite. In the
latter case we often speak of a molecule class or compound class,
which has an unbounded number of members, although only a limited
number of them are of interest for most practical purposes.
A very common class type is a homologous series, which is defined by
a root or parent member and in which subsequent members are
formally generated by successively inserting a bivalent group, such
as a methylene group, between an already present methylene group
and another group.
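The formal generation of a homologous series can be sketched in a few lines of Python. This is only an illustration, assuming a simple unbranched SMILES string in which every character is one atom; the function names `insert_methylene` and `homologous_series` and the choice of ethanol as parent member are hypothetical and not part of CurlySMILES.

```python
def insert_methylene(smiles, position):
    """Insert one aliphatic carbon (a CH2 group in the chain) at a string index.

    Assumes a simple chain SMILES without brackets, ring closures, or
    two-letter element symbols, so a string index corresponds to an atom.
    """
    return smiles[:position] + "C" + smiles[position:]


def homologous_series(parent, position, count):
    """Return the parent member followed by count - 1 higher homologues."""
    members = [parent]
    while len(members) < count:
        members.append(insert_methylene(members[-1], position))
    return members


# Parent member ethanol, CH3-CH2-OH, written as "CCO". Index 2 lies between
# the methylene group and the OH group, so each step inserts CH2 there.
alkanols = homologous_series("CCO", 2, 4)
print(alkanols)
```

Each step lengthens the chain by one methylene group, which mirrors the formal definition above rather than any real synthetic route.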
Chemists often sketch such sets or classes by simply drawing
one member in a generic manner, using the symbols R, X, and Y
(and others) in the same way they use element symbols. The
following structure contains the symbol R, representing an arbitrary
alkyl group, to define the set of alkyl n-hexanoates. The
corresponding CurlySMILES encoding uses the annotation
{+R} to formally substitute the H atom
of the carboxylic acid group:
CCCCCC(=O)O{+R}
Alkyl n-hexanoates
The CurlySMILES annotation format provides various methods to encode
a set of molecules in a more specific or limiting manner.
For example, the annotation entry n=2-10
limits the above set to those molecules that contain alkyl groups
with a C-atom count greater than or equal to two and less than or equal to ten:
CCCCCC(=O)O{+Rn=2-10}
Ethyl-to-decyl n-hexanoates
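To make the effect of the n=2-10 entry concrete, the following Python sketch reads the range from the annotation and lists one representative per C-atom count. It is a minimal illustration, not a CurlySMILES parser: the regular expression covers only the two annotation forms shown in this text, the names `carbon_range` and `unbranched_members` are hypothetical, and only the unbranched alkyl group is written out for each count, although the encoded set also contains branched isomers.

```python
import re

# Assumed pattern: matches only '{+R}' or '{+Rn=LOW-HIGH}' as shown in this
# text. Other CurlySMILES annotation forms are deliberately not handled.
ANNOTATION = re.compile(r"\{\+R(?:n=(\d+)-(\d+))?\}")


def carbon_range(curly_smiles):
    """Return (low, high) from an n=LOW-HIGH entry, or None if it is absent."""
    match = ANNOTATION.search(curly_smiles)
    if match is None:
        raise ValueError("no {+R...} annotation found")
    if match.group(1) is None:
        return None
    return int(match.group(1)), int(match.group(2))


def unbranched_members(curly_smiles):
    """List plain SMILES with an unbranched alkyl group for each allowed count."""
    limits = carbon_range(curly_smiles)
    if limits is None:
        raise ValueError("set is unbounded: the annotation has no n=LOW-HIGH entry")
    low, high = limits
    # Replace the annotation by a chain of n aliphatic carbon atoms.
    return [ANNOTATION.sub("C" * n, curly_smiles) for n in range(low, high + 1)]


members = unbranched_members("CCCCCC(=O)O{+Rn=2-10}")
for smiles in members:
    print(smiles)
```

The first string printed is the SMILES of ethyl n-hexanoate and the last that of n-decyl n-hexanoate. For the unconstrained encoding with {+R}, `carbon_range` returns None and the enumeration is refused, since that set has no upper bound.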
This set can further be constrained by excluding branched alkyl groups:
A. Drefahl: CurlySMILES: a chemical language to customize and
annotate encodings of molecular and nanodevice structures.
J. Cheminf. 2011, 3:1; doi:10.1186/1758-2946-3-1.