Molecular Dynamics

Molecular Dynamics (MD) is a molecular simulation method that studies the dynamic evolution of physical systems through computer simulations of atoms and molecules. Combined with statistical mechanics, MD simulations allow many macroscopic thermodynamic properties, such as free energy or density, to be evaluated. Typically, the trajectories of atoms in a simulation are generated by solving Newton’s equations of motion, where the potential energy function \(V\) comes either from force fields or from quantum mechanical (QM) ab-initio calculations:

\begin{aligned} \dot{p}&=m\ddot{x} = -\frac{\partial V}{\partial x}\\ \end{aligned}
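As an illustration of how such a trajectory can be generated, the following is a minimal sketch that integrates the equation of motion with the velocity Verlet scheme. The one-dimensional harmonic potential, the reduced units, and all function names are assumptions made for this example; production MD codes use far more elaborate potentials and integrator options.

```python
def force(x, k=1.0):
    """Force from the assumed harmonic potential V(x) = 0.5 * k * x**2, i.e. F = -dV/dx."""
    return -k * x


def total_energy(x, v, m=1.0, k=1.0):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, dt, n_steps, m=1.0, k=1.0):
    """Integrate m * x'' = -dV/dx and return the trajectory as a list of (x, v) pairs."""
    trajectory = [(x, v)]
    f = force(x, k)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / m   # half-step velocity update with the old force
        x = x + dt * v_half             # full-step position update
        f = force(x, k)                 # force at the new position
        v = v_half + 0.5 * dt * f / m   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


traj = velocity_verlet(x=1.0, v=0.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print(f"energy drift over the run: {abs(e_end - e_start):.2e}")
```

The total energy should stay nearly constant over the run, which is a common sanity check for the chosen time step.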

Depending on the smallest indivisible unit during a simulation, MD simulations can be roughly divided into two major categories: All-atom Molecular Dynamics and Coarse-grained Molecular Dynamics (CGMD):

  • All-atom Molecular Dynamics: each individual atom is treated as the smallest indivisible unit for motion and force calculations

  • Coarse-grained Molecular Dynamics: a set of adjacent atoms (such as an amino acid residue or a water molecule) is treated as a coarse-grained unit, usually referred to as a “bead”. Only interactions between beads are considered, while all intra-bead interactions are neglected. This treatment lets CGMD reach longer time scales and larger physical systems at a reduced computational cost, in exchange for some loss of accuracy.
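To make the idea of a bead concrete, here is a small sketch that maps a group of atoms onto a single bead placed at their center of mass. Center-of-mass mapping is only one possible choice and is an assumption of this example, as are the function name and the water-like geometry.

```python
def bead_position(masses, coords):
    """Mass-weighted center of a group of atoms, used here as the bead position."""
    total_mass = sum(masses)
    return tuple(
        sum(m * xyz[axis] for m, xyz in zip(masses, coords)) / total_mass
        for axis in range(3)
    )


# Hypothetical water-like geometry (coordinates in angstrom, masses in amu).
water_masses = [15.999, 1.008, 1.008]
water_coords = [(0.0, 0.0, 0.0), (0.957, 0.0, 0.0), (-0.240, 0.927, 0.0)]
bead = bead_position(water_masses, water_coords)
print(bead)
```

After the mapping, only the bead coordinates enter the force calculation, which is where the reduction in cost comes from.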

Depending on the source of the potential energy function used during a simulation, MD simulations can be divided into three categories: Classical Molecular Dynamics (Classical MD, cMD), Ab-initio Molecular Dynamics (AIMD) and Machine Learning Molecular Dynamics (MLMD):

  • Classical Molecular Dynamics: potential energy functions of the physical system come from a force field;

  • Ab-initio Molecular Dynamics: potential energy functions of the physical system come from ab-initio calculations;

  • Machine Learning Molecular Dynamics: potential energy functions of the physical system come from a machine learning force field.

Potential Energy Function

The potential energy function, usually shortened to “potential”, is the function used to describe the interaction energy within a physical system. In an all-atom MD simulation, the potential is a function of the atom types and atomic coordinates of the system, and it can be given by quantum mechanics (QM), molecular mechanics (MM) force fields, or machine learning (ML) force fields.

Force Field

A force field, conventionally called a Molecular Mechanics (MM) force field, is a collection of empirical functions with fixed mathematical forms that describe the potential energy of a physical system. The parameters of these functions are determined by fitting against experimental data or QM-derived data. Compared to ab-initio methods, MM force fields are less accurate but much faster, usually by several orders of magnitude.


Hamiltonian

In classical mechanics, the Hamiltonian is the total energy of a physical system, that is, the sum of the kinetic energy and the potential energy of all particles in the system.

\begin{aligned} H&=\sum_i H_i =\sum_i \left[\frac{p_i^2}{2m_i}+V(x_i)\right]\\ \end{aligned}

In quantum mechanics, the Hamiltonian is an operator:

\begin{aligned} \hat{H}=\sum_{i} \frac{\hat{p}_i^2}{2m_i} + \hat{V} \\ \end{aligned}

Statistical Mechanics

In physics, statistical mechanics is a sub-discipline which applies statistical methods and probability theory to describe large assemblies of microscopic particles so that macroscopic behavior of the physical system (for instance, temperature, pressure) can be related to the behavior of microscopic particles.

State Function

A state function is a physical quantity that describes a macroscopic property of a physical system. A state function has a fixed value for a system in a given thermodynamic equilibrium state and depends only on that state, not on the path by which the system reached it. Examples of state functions include internal energy, enthalpy, entropy, and free energy.


Ensemble

An ensemble is a concept in statistical mechanics: a collection of a large number of independent systems with identical composition and structure that, under the same macroscopic conditions, occupy various microscopic states of motion.

Free Energy

The thermodynamic free energy is the portion of the energy of a thermodynamic system that can be used to do external work. It serves as a criterion for whether a thermodynamic process can proceed spontaneously: under given constraints, a system always tends to move toward a state of lower free energy. For example, protein folding is the spontaneous transition from an unfolded state with higher free energy to a folded state with lower free energy. Depending on the constraints, one distinguishes the Helmholtz free energy (common notation \(F\), constant temperature and volume) from the Gibbs free energy (common notation \(G\), constant temperature and pressure). Note: free energy is not the same as potential energy, although the two are often confused.

Boltzmann Distribution

In statistical mechanics, the Boltzmann distribution describes the probability distribution of particles in a system over the possible microscopic quantum states, and has the following form:

\begin{aligned} p_i\propto\exp\left(-\frac{\varepsilon_i}{kT}\right) \\ \end{aligned}

where \(p_i\) is the probability that the particle is in the \(i\)-th quantum state, \(\varepsilon_i\) is the energy of that state, \(k\) is the Boltzmann constant, and \(T\) is the temperature.
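The proportionality becomes an equality once the weights are normalized by their sum. The sketch below computes normalized Boltzmann probabilities for a handful of state energies; the three-state example system and the function name are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def boltzmann_probabilities(energies, temperature):
    """Return normalized probabilities p_i = exp(-e_i / kT) / Z for energies in joules."""
    kt = K_B * temperature
    e_min = min(energies)  # shifting by the minimum keeps the exponentials well scaled
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)       # normalization constant for the shifted weights
    return [w / z for w in weights]


# Hypothetical three-state system with energies 0, kT and 2kT at 300 K.
kt_300 = K_B * 300.0
probs = boltzmann_probabilities([0.0, kt_300, 2.0 * kt_300], temperature=300.0)
print(probs)
```

Each step of \(kT\) in energy lowers the population by a factor of \(e\), so higher-energy states become rapidly less likely.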

Collective Variables (Reaction Coordinates)

The representative parameters that quantitatively describe how a system changes during a process are called Collective Variables (CV) or Reaction Coordinates (RC). For example, in the chemical reaction shown in the figure below, either the distance between O and C, \(d(\mathrm{C-O})\), or the distance between C and Br, \(d(\mathrm{Br-C})\), can be regarded as the reaction coordinate.

Once the reaction coordinates are well defined, methods such as umbrella sampling can be used in molecular simulations to estimate the free energy difference between different values of the reaction coordinate. The resulting free energy profile along the reaction coordinate describes the transformation process and is the basis of kinetic and thermodynamic studies.
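The link between sampling and free energy can be sketched with the relation \(F(s) = -kT\ln P(s)\), where \(P(s)\) is the probability of observing the value \(s\) of the reaction coordinate. The example below applies it to a plain histogram of unbiased samples. This is a simplification and not umbrella sampling itself, which additionally removes the bias potential and combines several windows. The function name and the sample data are hypothetical.

```python
import math


def free_energy_profile(samples, lo, hi, n_bins, kt):
    """Estimate F(s) = -kT ln P(s) per bin, shifted so that the minimum is zero."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lo <= s < hi:
            index = min(int((s - lo) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    # Bins that were never visited have no estimate and are reported as None.
    raw = [-kt * math.log(c / total) if c > 0 else None for c in counts]
    f_min = min(f for f in raw if f is not None)
    return [None if f is None else f - f_min for f in raw]


# Hypothetical samples: the system spends 80% of its time near s = 0.25.
samples = [0.25] * 80 + [0.75] * 20
profile = free_energy_profile(samples, lo=0.0, hi=1.0, n_bins=2, kt=1.0)
print(profile)
```

With these samples the second bin is four times less populated than the first, so its free energy is higher by \(kT\ln 4\).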

Slow Degrees of Freedom

In a dynamics simulation, some degrees of freedom change rapidly with time (such as bond lengths and bond angles, usually on the order of fs or ps), while others change slowly (such as dihedral angles, usually on the order of ns, \(\mu\)s, or even ms). The latter are called slow degrees of freedom.

Enhanced Sampling

Enhanced sampling refers to techniques that accelerate the sampling of slow degrees of freedom during a simulation. They are classified as collective-variable-based (e.g. umbrella sampling) or collective-variable-free (e.g. replica exchange).
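As one concrete example of a collective-variable-free method, the sketch below shows the swap test used in temperature replica exchange, where two replicas at inverse temperatures \(\beta_i\) and \(\beta_j\) exchange configurations with probability \(\min(1, \exp[(\beta_i-\beta_j)(E_i-E_j)])\). The choice of the temperature variant and the function names are assumptions of this example.

```python
import math
import random


def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping the configurations of two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the swap is accepted for one random draw."""
    return rng.random() < exchange_probability(beta_i, beta_j, energy_i, energy_j)


# The cold replica (beta = 1.0) holds the higher energy: the swap is always accepted.
print(exchange_probability(1.0, 0.5, energy_i=2.0, energy_j=1.0))
# The cold replica holds the lower energy: the swap is accepted only sometimes.
print(exchange_probability(1.0, 0.5, energy_i=1.0, energy_j=2.0))
```

Swaps let configurations that crossed barriers at high temperature migrate to the low-temperature replica, which is how the slow degrees of freedom get sampled faster.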

Quantum Mechanics

Quantum Mechanics is a branch of physics that studies microscopic systems. By describing the motion and interaction of microscopic particles (such as electrons, protons, etc.), quantum mechanics can explain many experimental phenomena that cannot be explained under the framework of classical mechanics, including blackbody radiation and the spectrum of the hydrogen atom.


Operator

Generally, an operator acts on the state space of a physical system and transforms one state into another. In quantum mechanics, the state of a system is described by a state vector. Physical observables (such as position, momentum, and energy) each correspond to a Hermitian operator; the operator for energy is the Hamiltonian.

Schrödinger Equation

In quantum mechanics, the Schrödinger equation is a partial differential equation that describes the time evolution of the quantum state of a physical system and is the fundamental equation of quantum mechanics. The Schrödinger equation can be divided into two types: the “time-dependent Schrödinger equation”

\begin{aligned} \hat{H}\Psi=i\hbar\frac{\partial}{\partial t}\Psi \\ \end{aligned}

and the “time-independent Schrödinger equation” (also known as the steady-state Schrödinger equation)

\begin{aligned} \hat{H}\Psi&=E\Psi \\ \end{aligned}

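where \(\hat{H}\) is the Hamiltonian operator, \(\Psi\) is the wave function of the system, and \(E\) is the energy. For a single particle of mass \(m\) in a potential \(V\), the Hamiltonian operator reads: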

\begin{aligned} \hat{H}&=-\frac{\hbar^2}{2m}\nabla^2+V \\ \end{aligned}

The time-dependent Schrödinger equation describes how the wave function of a quantum system evolves over time, while the time-independent Schrödinger equation describes the physical properties of a stationary quantum system.
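The time-independent equation can be checked numerically on a textbook case. The sketch below uses a particle in a one-dimensional box (with \(V=0\) inside the box and units where \(\hbar = m = 1\), both assumptions of this example) and verifies that applying the Hamiltonian to a stationary state returns the same state multiplied by its energy.

```python
import math


def box_state(n, x, length=1.0):
    """Normalized stationary state psi_n(x) of a particle in a box of the given length."""
    return math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)


def box_energy(n, length=1.0):
    """Analytic energy E_n = (n * pi / length)**2 / 2 in units where hbar = m = 1."""
    return 0.5 * (n * math.pi / length) ** 2


def apply_hamiltonian(psi, x, h=1e-4):
    """Apply H = -(1/2) d^2/dx^2 at x using a central finite difference."""
    second_derivative = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return -0.5 * second_derivative


n, x0 = 2, 0.3


def psi(x):
    return box_state(n, x)


lhs = apply_hamiltonian(psi, x0)   # H psi, evaluated numerically
rhs = box_energy(n) * psi(x0)      # E psi, from the analytic energy
print(lhs, rhs)
```

The two printed numbers should agree up to the finite-difference error, illustrating that the state is an eigenfunction of the Hamiltonian.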

First Principles

First principles, also called ab initio, refers to derivation and calculation based on the basic laws of physics, without empirical fitting or additional empirical assumptions. An example is solving the Schrödinger equation to obtain the electronic structure.

Wave Function

In quantum mechanics, the state of a quantum system can be described by a wave function. The wave function \(\Psi(r,t)\) is a complex-valued function. According to Born’s statistical interpretation, \(|\Psi|^2\) is the probability density of finding a particle at position \(r\) at time \(t\).

Born-Oppenheimer Approximation

The Born-Oppenheimer approximation separates the nuclear coordinates from the electronic coordinates when solving the quantum mechanical equations of a system containing nuclei and electrons. The wave function of the whole system is thereby decomposed into a nuclear wave function and an electronic wave function, which can be solved for separately as two relatively simple problems. The approximation rests on the fact that a nucleus is at least three orders of magnitude heavier than an electron (a proton is about 1836 times heavier) and therefore moves much more slowly. The electrons can thus be regarded as moving in the potential field of stationary nuclei, while the nuclei do not respond to the instantaneous positions of the electrons, only to the average force they exert.

Density Functional Theory

Density functional theory (DFT) is a quantum mechanical method for studying the electronic structure of multi-electron systems, and it is one of the most commonly used methods in condensed matter physics and computational chemistry. Traditional electronic structure methods must solve for the multi-electron wave function, which depends on \(3N\) spatial variables for a system containing \(N\) electrons. The basic idea of DFT is to use the electron density, which depends on only three spatial variables, instead of the wave function as the basic quantity, thereby reducing the computational complexity. In practice, DFT is most commonly implemented with the Kohn-Sham method.


Atomic Orbitals

In quantum mechanics, atomic orbitals are mathematical functions that describe the wave-like behavior of electrons in atoms. An orbital can be used to calculate the probability of finding an electron in a given region around the nucleus, which is what the term “orbital” refers to. According to their “shape”, orbitals are classified as s, p, d, f, etc.


Electronegativity

Electronegativity describes the ability of the atoms of an element to attract electrons in a compound. The greater the electronegativity of an element, the more strongly its atoms attract electrons in a compound. Within a period of the periodic table, electronegativity increases from left to right, and within a group it decreases from top to bottom. Therefore, the elements at the upper right of the periodic table (O, N, F, Cl, etc.) have higher electronegativity values. The element with the greatest electronegativity is fluorine.

Chemical Bond

A chemical bond is a strong interaction between atoms, ions, and other particles. Through chemical bonds, particles can form polyatomic compounds (such as organic molecules, inorganic molecules, and ionic compounds). Simply put, in a polyatomic system the most stable arrangement of positively charged nuclei and negatively charged electrons has electrons located between the nuclei. These electrons are attracted by several nuclei at once, and this attraction pulls the nuclei together, forming a chemical bond.

  • Ionic Bond: A chemical bond formed by electrostatic interaction between oppositely charged anions and cations. It has no directionality. Examples include sodium chloride (table salt) and calcium carbonate.

  • Covalent Bond: A chemical bond formed by sharing electron pairs between atoms. Two atoms with similar electronegativity attract electrons with similar strength, so they mainly form chemical bonds by sharing each other’s outer valence electrons. Covalent bonds are directional, resulting in complex molecular structures. For example, in the methane molecule, the carbon atom and the hydrogen atoms are connected by covalent bonds to form a regular tetrahedron, with the carbon atom at the center and the hydrogen atoms at the vertices. According to the number of shared electron pairs, covalent bonds can also be classified into single, double, and triple bonds.

  • Hydrogen Bond: When a hydrogen atom is covalently bonded to an atom X with high electronegativity (usually O, N, F) and another atom Y with high electronegativity (usually also O, N, F) is close, a special interaction of the form X-H···Y arises, with hydrogen as the medium between X and Y. This interaction is known as a hydrogen bond. Hydrogen bonds exist widely in water and in biological macromolecules such as proteins and DNA. They play a crucial role in stabilizing the conformation of biological macromolecules.

Functional Group

Functional groups are atoms or groups of atoms that determine the properties of organic compounds. Common functional groups include hydroxyl (-OH), carboxyl (-COOH), ether bond (C-O-C), carbonyl (C=O), halogen atom (-F, -Cl, -Br, -I), etc.


Aromaticity

Aromaticity is a chemical property of cyclic, planar molecules containing \(\pi\) bonds composed of delocalized electrons. It gives these molecules a stability that cannot be explained by conjugation alone. The number of electrons in the delocalized \(\pi\) system of an aromatic molecule must satisfy the Hückel rule (also called the “4n+2” rule). Molecules with aromaticity are called aromatic compounds, and organic molecules without aromaticity are called aliphatic compounds. Aromatic compounds can be roughly classified into simple aromatic compounds (such as benzene), polycyclic aromatic compounds (such as naphthalene and anthracene), and heterocyclic compounds (such as pyridine and pyrrole).
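The electron-count part of the “4n+2” rule is easy to express in code. The sketch below checks only whether a \(\pi\)-electron count has the form \(4n+2\); it does not test planarity or cyclic conjugation, which are also required. The function name is hypothetical.

```python
def satisfies_huckel_count(n_pi_electrons):
    """True if the count can be written as 4n + 2 for a non-negative integer n."""
    return n_pi_electrons >= 2 and (n_pi_electrons - 2) % 4 == 0


# Pi-electron counts: benzene has 6, cyclobutadiene has 4, naphthalene has 10.
for name, count in [("benzene", 6), ("cyclobutadiene", 4), ("naphthalene", 10)]:
    print(name, satisfies_huckel_count(count))
```

Benzene and naphthalene pass the count test, while cyclobutadiene does not.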


Conformer

A conformer usually refers to a three-dimensional conformation, that is, the structure a molecule adopts in three-dimensional space. The conformations of organic molecules cannot be generated at random, because they are constrained by the directionality of covalent bonds.


Isomers

In organic chemistry, substances with the same chemical composition (molecular formula) but different structures are called isomers of each other. For example, ethanol and dimethyl ether both have the composition \(\mathrm{C_2H_6O}\), but their structures are different:


Stereoisomers

Stereoisomers are molecules whose atoms are connected in the same way topologically but arranged differently in space. For example, a molecule is likely to have stereoisomers when it contains a carbon atom bonded to four different groups. Such an atom is called a chiral atom, and the labels R and S are usually used to distinguish its two configurations. For biomolecules such as peptides, amino acids, and sugars, the labels L and D are frequently used instead. The two amino acid configurations shown in the figure below are stereoisomers of each other. The chiral amino acids found in natural proteins are in the L configuration, and their \(\alpha\)-carbon atoms are in the S configuration, except for cysteine, whose \(\alpha\)-carbon is R.

Cis-trans Isomerism

Cis-trans isomerism is isomerism that arises from restricted rotation within a molecule, and it is commonly found in compounds with double bonds or rings.


Tautomerism

Tautomerism is the interconversion of an organic compound between two structural isomers with different functional groups. Most tautomerism involves the transfer of a hydrogen atom or proton together with the exchange of a single bond and an adjacent double bond. The equilibrium distribution of tautomers depends on factors including temperature, solvent, and pH. The diagram below shows the keto (left) and enol (right) tautomers of carbonyl compounds; the keto form usually predominates.

Amino Acids

Amino acids are biologically important organic compounds that contain an amino (-NH2) group, a carboxyl (-COOH) group, and a side chain specific to each amino acid. Amino acids are the basic units that make up proteins. In nature, there are 20 standard genetically encoded amino acids.

Protein Structure

Protein structure refers to the spatial structure of a protein biomolecule, which can be divided into four levels to describe different aspects.

  • Primary structure: the linear amino acid sequence that makes up the polypeptide chain of a protein.

  • Secondary structure: a stable local structure formed by hydrogen bonds between the backbone C=O and N-H groups of different amino acid residues, mainly the \(\alpha\)-helix and the \(\beta\)-sheet.

  • Tertiary structure: the three-dimensional structure of a protein molecule, formed by the arrangement of multiple secondary structural elements in three-dimensional space.

  • Quaternary structure: used to describe the interaction of different polypeptide chains (subunits) to form functional protein molecules.


Ligand

In biochemistry or pharmacology, a ligand is a compound that can bind to a receptor and thereby produce a physiological effect. In medicinal chemistry, ligands are usually small organic molecules or short peptides composed of several amino acids. The forces between ligands and receptors are usually non-covalent interactions, such as hydrogen bonds, electrostatic interactions, and van der Waals interactions.


Receptor

Signal transduction is the process by which a chemical or physical signal from outside the cell is relayed into the cell through a series of molecular events (such as protein phosphorylation). Receptors play the central role in this process: they receive signals from outside the cell and produce specific effects within it. A receptor is usually a biological macromolecule such as a protein. After a receptor binds a specific stimulus, its structure changes to some extent, which induces the corresponding effect in the cell. In medicinal chemistry, receptors usually refer to target proteins that are able to bind ligands.

Lock and Key Model

The lock-and-key model is a theory proposed by E. Fischer in 1894 to explain the specific binding between enzymes and substrates (or between ligands and receptors). The model holds that the structures of the enzyme and the substrate at their binding sites are strictly matched and highly complementary, just like a lock and its key. The disadvantage of this model is that it treats the enzyme and the substrate as rigid structures, which is inconsistent with the fact that the conformations of both change during the catalytic reaction.

Induced Fit Model

The induced-fit model was proposed by Koshland in 1958 to describe enzyme-substrate (or ligand-receptor) binding. The model holds that, as the enzyme binds the substrate, the substrate induces a change in the structure of the enzyme, which finally adopts an active conformation that fits the substrate.

Molecular Docking

Molecular docking is a technique that simulates the interaction between ligands and receptors. It predicts ligand binding modes and ligand-receptor binding affinities by physically modeling intermolecular interactions and applying optimization algorithms such as the Monte Carlo method.

Reversible Reaction

A reversible reaction is a chemical reaction that can proceed in both the forward and reverse directions under the same conditions. When the extent of the reverse reaction is much smaller than that of the forward reaction, the reaction can be considered irreversible. Most reactions are reversible, such as the dissociation of weak acids and bases, and ligand-receptor binding.

Chemical Equilibrium

Chemical equilibrium is the state of a reversible reaction, under given macroscopic conditions, in which the forward and reverse reaction rates are equal and the concentrations of the reactants and products no longer change. Take the following reaction as an example:

\begin{aligned} \mathrm{aA+bB\rightleftharpoons cC} \end{aligned}

When equilibrium is reached, the concentrations of \(\mathrm{A}\), \(\mathrm{B}\), and \(\mathrm{C}\) are \([\mathrm{A}]\), \([\mathrm{B}]\), and \([\mathrm{C}]\) respectively, and the equilibrium constant \(K\) can be defined as:

\begin{aligned} K&=\frac{[\mathrm{C}]^c}{[\mathrm{A}]^a[\mathrm{B}]^b} \\ \end{aligned}

Under given reaction conditions, the equilibrium constant of a reaction with fixed stoichiometry is constant, and it is related to the standard free energy change of the reaction as follows:

\begin{aligned} \Delta G^\circ=-RT\ln K \\ \end{aligned}

where \(R\) is the gas constant and \(T\) is the absolute temperature.
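The relation between the equilibrium constant and the free energy change can be evaluated directly. The sketch below converts in both directions, assuming \(K\) is dimensionless, energies are in J/mol, and the temperature defaults to 298.15 K; the function names are hypothetical.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def delta_g_from_k(k_eq, temperature=298.15):
    """Free energy change in J/mol for a given equilibrium constant."""
    return -R * temperature * math.log(k_eq)


def k_from_delta_g(delta_g, temperature=298.15):
    """Equilibrium constant for a given free energy change in J/mol."""
    return math.exp(-delta_g / (R * temperature))


# A tenfold change in K corresponds to roughly 5.7 kJ/mol at room temperature.
print(delta_g_from_k(10.0) / 1000.0)
```

This rule of thumb is often used when comparing binding affinities: each factor of ten in the equilibrium constant shifts the free energy by about 5.7 kJ/mol at 298 K.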

van der Waals force

The van der Waals (vdW) force is the non-directional, non-saturable, weak interaction force between atoms. Van der Waals interactions are much weaker than chemical bonds, but they significantly affect the melting point, boiling point, and many other properties of a substance. Van der Waals interactions have three major contributions:

  • Electrostatics, the attractive or repulsive interactions between permanent charges, dipoles, quadrupoles, etc.

  • Induction (also known as polarization), which is the attractive interaction between a permanent multipole on one molecule and an induced multipole on another. This interaction is sometimes called the Debye force.

  • Dispersion (usually named London dispersion interactions after Fritz London), which is the attractive interaction between any pair of molecules, including non-polar atoms, arising from the interactions of instantaneous multipoles.

In molecular simulations, van der Waals forces are usually described by the Lennard-Jones potential function, which has the following form:

\begin{aligned} V(r)=\frac{C_{12}}{r^{12}}-\frac{C_6}{r^6} \\ \end{aligned}

where \(r\) is the distance between two atoms, and \(C_{12}\) and \(C_6\) are parameters for the repulsive and attractive terms, usually obtained by fitting physical quantities such as the density and the enthalpy of vaporization.
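A short sketch can show how such a potential behaves. It assumes the common form \(V(r) = C_{12}/r^{12} - C_6/r^6\) with a repulsive \(r^{-12}\) term and an attractive \(r^{-6}\) term. The parameter names `c6` and `c12`, the reduced units, and the function names are choices made for this example.

```python
def lj_energy(r, c6, c12):
    """Lennard-Jones energy: repulsive r**-12 term minus attractive r**-6 term."""
    return c12 / r**12 - c6 / r**6


def lj_minimum(c6, c12):
    """Distance of the energy minimum and the well depth, from setting dV/dr = 0."""
    r_min = (2.0 * c12 / c6) ** (1.0 / 6.0)
    depth = c6 * c6 / (4.0 * c12)
    return r_min, depth


# Reduced units: c6 = c12 = 4 corresponds to a well depth of 1 and a zero crossing at r = 1.
r_min, depth = lj_minimum(c6=4.0, c12=4.0)
print(r_min, depth, lj_energy(r_min, 4.0, 4.0))
```

The energy is positive at short range, crosses zero, reaches its minimum at `r_min`, and then decays toward zero at large distances.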

Hydrophobic interaction

Hydrophobic interaction, also known as the hydrophobic effect, is the phenomenon in which hydrophobic groups in an aqueous solution (such as nonpolar alkyl groups) cluster together to reduce their contact area with water. Hydrophobic interactions are a main driving force of protein folding.


Thermodynamics

Thermodynamics studies the exchange of heat and work that accompanies chemical reactions and changes of state, under the laws of thermodynamics. Generally speaking, problems that concern equilibrium states rather than the course of a chemical reaction belong to chemical thermodynamics, such as phase transitions and the balance of sodium and potassium ions across the cell membrane.


Kinetics

Kinetics, also known as reaction kinetics or chemical kinetics, is the branch of physical chemistry that studies the rates and mechanisms of chemical reactions. Unlike chemical thermodynamics, it is not concerned with the equilibrium state. Instead, it studies the chemical reaction as a dynamic process: the time required for the reaction system to transform, and the microscopic steps involved.


Cell Biology

A branch of biology that studies the structure, function, and behaviour of the components within a cell.


Solving biological problems with chemical perspectives and techniques. It focuses on intracellular entities, treats them as chemical building blocks, and studies their functions in order to map out how life works.

Molecular Biology

Molecular biology studies the composition, structure, function and behaviour of bio-active and/or bio-significant molecules, such as nucleic acids and proteins.


Genetics

The study of heredity from the perspective of the elemental building blocks of DNA and their temporal and spatial distribution and variation in organisms. The origins of diseases (abnormalities) and the driving forces of evolution can be derived from a thorough understanding of genetics.


X-omics

Sources of large-scale, comprehensive biological data, assembled from genomics, transcriptomics, proteomics, metabonomics, and microbiomics.

Systems biology

Analysis and modeling of complex biological systems based on data acquired by X-omics.

Synthetic biology

Design of new devices and circuits based on biological components.


Epidemiology

The study of the distribution and determinants of disease in populations.


Enzyme

An enzyme is a biological catalyst that accelerates a specific chemical reaction in cells. The enzyme is not consumed during the reaction and can be used again and again (as long as conditions remain suitable). In most cases, enzymes are proteins.

Phase I/II biotransformation

The metabolism of a drug can be divided into two phases. Phase I mainly involves breaking down or modifying the drug (mainly by hydrolysis and oxidation). Phase II mainly involves conjugating chemical groups (polar in most cases) to the drug to make it more soluble and suitable for excretion.

Cytochrome P450

A family of key enzymes that contain heme as a cofactor and function as mono-oxygenases. They are the typical phase I drug-metabolizing enzymes and are involved in the metabolism of many compounds from drugs and food. They can be readily induced or inhibited by their substrates, and thus play a prominent role in the study of drug-drug interactions (DDI). For example, patients taking atorvastatin are advised not to eat grapefruit.

Drug targets

Molecules that are intrinsically associated with a particular disease and can be specifically acted on by a drug. Most known drug targets are proteins.

Active site

The catalytic center of an enzyme, which binds the substrate(s) and initiates the reaction. For enzymes that are proteins, the side chains of key amino acids form the active site and give it a specific size, shape, and chemical behavior.

Cofactor/Prosthetic group/Coenzyme

Cofactors are non-peptide components that enzymes require to function properly. Cofactors can be either inorganic metal ions or organic molecules. A cofactor assists an enzyme by binding to its inactive form (apo-enzyme) to produce the catalytically active form (holo-enzyme). A prosthetic group is a cofactor that binds tightly to its enzyme and is not easily removed. A coenzyme is a cofactor that is a small organic molecule.

Michaelis-Menten equation

Michaelis–Menten kinetics describes the typical kinetic behaviour of enzymes. It is named after the German biochemist Leonor Michaelis and the Canadian physician Maud Menten. The model describes the rate of an enzymatic reaction \(v\) by the Michaelis–Menten equation shown below:

\begin{aligned} v=\frac{d[P]}{dt}=V_{max}\frac{[S]}{K_{M}+[S]} \\ \end{aligned}

Here, the enzyme reaction rate \(v\), that is, the rate at which the product \(P\) is formed, is related to the substrate concentration \([S]\). \(V_{max}\) is the maximum reaction rate of the system, which is reached when the enzyme, at a given concentration, is saturated with substrate. The Michaelis constant \(K_{M}\) is numerically equal to the substrate concentration at which half of \(V_{max}\) is reached. The kinetics of most enzymes catalyzing single-substrate reactions are generally assumed to follow the Michaelis-Menten equation.
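The behaviour described above can be reproduced with a few lines of code. The sketch below evaluates the standard Michaelis-Menten rate law \(v = V_{max}[S]/(K_M + [S])\); the parameter values and the function name are hypothetical and chosen only for illustration.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Reaction rate v = v_max * [S] / (k_m + [S]) for a substrate concentration [S]."""
    return v_max * substrate / (k_m + substrate)


# Hypothetical enzyme with v_max = 10 (rate units) and k_m = 2 (concentration units).
v_max, k_m = 10.0, 2.0
for s in [0.5, 2.0, 20.0, 2000.0]:
    print(s, michaelis_menten_rate(s, v_max, k_m))
```

At \([S] = K_M\) the rate is exactly half of \(V_{max}\), and at very high substrate concentrations it approaches \(V_{max}\) without exceeding it.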


Kinase

A type of enzyme responsible for the phosphorylation of substrates.

Receptor tyrosine kinase

A tyrosine kinase is a kinase that phosphorylates tyrosine residues. It functions as an “on” or “off” switch in many cellular signalling processes. Receptor tyrosine kinases are a subclass of tyrosine kinases that serve as cell surface receptors with high affinity for many polypeptide growth factors, cytokines, and hormones.

G protein coupled receptors (GPCR)

A large group of evolutionarily related proteins that serve as cell surface receptors and activate cellular responses to signals from outside the cell. The transmembrane domain of a GPCR passes through the cell membrane seven times, which is the characteristic structural feature of GPCRs. Ligands can bind either at the extracellular N-terminus and loops or within the transmembrane helices. Effective ligand binding causes a conformational change in the receptor. The subsequent dissociation of the \(\alpha\) subunit from the coupled G protein then propagates the signal inside the cell.

Catalytic receptor

A type of cell surface protein whose ligand binding site is located on the extracellular face of the plasma membrane and whose functional region, which has catalytic activity, is located on the intracellular face. The two parts are linked by a single transmembrane-spanning domain consisting of 20–25 hydrophobic amino acids. It commonly exists and functions as a dimer. Endogenous ligands of catalytic receptors are often peptides or proteins.

Transport protein

A transmembrane protein that allows the selective passage of specific molecules from the external environment and can translocate ions, small molecules, or macromolecules. Transport proteins are divided into two subgroups: channels and carriers.

Carrier protein

Active carrier proteins consume energy and can translocate substances against their concentration gradient. Passive carrier proteins assist the movement of substances by facilitated diffusion.

Ion channel

An ion channel is a type of transmembrane protein that mediates the passage of ions through the membrane. The major differences between ion channels and ion carriers are: (1) high efficiency, usually \(10^6\) ions per second or higher; (2) translocation of ions down their electrochemical gradient, without consuming metabolic energy.

Nuclear hormone receptors (NHR)

A class of transcription factors that regulate gene expression under the control of their bound ligands. The ligand binding domain (LBD) recognizes specific ligands, which stimulates a conformational change (dimerization) of the NHR. The DNA binding domain (DBD) directs the receptor to its hormone response elements (HRE). The DBD functions as a dimer, with each monomer recognizing a six-base-pair sequence of the target DNA.


A biological process of intracellular protein degradation. The protein is first labelled with ubiquitin, a 76-amino-acid protein, through a three-step process carried out by a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin-protein ligase (E3), resulting in mono-ubiquitination. The attached ubiquitin can be extended into a chain by adding more ubiquitin molecules, resulting in polyubiquitination. The 26S proteasome recognizes the polyubiquitin chain as a signal to initiate proteolysis and degrade the protein.


Proteasome

A large protein complex that breaks the peptide bonds of unneeded or damaged proteins.

Heat shock proteins (HSP)

Molecular chaperones (proteins) that assist protein function in response to stressful conditions (e.g. exposure to cold or UV light, or wound healing). HSPs are named according to their molecular weight: HSP90 refers to HSPs that are about 90 kilodaltons in size. Ubiquitin (about 8 kilodaltons) also has heat shock protein features.


Tubulin is the structural unit of the skeletal system of living cells. Tubulins are proteins that polymerize into long chains or filaments, which assemble into microtubules: hollow fibers that serve as the cell's skeletal system.

Binding Site Detection for Receptors

Not all functional components in our body can be drug targets. However, this doesn't mean they cannot be modulated. Sometimes they are simply too hard to reach accurately because of their tissue distribution or a structural factor, while in other cases inhibiting these components does not trigger the expected downstream reaction, either because of intrinsic homeostasis or because the mechanism is not understood. In most cases, orthosteric binding sites (the pockets that bind the endogenous ligand) can be easily determined by sequence / structure alignment. These sites may lack selectivity, which has led to growing interest in detecting allosteric sites (sites that do not directly bind the endogenous ligand, but modulate its binding behavior). Traditional methods for allosteric site detection rely on MD simulation. See: Investigating Cryptic Binding Sites by Molecular Dynamics Simulations

  • Orthosteric/Allosteric Regulation A protein can have endogenous ligands and protein-protein binding partners. If a drug binds the protein in the areas directly involved in endogenous binding, its effect on the protein is called orthosteric regulation. If the drug binds elsewhere (often far away) but still affects the behavior of those areas, its effect on the protein is called allosteric regulation. Orthosteric regulation is easier to study: such binding at least competes with the endogenous partner, which affects target behavior. Allosteric regulation is much harder to study, because it requires insight into the protein's dynamics to determine the relationship between the orthosteric site and the potential allosteric site.

  • Covalent Regulation Traditionally, a drug molecule binds to the target without reacting with it. It can bind and dissociate, resulting in a chemical equilibrium. However, some newer types of drugs form a chemical bond with the target and bind to it permanently. Given the resulting prolonged effect (the drug effect can persist long after the drug's blood concentration has fallen), this kind of regulation can be both effective and risky.


Immunology is a branch of physiology that has attracted huge interest recently; it studies the immune system of the human body.

  • Lymphocyte A type of white blood cell that plays a vital role in immune responses. The two main types of lymphocyte are B-cells and T-cells.

  • B-cells and T-cells B-cells are a type of lymphocyte that is able to produce antibodies. T-cells are involved in cell killing (directly killing virus-infected cells), immune response amplification (via cytokines, signal proteins secreted by T-cells) and immune memory, which enables an organism to respond more quickly and efficiently if the same infection happens again.

  • Antigen and antibody The term antigen originally referred to a substance that may trigger an immune response and serves as an antibody generator. Antibodies (or immunoglobulins) are large, Y-shaped proteins secreted by B-cells to recognize and neutralize antigens.

  • Complement system The complement system works through a cascade of distinct plasma proteins that react with one another to opsonize pathogens and induce a series of inflammatory responses to fight infection. It enhances (complements) the effects of antibody activity, and it evolved as part of the innate immune system.

  • Cluster of differentiation antigen (CD) Surface proteins on leukocytes that reflect the differentiation stage or activation state of the cell and can be recognized by specific monoclonal antibodies.

  • Epitope An epitope is the antigenic determinant on an antigen that stimulates immune responses. The binding and subsequent reaction of immune cells and antibodies with antigens is initiated by recognition of the epitope.

  • Antigen-presenting cell (APC), Major histocompatibility complex (MHC) and Human leukocyte antigen (HLA) APCs are cells that can present an antigen for T-cell recognition. The heterogeneous group of protein complexes on the APC surface that performs antigen presentation is called the major histocompatibility complex (MHC). There are two types of MHC, class I and class II, which differ in structure and in the cell types that express them. In humans, MHC is also called human leukocyte antigen (HLA). Significant work aims to understand how MHC recognizes the antigens it presents. AI models have achieved fairly good accuracy in predicting whether an antigen (mainly a short peptide sequence) can be presented by MHC, and is therefore likely to stimulate a T-cell immune reaction; such predictions are used to design more efficient immune regulators (neoantigens).

  • Cytokines Cytokines are messenger proteins released from immune cells to regulate immune responses. Abnormal cytokine activity can induce a “cytokine storm”, which can be lethal.

Antigenicity and immunogenicity

When a foreign material (antigen) enters the body, the organism initiates a barrier system to fight and eventually eliminate the intruder. Antigenicity describes the ability of an antigen to bind to, or interact with, the products of the final cell-mediated response (such as B-cell or T-cell receptors). Immunogenicity measures the ability of the antigen to activate the immune response (including the innate immune response and the subsequent adaptive (acquired) immune response). Immunogens possess antigenicity, while antigens do not always have immunogenicity. Haptens, typically metal ions, are an example: they are antigens, but do not trigger an immune response on their own.

Monoclonal antibody, vaccine and neoantigen

Monoclonal antibodies are engineered antibodies that all recognize the same epitope and thus possess high specificity towards the targeted antigen. Vaccines are biological preparations containing an agent that initiates an immune response, building a barrier that protects the body from a certain infectious disease. The agent in a vaccine resembles the disease-causing microorganism and is often made from weakened or killed forms of the microbe, its toxins, or its surface proteins. Neoantigens are the translation products (proteins) of mutated DNA in cancer cells. They differ from the original protein found under physiological conditions and may therefore play a significant role in stimulating an immune response against cancer cells.

Prediction and design of protein-protein interaction

Protein-protein interaction (PPI) is the basis on which many biological processes function properly. Specific recognition between interacting proteins is established through physical contacts. The forces driving a stable/favourable interaction include electrostatic interactions, hydrogen bonding and the hydrophobic effect. Because these forces are exerted by atoms and atom groups, recognition patterns also appear at the level of protein sequence: conserved regions formed by amino acids with similar physicochemical properties have been observed in certain types of PPI. With an understanding of the interaction forces and the corresponding protein sequences, the recognition/interaction patterns of PPIs can be summarized in relation to their biological outcomes. Captured in the form of models, these patterns can then be used to predict biological effects with protein sequences as input. Further, one could design functional protein sequences to achieve a desired bio-activity.

Yield, Solubility, Stability of therapeutic Macromolecules

Therapeutic macromolecules are compounds with large molecular weight that possess therapeutic effects and are typically derived from biological processes. Commonly used therapeutic macromolecules (macromolecular drugs) include peptides, proteins, antibodies, polysaccharides and nucleic acids. Procedures for obtaining therapeutic macromolecules include bio-synthesis, recombinant protein expression, conjugation and modification. Compared with small-molecule drugs, building a stable production pipeline that yields therapeutic macromolecules requires much more effort. It is also worth noting that, because therapeutic macromolecules are typically derived from biological processes in organisms, some of them have unfavourable intrinsic properties such as being lipophilic/hydrophobic, and many of them are easily degraded once recognized by the cell's metabolizing systems. Thus, two further important issues for future discovery are enhancing the solubility of therapeutic macromolecules to achieve desirable distribution properties, and keeping the molecules intact (stable) long enough to act within a sufficiently wide therapeutic window.


Immunotherapy functions by activating, mediating or enhancing the immune responses of patients to fight diseases such as cancer.

Cell therapy

Engineered cells (isolated from patients) with therapeutic effects are injected, grafted or implanted (back) into the patient's body to treat the disease. The best-known cell therapy is CAR-T, in which T cells are genetically engineered to express a chimeric antigen receptor (an artificial T cell receptor) so that they act as an immunotherapy.



How chemical entities are turned into medications.


Pharmacokinetics studies how organisms process drugs.

  • Absorption (A): how drugs get into the bloodstream

  • Distribution (D): how drugs are reversibly transferred from one location to another within the body. Some drugs tend to concentrate in particular parts of the body, such as adipose tissue, which raises potential risks in clinical use.

  • Metabolism (M): how drugs are broken down and modified inside the body.

  • Excretion (E): how drugs and their metabolites (a metabolised form of drugs) are removed from the body.

  • Toxicity (T): a pharmacodynamic property of drugs. Since its assessment protocols overlap with those of ADME, the five are often referred to together as ADMET.


Pharmacodynamics studies what a drug does to the body. It focuses on the molecular, biochemical and physiological effects or actions of the studied drug.

Medicinal Chemistry

Medicinal chemistry refers to the design and synthesis of small molecules as pharmaceutical agents.

  • Hit-lead-candidate: A hierarchy describing the potential precursors of a registered medication.

  • Hit: A promising compound from preliminary screening, typically with a micromolar EC50 (half maximal effective concentration) value, or a top score in virtual screening.

  • Lead: A compound demonstrating further potential to become a drug. Lead compounds normally require extensive modification and assessment before becoming a drug candidate.

  • Drug candidate: A well-studied compound with sufficient evidence of potency, selectivity, safety, and other drug-like properties. Drug candidates become registered drugs only after thorough clinical trials (usually involving thousands of volunteers, many years of work, and investment on the order of billions of dollars).

Synthetic Route Design

Given a promising drug candidate, the synthetic route is planned and designed to find more efficient processes that yield the product from suitable starting materials.


Retrosynthesis is a recursive method for designing a synthesis pathway for a target organic molecule. The target is split into simpler precursors, and each precursor is split in the same manner until all the building blocks are commercially available.
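The recursion described above can be sketched in a few lines. This is a toy illustration, not a real planning algorithm: the disconnection table, the molecule names and the set of purchasable building blocks are all hypothetical placeholders, and a practical tool would have to choose among many competing disconnections.

```python
from typing import Dict, List, Set

# Hypothetical one-step disconnections: target -> simpler precursors.
DISCONNECTIONS: Dict[str, List[str]] = {
    "amide_product": ["carboxylic_acid", "amine"],
    "carboxylic_acid": ["ester"],
}

# Hypothetical set of commercially available building blocks.
PURCHASABLE: Set[str] = {"amine", "ester"}


def retrosynthesize(target: str, depth: int = 0, max_depth: int = 10) -> dict:
    """Recursively split `target` until the leaves are purchasable or stuck."""
    if target in PURCHASABLE:
        return {"molecule": target, "purchasable": True, "precursors": []}
    if depth >= max_depth or target not in DISCONNECTIONS:
        # Dead end: no known disconnection, or the search went too deep.
        return {"molecule": target, "purchasable": False, "precursors": []}
    children = [
        retrosynthesize(p, depth + 1, max_depth) for p in DISCONNECTIONS[target]
    ]
    return {"molecule": target, "purchasable": False, "precursors": children}


def building_blocks(node: dict) -> List[str]:
    """Collect the leaf molecules of a retrosynthesis tree."""
    if not node["precursors"]:
        return [node["molecule"]]
    leaves: List[str] = []
    for child in node["precursors"]:
        leaves.extend(building_blocks(child))
    return leaves


def is_solved(node: dict) -> bool:
    """A route is solved when every leaf is commercially available."""
    if not node["precursors"]:
        return node["purchasable"]
    return all(is_solved(child) for child in node["precursors"])


if __name__ == "__main__":
    tree = retrosynthesize("amide_product")
    print(building_blocks(tree), is_solved(tree))
```

The `max_depth` guard is a design choice of this sketch: it keeps the recursion finite even if the disconnection table contains a cycle.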


Bioisosteres are chemical substituents or groups with similar physical or chemical properties that produce broadly similar biological effects. Bioisosteric replacement is mainly applied when the parent compound has otherwise ideal drug-like characteristics but is unsuitable in terms of safety, bio-availability, etc.

Reaction Output Prediction

Predict the product(s) from the given reactants. To achieve this goal, reaction rules and preferred reaction sites need to be followed and studied. Organic reactions are rarely clean: a system can react in different ways (e.g. substitution and elimination reactions share very similar reagents and reaction conditions) and at different sites (e.g. a reagent can contain two hydroxyl groups, and changing the reaction conditions can shift which one reacts preferentially).

Patent recognition/Literature information extraction

Collections of patent filings and literature contain a plethora of information that can guide drug development. At the same time, novel compounds need to avoid infringing existing intellectual property. Chemical patents are quite intricate, and understanding them requires sophisticated training and significant time. Automated information extraction via computer vision or natural language processing is therefore highly valuable, although evaluating it remains a challenge.
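As a very small illustration of automated extraction, the sketch below pulls reported activity values out of free text with a regular expression. It is a deliberately simple stand-in for the natural language processing mentioned above, not a realistic patent-mining pipeline: the pattern, the supported units and the example sentence are all assumptions made for this illustration.

```python
import re
from typing import List, Tuple

# Assumed pattern: "<IC50|EC50|Ki|Kd> (= | of | :) <number> <unit>".
ACTIVITY_PATTERN = re.compile(
    r"\b(?P<kind>IC50|EC50|Ki|Kd)\s*(?:=|of|:)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>nM|uM|µM|mM)\b"
)


def extract_activities(text: str) -> List[Tuple[str, float, str]]:
    """Return (measurement type, value, unit) for every match in `text`."""
    return [
        (m.group("kind"), float(m.group("value")), m.group("unit"))
        for m in ACTIVITY_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    # Hypothetical sentence, written for this example.
    sentence = (
        "Compound 7 showed an IC50 of 12.5 nM, "
        "whereas compound 9 had EC50 = 3 uM."
    )
    for record in extract_activities(sentence):
        print(record)
```

A pattern like this misses values reported in tables, images or unusual phrasing, which is one reason evaluating extraction quality is hard.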

Property Prediction for Ligands

A drug-like ligand requires favourable properties with respect to pharmacokinetics (it can reach the target properly) and pharmacodynamics (it can act on the target properly). These properties are examined closely and separately below. In addition, medicinal chemists have come up with a few straightforward, system-independent guidelines for drug design that filter out risky molecules before downstream development. Below are some classic examples:

  • Lipinski's rule of five (RO5) / beyond rule of five (bRO5): The most classic drug-likeness guideline, although these rules are questioned all the time. See Rule of five in 2015 and beyond: Target and ligand structural limitations, ligand chemistry structure and drug discovery project decisions

  • Pan-assay interference compounds (PAINS): Some molecules show a positive signal in almost any high-throughput screen, but so far no one has successfully turned them into registered drugs. Such molecules share some intricate similarities, but how to describe them precisely remains an open problem.
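The rule of five mentioned above is simple enough to express directly in code. The sketch below uses the commonly cited thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors) and the usual convention of tolerating one violation. The class and function names are hypothetical, and the descriptor values are assumed to be computed elsewhere (for example with a cheminformatics toolkit).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (assumed to come from another tool)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Apply the usual convention that at most one violation is tolerated."""
    return ro5_violations(d) <= max_violations


if __name__ == "__main__":
    # Approximate, illustrative values for a small aspirin-like molecule.
    small = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
    # Hypothetical large, greasy molecule.
    large = Descriptors(mol_weight=1200.0, logp=7.0, h_bond_donors=6, h_bond_acceptors=12)
    print(passes_ro5(small), passes_ro5(large))
```

Such a filter is a coarse triage step, not a verdict: as noted above, the rule is regularly questioned, and many approved drugs fall outside it.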

  • dosage form: the form in which a drug is marketed for use, e.g. capsule, syrup, injection, etc.

  • excipient: an inactive component of a drug product. Excipients may be used to make the drug stable, soluble or absorbable, or to change its absorption behavior (e.g. controlled-release techniques); some new excipients (such as liposomes) are intended to change the distribution behavior of the drug.

  • pharmaceutical formulation design: choosing the proper dosage form, finding a suitable combination of excipients and their ratio, and deciding on the manufacturing protocol.

  • crystal structure of drugs: the arrangement of the molecules in a crystal. It can affect the physical and (rarely) the chemical properties of the drug product.


Bioavailability refers to the fraction (%) of an administered drug that reaches systemic circulation.
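In practice this fraction is commonly estimated by comparing the area under the plasma concentration-time curve (AUC) after the route of interest with the AUC after intravenous dosing, normalised by dose (absolute bioavailability). The sketch below illustrates that calculation; the function names and the concentration data are hypothetical, and the trapezoidal AUC ignores any extrapolation beyond the last sampling time.

```python
from typing import Sequence


def auc_trapezoid(times: Sequence[float], concs: Sequence[float]) -> float:
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concs):
        raise ValueError("times and concs must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (concs[i] + concs[i - 1]) * (times[i] - times[i - 1])
    return total


def absolute_bioavailability(
    auc_oral: float, dose_oral: float, auc_iv: float, dose_iv: float
) -> float:
    """F = (AUC_oral / dose_oral) / (AUC_iv / dose_iv), as a fraction."""
    return (auc_oral / dose_oral) / (auc_iv / dose_iv)


if __name__ == "__main__":
    # Made-up sampling times (h) and plasma concentrations (mg/L).
    t = [0.0, 1.0, 2.0, 4.0, 8.0]
    c_oral = [0.0, 3.0, 4.0, 2.0, 0.5]
    c_iv = [10.0, 7.0, 5.0, 2.5, 0.6]
    f = absolute_bioavailability(
        auc_trapezoid(t, c_oral), 100.0, auc_trapezoid(t, c_iv), 100.0
    )
    print(f"F = {100 * f:.1f} %")
```

Normalising by dose allows the oral and intravenous experiments to use different doses, provided the pharmacokinetics are roughly linear over that range.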


Druggability describes whether a protein is suitable to be a drug target. Studies on druggability don't focus on 'determining a good target', but rather on 'how to filter out the unsuitable/difficult targets'. There are two tangible sub-questions in this topic: (1) whether a target can be modulated by small molecules (i.e. whether it contains a suitable pocket, covalent modification site, etc. for molecule binding); (2) whether inhibiting / activating the target causes downstream (at least cellular-level) changes, rather than being counteracted and cancelled out by intracellular homeostasis.

Druggability Prediction for Receptors

Determine whether a certain kind of receptor could serve as a drug target, i.e. whether it can be specifically addressed by a drug to act on a certain disease. It is also common to test whether newly identified receptors could be targeted by existing drugs for new therapeutic purposes (drug re-purposing/repositioning).


A pharmacophore is (according to IUPAC) an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response. In short: when we perform a QSAR study, we treat a part of a molecule as a group and describe it by its chemical properties (charge, hydrophobicity, aromaticity, steric hindrance, etc.). Such a group is 'scored' and replaced as a whole.

Prediction of structure-activity relationship

Structure-activity relationship (SAR) is the relationship between the structure of a molecule and its biological activity. It tries to answer two questions: 1. which parts of a bioactive compound, or which combinations of these parts, matter (a certain pharmacophore in a certain topology / geometry); 2. how to modify a molecule according to the information gained above (e.g. infer a stronger pharmacophore / scaffold to replace the old one).

Molecule Generation

Designing a new chemical entity that satisfies all of the demands above (it has ideal properties, can be synthesised easily, and hasn't been patented) is considered the holy grail of drug discovery. Since the prediction of these properties is still underdeveloped, there is a long way to go before this ambition is realised. However, today's generative chemistry models can already play a practical role in settings like library generation (generating molecules that are at least novel and 'drug-like') and conditional design (generating molecules that satisfy certain explicit constraints). For more information, see Generative Models for De Novo Drug Design.

Formulation Design

Design of the optimal form of a drug based on the effective compound. For further reading, see: 1. State-of-the-Art Review of Artificial Neural Networks to Predict, Characterize and Optimize Pharmaceutical Formulation; 2. Crystal structures of drugs: advances in determination, prediction and engineering

Regenerative medicine

Regenerative medicine seeks ways to replace tissues or organs damaged by disease, trauma, or congenital issues, in contrast to traditional clinical approaches that focus only on alleviating or treating the symptoms.


[1] Jakob Schneider, Ksenia Korshunova, Francesco Musiani, Mercedes Alfonso-Prieto, Alejandro Giorgetti, and Paolo Carloni. Predicting ligand binding poses for low-resolution membrane protein models: Perspectives from multiscale simulations. Biochemical and Biophysical Research Communications, 498(2):366–374, 2018. Multiscale Modeling.