The atomic pair distribution function (PDF) approach has revolutionized structural investigations of nanocrystalline and quasi-amorphous materials by X-ray and electron diffraction, making it possible to explore short-range order. However, ab initio crystal structure solution from the PDF remains largely out of reach, mainly because the crystallographic properties of the unit cell are difficult to determine. A two-step method for estimating crystal cell parameters directly from a PDF profile is presented: first, the crystal lattice type is inferred by applying machine-learning approaches to the PDF profile; second, the cell parameters are extracted by multivariate analysis combined with vector superposition techniques. The procedure has been validated on a large number of PDF profiles calculated from known crystal structures and on a small number of measured PDF profiles. The lattice determination step has been benchmarked through a comprehensive exploration of different classifiers and input data, with the highest performance obtained by a k-nearest neighbours classifier applied to whole PDF profiles. Descriptors calculated from the PDF profiles by recurrence quantification analysis produce results that can be interpreted in terms of PDF properties, and the contribution of each descriptor to the prediction is evaluated. The performance of the cell parameter extraction step depends on the cell metric rather than on the lattice type: for monometric, dimetric and trimetric cells, the top-1 estimate is correct 40, 20 and 5% of the time, respectively. Promising results are also obtained on real nanocrystals, where unit cells close to the true ones are found within the top-ranked solution for monometric cells and within the top six ranked solutions for dimetric cells, even in the presence of a crystalline impurity with a weight fraction of up to 40%.
Keywords: crystal cell parameters; crystal lattices; machine learning; multivariate analysis; nanocrystals; pair distribution functions; vector superpositions.
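To make the lattice determination step more concrete, the sketch below shows what a k-nearest neighbours classifier applied to whole PDF profiles can look like. It is a minimal illustration built on several assumptions, and it is not the authors' implementation. The distance metric (Euclidean on unit-normalised profiles), the value of k, the three orthogonal lattice classes and their parameter ranges are all illustrative choices. The helpers `toy_pdf`, `normalise`, `make_training_set` and `knn_predict` are hypothetical. In particular, `toy_pdf` is a simplified PDF-like curve for a one-atom primitive orthogonal lattice. It contains Gaussian peaks at the interatomic distances, weighted by 1/d, and omits scattering factors, the number-density term and instrumental damping, so it is not a full G(r) calculation.

```python
import numpy as np


def toy_pdf(a, b, c, r, rmax=10.0, sigma=0.1):
    """Hypothetical PDF-like profile for a one-atom primitive orthogonal lattice
    with edges (a, b, c): Gaussian peaks at every interatomic distance up to
    rmax, weighted by 1/d. Not a full G(r) calculation."""
    na, nb, nc = (int(np.ceil(rmax / x)) for x in (a, b, c))
    i, j, k = np.meshgrid(np.arange(-na, na + 1), np.arange(-nb, nb + 1),
                          np.arange(-nc, nc + 1), indexing="ij")
    d = np.sqrt((i * a) ** 2 + (j * b) ** 2 + (k * c) ** 2).ravel()
    d = d[(d > 0) & (d <= rmax)]  # drop the origin and distances beyond rmax
    peaks = np.exp(-((r[:, None] - d[None, :]) ** 2) / (2 * sigma ** 2)) / d[None, :]
    return peaks.sum(axis=1)  # one value per r sample


def normalise(profile):
    """Scale a profile to unit Euclidean norm (assumed preprocessing step)."""
    return profile / np.linalg.norm(profile)


def make_training_set(rng, n_per_class, r):
    """Build toy labelled profiles for three assumed lattice classes."""
    X, y = [], []
    for _ in range(n_per_class):
        a = rng.uniform(3.0, 5.0)
        cells = {
            "cubic": (a, a, a),
            "tetragonal": (a, a, a * rng.uniform(1.2, 1.6)),
            "orthorhombic": (a, a * rng.uniform(1.1, 1.3), a * rng.uniform(1.4, 1.7)),
        }
        for label, (ca, cb, cc) in cells.items():
            X.append(normalise(toy_pdf(ca, cb, cc, r)))
            y.append(label)
    return np.array(X), np.array(y)


def knn_predict(train_X, train_y, query, k=3):
    """Classify one whole profile by majority vote among its k nearest training
    profiles (Euclidean distance). Ties go to the alphabetically first label,
    because np.unique returns labels sorted and np.argmax picks the first max."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Usage: classify a toy profile that was not in the training set.
r = np.linspace(0.5, 10.0, 400)
X_train, y_train = make_training_set(np.random.default_rng(0), 30, r)
query = normalise(toy_pdf(4.2, 4.2, 5.9, r))  # tetragonal-like toy cell
print(knn_predict(X_train, y_train, query, k=5))
```

Because the classifier works on the whole sampled curve, it needs no hand-crafted features. The trade-off is that every profile must be sampled on the same r grid. Descriptors such as those from recurrence quantification analysis could be passed to the same `knn_predict` function in place of the raw profiles.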