Different types of statistical models are used in analyzing engineering data. The accuracy of such an analysis depends on whether the chosen model is appropriate; choosing the wrong model can lead to serious errors. Several of these equations and statistical tools were applied in testing the performance and reliability of an air compressor. The investigation shows that machinery failures at Indorama Petrochemical Company result from continuous running without inspection, poor lubrication, and poor implementation of preventive maintenance programs.
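One statistical model commonly applied in such reliability studies is the two-parameter Weibull distribution fitted to time-between-failure data. The sketch below is a minimal illustration under stated assumptions: the failure times are hypothetical, not taken from the Indorama investigation, and SciPy's `weibull_min` is used for the fit.

```python
# A minimal sketch of one common reliability model: fitting a two-parameter
# Weibull distribution to compressor time-between-failure data. The failure
# times below are hypothetical, not taken from the Indorama study.
import numpy as np
from scipy import stats

# Hypothetical times between failures, in operating hours
tbf = np.array([310.0, 450.0, 520.0, 610.0, 700.0, 820.0, 990.0, 1200.0])

# Fit the Weibull model (fix location at 0 so only shape and scale are estimated)
shape, loc, scale = stats.weibull_min.fit(tbf, floc=0)

# Mean time between failures implied by the fitted distribution
mtbf = stats.weibull_min.mean(shape, loc=loc, scale=scale)

# Reliability at t = 500 h: R(t) = 1 - F(t)
r_500 = stats.weibull_min.sf(500.0, shape, loc=loc, scale=scale)

print(f"shape (beta) = {shape:.2f}, scale (eta) = {scale:.0f} h")
print(f"estimated MTBF = {mtbf:.0f} h, R(500 h) = {r_500:.2f}")
```

A fitted shape parameter greater than 1 would point to wear-out failures, which would be consistent with degradation from continuous running without inspection.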
Associations between MVPA and health outcomes are typically tested without accounting for time spent in competing behaviours [4,5,6]. Although there have been efforts to examine combinations of behaviours concomitantly [7], complementary movement behaviours are usually included as separate variables in a regression model [8, 9]. This more traditional approach has been criticized [10,11,12] for ignoring the co-dependency of movement behaviours over a 24-hour period [10, 13]. Recently, Pedisic addressed this limitation using the Activity Balance conceptual model [10]. He proposed applying compositional data analysis (CoDA) regression, a novel statistical method that accounts for the co-dependent nature of these behaviours. In contrast to traditional regression methods, CoDA would enable the calculation of the relative contribution of each behaviour to a health outcome while also accounting for the 24-hour constraint for all behaviours combined.
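To make the contrast with traditional regression concrete, here is a minimal sketch of a compositional regression under stated assumptions: four daily behaviours (sleep, sedentary time, light activity, MVPA) that sum to 1440 minutes, a simulated health outcome, and a hand-rolled isometric log-ratio (ilr) transform rather than a dedicated CoDA library.

```python
# A minimal sketch of compositional regression: daily minutes in four
# behaviours sum to 1440, so the composition is mapped to isometric
# log-ratio (ilr) coordinates before ordinary regression. Data are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated 24-h compositions (200 participants x 4 behaviours), rows sum to 1440
parts = rng.dirichlet([20, 18, 8, 2], size=200) * 1440

def ilr(x):
    """Isometric log-ratio (pivot) coordinates of a compositional matrix."""
    d = x.shape[1]
    coords = []
    for i in range(d - 1):
        # geometric mean of the remaining parts
        gm = np.exp(np.log(x[:, i + 1:]).mean(axis=1))
        coords.append(np.sqrt((d - i - 1) / (d - i)) * np.log(x[:, i] / gm))
    return np.column_stack(coords)

z = ilr(parts)

# Hypothetical health outcome (e.g., BMI) driven by the first ilr coordinate
y = 25 - 0.8 * z[:, 0] + rng.normal(0, 1, size=200)

model = LinearRegression().fit(z, y)
print("ilr coefficients:", model.coef_)
```

Because the ilr coordinates encode relative rather than absolute time use, the fitted coefficients describe how reallocating time among behaviours relates to the outcome while respecting the 24-hour constraint.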
The development of computational methods to predict three-dimensional (3D) protein structures from the protein sequence has proceeded along two complementary paths that focus on either the physical interactions or the evolutionary history. The physical interaction programme heavily integrates our understanding of molecular driving forces into either thermodynamic or kinetic simulation of protein physics [16] or statistical approximations thereof [17]. Although theoretically very appealing, this approach has proved highly challenging for even moderate-sized proteins due to the computational intractability of molecular simulation, the context dependence of protein stability and the difficulty of producing sufficiently accurate models of protein physics. The evolutionary programme has provided an alternative in recent years, in which the constraints on protein structure are derived from bioinformatics analysis of the evolutionary history of proteins, homology to solved structures [18, 19] and pairwise evolutionary correlations [20,21,22,23,24]. This bioinformatics approach has benefited greatly from the steady growth of experimental protein structures deposited in the Protein Data Bank (PDB) [5], the explosion of genomic sequencing and the rapid development of deep learning techniques to interpret these correlations. Despite these advances, contemporary physical and evolutionary-history-based approaches produce predictions that are far short of experimental accuracy in the majority of cases in which a close homologue has not been solved experimentally, and this has limited their utility for many biological applications.
Dietary pattern analysis is a promising approach to understanding the complex relationship between diet and health. While many statistical methods exist, the literature predominantly focuses on classical methods such as dietary quality scores, principal component analysis, factor analysis, clustering analysis, and reduced rank regression. However, some emerging methods have rarely or never been adequately reviewed or discussed.
This paper presents a landscape review of the existing statistical methods used to derive dietary patterns, especially the finite mixture model (FMM), treelet transform (TT), data mining (DM), the least absolute shrinkage and selection operator (LASSO), and compositional data analysis, in terms of their underlying concepts, advantages and disadvantages, and the software and packages available for implementation.
In the past few decades, statistical methods have emerged that make full use of dietary information collected across populations to create dietary patterns [2, 4, 8]. In nutritional epidemiology studies, regardless of the statistical method used for dietary pattern analysis, the goal is to explore the relationship between dietary patterns and health outcomes [2, 3]. From this perspective, the evaluation of a method depends not only on whether the dietary patterns it derives comprehensively reflect dietary preferences but also on whether these patterns can predict diseases more accurately and promote health.
The majority of published reviews divide the statistical methods for dietary pattern analysis into three categories widely used in nutritional epidemiology: investigator-driven, data-driven, and hybrid methods [2, 3, 8,9,10]. Additionally, several emerging methods that have seldom or never been adequately reviewed have been applied to dietary pattern analyses. To present these methods more clearly, we classify them according to the existing categories and add a new category.
This paper provides an updated landscape review of these methods based on their underlying concepts, strengths, limitations, and commonly used software packages, paying particular attention to emerging methods. The subsequent content covers the following aspects: (1) investigator-driven methods, containing various dietary scores and dietary indexes; (2) data-driven methods, comprising principal component analysis (PCA), factor analysis, traditional cluster analysis (TCA), FMM, and TT; (3) hybrid methods, consisting of reduced rank regression (RRR), DM, and LASSO; and (4) compositional data analysis, including compositional principal component coordinates, balance coordinates, and principal balances. To conclude, we compare and evaluate these methods, identify the remaining methodological issues, and provide suggestions for future research.
The dietary guidelines and recommendations used to construct dietary quality scores are primarily based on scientific evidence from health and disease prevention studies. These scores can be used to describe overall dietary characteristics and to replicate or compare results across populations. Many dietary quality scores show significant associations with disease and mortality outcomes. The total score is easy to understand and use, and the summing process is simpler than in other statistical methods for dietary pattern analysis.
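As a minimal illustration of the summing process, the sketch below computes a toy dietary quality score; the food groups, cutoffs, and components are hypothetical, not those of any published index.

```python
# A minimal sketch of computing a dietary quality score by summing
# component scores. The food groups and cutoffs are hypothetical
# illustrations, not any published index.
import pandas as pd

# Hypothetical daily servings per participant
intake = pd.DataFrame({
    "vegetables": [4.0, 1.5, 2.5],
    "fruit":      [2.0, 0.5, 3.0],
    "red_meat":   [0.5, 2.0, 1.0],
})

def score_row(row):
    score = 0
    score += 1 if row["vegetables"] >= 3 else 0  # adequacy component
    score += 1 if row["fruit"] >= 2 else 0       # adequacy component
    score += 1 if row["red_meat"] <= 1 else 0    # moderation component
    return score

intake["diet_score"] = intake.apply(score_row, axis=1)
print(intake)
```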
In nutritional epidemiological studies, data-driven methods derive dietary intake patterns from population data through dimensionality-reduction techniques. These methods obtain dietary patterns from existing data collected with food frequency questionnaires, 24-h recall questionnaires, or dietary records rather than from predefined dietary guidelines [2, 3, 50].
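A minimal sketch of this data-driven workflow, using PCA on a simulated participants-by-food-groups intake matrix (the food groups and data are illustrative only):

```python
# A minimal sketch of a data-driven approach: PCA on food-group intakes
# from an FFQ-style matrix. The food groups and data are simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
food_groups = ["vegetables", "fruit", "whole_grains", "red_meat", "sweets"]

# Simulated intake frequencies (200 participants x 5 food groups)
X = rng.gamma(shape=2.0, scale=1.0, size=(200, len(food_groups)))

# Standardize, then extract components; the loadings define the patterns
Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Xs)

for i, comp in enumerate(pca.components_):
    top = sorted(zip(food_groups, comp), key=lambda t: abs(t[1]), reverse=True)
    print(f"pattern {i + 1}: {[(g, round(w, 2)) for g, w in top]}")

# Participant scores on each pattern, usable in later regression models
scores = pca.transform(Xs)
```

The component loadings define the dietary patterns, and the participant scores can then be carried into regression models against health outcomes.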
Unlike exploratory factor analysis (EFA), confirmatory factor analysis (CFA) is seldom used in nutritional epidemiology [52]. However, CFA can impose statistical tests on the factor structure and the factor loadings of food groups and determine the number of factors and the food groups contributing significantly to those factors [2, 8]. In the past, CFA was applied as a second step to verify the goodness of fit and reproducibility of the factor structure of dietary patterns after PCA or EFA in the first step [9, 53, 54]. However, it remains uncertain whether the results are better than those obtained with EFA alone [54]. Therefore, several studies have used CFA as a one-step approach to replace PCA or EFA [52, 55]. The advantage of CFA is that a latent variable model can be specified and tested, and additional a priori knowledge can be incorporated into the model [55].
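As an illustration of the one-step approach, the sketch below specifies and fits a two-factor CFA model, assuming the semopy package (one of several Python structural equation modelling libraries) and a hypothetical a priori pattern structure; neither the food groups nor the file name comes from the cited studies.

```python
# A minimal sketch of one-step CFA for dietary patterns, assuming the
# semopy package and a DataFrame of standardized food-group intakes.
# The pattern structure is a hypothetical a priori specification.
import pandas as pd
import semopy

# food_groups.csv is assumed to hold columns: veg, fruit, grains, meat, sweets
data = pd.read_csv("food_groups.csv")

# Two hypothesized latent dietary patterns, in lavaan-style syntax
desc = """
healthy =~ veg + fruit + grains
western =~ meat + sweets
"""

model = semopy.Model(desc)
model.fit(data)

print(model.inspect())           # factor loadings and their tests
print(semopy.calc_stats(model))  # fit indices for the specified structure
```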
There is no single method for identifying the number of clusters or an appropriate clustering algorithm [68, 69]. One approach is to combine several methods: for example, after factor analysis, hierarchical clustering is used to identify an appropriate value of k and reasonable initial cluster centers, minimizing the influence of subjective judgment on the clustering results [68, 70]. The other approach is the optimal clustering method, in which several different values of k are tried and quantitative indicators for these values are compared to select the optimal k [8, 71]. The selection of the clustering algorithm depends mainly on the stability of the clusters and their reproducibility, which are often evaluated with the split-half cross-validation method or a classifier [64, 72]. The most appropriate clustering algorithm is the one with the highest reproducibility and stability.
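A minimal sketch of the optimal clustering idea, comparing candidate k values with one common quantitative indicator (the silhouette width); the intake matrix here is simulated:

```python
# A minimal sketch of "optimal clustering": try several k values and
# compare a quantitative indicator (here, silhouette width) to pick k.
# X stands in for a standardized participants-by-food-groups matrix.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))  # simulated standardized intakes

best_k, best_sil = None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sil = silhouette_score(X, labels)
    print(f"k = {k}: silhouette = {sil:.3f}")
    if sil > best_sil:
        best_k, best_sil = k, sil

print(f"selected k = {best_k}")
```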
There are, however, a few drawbacks. First, each individual is assigned to a cluster with a probability of 1 or 0, without considering the uncertainty of individual classification [73]. Second, the researcher must make several subjective decisions, such as the selection of the food groupings, the clustering algorithm used to determine the similarity of individuals, the initial values, and the number of clusters. Although some relatively objective methods for selecting clustering algorithms and the number of clusters exist, the reproducibility of results cannot ensure their validity [64]. Third, there is no convenient method for comparing different clustering criteria [74]. Finally, the use of a control group and unequal sample sizes across clusters will limit the power of the statistical analysis [75].
The choice of k values or models can thus be transformed into a statistical model selection problem. The final model is identified according to the Bayesian information criterion (BIC) after the FMM has been fitted with different k values or with different restrictions imposed on the covariance matrices [78]. The FMM is more flexible than TCA: it can account for the within-class correlation between variables [63], allow the variances of food consumption frequencies to vary within and between clusters, and enable covariate adjustment for food intake (e.g., energy intake and age) simultaneously with the fitting process [74, 77].
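A minimal sketch of this selection procedure, assuming Gaussian mixtures fitted with scikit-learn and simulated intake data; note that in scikit-learn's convention a lower BIC is better:

```python
# A minimal sketch of FMM-style model selection: fit Gaussian mixtures for
# several k values and covariance structures, then keep the model with the
# best (lowest, in scikit-learn's convention) BIC. X stands in for a
# participants-by-food-groups intake matrix; data here are simulated.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))  # simulated standardized intakes

best_bic, best_model = np.inf, None
for k in range(2, 7):
    for cov in ("full", "diag"):  # different covariance restrictions
        gm = GaussianMixture(n_components=k, covariance_type=cov,
                             random_state=0).fit(X)
        bic = gm.bic(X)
        if bic < best_bic:
            best_bic, best_model = bic, gm

print(f"selected k = {best_model.n_components}, "
      f"covariance = {best_model.covariance_type}, BIC = {best_bic:.1f}")

# Unlike TCA, each individual gets a posterior membership probability
posterior = best_model.predict_proba(X)
```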