Case Studies

Sample size estimation for bioequivalence trials through simulation

Sample size estimation for clinical trials is challenging when hypothesis tests involve comparisons across multiple endpoints and/or treatments. This situation commonly arises in biosimilar trials, where pharmacokinetic parameters of interest, bioequivalence criteria and reference products may differ between regulatory bodies. We developed the simsamplesize R package to facilitate sample size estimation for Phase 1 randomized bioequivalence trials where multiple hypotheses, treatments and/or correlated endpoints are being investigated. Unlike deterministic approaches commonly used in existing R packages and sample size software, simsamplesize focuses on simulation-based sample size estimations. This enables researchers to address the complexities of evaluating multiple hypotheses across diverse (co-)primary endpoints, as is commonly the case in biosimilar trials.


  • Evaluation of multiple treatment arms
  • Evaluation of multiple (co-)primary endpoints
  • Configuration of distributional assumptions
  • Customization of trial success criteria
  • Adjustment for multiplicity
  • Empirical assessment of power and type-I error
  • Generation of R code & reports
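
To make the simulation-based idea concrete, here is a minimal sketch in plain Python. It is not the simsamplesize interface. It assumes a parallel two-arm design, two correlated endpoints analysed on the log scale (think of them as AUC and Cmax), the conventional 80-125% equivalence limits, and a normal approximation to the 90% confidence interval in place of the usual t-based interval. All function names, effect sizes, standard deviations and the correlation are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

# Conventional bioequivalence limits, expressed on the log scale
LOWER, UPPER = math.log(0.80), math.log(1.25)
# Two one-sided tests at alpha = 0.05 correspond to a 90% confidence interval
Z90 = NormalDist().inv_cdf(0.95)


def simulate_arm(n, means, sds, rho, rng):
    """Draw n subjects with two correlated log-scale endpoints."""
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        rows.append((means[0] + sds[0] * z1, means[1] + sds[1] * z2))
    return rows


def endpoint_passes(test, ref):
    """True if the approximate 90% CI of the log-ratio lies inside the limits."""
    diff = mean(test) - mean(ref)
    se = math.sqrt(variance(test) / len(test) + variance(ref) / len(ref))
    return LOWER < diff - Z90 * se and diff + Z90 * se < UPPER


def estimate_power(n_per_arm, log_ratio=(math.log(0.95), math.log(0.95)),
                   sds=(0.25, 0.30), rho=0.6, n_sim=2000, seed=1):
    """Share of simulated trials in which every co-primary endpoint passes."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_sim):
        ref = simulate_arm(n_per_arm, (0.0, 0.0), sds, rho, rng)
        test = simulate_arm(n_per_arm, log_ratio, sds, rho, rng)
        if all(endpoint_passes([row[k] for row in test],
                               [row[k] for row in ref]) for k in range(2)):
            successes += 1
    return successes / n_sim


def smallest_n(target=0.80, candidates=range(10, 101, 10), **kwargs):
    """First candidate sample size per arm whose simulated power meets the target."""
    for n in candidates:
        if estimate_power(n, **kwargs) >= target:
            return n
    return None
```

Calling smallest_n() walks through the candidate sample sizes and returns the first one whose simulated power reaches the target. Because a trial only counts as a success when both endpoints pass, the assumed correlation between endpoints feeds directly into the estimate.
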
Case Study

A mid-sized global biopharmaceutical company (“Sponsor”) wanted to design its Phase 1 biosimilars program more strategically in order to broaden the asset’s market potential. Sponsor found that with a more strategic design, specifically simulation studies for sample size estimation, the Phase 1 program could meet all market requirements in a single trial and be in a much stronger position to gain multiple market approvals. This required highly advanced clinical trial modeling and simulation. Sponsor looked for existing solutions but, despite a thorough review, found the available tools too simplistic to accommodate the program’s needs. The team considered its options:


  • Use internal resources: All existing resources were otherwise occupied, and the required skills were not readily available, possibly not available internally at all.
  • Hire FTE/s: The business unit’s operating model was to rely 60-70% on outsourcing, so adding an FTE for a single, 1.5-year, part-time project was not an efficient option. There were additional challenges as well: the administrative burden of managing the hire, the HR burden of finding talent, training, etc.
  • Use one of the two full-service CROs already engaged: Either CRO would have required more team members, each with only part of the skills required. It would also have required project management resources, an additional layer that is not needed with SDAS. “There is a correlation between size, agility, speed and cost.”
  • Selected: "The Best in the field of simulation studies for clinical trials, Thomas Debray, SDAS"
Drug Sponsor Director of Biostatistics
Traditional statistical power meant more patients, a lengthier, more expensive trial, and possibly lower and delayed revenue potential.
We knew existing tools were too simple. We knew the complexity we wanted. SDAS got us there.
Our Head of R&D and our Heads of Clinical Development and Operations are very proud of what we accomplished with SDAS.
Such an elegant solution, even though on the surface it looks simple.
SDAS designed it to be extremely user friendly. It can be used on other trials, and also by non-statisticians.
In the simpler, traditional trial design scenario, it's super easy to do: just plug some parameters into one of many software programs and out pops a number, the sample size. However, without the expertise and the possibility to add more layers of complexity, one has to rely on strong assumptions. But nobody really wants to work under strong assumptions in drug development. Nor should that be necessary in modern drug development, thanks to advanced clinical trial methodology.

Precision Medicine

We understand the importance of having a clear understanding of your data to make informed decisions and drive growth. Our approach is tailored to your specific needs, to help you unlock the full potential of your data. With cutting-edge methods and a deep understanding of the latest developments in your field, we'll work with you to make sense of your data and drive your business forward. Stay ahead of the curve and achieve your goals with our tailored and rigorous approach designed to meet your unique needs.

precmed was developed to help researchers with the implementation of precision medicine in R. A key objective of precision medicine is to determine the optimal treatment separately for each patient instead of applying a common treatment to all patients. Personalizing treatment decisions becomes particularly relevant when treatment response differs across patients, or when patients have different preferences about benefits and harms. This package offers statistical methods to develop and validate prediction models for estimating individualized treatment effects. These treatment effects are also known as conditional average treatment effects (CATEs) and describe how different subgroups of patients respond to the same treatment. Presently, precmed focuses on the personalization of two competing treatments using randomized data from a clinical trial (Zhao et al. 2013) or using real-world data (RWD) from a non-randomized study (Yadlowsky et al. 2020).
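
As an illustration of what a CATE is, the sketch below fits one regression line per treatment arm on simulated randomized data and takes the difference between the two fitted lines as the estimated treatment effect at a given covariate value. This two-regression approach is only one simple way to estimate a CATE and is not necessarily what precmed implements. The data-generating model, the single biomarker x and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def simulate_trial(n, rng):
    """Hypothetical 1:1 randomized trial where the true CATE is 2 * x - 0.5."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 1)          # biomarker value
        treated = rng.random() < 0.5   # randomized assignment
        y = 1.0 + 0.5 * x + (2.0 * x - 0.5) * treated + rng.gauss(0, 0.5)
        data.append((x, treated, y))
    return data


def fit_cate(data):
    """Fit one line per arm; the CATE estimate is the gap between the lines."""
    a1, b1 = fit_line([x for x, t, _ in data if t],
                      [y for _, t, y in data if t])
    a0, b0 = fit_line([x for x, t, _ in data if not t],
                      [y for _, t, y in data if not t])
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


cate = fit_cate(simulate_trial(2000, random.Random(42)))
```

In this simulated example, cate(x) is negative for patients with a low biomarker value and positive for those with a high value, so the estimated optimal treatment differs between the two groups.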

Precision medicine, also known as personalized medicine, is a rapidly growing field that aims to provide individualized treatment decisions based on a patient's unique genetic, biochemical, and medical profile. Precision medicine recognizes that each patient is unique and that their medical needs and responses to treatments can be different. By taking these differences into account, precision medicine offers the potential for more effective and efficient treatments, and improved health outcomes for patients.

The importance of precision medicine can be explained by several key benefits:

  1. More effective treatments: Precision medicine allows healthcare providers to identify the most effective treatment for each individual patient based on their unique genetic and biological characteristics. This leads to better treatment outcomes and improved patient satisfaction.
  2. Reduced side effects: By tailoring treatments to each patient's specific needs, precision medicine can help reduce the risk of adverse side effects, which can improve quality of life for patients.
  3. Improved patient outcomes: By providing the right treatment to the right patient at the right time, precision medicine has the potential to improve patient outcomes, including survival rates and overall health status.
  4. Better use of resources: Precision medicine can lead to a more efficient use of healthcare resources by avoiding the use of ineffective treatments and reducing the risk of adverse side effects.

In conclusion, precision medicine represents a major step forward in the field of healthcare, delivering customized solutions that meet the unique needs of each patient. We have developed an R package to help researchers implement precision medicine in practice. The package is described in more detail here ??

Meta-Analysis of Diagnosis and Prognosis Research Studies

Meta-analysis of diagnostic and prognostic modeling studies. Summarize estimates of prognostic factors, diagnostic test accuracy and prediction model performance. Validate, update and combine published prediction models. Develop new prediction models with data from multiple studies.

  • R package available from CRAN and R-Forge.
  • Maintained by Thomas Debray & Valentijn de Jong.
  • Data preparation for systematic reviews of prediction model performance via ccalc and oecalc (Debray et al. 2017, 2018 and Snell et al. 2017).
  • Meta-analysis of prediction model performance via valmeta (Debray et al. 2017, 2018 and Snell et al. 2017).
  • Evaluation of funnel plot asymmetry and publication bias via fat (Debray et al. 2018).
  • Generation of forest plots via forest.
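
To show what a meta-analysis of prediction model performance involves, the sketch below pools c-statistics from several validation studies on the logit scale with a DerSimonian-Laird random-effects model. This is an assumption made for brevity: it does not reproduce valmeta or its default estimation method, the function name pool_cstat is hypothetical, and the standard errors are converted to the logit scale with the delta method.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def pool_cstat(cstats, ses):
    """Random-effects pooling of c-statistics (DerSimonian-Laird, logit scale).

    Returns (pooled estimate, lower 95% limit, upper 95% limit, tau squared),
    with tau squared reported on the logit scale.
    """
    theta = [logit(c) for c in cstats]
    # Delta method: var(logit c) is approximately (se / (c * (1 - c)))^2
    var = [(se / (c * (1 - c))) ** 2 for c, se in zip(cstats, ses)]

    # Fixed-effect weights and Cochran's Q
    w = [1 / v for v in var]
    fixed = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, theta))

    # Method-of-moments estimate of the between-study variance
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(theta) - 1)) / scale)

    # Random-effects weights, pooled estimate and Wald-type interval
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * ti for wi, ti in zip(w_re, theta)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (expit(pooled), expit(pooled - 1.96 * se),
            expit(pooled + 1.96 * se), tau2)
```

For example, pool_cstat([0.70, 0.78, 0.74, 0.81], [0.02, 0.03, 0.025, 0.04]) returns a summary c-statistic with its approximate 95% confidence interval; the input values here are made up for illustration.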

Multiple Imputation by Chained Equations with Multilevel Data

The micemd package provides methods to perform multiple imputation using chained equations in the presence of multilevel data. It includes imputation methods that account for both sporadically and systematically missing values of continuous, binary and count variables. Following the recommendations of Audigier et al. (2018), the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. The package also allows parallel computation for 'mice'.

  • R package available from CRAN and GitHub (maintained by Vincent Audigier).
  • Imputation of sporadically and systematically missing values in multilevel data via mice.impute.2l.2stage.bin (binary data), mice.impute.2l.2stage.norm (continuous data) and mice.impute.2l.2stage.pois (count data). See Audigier et al. 2018.
  • Imputation of univariate missing data using a Bayesian generalized linear mixed model with non-informative prior distributions via mice.impute.2l.glm.bin (binary data), mice.impute.2l.glm.norm (continuous data) and mice.impute.2l.glm.pois (count data). See Jolani, Debray et al. (2015) and Audigier et al. (2018).
  • Predictive mean matching imputation for multilevel data via mice.impute.2l.2stage.pmm (Audigier et al. 2018).

Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011). Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

  • R package available from CRAN and GitHub (maintained by Stef van Buuren).
  • Imputation of sporadically and systematically missing values in multilevel data via mice.impute.2l.lmer. See Jolani, Debray et al. (2015) and Audigier et al. (2018).
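
The core idea of Fully Conditional Specification, that each incomplete variable is imputed from its own model conditional on the other variables, can be sketched in a few lines of plain Python. This is a deliberately simplified illustration and not the MICE algorithm as implemented in the mice package: it handles only two continuous variables, uses a linear regression with added residual noise for each, and does not draw the regression parameters from their posterior as a proper multiple imputation would. All function names are hypothetical.

```python
import math
import random
from statistics import mean


def fit_line(xs, ys):
    """Least squares for y = a + b * x; returns (a, b, residual SD)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))


def impute_once(rows, rng, n_iter=10):
    """Return one completed copy of rows ([v0, v1] pairs, None = missing)."""
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]

    # Starting values: fill each gap with the observed mean of its variable
    for j in (0, 1):
        observed_mean = mean(row[j] for row in rows if row[j] is not None)
        for row in data:
            if row[j] is None:
                row[j] = observed_mean

    for _ in range(n_iter):
        for j in (0, 1):      # visit each incomplete variable in turn
            k = 1 - j         # the other variable acts as predictor
            pairs = [(row[k], row[j])
                     for row, miss in zip(data, missing) if not miss[j]]
            a, b, sd = fit_line([p for p, _ in pairs], [t for _, t in pairs])
            for row, miss in zip(data, missing):
                if miss[j]:
                    # Prediction plus noise, so imputations keep their spread
                    row[j] = a + b * row[k] + rng.gauss(0, sd)
    return data


def impute(rows, m=5, seed=0):
    """Create m completed datasets from the same incomplete data."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(m)]
```

Each call to impute() leaves the observed values untouched and returns m completed datasets that differ only in the imputed cells; the analysis of interest would then be run on each dataset and the results combined.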