This is an experiment in using optimization of random weighted combinations of criteria to identify the Pareto efficient frontier of a multicriterion decision problem. The specifics of the problem are simple. A number of binary decisions are to be made, without constraints, and with each decision evaluated independently on a set of criteria.

In what follows, decisions will be labeled “D1”, “D2”, … and criteria will be labeled “C1”, “C2”, … To facilitate computations, we assume that all criteria are to be maximized. (Criteria that should be minimized will be negated.) We also assume that all criteria are normalized to either the interval [0, 1] or the interval [-1, 0].


We first load any required libraries.
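The chunk below assumes the tidyverse meta-package, since the notebook uses tibble, dplyr, purrr and ggplot2 throughout; loading tidyverse covers all of them.

```r
# Load the tidyverse (tibble, dplyr, purrr, ggplot2, ...).
library(tidyverse)
```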


Data Generation


Specify the number of decisions and the number of criteria here. Bear in mind that the solution space will have cardinality 2^(# of decisions), so do not go crazy. Also specify a seed for the random number generator, for reproducibility, and the number of attempts to make with the random solution identifier.

nDecisions <- 10     # number of decisions
nCriteria <- 5       # number of criteria
nTrials <- 1000000   # number of attempts to randomly identify Pareto solutions
criterionNames <- paste0("C", 1:nCriteria)  # name the criteria
seed <- 21419        # random number seed

Seed the random number generator.
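A minimal chunk, using the `seed` value chosen above:

```r
set.seed(seed)
```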


After all computations are done, the notebook will plot selected two-dimensional slices of the Pareto frontier. List pairs of criterion indices for the desired plots.

frontierPlots <- list(c(1, 2), c(3, 4), c(2, 5), c(1, 3))  # pairs of criteria to plot


We create a tibble of decisions and their criterion values. The criterion values will be randomly generated (uniform over (0, 1)), and some (but not all) will be reversed to give a mix of “more is better” and “less is better” criteria.

# Create the tibble.
decisions <- tibble(ID = paste0("D", 1:nDecisions))
# Set the criterion names.
cNames <- paste0("C", 1:nCriteria)
# Add random variables for the criteria.
for (v in cNames) {
  decisions[, v] <- runif(nDecisions)
}
# Decide how many criteria to invert (at least 1, at most nCriteria - 1).
nInvert <- 1 + rbinom(1, nCriteria - 2, 0.5)
# Select the criteria to invert.
invert <- sample(cNames, nInvert)
# Invert them.
decisions[, invert] <- -decisions[, invert]


Now we create a tibble of all possible combinations of decisions. The ID (key) will be an integer whose binary expansion gives the vector of decisions. The “DominatedBy” column will give the ID of a combination that dominates the current combination (or -1 if the current combination is Pareto efficient).

solutions <- tibble(ID = 0:(2^nDecisions - 1), DominatedBy = -1)

We will need a couple of functions to convert between integer ID values and binary vectors.

# Create a vector of powers of 2 (for use in conversions from binary vectors to integers).
powers.of.two <- 2^(0:(nDecisions - 1))
# Convert an ID (integer) to a binary vector of appropriate length. Note that the vector is reversed so that the lowest order bit (corresponding to the first decision) comes first.
fromID <- function(id) { as.integer(head(intToBits(id), nDecisions)) }
# Convert a binary vector of appropriate length to an ID value (integer).
toID <- function(vec) { as.integer(vec %*% powers.of.two) }
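A quick round-trip sanity check (repeated here in self-contained form, with nDecisions = 10 as above) illustrates the conversions:

```r
nDecisions <- 10
powers.of.two <- 2^(0:(nDecisions - 1))
fromID <- function(id) { as.integer(head(intToBits(id), nDecisions)) }
toID <- function(vec) { as.integer(vec %*% powers.of.two) }

fromID(5)        # 1 0 1 0 0 0 0 0 0 0 -- lowest-order bit (decision D1) first
toID(fromID(5))  # 5
```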

The value of each criterion for a solution is the sum of the criterion values for the “yes” (1) decisions in that solution.

# Compute the total value of a criterion (specified by name) for a given solution (specified by ID).
combinedValue <- function(criterion, id) { pull(decisions, criterion) %*% fromID(id) }
# Evaluate every solution on every criterion.
for (v in cNames) {
  solutions[, v] <- sapply(solutions$ID, function(id) combinedValue(v, id))
}

Computing the Pareto efficient set


We first sort the solution table in lexicographically decreasing order according to the criteria. This guarantees that the first solution in the sort order is nondominated. Moreover, if we remove a nondominated solution from the top of the list, and also remove all solutions it dominates, the top solution in the remaining portion of the list is again guaranteed not to be dominated.
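A small self-contained illustration of the idea (hypothetical values, not the generated data): after a decreasing lexicographic sort, the top row cannot be dominated by any other row, because every other row is weakly worse on the first criterion where they differ.

```r
# Three toy solutions evaluated on two criteria.
m <- data.frame(C1 = c(0.2, 0.9, 0.9), C2 = c(0.7, 0.1, 0.6))
# Negate the columns so that order() sorts in decreasing lexicographic order.
ord <- do.call(order, as.list(-m))
ord       # 3 2 1
m[ord, ]  # the top row (C1 = 0.9, C2 = 0.6) is not dominated by either other row
```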

The first step is to create a function to lexically sort a set of solutions (specified by their ID values).

# We create a function that takes a list of solution IDs and outputs the IDs in descending lexicographic sort order according to the criteria.
lex.order <- function(ids) {
  x <- solutions[ids + 1, cNames]   # criterion values (row index = ID + 1)
  ids[do.call(order, as.list(-x))]  # negate so the sort is in decreasing order
}

We also need a function that compares two solutions (specified by IDs) and determines whether the first solution dominates the second one. Note that if the first solution precedes the second solution in our lexicographic sort order, the second cannot dominate the first, so we only need to test domination in one direction.

dominates <- function(first, second) {
  f <- unlist(solutions[first + 1, cNames])
  s <- unlist(solutions[second + 1, cNames])
  all(f >= s) && any(f > s)
}
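As a standalone illustration (toy vectors, not rows of the solutions tibble), dominance requires being weakly better on every criterion and strictly better on at least one:

```r
f <- c(0.9, 0.5, -0.1)
s <- c(0.8, 0.5, -0.2)
all(f >= s) && any(f > s)  # TRUE:  f dominates s
all(s >= f) && any(s > f)  # FALSE: s does not dominate f
```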

Find the Pareto efficient solutions

Our “to-do” list will be the full set of IDs, in sorted order, along with their domination flags (to save some look-ups).

system.time({
  temp <- lex.order(solutions$ID)
  todo <- solutions %>% select(ID, DominatedBy) %>% arrange(match(ID, temp))
})
   user  system elapsed 
  0.003   0.000   0.013 

While the to-do list has length greater than one, we peel off the first ID (which corresponds to a nondominated solution), find the IDs of any solutions it dominates, mark them dominated in the solutions tibble and remove them from the to-do list. This will likely be the slowest part of the code!

cat("Searching for Pareto efficient solutions ...")
Searching for Pareto efficient solutions ...
system.time(
  for (i in 1:(nrow(todo) - 1)) {
    id1 <- todo[[i, "ID"]]
    # Process the i-th solution only if it is still not dominated.
    if (todo[[i, "DominatedBy"]] < 0) {
      # Compare to every subsequent solution that is not (yet) dominated.
      for (j in (i + 1):nrow(todo)) {
        if (todo[[j, "DominatedBy"]] < 0) {
          id2 <- todo[[j, "ID"]]
          # If the i-th solution dominates this one, update both the solutions tibble and the to-do list.
          if (dominates(id1, id2)) {
            solutions[[id2 + 1, "DominatedBy"]] <- id1
            todo[[j, "DominatedBy"]] <- id1
          }
        }
      }
    }
  }
)
   user  system elapsed 
312.099   0.277 313.646 

We can now report on the Pareto efficient solution set.

pareto <- solutions %>% filter(DominatedBy == -1) %>% pull(ID)
cat("Number of Pareto efficient solutions = ", length(pareto))
Number of Pareto efficient solutions =  623

Randomly search for Pareto solutions

To randomly find a Pareto efficient solution, we draw a positive weight for each criterion (uniformly distributed in the interval (0, 1)) and form, for each decision, the weighted combination of its criterion values. Any decision with a positive combined score is set to 1, and any decision with a nonpositive score is set to 0.

# To avoid repetition, we will extract the criterion values from the decisions tibble to a matrix.
dMatrix <- decisions %>% select(cNames) %>% as.matrix()
# This function generates a random set of weights, forms the combined scores for all decisions, selects the decisions with positive scores, and returns the ID of the corresponding solution.
winner <- function() {
  weights <- runif(nCriteria)
  scores <- as.vector(dMatrix %*% weights)
  chosen <- scores > 0
  toID(chosen)
}

To keep score, we will add a column to the solutions tibble indicating how many times the solution has been identified as Pareto efficient by the random search.

solutions$Identified <- 0

We now run the specified number of trials, reporting the required time.

cat("Running ", nTrials, " trials ...")
Running  1e+06  trials ...
system.time(hits <- replicate(nTrials, winner()))
   user  system elapsed 
  5.798   0.004   5.803 

We tabulate the number of times each solution key was encountered and update the “Identified” column of the solutions data frame.

# Tabulate the hit counts, convert the keys back to integers, and align them with the ascending ID order of the solutions tibble.
temp <- table(hits) %>% as_tibble()
colnames(temp) <- c("Key", "Frequency")
temp <- temp %>% mutate(Key = as.integer(Key)) %>% arrange(Key)
solutions[solutions$ID %in% temp$Key, "Identified"] <- temp$Frequency

Finally, we report how many Pareto efficient solutions were correctly (or incorrectly) detected.

n <- solutions %>% filter(Identified > 0 & DominatedBy == -1) %>% nrow()
cat("Correctly detected ", n, " Pareto efficient solutions")
Correctly detected  126  Pareto efficient solutions
n <- solutions %>% filter(Identified == 0 & DominatedBy == -1) %>% nrow()
cat("\nFailed to detect ", n, " Pareto efficient solutions")

Failed to detect  497  Pareto efficient solutions
n <- solutions %>% filter(Identified > 0 & DominatedBy >= 0) %>% nrow()
cat("\nFalsely identified ", n, " dominated solutions as Pareto efficient")

Falsely identified  0  dominated solutions as Pareto efficient

Frontier plots

Add a factor indicating whether or not a Pareto solution was found by the random search.

solutions <- solutions %>% mutate(Found = ifelse(Identified > 0, "Found", "Not found"))

Define a function to plot any pair of criteria for the Pareto efficient points.

# Given a pair of criterion indices, scatter plot those criteria for the Pareto efficient solutions.
plotPair <- function(pair) {
  # Get the criterion names.
  names <- criterionNames[pair]
  # Drill down to the relevant data.
  temp <- solutions %>% filter(DominatedBy < 0) %>% select(all_of(names), Found)
  show(ggplot(data = temp, mapping = aes_string(x = names[1], y = names[2], color = "Found")) + geom_point())
}

Plot the calculated and identified Pareto frontier for each selected pair of criteria. Note that, in all plots, the upper right corner is ideal.

plot.list <- lapply(frontierPlots, plotPair)

Search efficacy

To see how often the random search finds already identified points, we can plot a histogram of the number of times each point was found. We restrict to those that were found at least once.

solutions %>% filter(Found == "Found") %>% ggplot(mapping = aes(x = Identified)) + geom_freqpoly(bins = 50)

Clearly some points are identified quite a few times. Here are the quartiles for the distribution of the number of times each Pareto efficient solution found by the random search was hit.

solutions %>% filter(Found == "Found") %>% select(Identified) %>% summary()
   Identified       
 Min.   :     1.00  
 1st Qu.:    70.25  
 Median :   663.00  
 Mean   :  7936.51  
 3rd Qu.:  4336.75  
 Max.   :203690.00  

The distribution has a very long right tail (significant positive skew).