Using OPM

George Williams

2017-11-07

OPM (the Open Porous Media initiative) is a group developing a set of open source tools for the upscaling and modeling of fluid flow in subsurface reservoirs. The tools use the industry-standard Eclipse data format for input and output. OPM is sponsored by a number of Norwegian organizations, including IRIS, SINTEF and Statoil.

This vignette will eventually document a workflow for using the OPM tools, along with tools developed in R, to model a reservoir. In the meantime, it is a loose compilation of snippets documenting the tools completed so far, the planned workflow, planned tools, and notes to self. At this point, much of the workflow is at the proof-of-concept stage, and will need additional features for convenient usage, as well as significant profiling and optimization.

The Vision

Modifying reservoir models so that they are capable of matching observed data (history matching) and making reasonable predictions has long been a time-consuming and arduous task, often with somewhat disappointing results. In recent years there have been many attempts to develop computer-assisted methods that both improve the results and decrease the time and effort needed to develop models. These methods have had mixed results, and their implementations have often been proprietary. (add references here)

This module, along with other associated modules, is an attempt to develop a workflow that combines the OPM tools with tools already existing in R. Particular effort is made to incorporate the excellent set of tools developed by the DICE and ReDICE consortia for the analysis of complex computer models. Initial efforts will focus on incorporating portions of five R packages that implement a coherent workflow: DiceDesign, DiceEval, DiceKriging, DiceOptim, and GPareto. The first four packages link to useful published articles in their “Citation” sections, and the last has a useful vignette. Additional useful packages, parts of which may be incorporated here, include sensitivity, DiceView, KrigInv, MuFiCokriging, fanovaGraph, and GPGame.

An additional idea that this package would like to support is reproducible research. Reproducible research mixes documentation with computation in a manner that allows the entire modeling process to be reproduced by a third party. Its intent is to start with the cruftiest of data, and document and compute the entire workflow through to the final product. An excellent introduction to reproducible research is available here: Karl Broman’s course “Tools for Reproducible Research”. An interesting effort in a related field is a project by a USGS team to build a reproducible groundwater flow model using the MODFLOW package: A Case Study in Reproducible Model Building. The GitHub site for this project is here.

The reproducible research approach on a shared platform has great value in an environment where multiple partners contribute their expertise, and all need to agree on the results before a project can move forward. Each of the partners may examine progress towards mutual goals at any time, and make contributions in a totally transparent manner.

For the purposes of a reservoir modeling workflow, a reproducible research approach is more aspirational than currently achievable. The variety of data required, the tools needed to process it, and the level of interpretation involved make the process difficult to reproduce completely. The workflow developed here envisions creating a GitHub repository for a model, using R Markdown for documentation, and RStudio for running, analyzing, and committing to version control. Tools will gradually be added to make more parts of the process reproducible.

First Things First: Run a Simple Model

Obviously, the OPM tools must be installed. Detailed instructions are available at the OPM website, which explains how to install ‘stable’ binary versions of the tools, and how to download, build and install the sources from GitHub. As the OPM tools are under very rapid development, it may sometimes be desirable to build the latest versions when functionality you need has been added recently. The current project, runOPM, is being developed on Ubuntu, and should work on any modern Linux platform. When time allows, it would be desirable to port the functionality to Mac and Windows platforms.

The functions for running and analyzing the simulation runs assume a consistent directory structure and file naming convention, even when running a “simple” model. A base or project directory will be defined for all runs. Within the base directory, there will be a DECKS directory, where all model decks will be stored. Underneath the DECKS directory will be a series of directories where include files may be stored. Some flexibility for include directories is acceptable, but one would expect GRID (static grid properties), PVT (fluid properties), RELPERM (relative permeability data), WELLS (well path and completion definition) and RECDAT (recurrent data such as producing rules and completion scheduling) directories. Flow tables (FLOT) should also have their own directory. Each directory may have multiple include file versions to capture ranges of uncertainty and various development scenarios.

Within the base directory, and parallel to the decks directory, there will also be an OUTPUT directory and a REPORTS directory. The OUTPUT directory will have an individual directory for each simulation run, where all case specific files (except the deck) will be stored. The case directories will have the same name as the deck, without the deck suffix (either .data or .DATA). The REPORTS directory will include summary and analysis data that covers multiple cases.
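
Putting these conventions together, a project tree might look like this (the project and case names are illustrative):

PROJECT/
  DECKS/
    SPE9_TEMPLATE.DATA
    SPE9_XP01.DATA
    GRID/  PVT/  RELPERM/  WELLS/  RECDAT/  FLOT/
  OUTPUT/
    SPE9_XP01/    (all case-specific files except the deck)
  REPORTS/        (summary and analysis data covering multiple cases)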

In history match or development scenario projects, it is expected that most of the case decks will be generated from a template deck (also stored in the DECKS directory). Using a template is important, as it allows easy tracking of how changes to the input deck influence changes in the simulation output. Each case name should have a base part and a variant part, separated by an underscore (“_”). The base part should indicate which template deck was used to generate the case.

Three functions are needed to run a very simple model:

  1. RunFlow runs the simulator on an appropriate Eclipse-style deck.
  2. EclSum converts Eclipse-style summary output to a csv file and loads it.
  3. PlotEach plots the simulator output data from the csv file.

Here is an example running the SPE1 test deck from the first SPE Comparative Solution Project. It is a very simple model with one producer and one gas injector. There is no movable water in the model.

library(runOPM)
# get the SPE1 deck from the package extdata directory
deck <- system.file("extdata", "SPE1_CASE1.DATA", package = "runOPM")
RunFlow(deck, basedir = "tmp", wait = TRUE)
rslts <- EclSum(casename = "SPE1_CASE1", basedir = "tmp")
PlotEach(rslts, wgnames = "FIELD")

History Match Workflow

History matching assumes that you are starting with a suitable geologic model, or possibly a suite of model variations. This cannot be overemphasized: the most useful and reliable models are developed iteratively, with close communication between the earth scientists and the engineers.

This module deals only with the engineering portion of the workflow. It implements a standard experimental design approach; if the user would prefer an Ensemble Kalman Filter approach, the ERT project provides appropriate tools. We assume the user is comfortable editing a deck and running simulation models. We also assume that the engineer has performed all of the normal due diligence: the PVT and relative permeability data have been adequately characterized, along with appropriate uncertainties, and there are some ideas of how permeability and porosity vary with the geology. Significant upfront time should be spent trying to characterize as many of the uncertainties as possible. From a reproducible research point of view, all of this work should be included and documented in the project directories, but tools to help with this have not yet been developed.

Despite this effort at preparation, during history matching, both ranges of uncertainty and the variables of concern will change. The work process must be flexible enough to add and subtract variables, and change uncertainty ranges and distributions.

The general history match workflow is:

  1. Create a history data file for comparison with simulation runs.
  2. Create a deck template defining the variables one wishes to change during the history match process.
  3. Generate a suitable experimental design.
  4. Generate decks, run models, and retrieve results.
  5. Analyze the results and modify the experimental design to try to improve them. This is where creativity is required, and where existing tools in R can be very helpful.
  6. Rinse and repeat.

History Data

The history data needs to be in a “long” format comparable to that created by the EclSum function. The ImportHist function will read a column-oriented “wide” format csv or Excel file, convert it, and add it to the project file for later use.
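
As an illustration of the conversion (this is not the actual ImportHist implementation; the file name and column naming scheme are assumptions), a wide csv with one column per keyword and well might be reshaped like this:

library(reshape2)
# wide format: a DATE column, then one column per keyword and well,
# e.g. DATE, WOPR.P1, WOPR.P2, WWPR.P1, ...
wide <- read.csv("history_wide.csv")
long <- melt(wide, id.vars = "DATE", variable.name = "SERIES",
             value.name = "VALUE")
# split the series name into a keyword and a well name
long$KEYWORD <- sub("\\..*$", "", long$SERIES)
long$WGNAME  <- sub("^[^.]*\\.", "", long$SERIES)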

The history data needs to be on the same dates as the simulation model output. Future enhancements will implement some interpolation to relax this requirement somewhat, but aligning the dates will remain good practice. This version of the workflow deals only with data measured at the well; future efforts will attempt to incorporate 4D type data.
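
The kind of interpolation envisioned might be as simple as the following sketch, where histdat and simdat are hypothetical long-format data frames for a single well and keyword:

# linearly interpolate the history values onto the simulation report dates
hist_on_sim <- approx(x = as.numeric(as.Date(histdat$DATE)),
                      y = histdat$VALUE,
                      xout = as.numeric(as.Date(simdat$DATE)))$y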

Variable Definition

A simulation deck, suitably modified to include design variables, is called a deck template. The modification involves replacing portions of the deck with a variable, of the form {$xperm}, that may be used to substitute in variations for exploring uncertainty.
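
Substituting a trial value into the template can be as simple as a literal text replacement. A minimal sketch (the file names and the value are hypothetical):

# replace every occurrence of the design variable with a trial value
template <- readLines("DECKS/SPE9_TEMPLATE.DATA")
deck <- gsub("{$xperm}", "2.5", template, fixed = TRUE)
writeLines(deck, "DECKS/SPE9_XP01.DATA")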

Experimental Design

Experimental design for a computer model is significantly different from that for a physical experiment. For example, repetitions serve no purpose, as the computer will (one hopes) always give the same result with the same input. A great deal has been written about experimental design for computer models. (add references here) This module will use the tools from the DiceDesign package, which includes a fairly comprehensive set of methods and a number of tools to assess various qualities of the designs. DiceDesign appears to imply that the quality of various design approaches should be assessed for every new project. The vignette ED_Tools follows the logic described in DiceDesign and DiceEval: Two R Packages for Design and Analysis of Computer Experiments, and explains the defaults chosen for use in this module. A user is free to take a different approach, should these defaults prove unsatisfactory.
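
A brief sketch of generating a design and assessing its quality with DiceDesign (the dimensions are illustrative):

library(DiceDesign)
# a 64-run Latin hypercube in 30 dimensions, scaled to [0, 1]
des <- lhsDesign(n = 64, dimension = 30, seed = 1234)$design
mindist(des)                           # space filling: larger is better
discrepancyCriteria(des, type = "C2")  # uniformity: smaller is better
# improve the space filling with simulated annealing (maximin criterion)
des_opt <- maximinSA_LHS(des)$design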

Create, Run, and Retrieve

(To be written: generating case decks from the template, launching the simulation runs, and retrieving the results.)

Analyze the Runs

(To be written: comparing simulation results with history, and assessing which input variables drive which outputs.)

History Matching Strategy

Within the general workflow outlined above there are many possible strategies for working towards a suitable history match. This section outlines a strategy that the tools being developed are intended to support.

It is expected that this approach will take many simulation runs, and that computer time is cheaper than engineer time. Many compromises are necessary to balance objectives with resources. If an excess of computing power is available and the project deadline is short, by all means run cases in parallel, but sequential runs, where each design stage is informed by the results of earlier runs, will always make more efficient use of each simulation.

If the model is large, cumbersome, and slow, it may not be appropriate for this strategy; consider significant upscaling to get a usable model. If detail is needed in a particular area, perhaps for a possible development well location, consider a staged history match: a coarse model may be history matched, and then used to provide boundary conditions to a local grid in the area of interest.

Experimental Design and Variable Selection

Selection of the appropriate model uncertainties is of paramount importance. This is expected to be an iterative process. The number of uncertainties is usually very large, and the range of uncertainty for each variable is often unclear. Exhaustively exploring all the various combinations is not possible with any reasonable amount of time and resources, so the variables need to be properly curated.

The first experimental design should be something that looks at the extreme values, with some level of internal filling. When examining the results from the first set of runs, one should ensure that the desired range of outcomes has been captured. For example, the range of results for the water cut of a well of interest must lie both above and below the historical water cut. If it doesn’t, then one must either add more variables, or expand the range of uncertainty of the existing variables. This is why it is important that the early “variable selection” runs include the extreme values.

Adding internal points is important because some of the analytic methods for assessing which variables are most important (e.g. Sobol indices) don’t work very well with only extreme values. Using only extreme input values tends to give very extreme estimates of which variables are important, and may lose some desirable subtlety in the choice of which variables to carry forward.

The “variable selection” strategy has two basic parts. First, ensure that the desired outcomes have been bracketed; if not, add variables or expand uncertainty ranges. Second, ensure that each selected variable significantly affects the outcome; if not, drop it. In complex models, one is often surprised by which variables are the most useful. Variable selection is probably the most difficult part of the modeling exercise; after that, the issues reduce to successive refinement and interpolation.

The tools supplied to assist this process are:

  1. Experimental design: a folded Plackett-Burman design combined with a suitable space-filling design (a sketch follows this list).
  2. Comparison of modeling results with history.
  3. Assessment of which input variables influence which output variables, and which input variables do very little.
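
A minimal sketch of the first of these, assuming 11 two-level screening variables (the generator row is the classic 12-run Plackett-Burman design; the space-filling portion and all dimensions are illustrative):

library(DiceDesign)
# classic 12-run Plackett-Burman generator row (11 two-level factors)
gen <- c(1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1)
pb  <- t(sapply(0:10, function(i) gen[(seq_along(gen) + i - 1) %% 11 + 1]))
pb  <- rbind(pb, rep(-1, 11))  # the 12th run sets every factor low
pb  <- rbind(pb, -pb)          # folding doubles the design to 24 runs
# rescale to [0, 1] and append a space-filling portion
screen <- (pb + 1) / 2
fill   <- lhsDesign(n = 40, dimension = 11, seed = 42)$design
design <- rbind(screen, fill)  # 64 runs of 11 variables each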

Modeling Goals and Assessing Model Quality

Setting the appropriate modeling output goals is important, and is intimately associated with the purpose for which the model is being created.

discussion <- c("absolute error", "relative error", "error trends", "Pareto optimality", "level of error by parameter and model area as related to modeling purpose")

History Matching Example

Unfortunately, publicly available history matching data sets are difficult to come by, especially if one is not associated with a university (e.g. the Norne data set). Should anyone reading this have access to a suitable Creative Commons licensed data set, I would appreciate a note to gerwathome at gmail.com. Until I can find a suitable data set, I will create an entirely synthetic simple data set based on SPE9. The model has ~9000 cells with a geostatistically derived permeability distribution, one injector, 25 producers and 900 days of history. It will be assumed that the model as given provides the correct “history”. A set of geologic regions will be created using the porosity and permeability. The model has constant permeabilities assigned by layer, which give very odd looking porosity-permeability cross plots. The history match will start with 30 variables: layer multipliers for porosity and permeability, as no reasonable rock type regions could be identified with this synthetic data.

Stream of Consciousness

It turns out that 64 cases, 26 wells (plus the FIELD summary), 11 keywords (rates, cumulatives and ratios, for producers and injectors) and 12 different error estimation quantities generate a great many error parameters, even for this simple case. Some work needs to be done to make the functions quicker and more efficient, but it also behooves the engineer to think carefully about what he is trying to do, and to be a little cautious about what he asks from the computer: when permutations are involved, it quickly becomes possible to overwhelm the most powerful computers and the most efficient software. For this toy level problem, we will try to match at the field level only, and only the oil, gas and water rates. We will look at the maximum and minimum errors to ensure that we have bracketed the desired solution, but our optimization will only include absolute and relative fraction error. This means that we will have 1 entity * 3 rates * 2 errors, for a total of 6 objectives that need to be optimized. Our meta-modeling technique will be kriging, so we will use the 64 cases to create 6 models of the parameters which we are trying to optimize. We currently have 30 different input variables: one porosity and one permeability multiplier for each of the 15 model layers.
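
A sketch of the meta-modeling step with DiceKriging, assuming design is the 64 x 30 matrix of scaled inputs and errs is a 64 x 6 matrix of the error objectives (both names are hypothetical):

library(DiceKriging)
# fit one kriging meta-model per error objective
km_fits <- lapply(seq_len(ncol(errs)), function(j)
  km(formula = ~1, design = as.data.frame(design),
     response = errs[, j], covtype = "matern5_2"))
km_fit <- km_fits[[1]]  # e.g., absolute error in field oil rate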

First we will generate Sobol indices to attempt to sort out which input variables influence which error parameters, and which input variables we might be able to neglect.
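
One way to do this is with the sensitivity package, using the extended FAST estimator of the Sobol indices on the kriging predictions. A sketch, with km_fit from above and all inputs assumed scaled to [0, 1]:

library(sensitivity)
# wrap the kriging mean prediction as a function of an input matrix
krig_fun <- function(X) {
  X <- as.data.frame(X)
  colnames(X) <- colnames(km_fit@X)
  predict(km_fit, newdata = X, type = "UK")$mean
}
sa <- fast99(model = krig_fun, factors = 30, n = 1000,
             q = "qunif", q.arg = list(min = 0, max = 1))
print(sa)  # first-order and total sensitivity indices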

Next we will use EI/EGO algorithms to determine which areas of the model might be most improved by running additional cases. I’m not sure when the Pareto multi-objective optimization should start. It turns out that GPareto doesn’t work well unless you have reasonably well defined functions for each objective; in our case, kriging models developed from a Plackett-Burman (PB) design alone are not adequate, and the optimizer times out on maximum iterations without reaching a decent solution. The rule of thumb of roughly 10 cases per variable for an experimental design is probably appropriate. My approach will be to build a PB design, and then augment it with a space-filling design until the desired number of cases is created. In this case, with 30 variables and 64 cases from a folded PB design, the space-filling design will need to add a further 236 cases.
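
For a single error objective, the expected improvement step might look like this sketch with DiceOptim (km_fit as above; the bounds assume inputs scaled to [0, 1]):

library(DiceOptim)
# where would one additional run most improve this objective?
next_pt <- max_EI(model = km_fit, lower = rep(0, 30), upper = rep(1, 30))
next_pt$par    # candidate settings for the 30 design variables
next_pt$value  # expected improvement at that point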

Packages to consider for multi-objective sensitivity: TunePareto, sensitivity, multisensi, and SPOT.