Deep Reinforcement Learning for Controlled Piecewise Deterministic Markov Process in Cancer Treatment Follow-up

Install

Run the following lines to clone the repository and install the package:

git clone git@forgemia.inra.fr:orlane.le-quellennec/controlled_pdmp_po.git
cd controlled_pdmp_po
pip install -r requirements.txt
pip install -e .

Description

Environment folder

The env folder contains all the environments used in the paper. The file full_pdmp.py implements a piecewise deterministic Markov process (PDMP) simulator that generates patient trajectories. These trajectories are fully observable.

To create a PDMP environment instance:

import gymnasium
from gymnasium.envs.registration import register

# Register the Patient environment (the entry point imports
# env.full_pdmp:Patient when the environment is created)
register(
    id="env/Patient",
    entry_point="env.full_pdmp:Patient",
)

# Load an instance of the Patient PDMP model
env = gymnasium.make("env/Patient", render_mode="human")
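
Once created, the environment can be stepped through with the standard Gymnasium loop. Below is a minimal rollout sketch using a random policy purely for illustration; the action and reward semantics are specific to the Patient environment:

# Minimal rollout sketch (standard Gymnasium API)
observation, info = env.reset(seed=42)
terminated = truncated = False
while not (terminated or truncated):
    # Random policy, for illustration only
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
env.close()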

The file partially_observable.py transforms the PDMP into the Partially Observable Markov Decision Process (POMDP) model detailed in the paper. To create a partially observable patient environment instance:

import gymnasium
from gymnasium.envs.registration import register

from env.wrappers.partially_observable import POWrapper

# Register the Patient environment
register(
    id="env/Patient",
    entry_point="env.full_pdmp:Patient",
)

# Load an instance of the Patient PDMP model and wrap it
# into the partially observable (POMDP) model
env = gymnasium.make("env/Patient", render_mode="human")
env_po = POWrapper(env)
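
Assuming POWrapper follows the standard gymnasium.Wrapper interface, the wrapped environment is used exactly like the full one; only the observations change, reflecting partial information instead of the full patient state. A minimal sketch:

# The POMDP wrapper keeps the standard Gymnasium interface;
# observations are restricted to the partially observed quantities
obs, info = env_po.reset(seed=42)
obs, reward, terminated, truncated, info = env_po.step(env_po.action_space.sample())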

Simulations folder

This folder contains scripts that simulate trajectories under a given model and policy. The costs of all simulated trajectories are stored in the data folder.

For example:

cd simulations
python generate_data.py --env pdmp --policy alea --num-samples 100000
python generate_data.py --env pomdp --policy dqn --num-samples 100000

To compare the costs of all policies, run the compare_cost.py script:

cd simulations
python compare_cost.py --logdir ./data/pdmp_alea.csv ./data/pomdp_thresh.csv ./data/pdmp_inactive.csv ./data/pomdp_dqn.csv
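
For a quick look at the stored costs without the comparison script, the CSV files can also be inspected directly. The sketch below assumes each file stores one simulated trajectory cost per row in a column named cost; this layout is an assumption, not a documented format:

import pandas as pd

# Assumption: each CSV holds one trajectory cost per row in a "cost" column
for path in ["./data/pdmp_alea.csv", "./data/pomdp_dqn.csv"]:
    costs = pd.read_csv(path)["cost"]
    print(f"{path}: mean cost = {costs.mean():.2f} over {len(costs)} trajectories")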

Tests folder

The tests folder contains test functions for each environment and wrapper.
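
Assuming the test suite follows the standard pytest layout (pytest itself is an assumption; it is not listed among the commands above), the tests can be run from the repository root with:

pytest tests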

Training

The training folder contains all the scripts needed to train, run, and exploit the neural networks. It contains:

  • [Hyperparameter tuning] This script performs hyperparameter tuning with the RLlib Tuner to find good hyperparameters for the algorithm. It outputs a YAML file containing the hyperparameter combinations tested as well as the best combination found.
python ./training/tune.py --config-file ./env/experiment/pomdp_v2_dqn.py --stop-timesteps 100000  --num-samples 1000 --stop-iters 1000 --output-file ./env/experiment/tuned_hyperparams_dqn_v2.yaml
  • [Neural network creation] This script performs multiple training and evaluation cycles using the tuned hyperparameters.
python ./training/evaluate.py  --config-file ./env/experiment/tuned_hyperparams_dqn_v2.yaml --stop-timesteps 100000 --evaluation-interval 5 --stop-iters 1000 --num-samples 3 --output-folder ./env/results/pomdp_xp2_DQN

Training with action masking

  • [Neural network creation with action masking] This script performs multiple training and evaluation cycles using the tuned hyperparameters.
python ./training/evaluate.py --masking --config-file ./env/experiment/tuned_hyperparams_dqn_v3_with_action_mask.yaml --stop-timesteps 100000 --evaluation-interval 5 --stop-iters 1000 --num-samples 3 --output-folder ./env/results/pomdp_xp_DQN_with_action_masking
python ./training/evaluate.py --masking --config-file ./env/experiment/tuned_hyperparams_r2d2_v3_with_action_mask.yaml --stop-timesteps 100000 --evaluation-interval 5 --stop-iters 1000 --num-samples 3 --output-folder ./env/results/pomdp_xp_R2D2_with_action_masking

Roadmap

Still in progress on this repository:

  • Action-masking to deal with constraints (?)
  • R2D2 simulations
  • Graph comparison / script

Authors and acknowledgment

Alice Cleynen, Benoite de Saporta, Orlane Rossini, Régis Sabbadin and Meritxell Vinyals