Modeling Patient Flow among Hospital Wards Using Non-Diagnostic Data

Hospital bed capacity is a limited resource and a key concern in health care planning. Using discrete-event simulation modeling and the MIMIC-III data set, this paper produces a demographic and metadata-only model of patient transfer within hospital wards. The model successfully approximated the underlying transfer dynamics (95.63% accuracy measured using RMSE). The accompanying visualization may be used to examine patient flow. The simulation will be used as a test bed for future work concerning flow of artificially generated patient admissions and can in general be useful in simulating patient flow in cases where demographic information is available but transfer records are not.


Introduction
In health care planning, the hospital occupies a central role. Hospital resources are limited, particularly in Intensive Care Unit (ICU)-type wards, owing to the high cost associated with ICUs [1,2]. Traditionally, a hospital's capacity was estimated in terms of the number of beds available in the hospital, but increasingly additional measures (e.g., patient flow) are considered useful in modern hospital capacity planning [3]. Modeling in the field of health care has many applications, including capacity planning, epidemiology, and understanding doctor-patient relationships [4]; however, planning and resource utilisation has been highlighted as the most widely modeled area of health care [5]. Our work builds upon a well-established tradition of applying simulation modeling to the problem of improving the use of hospital resources [6]. This is often referred to as "bed modeling" because the capacity limitation of a hospital is defined (at least in principle) by the number of beds available. Various methods have been used to model the movement of patients through a hospital, including discrete-event simulation, system dynamics, and Monte Carlo methods, among others, as described in [7]. Previous research has also specifically investigated the use of discrete-event simulation to estimate ICU bed capacity and explore occupancy rates [2].
Discrete-event simulation (DES) is a popular modeling technique in which individual, independent entities are subject to events that may be as varied as the subject modeled [8,9]. As a choice of modeling technique for this kind of problem, DES may be distinguished from system dynamics by its nondeterministic and non-continuous nature, which allows events rather than explicit equations to represent the flow through the system [5]. Another commonly used simulation technique, agent-based modeling, incorporates "human behavioral aspects" by allowing for rules of varying complexity that the entities or agents in the system will follow. DES allows us to build a relatively simple model that is focused not on individual entities but on their flow through the system [10].
This paper aims to investigate the use of discrete-event simulation to produce an abstract overview of patients' movements between hospital units. In particular, the aim is to construct a flexible model of the flow of patients among hospital units using demographic information and patient arrival metadata, without any regard to diagnostic information concerning a patient's condition(s). This model may then be used to simulate desired mixes of demographics or arrival conditions and observe their impact on the system's occupancy. Because a patient's movement within a hospital is largely assumed to be a function of the patient's condition and the treatment received, neither of which is directly modeled, it is hoped that the model will yield some insight into how much of the real-world flow of patients can be inferred without considering these primary causes. Because the model centers on demographics and on arrival methods and times, it has the potential to inform hospital layout planning and, more generally, planning in cases where detailed diagnostic and transfer information is not available to planners. This paper is organized as follows: the next section discusses related works, followed by a description of the methods, including the source material used and the process employed; then an in-depth analysis of the results of the simulation; a discussion of notable aspects of the model and results; and finally conclusions drawn from this stage of our research.

Related Works
The use of modeling techniques and simulation to examine hospital capacity is well established in the literature, with Markovian models being reviewed among other techniques by McClain [11]. Bed modeling research has since incorporated demographic characteristics and other ancillary information to simulate interactions in the context of hospital systems, beginning with Dumas' work [12]. Different modeling methods have been explored for bed modeling, including neural networks [13], queue-based modeling [11,14,15], probability-based mathematical models [16,17], system simulations [18], and discrete-event simulation [19][20][21][22][23]. Specific research has been conducted looking at patient flow in the context of elective surgeries [24], admission policies [25], neuroscience departments [26], cardiovascular surgeries [23], bloodwork and other medical scans administered [22], and among wards [20].

Methods
To build the model, the MIMIC-III dataset was used as source material. The MIMIC-III dataset originates from the Beth Israel Deaconess Medical Center in Boston, Massachusetts and spans critical care unit stays between 2001 and 2012. The MIMIC-III dataset "is the only freely accessible critical care database of its kind", and provides detailed information about patient demographics and movement between wards of the hospital [27]. The dataset has previously been used for similar analyses of stay duration, including predicting length of stay using neural networks [28].
Patient entities are generated using the demographic data from MIMIC-III. Each entity has the following characteristics:
- Admission location: which unit the patient enters the hospital from, e.g., Emergency Room Admission, Clinical Referral, Physician Referral, etc.
- Gender: male or female.
- Age range: converted from exact ages to one of four age ranges (under 15, 15-24, 25-64, 65 and over).
- Admission hour: 0-23, i.e., from 12:00 a.m. (midnight) to 11:00 p.m.
Patients transfer stochastically from one unit to another specific unit based on weighted probabilities reflecting the underlying MIMIC-III data. A transfer is triggered once the patient has spent the median time in the current unit, where the median time spent in each unit was calculated from the same MIMIC-III data. Using these characteristics of patients and transfers within the hospital, we simulate patient admissions over a single day and their time within the hospital up to a predetermined end time (two weeks in our experiments).
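The four entity characteristics can be represented compactly. The following is a minimal sketch only; the names `Patient`, `to_age_range`, and `age_group` are hypothetical and do not come from the original implementation.

```python
from dataclasses import dataclass


def to_age_range(age: float) -> str:
    """Map an exact age to one of the four age ranges listed in the text."""
    if age < 15:
        return "under 15"
    if age < 25:
        return "15-24"
    if age < 65:
        return "25-64"
    return "65 and over"


@dataclass(frozen=True)
class Patient:
    """A patient entity (hypothetical structure, for illustration)."""
    admission_location: str  # e.g. "Emergency Room Admission"
    gender: str              # "M" or "F" (encoding is an assumption)
    age_group: str           # one of the four ranges from to_age_range
    admission_hour: int      # 0-23

    def profile(self) -> tuple:
        """The combination of characteristics used to look up transfers."""
        return (self.admission_location, self.gender,
                self.age_group, self.admission_hour)


patient = Patient("Emergency Room Admission", "F", to_age_range(71), 14)
print(patient.profile())
```

Making the entity immutable and hashable means the same profile tuple can serve directly as a dictionary key for transfer probabilities and median durations.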
The model's basic time unit is a single hour. Simulating patient movement followed this process for each timeslot:
1. New patients are added to the system based on admission hour (no patients are added past the first day).
2. For each location and patient, check whether the patient needs to move. If so, use the transfer probabilities to stochastically select a destination and update the patient's location accordingly.
3. If not at the end time, continue to the next timeslot.
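The per-timeslot process above can be sketched as a short loop. This is an illustrative simplification, not the authors' code: the function name `simulate` and its inputs are assumptions, and the lookup tables are keyed by unit only, whereas the model described here also conditions on the patient's characteristics.

```python
import random
from collections import Counter


def simulate(patients, transfer_probs, median_hours, end_hour, seed=0):
    """Run an hourly simulation and return unit occupancy per timeslot.

    patients:       list of (admission_hour, admission_location)
    transfer_probs: {unit: {destination: probability}}; units without an
                    entry are treated as terminal (e.g. discharge locations)
    median_hours:   {unit: hours spent in the unit before a transfer}
    """
    rng = random.Random(seed)
    active = []   # each entry is [current_unit, hour_entered]
    history = []  # one Counter (unit -> patient count) per timeslot
    for hour in range(end_hour):
        # Step 1: admit new patients (first day only).
        if hour < 24:
            for admission_hour, location in patients:
                if admission_hour == hour:
                    active.append([location, hour])
        # Step 2: move every patient whose time in the unit has elapsed.
        for state in active:
            unit, entered = state
            options = transfer_probs.get(unit)
            if options and hour - entered >= median_hours[unit]:
                destinations = list(options)
                weights = [options[d] for d in destinations]
                state[0] = rng.choices(destinations, weights=weights, k=1)[0]
                state[1] = hour
        history.append(Counter(unit for unit, _ in active))
        # Step 3: the loop continues until end_hour is reached.
    return history


history = simulate(
    patients=[(0, "ER"), (1, "ER")],
    transfer_probs={"ER": {"ICU": 1.0}, "ICU": {"Home": 1.0}},
    median_hours={"ER": 2, "ICU": 3},
    end_hour=10,
)
```

With the deterministic toy inputs shown, both patients pass from ER through ICU to Home; with real data the weights would be the empirical transfer frequencies.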
Thus, to create the model, two key metrics were used: the probability of transfer between states (where states are the wards patients are treated in), and the amount of time spent in each state. The former is a relatively straightforward exercise, as the dataset is broad enough that transition probabilities can be taken from the underlying dataset without alteration. The latter was derived from the median time spent between the given transitions, based on the demographic data and other metadata considered.
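As a rough illustration of how these two metrics could be derived from transfer records, the sketch below counts observed transitions and takes the median stay. The record layout `(profile, from_unit, to_unit, hours)` and the name `fit_model` are assumptions, as is the choice to key the median by profile and originating unit.

```python
from collections import Counter, defaultdict
from statistics import median


def fit_model(transfers):
    """Estimate transfer probabilities and median stay from records.

    transfers: iterable of (profile, from_unit, to_unit, hours_in_from_unit),
               where profile is any hashable summary of the patient's
               demographic and admission characteristics.
    """
    counts = defaultdict(Counter)
    durations = defaultdict(list)
    for profile, src, dst, hours in transfers:
        counts[(profile, src)][dst] += 1
        durations[(profile, src)].append(hours)
    # Relative frequency of each destination, per (profile, unit) pair.
    probs = {
        key: {dst: n / sum(c.values()) for dst, n in c.items()}
        for key, c in counts.items()
    }
    medians = {key: median(values) for key, values in durations.items()}
    return probs, medians


records = [
    ("p", "ER", "ICU", 2),
    ("p", "ER", "ICU", 4),
    ("p", "ER", "Ward", 9),
    ("p", "ICU", "Home", 5),
]
probs, medians = fit_model(records)
```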
The timestamps present in the source database are for the most part internally consistent between entries; however, a small portion of the data contains inconsistent entries. This represents only a very small portion of the overall dataset and should have no significant effect on the overall results. Once the basic model is built from the matching probabilities and durations, input data can be run through it to simulate the patients' arrival.

Results
The results of running any given input data are exported as a series of timeslots in which the placement of patients is tracked by ward and unit combination. This can then be visualized by replaying the succession of timeslots in a basic visualization module. It can also be compared to real, non-generalized hospital data for validation purposes. While the model represents an obvious generalization of the underlying data, to ascertain just how much generalization was occurring, the MIMIC-III dataset was run through it and the output was compared to the real transfer data on a timeslot-by-timeslot basis. A simulation was run for patient permanence in the hospital over 14 days (patients n=58952), clustering all arrivals in the first 24 hours but maintaining the hour of arrival. The real transfer data's granularity was reduced to match the one-hour timeslot duration of the model.
The mean square error and root mean square error were calculated for each timeslot; for the root mean square error, see Figure 1 and Figure 4.
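A per-timeslot error calculation of this kind might look as follows. This is a sketch under the assumption that the error is taken over per-unit patient counts within each timeslot; the text does not specify how the counts were aggregated, and the name `timeslot_errors` is hypothetical.

```python
import math


def timeslot_errors(simulated, observed):
    """Return a list of (MSE, RMSE) pairs, one per timeslot.

    simulated, observed: lists of {unit: patient_count}, aligned by timeslot.
    Units missing from one side are counted as zero patients.
    """
    results = []
    for sim, obs in zip(simulated, observed):
        units = set(sim) | set(obs)
        squared = [(sim.get(u, 0) - obs.get(u, 0)) ** 2 for u in units]
        mse = sum(squared) / len(units) if units else 0.0
        results.append((mse, math.sqrt(mse)))
    return results


errors = timeslot_errors(
    simulated=[{"ER": 3, "ICU": 1}],
    observed=[{"ER": 1, "ICU": 1, "Home": 2}],
)
```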

Discussion
At the end of the sample two-week run, RMSE = 2579, which represents an accuracy of 0.9563 over the sample size. This high accuracy is expected as the model is tightly coupled with the underlying MIMIC-III data used to generate the model. The accuracy value can be thought of as a proxy for the model's expressiveness in explaining patient movements using demographic information and admission metadata.
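The text does not state how the accuracy figure is derived from the RMSE. One reading that reproduces the reported value, offered here as an assumption, is that accuracy is one minus the RMSE divided by the number of patients in the sample:

```python
rmse = 2579          # reported RMSE at the end of the two-week run
n_patients = 58952   # reported sample size

# Assumed relationship: accuracy = 1 - RMSE / n
accuracy = 1 - rmse / n_patients
print(round(accuracy, 4))  # 0.9563
```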
The visualization allows us to see the traffic levels of the different hospital units over time, with large clusters on admission units (the first row) in early time steps, dispersion into many specialist units (central rows) in intermediate time steps, and various discharge locations (bottom row) at the end of the simulation. From this, we are able to clearly see which units get the most traffic, which is critical for debugging (during development and extension of the model) as well as for analysis. For instance, by running intentionally skewed demographic makeups through the model, one may observe the effect this has on specific wards and at specific times during the entities' permanence in the system.
During development, a tightly coupled approach to the underlying data was adopted, taking advantage of the relatively high number of cases available. As the work is extended in future to encompass synthetic and different data (artificially generated patients, patients' records from different sources), the routing model will likely need to be fuzzified to allow for routing paths not explicitly present in the source dataset used to generate the model.
In observing the model's visualization, it was noted that the admission and discharge locations appear to possess a Pareto-like distribution, showing that most patients were either sent home or cared for in their home (Home, Home Health Care) and that likewise most patients hospitalized were admitted via the Emergency Room or Clinical Referral. This is a characteristic of the underlying data which the model replicates, and it would be interesting to see whether acquiring and normalizing data from other hospitals' admission databases would yield similar distributions. In contrast, patients are most evenly distributed at the intermediate stages inside the hospital; however, the dataset's documentation does not provide insight into the purpose and nature of non-ICU wards and notes that the numbering may not be consistent for the duration of the data collection period. This limits the ability to draw conclusions from non-ICU portions of the model, including the high early allocation of patients to Ward 27.
The model is limited in that it does not investigate waiting times directly, but instead relies exclusively on arrival and departure times for each ward or unit. This means that no queuing system is explicitly implemented in the model, though some queuing is in effect implied in the median time spent in a unit. Future work could refine this to include waiting time and explicit capacity limits on wards to improve the model's explanatory power.

Conclusions
This model is a preliminary attempt to consider demographic aspects of hospital resource utilization, and it achieves high accuracy in matching patient transfers to the source data. This is notable considering that only four patient characteristics are used to compute transfers, two of which are demographic (age and gender) and two of which are based on admission (admission location and time of day). While it still makes sense to consider medical characteristics (such as diagnosis) in a more complex iteration of this model, the current results are a strong argument for the importance of including demographic factors. The MIMIC-III data also includes other patient data, including religion, ethnicity, and marital status. These could also be informative characteristics to add to the model, although some specific considerations need to be made, such as whether to consider different religious denominations (e.g., Catholic, Baptist, Lutheran, etc.) separately or to combine them into one group for each religion as a whole. The next stage of investigation is to test this system using both artificially generated patients and patients from other datasets, and to test whether the system behavior as a whole is still close to the MIMIC-III data. One key requirement we are working on is ensuring that the generated patients are of sufficiently high quality, i.e., that they are similar to real patients admitted to this hospital. We hope that the system model described here will serve as a good testbed for the next stage of our work.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Ethics Approval and/or Participant Consent
Research that only uses secondary data does not require ethical review at Thompson Rivers University. Use of the MIMIC-III data set followed the ethical requirements and process set out by the data holder.

Authors' Contributions
ML: made ongoing contributions to the design of the study, accessed and prepared the data, built the simulation model and visualization, ran the analysis, contributed to the manuscript, and gave final approval of the version to be published. PJ: made ongoing contributions to the design of the study, contributed to and edited the manuscript, and gave final approval of the version to be published.

Funding
This study was funded by a Thompson Rivers University Internal Research Fund Grant (reference no. 101762).