Terminology and Medical Pattern Computing in EMR
In Medical Informatics (MI), topics concerning medical semantics are undoubtedly far more challenging than work at the grammatical level. MI cannot, however, stay within the grammatical area alone. It needs to go further and deeper into the medical semantic problems that arise constantly in daily clinical practice. Frankly speaking, the work presented here is still something of an experiment.
This paper focuses on how to unearth potentially clinically informative data patterns from the relations or courses among the separate data elements in electronic medical records (EMR). The work also suggests that more needs to be done in medical terminology to achieve this end.
The concepts and definitions
(1) Event in medical data (EMD): A series of clinical actions, such as measurements, observations and interventions, that are regarded as happening at one point in time or over one continuous period within which the different time points make no clinical difference.
(2) Feature (F) of an event: One of the elements that constitute an event. A feature may be the description of a clinical measurement or observation, or of a clinical intervention for treatment or diagnosis, such as a medication. In other words, an event is a set of features.
(3) Attributes (A) are the descriptions of a feature for the purposes of computing. They include the code or name of the feature (CNF), recording time (RT), value of existence (VE), clinic nature (CN) of the feature, temporal point or duration nature of the effect (TPDNE) of the feature, type of temporal order (TTO), serial number of feature (SNF) and serial number of event (SNE).
The attributes are described below.
A. CNF: the code or name of a feature, e.g. “chronic hypertension”, “proteinuria”, “blood pressure ranged from 140/80 to 240/140 mmHg”, “Exam Result for pulmonary emboli”, “hypoxemia”, etc.
B. RT: recording time, i.e. the time point of recording, usually expressed as the interval between the recording time of an event and a given reference time, e.g. the beginning of a gestation. In the following examples, “GST210d” means “at the 210th day of gestation”. Sometimes RT is expressed in an uncertain or relative way; for instance, “GST210d+” means “after the 210th day of gestation”.
C. VE: value of existence of a feature, i.e. its so-called “existential strength”.
“0” means “zero existence” or “objective nonexistence” (logically “NO” or “FALSE”); see, for example, the feature with CNF “Exam Result for pulmonary emboli”. “1” means “(objective) existence” (logically “YES” or “TRUE”). Sometimes the relative form “1+” is used to express “relatively higher existential strength, or a positive increment in existence (compared with the last record of the same or a comparable feature)”; it usually indicates a positive turning point (PTP) of the feature. “2” means existence based on subjective judgement, or possible existence; see, for example, the features with CNFs “suggestion of release of massive amounts of catecholamines” and “improvement in the patient's condition”.
D. CN: clinic nature of a feature. Its values are:
“1”: normal physiologic feature; “2”: abnormal (or pathologic) feature; “3”: artificial intervention fact.
E. TPDNE: the temporal point or duration nature of the effect of a feature.
“1”: point-like feature (PLF); “2”: durative and continuous feature (DCE); “0”: TPDNE is unsure.
F. TTO: type of temporal order, with the following values:
“1”: the temporal order is determined by a causal relation between the current feature and the previous neighboring feature or a designated feature; for example, “1-(10)2” and “1-4” denote causal relations between the current feature and “feature 2 in the 10th event” and “feature 4 in the current event”, respectively. “2”: possible physical order (PPO). “0”: the temporal order is determined only by the data order in the EMR.
G. SNF: serial number of feature.
H. SNE: serial number of event.
The format and examples of the calculation attribute set(CAS) of feature
Thus the form of a CAS of a feature could be expressed as CNF(RT, VE, CN, TPDNE, TTO, SNF, SNE).
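The CAS form above can be sketched as a small record type. This is only an illustration under assumptions: the class name CAS, the field names and the helper functions decode_rt and decode_ve are hypothetical and not part of any existing EMR system. All attributes are kept as strings because values such as “1+” or “1-(10)2” are not plain numbers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CAS:
    """One calculation attribute set: CNF(RT, VE, CN, TPDNE, TTO, SNF, SNE)."""
    cnf: str    # code or name of the feature
    rt: str     # recording time, e.g. "GST210d" or "GST210d+"
    ve: str     # value of existence: "0", "1", "1+", "2"
    cn: str     # clinic nature: "1", "2" or "3"
    tpdne: str  # "1" point-like, "2" durative, "0" unsure
    tto: str    # type of temporal order, e.g. "0", "2", "1-(10)2"
    snf: str    # serial number of feature
    sne: str    # serial number of event


def decode_rt(rt: str):
    """Split an RT such as 'GST210d+' into (reference, day, modifier)."""
    m = re.fullmatch(r"([A-Za-z]*)(\d+)d([+-]?)", rt)
    if m is None:
        raise ValueError(f"unrecognised RT: {rt!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def decode_ve(ve: str):
    """Split a VE such as '1+' into (base value, relative sign)."""
    m = re.fullmatch(r"([012])([+-]?)", ve)
    if m is None:
        raise ValueError(f"unrecognised VE: {ve!r}")
    return int(m.group(1)), m.group(2)


# Attribute values taken from one of the example features in this paper.
bp = CAS("blood pressure was 170/104 mm Hg", "277d", "1", "2", "2", "0", "3", "4")
print(decode_rt("GST210d+"))  # ('GST', 210, '+')
print(decode_rt(bp.rt))       # ('', 277, '')
print(decode_ve("1+"))        # (1, '+')
```

Keeping the raw strings and decoding them on demand avoids losing the relative markers (“+”, “-”) that the later pattern analysis relies on.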
Below are segments of a medical record downloaded from the Internet (David J. Lyman: http://www.jabfm.com/content/15/2/153.full.pdf). We use them to show how to build a CAS for each feature and to explore methods for digging out data patterns based on the temporal and existential strength attributes (T-ES attributes) of those features.
The case quoted here is the story of a pregnant woman with extremely high blood pressure that was hard to control, and of how the cause of the hypertension was found by chance and the severe medical condition was finally resolved.
The following is a selected section of that medical case expressed in free-text.
“One morning at 37 weeks' gestation, the patient awoke with a severe headache and blurred vision. When she was examined at Labor and Delivery, her blood pressure was 170/104 mm Hg. She had brisk deep tendon reflexes, proteinuria (2+), and a favorable cervix. Because of her chronic hypertension and preeclampsia the perinatologist recommended immediate induction of labor. She had an amniotomy and was given oxytocin, magnesium sulfate, and supplemental intrapartum labetalol. She gave birth vaginally to a healthy female infant with Apgar scores of 8 at 1 minute and 9 at 5 minutes and a birth weight of 7 lb 13 oz.”
From the description above, we can manually derive the CASs of those features (given in parentheses).
(11) blood pressure was 170/104 mm Hg (277d,1,2,2,0,3,4)
(12) brisk deep tendon reflexes (277d,1,2,2,0,4,4)
(17) induction of labor recommended (277d,1,3,1,0,9,4)
(19) Administration of oxytocin (277d,1,3,1,0,11,4)
(20) Administration of magnesium sulfate (277d,1,3,1,0,12,4)
(21) Administration of intrapartum labetalol (277d,1,3,1,0,13,4)
(22) giving birth vaginally to a healthy female infant (277d,1,1,1,0,14,4)
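Feature lines written in this way can be read back into their attributes mechanically. The following is a minimal sketch, assuming each line has the shape “(number) name (RT,VE,CN,TPDNE,TTO,SNF,SNE)”; the function parse_feature_line and the type ParsedFeature are hypothetical names introduced only for illustration. The attribute list is located by walking backwards over parentheses, so that nested TTO values such as “1-(10)2” stay intact.

```python
import re
from typing import NamedTuple

FIELDS = ("RT", "VE", "CN", "TPDNE", "TTO", "SNF", "SNE")


class ParsedFeature(NamedTuple):
    number: int       # identification number of the feature in the record
    cnf: str          # code or name of the feature
    attributes: dict  # the seven CAS attributes, keyed by abbreviation


def parse_feature_line(line: str) -> ParsedFeature:
    text = line.strip().rstrip(",").rstrip()
    if not text.endswith(")"):
        raise ValueError("no attribute list found")
    # Walk backwards to the parenthesis that opens the attribute list.
    depth = 0
    for i in range(len(text) - 1, -1, -1):
        if text[i] == ")":
            depth += 1
        elif text[i] == "(":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced parentheses")
    head, body = text[:i], text[i + 1:-1]
    values = [v.strip() for v in body.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} attributes, got {len(values)}")
    m = re.match(r"\((\d+)\)\s*(.*?)\s*:?\s*$", head)
    if m is None:
        raise ValueError("missing feature number")
    return ParsedFeature(int(m.group(1)), m.group(2), dict(zip(FIELDS, values)))


parsed = parse_feature_line("(11)blood pressure was 170/104 mm Hg(277d,1,2,2,0,3,4)")
print(parsed.number, parsed.cnf)  # 11 blood pressure was 170/104 mm Hg
print(parsed.attributes["CN"])    # 2
```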
As stated, all of the CASs shown above were built manually. This is not necessary in reasonably mature data engineering; most of them could be obtained automatically or semi-automatically. The code or name of a feature (CNF) may simply be an element of an ontology or terminology, such as SNOMED or its extensions. Recording time (RT), serial number of feature (SNF) and serial number of event (SNE) can mostly be taken from the EMR automatically, provided the EMR is properly formatted. The clinic nature (CN) of a feature and the temporal point or duration nature of the effect (TPDNE) may also be read from the new descriptions in an enhanced ontology (EO) or terminology. The type of temporal order (TTO) may be read from an EO for some features, whereas for others it would have to be determined manually.
What the CASs tell us
Let’s look at the calculation attribute set (CAS) of the feature “(3)administration of nifedipine:(GST210d,1,3,1,1-(1)1,1,2)”:
*** the feature exists “at the 210th day of gestation” (RT=GST210d);
*** its existential status is “YES” (VE=1);
*** its clinic nature is “artificial intervention fact” (CN=3);
*** it is a point-like feature (TPDNE=1);
*** its temporal order is determined by a causal relationship between the current feature “administration of nifedipine” and feature 1 of event 1, i.e. “complaint of chronic hypertension” (TTO=1-(1)1);
*** the serial number of the feature is 1 (SNF=1);
*** the serial number of the event in which this feature is located is 2 (SNE=2).
Here RT=GST210d, VE=1, SNF=1 and SNE=2 might be obtained from the EMR automatically by an application, if the EMR contains this information. CN=3 and TTO=1-(1)1, in turn, might be computed from a general or specially enhanced (in operational ability) medical ontology or terminology together with some very simple logic rules. For example, from the terminology propositions “nifedipine mayTreat Hypertension” (see Pic. 1) and “ChronicHypertension is_a Hypertension” (see Pic. 2), we can derive the statement TTO=1-(1)1. Likewise, from the proposition “administration of drug is-a InterventionConcept or artificial intervention fact” (see Pic. 3) and the proposition “nifedipine is-a drug” (see Pic. 4), we can easily derive the statement “administration of nifedipine is-a InterventionConcept or artificial intervention fact”, i.e. CN=3.
Pic. 1. The proposition “nifedipine mayTreat Hypertension” in a terminology.
Pic. 2. The proposition “ChronicHypertension is_a Hypertension” in a terminology.
Pic. 3. The proposition “administration of drug is-a InterventionConcept” in the terminology.
Pic. 4. The proposition “nifedipine is-a drug” in the terminology.
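The two derivations described above can be sketched with a toy terminology. This is a sketch under assumptions: the dictionaries IS_A and MAY_TREAT contain only the four propositions quoted in the text, and the function names are hypothetical; a real terminology service would replace these lookups.

```python
# Toy terminology holding only the propositions quoted in the text.
IS_A = {
    "ChronicHypertension": "Hypertension",
    "nifedipine": "drug",
    "administration of drug": "InterventionConcept",
}
MAY_TREAT = {("nifedipine", "Hypertension")}


def ancestors(concept):
    """Yield the concept itself and every concept reachable through is_a."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        yield concept
        concept = IS_A.get(concept)


def may_treat(substance, condition):
    """True if the substance may treat the condition or one of its up concepts."""
    return any((substance, c) in MAY_TREAT for c in ancestors(condition))


def clinic_nature_of_administration(substance):
    """Return CN '3' when 'administration of <substance>' is an InterventionConcept."""
    is_drug = "drug" in ancestors(substance)
    is_intervention = "InterventionConcept" in ancestors("administration of drug")
    return "3" if is_drug and is_intervention else None


def causal_tto(substance, earlier_features):
    """earlier_features: (SNE, SNF, concept) triples of features recorded earlier.

    Returns '1-(SNE)SNF' for the first earlier feature the substance may treat.
    """
    for sne, snf, concept in earlier_features:
        if may_treat(substance, concept):
            return f"1-({sne}){snf}"
    return None


earlier = [(1, 1, "ChronicHypertension")]  # feature 1 of event 1
print(clinic_nature_of_administration("nifedipine"))  # 3
print(causal_tto("nifedipine", earlier))              # 1-(1)1
```

Treating a matching mayTreat relation as evidence of a causal temporal order is the simple logic rule assumed here; other relations would need rules of their own.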
The time-axis-orientated contextual pattern around sudden-change point(SCP)
（i）The identical subject feature(ISF) and the target concept
Identical subject features (ISFs): If two features (F), or the subjects (Fs) on which the features centre, belong to the same concept (or the same up concept), they are called identical subject features (ISFs) oriented on that concept, and that concept is the target concept for the operations. In other words, if (F1 or Fs1) is_a C and (F2 or Fs2) is_a C, then F1 and F2 are ISFs oriented on the target concept C. Generally speaking, deciding whether two features are ISFs is easy for a human but not for a machine. Doing it by way of so-called knowledge engineering requires additional operations, which will be discussed elsewhere.
The following list gives the ISFs oriented on the target concept Blood Pressure. It was extracted from the EMR cited above using the author’s expertise and according to the definition above:
(1) complaint of chronic hypertension (GST210d-,1,2,2,2,1,1)
(6) hypertension worsened (GST270d,1+,2,2,1,1,3)
(11) blood pressure was 170/104 mm Hg (GST277d,1,2,2,0,3,4)
(15) chronic hypertension (GST277d,1,2,2,0,7,4)
(28) blood pressure ranged from 140/80 to 240/140 mmHg (GST277d+,1,2,1,0,2,6)
(38) poorly controlled hypertension (GST278d,1,2,2,0,6,7)
(47) normal blood pressure (278d+,1,1,2,1-9,10,8)
(60) hypotensive (90/60 mm Hg) (GST292d,1,2,2,PO-2+,5,10)
(63) blood pressure rapidly stabilized (GST292d,1,1,2,1,7-6,10)
(65) blood pressure is normalized (GST297d,1,1,2,1-(10)2,8,11)
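The ISF definition (F1 is_a C and F2 is_a C) can be sketched as a lookup over up concepts. The hierarchy below is a hypothetical toy fragment invented for illustration, not an excerpt from SNOMED or any other terminology, and the function names are likewise assumptions.

```python
# Hypothetical toy hierarchy: each subject maps to its direct up concepts.
PARENTS = {
    "chronic hypertension": {"hypertension"},
    "hypertension": {"blood pressure finding"},
    "hypotension": {"blood pressure finding"},
    "normal blood pressure": {"blood pressure finding"},
    "proteinuria": {"urine finding"},
}


def up_concepts(concept):
    """Return the concept together with all its direct and indirect up concepts."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, ()))
    return found


def are_isf(subject1, subject2, target):
    """True if both subjects fall under the target concept."""
    return target in up_concepts(subject1) and target in up_concepts(subject2)


def select_isf(subjects, target):
    """Keep, in their original order, the subjects that fall under the target."""
    return [s for s in subjects if target in up_concepts(s)]


target = "blood pressure finding"
print(are_isf("chronic hypertension", "hypotension", target))  # True
print(are_isf("chronic hypertension", "proteinuria", target))  # False
print(select_isf(["chronic hypertension", "proteinuria", "hypotension"], target))
```

The hard part noted in the text, mapping a free-text feature to its subject concept in the first place, is not addressed by this sketch.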
（ii）getting the values of existence(VE) of features
The full set of VEs of the features may be obtained from both (a) the classification of the features through a terminology or ontology, which in this case usually subdivides them into three subconcepts with higher, normal or lower blood pressure, and (b) the values of existence (VEs) of the features given above in the calculation attribute sets (CASs). Thus we can get an ordered sequence of triples of identification number, subconcept label and value of existence, as below:
where the letters “h”, “n” and “l” represent higher, normal and lower blood pressure, respectively. Based on some rules set up in advance, we can then get the following picture (Pic. 5). Those rules are: (a) the points of features belonging to the classes of higher, normal and lower blood pressure have positive, zero and negative values on the Y axis, respectively; (b) a “+” difference in VE moves the point of the corresponding feature up one unit on the Y axis and, conversely, a “-” difference moves it down one unit; (c) positions on the X axis indicate temporal order only and are independent of the length of the time period.
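Rules (a) to (c) can be sketched as a mapping from triples to plot coordinates. The base offsets of +1, 0 and -1 for the three classes are an assumption (the text only requires positive, zero and negative values), and the function name plot_points is hypothetical. The three example triples use VEs from the feature list above, with class labels read off the feature names.

```python
BASE_Y = {"h": 1, "n": 0, "l": -1}  # rule (a); the unit offsets are assumed
STEP = {"+": 1, "-": -1, "": 0}     # rule (b)


def plot_points(triples):
    """Map (feature_id, label, ve) triples, given in temporal order, to (x, y, feature_id).

    Rule (c): x is simply the position in the sequence, not a time length.
    """
    points = []
    for x, (feature_id, label, ve) in enumerate(triples, start=1):
        sign = ve[-1] if ve[-1] in "+-" else ""
        points.append((x, BASE_Y[label] + STEP[sign], feature_id))
    return points


triples = [(6, "h", "1+"), (47, "n", "1"), (60, "l", "1")]
print(plot_points(triples))  # [(1, 2, 6), (2, 0, 47), (3, -1, 60)]
```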
（iii）the algorithm for SCP
The sudden-change point (SCP) is identified by the value calculated from the standardized D/θ, where D is a significant change in VE on the Y axis and θ is the sufficiently short period during which that change occurred; both D and θ must be valid in a clinical sense. In this case, the algorithm is
VSCP = (VEn1 - VEn2) / (RTn1 - RTn2)
if |VSCP|>=δ, then SCP=1, else SCP=0.
where VEn1-VEn2 and RTn1-RTn2 represent the differences of the VEs and the RTs between the neighboring features, respectively, and δ is a reasonable positive number defined according to clinical significance.
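The threshold test can be sketched as follows. Several points are assumptions rather than part of the method as stated: the Y-axis value of each feature is used as its VE, RT is given in days, the later feature of each neighboring pair is the one flagged, a minimum period min_theta guards against division by zero when two features share the same RT, and no standardization step is applied. The sample data are invented for illustration and are not the values of the case above.

```python
def find_scps(points, delta, min_theta=1.0):
    """Flag sudden-change points among neighbouring features.

    points: (feature_id, rt_days, y) tuples in temporal order.
    Returns the ids of the later feature of every pair with |V_SCP| >= delta.
    """
    scps = []
    for (_, rt1, y1), (fid2, rt2, y2) in zip(points, points[1:]):
        theta = max(abs(rt2 - rt1), min_theta)  # assumed floor on the period
        v_scp = (y2 - y1) / theta
        if abs(v_scp) >= delta:
            scps.append(fid2)
    return scps


# Hypothetical data: only the drop from "c" to "d" happens within one day.
sample = [("a", 270, 2), ("b", 277, 1), ("c", 278, 1), ("d", 279, 0), ("e", 292, -1)]
print(find_scps(sample, delta=1.0))  # ['d']
```

The choice of δ and of the minimum period has to come from clinical judgement; the values used here only make the example concrete.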
Therefore, it is not very difficult to identify the SCPs, which are features (38), (47) and (60) in this case. We now turn to analyzing the contexts around those SCPs.