The recently published practice parameter on quantitative sensory
testing (QST) contains useful information but further tweaking with
respect to the primary question asked, the descriptive information
provided, and its claim of being evidence based might improve it! Before
discussing perceived shortcomings, two disclaimers: 1) we were involved in
the development of CASE IV and receive royalties from its sale (see below) and
2) we assume no lack of competence or excessive partiality toward any
position of the co-signers or of the oversight committees of the AAN.
The Committee's question was, What is the utility of quantitative
sensation testing (QST) in diagnosing specific neurologic diseases? Surely
this is not the correct question. The question should be, How sensitive,
specific, meaningful and useful is QST for the characterization and
quantitation of sensation loss or abnormality (by modality) in health and
in disease?
The need to address a somewhat different question to that asked by
the Committee is suggested by the observation that essentially no test
results by themselves should be considered diagnostic of a medical
condition. All abnormal test results must be interpreted taking into
account the history, findings, previous treatments and other test results.
The neurologist who sees a patient with polyneuropathy and finds a raised
fasting plasma glucose result cannot simply diagnose diabetic
polyneuropathy. Although the raised fasting plasma glucose may be due to
diabetes mellitus, it may also be due to not having fasted, a metabolic
disorder other than diabetes mellitus, e.g., liver disease, a medication
effect, or a technical mistake in the laboratory. Assuming that the raised
plasma glucose is due to diabetes mellitus, the physician still cannot
diagnose diabetic neuropathy from the plasma glucose alone; it is
necessary to establish (by history, examination, other tests and medical
judgement) that the neuropathy is not due to another cause but that it is
due to diabetes mellitus. We have reported that approximately five percent
of neurologic findings among diabetics are due to another cause. [1] A
second example is the patient with typical electrophysiologic findings of
median neuropathy at the wrist (carpal tunnel syndrome) still persisting
years after having undergone successful section of the carpal ligament
with relief of symptoms. The physician makes the judgement that the test
results can be set aside as not being diagnostic of CTS since the patient
does not have symptoms and findings of CTS and has been successfully
treated. Another example is the case of a patient, a plumber whose
elevated plasma level of lead (from use of white lead) was not the cause
of his neuropathy; it was later established that undiscovered
hypothyroidism was its cause and the neuropathy improved rapidly with
thyroid replacement. [2] Physicians, not tests, make diagnoses based on
medical history, physical examination, test results, and clinical
judgment.
There is the further point not adequately expressed in the practice
parameter that a test does not need to be diagnostic to be clinically
useful. As we have shown previously, many evaluations may be needed to
characterize the symptoms and impairments and underlying pathophysiology
of a disease such as a specific case of peripheral neuropathy. It may be
possible to surmise the correct diagnosis from a very cursory
examination with few tests but a more intensive examination and more tests
are needed to be certain that the diagnosis is correct. [3]
Many tests, for example, nerve conduction and EMG, are useful for
this purpose. The use of visual tests by ophthalmologists and neurologists
and of hearing tests by otolaryngologists in diagnosing disease is
instructive for how QST may be
used in neurologic practice. QST, using psychophysical approaches to
assess various thresholds of modalities of sensation and stimulus response
characteristics, is the counterpart of assessing vision and hearing by
psychophysical approaches. It would be inappropriate to judge the value of
tests of vision only by their ability to diagnose specific varieties of
eye disease (corneal dystrophy, cataract, vitreous hemorrhage or
retinopathy) all of which conditions can cause decreased vision. Likewise,
although audiometric tests may be useful in localizing abnormalities to
end organ, auditory nerve or brain, it would be wrong to judge their value
solely by their ability to diagnose specific diseases.
Therefore, the appropriate question to ask of QST concerns its utility to
detect, quantitate, or characterize sensation loss or abnormality given
that such abnormality is present. The question then is, Does QST identify
selective modalities of sensation loss, pan-modality sensation loss or
altered stimulus-response characteristics (e.g., hyperalgesia) which
information can then be used by neurologists to identify populations of
receptors or neurons which are dysfunctional and used for
characterization, diagnosis, following course, and recognition of a
therapeutic effect?
That QST is useful for these purposes has probably
already been demonstrated. In a series of papers with E. H. Lambert, we
showed that a selective loss of a modality of sensation was correlated
with a selective decrease in the amplitude of a peak of the compound
action potential of a biopsied sural nerve in vitro and also with loss or
abnormality of a class of sensory fibers in diameter histograms of a
biopsied nerve. [4] Thus in spinocerebellar degeneration, selective loss
of touch-pressure and vibration sensation was associated with degeneration
of alpha-beta sensory fibers as shown by a decreased amplitude of this alpha-beta peak
in the compound action potential in vitro and by a reduced number of large
diameter fibers in diameter histograms of transverse sections of nerve. In
other studies (certain patients with amyloidosis, varieties of hereditary
sensory and autonomic neuropathies, Fabry and Tangier disease) raised
thresholds of heat-pain sensation and normal thresholds of
mechanoreception had their basis in a selective involvement of
unmyelinated and small myelinated sensory fibers
as revealed by the characteristic abnormalities of the compound action
potential in vitro and by morphometric studies. In still other cases,
involvement was broadly distributed among different classes of fibers. The
correctness of the foregoing correlations was undergirded by the
pioneering studies of Gasser and Erlanger [5] and by electrophysiologic
studies of sensory units. [6]
The correct question, therefore, should relate to the utility of QST
to characterize sensation with development and maturation and with aging
in health (among modalities, different sites, and with age and
anthropometric features) and with
alterations due to disease. Since it is sensation, a psychophysical
experience, which is to be assessed, it must be judged by how well
sensation itself is being assessed, not by how well it correlates with
surrogate measures or by detection of disease, albeit also of interest.
Other measures, for example, sensory nerve conduction, somatosensory
evoked potentials, and even pathologic and morphometric studies of nerve
may also be assessed; but they are surrogate measures of sensation, not
sensation itself.
We recognize that despite the fact that quantitative sensation tests
are used as only a part of the diagnostic process, the statistical
entities of sensitivity and specificity are two (among many) useful
attributes of a test. Thus, we accept that studies that provide estimates
of these types of attributes are important. Is the report correct that
there is no class 1 evidence bearing on this question? We suggest that
there is such evidence. In the Rochester Diabetic Neuropathy Study (RDNS),
we proposed studies that were reviewed by intramural and extramural (NIH)
review committees, were approved, and were prospectively performed. A
major objective was to determine the sensitivity, specificity,
representativeness, reproducibility and meaningfulness of quantitative
sensation test results, attributes of nerve conduction, autonomic tests,
summated composite scores of selected items of the preceding items and
various clinical measures of symptoms and impairment so that the best
measures might be used to detect, characterize and follow course of
neuropathy and neuropathic outcomes. A large subject cohort of more than
500 persons without diabetes mellitus or neurologic disease agreed to
participate and normal percentile values (and normal deviates) were
estimated so that results in patients could be expressed as percentiles
and normal deviates specific for modality, site, age, sex and applicable
anthropometric variables. All known diabetic patients in Rochester, MN
were contacted and invited to participate in cross-sectional and
longitudinal studies employing standard tests. The consenting diabetic
persons less than 70 years old were not significantly different from the
group who did not agree to participate by the criteria of co-morbidity.
All neuropathic evaluations and tests were predetermined and performed
using standard baseline and testing conditions. QST was done using
predetermined algorithms and independent of other neuropathic evaluations.
The results have been extensively published. [7, 8, 9] To illustrate, the
intraclass correlation coefficients testing reproducibility of vibration
and cooling detection threshold using CASE IV and the 4, 2 and 1 stepping
algorithm with null stimuli were better than
0.9 and comparable to what we found assessing the best attributes of nerve
conduction. [10] Vibration detection threshold (VDT) was sufficiently
reproducible to show a monotone and
significant worsening over long periods of time. [11]
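As a minimal sketch of how a measured threshold might be expressed as a percentile and a normal deviate relative to healthy subjects, consider the following Python example. It is illustrative only and is not the RDNS procedure: the reference values are invented, the function names `percentile_of` and `normal_deviate` are hypothetical, and a real reference set would be specific for modality, site, age, sex and other applicable variables.

```python
from statistics import NormalDist


def percentile_of(value, reference):
    """Empirical percentile of `value` within a list of reference thresholds.

    Uses a mid-rank estimate with a continuity correction so that the
    result always lies strictly between 0 and 100.
    """
    below = sum(1 for r in reference if r < value)
    equal = sum(1 for r in reference if r == value)
    n = len(reference)
    return 100.0 * (below + 0.5 * equal + 0.5) / (n + 1)


def normal_deviate(percentile):
    """Convert a percentile (0 to 100, exclusive) to a standard normal deviate."""
    return NormalDist().inv_cdf(percentile / 100.0)


# Invented reference thresholds from hypothetical healthy subjects
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9]

median_pct = percentile_of(5, reference)    # value in the middle of the cohort
high_pct = percentile_of(100, reference)    # value above every reference subject
print(median_pct, normal_deviate(median_pct))
print(high_pct, normal_deviate(high_pct))
```

Expressing every result on the same normal deviate scale is what allows results from different modalities and sites to be compared or combined.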
Our published data suggest that some attributes of nerve conduction (or,
preferably, summated normal deviate scores of nerve conduction) are more
sensitive in
diagnosing diabetic polyneuropathy than are QST results. But as we have
also suggested, although nerve conductions appear to be more sensitive,
QST results tend to be more meaningful - they actually reflect sensation
loss or hyperalgesia! There is an additional difference in what should be
inferred from QST versus nerve conduction measures. The integrity of the
entire sensory apparatus (the receptor, nerve and central nervous system
tracts, and cerebral recognition and interpretation) is tested with QST.
By contrast nerve conduction provides information only about the cable
transmission properties of a class of sensory fibers (essentially large
myelinated fibers) from the point of stimulation to a more defined
proximal point. Also, nerve conduction measurements may include fibers
that, although contributing to the action potential, are already
disconnected from their receptors. Importantly, QST also allows detection
of abnormality of functional classes of sensory fibers that are not
tested by nerve conduction tests; small fibers, for example, usually are
not.
Laboratory QST, as compared to clinical bedside testing of sensation,
allows standardization of all aspects of testing: the environment, the
initial load, use of precisely shaped and defined invariant waveforms
given over a broad range of known stimulus magnitudes, use of null stimuli and
use of sophisticated and standard computer algorithms for testing and
finding threshold. Results can then be compared to values obtained from
healthy subjects tested by exactly the same approaches so that responses
can be expressed as percentiles specific for modality, anatomical site,
age, sex and applicable anthropometric variables (e.g., height, weight
and body mass index). Because clinical evaluation of symptoms and
impairments, nerve conduction tests, QST and quantitative autonomic tests
provide independent and useful information, which allows for comparison
and validation, we advocate use of several of these measurements in
composite scores. [12] This makes sense since polyneuropathy is the sum of
symptoms, impairments and functional alterations of various classes of
fibers.
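One way such a composite might be formed is sketched below in Python. This is an assumption for illustration, not the published scoring rule: the test names, the percentiles, and the function `composite_score` are hypothetical. The sketch converts each percentile to a normal deviate and reverses the sign for measures in which low values are abnormal, so that a larger total always indicates more abnormality.

```python
from statistics import NormalDist


def composite_score(percentiles, higher_is_worse):
    """Sum normal deviates across tests, oriented so larger means more abnormal.

    percentiles      -- dict of test name -> percentile (0 to 100, exclusive)
    higher_is_worse  -- dict of test name -> True if high values are abnormal
    """
    total = 0.0
    for name, pct in percentiles.items():
        z = NormalDist().inv_cdf(pct / 100.0)
        # Flip the sign for measures where low values indicate abnormality
        total += z if higher_is_worse[name] else -z
    return total


# Hypothetical patient: raised vibration threshold, reduced sural amplitude
patient = {"vibration_threshold": 97.5, "sural_amplitude": 2.5}
direction = {"vibration_threshold": True, "sural_amplitude": False}
print(composite_score(patient, direction))
```

In this hypothetical patient both results are abnormal in their own direction, so both contribute positively to the total rather than cancelling.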
The comments about algorithms of testing and finding threshold also
need tweaking. It is reported that these algorithms can be divided into
the method of limits and the method of steps (referred to as forced
choice). This summary is too simple and, at least in part, a little askew.
In characterizing algorithms of testing and finding threshold, one needs
to consider many aspects of the system, the testing procedures and how
threshold is to be estimated and validated. True, algorithms may employ
stimuli which increase (or decrease) continuously (linear, exponential or
other) or in steps (by linear, exponential or other), but this is barely a
beginning in describing algorithms. It is quite inaccurate to characterize
algorithms by these limited criteria. Algorithms vary depending on the
preconditions of testing, the environment, and the instructions given,
whether static loads or preconditions are used and what they are, the
stimulus waveform magnitude and presentation, whether ramps or steps will
be used, whether null stimuli will be given, the specific rules of
testing, the specific rules of determining threshold, the number of
repetitions and turnarounds, and how results will be presented and
compared to normal values. In laboratory testing of QST, we suggest that
all of these steps need to be documented, standard, the same at all sites
and times, and the same for controls and patients. The specific rules of
the algorithm of testing and finding threshold may spell the difference
between a valid or an invalid algorithm of testing. To illustrate, we
found that too low a number of turnarounds (from sensitivity to
insensitivity and vice versa) in the 4, 2 and 1 algorithm with null
stimuli might in itself produce spurious results. Differences in results
should be due to patients' response differences and not to differences of
testing procedures. To clarify, using step testing in an algorithm is not
equivalent to forced choice testing! In two-alternative forced-choice
testing, stimulus events are always given in pairs, with one being the
stimulus and the other the null stimulus. For each pair of stimulus
events, indicated, for example, by the display of the numbers 1 and then
of 2, a stimulus event is given by chance in the 1 or 2 interval. The null
stimulus is given in the other interval. The subject or patient then has
to choose whether the stimulus was felt in period 1 or 2. The patient is
forced to choose (forced choice) the interval, 1 or 2, that is the most
likely to have contained the stimulus. Using this method, response
criteria should be the same among subjects. Even with forced choice
testing the specific rules of testing may be different among algorithms.
Therefore it is necessary to use only highly characterized and validated
algorithms and not to assume that a standard and validated procedure is
being followed simply because the authors state that they used the 'method
of limits' or the 'method of forced choice' - without exact specification
these terms are almost meaningless.
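The two-alternative forced-choice procedure described above can be sketched as a small Python simulation. It is a simplified model under stated assumptions, not a validated testing algorithm: the simulated subject feels the stimulus with a fixed probability `detect_probability` (a hypothetical parameter) and otherwise guesses between the two intervals, and the function names are invented for illustration.

```python
import random


def two_afc_trial(detect_probability, rng):
    """Simulate one two-alternative forced-choice trial.

    Returns True if the subject chooses the interval that held the stimulus.
    """
    # The stimulus is assigned by chance to interval 1 or 2;
    # the null stimulus occupies the other interval.
    stimulus_interval = rng.choice([1, 2])
    if rng.random() < detect_probability:
        chosen = stimulus_interval        # stimulus was felt
    else:
        chosen = rng.choice([1, 2])       # not felt: subject is forced to guess
    return chosen == stimulus_interval


def proportion_correct(detect_probability, n_trials, seed=0):
    """Fraction of correct choices over repeated simulated trials."""
    rng = random.Random(seed)
    correct = sum(two_afc_trial(detect_probability, rng) for _ in range(n_trials))
    return correct / n_trials


print(proportion_correct(1.0, 200))    # stimulus always felt
print(proportion_correct(0.0, 2000))   # stimulus never felt: near chance
```

Because a subject who feels nothing is still correct about half the time, any rule for estimating threshold from such trials has to be specified relative to this chance level, which is one reason the exact rules of the algorithm matter.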
We also want to comment on the statement about reproducibility among
test results. One of the possible reasons for differences in results among
different QST systems is use of different expressions of stimulus
magnitude, i.e., absolute measured values versus use of just noticeable
difference (JND) units. Whereas it is attractive to use absolute units,
for example, µm of displacement superimposed on a standard
load or Δ°C (as we do in CASE IV), care must be taken in how these values are
interpreted and analyzed.
It has been known for a long time that sensation does not increase
linearly but increases as an exponential function. [13] This has been
extensively studied for vision and hearing. Therefore, it is better to
express results as just noticeable difference. For a variety of reasons,
it is even better to express results as percentiles (relative to a healthy
population) and do the statistical analyses using normal deviate values.
Secondly, it is well known among statisticians that reproducibility
should be concerned with agreement, not correlation. The usual
(product-moment) correlation should not be used for this purpose. The
intraclass correlation should be used. If one first transforms to ranks,
this will facilitate comparisons of correlation with other QST results
using other systems and with other nerve test results.
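The distinction between agreement and correlation can be illustrated with a short Python sketch. The data are invented, and the one-way random-effects form of the intraclass correlation used here is an assumption chosen for simplicity; the function names `pearson` and `icc_oneway` are hypothetical. The second test session is simply the first shifted upward by 2 units, so the two sessions correlate perfectly but do not agree.

```python
def pearson(x, y):
    """Product-moment correlation between two lists of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_oneway(x, y):
    """One-way random-effects intraclass correlation, two measurements per subject."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


first_session = [1, 2, 3, 4, 5]
second_session = [3, 4, 5, 6, 7]   # same ranking, systematically 2 units higher

print(pearson(first_session, second_session))      # 1.0
print(icc_oneway(first_session, second_session))   # 3/7, about 0.43
```

The product-moment correlation ignores the systematic shift between sessions, whereas the intraclass correlation is lowered by it, which is the behavior wanted of a reproducibility measure.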
We note that several multicenter trials have already been done or are
being done employing centralized calibration of stimuli, standard
algorithms of testing and finding threshold, and central quality control;
QST approaches are proving to be useful in the conduct of controlled trials.
Finally we want to reiterate that we think that QST, especially laboratory
based QST, is useful for the characterization and quantification of
alterations of sensation in health and in disease and increasingly will be
used for epidemiologic surveys and controlled clinical trials. QST
appears also to be especially useful in detecting thermal hyperalgesia,
evidence of sensory receptor and fiber dysfunction. It is now possible to
use stimuli that are defined and accurately calibrated over a broad
range of stimulus magnitudes, pre-programmed and validated algorithms of
testing using null stimuli, and expression of results as percentiles and
normal deviates allowing comparison of results from different systems.
Further development of laboratory based systems should be encouraged.
References:
1. Dyck PJ, Kratz KM, Karnes JL, et al. The prevalence by staged
severity of various types of diabetic neuropathy, retinopathy, and
nephropathy in a population-based cohort: The Rochester Diabetic
Neuropathy Study. Neurology 1993;43:817-824.
2. Dyck PJ, Lambert EH. Polyneuropathy associated with
hypothyroidism. J Neuropathol Exp Neurol 1970;29:631-658.
3. Suarez GA, Chalk CH, Russell JW, et al. Diagnostic accuracy and
certainty from sequential evaluations in peripheral neuropathy. Neurology
2001;56:1118-1120.
4. Lambert EH, Dyck PJ. Compound action potentials of sural nerve in
vitro in peripheral neuropathy. In: Dyck PJ, Thomas PK, Griffin JW, Low
PA, Poduslo JF, eds. Peripheral Neuropathy, 3rd ed. Vol. Philadelphia:
W. B. Saunders, 1993:672-684.
5. Erlanger J, Gasser HS. Electrical Signs of Nervous Activity.
Philadelphia: University of Pennsylvania Press, 1933.
6. Light AR, Perl ER. Peripheral sensory systems. In: Dyck PJ, Thomas
PK, Griffin JW, Low PA, Poduslo JF, eds. Peripheral Neuropathy, 3rd ed.
Vol. Philadelphia: W. B. Saunders, 1993:149-165.
7. Dyck PJ, Bushek W, Spring EM, et al. Vibratory and cooling
detection thresholds compared with other tests in diagnosing and staging
diabetic neuropathy. Diabetes Care 1987;10:432-440.
8. Dyck PJ. Detection, characterization, and staging of
polyneuropathy: assessed in diabetics. Muscle Nerve 1988;11:21-32.
9. Dyck PJ, Dyck PJB, Velosa JA, Larson TS, O'Brien PC. Patterns of
quantitative sensation testing of hypoesthesia and hyperalgesia are
predictive of diabetic polyneuropathy. A study of three cohorts.
Diabetes Care 2000;23:510-517.
10. Dyck PJ, Kratz KM, Lehman KA, et al. The Rochester Diabetic
Neuropathy Study: design, criteria for types of neuropathy, selection
bias, and reproducibility of neuropathic tests. Neurology 1991;41:799-807.
11. Dyck PJ, Davies JL, Litchy WJ, O'Brien PC. Longitudinal
assessment of diabetic polyneuropathy using a composite score in the
Rochester Diabetic Neuropathy Study cohort. Neurology 1997;49:229-239.
12. Dyck PJ. Assessment: thermography in neurologic practice.
Neurology 1990;40:523-525.
13. Stevens SS. Neural events and the psychophysical law. Science
1970;170(962):1043-1050.