Indian Journal of Pathology and Microbiology

Year: 2008  |  Volume: 51  |  Issue: 1  |  Page: 22-25

Gleason scoring of prostatic carcinoma: Impact of a web-based tutorial on inter- and intra-observer variability

K Mulay, M Swain, S Jaiman, S Gowrishankar 
 Department of Anatomic Pathology, Apollo Hospitals, Jubilee Hills, Hyderabad, Andhra Pradesh, India

Correspondence Address:
S Gowrishankar
Department of Anatomic Pathology, Apollo Hospitals, Jubilee Hills, Hyderabad - 500 033, Andhra Pradesh


A total of 40 cases of prostatic adenocarcinoma were scored independently by four pathologists using the Gleason scoring system. After attending a web-based tutorial, all four repeated the scoring. Consensus scores were obtained by simultaneous viewing of each case under a multihead microscope by all four pathologists. The scores were then compared. The pretutorial kappa (κ) values ranged from 0.36 to 0.64, with an average of 0.459. After the tutorial, the κ values ranged from 0.44 to 0.678, with the average κ value increasing to 0.538, indicating an improvement in agreement. The intraobserver agreement ranged from 0.435 to 0.788. We conclude that web-based tutorials with an emphasis on images, developed by experts, serve to achieve uniformity in reporting.

How to cite this article:
Mulay K, Swain M, Jaiman S, Gowrishankar S. Gleason scoring of prostatic carcinoma: Impact of a web-based tutorial on inter- and intra-observer variability. Indian J Pathol Microbiol 2008;51:22-25



Introduction

The Gleason scoring system is the most widely used and officially recommended system for scoring prostatic adenocarcinoma. [1],[2] Stratification of Gleason scores into groups is also accepted, as this has a bearing on prognosis and treatment. [3] Though Gleason scoring is easily learned, interobserver variability in scoring exists. Gleason himself said, "I have duplicated my exact previous histologic scores approximately 50% of the times and within 1 of the histologic score (range, 2-10) approximately 85% of the time". [2] This study was therefore undertaken to evaluate the inter- and intra-observer variation in Gleason scoring in a group of general surgical pathologists, and also to study the impact of a web-based tutorial on this variability.

Materials and Methods

A total of 40 cases of prostatic adenocarcinoma were randomly selected from the archives of the anatomic pathology section of the department. These included 29 transrectal ultrasound (TRUS)-guided needle biopsies, eight transurethral resections of the prostate (TURP) and three cases with both a TRUS biopsy and a TURP. The hematoxylin and eosin (H&E)-stained sections were circulated amongst the four pathologists, three of whom were general surgical pathologists with experience varying from 4 to 15 years; one was a resident pathologist in training. All the cases were scored independently using the Gleason scoring system, giving a major and a minor score to each case. A web-based tutorial was then attended by all. This tutorial on the Gleason scoring system is available on the website http// It comprised a pretutorial test of 20 out of 38 consensus cases. The tutorial consisted of images and written material, following which there was a posttutorial test of the same 20 cases. The results of the pre- and post-tutorial tests were then displayed, with the facility to review the images if required. The same 40 cases were then recirculated in a changed order and scored again by the four pathologists. Finally, a consensus score was arrived at for each of the 40 cases by simultaneous viewing of all sections under a multihead microscope by all four pathologists.

The scores were stratified into four groups: group 1 (scores 2-4), group 2 (scores 5-6), group 3 (score 7), and group 4 (scores 8-10). This grouping has been recommended in earlier studies [3] and in the web-based tutorial. The inter- and intra-observer variations were analyzed using the SAS/STAT software and the kappa (κ) values obtained.
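The stratification and the pairwise κ computation can be sketched as follows. This is an illustrative pure-Python version only (the study itself used SAS/STAT), and all score values shown are hypothetical.

```python
# Illustrative sketch only: the study itself used SAS/STAT. This maps
# Gleason scores (2-10) to the four prognostic groups used in the text
# and computes an unweighted Cohen's kappa for one pair of observers.
# All score values below are hypothetical.
from collections import Counter


def gleason_group(score):
    """Group 1: scores 2-4, group 2: 5-6, group 3: 7, group 4: 8-10."""
    if score <= 4:
        return 1
    if score <= 6:
        return 2
    if score == 7:
        return 3
    return 4


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa between two raters over the same cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two observers on ten cases
p1 = [gleason_group(s) for s in [6, 7, 7, 8, 6, 9, 7, 5, 7, 8]]
p2 = [gleason_group(s) for s in [6, 7, 6, 8, 7, 9, 7, 5, 8, 8]]
print(round(cohens_kappa(p1, p2), 3))  # → 0.552
```

Kappa corrects the observed agreement for the agreement expected by chance from each rater's marginal frequencies, which is why it is preferred over raw percentage agreement for this kind of comparison.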


Results

The four pathologists are designated P1-P4. [Table 1] gives the κ values for pairwise comparisons among the four, before and after the tutorial. The κ values were calculated using the SAS/STAT software and interpreted as follows: a value below 0 indicates poor agreement, 0-0.2 slight, 0.21-0.4 fair, 0.41-0.6 moderate, 0.61-0.8 substantial, and 0.81-1 almost perfect agreement. The pretutorial κ values ranged from 0.328 to 0.571, indicating moderate agreement in all pairings except one, with the average of 0.511 also in the moderate category. The posttutorial κ values ranged from 0.418 to 0.611, all in the moderate category or better. There was thus a slight improvement in agreement, though this was probably not significant.
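The interpretation of κ values can be encoded in a small helper function; this is an illustrative sketch using the standard Landis-Koch cutoffs:

```python
# Illustrative helper: map a kappa value to the standard
# Landis-Koch agreement category.
def kappa_category(k):
    if k < 0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


print(kappa_category(0.511))  # → moderate
```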

[Table 2] compares the scores of the four pathologists with the consensus scores (as arrived at by viewing under the multihead microscope). The pretutorial κ values ranged from 0.360 to 0.646, with an average of 0.459, and the posttutorial values from 0.461 to 0.678, with an average of 0.538. The increment in the average κ value was thus 0.079. The third column of [Table 2] compares, for each of the four pathologists (P1-P4), the pretutorial scores with the posttutorial scores.

Except for P1, whose κ value was in the fair agreement range, all others had moderate intraobserver agreement.

[Table 3] gives the percentage agreement values in the various groups pre- and post-tutorial. In groups 2 and 3, the group agreement, the exact score agreement, and the agreement within ±1 score all increased in the posttutorial assessment. In group 4, however, the percentage values diminished slightly after the tutorial. Overall, the agreement values increased after the tutorial.

The analysis of overgrading and undergrading is given in [Table 4]. In the pretutorial round, 12.5% of group 2 cases (scores 5 and 6) and 15% of group 3 cases (score 7) were overgraded; these percentages decreased to 10.6 and 11.25%, respectively, in the posttutorial round. The percentage of undergrading was low in both rounds: in the pretutorial round, it was 0, 5.63, and 4.37% in groups 2, 3, and 4, respectively, with a marginal increase in the posttutorial round to 1.25, 6.87, and 5%.


Discussion

While Gleason scoring is an internationally accepted method of grading prostate cancer, there is no mention of the reproducibility of Gleason grading in the early studies by Gleason and the Veterans Administration Cooperative Urological Research Group (VACURG) [6] or in a subsequent review. [2]

One of the problems in analyzing interobserver variation is establishing the correct diagnosis. The ideal true diagnosis would be the one arrived at by expert pathologists in the specialty. [3] In the absence of expert urologic pathologists in centers in developing countries such as India, where most surgical pathologists are expected to be general pathologists, we resorted to a consensus score for the comparison.

[Table 5] shows the comparison of our data with similar studies done elsewhere.

We have followed the protocol of stratifying the cases into four groups, as prescribed in the tutorial, since this has an impact on prognosis and therapy. [3] The average pretutorial κ value in our study was 0.459, indicating moderate agreement; this value has ranged from 0.16 to 0.836 in various studies. [3],[4],[7],[8],[13],[14],[15] Reported agreement values range from as low as 36% [13] to as high as 70.8%. [16] When agreement within ±1 score was considered, the agreement increased to 94%. [14] In our study, the exact score agreement pretutorial was 45.6%, and 87% when agreement within ±1 score was taken. Both these values increased in the posttutorial evaluation, to 51.4 and 90.6%, respectively.
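The exact-score and within-±1 agreement percentages quoted above can be computed as in the following sketch; the observer score lists here are hypothetical, not the study data.

```python
# Illustrative sketch: exact agreement and agreement within +/-1
# Gleason score between two observers. Score lists are hypothetical.
def agreement_rates(scores_a, scores_b):
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b))
    within_one = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b))
    return 100.0 * exact / n, 100.0 * within_one / n


obs1 = [6, 7, 7, 8, 6, 9, 7, 5, 7, 8]
obs2 = [6, 7, 6, 8, 7, 9, 8, 5, 9, 8]
print(agreement_rates(obs1, obs2))  # → (60.0, 90.0)
```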

Standard textbooks of pathology have only a few images of the various Gleason grades, and for the average surgical pathologist, these images and the accompanying literature are not sufficient to bring out the finer points. A study of the same web-based tutorial reported a significant improvement in the scoring of 15 of 20 images. [4],[16] This tutorial re-emphasizes the following points in prostate cancer grading, which, on analysis, we also felt were the reasons for the discrepant scores:

1. Irregular acini or microacini infiltrating between benign glands indicate pattern 3.
2. A small, circumscribed focus of neoplastic glands does not mean a low-grade tumor, but usually means a small sample of a higher-grade tumor.
3. In pattern 3, one can mentally draw a line around each acinus.
4. Fused glands with ragged outlines indicate pattern 4.
5. A cribriform pattern larger than the size of a normal acinus is pattern 4, while a cribriform pattern smaller than or equal to the size of an acinus is pattern 3.
6. Comedonecrosis indicates pattern 5.
7. Intracytoplasmic vacuolation, which indicates pattern 5, needs to be differentiated from true tubular lumen formation.
8. Minor components should not be incorporated in the score.
9. Areas of crush artifact are not to be included in the score; if the entire area is crushed, it is better to defer the score to a repeat biopsy.

An appreciation of these points led to an improvement in the posttutorial score, with the average κ value rising to 0.538, an increment of 0.079. In a study of the impact of education on the accuracy of Gleason grading by Mikami et al., [5] the κ values rose by significant margins of 0.22 and 0.27 in two groups; the educational material in that study consisted of a 40-min lecture for one group and a histopathology atlas for the second. The group of pathologists in that study was larger and more heterogeneous, and that could be the reason for the greater impact of the education compared with the present study, which comprised a smaller group of pathologists from the same institute. It is significant that in our study, the maximum increase in score was in P3, the pathologist in training.

The most common and important misinterpretation is the undergrading and overgrading of scores 5-6 and score 7. [7],[8],[15] [Table 3] gives the rates of underscoring and overscoring in various studies. We overgraded 12.5% of cases with scores 5-6 and 15% of cases with score 7 in the pretutorial round; this was reduced to 10.6 and 11.25%, respectively, in the posttutorial round.

Undergrading did not appear to be a significant problem in our study because we followed Epstein's recommendation that tumors diagnosed on a needle biopsy are seldom of low grade. [17] A score of 2-4 diagnosed on needle biopsy often proves to be a high-grade tumor on subsequent prostatectomy.

The value of the web in imparting education has been described in other studies, ranging from the teaching of technical skills to clinical management skills and even surgical training. [10],[18],[19],[20] Recently, a web-based tutorial similar to the one in our study was used to successfully introduce the Bethesda system of reporting cervical smears in mainland China. [21] For the general surgical pathologist, who has to be aware of the finer points in the classification and grading of lesions, both neoplastic and nonneoplastic, we feel that such freely available web-based tutorials, developed by experts with an emphasis on images, will greatly improve the understanding and uniformity of reporting.


Conclusion

This study shows that the degree of agreement in the Gleason scoring system between pathologists is moderate and comparable with that in other studies.

There is scope for improvement, and a web-based tutorial with an emphasis on images and on the finer points of differentiation between grades would serve this purpose. As the science and art of surgical pathology today is still intensely image-based, and as subspecialty pathologists are not available in most developing countries, we strongly recommend that web-based programs be developed by experts and made accessible to pathologists around the world, as this will help to achieve uniformity of reporting.


Acknowledgment

We acknowledge the help of Mr. Prasanna Krishna, Statistician, National Institute of Nutrition, Hyderabad, India, in the statistical analysis of the data.


References

1. Allsbrook WC Jr, Mangold KA, Yang X. The Gleason grading system: An overview. J Urol Pathol 1998;10:141-57.
2. Gleason DF. Histologic grading of prostate cancer: A perspective. Hum Pathol 1992;23:273-9.
3. Allsbrook WC Jr, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI. Interobserver reproducibility of Gleason grading of prostatic carcinoma: General pathologists. Hum Pathol 2001;32:81-8.
4. Kronz JD, Silberman MA, Allsbrook WC Jr, Bastacky SI, Burks RT, Cina SJ, et al. Pathology residents' use of a web-based tutorial to improve Gleason grading of prostate carcinoma. Hum Pathol 2000;31:1044-50.
5. Mikami Y, Manabe T, Epstein JI, Shiraishi T, Furusato M, Tsuzuki T, et al. Accuracy of Gleason grading by practicing pathologists and impact of education on improving agreement. Hum Pathol 2003;34:658-65.
6. Gleason DF, Mellinger GT, and the Veterans Administration Cooperative Urological Research Group. Prediction of prognosis for prostatic adenocarcinoma by combined histologic grading and clinical staging. J Urol 1974;111:58-64.
7. Bain GO, Koch M, Hanson J. Feasibility of grading carcinomas. Arch Pathol Lab Med 1982;106:265-7.
8. ten Kate FJW, Gallee MPW, Schmitz PIM. Problems in grading of prostatic carcinoma: Interobserver reproducibility of five different grading systems. World J Urol 1986;4:147-52.
9. Rousselet MC, Saint-Andre JP, Six P, Soret JY. Reproductibilité et valeur pronostique des grades histologiques de Gleason et de Gaeta dans les carcinomes de la prostate. Ann Urol 1986;20:317-22.
10. de las Morenas A, Siroky MB, Merriam J, Stilmant MM. Prostatic adenocarcinoma: Reproducibility and correlation with clinical stages of four grading systems. Hum Pathol 1991;19:595-7.
11. di Loreto C, Fitzpatrick B, Underhill S, Kim DH, Dytch HE, Galera-Davidson H. Correlation between visual clues, objective histologic features, and interobserver agreement in prostate cancer. Am J Clin Pathol 1991;96:70-5.
12. Ozdamar SO, Sarikaya S, Yildiz L, Atilla MK, Kandemir B, Yildiz S. Intraobserver and interobserver reproducibility of WHO and Gleason histologic grading systems in prostatic adenocarcinoma. Int Urol Nephrol 1996;28:73-7.
13. McLean M, Srigley J, Banerjee D, Warde P, Hao Y. Interobserver variation in prostate cancer scoring: Are there implications for the design of clinical trials and treatment strategies? Clin Oncol 1997;9:222-5.
14. Lessells AM, Burnett RA, Howatson SR, Lang S, Lee FD, McLaren KM, et al. Observer variability in the histopathological reporting of needle biopsy specimens of the prostate. Hum Pathol 1997;28:646-9.
15. Svanholm H, Mygind H. Prostatic carcinoma: Reproducibility of histologic grading. Acta Pathol Microbiol Immunol Scand 1985;93:67-71.
16. Kronz JD, Silberman MA, Allsbrook WC Jr, Epstein JI. A web-based tutorial improves practicing pathologists' Gleason grading of images of prostate carcinoma specimens obtained by needle biopsy: Validation of a new medical education paradigm. Cancer 2000;89:1818-23.
17. Epstein JI. Gleason score 2-4 adenocarcinoma of the prostate on needle biopsy: A diagnosis that should not be made. Am J Surg Pathol 2000;24:477-8.
18. Erickson RA, Chang A, Johnson CE, Gruppen LD. Lecture versus web tutorial for pharmacy students' learning of MDI technique. Ann Pharmacother 2003;37:500-5.
19. Barnes K, Itzkowitz S, Brown K. Teaching clinical management skills for genetic testing of hereditary nonpolyposis. Genet Med 2003;5:43-8.
20. Li Y, Brodlie K, Phillips N. Web-based VR training simulator for percutaneous rhizotomy. Stud Health Technol Inform 2000;70:175-81.
21. Yuan Q, Chang AR, Ng HK. Introduction of the Bethesda system to mainland China with a web-based tutorial. Acta Cytol 2003;47:415-20.