Reproducibility of the STARD checklist: an instrument to assess the quality of
reporting of diagnostic accuracy studies

Smidt, N; Rutjes, A; Van Der Windt, DA; Ostelo, RW; Bossuyt, PM; Reitsma, JB; Bouter, LM; De Vet, HC

doi:10.1186/1471-2288-6-12

BACKGROUND: In January 2003, the STAndards for the Reporting of Diagnostic accuracy studies (STARD) were published in a number of journals to improve the quality of reporting in diagnostic accuracy studies. We designed a study to investigate the inter-assessment reproducibility, and the intra- and inter-observer reproducibility, of the items in the STARD statement.
METHODS: Thirty-two diagnostic accuracy studies published in 2000 in medical journals with an impact factor of at least 4 were included. Two reviewers independently evaluated the quality of reporting of these studies using the 25 items of the STARD statement. A consensus evaluation was obtained by discussing and resolving disagreements between the reviewers. Almost two years later, the same reviewers evaluated the same studies again. For each item, the percentage agreement and Cohen's kappa between the first and second consensus assessments (inter-assessment) were calculated. Intraclass correlation coefficients (ICC) were calculated to evaluate the reliability of the checklist.
RESULTS: The overall inter-assessment agreement for all items of the STARD statement was 85% (Cohen's kappa 0.70) and varied from 63% to 100% for individual items. The largest differences between the two assessments were found for the reporting of the rationale of the reference standard (kappa 0.37), the number of included participants who underwent the tests (kappa 0.28), the distribution of the severity of the disease (kappa 0.23), a cross tabulation of the results of the index test by the results of the reference standard (kappa 0.33), and how indeterminate results, missing data and outliers were handled (kappa 0.25). Large differences were also observed within and between reviewers for these items. The inter-assessment reliability of the STARD checklist was satisfactory (ICC = 0.79 [95% CI: 0.62 to 0.89]).
CONCLUSION: Although the overall reproducibility of the assessment of reporting quality in diagnostic accuracy studies using the STARD statement was found to be good, substantial disagreements were found for specific items. These disagreements were caused less by differences in how the reviewers interpreted the items than by difficulties in assessing the reporting of these items due to a lack of clarity within the articles. Including a flow diagram in all reports on diagnostic accuracy studies would be very helpful in reducing confusion among readers and reviewers.
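
The two per-item agreement statistics named in the METHODS (percentage agreement and Cohen's kappa) can be illustrated with a minimal sketch. The function names and the eight yes/no ratings below are hypothetical and chosen only for illustration; they are not data from the study, and the authors' actual computation may have differed.

```python
from collections import Counter


def percent_agreement(first, second):
    """Fraction of studies on which two assessments give the same rating."""
    if len(first) != len(second) or not first:
        raise ValueError("assessments must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def cohens_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(first)
    p_observed = percent_agreement(first, second)
    counts_first = Counter(first)
    counts_second = Counter(second)
    # Chance agreement: product of the marginal proportions, summed over categories.
    categories = set(counts_first) | set(counts_second)
    p_expected = sum(counts_first[c] * counts_second[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both assessments used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of one checklist item across eight studies
# (1 = item reported, 0 = item not reported); NOT data from the study.
first_assessment = [1, 1, 0, 1, 0, 1, 1, 0]
second_assessment = [1, 0, 0, 1, 0, 1, 1, 1]

print(f"agreement = {percent_agreement(first_assessment, second_assessment):.2f}")
print(f"kappa     = {cohens_kappa(first_assessment, second_assessment):.2f}")
```

In this made-up example the two assessments agree on 6 of 8 studies (75%), yet kappa is lower (about 0.47) because part of that agreement would be expected by chance. The same effect explains how an item in the abstract can show moderate percentage agreement but a low kappa.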

Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies / Smidt, N., Rutjes, A., Van Der Windt, D.A., Ostelo, R.W., Bossuyt, P.M., Reitsma, J.B., Bouter, L.M., De Vet, H.C. - In: BMC MEDICAL RESEARCH METHODOLOGY. - ISSN 1471-2288. - 6:(2006), pp. N/A-N/A. [10.1186/1471-2288-6-12]

Publication year: 2006
Journal: BMC MEDICAL RESEARCH METHODOLOGY
Volume: 6
First page: N/A
Last page: N/A
DOI: https://dx.doi.org/10.1186/1471-2288-6-12
Scopus ID: 2-s2.0-33746426274
PubMed ID: 16539705
Type: Journal article

Files in this item:

File	Size	Format
2006_Smidt_Rutjes_BMC-STARD-Reprod.pdf (open access; type: VOR, version published by the publisher; license: Creative Commons)	296.89 kB	Adobe PDF