Introduction
A number of researchers have considered the benefits and limitations of Computer Aided Learning (CAL) and its effects on the educational community. CAL has been compared with traditional human-teacher-based methods to investigate its effectiveness and has been observed to perform better in many applications (Kaplan & Rock, 1995). In other cases, the low performance of CAL can be attributed to poor interface design (Hazari & Reaves, 1994; Wong, 1994), less flexibility than human teachers, or poor and inappropriate evaluation of CAL packages (Murray, 1993; Shute & Regian, 1993; Duncan, 1993; Alexander & Hedberg, 1994). It is therefore important to develop benchmarks for assessing the suitability of CAL packages in the actual learning environment.
Heller (1991) noted that instructional software, like all other educational material, should be evaluated before it is used in the classroom or research laboratory. The challenge is to decide what to evaluate, who should carry out the process and how it should be carried out. The literature suggests that the evaluation of a tutoring system needs to be carried out in two stages (Wyatt & Spiegelhalter, 1990; Murray, 1993; Legree et al., 1995). Initially, the system should be evaluated for its overall effectiveness and usability. Such evaluations play an important role in informing subsequent modifications of procedures and interface design. When a system meets the objectives of the initial evaluation stage, the efficacy of its components should be determined in the real environment. This paper presents some of the analysis and findings of a multi-institutional evaluation study to investigate the efficacy of intelligent tutoring systems designed for numeric disciplines. The study was conducted as part of the formative evaluations carried out towards the end of the software development cycle.
The evaluation of Intelligent Tutoring Systems in numeric disciplines has not received much attention in the literature. Although there are some instances of small-scale evaluations that have been completed within a single institution, little work has been reported on large-scale evaluations conducted across several institutions. This paper is concerned with the findings of research involving a multi-institutional evaluation of the effectiveness of tutoring packages as an alternative to human-led tutorials. It employs a mainly quantitative approach, as favoured by various researchers for initial investigations (for example, Legree et al., 1993; Murray, 1993; Mark & Greer, 1993), although the subjective views of the students towards the functionality and effectiveness of these packages have also been recorded. The evaluation is based on three packages used for teaching different techniques in management accounting. Although the evaluation studies were conducted under laboratory-based controlled testing conditions and may not provide a fully accurate picture of how the students would behave in a real teaching environment, the multi-institutional nature of this study brings it close to a field trial, with a sample size exceeding that required for a power level of 0.95 (Altman, 1991). This is sufficient to allow firm conclusions to be drawn about the efficacy of the tutoring system, at least within the scope of the testing conditions. In addition, an independent study by Stoner & Harvey (1999), described later in this paper, validates the effectiveness of the tutoring packages in a real environment.
Byzantium model of CILE and Intelligent Tutoring Tools
Although computers are being used at all levels of the curriculum, introductory topics are becoming increasingly popular targets for CAL. One explanation may be the simple and relatively discrete nature of the concepts acquired at the introductory level, which are inter-linked at later stages of study to solve more complex problems. To recognise that students construct knowledge of different degrees of complexity at different stages of their learning, a model of Computer Integrated Learning Environments (CILE) was formulated by a consortium of six universities under the Byzantium project, funded through the Teaching and Learning Technology Programme (TLTP) of the Higher Education Funding Councils of the United Kingdom. This model, which proposes that the level at which a discipline is taught and learnt provides a vital context for tutoring software design, divides the learning of the subject discipline into three distinct knowledge levels:
At the introductory application level, a student forms mental maps of various conceptual objects, each consisting of a small network of interrelated conceptual atoms, and learns how to use the basic tools of a subject discipline. The basic tutoring package or an Intelligent Tutoring Tool (ITT) is designed to suit this level.
At the advanced application level, the vertical and horizontal integration of conceptual objects takes place. Vertical integration involves a comparison of the results of multiple use of the same tool, e.g. by comparing the Net Present Value of three projects. Horizontal integration employs multiple tools to solve a given problem, e.g. using the Budgeting, Absorption Costing and Job Costing ITTs to calculate a job cost. The individual ITTs can be used for various sub-tasks but an intelligent application providing a suitable interface for (i) holding and comparing the results of multiple instances of an ITT and (ii) linking various ITTs will be able to guide a student through a more complex task.
The actual application approximation level attempts to simulate a simplification of real world problems. Here the students learn how to account for behavioural and environmental factors. Tutoring software at this level requires the ability to handle qualitative, probabilistic and imprecise data.
The current research output is focused on the development of the first-level packages and their evaluation. It is recognised, however, that the on-going developments in the fields of the Internet, fuzzy logic and natural language processing may greatly assist work at the subsequent levels by respectively providing: (i) an infrastructure not only for distributing development efforts but also for linking the outputs of such distributed efforts (Patel & Kinshuk, 1997); (ii) the processing of imprecise and possibly qualitative data; and (iii) a more natural student-computer interaction interface that removes much of the effort in encoding data to suit computer processing and thus lifts current limitations on the range of activities that can be performed on a computer with ease.
The Intelligent Tutoring Tools (ITTs) are aimed at extending a lecturer's scope by horizontally partitioning some of the teaching activities, e.g. supervising the development of operational skills, and assigning them to a tutoring package. Although the accounting domain has been used to develop these ITTs, the structure of an ITT is largely domain independent and the same structure can be used for any numeric discipline. The structure and use of the ITTs have been discussed in Patel & Kinshuk (1996a and 1996b) and Patel, Kinshuk & Russell (2000) respectively.
Evaluation of ITTs
The evaluation stage of the ITT design commenced in May 1995, when the students at one university in the United Kingdom studied Capital Investment Appraisal in a two-group parallel trial. The Control group had classroom-based tutorials led by an experienced teacher, whereas the CAL group was exposed to the CAL package in computer-laboratory-based tutorials. Group comparison with the help of pre- and post-tests provided the initial validation of the effectiveness of the ITT, whereas the observations and subjective questionnaire feedback from the CAL group validated the interface design adopted in the ITT. The study also provided a validation of the measurement techniques and questionnaire design adopted. A Phase II study was carried out after some design changes had been incorporated as a result of the Phase I study. It was conducted at six UK institutions and utilised three CAL packages: the Capital Investment Appraisal, Absorption Costing and Marginal Costing packages were used by different groups of students. A two-group study was organised at two universities on the Capital Investment Appraisal package. At the other institutions, all three ITTs were tested on a random sample of about 40 students, as it was not feasible to test all students there.
Since the aim of the evaluation in this study was to examine the overall effectiveness of the tutoring packages, mainly quantitative methods were used, as suggested by various researchers (Legree et al., 1993; Murray, 1993; Mark & Greer, 1993), although qualitative views were also obtained from comments recorded during student observation and through a subjective questionnaire. Subject-based evaluation methods were used in the study, as they are widely employed and favoured for the evaluation of CAL packages (Daroca, 1986; Simpson, 1986; Gallagher & Letza, 1991; Tonge et al., 1994; Iqbal et al., 1999). These are based directly on the user's judgement, and the process of data collection is facilitated under laboratory conditions with less chance of bias. Two-group trial studies, the most common technique for the evaluation of CAL packages (Webb et al., 1991; Simons & De Jong, 1992; Wang & Sleeman, 1993; Ruf et al., 1994; Forrester, 1995; Magnuson-Martinson, 1995), were adopted for assessing the effectiveness of the packages.
Two types of subject-based evaluation techniques were used: questionnaires and observations. Since the questionnaires contained both structured and open-ended questions, a large amount of specific information could be elicited quickly and easily. Users were also free to provide detailed opinions about the packages in the open-ended questions. Students were observed by one of the authors and a staff member at the various institutions. The information collected through both these techniques provided a valuable understanding of the students' feelings towards the navigational procedures, screen layouts and other matters related to human-computer interaction. Student observation was employed as a supplementary technique to augment the information obtained through the questionnaire. It was also used to capture the initial reactions of students that may not be conveyed in a questionnaire completed at the end of a session, when the initial problems may have been forgotten due to increased confidence in operating the software.
The main objective of the research was to determine if the Byzantium project ITTs are an effective alternative to the resource-intensive human-tutor-led tutorials for introductory numeric disciplines.
Research questions
The research questions addressed for statistical analysis in the study were as follows:
Are the gains in students’ procedural knowledge of a numeric discipline, as obtained through the tutoring packages, comparable to human-led tutoring?
Are the gains in students’ knowledge consistent across different packages?
Are the gains in students’ knowledge consistent across different institutions?
What are the views of the students towards the design of the interface, classified according to the following factors:
Gender
Previous computer training
Confidence in operating computers
Enjoyment in using computers
Are there any differences between the performance of students who did not have any previous computer training, who did not have confidence in operating computers or who did not enjoy using computers, and that of students who had these attributes?
Questionnaires
The questionnaires employed in the study consisted of: (i) Pre- and Post-Test Questionnaires for all three packages; (ii) a Learning Style Questionnaire; and (iii) the Subjective Questionnaire. Since the students had a mixed background of subjects studied at secondary school level, the Pre- and Post-Test Questionnaires were essential for eliminating any bias due to previous exposure to the subject matter and were designed to assess the improvements in student knowledge following the use of each package. The Subjective Questionnaire was divided into three parts. The first part collected biographical data about the users and the second part related to their experience with general computing. The information obtained in these two parts provides the basis for dividing the students into various subgroups according to their background for the purpose of analysis. The third part of the questionnaire related to the subjective assessment of the tutoring system. It contained 113 closed-ended statements and one three-part open-ended question. Of the closed-ended statements, 44% were worded in favour of the packages and 56% against them, to keep the questionnaire balanced. All statements had a five-point agree-disagree Likert scale to facilitate easy and reliable analysis.
Sample size determination
The adequacy of the sample size is based on the standardised difference, which is the ratio of the difference of interest to the standard deviation of the observations. In the comparative study of the two teaching methods for the introductory subjects, the difference in the means of the gains obtained by students was used as the basis for comparison. A real difference of 10% between the means of gains was taken as representing an important difference between the performance of the two teaching methods. The standard deviation for the Phase II study varied between 7.4 and 14.7. Taking the maximum standard deviation of 14.7, the standardised difference is therefore 10/14.7 = 0.68. According to Altman (1991), a power level of 0.95 is achieved with a total sample size of 110 at a significance level of 0.05. This power is large enough to draw firm conclusions. The total sample sizes for all packages in the study were well above 110 students.
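The arithmetic behind this paragraph can be sketched as follows. This is only an illustration: it assumes the standard normal-approximation formula for comparing two means with equal group sizes, whereas the figure of 110 quoted above is taken from Altman (1991). The function names are hypothetical and do not come from the study.

```python
from math import ceil
from statistics import NormalDist


def standardised_difference(difference, sd):
    """Ratio of the difference of interest to the standard deviation."""
    return difference / sd


def total_sample_size(d, alpha=0.05, power=0.95):
    """Approximate total N for a two-group comparison of means.

    Assumption: two-sided test, equal group sizes, normal approximation
    n_per_group = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2
    return 2 * ceil(per_group)  # round each group up to a whole student


# Values stated in the text: a 10% difference and a maximum SD of 14.7.
d = standardised_difference(10, 14.7)
print(round(d, 2))           # 0.68
print(total_sample_size(d))  # 114
```

Under these assumptions the approximation gives a total of about 114, close to the 110 quoted from Altman (1991); a small gap of this kind is plausible when one value comes from a formula and the other from a published chart or table.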
Statistical analysis
Initially, the gains were obtained for the institutions where the two-group trial studies were carried out. Two-way ANOVA was applied to the data to investigate whether the gain in student knowledge was consistent for both teaching methods. The interaction between modes of instruction and centres was also analysed, and the Least Significant Difference method was employed to investigate which centres had significantly different results (see Altman, 1991). The consistency among the gains obtained by the students was also investigated by two-way ANOVA for different packages at various centres. To ascertain the students’ views about the packages, the subjective questionnaire data was analysed. The questionnaires were grouped according to centres and packages and, since the data was categorical, the Mantel-Haenszel chi-square test was used for the analysis.
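As an illustration of the two-way ANOVA described above, the following minimal sketch partitions the variation in gain scores into a mode-of-instruction effect, a centre effect, their interaction and residual error. It assumes a balanced design (equal numbers of students per cell), which the study itself may not have had. The function name and the gain scores are invented purely for demonstration and are not data from the evaluation.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (method, centre) -> list of gain scores; every cell must
    contain the same number of observations (at least two).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    methods = sorted({m for m, _ in cells})
    centres = sorted({c for _, c in cells})
    a, b = len(methods), len(centres)
    n = len(next(iter(cells.values())))  # observations per cell
    if any(len(cells.get((m, c), [])) != n for m, c in product(methods, centres)):
        raise ValueError("this sketch handles balanced designs only")

    grand = mean(x for obs in cells.values() for x in obs)
    m_mean = {m: mean(x for c in centres for x in cells[(m, c)]) for m in methods}
    c_mean = {c: mean(x for m in methods for x in cells[(m, c)]) for c in centres}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    ss = {
        "method": b * n * sum((m_mean[m] - grand) ** 2 for m in methods),
        "centre": a * n * sum((c_mean[c] - grand) ** 2 for c in centres),
        "interaction": n * sum(
            (cell_mean[(m, c)] - m_mean[m] - c_mean[c] + grand) ** 2
            for m, c in product(methods, centres)
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {
        "method": a - 1,
        "centre": b - 1,
        "interaction": (a - 1) * (b - 1),
        "error": a * b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("method", "centre", "interaction")}
    return ss, df, f


# Hypothetical gain scores (percentage points), NOT data from the study.
toy_gains = {
    ("CAL", "A"): [20, 30, 25],
    ("CAL", "B"): [35, 25, 30],
    ("Tutor", "A"): [15, 25, 20],
    ("Tutor", "B"): [30, 20, 25],
}
ss, df, f = two_way_anova(toy_gains)
print(f)
```

Each F ratio would then be compared with the critical value of the F distribution for the corresponding degrees of freedom. A large interaction F would indicate that the difference between modes of instruction varies from centre to centre.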
Analysis of the evaluation data
The evaluation took place at six universities in the United Kingdom. Four of the six (universities A, B, D and E in Table 1) were new universities (formerly polytechnics). The other two (universities C and F) were traditional universities. One new university (university E) used the packages in its open learning programmes, whereas the other universities used the packages in general tutorial settings.
Table 1 lists the number of students who participated in the evaluation at various universities.
References
Charnitski, C. W., & Harvey, F. A. Integrating Science and Mathematics Curricula Using Computer Mediated Communications: A Vygotskian Perspective. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 436 144)
Coughlin, E. (1999). Professional competencies for the digital age classroom. Learning and Leading with Technology, 27(3), 22-27.
Eckman, J. (1996). "Don't believe the hype": Electronic textuality and the composition classroom. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 402 605)
Ertmer, P. A., & Russell, J. D. (1995). Using case studies to enhance instructional design education. Educational Technology, 35(4), 23-31.
Eydburn, D. L., & Gardner, J. E. (1999). Integrating technology into special education teacher preparation programs: Creating shared visions. Journal of Special Education Technology, 14(2), 3-20.
Hardy, J. V. (1998). Teacher attitudes toward and knowledge of computer technology. Computers in the Schools, 14(3-4).
Heflich, D. A. (1996). The impact of online technology on teaching and learning: Attitudes and ideas of educators in the field. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 403 872)
Hoffman, B. (1997). Integrating technology into schools. Educational Technology, 62(5), 51-55.
Jacobsen, D. M. (1998). Adoption patterns of faculty who integrate computer technology for teaching and learning in higher education. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 428 675)
Jamieson, M., Kajs, R., & Agee, A. (1996). Computer-assisted techniques to enhance transformative learning in first-year literature courses. Computers and the Humanities, 30, 157-164.
Marsh, G. E., II. (2000). AIL 601 Instructional Technology (IT). Retrieved June, 2001, from University of Alabama, Theories of Learning Applied to Technological Instruction Web site: http://www.bamaed.ua.edu/ail601/instructional_technology.htm
Mergel, B. (1998). Instructional Design & Learning Theory. Graduate School: University of Saskatchewan, Educational Communications and Technology.
Mergendoller, J. R. (2000). Technology and learning: A critical assessment. Principal (Reston, Va.), 79(3), 5-9.
National Research Council. (2000). How People Learn: Brain, Mind, Experience, and School (Expanded Edition). Washington, D.C.: National Academy Press.
Northrup, P. T., & Little, W. (1996). Establishing instructional technology benchmarks for teacher preparation programs. Journal of Teacher Education, 47(3), 213.
Novek, E. M. (1996). Do professors dream of electronic sheep? Academic anxiety about the information age. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 399 594)
Painter, D. D. (2000). Teacher as researcher: A means to assess the effectiveness of technology in the classroom. Learning and Leading with Technology, 27(7), 10-13.
Santrock, J. W. (1998). Adolescence (7th ed.). New York: McGraw-Hill.
Saye, J. W. (1998). Technology in the classroom: The role of dispositions in teacher gatekeeping. Journal of Curriculum & Supervision, 13(3), 210.
Schrum, L. (1999). Technology professional development for teachers. Educational Technology Research and Development, 47(4), 83-90.
Semple, A. (2000). Learning theories and their influence on the development and use of educational technologies. Australian Science Teachers Journal, 46(3), 21.
Soules, A., & Adams, E. (1998). Classroom technology: A view from the trenches. Educom Review, 33(3), 50.
Tichenor, S. (1998). Writing and computer skills: Students need more time! East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 416 482)
Van Alkemade, K. (1996). Questioning the humanist vision of computer technology. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 404 659)
Van Dusen, L. M., & Worthen, B. R. (1995). Can integrated instructional technology transform the classroom? Educational Leadership, 53(2), 28.
Wang, S., & Sleeman, P. J. (1993). Computer-assisted instruction effectiveness: A brief review of the research. International Journal of Instructional Media, 20(4), 333.
White, C., & Walker, T. (1999). Technology, teacher education, and the postmodern: Encouraging the discourse. Action in Teacher Education, 21(3), 45-56.
Winn, W. D. Advantages of a Theory-Based Curriculum in Instructional Technology. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 381 126)
Wiske, M. A. (1988). How technology affects teaching. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 296 706)