Like many fields, assessment has its own specific vocabulary. Below you will find an explanation of many key terms in assessment.
Evaluation, Assessment, and Testing are often used interchangeably in everyday conversation, but each has a particular meaning among assessment professionals and researchers. (Note that in addition to its particular meaning, assessment can also serve as an umbrella term encompassing all three, as when we refer to assessment terminology or assessment professionals.)
Assessment purpose refers to the reason for conducting the assessment and suggests how the information obtained through the assessment process will be used. Shepard (2000) has identified three major categories of assessment purposes: administrative, instructional, and research.
Classroom assessment and external assessment are two ways of categorizing instructional assessments and refer to the agents who design and/or administer the assessment. Classroom assessments are used by individual teachers with their students while external assessments are created by private companies or government agencies and administered on a very large scale (e.g., the TOEFL, the SAT).
Standards are descriptive statements of what learners must know or be able to do in order to demonstrate competence or proficiency at various levels within a domain of study. Standards are intended to represent up-to-date theory and knowledge within the domain and often serve as the basis for educational programs. Assessments are then employed to determine whether the standards are being met.
Alternative or Complementary Assessment are terms that were introduced as various approaches to assessment became more popular. These terms were meant to set these approaches apart from testing, which has traditionally been the dominant form of assessment. Examples of alternative assessments include projects, portfolios, games, debates, interviews, and learner presentations. Alternative assessments are often described as authentic and performance-based.
Authentic assessment is a term associated with alternative assessments because the tasks learners are asked to perform are intended to reflect the demands and contexts of everyday life more closely than the questions that make up traditional tests.
Performance assessment means that learners are asked to demonstrate their language knowledge or abilities in some way other than answering traditional test questions; that is, they are asked to do something using the language. This might involve creating a product or performing a certain language function.
Assessment program reflects the idea that classroom-based assessment is best conceived of as an ongoing process rather than a one-off testing episode. An effective assessment program will make use of multiple assessment instruments and will include a focus on both product (what learners are able to do) and process (how learners orient to assessment tasks and the strategies they employ) throughout the course of study.
Achievement testing/assessment involves assessing what students have learned in a particular course or program of study.
Proficiency testing/assessment refers to the assessment of knowledge or ability within a given domain but not restricted to any course or program (e.g., general or overall language ability).
Summative assessment occurs at the end of the program of study.
Formative assessment occurs during the program of study and is used to inform subsequent teaching and learning.
Rubrics include various dimensions of a task presented as hierarchical descriptors in order to inform assessment decisions. For example, one dimension of a writing task might be the coherence and organization of the piece. A rubric would explain to students what constitutes excellent (good, average, poor, etc.) coherence and organization.
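As a minimal sketch of this structure, the hypothetical example below represents one rubric dimension ("coherence and organization") with an invented descriptor for each level; the wording is illustrative only and is not drawn from any actual rubric.

```python
# A sketch of how a rubric dimension with hierarchical descriptors might be
# represented as data. The descriptors are invented examples for illustration.
rubric = {
    "coherence and organization": {
        "excellent": "Ideas are clearly ordered and linked; the piece is easy to follow throughout.",
        "good": "Ideas are mostly well ordered; occasional lapses in linking do not impede understanding.",
        "average": "Some organization is evident, but the reader must work to follow the line of thought.",
        "poor": "Ideas appear in no clear order, and connections between them are hard to recover.",
    }
}

# A scorer consults the descriptor that best matches a learner's work.
print(rubric["coherence and organization"]["good"])
```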
Rating Scales are similar to rubrics in that they provide criteria for determining the quality of performance.
Validity concerns whether an assessment actually assesses the knowledge or abilities it is intended to assess.
Reliability refers to the consistency of an assessment score or grade. Reliability recognizes that the view of abilities that emerges from an assessment procedure may be influenced by variables other than the abilities themselves. In testing theory, these other variables are considered sources of measurement error because they obscure individuals' true abilities. For this reason, an observed test score is understood to consist of a true score plus error. Assessment procedures typically try to control for or minimize error.
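In classical test theory this decomposition is often written as observed score = true score + error, with reliability understood as the share of observed-score variance attributable to true-score differences. The short sketch below illustrates the idea with simulated data; the score scale and variances are assumptions chosen purely for illustration.

```python
# A minimal sketch of the classical test theory view that an observed score is
# a true score plus measurement error. All numbers are invented for illustration.
import random
import statistics

random.seed(1)

# Hypothetical "true" abilities for 200 test takers, plus random error on this occasion.
true_scores = [random.gauss(70, 10) for _ in range(200)]
observed = [t + random.gauss(0, 5) for t in true_scores]  # observed = true + error

# Reliability viewed as the proportion of observed-score variance
# that is due to true-score differences rather than error.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))  # roughly 0.8 with these assumed variances (100 / 125)
```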
Internal consistency is the extent to which individual test items are thought to measure the same ability.
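One widely used estimate of internal consistency is Cronbach's alpha, which compares the variances of the individual items with the variance of the total score. The sketch below computes alpha from a small invented matrix of right/wrong item scores, included only to make the calculation concrete.

```python
# A minimal sketch of Cronbach's alpha computed from an invented score matrix
# (rows = test takers, columns = items scored 1 for correct, 0 for incorrect).
from statistics import variance

scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
]

k = len(scores[0])                                    # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])    # variance of total scores

# alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```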
Test item analysis helps to determine the quality of individual items on a test. Each test item should reveal some feature of the knowledge or abilities under assessment. Items are described according to their level of difficulty and their discriminating power. This kind of analysis is especially important when an assessment is intended to group or classify test takers.
Level of difficulty of a test item is typically expressed as the proportion of test takers who answered the item correctly. If most or all test takers answered an item correctly, it may be too easy (conversely, an item may be too difficult if few or no test takers got it right).
Discrimination index is calculated for each test item to reveal whether individual items distinguish well between weaker and stronger learners. An item with good discriminating power will tend to be answered correctly by test takers who attain high scores on the test overall but not by those who score poorly.
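Both statistics can be computed directly from a matrix of item responses. The sketch below uses invented right/wrong data and the simple upper-minus-lower form of the discrimination index (the difference in difficulty between the top and bottom scorers); other formulations, such as the point-biserial correlation, are also used in practice.

```python
# A minimal sketch of basic item analysis on an invented set of 0/1 responses.
# Difficulty = proportion answering correctly; discrimination = difference in
# that proportion between the upper and lower halves of test takers by total score.
responses = [
    # one row per test taker, one column per item (1 = correct, 0 = incorrect)
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

n_items = len(responses[0])
totals = [sum(row) for row in responses]

# Difficulty: proportion of all test takers who answered each item correctly.
difficulty = [sum(row[i] for row in responses) / len(responses) for i in range(n_items)]

# Discrimination: compare the top and bottom halves of test takers by total score.
ranked = [row for _, row in sorted(zip(totals, responses), reverse=True)]
half = len(ranked) // 2
upper, lower = ranked[:half], ranked[half:]
discrimination = [
    sum(row[i] for row in upper) / len(upper) - sum(row[i] for row in lower) / len(lower)
    for i in range(n_items)
]

for i in range(n_items):
    print(f"item {i + 1}: difficulty {difficulty[i]:.2f}, discrimination {discrimination[i]:.2f}")
```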