Window on wonder. Schools are more than just testing factories. These children in Ghana don't need telling.

external image images_testing.gif

This blog considers the international assessment programs used to gauge how effective each country's instruction is relative to that of other countries.

International tests are increasingly being used as a yardstick to measure student performance and ascertain the effectiveness of school systems. Countries use international test rankings to understand how much their students know relative to those in other countries, and how that information could be used to improve education and training systems in what is acknowledged to be an ever more competitive global environment. Stakeholders in many countries also recognize a correlation between the knowledge and skills with which young people enter the workforce and the long-term competitiveness of their economies. As such, investments in education are expected to produce social and economic benefits for society.

With this increased attention to international test data come scrutiny and calls for caution. Some have argued that interpretations of such data are oversimplified, frequently exaggerated, and misleading (Carnoy & Rothstein, 2013). Specifically, they note that conclusions based on these tests ignore the complexity and contexts within which the tests are taken, and that it might therefore be imprudent for policymakers to use the results as drivers for education reform.

Below is a brief overview of the most common international tests, concerns raised about the assessments, and implications for practice in developing countries.

Programme for International Student Assessment (PISA)

external image PISA-OECD-189x300.jpg

  • A brainchild of the Organization for Economic Co-operation and Development (OECD).
  • Launched in 1997 to measure 15-year-old students' reading, mathematics, and science literacy.
  • Tests are given every 3 years, with one subject as the main focus of each assessment round.
  • In 2012, over 70 countries and economies – 34 OECD member countries and 36 non-member partners – participated in PISA testing.
  • Data derived from this test are increasingly being used to assess the impact of education quality on incomes and growth, and to understand what causes differences in achievement across nations.

Sample ranking shown below.

external image PISA_OECD_rankings3.jpg

Trends in International Mathematics and Science Study (TIMSS)
external image countries_timss.jpg

  • Launched in 1995 and developed by the International Association for the Evaluation of Educational Achievement (IEA).
  • Lets participating nations compare students' educational achievement across borders.
  • Measures trends in mathematics and science achievement at the fourth and eighth grades.
  • Testing is conducted on a regular 4-year cycle.
  • In the latest round of testing, done in 2011, nationally representative samples of students in 63 countries and 4 benchmarking entities (regional jurisdictions of countries, such as states) participated in TIMSS.
  • Unlike PISA, TIMSS is organized around content (subject-matter) dimensions that students are expected to know by the 4th and 8th grades.
  • Additionally, the test covers a cognitive dimension, which encompasses a range of cognitive processes involved in solving problems throughout the primary and middle school years.

So, how well do PISA and TIMSS compare?

Progress in International Reading Literacy Study (PIRLS)

external image TIMSS_PIRLS_geography.jpg

  • Developed by the IEA; the first assessment was conducted in 2001.
  • PIRLS is conducted on a regular 5-year cycle.
  • Measures trends in reading comprehension at the fourth grade level.
  • Students respond to questions designed to measure their reading comprehension across two overarching categories for reading: reading for literary experience, and reading to acquire and use information.
  • In 2011, PIRLS was expanded to include prePIRLS, a stepping stone to participating in PIRLS and a way to assess reading at the end of the primary school cycle for a range of developing countries. Depending on a country’s educational development, prePIRLS can be given at the fourth, fifth, or sixth grade.
  • Both TIMSS and PIRLS were administered in 2011, when the cycles of the two studies came into alignment.

Regional Assessments for Developing Countries

Regional testing programs have been used to evaluate performance for many countries in Latin America and Sub-Saharan Africa.
Some reasons cited for the growth in popularity of regional testing:
  • The need for regional and international benchmarking
  • Persistently low scores for some participating countries on OECD- and IEA-based tests, which have rendered those data unusable
  • The increasingly popular assumption that students 'have' to be tested and compared to similar cohorts of students in other countries.
  • In some cases comparison tests are introduced as a precondition for education funding (loans & other aid / support) by donor agencies.

Major regional testing bodies in developing countries:
  • Latin American Laboratory for Assessment of Quality in Education (LLECE)
  • Southern and Eastern African Consortium for the Monitoring of Education Quality (SACMEQ)
  • Program for the Analysis of Educational Systems of the CONFEMEN (francophone Africa) countries (PASEC)

external image map-of-learning-assessment-programs1.png?w=1024&h=403

These regional assessments have much in common with the international assessments, but there are several important differences, including:
  • The relatively greater proximity in content between test and curriculum;
  • Normative scales that may or may not be tied to local (normed) skill levels;
  • Attention to local policy concerns (such as the role of the French language in PASEC countries).
The overlap in expertise between the specialists working on the international and regional levels has generally meant that these regional tests are given substantial credibility.

Use of data from testing organizations
International assessments are designed to measure learning across multiple countries. Because they provide rank tables that order countries by achievement score, it has been possible to conduct both cross-national comparisons and within-country analyses of education systems. Generally, participating countries have used these tests to explore educational issues, including:
  • Monitoring system-level achievement trends in a global context,
  • Establishing achievement goals and standards for educational improvement,
  • Stimulating curriculum reform,
  • Improving teaching and learning through research and analysis of the data,
  • Conducting related studies (e.g. monitoring equity or assessing students in additional grades), and
  • Training researchers and teachers in assessment and evaluation

Unintended consequences of the rankings include:
Public debates on a country’s education quality:
In the United States, for example, few events in education have had as galvanizing an effect on public debate as the release of international test scores. Much of the public ire has revolved around the relative fall in ranking for US students and the implications such a fall has on the future economic prospects of the country. The inevitable casualties of the public discourse have been public schools that are deemed to be ineffective in educating the youth.
Policy makers’ clamor for ‘what works’ in highly ranked countries.
As international tests play an ever more significant role in ascertaining the quality of a country’s education system, policy makers are seeking out opportunities to learn from high-performing countries about aspects of education such as teacher quality and effectiveness; curriculum and academic standards; school funding and fiscal equity; governance; and the role and responsibilities of civic institutions in shaping the educational infrastructure.
A driver for significant changes in professional development programs.
Data from surveys indicate that the status and caliber of teachers vary widely across participating countries. To improve test score standings and effectively implement challenging educational reforms, policy makers acknowledge the need to develop high-quality teaching professionals who will assume high levels of responsibility. Testing organizations’ recommendations are being taken to heart by countries seeking to improve the recruitment, retention, and evaluation of high-quality, effective teachers.
Data from international tests has been used to drive research into various aspects of countries’ educational systems.

Issues of comparability
Comparability is at the heart of international assessment. In today’s world of cross-national contacts and interdependence, the quality of what education systems produce, particularly educational achievement, is of particular concern to education stakeholders. The ranking of the United States in TIMSS, PIRLS and PISA has been used as a driving force and rationale for the implementation of educational reforms. Given the role international comparisons play in participating nations’ policy development, it is important that the data used be credible and fair.

Issues include:

Education in any country is a system situated within a cultural context, with its own natural, economic, and social environment and its own history and practices. Arriving at a consensus on scientifically credible aspects for comparison can therefore be a challenge.


The consistency with which a test measures (reliability) and the instrument's ability to capture the target characteristics it was intended to measure (validity) significantly affect the trustworthiness of the results. Questions of validity abound, for example whether a test would be interpreted in the same way by a child in the United States and a child in Jakarta.

Sampling of skills and populations
To gauge the relative effectiveness of school systems at educating their youth, representative samples of students have to be tested. Sampling can be done by age, by grade level, or with a combined stratification strategy that takes into account geographical area (rural vs. urban) and type of school (public vs. private) (LLECE). If the sample is skewed, the results can reflect substantially different combinations of learning experiences between, and even within, countries. For example, an oversampling of students from the most-disadvantaged schools in the U.S. sample of a recent international assessment is assumed to have led to lower average scores.
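
The effect of oversampling can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two strata, their population shares, the sample sizes, and the scores are hypothetical and are not taken from PISA, TIMSS, PIRLS, or any other real assessment. The sketch compares a plain average over the tested students with an average that weights each stratum by its share of the student population.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One sampling stratum (hypothetical values, for illustration only)."""
    name: str
    population_share: float  # fraction of all students who belong to this stratum
    sampled: int             # number of students actually tested
    mean_score: float        # mean test score among the sampled students


def unweighted_mean(strata):
    """Average over tested students, ignoring how the sample was drawn."""
    total_sampled = sum(s.sampled for s in strata)
    return sum(s.sampled * s.mean_score for s in strata) / total_sampled


def weighted_mean(strata):
    """Average that weights each stratum by its share of the population."""
    return sum(s.population_share * s.mean_score for s in strata)


# Hypothetical example: disadvantaged schools hold 25% of students
# but make up 60% of the tested sample (oversampling).
strata = [
    Stratum("disadvantaged schools", 0.25, 600, 450.0),
    Stratum("other schools", 0.75, 400, 510.0),
]

print(f"unweighted mean: {unweighted_mean(strata):.1f}")
print(f"weighted mean:   {weighted_mean(strata):.1f}")
```

With these made-up numbers the unweighted average comes out lower than the population-weighted one, purely because one group is over-represented among the students tested. Real assessments use more elaborate weighting schemes; this sketch only shows the direction of the effect.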

Also related to sampling . . .

These sampling cases also highlight a persistent irony: many of the children most in need of better education are systematically excluded from measurement in international tests. The rationales vary from assessment to assessment, and from one national policy to another, and yet the result is the same. Those least likely to succeed on tests, and those who are most disadvantaged, are the groups most often excluded from the sample population for assessment.


Given that international and regional assessments are carried out on 3- to 5-year cycles, questions inevitably arise over how much data, and of which kind, to collect during the test implementation period. Collecting ‘just enough’ data is easier said than done.

Practical issues

The practicalities of testing at the school level can at times be onerous. International assessments can be lengthy and require careful preparation and substantial control procedures. A testing program can quite easily degenerate into a burdensome project that requires substantial resources from the school, its teachers, and its students. Continued burden inevitably leads to test weariness, which might affect the quality of the responses students give in studies by international organizations. Such issues might have serious consequences for the analysis and interpretation of the data acquired.


There are no magic bullets.

According to PISA, schools and countries where students work in a climate characterized by high expectations and a readiness to invest effort, good teacher-student relations, and high teacher morale tend to achieve better results. Throwing money at education by itself rarely produces results.

Secondly, as McKinsey & Company (2007) aptly noted, “The quality of an education system cannot exceed the quality of its teachers.” Good teachers are essential to high-quality education. Once students fall behind because of ineffective teachers, it can be very difficult for them to catch up, particularly if they face this disadvantage early in primary school. Finding and retaining effective teachers is not necessarily a question of high pay. Instead, teachers need to be treated as the valuable professionals they are.

To improve student outcomes, countries will have to ensure that education reform initiatives no longer rest on prefabricated wisdom but on data and best practice. Whereas countries like the United States operate on a fragmented basis and use “hit and miss” policies to implement change, Finland operates a system that fosters strong approaches to leadership, encourages system-wide networking, and builds lateral accountability.

Finally, a study by Hanushek and Woessmann (2010) found that when cognitive skills, as measured by PISA scores, were correlated with GDP, the impact of total years of schooling became irrelevant. In other words, how long it took to learn was less important than the fact that learning had occurred. Whatever path each individual country takes, there is a need to ensure that linkages exist between the school and the community, and between the school and the economy, so that education has meaning in the context in which it is practiced. Additionally, many of today’s job titles, and the skills needed to fill them, simply did not exist 20 years ago. Education systems need to consider what skills today’s students will need in the future and teach accordingly.

external image international-students.jpg