In statistics, a population is the complete set of data that is to be analysed. A population may consist of people (e.g. those living in a particular city) or other living things (e.g. all humpback whales), but it could be any set of objects with something in common (e.g. all cars travelling on a particular road in a 24-hour period). Usually, it isn’t possible to analyse a complete population. Why?
- It would take too long
- It would be too expensive
- It just isn’t possible to find all the members of a population
So instead we take a sample that is representative of the population and analyse that, using statistical techniques to extrapolate results for the whole population. Taking a sample is quite simple, but how do you make it truly representative? There are various methods.
Simple random sample: In a simple random sample, every member of the population must have an equal chance of being chosen, and all possible samples of a particular size must be equally likely to be selected. But if you consider a population of people, it is quite hard to ensure each person has an equal chance of being chosen. You can’t randomly choose numbers from a telephone book because lots of people aren’t in one. You can’t randomly choose people on the street because you would miss people who were at work. One way of creating a genuinely random sample is to allocate a number to each member of the population, and then use a random number generator (or just pick numbers out of a box!) to create your sample.
Simple random samples aren’t very useful for large, geographically spread populations, but work for a smaller group, such as all the pupils in a school.
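The numbering approach described above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the school of 900 pupils and the sample size of 45 are assumptions made for the example, and Python's standard `random` module stands in for the random number generator.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size members without replacement.

    random.sample gives every member the same chance of selection and
    makes every possible sample of this size equally likely.
    """
    rng = random.Random(seed)  # seed is only there to make the example repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: pupils numbered 1 to 900.
pupils = list(range(1, 901))
sample = simple_random_sample(pupils, 45, seed=1)
print(len(sample))  # 45
```

Leaving `seed` as `None` gives a different sample on each run, which is what you would normally want outside a demonstration.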
Systematic sample: A systematic sample works if you can create an ordered list of the population. You can then select, say, every tenth item in the list. If, for example, you have a population of 4,000 people and you want a sample size of 50, you make a list of all the people and then select every eightieth name (4,000 ÷ 50 = 80). But there is a possibility of introducing bias if the list has a repeating pattern that matches the sampling interval. Suppose you want to carry out some analysis of monthly rainfall data over 100 years; if you choose every twelfth value, you would be getting data for the same month in successive years.
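A minimal sketch of systematic sampling, using the 4,000-person example and the rainfall example from the paragraph above. The function name `systematic_sample` is hypothetical, the random starting position is an assumption (the text does not say where in the list to begin), and the years 1900 to 1999 are invented purely to give the rainfall list 1,200 entries.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Select every k-th item, where k = len(ordered_list) // sample_size."""
    interval = len(ordered_list) // sample_size
    # Assumption: start at a random position within the first interval
    # rather than always at the first item.
    start = random.Random(seed).randrange(interval)
    return ordered_list[start::interval][:sample_size]

people = [f"person_{i}" for i in range(1, 4001)]
chosen = systematic_sample(people, 50, seed=0)
print(len(chosen))  # 50 names, spaced 80 apart in the list

# Monthly rainfall over 100 years: an interval of 12 always lands on the same month.
readings = [(year, month) for year in range(1900, 2000) for month in range(1, 13)]
picked = systematic_sample(readings, 100, seed=0)
months_hit = {month for _, month in picked}
print(len(months_hit))  # 1
```

The second half shows the bias in action: all 100 selected readings come from a single calendar month, so the sample says nothing about the other eleven.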
Stratified sample: It may be important to analyse data involving sub-groupings of a population. For example, when carrying out a survey of attitudes amongst pupils in a school, you may want to see separate results for boys and girls. If the school has 420 boys and 480 girls (900 pupils in total), and you want a sample size of 45, then the sample should keep the same proportions: 21 boys and 24 girls. The selection process could then be simple random or systematic within the groups. A stratified sample can be used for as many population sub-groups as required. In our example, you could further subdivide the sample into year groups.
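The proportional split in the school example can be worked through in code. This is a sketch under stated assumptions: `stratified_sample` is a hypothetical helper, simple random selection is used within each group (systematic selection would work equally well), and the rounding step happens to be exact for these numbers but would need more care when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Sample each group in proportion to its share of the population.

    strata maps a group name to the list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may leave the overall total
        # slightly off when the shares are not whole numbers.
        share = round(sample_size * len(members) / total)
        result[name] = rng.sample(members, share)
    return result

school = {
    "boys": [f"boy_{i}" for i in range(1, 421)],    # 420 boys
    "girls": [f"girl_{i}" for i in range(1, 481)],  # 480 girls
}
survey = stratified_sample(school, 45, seed=0)
print({name: len(chosen) for name, chosen in survey.items()})
# {'boys': 21, 'girls': 24}
```

To subdivide further by year group, the same function could be called with one entry per combination, such as a hypothetical "year 7 boys" group.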
Quota sample: You may have come across a researcher in a street asking selected people to answer some questions. The researcher is using a quota sample, usually selecting people with specific characteristics, until a quota (perhaps 30 interviewees) has been reached. The disadvantage of this method is that the samples will not be random, and there is also a possibility of selection bias: the researcher may favour people who look more approachable, for example. On the other hand, quota sampling is cheap, easy and quick.
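As a rough illustration of how a quota fills up, the sketch below takes people in the order they happen to walk past and accepts them until each quota is met. Everything here is hypothetical: the function name `quota_sample`, the two age groups and the split of 15 per group are invented for the example, with only the overall quota of 30 taken from the text.

```python
def quota_sample(passers_by, quotas):
    """Accept people in arrival order until every group's quota is filled.

    passers_by yields (name, group) pairs; quotas maps group -> number wanted.
    Nothing here is random: who gets picked depends entirely on who turns up first.
    """
    remaining = dict(quotas)
    selected = []
    for name, group in passers_by:
        if remaining.get(group, 0) > 0:
            selected.append((name, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas reached
    return selected

# Hypothetical stream of 100 passers-by, a third of them in the "30_plus" group.
stream = [(f"person_{i}", "30_plus" if i % 3 == 0 else "under_30") for i in range(100)]
interviewees = quota_sample(stream, {"under_30": 15, "30_plus": 15})
print(len(interviewees))  # 30
```

The code makes the weakness visible: anyone arriving after the quotas are full has no chance of being selected.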
Cluster sample: A cluster is a sub-group of the population which, as far as possible, contains all the characteristics of the whole population. Once the population has been divided into clusters, one or more clusters are then fully analysed. The disadvantage again is that the resulting sample is not random. However, cluster sampling is particularly useful in a geographic context. For example, suppose you want to analyse academic results of all high school pupils in a country. It would be too expensive to take a selection from every school. Instead, divide the country up into cities (clusters) and then choose one or more clusters by simple random or systematic sampling. Then, within a city, the schools form further clusters: you can either analyse the results of all pupils in a city or select a sample of schools in your chosen cities.
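The two-level version of the schools example (choose cities, then choose schools within them) might be sketched as follows. The function name `two_stage_cluster_sample`, the five cities and the ten schools per city are all hypothetical, and simple random selection is assumed at both levels.

```python
import random

def two_stage_cluster_sample(cities, n_cities, n_schools, seed=None):
    """Pick n_cities clusters at random, then up to n_schools schools in each.

    cities maps a city name to the list of schools in that city.
    """
    rng = random.Random(seed)
    chosen_cities = rng.sample(sorted(cities), n_cities)
    return {
        city: rng.sample(cities[city], min(n_schools, len(cities[city])))
        for city in chosen_cities
    }

# Hypothetical country: five cities with ten schools each.
country = {
    f"city_{c}": [f"city_{c}_school_{s}" for s in range(1, 11)]
    for c in "ABCDE"
}
selection = two_stage_cluster_sample(country, n_cities=2, n_schools=3, seed=0)
print(len(selection))  # 2 cities, each with 3 schools
```

Setting `n_schools` to at least the number of schools in a city corresponds to the other option in the text, where every school in the chosen cities is analysed.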