I'll try to deal with each question in turn:

(a) The proportion of children aged between 6 and 59 months is usually taken to be 20% (this is usually a slight overestimate). In the example of a total population of 87,000 this would give about:

87000 * 0.2 = 17400 children aged between 6 and 59 months.

Assuming ages are spread uniformly amongst these children, we can expect about:

((23 - 6) / (59 - 6)) * 100 = 32.1%

to be aged between 6 and 23 months. So the proportion of the population aged between 6 and 23 months will be about:

((23 - 6) / (59 - 6)) * 0.2 = 0.064 = 6.4%

(this is the same as 32.1% * 20%). In the example the total population is 87,000, so the population aged between 6 and 23 months is about:

87000 * 0.064 = 5568

(b) Using GNU sampsize with the parameters in the example but changing the population size:

Population = 1000, n = 146
Population = 2000, n = 158
Population = 3000, n = 162
Population = 4000, n = 164
Population = 5000, n = 166
Population = 7500, n = 167
Population = 10000, n = 168
Population = 15000, n = 169
Population = 20000, n = 170
Population = 50000, n = 171

A population of 50,000 can be considered "infinite". So the answer to your question is that with large populations there is little difference.

It might make a difference for narrower age-bands. For example, children aged 0 - 6 months might make up only 2% of the population. In our example of a population of 87,000 this would be about:

87000 * 0.02 = 1740

and the required sample size would be 156 (that's a saving of about 9% on 171).

It also makes a difference with small proportions and high precision. I used to do a lot of work in ophthalmic epidemiology. In low-vision and blindness surveys we have a small population (those aged over 50) and rare conditions (say 1%). It makes no sense to estimate 1% with a precision of +/- 10%; it is better to estimate 1% +/- 0.25%. In a population of 100,000 you might have about 5,000 people aged over 50 years.
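The arithmetic in (a) and the finite population correction behind the figures in (b) can be sketched in a few lines of Python. This is not GNU sampsize itself; it is a minimal re-implementation of the usual simple random sample formula, n0 = z^2 * p * (1 - p) / d^2, with the correction n = N * n0 / (n0 + N - 1), rounded up. The function names are hypothetical, and the example's parameters are not restated in this answer: p = 50% and d = +/- 7.5% at 95% confidence (z = 1.96) are an assumption, chosen because they reproduce the table above.

```python
from math import ceil

Z_95 = 1.96  # two-sided 95% confidence multiplier (assumed)


def age_band_fraction(lo, hi, band_lo=6, band_hi=59):
    """Share of the 6-59 month group aged lo..hi months, assuming ages are
    spread uniformly across the band (the simplification used in (a))."""
    return (hi - lo) / (band_hi - band_lo)


def sample_size(p, d, population=None, z=Z_95):
    """Simple random sample size to estimate proportion p to within +/- d.

    Without a population size this is the 'infinite population' formula;
    with one, the finite population correction is applied. Rounded up.
    """
    n0 = z ** 2 * p * (1 - p) / d ** 2
    if population is None:
        return ceil(n0)
    return ceil(population * n0 / (n0 + population - 1))


if __name__ == "__main__":
    total = 87000
    share_6_23 = age_band_fraction(6, 23) * 0.20  # about 0.064 of the population
    print(f"6-59 months: {total * 0.20:.0f}")
    print(f"6-23 months: {share_6_23:.3f} of population, ~{total * share_6_23:.0f} children")

    # Assumed example parameters: 50% +/- 7.5% (reproduces the (b) table)
    for N in (1000, 2000, 5000, 20000, 50000):
        print(N, sample_size(0.5, 0.075, N))
    print("0-6 months (N = 1740):", sample_size(0.5, 0.075, 1740))

    # Rare condition: 1% +/- 0.25% among about 5,000 people aged over 50
    print("no correction:", sample_size(0.01, 0.0025))
    print("with correction:", sample_size(0.01, 0.0025, 5000))
```

Because the sketch does not round 0.0642 down to 0.064, it gives about 5,581 children aged 6 - 23 months rather than 5,568; the difference is only rounding.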
Not applying the finite population correction, we would need to sample 6,086 people. Oops! That's a sample of 6,086 people from a population of 5,000! Applying the finite population correction gives n = 2745.

This may seem an extreme example, but looking at levels more commonly encountered might be instructive. Let's look at 10% +/- 3%:

Population = 1000, n = 278
Population = 2000, n = 323
Population = 3000, n = 341
Population = 4000, n = 351
. . . and so on . . .
Population = 50000, n = 382

(c) Yes. You do want to include them. Rather than having a special sample, I'd suggest using the same sampling procedure and:

(1) in households without children aged 6 - 59 months, sampling children aged 0 - 6 months (without collecting anthropometry), and
(2) in households with children aged 6 - 59 months, also sampling children aged 0 - 6 months (without collecting anthropometry on the children aged 0 - 6 months).

(d) This is a problem with multiple indicator surveys. Even indicators that apply to the entire sample will have different sample size requirements. The common practice is to pick a subset of the most important indicators and select a sample size that will estimate them with useful precision. The problem is worse when you have indicators that apply only to subsets of a survey sample, as you can end up with a truly massive sample just to get a sufficient sample size in a single sub-group. There are four solutions that come to mind:

(1) Keep expanding the sample size so that each sub-group in the sample has a sufficiently large sample size. This is not very efficient.
(2) Take top-up or quota samples. I think that this may prove difficult to do in the field if there are many indicators requiring different sub-samples.
(3) Perform a series of smaller but separate surveys.
(4) Use an indicator such as the IYCF index from the 2000 Ethiopian DHS (Arimond and Ruel 2000, Arimond and Ruel 2002), which provides a weighted score (weights depending on age-group) of breastfeeding, dietary diversity, and meal frequency.
Such an index applies to the whole sample (or to a large part of the overall sample of a nutritional anthropometry survey). An advantage of this approach is that the index is a score, and a score can be estimated with good precision using a small sample size.

Shifting approach from estimation to classification might also prove useful. If you are able to set standards (i.e. good situation vs. bad situation), then you can classify with good accuracy using small sample sizes. The textbook example is measles vaccination, where we know that coverage below 50% is very bad (herd immunity is very unlikely to operate even over small areas) and coverage above 80% is good (herd immunity operates well). Classifying vaccine coverage as good or bad against these standards, with very low levels of error, can be done with a sample size below 30 using LQAS techniques. Small-sample classifiers for more than two levels have been developed and, whilst more complicated than LQAS, they usually work well with sample sizes of 50 or less.

This is a long answer and one that I am not sure answers the question. It is a complicated situation. My preference would be to go with sampling the 0 - 6 month olds through the same sampling procedure (as suggested above) together with solution (4).

More specifically, you ask:

"What I basically want is to know is if we do a normal cluster survey (900 children 6-29 months and thinking add 100 aged 0-5.9m as is proportional to the amount in the population and we figure will be ok) will the results that we get mean anything statistically [so we can say X% (CI X - X) receive semi-solid foods] and therefore is it worth doing?"

This is a simple sample size issue. A simple random sample of 96 will estimate a proportion with a 95% CI of about +/- 10% or better (e.g. 50% +/- 10%). If you can live with this level of precision then it will do. You could get away with a smaller sample size if you used a classification technique.

I hope this helps.
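For the classification approach and the n = 96 figure mentioned above, here is a minimal Python sketch of a two-class LQAS design. It assumes a simple decision rule (sample n children and classify an area as "good" if more than d are vaccinated) and searches for the smallest n and d that keep both misclassification risks at or below a chosen limit. The 10% error limits and the function names are assumptions for illustration, not a standard LQAS table; real designs may use different error levels. The last helper checks the normal-approximation 95% CI half-width for a simple random sample of 96.

```python
from math import comb, sqrt


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_design(lower, upper, max_error, max_n=100):
    """Smallest sample size n and decision rule d for a two-class LQAS.

    An area is classified 'good' if more than d of the n sampled children
    are covered. Returns (n, d, alpha, beta), where alpha is the chance a
    'bad' area (coverage = lower) is classified good and beta the chance a
    'good' area (coverage = upper) is classified bad; None if no design
    with n <= max_n keeps both at or below max_error.
    """
    for n in range(1, max_n + 1):
        for d in range(n + 1):
            alpha = 1 - binom_cdf(d, n, lower)  # bad area classified as good
            beta = binom_cdf(d, n, upper)       # good area classified as bad
            if alpha <= max_error and beta <= max_error:
                return n, d, alpha, beta
    return None


def srs_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation 95% CI for p from a simple random sample of n."""
    return z * sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Assumed standards: < 50% coverage is bad, > 80% is good; 10% error limits
    n, d, alpha, beta = lqas_design(0.50, 0.80, 0.10)
    print(f"n = {n}, classify as good if more than {d} vaccinated "
          f"(alpha = {alpha:.3f}, beta = {beta:.3f})")
    print(f"n = 96 at 50%: +/- {srs_half_width(0.5, 96):.3f}")
```

Tightening `max_error` (for example to 0.05) increases the required n, which is the trade-off between classification error and sample size.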