Data sampling criteria. Sample
Sampling is ... Definition, types, methods and results of sampling
By Masterweb
09.04.2018 16:00
It is often necessary to analyze a particular social phenomenon and obtain information about it. Such tasks arise frequently in statistics and statistical research. Examining a social phenomenon in full is often impossible. For example, how can one find out the opinion of the whole population, or of all residents of a certain city, on some issue? Asking absolutely everyone is laborious and practically impossible. In such cases a sample is needed. Almost all research and analysis rests on this concept.
What is a sample
When analyzing a particular social phenomenon, it is necessary to obtain information about it. In almost any study, not every unit of the population under study is examined and analyzed; only a certain part of the whole population is taken into account. This process is sampling: only certain units from the set are examined.
Of course, much depends on the type of sample, but there are basic rules. The main one is that selection from the population must be random: units should not be chosen according to any criterion. Roughly speaking, if a sample is to be drawn from the population of a certain city and only men are selected, the study will contain an error, because the selection was made by gender rather than at random. Almost all sampling methods are based on this rule.
Sampling rules
In order for the selected set to reflect the main qualities of the whole phenomenon, it must be built according to specific laws, where the main attention should be paid to the following categories:
- sample (sample population);
- general population;
- representativeness;
- representativeness error;
- population unit;
- sampling methods.
The features of selective observation and sampling are as follows:
- All results rest on mathematical laws and rules: if the study is conducted correctly and the calculations are correct, the results are not distorted by subjective factors.
- It yields a result much faster and with fewer resources, since only part of the array of events is studied rather than all of it.
- It can be used to study various objects, from specific questions such as the age or gender of a group of interest to public opinion or the level of material well-being of the population.
Selective observation
Selective observation is statistical observation in which not the entire set under study is examined but only a part of it, selected in a certain way, and the results of studying this part are extended to the entire set. This part is called the sample population (the sample). It is the only way to study a large array of objects.
Selective observation is needed when only a small group of units can realistically be examined. For example, it will be used when studying the ratio of men to women in the world: for obvious reasons, it is impossible to count every inhabitant of our planet.
But if the same question concerns not all the inhabitants of the earth but a particular class, say class 2 "A" in a particular school in a certain city, selective observation can be dispensed with. It is quite possible to examine the entire object of study: count the boys and girls in the class, and that gives the ratio.
Sample and population
It is actually not as difficult as it sounds. Any object of study involves two sets: the general population and the sample population. All units belong to the general population; the sample consists of those units of the general population that were selected for study. If everything is done correctly, the selected part is a scaled-down model of the entire (general) population.
The general population comes in two varieties, definite and indefinite, depending on whether the total number of its units is known. With a definite population, sampling is easier because it is known what percentage of the total number of units is sampled.
This point matters in research. Suppose it is necessary to find the percentage of low-quality confectionery products at a particular plant, and the population is already defined: the enterprise is known to produce 1000 confectionery products per year. If a sample of 100 random products from this thousand is sent for examination, the error will be minimal. Roughly speaking, 10% of all products were examined, and the results, allowing for the representativeness error, can be extended to the quality of all products.
If instead 100 confectionery products are sampled from an indefinite general population that actually contains, say, 1 million units, the sample covers only 0.01% of the units, and the result of the sample and of the study itself will be far less reliable. Feel the difference? This is why a definite general population is in most cases extremely important and strongly affects the result of the study.
Population representativeness
Now to one of the most important questions: what should the sample be? This is the key point of the study. At this stage the sample size is calculated and units are selected into the sample from the total number. The sample has been selected correctly if it retains the relevant features and characteristics of the general population. This property is called representativeness.
In other words, if the selected part retains the same tendencies and characteristics as the entire population under study, the sample is called representative. But not every sample can be representative: for some objects of research a representative sample simply cannot be drawn. This is where the concept of representativeness error comes from; it is discussed in more detail below.
How to make a sample
So, in order to maximize representativeness, there are three basic sampling rules:
- The most universal sample size is considered to be 20% of the population. A statistical sample of 20% will almost always give a result close to reality, and there is no need to include most of the general population in the sample. The 20% figure has emerged from many studies. A little more theory: the larger the sample, the smaller the representativeness error and the more accurate the result. The closer the sample is to the general population in number of units, the more accurate the results. If the entire system is examined, the result is 100% accurate, but then there is no selection at all: that is a study of the entire array, which is not of interest here.
- If processing 20% of the general population is impractical, it is permissible to study at least 1001 units of the population. This figure has also become established in research practice over time. With large populations it will not give exact results, but it brings the study as close as possible to the accuracy attainable for a sample.
- Statistics offers many formulas and tables. Depending on the object of study and the sampling criterion, one formula or another is appropriate. This approach is used in complex, multi-stage studies.
Representativeness error
The main characteristic of the quality of a sample is its representativeness error: the discrepancy between the indicators obtained by selective observation and those obtained by continuous observation. By the size of the error, representativeness is classed as reliable, ordinary or approximate, corresponding to deviations of up to 3%, from 3 to 10% and from 10 to 20%, respectively. In statistics it is nevertheless desirable that the error not exceed 5-6%; otherwise there is reason to speak of insufficient representativeness of the sample. Calculating the representativeness error, and its effect on the sample or the population, takes many factors into account:
- The probability with which an accurate result is to be obtained.
- Number of sampling units. As mentioned earlier, the smaller the number of units in the sample, the greater the representativeness error will be, and vice versa.
- Homogeneity of the study population. The more heterogeneous the population, the greater the representativeness error will be. The ability of a population to be representative depends on the homogeneity of all its constituent units.
- The method of selecting units for the sample population.
In specific studies, the percentage error of the mean is usually set by the researcher himself, based on the observation program and according to data from previous studies. As a rule, the maximum sampling error (error of representativeness) within 3-5% is considered acceptable.
More is not always better
It is also worth remembering that the main thing in the organization of selective observation is to bring its volume to an acceptable minimum. At the same time, one should not strive to excessively reduce the sampling error limits, since this can lead to an unjustified increase in the amount of sample data and, consequently, to an increase in the cost of sampling.
At the same time, the size of the representativeness error should not be excessively increased. After all, in this case, although there will be a decrease in the sample size, this will lead to a deterioration in the reliability of the results obtained.
What questions are usually asked by the researcher?
Any research is carried out for some purpose and to obtain some results. When conducting a sample survey, the initial questions are usually:
- Determining the required number of sampling units, that is, how many units will be examined. For an accurate study, the sample must also be representative.
- Calculating the representativeness error at a given probability level. It should be noted right away that sample studies never have a 100% probability level. If the body that studied a particular segment claims that its results are accurate with a probability of 100%, this is false. Many years of practice have established the probability level used for a correctly conducted sample study: 95.4%.
Methods for selecting research units in the sample
Not every sample is representative. Sometimes the same attribute is expressed differently in the whole and in its part. To meet the requirements of representativeness, various sampling methods are used, and the choice of method depends on the specific circumstances. These methods include:
- random selection;
- mechanical selection;
- typical selection;
- serial (nested) selection.
Random selection is a set of procedures for selecting population units at random, so that every unit of the general population has an equal probability of being included in the sample. This technique is advisable only when the population is homogeneous and has few distinguishing features; otherwise some characteristic features risk not being reflected in the sample. The principles of random selection underlie all other sampling methods.
In mechanical selection, units are taken at a fixed interval. To form a sample of specific crimes, for example, every 5th, 10th or 15th card can be pulled from the statistical records of recorded crimes, depending on their total number and the sample size available. The disadvantage of this method is that a complete list of the population units is needed before selection, the list must then be ranked, and only after that can units be sampled at the chosen interval. This takes a lot of time, so the method is not often used.
Typical (regional) selection is a type of sampling in which the general population is divided into homogeneous groups according to a certain attribute. Sometimes researchers use other terms instead of "groups": "districts" and "zones". Then, from each group, a number of units is randomly selected in proportion to the share of the group in the total population. Typical selection is often carried out in several stages.
Serial selection is a method in which units are selected in groups (series) and all units of each selected group (series) are examined. Its advantage is that selecting individual units is sometimes harder than selecting series, for example when studying persons serving a sentence. Within the selected areas or zones, all units without exception are studied, for example all persons serving sentences in a particular institution.
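The four selection methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register of 100 numbered cards, the two districts and the function names are hypothetical and do not come from the article.

```python
import random

def random_selection(units, n, rng):
    """Simple random selection: every unit has the same chance of inclusion."""
    return rng.sample(units, n)

def mechanical_selection(units, step, rng):
    """Mechanical selection: take every `step`-th unit, starting at a random offset."""
    start = rng.randrange(step)
    return units[start::step]

def typical_selection(groups, fraction, rng):
    """Typical selection: sample each group in proportion to its size."""
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def serial_selection(series, n_series, rng):
    """Serial selection: pick whole series and examine every unit in them."""
    chosen = rng.sample(sorted(series), n_series)
    return [unit for name in chosen for unit in series[name]]

rng = random.Random(42)      # fixed seed so the sketch is reproducible
cards = list(range(1, 101))  # hypothetical register of 100 record cards
groups = {"district A": cards[:60], "district B": cards[60:]}

print(len(random_selection(cards, 10, rng)))      # 10 units
print(len(mechanical_selection(cards, 10, rng)))  # every 10th card: 10 units
print(len(typical_selection(groups, 0.1, rng)))   # 6 + 4 = 10 units
print(len(serial_selection(groups, 1, rng)))      # all units of one district
```

Note that mechanical selection needs the complete, ordered list of units before it can start, which is the drawback mentioned in the text.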
Population: a set of units characterized by mass character, typicality, qualitative uniformity and the presence of variation.
The statistical population consists of materially existing objects (employees, enterprises, countries, regions) and is the object of statistical study.
Population unit: each specific unit of the statistical population.
The same statistical population can be homogeneous in one attribute and heterogeneous in another.
Qualitative uniformity: the similarity of all units of the population in some attribute and their dissimilarity in all the others.
In a statistical population, the differences between one unit and another are most often quantitative. Quantitative changes in the values of an attribute across different units of the population are called variation.
Attribute variation: the quantitative change of an attribute (for a quantitative attribute) from one unit of the population to another.
Attribute: a property, characteristic or other feature of units, objects and phenomena that can be observed or measured. Attributes are divided into quantitative and qualitative. The diversity and variability of an attribute's value across individual units of the population is called variation.
Attributive (qualitative) attributes cannot be expressed numerically (composition of the population by sex). Quantitative attributes have a numerical expression (composition of the population by age).
Indicator: a generalizing quantitative and qualitative characteristic of some property of units or of the population as a whole under specific conditions of time and place.
System of indicators: a set of indicators that comprehensively reflects the phenomenon under study.
For example, consider salary:
- Attribute: wages
- Statistical population: all employees
- Population unit: each employee
- Qualitative uniformity: accrued salary
- Attribute variation: a series of numbers
General population and sample from it
The starting point is a set of data obtained by measuring one or more attributes. The actually observed set of objects, represented statistically by a series of observations of a random variable $X$, is the sample, and the hypothetically existing (conceptual) set is the general population. The general population can be finite (number of observations $N = \text{const}$) or infinite ($N = \infty$), whereas a sample from the general population is always the result of a limited number of observations. The number of observations that make up a sample is called the sample size $n$. If the sample size is large enough ($n \to \infty$), the sample is considered large; otherwise it is called a sample of limited size. The sample is considered small if, when measuring a one-dimensional random variable, the sample size does not exceed 30 ($n \le 30$), or if, when measuring several ($k$) attributes simultaneously in a multidimensional space, the ratio of $n$ to $k$ is less than 10 ($n/k < 10$). The sample forms a variation series if its members are order statistics, i.e. the sample values of the random variable $X$ are sorted in ascending order (ranked); the values of the attribute are called variants.
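The small-sample rule of thumb and the variation series can be expressed directly in code. The sketch below is a minimal illustration; the function names and the input values are hypothetical.

```python
def is_small_sample(n, k=1):
    """Rule of thumb from the text: n <= 30 for one attribute, n / k < 10 for k attributes."""
    if k == 1:
        return n <= 30
    return n / k < 10

def variation_series(values):
    """Rank the observed values in ascending order."""
    return sorted(values)

print(is_small_sample(25))             # True: 25 <= 30
print(is_small_sample(200, k=5))       # False: 200 / 5 = 40
print(variation_series([7, 3, 9, 3]))  # [3, 3, 7, 9]
```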
Example. One and the same randomly selected set of objects, the commercial banks of one administrative district of Moscow, can be considered a sample from the general population of all commercial banks in this district, a sample from the general population of all commercial banks in Moscow, a sample of the commercial banks in the country, and so on.
Basic sampling methods
The reliability of statistical conclusions and the meaningful interpretation of results depend on the representativeness of the sample, i.e. on how completely and adequately it reflects the properties of the general population with respect to which it can be considered representative. The statistical properties of a population can be studied in two ways: by continuous or by non-continuous observation. Continuous observation examines all units of the population under study; non-continuous (selective) observation examines only part of it.
There are five main ways to organize sampling:
1. simple random selection, in which objects are drawn at random from the general population (for example, using a table of random numbers or a random number generator), and each possible sample has an equal probability. Such samples are called properly random;
2. simple selection by a regular procedure, carried out using a mechanical component (for example, dates, days of the week, apartment numbers, letters of the alphabet); the samples obtained in this way are called mechanical;
3. stratified selection, in which the general population of size $N$ is subdivided into subsets or layers (strata) of sizes $N_1, N_2, \ldots, N_k$ so that $N_1 + N_2 + \ldots + N_k = N$. Strata are groups of objects that are homogeneous in their statistical characteristics (for example, the population is divided into strata by age group or social class; enterprises by industry). In this case the samples are called stratified (otherwise layered, typical or zoned);
4. serial selection methods, used to form serial or nested samples. They are convenient when a "block" or series of objects has to be examined at once (for example, a consignment of goods, products of a certain series, or the population of a territorial-administrative unit of the country). The series can be selected randomly or mechanically, after which a continuous survey of the chosen batch of goods or of an entire territorial unit (a residential building or a quarter) is carried out;
5. combined (stepped) selection, which can combine several selection methods at once (for example, stratified and random, or random and mechanical); such a sample is called combined.
Selection types
By type, selection is individual, group or combined. In individual selection, individual units of the general population are selected into the sample; in group selection, qualitatively homogeneous groups (series) of units are selected; combined selection combines the first and second types.
By method of selection, a distinction is made between repeated and non-repeated sampling.
Selection is called non-repeated when a unit that has entered the sample does not return to the original population and takes no part in further selection, so the number of units $N$ in the general population decreases during the selection process. In repeated selection, a unit that has entered the sample is returned to the general population after registration and thus keeps an equal chance, along with the other units, of being used in the further selection procedure; the number of units $N$ in the general population remains unchanged (this method is rarely used in socio-economic studies). However, for large $N$ ($N \to \infty$) the formulas for non-repeated selection approach those for repeated selection, and in practice the latter are used more often ($N = \text{const}$).
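The difference between the two methods is easy to see in Python's standard `random` module, where `sample` draws without replacement and `choices` draws with replacement. The population of 20 numbered units below is a hypothetical example.

```python
import random

rng = random.Random(1)           # fixed seed so the sketch is reproducible
population = list(range(1, 21))  # hypothetical population of N = 20 units

# Non-repeated selection: a chosen unit is not returned, so no duplicates are possible.
non_repeated = rng.sample(population, 10)

# Repeated selection: each unit is returned after registration and may be drawn again.
repeated = rng.choices(population, k=10)

print(len(set(non_repeated)))  # always 10 distinct units
print(len(set(repeated)))      # at most 10; duplicates are possible
```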
The main characteristics of the parameters of the general and sample population
Statistical conclusions in a study are based on the distribution of a random variable $X$; the observed values $(x_1, x_2, \ldots, x_n)$ are called realizations of the random variable $X$ ($n$ is the sample size). The distribution of a random variable in the general population is theoretical, ideal in nature, and its sample analogue is the empirical distribution. Some theoretical distributions are specified analytically, i.e. their parameters determine the value of the distribution function at each point in the space of possible values of the random variable $X$. For a sample it is difficult, and sometimes impossible, to determine the distribution function, so the parameters are estimated from empirical data and then substituted into the analytical expression describing the theoretical distribution. The assumption (or hypothesis) about the type of distribution can be either statistically correct or erroneous. In any case, the empirical distribution reconstructed from the sample characterizes the true one only roughly. The most important distribution parameters are the expected value (mathematical expectation) and the variance.
By their nature, distributions are continuous or discrete. The best-known continuous distribution is the normal distribution. The sample analogues of its parameters, the expectation $\mu$ and the variance $\sigma^2$, are the sample mean $\bar{x}$ and the empirical variance $s^2$. Among discrete distributions, the one most commonly used in socio-economic studies is the alternative (dichotomous) distribution. Its expectation parameter expresses the relative size (share) of the population units that have the characteristic under study (denoted by the letter $p$); the share of the population that does not have this characteristic is denoted by the letter $q$ ($q = 1 - p$). The variance of the alternative distribution also has an empirical analogue.
Depending on the type of distribution and on the method of selecting population units, the characteristics of the distribution parameters are calculated differently. The main ones for the theoretical and empirical distributions are given in Table 1.
The sampling fraction $k_n$ is the ratio of the number of units in the sample to the number of units in the general population:
$k_n = n/N$.
The sample share $w$ is the ratio of the number of sample units that have the trait under study, $n_n$, to the sample size $n$:
$w = n_n/n$.
Example. In a batch of goods containing 1000 units, a 5% sample ($k_n = 0.05$) amounts in absolute terms to 50 units ($n = N \cdot 0.05$); if 2 defective products are found in this sample, the sample share of defects $w$ is 0.04 ($w = 2/50 = 0.04$, or 4%).
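The arithmetic of this example can be checked with a short sketch. The function names are illustrative only; the numbers are those of the example.

```python
def sampling_fraction(n, N):
    """Ratio of the sample size n to the population size N."""
    return n / N

def sample_share(units_with_trait, n):
    """Ratio of sample units that have the trait to the sample size n."""
    return units_with_trait / n

N = 1000             # batch size
n = round(N * 0.05)  # a 5% sample

print(n)                        # 50
print(sampling_fraction(n, N))  # 0.05
print(sample_share(2, n))       # 0.04
```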
Since the sample population is different from the general population, there are sampling errors.
Table 1. Main parameters of the general and sample populations
Sampling errors
In any observation (continuous or selective), errors of two types can occur: registration errors and representativeness errors. Registration errors can be random or systematic. Random errors arise from many different uncontrollable causes, are unintentional, and usually balance each other out in aggregate (for example, changes in instrument readings due to temperature fluctuations in the room).
Systematic errors are biased, as they violate the rules for selecting objects in the sample (for example, deviations in measurements when changing the settings of the measuring device).
Example. To assess the social status of the population in the city, it is planned to examine 25% of families. If, however, the selection of every fourth apartment is based on its number, then there is a danger of selecting all apartments of only one type (for example, one-room apartments), which will introduce a systematic error and distort the results; the choice of the apartment number by lot is more preferable, since the error will be random.
Representativeness errors are inherent only in selective observation; they cannot be avoided and arise because the sample does not fully reproduce the general population. The values of indicators obtained from the sample differ from the values of the same indicators in the general population (or obtained by continuous observation).
Sampling error is the difference between the value of a parameter in the general population and its sample value. For the mean of a quantitative attribute it equals $\mu - \bar{x}$ (general mean minus sample mean), and for the share (alternative attribute) it equals $p - w$ (general share minus sample share).
Sampling errors are inherent only in selective observation. The larger these errors, the more the empirical distribution differs from the theoretical one. The parameters of the empirical distribution, such as the sample mean and the sample share, are random variables; sampling errors are therefore random variables too, taking different values for different samples, and so it is customary to calculate the average error.
The average sampling error $m$ is a value expressing the standard deviation of the sample mean from the mathematical expectation. Provided the principle of random selection is observed, it depends primarily on the sample size and on the degree of variation of the attribute: the larger $n$ and the smaller the variation of the attribute (hence the value of the general variance $\sigma^2$), the smaller the average sampling error $m$. The relationship between the variances of the general population ($\sigma^2$) and the sample population ($s^2$) is expressed by the formula:
$\sigma^2 = s^2 \cdot \frac{n}{n - 1}$,
i.e. for sufficiently large $n$ we can assume that $\sigma^2 \approx s^2$. The average sampling error shows the possible deviations of a sample parameter from the corresponding parameter of the general population. Table 2 gives expressions for calculating the average sampling error for different methods of organizing observation.
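The relation between the two variances is commonly written as $\sigma^2 = s^2 \cdot n/(n - 1)$, where $s^2$ is the sample variance computed with divisor $n$. Assuming that form, the sketch below checks it against Python's `statistics` module; the five observations are hypothetical.

```python
import statistics

values = [2, 4, 4, 6, 9]  # hypothetical observations
n = len(values)

s2 = statistics.pvariance(values)   # sample variance with divisor n
sigma2_estimate = s2 * n / (n - 1)  # rescaled estimate of the general variance

# statistics.variance uses divisor n - 1 directly, so the two values should agree.
print(s2, sigma2_estimate, statistics.variance(values))
```

As $n$ grows, the factor $n/(n - 1)$ approaches 1, which is why the two variances can be treated as equal for large samples.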
Table 2. Mean error (m) of the sample mean and proportion for different types of sample
where $\bar{s}^2$ is the average of the intragroup sample variances for a continuous attribute;
$\overline{w(1 - w)}$ is the average of the intragroup variances of the share;
$r$ is the number of series selected and $R$ is the total number of series;
$\delta^2 = \frac{\sum_{i=1}^{r} (\bar{x}_i - \bar{x})^2}{r}$,
where $\bar{x}_i$ is the mean of the $i$-th series;
$\bar{x}$ is the overall mean over the entire sample for a continuous attribute;
$\delta_w^2 = \frac{\sum_{i=1}^{r} (w_i - w)^2}{r}$,
where $w_i$ is the share of the attribute in the $i$-th series;
$w$ is the total share of the attribute over the entire sample.
However, the magnitude of the average error can be judged only with a certain probability $P$ ($P \le 1$). A. M. Lyapunov proved that, for a sufficiently large number of observations, the distribution of sample means, and hence of their deviations from the general mean, approximately obeys the normal distribution law, provided that the general population has a finite mean and limited variance.
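Mathematically, this statement for the mean (with $\bar{x}$ the sample mean and $\mu$ the general mean) is expressed as:
$P\left(|\bar{x} - \mu| \le \Delta_{\bar{x}}\right) = \Phi(t)$, (1)
and for the share (with $w$ the sample share and $p$ the general share), expression (1) takes the form:
$P\left(|w - p| \le \Delta_w\right) = \Phi(t)$,
where $\Delta = t \cdot m$ is the marginal sampling error, a multiple of the average sampling error $m$, and the multiplicity factor $t$ is Student's criterion ("confidence factor"), proposed by W. S. Gosset (pseudonym "Student"); values of $t$ for different sample sizes are given in a special table.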
The values of the function $\Phi(t)$ for some values of $t$ are:
t = 1: Φ(t) = 0.683
t = 2: Φ(t) = 0.954
t = 3: Φ(t) = 0.997
Therefore, expression (3) can be read as follows: with probability P = 0.683 (68.3%) it can be stated that the difference between the sample mean and the general mean will not exceed one mean error m (t = 1); with probability P = 0.954 (95.4%), that it will not exceed two mean errors (t = 2); and with probability P = 0.997 (99.7%), that it will not exceed three mean errors (t = 3). Thus, the probability that this difference exceeds three times the mean error determines the error level and is no more than 0.3%.
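These probabilities follow from the normal distribution: the chance that a normal variable lies within $t$ standard deviations of its mean equals $\operatorname{erf}(t/\sqrt{2})$. A minimal check, with `phi` as an illustrative function name:

```python
import math

def phi(t):
    """Probability that a normal variable lies within t standard deviations of its mean."""
    return math.erf(t / math.sqrt(2))

for t in (1, 2, 3):
    print(t, round(phi(t), 3))  # 0.683, 0.954, 0.997
```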
Table 3 gives the formulas for calculating the marginal sampling error.
Table 3. Marginal sampling error (Δ) for the mean and proportion (p) for different types of sample observation
Extending Sample Results to the Population
The ultimate goal of selective observation is to characterize the general population. With small sample sizes, the empirical estimates of the parameters (the sample mean $\bar{x}$ and the sample share $w$) may deviate significantly from their true values (the general mean $\mu$ and the general share $p$). It is therefore necessary to establish the boundaries within which the true values ($\mu$ and $p$) lie for the given sample values of the parameters ($\bar{x}$ and $w$).
The confidence interval of a parameter θ of the general population is a random range of values of this parameter which, with a probability close to 1 (the reliability), contains the true value of the parameter.
The marginal sampling error Δ makes it possible to determine the limiting values of the characteristics of the general population and their confidence intervals, which are:
$\bar{x} - \Delta_{\bar{x}} \le \mu \le \bar{x} + \Delta_{\bar{x}}$ for the mean and $w - \Delta_w \le p \le w + \Delta_w$ for the share.
The lower bound of the confidence interval is obtained by subtracting the marginal error from the sample mean (share), and the upper bound by adding it.
The confidence interval for the mean uses the marginal sampling error and, for a given confidence level, is determined by the formula:
$\bar{x} - t m \le \mu \le \bar{x} + t m$.
This means that with a given probability $P$, which is called the confidence level and is uniquely determined by the value of $t$, it can be stated that the true value of the mean lies in the range from $\bar{x} - \Delta_{\bar{x}}$ to $\bar{x} + \Delta_{\bar{x}}$, and the true value of the share in the range from $w - \Delta_w$ to $w + \Delta_w$.
When calculating the confidence interval for the three standard confidence levels P = 95%, P = 99% and P = 99.9%, the value of $t$ is taken from Student's table in the Appendix according to the number of degrees of freedom. If the sample size is large enough, the values of $t$ corresponding to these probabilities are 1.96, 2.58 and 3.29. Thus the marginal sampling error determines the limiting values of the characteristics of the general population and their confidence intervals.
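For large samples the multipliers 1.96, 2.58 and 3.29 are two-sided quantiles of the standard normal distribution. The sketch below recovers them with `statistics.NormalDist` and builds an interval as estimate plus or minus $t$ times the mean error; the function names are illustrative, and for small samples Student's table should be used instead, as the text says.

```python
from statistics import NormalDist

def t_for_confidence(P):
    """Two-sided standard normal quantile for confidence level P (large-sample case)."""
    return NormalDist().inv_cdf((1 + P) / 2)

def confidence_interval(estimate, mean_error, P):
    """Interval estimate +/- t * mean_error for the given confidence level."""
    delta = t_for_confidence(P) * mean_error
    return estimate - delta, estimate + delta

for P in (0.95, 0.99, 0.999):
    print(P, round(t_for_confidence(P), 2))  # 1.96, 2.58, 3.29
```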
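Extending the results of selective observation to the general population in socio-economic studies has its own peculiarities, since it requires that all types and groups of the population be fully represented. The basis for judging whether such an extension is possible is the relative error:
$\Delta_{\%} = \frac{\Delta_{\bar{x}}}{\bar{x}} \cdot 100\%$ for the mean and $\Delta_{\%} = \frac{\Delta_w}{w} \cdot 100\%$ for the share,
where $\Delta_{\%}$ is the relative marginal sampling error, and $\Delta_{\bar{x}}$ and $\Delta_w$ are the marginal errors of the sample mean $\bar{x}$ and the sample share $w$.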
There are two main methods for extending a sample observation to the population: direct conversion and the method of coefficients.
The essence of direct conversion is to multiply the sample mean $\bar{x}$ by the size of the population $N$.
Example. Suppose the average number of toddlers per young family in the city is estimated by a sampling method as $\bar{x} = 1.2$. If there are 1000 young families in the city, the number of places required in the municipal nursery is obtained by multiplying this average by the size of the general population $N = 1000$, i.e. $1.2 \cdot 1000 = 1200$ places.
The method of coefficients is advisable when selective observation is carried out to refine the data of continuous observation.
In this case, the formula is used:
where all variables are the size of the population:
Required sample size
Table 4. Required sample size (n) for different types of sampling organization
When planning a sample survey with a predetermined allowable sampling error, the required sample size must be estimated correctly. It can be determined from the allowable error, for a given probability that guarantees an acceptable error level and taking into account the way the observation is organized. Formulas for the required sample size $n$ follow directly from the formulas for the marginal sampling error. Thus, from the expression for the marginal error:
$\Delta = t \sqrt{\frac{\sigma^2}{n}}$
the sample size $n$ is determined directly:
$n = \frac{t^2 \sigma^2}{\Delta^2}$.
This formula shows that as the marginal sampling error Δ decreases, the required sample size grows substantially; it is proportional to the variance and to the square of Student's $t$.
For a specific method of organizing observation, the required sample size is calculated using the formulas given in Table 4.
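As an illustration, assume the marginal error for repeated random selection has the usual form $\Delta = t\sqrt{\sigma^2/n}$, which gives $n = t^2\sigma^2/\Delta^2$. The sketch below applies it to hypothetical inputs (variance 25, $t = 2$) and shows that halving the allowable error quadruples the required sample size.

```python
import math

def required_sample_size(variance, delta, t):
    """n = t^2 * variance / delta^2 for repeated random selection, rounded up."""
    return math.ceil(t**2 * variance / delta**2)

print(required_sample_size(variance=25, delta=1, t=2))    # 100
print(required_sample_size(variance=25, delta=0.5, t=2))  # 400
```

Other ways of organizing the observation (non-repeated, stratified, serial) use different formulas, so this sketch covers only the simplest case.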
Practical Calculation Examples
Example 1. Calculation of the mean value and confidence interval for a continuous quantitative characteristic.
To assess the speed of settlement with creditors, a bank drew a random sample of 10 payment documents. Their values (in days) were: 10; 3; 15; 15; 22; 7; 8; 1; 19; 20.
With probability P = 0.954, determine the marginal error Δ of the sample mean and the confidence limits of the average settlement time.
Solution. The mean is calculated by the formula from Table 9.1 for the sample population:
$\overline{x} = \frac{\sum x_i}{n} = \frac{120}{10} = 12.0$ days.
The dispersion is calculated according to the formula from Table 9.1:
$\sigma^2 = \frac{\sum (x_i - \overline{x})^2}{n - 1} = \frac{478}{9} \approx 53.1$
The standard deviation is $\sigma \approx 7.3$ days.
The error of the mean is calculated by the formula:
$m = \frac{\sigma}{\sqrt{n}} = \frac{7.3}{\sqrt{10}} \approx 2.3$ days,
i.e. the mean value is $\overline{x} \pm m = 12.0 \pm 2.3$ days.
The reliability of the mean was
$t = \frac{\overline{x}}{m} = \frac{12.0}{2.3} \approx 5.2$
The marginal error is calculated by the formula from Table 9.3 for repeated sampling, since the size of the population is unknown, for the P = 0.954 confidence level:
$\Delta = t \cdot m = 2 \cdot 2.3 = 4.6$ days.
Thus, the mean value is $\overline{x} \pm \Delta = \overline{x} \pm 2m = 12.0 \pm 4.6$, i.e. its true value lies in the range from 7.4 to 16.6 days.
Using Student's t table (see the Appendix), we conclude that for n − 1 = 10 − 1 = 9 degrees of freedom the obtained value is reliable at a significance level α ≤ 0.001, i.e. the resulting mean value differs significantly from 0.
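The arithmetic of Example 1 can be checked with a short Python sketch. It reads the ten observations as the numerals 10, 3, 15, 15, 22, 7, 8, 1, 19, 20 and assumes the standard deviation is taken with the n − 1 denominator, which reproduces the stated error of the mean of 2.3 days; the variable names are illustrative only.

```python
import math
import statistics

# Settlement times in days from Example 1.
days = [10, 3, 15, 15, 22, 7, 8, 1, 19, 20]

n = len(days)
mean = statistics.mean(days)
# Assumption: sample standard deviation with the n - 1 denominator.
std_dev = statistics.stdev(days)
error_of_mean = std_dev / math.sqrt(n)

t = 2  # confidence coefficient used in the example for P = 0.954
marginal_error = t * error_of_mean
lower, upper = mean - marginal_error, mean + marginal_error

print(f"mean = {mean:.1f}, m = {error_of_mean:.1f}, delta = {marginal_error:.1f}")
print(f"interval: {lower:.1f} .. {upper:.1f} days")
```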
Example 2. Estimate of the probability (general share) p.
In a survey of the social status of 1000 families carried out by mechanical sampling, the proportion of low-income families was found to be w = 0.3 (30%) (the sample was 2%, i.e. n/N = 0.02). With confidence level P = 0.997, determine the share p of low-income families throughout the region.
Solution. From the tabulated values of the function Φ(t), for the given confidence level P = 0.997 we find t = 3 (see formula 3). The marginal error of the share w is determined by the formula from Table 9.3 for non-repeated sampling (mechanical sampling is always non-repeated):
$\Delta_w = t \sqrt{\frac{w(1 - w)}{n}\left(1 - \frac{n}{N}\right)} = 3 \sqrt{\frac{0.3 \cdot 0.7}{1000} \cdot 0.98} \approx 0.043$
The relative marginal sampling error in % will be:
$\Delta_{\%} = \frac{\Delta_w}{w} \cdot 100\% = \frac{0.043}{0.3} \cdot 100\% \approx 14.3\%$
The probability (general share) of low-income families in the region is p = w ± Δw, and the confidence limits of p follow from the double inequality:
w − Δw ≤ p ≤ w + Δw, i.e. the true value of p lies within:
0.3 − 0.043 ≤ p ≤ 0.3 + 0.043, namely from 25.7% to 34.3%.
Thus, with a probability of 0.997, it can be stated that the proportion of low-income families among all families in the region ranges from 25.7% to 34.3%.
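A short Python sketch of the share calculation, assuming that n = 1000 families were surveyed and that the non-repeated sampling formula $t \sqrt{\frac{w(1-w)}{n}(1 - \frac{n}{N})}$ applies; the function name is hypothetical. With these inputs the standard error before multiplying by t is about 0.014, and the marginal error for t = 3 is about 0.043.

```python
import math


def proportion_marginal_error(w: float, n: int, sampling_fraction: float, t: float) -> float:
    """Marginal error of a share for non-repeated sampling.

    Computes t * sqrt(w * (1 - w) / n * (1 - n/N)), where n/N is the
    sampling fraction.
    """
    standard_error = math.sqrt(w * (1 - w) / n * (1 - sampling_fraction))
    return t * standard_error


# Example 2 inputs: w = 0.3, n = 1000 (assumed), n/N = 0.02, t = 3 for P = 0.997.
delta_w = proportion_marginal_error(0.3, 1000, 0.02, 3)
lower, upper = 0.3 - delta_w, 0.3 + delta_w
print(f"delta_w = {delta_w:.3f}, limits: {lower:.3f} .. {upper:.3f}")
```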
Example 3. Calculation of the mean value and confidence interval for a discrete feature specified by an interval series.
Table 5 gives the distribution of production orders by the time the enterprise took to complete them.
Table 5. Distribution of observations by time of occurrence
Solution. The average order completion time is calculated by the formula:
$\overline{x} = \frac{\sum x_i f_i}{\sum f_i}$
where $x_i$ is the midpoint of the i-th interval and $f_i$ is its frequency.
The average time will be:
$\overline{x}$ = (3*20 + 9*80 + 24*60 + 48*20 + 72*20)/200 = 23.1 months
We get the same answer if we use the data on $p_i$ from the penultimate column of Table 9.5, using the formula:
$\overline{x} = \sum x_i p_i$
Note that the midpoint of the last interval is found by artificially closing it with the width of the previous interval, equal to 60 − 36 = 24 months.
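The weighted mean of the interval series can be reproduced in a few lines of Python. The midpoints and frequencies are the ones that appear in the calculation above; the variable names are illustrative.

```python
# Interval midpoints (months) and frequencies, as used in the calculation above.
midpoints = [3, 9, 24, 48, 72]
frequencies = [20, 80, 60, 20, 20]

total = sum(frequencies)
weighted_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / total

# The same result via relative frequencies p_i = f_i / total.
shares = [f / total for f in frequencies]
mean_from_shares = sum(x * p for x, p in zip(midpoints, shares))

print(round(weighted_mean, 1), round(mean_from_shares, 1))
```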
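The dispersion is calculated by the formula
$\sigma^2 = \frac{\sum (x_i - \overline{x})^2}{n - 1}$
where $x_i$ is the midpoint of the i-th interval of the series.
Therefore $\sigma^2 = \frac{20^2 + 14^2 + 1^2 + 25^2 + 49^2}{4} \approx 905.8$, and the standard deviation is $\sigma \approx 30.1$ months.
The error of the mean is calculated by the formula $m = \frac{\sigma}{\sqrt{n}} \approx 13.4$ months, i.e. the mean is $\overline{x} \pm m = 23.1 \pm 13.4$.
The marginal error is calculated by the formula from Table 9.3 for repeated sampling, because the population size is unknown, for a 0.954 confidence level:
$\Delta = t \cdot m = 2 \cdot 13.4 = 26.8$ months.
So the mean is:
$\overline{x} \pm \Delta = 23.1 \pm 26.8$ months,
i.e. its true value lies in the range from 0 to 50 months.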
Example 4. To determine the speed of settlements with creditors for the N = 500 enterprises of a corporation, a commercial bank needs to conduct a sample study by random non-repeated selection. Determine the required sample size n so that, with probability P = 0.954, the error of the sample mean does not exceed 3 days, given that trial estimates showed a standard deviation s of 10 days.
Solution. To determine the required number of observations n, we use the formula for non-repeated selection from Table 9.4:
$n = \frac{t^2 s^2 N}{\Delta_x^2 N + t^2 s^2}$
In it, the value of t is determined for the confidence level P = 0.954; it is equal to 2. The standard deviation is s = 10, the population size is N = 500, and the marginal error of the mean is $\Delta_x = 3$. Substituting these values into the formula, we get:
$n = \frac{2^2 \cdot 10^2 \cdot 500}{3^2 \cdot 500 + 2^2 \cdot 10^2} = \frac{200000}{4900} \approx 41$
i.e. a sample of 41 enterprises is enough to estimate the required parameter, the speed of settlements with creditors.
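The sample size calculation can be sketched in Python as follows. The sketch assumes the usual formulas $n = t^2 s^2 / \Delta^2$ for repeated sampling and $n = t^2 s^2 N / (\Delta^2 N + t^2 s^2)$ for non-repeated sampling; the function name `required_sample_size` is hypothetical, and the result is rounded up because a sample cannot contain a fraction of a unit.

```python
import math
from typing import Optional


def required_sample_size(t: float, std_dev: float, max_error: float,
                         population_size: Optional[int] = None) -> int:
    """Smallest n that keeps the marginal error of the mean within max_error.

    Without population_size: repeated sampling, n = t^2 s^2 / delta^2.
    With population_size N: non-repeated sampling,
    n = t^2 s^2 N / (delta^2 N + t^2 s^2).
    """
    numerator = (t * std_dev) ** 2
    if population_size is None:
        n = numerator / max_error ** 2
    else:
        n = numerator * population_size / (max_error ** 2 * population_size + numerator)
    return math.ceil(n)


# Example 4 inputs: t = 2, s = 10 days, delta = 3 days, N = 500 enterprises.
print(required_sample_size(2, 10, 3, population_size=500))
```

Dropping the population size gives the larger repeated-sampling estimate, which shows how the finite population correction reduces the required number of observations.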
In statistics, there are two main methods of research: continuous and sample-based. A sample study must meet two requirements: the sample population must be representative, and the number of observation units must be sufficient. When units of observation are chosen, offset errors are possible, i.e. events whose occurrence cannot be predicted exactly. These errors are objective and natural. To determine how accurate a sample study is, one estimates the size of the error that can arise in the sampling process: the random representativeness error (m). It is the actual difference between the average or relative values obtained from the sample study and the corresponding values that would be obtained from a study of the general population.
The assessment of the reliability of the results of the study involves the determination of:
1. errors of representativeness
2. confidence limits of average (or relative) values in the general population
3. reliability of the difference of average (or relative) values (according to the criterion t)
Calculation of the representativeness error ($m_M$) of the arithmetic mean (M):
$m_M = \frac{\sigma}{\sqrt{n}}$
where σ is the standard deviation and n is the sample size (n > 30).
Calculation of the representativeness error ($m_P$) of the relative value (P):
$m_P = \sqrt{\frac{P \cdot Q}{n}}$
where P is the corresponding relative value (calculated, for example, in %);
Q = 100 − P% is the complement of P; n is the sample size (n > 30).
In clinical and experimental work a small sample is used quite often, when the number of observations is less than or equal to 30. For a small sample, when calculating the representativeness errors of both mean and relative values, the number of observations is reduced by one, i.e.
$m_M = \frac{\sigma}{\sqrt{n - 1}}$; $m_P = \sqrt{\frac{P \cdot Q}{n - 1}}$
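The two representativeness errors, including the small-sample rule, can be sketched in Python. The sketch assumes the standard forms $m_M = \sigma / \sqrt{n}$ and $m_P = \sqrt{P Q / n}$, with n replaced by n − 1 when n ≤ 30; the function names are hypothetical.

```python
import math


def error_of_mean(std_dev: float, n: int) -> float:
    """Representativeness error of a mean; n - 1 is used when n <= 30."""
    denominator = n - 1 if n <= 30 else n
    return std_dev / math.sqrt(denominator)


def error_of_share(p_percent: float, n: int) -> float:
    """Representativeness error of a relative value given in percent."""
    q_percent = 100 - p_percent
    denominator = n - 1 if n <= 30 else n
    return math.sqrt(p_percent * q_percent / denominator)


print(error_of_mean(10, 100))   # large sample
print(error_of_mean(6, 10))     # small sample, divides by n - 1
print(error_of_share(50, 100))  # share of 50% in a large sample
```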
The magnitude of the error of representativeness depends on the sample size: the larger the number of observations, the smaller the error. To assess the reliability of a sample indicator, the following approach was adopted: the indicator (or average value) should be 3 times higher than its error, in which case it is considered reliable.
Knowing the magnitude of the error is not sufficient to be confident in the results of a sample study, since a particular sampling error may be significantly greater (or less) than the mean representativeness error. To express the accuracy with which a researcher wishes to obtain a result, statistics uses the concept of the probability of an error-free forecast, which characterizes the reliability of the results of sample-based biomedical statistical studies. Usually, biomedical statistical studies use a probability of an error-free forecast of 95% or 99%. In the most critical cases, when particularly important theoretical or practical conclusions have to be drawn, a probability of an error-free forecast of 99.7% is used.
Each probability of an error-free forecast corresponds to a certain value of the marginal error of a random sample (Δ, delta), which is determined by the formula:
Δ = t · m, where t is the confidence coefficient, which, with a large sample, is 2.0 for a 95% probability of an error-free forecast, 2.6 for 99%, and 3.0 for 99.7%; with a small sample it is determined from a special table of Student's t values.
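For large samples, the confidence coefficient can be derived from the standard normal distribution instead of being looked up. A minimal Python sketch, assuming the two-sided normal approximation and using `statistics.NormalDist` from the standard library (Python 3.8 or later); the function name is hypothetical.

```python
from statistics import NormalDist


def confidence_coefficient(probability: float) -> float:
    """Two-sided normal quantile t such that P(|Z| <= t) = probability."""
    return NormalDist().inv_cdf((1 + probability) / 2)


for p in (0.95, 0.954, 0.99, 0.997):
    print(f"P = {p}: t = {confidence_coefficient(p):.2f}")
```

For small samples the normal quantile understates t, which is why a Student's t table is used there instead.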
Using the marginal sampling error (Δ), one can determine the confidence limits within which, with a certain probability of an error-free forecast, the real value of the statistical quantity characterizing the entire population (average or relative) lies.
The following formulas are used to determine the confidence limits:
1) for average values:
$M_{gen} = M_{sample} \pm t \cdot m_M$
where $M_{gen}$ denotes the confidence limits of the average value in the general population;
$M_{sample}$ is the average value obtained from the study of the sample population; t is the confidence coefficient, whose value is determined by the probability of an error-free forecast with which the researcher wishes to obtain the result; $m_M$ is the representativeness error of the mean.
2) for relative values:
$P_{gen} = P_{sample} \pm t \cdot m_P$
where $P_{gen}$ denotes the confidence limits of the relative value in the general population; $P_{sample}$ is the relative value obtained from the study of the sample population; t is the confidence coefficient; $m_P$ is the representativeness error of the relative value.
Confidence limits show the extent to which the size of the sample indicator can fluctuate depending on the causes of a random nature.
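The confidence limits for a mean and for a relative value share one shape: the sample value plus or minus t times its representativeness error. A minimal Python sketch with a hypothetical helper name and illustrative inputs:

```python
from typing import Tuple


def confidence_limits(sample_value: float, error: float, t: float) -> Tuple[float, float]:
    """Return (lower, upper) limits: sample value -/+ t * representativeness error."""
    delta = t * error  # marginal error
    return sample_value - delta, sample_value + delta


# Illustrative inputs: a mean of 12.0 with error 2.3 and t = 2.
lower, upper = confidence_limits(12.0, 2.3, 2)
print(f"{lower:.1f} .. {upper:.1f}")
```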
With a small number of observations (n < 30), the value of the coefficient t used to calculate the confidence limits is found in a special Student's table. The values of t are located at the intersection of the column for the chosen probability of an error-free forecast and the row indicating the number of degrees of freedom (n′), which is equal to n − 1.
A selection in 1C 8.2 and 8.3 is a specialized way of iterating over the records of infobase tables. Let's take a closer look at what a selection is and how to use it.
What is a selection in 1C?
A selection is a way of iterating over information in 1C that consists in sequentially placing the cursor on the next record. A selection in 1C can be obtained from a query result or from an object manager, for example, of documents or catalogs.
An example of getting a selection from an object manager and iterating over it:
Selection = Catalogs.Banks.Select();
While Selection.Next() Do
    // process the current record
EndDo;
An example of getting a selection from a query:
Query = New Query("SELECT Ref, Code, Description FROM Catalog.Banks");
Selection = Query.Execute().Select();
While Selection.Next() Do
    // perform interesting actions with the "Banks" catalog
EndDo;
Both of the above examples obtain the same data set to iterate over.
Selection methods in 1C 8.3
A selection has a large number of methods; let's consider them in more detail:
- Select(): the method by which a selection is obtained. From a selection you can get another, subordinate selection if the traversal type "by groupings" is specified.
- Owner(): the reverse of Select(). Returns the "parent" selection of the query.
- Next(): moves the cursor to the next record. Returns True if the record exists and False if there are no more records.
- FindNext(): a very useful method that iterates over only the records matching a filter (the filter is a structure of fields).
- NextByFieldValue(): moves to the next record whose field value differs from the one at the current position. For example, to iterate over records with unique values of the "Account" field: Selection.NextByFieldValue("Account").
- Reset(): resets the current cursor location and returns it to the initial position.
- Count(): returns the number of records in the selection.
- Get(): places the cursor on the desired record by its index.
- Level(): the level of the current record in the hierarchy (a number).
- RecordType(): returns the record type: DetailRecord, GroupTotal, HierarchyTotal, or GrandTotal.
- Group(): returns the name of the current grouping, or an empty string if the record is not a grouping.
Empirical methods are considered one of the main means of studying social relations and processes. They provide reliable, complete and representative information.
Specificity of techniques
Empirical methods yield fact-fixing knowledge. They help establish and generalize circumstances through the indirect or direct registration of events inherent in the relations, objects and phenomena under study. Empirical methods differ from theoretical ones in that the subject of analysis is:
- Behavior of individuals and their groups.
- Products of human activity.
- Verbal actions of individuals, their judgments, views, opinions.
Sample studies
An empirical study is always focused on obtaining objective and accurate information and quantitative data. For this reason, the representativeness of the information must be ensured when the study is carried out, and this requires a correct sample population. This means that the selection must be carried out in such a way that the data obtained from a narrow group reflect the trends present in the general mass of respondents. For example, when polling 200-300 people, the data obtained can be extrapolated to the entire urban population. The indicators of the sample population allow a different approach to the study of socio-economic processes in the region and in the country as a whole.
Terminology
In order to better understand the issues related to sample surveys, some definitions need to be clarified. The unit of observation is the direct source of information. It can be an individual, a group, a document, an organization, and so on. The general population is the set of observation units, all of which should be relevant to the problem being studied. Only a part of the general population is subject to direct analysis; it is studied in accordance with the developed methods of collecting information. The concept of a "sample" is used to denote this part of the entire array of respondents. Its ability to reflect the key parameters of the total mass of people is called representativeness. In some cases the sample and the population do not match; one then speaks of a representativeness error.
Ensuring representativeness
The issues related to it are considered in detail within statistics. The problems are complex because, on the one hand, the sample must give a quantitative representation of the general population. This means, in particular, that the groups of respondents should be represented in optimal numbers, sufficient for a normal representation. On the other hand, qualitative representation is also required. It presupposes a certain composition of subjects that forms the sample population. This means that, for example, one cannot speak of representativeness if only men or only women, only the elderly or only young people are interviewed. The study should be carried out within all the groups represented.
Sample characteristic
This term is considered in two aspects. First of all, it is defined as the set of elements from the general array of people whose opinion is being studied: this is the sample population. It is also the process of forming a certain category of respondents with the required representativeness. In practice, there are several types and kinds of selection. Let's consider them.
Types
There are three of them:
- Spontaneous sample population. This is a set of respondents selected on a voluntary basis, where any unit from the total mass of people is free to enter the study group. Spontaneous selection is used quite often in practice, for example in surveys in the press or by mail. However, this approach has a significant drawback: it cannot qualitatively represent the whole of the general population. The technique is applied for reasons of economy, and in some surveys it is the only possible option.
- Probabilistic (random) sample population. This is one of the main methods used in research. The key principle of such selection is that each unit of observation is given the opportunity to pass from the general mass of individuals into the narrow group. Different methods are used for this, for example a lottery, mechanical selection, or a table of random numbers.
- Stratified (quota) sampling. It is based on forming a qualitative model of the total mass of respondents, after which units are selected into the sample population, for example according to age or gender, population groups, and so on.
Kinds
There are the following selections:
Additionally
Samples can also be dependent and independent. In the first case, the procedure of the experiment and the results that will be obtained in the course of it for one group of respondents have a certain impact on the other. Accordingly, independent samples do not imply such an impact. Here, however, one important point should be noted. One group of subjects, in respect of which the psychological examination was carried out twice (even if it was aimed at studying different qualities, features, signs), by default, will be considered dependent.
Probabilistic selections
Consider some types of samples:
- Random. It assumes the homogeneity of the total population, one probability of the availability of all components, as well as the presence of a complete list of elements. As a rule, a table with random numbers is used in the selection process.
- Mechanical. This kind of random sampling involves ordering the units by a certain attribute, for example by phone number, alphabetically, by date of birth, and so on. The first unit is chosen randomly. Then every k-th unit is selected, where k is the step. The size of the general population is then N = k*n, where n is the sample size.
- Stratified. This sample is used when the total population is heterogeneous. The latter is divided into strata (groups). In each of them, the selection is carried out mechanically or randomly.
- Serial. Groups are selected randomly. Inside them, objects are studied all the way.
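The four probabilistic selections listed above can be sketched in Python with the standard `random` module. This is a minimal illustration under stated assumptions: the population is a plain list, strata and series are given as dictionaries of lists, the requested sample size does not exceed the population size, and all function names are hypothetical.

```python
import random


def simple_random_sample(units, n, rng):
    """Draw n units without replacement, each with equal probability."""
    return rng.sample(units, n)


def mechanical_sample(units, n, rng):
    """Pick a random start, then take every k-th unit, with step k = N // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Sample the same fraction at random inside every stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


def serial_sample(series, n_series, rng):
    """Choose whole groups at random and keep every unit inside them."""
    chosen = rng.sample(list(series), n_series)
    return [unit for key in chosen for unit in series[key]]


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
population = list(range(100))
print(mechanical_sample(population, 10, rng))
```

Passing the generator in explicitly keeps the selection reproducible, which helps when a sample has to be audited later.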
Non-probabilistic selections
They involve sampling not on the basis of randomness, but on subjective grounds: typicality, accessibility, equal representation, and so on. Selections in this category include:
Nuance
An accurate and complete list of population units is needed to ensure representativeness. The unit of observation is, as a rule, a single person. Selection from the list is best done by numbering the units and using a table of random numbers. However, the quasi-random method is also often used: it selects every n-th element from the list.
Influencing factors
The volume of a population is the number of its units. According to experts, it does not have to be large. Undoubtedly, the larger the number of respondents, the more accurate the result. However, at the same time, a large volume does not always guarantee success. For example, this happens when the total array of respondents is heterogeneous. Homogeneous will be considered such a set where the controlled parameter, for example, the level of literacy, is distributed evenly, that is, there are no voids or condensations. In this case, it will be enough to interview several people. Based on the results of the survey, it will be possible to conclude that the majority of people have a normal level of literacy. From this it follows that the representativeness of information is influenced not by quantitative characteristics, but by the qualitative characteristics of the population - the level of its homogeneity, in particular.
Mistakes
They represent the deviation of the average parameters of the sample population from the values for the total mass of respondents. In practice, errors are determined by comparison. When surveying adults, data from censuses, statistical records, and the results of past surveys are usually used as control parameters. Comparing the average values of the general and sample populations, determining the error from that comparison, and reducing this deviation is called representativeness control.
Conclusions
Sample research is a way of collecting data on people's attitudes and behavior through a survey of specially selected groups of respondents. This technique is considered reliable and economical, although it requires a certain technique. The sample is the basis. It acts as a certain proportion of the total mass of people. The selection is made using special techniques and is aimed at obtaining information about the entire population. The latter, in turn, is represented by all possible social objects or by the group that will be studied. Often the population is so large that it would be quite costly and cumbersome to conduct a survey of every member of the population. Therefore, a reduced model is used. The sample includes all those who receive questionnaires, who are called respondents, who, in fact, act as the object of study. Simply put, it is made up of many people who are being interviewed.
Conclusion
The objectives of the survey are determined by specific categories included in the population. As for a specific share of the total mass of people, it is made up of subjects included in groups using mathematical calculations. For the selection of units, a description of the object of the initial population is necessary. After determining the number of subjects, the reception or method of forming groups is determined. The results of the survey will allow us to describe the trait under study in relation to all representatives of the general mass of people. As practice shows, selective rather than continuous studies are mainly carried out.