Week 4: homework for my TA, comments I’ve made.
October 28, 2011, 7:41 pm
Filed under: Uncategorized

“Is it dishonest to remove outliers and / or transform data?”
October 18, 2011, 10:02 pm
Filed under: Uncategorized

An outlier is a data point that lies far outside the norm of the other values in a random sample from a population. If most of the data fit a particular trend, an outlier is a point radically outside that trend, far away from the line of fit: an unusually large or unusually small value in comparison to the other data points.

My initial thought on the removal of data was that it was cheating, but after considering how outliers are caused I can see that most are not valid data points and add nothing to a study. Outliers make statistical analyses difficult and can distort the interpretation of the data, influencing the mean and the variability (standard deviation) and consequently the findings of a study.
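
As a small illustration of that distortion (the numbers here are invented, not from any real study), a single extreme value drags both the mean and the standard deviation well away from what is typical:

```python
import statistics

# Invented reaction times (ms); 2450 is a single extreme value,
# perhaps a recording error.
times = [512, 498, 530, 505, 521, 488, 2450]

# Illustrative cut-off (my choice) to screen out the extreme value.
clean = [t for t in times if t < 1000]

print(statistics.mean(times))   # dragged far above the typical ~500 ms
print(statistics.stdev(times))  # hugely inflated by the single outlier
print(statistics.mean(clean))   # close to the typical score
print(statistics.stdev(clean))  # much smaller, more representative spread
```

Six of the seven scores sit near 500 ms, yet the mean of the full set is pushed close to 800 ms, which is why a single invalid point can mislead an entire analysis.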

An outlier may be the result of an error in measurement, such as a human error in data collection, recording or entry. These errors may be corrected by returning to the original data, double-checking and recalculating, but if they cannot be corrected they should be removed, as they do not represent valid data points.

Another cause could be a participant’s misinterpretation of the task: their interpretation leads them to perform the task wrongly, in a different manner to all the other participants, so their data is not a fair assessment of performance on the same task and is therefore not valid. Participants may also choose to behave in certain ways, such as purposely giving false, invalid data in order to appear socially acceptable, for example in studies investigating sexual experience, educational achievement, truancy rates or financial income. Individual participant effects, such as enduring high levels of stress on the day of the test, illness or fatigue, or immediate environmental effects such as a distracting noise outside the testing lab, can affect results. Participants may also become bored with a task and answer carelessly, again producing data that is not valid.

Other causes of outliers are researcher effects: an attractive researcher may affect a participant’s answers, or multiple researchers may record data in different ways. A participant may also guess the true nature of the study and its desired outcome and adjust their answers accordingly, either to please or to oppose.

Outliers can also occur due to an error in sampling. For example, in a study of nurses and their income, some ward sisters with considerably higher incomes could be mistakenly included in the sample. These could produce undesirable outliers, which should be removed as they do not reflect the target population.

Incorrect assumptions about the distribution of the data can also lead to outliers. Data may not fit the original assumption and may be affected by unanticipated long- or short-term trends. For example, a study of library usage rates for the month of September finds outlying values at the beginning of the month (low rates) and at the end (high rates). These data may have a legitimate place in the data set: the low rates may reflect the return of students for the new semester, and the high rates the run-up to midterm exams.

An outlier can also come from the population being sampled legitimately, through random chance, and sample size matters for the probability of outlying values. Within a normally distributed population, it is more probable that a given data point will be drawn from the most densely concentrated area of the distribution than from one of the tails. But as a data set becomes larger, the more the sample resembles the population from which it was drawn, increasing the likelihood of including outlying values. Under the common rule of thumb that treats points more than about 2.5 standard deviations from the mean as outliers, there is only about a 1% chance that any given data point drawn from a normally distributed population will be an outlier.
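
That 1% figure can be checked directly, and the rule of thumb turned into a simple screening function. This is just a sketch: the 2.5 standard-deviation cut-off is a common convention rather than a fixed rule, and the demo data are invented.

```python
import math
import statistics

# Two-tailed probability of a normal value falling more than
# 2.5 standard deviations from the mean.
p = math.erfc(2.5 / math.sqrt(2))
print(round(p, 4))  # roughly 0.012, i.e. about 1%

def z_outliers(data, cutoff=2.5):
    """Flag values more than `cutoff` standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > cutoff]

# Invented scores: the 30 sits far from the cluster of 5s, 6s and 7s.
print(z_outliers([5, 6, 5, 7, 6, 5, 6, 7, 5, 6, 30]))
```

One caution with this method: a very large outlier inflates the standard deviation it is measured against, so extreme points can sometimes mask themselves, which is another reason screening should be a prompt for inspection rather than automatic deletion.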

Before proceeding with any formal analysis, researchers need to consider whether outlying data contains valuable information. If an outlier is a genuine result, it might indicate an extreme of behaviour that inspires and requires further inquiry into what makes these participants different and whether we can learn from them.

Outliers can represent an error or genuine data, so they must be examined carefully and should not be removed without justification.

Week 3: homework for my TA, comments I’ve made.
October 14, 2011, 7:46 pm
Filed under: Uncategorized


“Why do we use the scientific method – and are there other ways to go about the process of research?”
October 10, 2011, 9:28 pm
Filed under: Uncategorized

The scientific method is “a method of procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses.” (Oxford English Dictionary.)

We use the scientific method, with all its rigours of control, to enable research to be systematic, objective, valid and reliable.  The scientific method is a set of principles and procedures used by researchers to develop questions, collect data and reach conclusions.  Strict controls on possible confounding variables allow for their elimination, to ensure that any effects seen are the result of the independent variable acting on the dependent variable being measured, and not of other factors.

The goals of psychological studies are to identify, describe, explain and predict.  Psychologists may also strive to create research that can be used to influence or change mental processes or behaviours.

Researchers must first conduct a thorough review of the existing literature on the subject of interest.  This background material helps with the first step in conducting research: formulating a hypothesis.  The researcher then devises a study, collects data, examines the data and reaches a conclusion.  The process is universally understood and easily replicated.

The scientific method is the only method able to identify cause-and-effect relationships, but it has the disadvantage of lacking ecological validity, and it may be difficult to generalise the results.

Other research methods, such as case studies, surveys, clinical interviews and observation, are non-experimental methods.  These are examples of descriptive or correlational research.

Using these methods, researchers can describe different events, experiences or behaviours and look for links between them, but they do not identify the cause, the reasons for the behaviour.  Although they don’t answer the question of why behaviour occurs, they still provide solid, scientific data when correctly executed and interpreted.  Non-experimental does not mean non-scientific.  These methods are useful when you can’t conduct an experiment because you can’t manipulate the predictor variable (e.g. participants’ gender or age), when you can’t ethically manipulate the predictor variable (e.g. illness or poverty), or when you simply want to describe or predict behaviour.   Although these methods do tell us whether two variables are related, they do not tell us which variable influences which.  They may suggest that one variable influences another, but they are never proof of causality, that changes in variable A cause changes in variable B.  Two factors may be related without one causing the other to occur.   A correlation is not the same as causation, but it does tell you where to look for links.
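
To make the point concrete, here is a small sketch of how a correlation is measured. The function computes Pearson’s coefficient from first principles, and the study-hours and exam-marks figures are entirely invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient: the strength and direction of a
    linear relationship. It says nothing about which variable drives which."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented figures: hours of study and exam marks for six students.
study = [2, 4, 5, 7, 8, 10]
marks = [50, 55, 60, 64, 70, 75]
print(round(pearson_r(study, marks), 2))  # strong positive association
```

A coefficient near +1 here tells us study hours and marks rise together, but nothing in the number itself says whether studying raises marks, abler students study more, or some third factor drives both.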

These non-experimental methods have some limitations, as they may rely on self-report, and therefore on participants’ honesty.  A participant may alter their genuine response for the sake of social acceptance, or may misinterpret the question, making the information given irrelevant to the test.   But they are an easy and inexpensive way to gather large quantities of data.  They give information about characteristics such as personality traits, emotional states, aptitudes, interests, abilities, values and behaviours, and about important things such as what people think.  They may even lead us on to a hypothesis for an experiment.

Qualitative and quantitative methods of research can be used in conjunction with each other.  Both have their advantages and disadvantages, and neither is the better option; it depends on the type of data required by the research study.  They are complementary, not contradictory.

“Do you need statistics to understand your data?”
October 7, 2011, 5:10 pm
Filed under: Uncategorized

Statistics are not necessary for qualitative data.  Qualitative data can give rich, vivid insight into phenomena, answering questions of why or how, but as they are not numerical, statistics are not used for analysis.

For quantitative data, statistics are necessary to conduct research effectively and to make sense of raw individual data.  Statistics are a powerful tool for condensing and analysing large quantities of data.   They are standardised techniques that are recognised and understood throughout the scientific community.  They enable information to be presented and interpreted in an accurate and informative way, so we can make informed decisions based on the data collected.

Statistics can tell us whether one variable has an effect on another, what that effect is and how large it is.  They show the distribution of the data, enabling us to identify patterns such as averages showing the norms.  They show similarities and differences in the data, and also relationships, or correlations, between variables.

With inductive reasoning we study a relatively small sample and draw conclusions intended to apply to people in general, but one can never study a sample and expect the conclusions to hold true for the entire population with absolute certainty.

Deductive reasoning, going from the general to the particular, uses the logic of mathematics and is therefore certain: general principles lead logically to more specific relationships.

Inferential statistical procedures allow researchers to make probability estimates, assessing the likelihood that the collected data occurred through chance factors, indicated by the level of significance.  The likelihood of the results occurring by chance must be very low for them to be deemed significant.  Inferential statistics enable researchers to infer that the results obtained from a sample would also occur in the larger target population from which the sample was drawn, and therefore to draw conclusions about that population.
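
One way to see the logic of “how likely is this result by chance alone” is a permutation test, one of many inferential procedures. The function and the memory-test scores below are my own illustration, not anything from a real study: it shuffles the group labels thousands of times and counts how often chance alone produces a difference as big as the one observed.

```python
import random

def permutation_p(group_a, group_b, trials=10_000, seed=1):
    """Estimate how often a mean difference at least as large as the one
    observed would arise purely by chance if group labels were shuffled."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n = len(group_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # reassign scores to groups at random
        left, right = pooled[:n], pooled[n:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    return hits / trials  # estimated p-value

# Invented memory-test scores for two conditions.
condition_a = [12, 15, 14, 16, 13, 17]
condition_b = [9, 11, 10, 12, 8, 10]
print(permutation_p(condition_a, condition_b))  # small: unlikely to be chance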

Statistics are a bridge between the inductive uncertainty of science and the deductive certainty of mathematics, and are a necessity to unravelling and understanding quantitative data.

Having said all that, be aware that statistics can be manipulated!