Software for Data Entry

Entering your carefully collected data is an important step in your research project.  Wrongly entered data can lead to wrong results and false conclusions.  Choosing the most appropriate software to use is the first step in this process.

Why should I not enter my data in a statistical package such as Stata or SPSS?

Statistical packages such as Stata and SPSS provide a worksheet format in which it is possible to enter data in columns and rows.  They have no capacity to validate your data as you enter it (though you can purchase "SPSS Data Entry", a stand-alone product that will allow validation checks).

What about using Excel for data entry?

Excel has limited data checking facilities, so it is better than a statistical package for data entry but not as good as a database package.  It may be used for small, simple, stand-alone data sets.  However, it is important to take care when entering dates.  All values in a date column must have the same date format.  To keep date formats consistent, the regional setting must be the same on every computer used for data entry, and must not be changed when a computer is repaired.
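As an illustration of the date problem, the following is a minimal Python sketch of a check you could run on a date column after exporting it from Excel as text.  The day/month/year convention, the function name find_bad_dates and the sample values are assumptions made for this example only.

```python
from datetime import datetime

# Assumed convention for this sketch: dates are typed as day/month/four-digit year.
DATE_FORMAT = "%d/%m/%Y"


def find_bad_dates(values, date_format=DATE_FORMAT):
    """Return (row_number, value) pairs for entries that do not match the format."""
    bad = []
    for row_number, value in enumerate(values, start=1):
        try:
            datetime.strptime(value.strip(), date_format)
        except ValueError:
            # Wrong layout, or a date that does not exist (e.g. 31 February).
            bad.append((row_number, value))
    return bad


# Hypothetical column of visit dates as typed during data entry.
visit_dates = ["03/12/2008", "12/03/2008", "2008-12-03", "31/02/2008", "25/12/2008"]
problems = find_bad_dates(visit_dates)
print(problems)
```

Note that a check like this cannot tell whether 12/03/2008 was meant as 12 March or 3 December, because both readings are valid dates.  This is why the regional setting needs to be identical on every computer before data entry begins.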

Before entering data in Excel, we recommend you become familiar with features that help reduce data entry errors, such as freezing or splitting panes and data validation.  To read about this and more, go to:

Database packages

We recommend that you use a database package to enter your data (see below for some options).  The advantages of a database package over a spreadsheet package are that it allows you to:

  • create data entry forms that resemble your paper forms/questionnaires;
  • enforce uniqueness for an identifier;
  • link datasets;
  • set rules that check consistency across a record;
  • set rules that check consistency across linked datasets;
  • skip fields (or variables), e.g. where a set of fields is not applicable because of the answer to a previous question;
  • reduce the time taken to perform data entry tasks.

Overview of database packages

In general, database packages allow for 'controlled' data entry, meaning that they can be set up to prevent the user from entering data that does not meet certain criteria, thus reducing data entry error.  For example:

  • only 1, 2 or 9 can be entered in the field gender;
  • age must be between 12 and 16, or 99 if missing;
  • ID number must be unique;
  • branching questions trigger jumps, e.g. if the answer to question 8 is 'no', skip to question 15.
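The example rules above can be sketched in Python to show the kind of logic a database package applies behind its data entry form.  This is an illustration only, not how any particular package implements its checks; the field names (id, gender, age, q8 to q14) and the function validate_record are hypothetical.

```python
def validate_record(record, seen_ids):
    """Return a list of error messages for one record; an empty list means it passed."""
    errors = []

    # ID number must be unique.
    if record["id"] in seen_ids:
        errors.append("duplicate ID")

    # Only 1, 2 or 9 can be entered in the field gender.
    if record["gender"] not in (1, 2, 9):
        errors.append("gender must be 1, 2 or 9")

    # Age must be between 12 and 16, or 99 if missing.
    age = record["age"]
    if not (12 <= age <= 16 or age == 99):
        errors.append("age must be 12-16, or 99 if missing")

    # Branching: if question 8 is 'no', questions 9 to 14 must be left blank.
    if record["q8"] == "no":
        answered = [q for q in range(9, 15) if record.get(f"q{q}") is not None]
        if answered:
            errors.append("q9-q14 must be skipped when q8 is 'no'")

    return errors


# Hypothetical records, checked in the order they are entered.
records = [
    {"id": 101, "gender": 1, "age": 14, "q8": "yes", "q9": 3},
    {"id": 102, "gender": 3, "age": 17, "q8": "no"},
    {"id": 101, "gender": 2, "age": 99, "q8": "no", "q10": 1},
]

seen = set()
results = []
for record in records:
    errors = validate_record(record, seen)
    results.append(errors)
    if not errors:
        seen.add(record["id"])  # only accepted records reserve their ID

for record, errors in zip(records, results):
    print(record["id"], errors or "accepted")
```

In a database package these rules are normally set up once when the form is designed, and the person entering data is stopped at the offending field rather than shown a list of errors afterwards.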

Entered data can then be:

  • edited;
  • queried;
  • printed or listed for documentation or error-checking;
  • exported for subsequent analysis in other packages, e.g. Stata, SPSS and SAS.

Data entry packages used in CEBU

EpiData Entry (http://www.epidata.dk) is a free, Windows-based program that is quick and easy to learn.  CEBU offers a half-day Introduction to EpiData course on using this software; alternatively, CEBU's EpiData Entry notes are available to download.  There is now also an analysis tool, EpiData Analysis, a free program that allows you to perform basic descriptive analysis.

Access is a Microsoft Office 'relational' database management program that is particularly useful for large studies requiring sophisticated data entry and management systems.  It is powerful at linking tables that hold different types of data, so that you can work with data from across your database at the same time.  For example, your database could contain a patient baseline data table and a table detailing clinic visits for every patient.  Access is able to create a linked 'relationship' between these two tables that will enable you to check and query your data.
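The idea of a linked relationship between a baseline table and a visits table can be sketched with Python's built-in sqlite3 module.  Note that this uses SQLite rather than Access, purely because it runs without extra software; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces links when this is on

# One row per patient.
conn.execute("""
    CREATE TABLE baseline (
        patient_id INTEGER PRIMARY KEY,
        gender INTEGER NOT NULL CHECK (gender IN (1, 2, 9))
    )
""")
# Many rows per patient, each linked back to a baseline record.
conn.execute("""
    CREATE TABLE visits (
        visit_id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES baseline(patient_id),
        visit_date TEXT NOT NULL
    )
""")

conn.executemany("INSERT INTO baseline VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany(
    "INSERT INTO visits (patient_id, visit_date) VALUES (?, ?)",
    [(1, "2008-03-01"), (1, "2008-06-01"), (2, "2008-03-15")],
)

# A visit for a patient with no baseline record is rejected by the link.
try:
    conn.execute(
        "INSERT INTO visits (patient_id, visit_date) VALUES (?, ?)",
        (99, "2008-07-01"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Query across both tables: number of clinic visits per patient.
visit_counts = conn.execute("""
    SELECT b.patient_id, COUNT(v.visit_id)
    FROM baseline AS b
    LEFT JOIN visits AS v ON v.patient_id = b.patient_id
    GROUP BY b.patient_id
    ORDER BY b.patient_id
""").fetchall()

print(orphan_rejected, visit_counts)
```

The relationship does two jobs here: it blocks a clinic visit from being recorded against a patient who has no baseline record, and it lets a single query combine the two tables.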

 

Last Updated 03-Dec-2008. Authorised by: John Carlin. Enquiries: Donna De Sair.
© RCH.