Tuesday, November 3, 2009

Entering and preparing data with R

Entering/preparing the data
  • Reading
    • read.table
    • Data:
      • data types are integer, numeric (real numbers), logical (TRUE or FALSE), and character (alphanumeric strings)
      • a data frame is a table of data that combines vectors (columns) of different types (e.g. character, factor, and numeric data).
        It is a hybrid of two simpler data structures: lists, which can mix arbitrary types of data but have no other structure, and matrices, which have rows and columns but usually contain only one data type (typically numeric).
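A minimal sketch of the reading step and the data types listed above. The data and the column names (site, count, mass, flowering) are made up for illustration, and the table is passed inline through the text argument of read.table so that no file is needed.

```r
# Hypothetical data, supplied inline via text= so the example needs no file
raw <- "site count mass flowering
A 3 1.5 TRUE
B 7 2.25 FALSE
C 5 0.75 TRUE"

dat <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Each column is a vector with its own type
col_classes <- sapply(dat, class)
print(col_classes)

# A data frame is a list of equal-length columns with a row/column shape
print(is.list(dat))
print(dim(dat))

# Converting to a matrix forces a single type for every cell
m <- as.matrix(dat)
print(typeof(m))
```

With a real file, the first argument would be the file name instead of text = raw.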
  • Organization or shape
    • stack and unstack are simple, basic functions: stack converts from wide to long format and unstack from long to wide; they aren't very flexible
    • reshape is very flexible and preserves more information than stack/unstack,
      but its syntax is tricky
    • library(reshape): melt, cast, and recast functions, which are similar to reshape but sometimes easier to use
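The wide/long conversions above can be sketched with base R only. The data frame and its column names (control, treated, plot) are hypothetical, and the reshape call is just one possible way to write the conversion. melt, cast, and recast are left out because they need the reshape package to be installed.

```r
# Hypothetical wide-format data: one column per treatment
wide <- data.frame(control = c(1.2, 1.5, 1.1),
                   treated = c(2.3, 2.8, 2.6))

# stack: wide -> long, producing the columns 'values' and 'ind'
long <- stack(wide)
print(long)

# unstack: long -> wide again
wide2 <- unstack(long)
print(wide2)

# reshape: the same wide -> long conversion, but an id column is kept
wide$plot <- 1:3
long2 <- reshape(wide, direction = "long",
                 varying = list(c("control", "treated")),  # columns to stack
                 v.names = "response",                     # name of the stacked column
                 timevar = "treatment",                    # name of the label column
                 times   = c("control", "treated"),        # labels for each stacked column
                 idvar   = "plot")                         # identifies the original row
print(long2)
```

stack drops everything except the stacked values and their labels, while reshape carries the plot identifier along, which is what "preserves more information" refers to.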
  • Checking
    • Is there the right number of observations overall? Is there the right number of observations in each level for factors?
    • Do the summaries of the numeric variables — mean, median, etc. — look reasonable? Are the minimum and maximum values about what you expected?
    • Are there reasonable numbers of NAs in each column? If not (especially if you have extra mostly-NA columns), you may want to go back a few steps and look at using count.fields or fill=FALSE to identify rows with extra fields . . .
      • str: tells you about the structure of an R variable
      • class: prints out the class (numeric, factor, Date, logical, etc.) of a variable.
      • head: prints out the beginning of a data frame;
      • table: command for cross-tabulation
      • NAs: identify them
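A short sketch of the checks listed above, run on a made-up data frame (the names species and height are hypothetical). The last part writes a small temporary file only to show what count.fields reports for a row with an extra field.

```r
# Hypothetical data set with one missing value
dat <- data.frame(species = factor(c("a", "a", "b", "b", "b")),
                  height  = c(10.2, 11.5, NA, 9.8, 10.9))

print(nrow(dat))            # number of observations overall
print(table(dat$species))   # observations in each factor level
print(summary(dat))         # mean, median, min, max, and NA counts
str(dat)                    # structure of the whole data frame
print(class(dat$height))    # class of a single variable
print(head(dat, 3))         # first rows

# Locate NAs: count per column, then the rows that contain any
na_per_col   <- colSums(is.na(dat))
rows_with_na <- which(!complete.cases(dat))
print(na_per_col)
print(rows_with_na)

# count.fields gives the number of fields on each line of a text file;
# a line with an unexpected count points at a malformed row
tmp <- tempfile()
writeLines(c("x y", "1 2", "3 4 5"), tmp)
n_fields <- count.fields(tmp)
unlink(tmp)
print(n_fields)
```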
Exploratory data analysis
