This package processes files and databases for statistical purposes, with a focus on parameter estimation for several types of samples (simple random, stratified and multistage sampling).
Features:
- Multiple Regression. Listwise analysis is optimized with the Alglib library. Pairwise analysis runs in pure Ruby and reports the same values as SPSS
- Dominance Analysis. Based on the Budescu and Azen papers, the DominanceAnalysis class reports dominance analysis for a sample, and DominanceAnalysis::Bootstrap runs a bootstrap analysis to determine dominance stability, as recommended by Azen & Budescu (2003).
- Classes for Vectors, Datasets (sets of Vectors) and Multisets (multiple datasets with the same fields and vector types), plus multiple methods to manipulate them
- Module Codification, to help code open-ended questions
- Converters to and from databases, CSV and Excel files, and to output Mx and GGobi files
- Module Correlation provides covariance and Pearson, Spearman, point-biserial, tau-a, tau-b and gamma correlations. It includes methods to create correlation and covariance matrices.
- Module Crosstab provides functions to create crosstabs for categorical data
- Module HtmlReport provides methods to create a report for every class.
- Regression module provides linear regression methods
- Reliability analysis provides functions to analyze scales. Class ItemAnalysis provides scale statistics such as mean, standard deviation, alpha and standardized alpha, and for each item: mean, correlation with the total scale, mean if deleted and alpha if deleted. With HtmlReport, it graphs the histogram of the scale and the Item Characteristic Curve for each item
- Module SRS (Simple Random Sampling) provides functions to estimate the standard error for several types of samples
- Interfaces to gdchart, gnuplot and SVG::Graph
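As a rough illustration of the statistics that the reliability analysis listed above reports, alpha and "alpha if deleted" can be sketched in plain Ruby. This is not the RubySS implementation: the method names sample_variance, cronbach_alpha and alpha_if_deleted and the toy data are hypothetical, and the sketch assumes complete data with no missing values.

```ruby
# Hypothetical helpers for illustration only; not part of the RubySS API.

# Unbiased sample variance (n - 1 denominator) of an array of numbers.
def sample_variance(values)
  mean = values.sum.to_f / values.size
  values.sum { |v| (v - mean)**2 } / (values.size - 1)
end

# Cronbach's alpha for a scale.
# items: array of item vectors, each an array with one score per case.
def cronbach_alpha(items)
  k = items.size
  totals = items.transpose.map(&:sum) # total scale score for each case
  item_variance_sum = items.sum { |item| sample_variance(item) }
  (k / (k - 1.0)) * (1 - item_variance_sum / sample_variance(totals))
end

# Alpha recomputed with each item removed in turn ("alpha if deleted").
def alpha_if_deleted(items)
  items.each_index.map do |i|
    cronbach_alpha(items.reject.with_index { |_, j| j == i })
  end
end

# Toy data: three items answered by five cases.
items = [
  [1, 2, 3, 4, 5],
  [2, 2, 3, 5, 5],
  [1, 3, 3, 4, 4]
]
puts "Alpha: #{cronbach_alpha(items)}"
puts "Alpha if deleted: #{alpha_if_deleted(items).inspect}"
```

An item whose removal raises alpha noticeably is a candidate for review, which is how the "alpha if deleted" column is usually read.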
Examples
Real work session
# Read a CSV file, using '' and 'error' as missing values and omitting 1 line
ds=RubySS::CSV.read('resultados_c1.csv',['','error'],1)
# Create a new vector (column), calculating the mean of the listed vectors. Accepts up to 1 missing value among the vectors for each case
indice_constructivismo_becker=ds.vector_mean(%w{fd_2_1 fd_2_2 fd_3_1 fd_3_2 fd_3_3},1)
# Add the vector to the dataset
ds.add_vector("ind_cons_becker",indice_constructivismo_becker)
# Verify data. Vector 'de_3_sex' must have values 'a' or 'b'. Dataset#verify returns an array with all errors
t_sex=create_test("Sex must be a or b",'de_3_sex') {|v| v['de_3_sex']=="a" or v['de_3_sex']=="b"}
p ds.verify(t_sex)
# Creates a new dataset, based on the names of vectors
ds_software=ds.dup(%w{pe1n1 pe1n2 pe1n3 pe1n4 pe1n5 })
# Creates an HTML report, adds a correlation matrix with all the scale vectors and saves the report to a file
hr=RubySS::HtmlReport.new(ds_software,"correlations")
hr.add_correlation_matrix()
hr.save("correlation_matrix.html")
# Saves the new dataset
RubySS::CSV.write(ds_software,"ds_software.csv",true)
Simulation of sample distribution for SRS
require File.dirname(__FILE__)+"/../lib/rubyss"
require 'rubyss/srs'
require 'rubyss/multiset'
require 'gnuplot'
tests=10000
sample_size=100
a=[]
# Build a roughly normal population: for each z from -2.0 to 2.0, add 25 copies of the scaled density
(-20..20).each {|i|
  z=i/10.0
  a+=[GSL::Ran.ugaussian_pdf(z)*1000]*25
}
pop=a.to_vector(:scale)
s=pop.standard_deviation_population
puts "Parameters:"
puts "Mean:"+pop.mean.to_s
puts "SD:"+s.to_s
puts "SE with replacement:"+RubySS::SRS.standard_error_ksd_wr(s, sample_size, pop.size).to_s
puts "SE without replacement:"+RubySS::SRS.standard_error_ksd_wor(s, sample_size,pop.size).to_s
sd_with=[]
sd_without=[]
# Draw repeated samples with replacement, saving each sample mean and its estimated standard error
monte_with=RubySS::Resample.repeat_and_save(tests) {
  sample= pop.sample_with_replacement(sample_size)
  sd_with.push(RubySS::SRS.standard_error_esd_wr(sample.sds,sample_size,pop.size))
  sample.mean
}
# Same procedure, sampling without replacement
monte_without=RubySS::Resample.repeat_and_save(tests) {
  sample= pop.sample_without_replacement(sample_size)
  sd_without.push(RubySS::SRS.standard_error_esd_wor(sample.sds,sample_size,pop.size))
  sample.mean
}
v_sd_with=sd_with.to_vector(:scale)
v_sd_without=sd_without.to_vector(:scale)
v_with=monte_with.to_vector(:scale)
v_without=monte_without.to_vector(:scale)
puts "=============="
puts "Sample distribution - with Replacement"
puts "Mean:"+v_with.mean.to_s
puts "Sd:"+v_with.sds.to_s
puts "Sd (estimated):"+v_sd_with.mean.to_s
puts "Sample distribution - without Replacement"
puts "Mean:"+v_without.mean.to_s
puts "Sd:"+v_without.sds.to_s
puts "Sd (estimated):"+v_sd_without.mean.to_s
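The standard errors printed by the simulation can be checked by hand. The following is a minimal sketch, assuming the usual textbook formulas for the standard error of the mean when the population standard deviation is known. Whether RubySS::SRS uses exactly these formulas is an assumption, and the method names and input values below are hypothetical.

```ruby
# Hypothetical helpers for illustration only; not part of the RubySS API.

# Standard error of the mean with a known population SD,
# sampling with replacement: sd / sqrt(n).
def se_known_sd_wr(sd, sample_size)
  sd / Math.sqrt(sample_size)
end

# Sampling without replacement: the same value multiplied by the
# finite population correction sqrt((N - n) / (N - 1)).
def se_known_sd_wor(sd, sample_size, population_size)
  fpc = Math.sqrt((population_size - sample_size).to_f / (population_size - 1))
  se_known_sd_wr(sd, sample_size) * fpc
end

# Illustrative values only: the population built in the simulation has
# 41 * 25 = 1025 elements; the SD of 10.0 is made up for this sketch.
sd = 10.0
sample_size = 100
population_size = 1025

puts "SE with replacement: #{se_known_sd_wr(sd, sample_size)}"
puts "SE without replacement: #{se_known_sd_wor(sd, sample_size, population_size)}"
```

Because the correction factor is below 1 whenever the sample has more than one element, the standard error without replacement is the smaller of the two, and the simulated distributions should show the same ordering.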