Tuesday, April 9, 2013

Analytical derivation with Scilab

Today I discovered a useful Scilab function, derivat(). It works with expressions built from linear combinations of integer powers of a single variable (denoted z here), that is, expressions of the form $p(z) = \sum_k a_k z^k$ with integer exponents k.

The function derivat() returns the analytical derivative of p(z), which by the power rule is $p'(z) = \sum_k k\, a_k z^{k-1}$.
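To make the power rule concrete, here is a minimal Python sketch (not how Scilab implements derivat()). It assumes a hypothetical representation of p(z) as a dict mapping each integer exponent k to its coefficient a_k, so negative exponents such as s^-1 are allowed.

```python
# Hypothetical representation (an assumption for this sketch):
# {exponent: coefficient}, e.g. s^-1 + 2 + 3s  ->  {-1: 1, 0: 2, 1: 3}.

def derivative(p):
    """Differentiate term by term with d/dz (a*z**k) = k*a*z**(k-1).

    Constant terms (k == 0) and zero coefficients vanish.
    """
    return {k - 1: k * a for k, a in p.items() if k != 0 and a != 0}


p1 = {0: 1, 1: -2, 2: 1}      # 1 - 2x + x^2
p5 = {-1: 1, 0: 2, 1: 3}      # s^-1 + 2 + 3s
print(derivative(p1))         # {0: -2, 1: 2}   i.e. -2 + 2x
print(derivative(p5))         # {-2: -1, 0: 3}  i.e. -s^-2 + 3
```

Since distinct exponents k map to distinct exponents k - 1, no two terms collide in the result.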

We can create this kind of expression with the Scilab function poly(), or by combining powers of the predefined polynomial variable %s, like this:
-->p1 = poly([1 -2 1], 'x', 'coef')
 p1  =

                   2 
    1 - 2x + x  

-->p2 = poly([1 -4 2], 'y', 'coef')
 p2  =

                     2 
    1 - 4y + 2y  

-->p3 = poly(ones(1, 10), 'z', 'coef')
 p3  =

                 2    3    4    5    6    7    8    9 
    1 + z + z + z + z + z + z + z + z + z   

-->p4 = poly([-1 1], 't', 'roots')
 p4  =

           2 
  - 1 + t

-->s = %s; p5 = s^(-1) + 2 + 3*s
 p5  =

                     2 
    1 + 2s + 3s  
    -----------  
           s       


And so on.
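As the transcript shows, poly() takes either the coefficients ('coef', listed in increasing powers) or the roots ('roots'), and in the second case it returns the monic polynomial with those roots (p4 = -1 + t^2 for roots -1 and 1). The Python sketch below, with a hypothetical function name, illustrates what the 'roots' mode computes by expanding the product of (x - r) into ascending coefficients.

```python
def poly_from_roots(roots):
    """Return ascending coefficients [c0, c1, ...] of prod(x - r) over roots."""
    coeffs = [1]  # start from the constant polynomial 1
    for r in roots:
        shifted = [0] + coeffs                   # x * p(x)
        scaled = [-r * c for c in coeffs] + [0]  # -r * p(x), padded to same length
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


print(poly_from_roots([-1, 1]))  # [-1, 0, 1]  i.e. -1 + t^2, like p4
print(poly_from_roots([1, 1]))   # [1, -2, 1]  i.e. 1 - 2x + x^2, like p1
```

The second call shows that p1 is simply (x - 1)^2, a polynomial with a double root at 1.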


Now, applying derivat() to each of them gives their analytical derivatives:

-->derivat(p1)
 ans  =

  - 2 + 2x  

-->derivat(p2)
 ans  =

  - 4 + 4y  

-->derivat(p3)
 ans  =

                    2      3      4      5      6      7      8 
    1 + 2z + 3z + 4z + 5z + 6z + 7z + 8z + 9z  

-->derivat(p4)
 ans  =

    2t  

-->derivat(p5)
 ans  =

             2 
  - 1 + 3s  
    ------  
       2    
      s     
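One way to check the rational result for p5 by hand is the quotient rule, (N/D)' = (N'D - ND')/D^2; this is a cross-check, not a claim about how derivat() works internally. In the Python sketch below, the helper names are hypothetical and coefficient lists are in increasing powers, the same order poly(..., 'coef') uses. The result is not reduced by common factors.

```python
# Coefficient lists are in increasing powers: [c0, c1, c2, ...].

def trim(a):
    """Drop trailing zero coefficients, keeping at least one entry."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def mul(a, b):
    """Polynomial product."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def sub(a, b):
    """Polynomial difference a - b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

def der(a):
    """Polynomial derivative; the derivative of a constant is [0]."""
    return [k * a[k] for k in range(1, len(a))] or [0]

def rational_derivative(num, den):
    """Quotient rule: returns (N'D - N D', D^2) as coefficient lists."""
    top = sub(mul(der(num), den), mul(num, der(den)))
    return trim(top), trim(mul(den, den))


# p5 = (1 + 2s + 3s^2) / s
num, den = rational_derivative([1, 2, 3], [0, 1])
print(num, den)  # [-1, 0, 3] [0, 0, 1]  i.e. (-1 + 3s^2) / s^2
```

This reproduces the (-1 + 3s^2)/s^2 returned by derivat(p5).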


This is a handy way to apply your knowledge of linear systems with a powerful open-source simulation tool.

Thursday, April 4, 2013

Creation of artificial data for classification tests

This semester I'm teaching Artificial Intelligence, and we are studying classification algorithms: decision trees and neural networks.

An important task in the course is to test the implemented algorithm and estimate its accuracy. For that, I usually create artificial data, which is controlled and easy to analyze.
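The post does not say which accuracy measure is used. A common choice, assumed here, is the fraction of rows whose predicted label equals the true label; a minimal Python sketch with a hypothetical function name:

```python
def accuracy(predicted, expected):
    """Fraction of rows whose predicted label matches the true label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


print(accuracy([1, 2, 1, 2], [1, 2, 2, 2]))  # 0.75
```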

The data is a table with N columns and M rows. The first N - 1 columns hold the input data and the last column holds the label (target), as in the example below.

In the example, the variable x is a matrix (table) with 5 columns and 20 rows: 4 columns of input data and a last column holding the label of each row.

Labels are taken from the natural numbers {1, 2, 3, 4, ...}; in this case from {1, 2}, where 1 denotes one class and 2 the other.
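Given this layout, a classifier needs the table split into its input part and its label part. A minimal Python sketch, assuming the table is a list of rows and using a hypothetical function name:

```python
def split_table(table):
    """Split rows into inputs (first N - 1 columns) and labels (last column)."""
    inputs = [row[:-1] for row in table]
    labels = [int(row[-1]) for row in table]  # labels are natural numbers
    return inputs, labels


inputs, labels = split_table([[0.1, 0.2, 1.0], [0.3, 0.4, 2.0]])
print(inputs)  # [[0.1, 0.2], [0.3, 0.4]]
print(labels)  # [1, 2]
```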

The first N - 1 columns are generated with the rand() function, using M/P rows for each class, where P is the number of classes in the data set. This gives a balanced data set in which every class is equally represented.

In the example (M = 20 rows, P = 2 classes, so n = M/P = 10 rows per class), x was created as follows:

-->n = 10;

-->x = [[rand(n, 1); rand(n, 1) + 0.9] [1 + 2*rand(n, 1); rand(n, 1)*0.5 + 0.65] [rand(n, 1, "normal"); rand(n, 1) + 2.5] [rand(n, 1, "normal") - 2; rand(n, 1, "normal") + 2] [ones(n, 1); 2*ones(n, 1)]];
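For readers outside Scilab, here is a rough Python analogue of the command above, using only the standard random module: random() stands in for rand() and gauss(0, 1) for rand(..., "normal"). The function make_dataset and its column_makers argument are hypothetical; each class gets n rows, one sampler per input column, and its label in the last column.

```python
import random

def make_dataset(n, column_makers, seed=0):
    """Build a balanced table: n rows per class, label in the last column.

    column_makers[c] is a list of samplers, one per input column, each
    taking a random.Random and returning one value for class c + 1.
    """
    rng = random.Random(seed)
    table = []
    for label, makers in enumerate(column_makers, start=1):
        for _ in range(n):
            table.append([make(rng) for make in makers] + [label])
    return table


# Column by column, the same distributions as the Scilab example.
columns = [
    [lambda r: r.random(), lambda r: 1 + 2 * r.random(),
     lambda r: r.gauss(0, 1), lambda r: r.gauss(0, 1) - 2],        # class 1
    [lambda r: r.random() + 0.9, lambda r: 0.5 * r.random() + 0.65,
     lambda r: r.random() + 2.5, lambda r: r.gauss(0, 1) + 2],     # class 2
]
x = make_dataset(10, columns)
print(len(x), len(x[0]))  # 20 5
```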

Simpler column combinations can also be used, for instance to create input data in which the classes overlap.

Once the matrix is created, we can write it to a file:

-->write("my_data.txt", x);


Later, we can read the data back into a variable:

-->N = 5;   // number of columns of x
-->y = read("my_data.txt", -1, N);
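As a rough Python analogue of write() and read(file, -1, N), here is a sketch with hypothetical function names. It assumes whitespace-separated numbers, one row per line, and regroups all values in the file into rows of n_cols columns, which mirrors the "read all rows, N columns" intent of the call above; it does not reproduce Scilab's exact file format.

```python
import os
import tempfile

def write_table(path, table):
    """Write one row per line as whitespace-separated numbers."""
    with open(path, "w") as f:
        for row in table:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")

def read_table(path, n_cols):
    """Read every number in the file and regroup them into rows of n_cols."""
    values = []
    with open(path) as f:
        for line in f:
            values.extend(float(tok) for tok in line.split())
    if len(values) % n_cols:
        raise ValueError("value count is not a multiple of n_cols")
    return [values[i:i + n_cols] for i in range(0, len(values), n_cols)]


# Usage: round-trip a small table with 2 inputs and a label column.
x = [[0.25, 1.5, 1], [1.75, 0.5, 2]]
path = os.path.join(tempfile.mkdtemp(), "my_data.txt")
write_table(path, x)
y = read_table(path, 3)
print(y)  # [[0.25, 1.5, 1.0], [1.75, 0.5, 2.0]]
```

repr() is used so that floats are written with enough digits to be read back exactly.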

Take a look at

http://usingscilab.blogspot.com.br/2009/03/using-files.html

http://usingscilab.blogspot.com.br/2009/08/basic-statistic.html

http://usingscilab.blogspot.com.br/2011/02/statistics-operators-mean-and-stdev.html

http://usingscilab.blogspot.com.br/search/label/matrix

for more details.