用 WEKA 进行数据挖掘,第 3 部分: 最近邻和服务器端库
2010-06-23 00:00:00 来源:WEB开发网实际上,您已经下载了这个 WEKA API JAR;它就是您启动 WEKA Explorer 时调用的那个 JAR 文件。为了访问此代码,让您的 Java 环境在此类路径中包含这个 JAR 文件。在您自己的代码中使用第三方 JAR 文件的步骤如常。
正如您所想,WEKA API 内的这个中心构建块就是数据。数据挖掘围绕此数据进行,当然所有我们已经学习过的这些算法也都是围绕此数据的。那么让我们看看如何将我们的数据转换成 WEKA API 可以使用的格式。让我们从简单的开始,先来看看本系列有关房子价值的第一篇文章中的那些数据。
注: 我最好提前告诫您 WEKA API 有时很难导航。首要的是要复核所用的 WEKA 的版本和 API 的版本。此 API 在不同的发布版间变化会很大,以至于代码可能会完全不同。而且,即便此 API 完备,却没有什么非常好的例子可以帮助我们开始(当然了,这也是为什么您在阅读本文的原因)。我使用的是 WEKA V3.6。
清单 4 显示了如何格式化数据以便为 WEKA 所用。
清单 4. 将数据载入 WEKA
// Define each attribute (or column), and give it a numerical column number
// Likely, a better design wouldn't require the column number, but
// would instead get it from the index in the container
Attribute a1 = new Attribute("houseSize", 0);
Attribute a2 = new Attribute("lotSize", 1);
Attribute a3 = new Attribute("bedrooms", 2);
Attribute a4 = new Attribute("granite", 3);
Attribute a5 = new Attribute("bathroom", 4);
Attribute a6 = new Attribute("sellingPrice", 5);
// Each element must be added to a FastVector, a custom
// container used in this version of Weka.
// Later versions of Weka corrected this mistake by only
// using an ArrayList
FastVector attrs = new FastVector();
attrs.addElement(a1);
attrs.addElement(a2);
attrs.addElement(a3);
attrs.addElement(a4);
attrs.addElement(a5);
attrs.addElement(a6);
// Each data instance needs to create an Instance class
// The constructor requires the number of columns that
// will be defined. In this case, this is a good design,
// since you can pass in empty values where they exist.
Instance i1 = new Instance(6);
i1.setValue(a1, 3529);
i1.setValue(a2, 9191);
i1.setValue(a3, 6);
i1.setValue(a4, 0);
i1.setValue(a5, 0);
i1.setValue(a6, 205000);
....
// Each Instance has to be added to a larger container, the
// Instances class. In the constructor for this class, you
// must give it a name, pass along the Attributes that
// are used in the data set, and the number of
// Instance objects to be added. Again, probably not ideal design
// to require the number of objects to be added in the constructor,
// especially since you can specify 0 here, and then add Instance
// objects, and it will return the correct value later (so in
// other words, you should just pass in '0' here)
Instances dataset = new Instances("housePrices", attrs, 7);
dataset.add(i1);
dataset.add(i2);
dataset.add(i3);
dataset.add(i4);
dataset.add(i5);
dataset.add(i6);
dataset.add(i7);
// In the Instances class, we need to set the column that is
// the output (aka the dependent variable). You should remember
// that some data mining methods are used to predict an output
// variable, and regression is one of them.
dataset.setClassIndex(dataset.numAttributes() - 1);
更多精彩
赞助商链接