Fixed a bug that was causing the pipes runner to incorrectly preprocess command line options.
Fixed several bugs triggered by using a local file system as Hadoop's default file system. This happens when you set a file: URI as the value of fs.default.name in core-site.xml. For instance:
<property>
  <name>fs.default.name</name>
  <value>file:///var/hadoop/data</value>
</property>
The HDFS API features new high-level tools for easier manipulation of files and directories. See the API docs for more info.
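As a hedged illustration of the kind of high-level file and directory manipulation described above (the paths are hypothetical, and running this requires Pydoop and a reachable file system; see the API docs for the authoritative interface):

```python
# Sketch: high-level helpers from the pydoop.hdfs module.
import pydoop.hdfs as hdfs

hdfs.mkdir("test_dir")                         # create a directory
with hdfs.open("test_dir/hello.txt", "w") as f:
    f.write("hello")                           # write a file
print(hdfs.ls("test_dir"))                     # list directory contents
```

The same calls work whether the path refers to HDFS or, with a file: URI, to the local file system.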
Examples have been thoroughly revised in order to make them easier to understand and run.
Several bugs were fixed; we also introduced a few optimizations, most notably the automatic caching of HDFS instances.
We have pushed our code to a Git repository hosted on SourceForge. See the Installation section for instructions.
Pydoop now works with Hadoop 1.0.
Multiple versions of Hadoop can now be supported by the same installation of Pydoop. Once you've built the module for each version of Hadoop you'd like to use (see the installation page, and in particular the section on multiple Hadoop versions), the runtime will automatically and transparently import the right one for the version of Hadoop selected by your HADOOP_HOME environment variable. This feature should make migration between Hadoop versions easier for our users.
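In practice, switching versions amounts to repointing HADOOP_HOME before launching your job (a sketch; the installation paths below are hypothetical):

```shell
# Select the Pydoop build matching the Hadoop installation in use.
export HADOOP_HOME=/opt/hadoop-0.20.2
# python my_job.py    # would import the module built for 0.20.2

export HADOOP_HOME=/opt/hadoop-1.0.4
# python my_job.py    # would now import the 1.0.4 build
```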
We have added a command line tool that makes it trivial to write short scripts for simple problems. See the Pydoop Script page for details.
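A hedged sketch of what such a script might look like, using the mapper/reducer signatures described on the Pydoop Script page (the writer argument exposes an emit(key, value) method; values arrive as strings):

```python
# Word count in the Pydoop Script style.

def mapper(_, value, writer):
    # value is one line of input text; emit each word with a count of "1"
    for word in value.split():
        writer.emit(word, "1")

def reducer(word, icounts, writer):
    # icounts iterates over the string counts emitted for this word
    writer.emit(word, sum(map(int, icounts)))
```

You would then submit it with the pydoop script command, passing the input and output paths; see the Pydoop Script page for the exact invocation.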
In order to work out-of-the-box, Pydoop now requires Python 2.7. Python 2.6 can be used provided that you install a few additional modules (see the installation page for details).
We have dropped support for the 0.21 branch of Hadoop, which has been marked as unstable and unsupported by Hadoop developers.