CRS4

Pydoop: a Python MapReduce and HDFS API for Hadoop

Simone Leo, Gianluigi Zanetti
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 819-825, 2010
MapReduce has become increasingly popular as a simple and efficient paradigm for large-scale data processing. One of the main reasons for its popularity is the availability of a production-level open source implementation, Hadoop, written in Java. There is considerable interest, however, in tools that enable Python programmers to access the framework, due to the language's high popularity. Here we present a Python package that provides an API for both the MapReduce and the distributed file system sections of Hadoop, and show its advantages with respect to the other available solutions for Hadoop Python programming, Jython and Hadoop Streaming.

BibTeX reference

@InProceedings{LZ10a,
  author       = {Leo, S. and Zanetti, G.},
  title        = {Pydoop: a {Python} {MapReduce} and {HDFS} {API} for {Hadoop}},
  booktitle    = {Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing},
  series       = {HPDC 2010},
  pages        = {819--825},
  year         = {2010},
  publisher    = {ACM},
  note         = {isbn: 978-1-60558-942-8},
  keywords     = {python,mapreduce},
  url          = {10.1145/1851476.1851594}
}
