How to run TensorFlow on Hadoop

This document describes how to run TensorFlow on Hadoop. It will be expanded to cover running on various cluster managers, but at the moment it only describes how to use HDFS.

HDFS

This section assumes that you are already familiar with reading data in TensorFlow.

To use HDFS with TensorFlow, change the file paths you use to read and write data to HDFS paths. For example:

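import tensorflow as tf  # tf.train.string_input_producer is a TensorFlow 1.x API

# Each input file is addressed by an hdfs://<namenode>:<port>/<path> URI.
filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])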
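Because only the path strings change, one way to avoid hard-coding the namenode address into every path is to build the URIs from a host and port. The sketch below uses a hypothetical helper, `hdfs_uri`, that is not part of TensorFlow. Its default host `namenode` and port 8020 are just the placeholder values from the example, so replace them with your own namenode's address.

```python
def hdfs_uri(path, namenode="namenode", port=8020):
    """Build an HDFS URI such as hdfs://namenode:8020/path/to/file1.csv.

    hdfs_uri is a hypothetical helper; the default host and port are
    placeholders and should be replaced with your own namenode's address.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, got %r" % path)
    return "hdfs://%s:%d%s" % (namenode, port, path)


# Build the same file list as the example; the result could be passed to
# tf.train.string_input_producer in place of the literal strings.
files = [hdfs_uri("/path/to/file1.csv"), hdfs_uri("/path/to/file2.csv")]
print(files)
# ['hdfs://namenode:8020/path/to/file1.csv', 'hdfs://namenode:8020/path/to/file2.csv']
```

Keeping the host and port in one place means that pointing the program at a different cluster only touches the helper's arguments, not every file path.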

To use the namenode specified in your HDFS configuration files instead, change the file prefix to hdfs://default/.
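Switching an existing path to the configured namenode only replaces the host:port part of the URI with `default`, and the file path itself stays the same. A minimal sketch using the standard library's `urllib.parse` follows. The helper name `to_default_namenode` is hypothetical and not part of TensorFlow.

```python
from urllib.parse import urlparse, urlunparse


def to_default_namenode(uri):
    """Rewrite hdfs://host:port/... as hdfs://default/... (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError("not an HDFS URI: %r" % uri)
    # Replace only the authority (host:port); the file path is kept unchanged.
    return urlunparse(parsed._replace(netloc="default"))


print(to_default_namenode("hdfs://namenode:8020/path/to/file1.csv"))
# hdfs://default/path/to/file1.csv
```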

When launching your TensorFlow program, the following environment variables must be set: