MapReduce API

This section covers the MapReduce API: the main classes used in MapReduce programming and the methods they provide.

MapReduce Mapper Class

In MapReduce, the role of the Mapper class is to map the input key-value pairs to a set of intermediate key-value pairs. It transforms the input records into intermediate records.

These intermediate records are grouped by output key and passed to the Reducer, which produces the final output.

Methods of Mapper Class

void cleanup(Context context): Called once at the end of the task.
void map(KEYIN key, VALUEIN value, Context context): Called once for each key-value pair in the input split.
void run(Context context): Can be overridden to control the execution of the Mapper.
void setup(Context context): Called once at the beginning of the task.
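The call order of these four methods can be pictured with a small, dependency-free simulation. This is only a sketch, not Hadoop code: SketchMapper, TokenMapper, and the calls and output lists are hypothetical names invented for illustration, and a plain list stands in for the Context. It assumes the default behaviour in which run() calls setup() once, map() once per record, and cleanup() once at the end.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the framework's Mapper; it only mimics the call order. */
abstract class SketchMapper {
    /** Collects emitted intermediate pairs, playing the role of Context.write(). */
    protected final List<Map.Entry<String, Integer>> output = new ArrayList<>();
    /** Records which lifecycle methods ran, in order. */
    protected final List<String> calls = new ArrayList<>();

    protected void setup() { calls.add("setup"); }
    protected abstract void map(long offset, String line);
    protected void cleanup() { calls.add("cleanup"); }

    /** setup() once, map() once per input record, cleanup() once, even if map() throws. */
    public void run(List<String> split) {
        setup();
        try {
            long offset = 0;
            for (String line : split) {
                map(offset, line);
                offset += line.length() + 1; // assumes a one-character line terminator
            }
        } finally {
            cleanup();
        }
    }
}

/** Emits (word, 1) for every whitespace-separated token in a line. */
class TokenMapper extends SketchMapper {
    @Override
    protected void map(long offset, String line) {
        calls.add("map");
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(new SimpleEntry<>(word, 1));
            }
        }
    }
}

class MapperLifecycleDemo {
    public static void main(String[] args) {
        TokenMapper mapper = new TokenMapper();
        mapper.run(Arrays.asList("hello world", "hello hadoop"));
        System.out.println(mapper.calls);  // [setup, map, map, cleanup]
        System.out.println(mapper.output); // [hello=1, world=1, hello=1, hadoop=1]
    }
}
```

In a real job the class would extend the framework's Mapper and write to the Context instead of a list; the sketch keeps only the lifecycle.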

MapReduce Reducer Class

In MapReduce, the role of the Reducer class is to reduce the set of intermediate values that share a key to a smaller set of values. Reducer implementations can access the Configuration for the job via the JobContext.getConfiguration() method.

Methods of Reducer Class

void cleanup(Context context): Called once at the end of the task.
void reduce(KEYIN key, Iterable<VALUEIN> values, Context context): Called once for each key, with all the values grouped under that key.
void run(Context context): Can be overridden to control how the Reducer task executes.
void setup(Context context): Called once at the beginning of the task.
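The "once for each key" behaviour can be sketched in plain Java, again without any Hadoop dependency. SketchReducer, SumReducer, and the reduceCalls counter are hypothetical names used only for illustration. The grouping step inside run() is an assumption made to keep the example self-contained; in a real job the framework groups and sorts the intermediate pairs before the Reducer sees them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the framework's Reducer; it only mimics grouping and call order. */
abstract class SketchReducer {
    /** Final (key, value) results, playing the role of Context.write(). */
    protected final Map<String, Integer> output = new TreeMap<>();
    /** Counts how many times reduce() was invoked. */
    protected int reduceCalls = 0;

    protected void setup() {}
    protected abstract void reduce(String key, Iterable<Integer> values);
    protected void cleanup() {}

    /** Groups values by key in sorted key order, then calls reduce() once per distinct key. */
    public void run(List<String> keys, List<Integer> values) {
        // Illustration only: the framework performs this grouping before the reduce phase.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.size(); i++) {
            grouped.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        setup();
        try {
            for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
                reduceCalls++;
                reduce(entry.getKey(), entry.getValue());
            }
        } finally {
            cleanup();
        }
    }
}

/** Sums all values that share a key. */
class SumReducer extends SketchReducer {
    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        output.put(key, sum);
    }
}

class ReducerGroupingDemo {
    public static void main(String[] args) {
        SumReducer reducer = new SumReducer();
        reducer.run(Arrays.asList("hello", "world", "hello", "hadoop"),
                    Arrays.asList(1, 1, 1, 1));
        System.out.println(reducer.output);      // {hadoop=1, hello=2, world=1}
        System.out.println(reducer.reduceCalls); // 3
    }
}
```

Four intermediate pairs produce only three reduce() calls, because the two "hello" values arrive together in one Iterable.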

MapReduce Job Class

The Job class is used to configure a job, submit it, control its execution, and query its state. The set methods work only until the job is submitted; after that, they throw IllegalStateException.
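The rule that setters fail after submission can be illustrated with a tiny state machine. This is a sketch in plain Java, not the Hadoop class itself: SketchJob, its State enum, and the exception message are hypothetical and chosen only to show the idea of guarding every setter with a state check.

```java
/** Hypothetical stand-in for Job; it only mimics the configure-then-submit rule. */
class SketchJob {
    private enum State { DEFINE, RUNNING }

    private State state = State.DEFINE;
    private String jobName = "";
    private int numReduceTasks = 1;

    /** Every setter calls this guard first, so configuration is frozen after submit(). */
    private void ensureDefineState() {
        if (state != State.DEFINE) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + State.DEFINE);
        }
    }

    public void setJobName(String name) {
        ensureDefineState();
        jobName = name;
    }

    public void setNumReduceTasks(int tasks) {
        ensureDefineState();
        numReduceTasks = tasks;
    }

    public String getJobName() { return jobName; }

    public int getNumReduceTasks() { return numReduceTasks; }

    /** Moves the job out of the configurable state. */
    public void submit() {
        ensureDefineState();
        state = State.RUNNING;
    }
}

class JobStateDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        job.setJobName("word count");
        job.setNumReduceTasks(2);
        job.submit();
        try {
            job.setJobName("renamed"); // too late: the job is already submitted
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(job.getJobName()); // word count
    }
}
```

Getters stay usable after submission, which matches the split between configuring a job and querying its state.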

Methods of Job Class

Method: Description
Counters getCounters(): Gets the counters for the job.
long getFinishTime(): Gets the finish time of the job.
static Job getInstance(): Creates a new Job that is not tied to a particular cluster.
static Job getInstance(Configuration conf): Creates a new Job that is not tied to a particular cluster, using the given configuration.
static Job getInstance(Configuration conf, String jobName): Creates a new Job that is not tied to a particular cluster, using the given configuration and job name.
String getJobFile(): Gets the path of the submitted job configuration.
String getJobName(): Gets the user-specified job name.
JobPriority getPriority(): Gets the scheduling priority of the job.
void setJarByClass(Class<?> cls): Sets the job JAR by locating the JAR file that contains the given class, passed as a class literal such as WordCount.class.
void setJobName(String name): Sets the user-specified job name.
void setMapOutputKeyClass(Class<?> theClass): Sets the key class for the map output data.
void setMapOutputValueClass(Class<?> theClass): Sets the value class for the map output data.
void setMapperClass(Class<? extends Mapper> cls): Sets the Mapper for the job.
void setNumReduceTasks(int tasks): Sets the number of reduce tasks for the job.
void setReducerClass(Class<? extends Reducer> cls): Sets the Reducer for the job.