The MapReduce Startup Process in Yarn/MRv2: the Client Side
Hadoop version: 0.23.1
Shell side
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar wordcount input output
Client side
1. bin/hadoop
(This script parses the hadoop command-line arguments and hands them to the appropriate Java class. The code relevant to running WordCount is shown below.)
# Assign the first argument (here the string "jar") to COMMAND
COMMAND=$1
# If COMMAND is "jar", set CLASS to org.apache.hadoop.util.RunJar
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
# Run the java command; this is equivalent to:
# $JAVA_HOME/bin/java org.apache.hadoop.util.RunJar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar wordcount input output
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
2. RunJar.java
(This class loads the jar named on the command line and runs it. The relevant code is shown below.)
int firstArg = 0;
// Index of the next argument to consume; fileName gets args[0] because
// the post-increment firstArg++ yields the old value before incrementing
String fileName = args[firstArg++];
File file = new File(fileName);
String mainClassName = null;
JarFile jarFile;
try {
jarFile = new JarFile(fileName);
} catch(IOException io) {
throw new IOException("Error opening job jar: " + fileName)
.initCause(io);
}
/* Get the jar's main class name. Opening hadoop-mapreduce-examples-0.23.1.jar
with WinRAR shows "Main-Class: org.apache.hadoop.examples.ExampleDriver" in
META-INF/MANIFEST.MF. This entry is generated at packaging time by the
maven-jar-plugin configuration in pom.xml:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>org.apache.hadoop.examples.ExampleDriver</mainClass>
</manifest>
</archive>
</configuration>
</plugin>*/
Manifest manifest = jarFile.getManifest();
if (manifest != null) {
mainClassName = manifest.getMainAttributes().getValue("Main-Class");
}
jarFile.close();
if (mainClassName == null) {
if (args.length < 2) {
System.err.println(usage);
System.exit(-1);
}
mainClassName = args[firstArg++];
}
mainClassName = mainClassName.replaceAll("/", ".");
File tmpDir = new File(new Configuration().get("hadoop.tmp.dir"));
ensureDirectory(tmpDir);
//Create a temporary working directory under hadoop.tmp.dir for the unpacked jar
//(createTempFile yields a unique file, which is deleted and re-created as a directory below)
final File workDir;
try {
workDir = File.createTempFile("hadoop-unjar", "", tmpDir);
} catch (IOException ioe) {
// If user has insufficient perms to write to tmpDir, default
// "Permission denied" message doesn't specify a filename.
System.err.println("Error creating temp dir in hadoop.tmp.dir "
+ tmpDir + " due to " + ioe.getMessage());
System.exit(-1);
return;
}
if (!workDir.delete()) {
System.err.println("Delete failed for " + workDir);
System.exit(-1);
}
ensureDirectory(workDir);
//Register a shutdown hook so the temporary directory is deleted when the JVM exits
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() {
FileUtil.fullyDelete(workDir);
}
});
unJar(file, workDir);
//Build the CLASSPATH: the working directory, the jar itself, its classes/ directory, and every jar under lib/
ArrayList<URL> classPath = new ArrayList<URL>();
classPath.add(new File(workDir+"/").toURI().toURL());
classPath.add(file.toURI().toURL());
classPath.add(new File(workDir, "classes/").toURI().toURL());
File[] libs = new File(workDir, "lib").listFiles();
if (libs != null) {
for (int i = 0; i < libs.length; i++) {
classPath.add(libs[i].toURI().toURL());
}
}
ClassLoader loader =
new URLClassLoader(classPath.toArray(new URL[0]));
//Set the context class loader, then load the jar's main class and invoke its main method via reflection
Thread.currentThread().setContextClassLoader(loader);
Class<?> mainClass = Class.forName(mainClassName, true, loader);
Method main = mainClass.getMethod("main", new Class[] {
Array.newInstance(String.class, 0).getClass()
});
String[] newArgs = Arrays.asList(args)
.subList(firstArg, args.length).toArray(new String[0]);
try {
main.invoke(null, new Object[] { newArgs });
} catch (InvocationTargetException e) {
throw e.getTargetException();
}
}
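The heart of RunJar, stripped of its Hadoop specifics, is loading a class by name and reflectively calling its static main. A minimal self-contained sketch follows; the nested Greeter class is a hypothetical stand-in for the class named by the jar's Main-Class entry:

```java
import java.lang.reflect.Method;

// A minimal sketch of RunJar's core idea: resolve a main class by name
// and invoke its static main(String[]) via reflection.
public class ReflectiveLauncher {
    // Hypothetical stand-in for the jar's Main-Class.
    public static class Greeter {
        public static void main(String[] args) {
            System.out.println("hello " + args[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> mainClass = Class.forName("ReflectiveLauncher$Greeter");
        Method main = mainClass.getMethod("main", String[].class);
        // The String[] must be wrapped in an Object[], so it is passed
        // as one argument instead of being spread as varargs.
        main.invoke(null, new Object[] { new String[] { "world" } });
    }
}
```

RunJar does the same thing, except that the class name comes from the manifest (or args) and the class loader is the URLClassLoader built over the unpacked jar.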
3. ExampleDriver.java
(The command line for wordcount names no class, only the string "wordcount". ExampleDriver maps that string to the corresponding class and dispatches to it through ProgramDriver. The relevant code is shown below.)
//Create a ProgramDriver and register wordcount together with its implementing class
ProgramDriver pgd = new ProgramDriver();
try {
pgd.addClass("wordcount", WordCount.class,
"A map/reduce program that counts the words in the input files.");
…
//Dispatch on the arguments passed in, i.e. wordcount
exitCode = pgd.driver(argv);
}
catch(Throwable e){
e.printStackTrace();
}
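The registration pattern above can be sketched without any Hadoop dependencies. In this sketch the Description class is a hypothetical stand-in for Hadoop's ProgramDescription, and Object.class stands in for WordCount.class:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of ExampleDriver's registration step: build a table
// from command-line names ("wordcount") to the classes implementing
// them, so one driver binary can dispatch any registered example.
public class ExampleRegistry {
    // Hypothetical stand-in for Hadoop's ProgramDescription.
    static class Description {
        final Class<?> mainClass;
        final String help;
        Description(Class<?> mainClass, String help) {
            this.mainClass = mainClass;
            this.help = help;
        }
    }

    private final Map<String, Description> programs = new HashMap<>();

    void addClass(String name, Class<?> mainClass, String help) {
        programs.put(name, new Description(mainClass, help));
    }

    public static void main(String[] args) {
        ExampleRegistry pgd = new ExampleRegistry();
        pgd.addClass("wordcount", Object.class,
            "A map/reduce program that counts the words in the input files.");
        // Look up the registered program by name and print its help text.
        System.out.println(pgd.programs.get("wordcount").help);
    }
}
```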
4. ProgramDriver.java
(The string wordcount is passed to driver(), which is where WordCount.class is actually run.)
public int driver(String[] args)
throws Throwable {
…
//Look up the ProgramDescription wrapping WordCount.class under the key wordcount
ProgramDescription pgm = programs.get(args[0]);
if (pgm == null) {
System.out.println("Unknown program '" + args[0] + "' chosen.");
printUsage(programs);
return -1;
}
// Remove the leading argument (the program name) and call
// WordCount's main method via reflection
String[] new_args = new String[args.length - 1];
for(int i=1; i < args.length; ++i) {
new_args[i-1] = args[i];
}
pgm.invoke(new_args);
return 0;
}
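The argument-shifting loop above can be sketched with Arrays.copyOfRange, which performs the same leading-argument removal in one call:

```java
import java.util.Arrays;

// A sketch of driver()'s argument shift: strip the program name
// (args[0]) before handing the rest to the program's main method.
public class ArgShift {
    public static void main(String[] args) {
        String[] argv = { "wordcount", "input", "output" };
        // Equivalent to the explicit copy loop in ProgramDriver.driver.
        String[] newArgs = Arrays.copyOfRange(argv, 1, argv.length);
        System.out.println(Arrays.toString(newArgs)); // prints [input, output]
    }
}
```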
5. WordCount.java
(WordCount itself is straightforward: it configures the job's parameters and submits the job.)
public static void main(String[] args) throws Exception {
…
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
//waitForCompletion(true) submits the job and blocks until it completes
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
6. After this, the Job uses JobSubmitter to hand the job to a class implementing the ClientProtocol interface, which performs the actual submission.