Spark Log4j

Scala basic knowledge

Posted by lh on 2019-07-10

Spark-log4j

$Controlling log output

$1) Official documentation

Source: http://spark.apache.org/docs/latest/running-on-yarn.html

To use a custom log4j configuration for the application master or executors, here are the options:

1) upload a custom log4j.properties using spark-submit, by adding it to the --files list of files to be uploaded with the application.

2) add -Dlog4j.configuration=<location of configuration file> to spark.driver.extraJavaOptions (for the driver) or spark.executor.extraJavaOptions (for executors). Note that if using a file, the file: protocol should be explicitly provided, and the file needs to exist locally on all the nodes.

3) update the $SPARK_CONF_DIR/log4j.properties file and it will be automatically uploaded along with the other configurations. Note that the other two options have higher priority than this one if multiple options are specified.
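
A quick way (my own sketch, not from the docs) to see which of these options actually took effect in a given JVM: option 2 surfaces as the log4j.configuration system property, and log4j exposes the resolved root level.

import org.apache.log4j.Logger

object Log4jConfigCheck {
  def main(args: Array[String]): Unit = {
    // Set only when option 2 (-Dlog4j.configuration=...) was used; null means
    // log4j fell back to a log4j.properties found on the classpath (options 1/3).
    println(s"log4j.configuration = ${System.getProperty("log4j.configuration")}")
    // The root level after all options are resolved.
    println(s"root level = ${Logger.getRootLogger.getLevel}")
  }
}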

$2) Obtaining a logger in code

package com.ruozedata.spark.core.homeworks.loggerControl

import org.apache.spark.internal.Logging
import org.apache.spark.{SparkConf, SparkContext}

// Note: a logger obtained via Logger.getLogger() does not work with this log4j setup;
// the object must extend Spark's Logging trait instead.
object ControlSparkLogger extends Logging {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("1").setMaster("local[2]")
    val sc = new SparkContext(conf)

    logInfo("--------------- my log ------------------")

    // stop the SparkContext
    sc.stop()
  }
}
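
The properties-file route above is what the rest of this post uses. Purely as an aside (not in the original), Spark also offers programmatic control over verbosity, which is convenient for quick local runs:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object ProgrammaticLogLevel {
  def main(args: Array[String]): Unit = {
    // Silence Spark/Hadoop internals before the SparkContext starts logging.
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.apache.hadoop").setLevel(Level.ERROR)

    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))
    // Public API since Spark 1.4: overrides the root log level for this application.
    sc.setLogLevel("WARN")

    sc.stop()
  }
}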

$3) log4j.properties

# Global settings
# Set everything to be logged to the console
log4j.rootLogger=error,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

## Custom logging for our own Spark code: a scoped, per-package override
log4j.logger.com.ruozedata.spark=INFO
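
How does the com.ruozedata.spark=INFO line reach our logInfo call? Spark's Logging trait names its slf4j logger after the fully qualified class name (minus the trailing $ of a Scala object), so any class under com.ruozedata.spark falls under this override. A small sketch to see the name (the package here is illustrative):

package com.ruozedata.spark.demo

import org.apache.spark.internal.Logging

object LoggerNameCheck extends Logging {
  def main(args: Array[String]): Unit = {
    // Prints "com.ruozedata.spark.demo.LoggerNameCheck": the trailing $ of the
    // Scala object is stripped, and the name falls under the com.ruozedata.spark
    // prefix configured at INFO above.
    println(log.getName)
    logInfo("visible: the package override is INFO")
    logDebug("suppressed: DEBUG is below INFO")
  }
}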

$4) spark-submit parameters

bin/spark-submit \
--class com.ruozedata.spark.core.homeworks.loggerControl.ControlSparkLogger \
--master yarn \
--deploy-mode client \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 2 \
--files /home/ruoze/data/myjars/log4j-driver.properties \
--driver-java-options "-Dlog4j.configuration=file:/home/ruoze/data/myjars/log4j-driver.properties" \
--conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/ruoze/data/myjars/log4j-driver.properties" \
--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/ruoze/data/myjars/log4j-driver.properties" \
/home/ruoze/data/myjars/ruozedata-spark-core-1.0-SNAPSHOT.jar



Result: success
20/04/21 00:56:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/21 00:56:44 INFO ControlSparkLogger: --------------- my log ------------------
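
To double-check the executor side (my sketch, not part of the original run), log from inside a task: the message then goes through the executor JVM's log4j configuration, i.e. the file shipped via --files above, and shows up in the executor stderr (visible in the YARN UI) rather than the driver console.

import org.apache.spark.{SparkConf, SparkContext}

object ExecutorLogCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("executor-log-check"))
    sc.parallelize(1 to 2, 2).foreachPartition { _ =>
      // Runs on an executor; printed only if that executor's log4j config
      // has com.ruozedata.spark at INFO or lower.
      org.apache.log4j.Logger.getLogger("com.ruozedata.spark").info("hello from an executor")
    }
    sc.stop()
  }
}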

$5) Pitfalls

// ClassNotFoundException
// You might assume the class is genuinely missing. It is not: a malformed --conf can produce the same error.
20/04/20 23:34:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/20 23:34:18 WARN deploy.DependencyUtils: Local jar /home/ruoze/app/spark-2.4.5-bin-2.6.0-cdh5.16.2/spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties does not exist, skipping.
20/04/20 23:34:18 WARN deploy.SparkSubmit$$anon$2: Failed to load com.ruozedata.spark.core.homeworks.loggerControl.ControlSparkLogger.
java.lang.ClassNotFoundException: com.ruozedata.spark.core.homeworks.loggerControl.ControlSparkLogger

// Original (broken) command: two settings crammed into a single --conf, joined only by a space
bin/spark-submit \
--class com.ruozedata.spark.core.homeworks.loggerControl.ControlSparkLogger \
--master yarn \
--deploy-mode client \
--executor-memory 1G \
--num-executors 1 \
--files /home/ruoze/data/myjars/log4j.properties \
--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" \
/home/ruoze/data/myjars/ruozedata-spark-core-1.0-SNAPSHOT.jar

// Root cause analysis:
Inspecting the jar shows the class is actually there. First, re-running a previously built jar works, so the Spark environment is fine. Second, packaging the class into another project and submitting it the old way also works. Comparing against those earlier submissions pins the problem on --conf: spark-submit takes exactly one key=value per --conf flag, so everything after the space, starting with spark.executor.extraJavaOptions=..., was parsed as the application jar (see the "Local jar ... does not exist, skipping" warning above), and the real jar was never put on the classpath, hence the ClassNotFoundException.
Do not separate multiple settings with spaces inside one --conf (commas are best avoided too); repeat --conf once per setting, as the official example recommends.

// Official example: the standard way to pass multiple --conf settings
./bin/spark-submit \
--name "My app" \
--master local[4] \
--conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--conf spark.hadoop.abc.def=xyz \
myApp.jar
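
For completeness (an addition of mine, not from the original post): the same configuration expressed through SparkConf in code, which makes it plain that every setting is its own key/value pair, mirroring one --conf per setting on the command line.

import org.apache.spark.SparkConf

// Equivalent of the official spark-submit example above: one .set() per entry,
// just as spark-submit expects one --conf per entry.
val conf = new SparkConf()
  .setAppName("My app")
  .setMaster("local[4]")
  .set("spark.eventLog.enabled", "false")
  .set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
  .set("spark.hadoop.abc.def", "xyz")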
