TensorFlow Model Compression

lijingle | Deep Learning | 2020-11-24 12:15

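A trained network model is usually fairly large, which makes it awkward to deploy: the size of the model keeps inference speed well below what is expected. For this reason TensorFlow provides a model compression tool (the Graph Transform Tool), which helps when deploying models on embedded devices.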

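I. Environment setup

Everything in this article was done on Ubuntu 18.04. Before a model can be compressed, part of the TensorFlow source has to be compiled, and the build tool for that is Bazel.

Installing Bazel:

The official site describes two installation methods. In testing, the first (officially recommended) method ran into problems, mainly because the Google package source is generally unreachable from within China. This article uses the second method.

1. Install the dependency packages

# Install dependencies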
sudo apt install g++ unzip zip

On Ubuntu 16.04 or Ubuntu 18.04, also install the matching JDK package below:

# Ubuntu 16.04 (LTS) uses OpenJDK 8 by default:
sudo apt-get install openjdk-8-jdk
# Ubuntu 18.04 (LTS) uses OpenJDK 11 by default:
sudo apt-get install openjdk-11-jdk

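2. Run the installer

Download the installer from the Bazel releases page on GitHub. The newest release is the usual choice, but check which Bazel version your TensorFlow checkout expects, since the build may reject a version that is too new. The file is named bazel-<version>-installer-linux-x86_64.sh

Run the following commands: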

chmod +x bazel-<version>-installer-linux-x86_64.sh
./bazel-<version>-installer-linux-x86_64.sh --user

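The --user flag installs Bazel under the current user's $HOME/bin directory rather than system-wide.

3. Add Bazel to the user's PATH

Environment variables usually live in ~/.bashrc. Open the file in an editor such as vim and append the line below at the end, then run source ~/.bashrc or open a new terminal so the change takes effect.

# Add Bazel to PATH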
export PATH="$PATH:$HOME/bin"
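
To confirm that the PATH change took effect, a small check like the one below can be run from a new shell. It is only a minimal sketch and not part of the official installation steps; the helper name find_bazel is hypothetical.

```python
import shutil


def find_bazel(path=None):
    """Return the full path of the bazel executable, or None if it is not found.

    `path` is a PATH-style string; None means "use the current PATH".
    """
    return shutil.which("bazel", path=path)


if __name__ == "__main__":
    location = find_bazel()
    if location is None:
        print("bazel not found; check that $HOME/bin is on PATH")
    else:
        print("bazel found at", location)
```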

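II. Building part of the TensorFlow source

First download the TensorFlow source from GitHub and change into the tensorflow directory. Then use the Bazel installation from above to build two TensorFlow tools, summarize_graph and transform_graph. The build commands are: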

bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel build tensorflow/tools/graph_transforms:transform_graph

summarize_graph inspects a network, mainly to find its inputs and outputs. transform_graph is the tool that actually compresses the network. The build takes a while.

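III. Model compression

The model compressed here is a DenseNet trained with the Keras API in TensorFlow, so the saved model is a densenet.h5 file. Before it can be compressed, the model first has to be converted to a pb file.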
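
The conversion script itself is not shown here. One possible way to do it, as a minimal sketch assuming TensorFlow 2.x with tf.keras (which may not match the setup used in this article), is to load the .h5 file, trace the model, and freeze its variables into constants. The function name freeze_keras_model and its arguments are hypothetical. Node names in the resulting graph may differ from the input_1 and dense_1/Softmax shown later, so check them with summarize_graph.

```python
def freeze_keras_model(h5_path, out_dir, pb_name):
    """Load a Keras .h5 model and write it out as a frozen GraphDef (.pb).

    Assumption: TensorFlow 2.x is installed. It is imported inside the
    function so that this file can be loaded on machines without it.
    Returns the input and output tensor names of the frozen graph.
    """
    import tensorflow as tf
    from tensorflow.python.framework.convert_to_constants import (
        convert_variables_to_constants_v2,
    )

    model = tf.keras.models.load_model(h5_path)

    # Trace the model into a concrete function with a fixed input signature.
    spec = tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype)
    concrete = tf.function(lambda x: model(x)).get_concrete_function(spec)

    # Replace every variable with a constant holding its current value.
    frozen = convert_variables_to_constants_v2(concrete)

    # Serialize the frozen graph in binary protobuf form.
    tf.io.write_graph(frozen.graph.as_graph_def(), out_dir, pb_name, as_text=False)
    return [t.name for t in frozen.inputs], [t.name for t in frozen.outputs]
```

A call such as freeze_keras_model("densenet.h5", ".", "densenet.pb") would write the pb file into the current directory.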

1. First inspect the network structure, mainly its inputs and outputs

bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=../keras_to_tensorflow-master/DenseNet-40-12-CIFAR10.pb
# Output:
Found 1 possible inputs: (name=input_1, type=float(1), shape=[?,32,32,3]) 
No variables spotted.
Found 1 possible outputs: (name=dense_1/Softmax, op=Softmax) 
Found 1078056 (1.08M) const parameters, 0 (0) variable parameters, and 0 control_edges
Op types used: 234 Const, 197 Identity, 39 Conv2D, 39 FusedBatchNorm, 39 Relu, 36 ConcatV2, 2 AvgPool, 1 BiasAdd, 1 MatMul, 1 Mean, 1 Placeholder, 1 Softmax
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=../keras_to_tensorflow-master/DenseNet-40-12-CIFAR10.pb --show_flops --input_layer=input_1 --input_layer_type=float --input_layer_shape=-1,32,32,3 --output_layer=dense_1/Softmax

The output shows that the input node is input_1 and the output node is dense_1/Softmax.
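
When several models have to be processed, the node names can be pulled out of the summarize_graph output automatically. The sketch below parses the sample lines shown above. The helper name parse_io_names is hypothetical, and the line format is assumed from this one sample, so other TensorFlow versions may print something different.

```python
import re

# Sample lines copied from the summarize_graph output above.
SAMPLE_OUTPUT = """\
Found 1 possible inputs: (name=input_1, type=float(1), shape=[?,32,32,3])
No variables spotted.
Found 1 possible outputs: (name=dense_1/Softmax, op=Softmax)
"""


def parse_io_names(summary_text):
    """Return (input_names, output_names) found in summarize_graph output."""
    inputs, outputs = [], []
    for line in summary_text.splitlines():
        # Every "(name=..., ...)" group on a line describes one node.
        names = re.findall(r"\(name=([^,)]+)", line)
        if "possible inputs" in line:
            inputs.extend(names)
        elif "possible outputs" in line:
            outputs.extend(names)
    return inputs, outputs


if __name__ == "__main__":
    print(parse_io_names(SAMPLE_OUTPUT))
    # prints (['input_1'], ['dense_1/Softmax'])
```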

2. Compress the model

# Compress
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=../keras_to_tensorflow-master/DenseNet-40-12-CIFAR10.pb --inputs='input_1' --outputs='dense_1/Softmax' --out_graph=../keras_to_tensorflow-master/DenseNet_fronze-CIFAR10.pb --transforms='remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true) fold_batch_norms quantize_weights strip_unused_nodes merge_duplicate_nodes sort_by_execution_order'
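
Because the command is long and the order of the transforms matters, it can help to assemble it from a list. The sketch below only builds the argument list and prints it, without running anything. The helper name build_transform_command is hypothetical, and joining multiple input or output names with commas is an assumption (this model has only one of each).

```python
import shlex

# Transforms in the same order as in the command above.
TRANSFORMS = [
    "remove_nodes(op=Identity, op=CheckNumerics)",
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
    "quantize_weights",
    "strip_unused_nodes",
    "merge_duplicate_nodes",
    "sort_by_execution_order",
]


def build_transform_command(in_graph, out_graph, inputs, outputs, transforms):
    """Assemble the transform_graph argument list (nothing is executed)."""
    return [
        "bazel-bin/tensorflow/tools/graph_transforms/transform_graph",
        "--in_graph=" + in_graph,
        "--out_graph=" + out_graph,
        "--inputs=" + ",".join(inputs),    # assumed comma-separated
        "--outputs=" + ",".join(outputs),  # assumed comma-separated
        "--transforms=" + " ".join(transforms),
    ]


if __name__ == "__main__":
    cmd = build_transform_command(
        "../keras_to_tensorflow-master/DenseNet-40-12-CIFAR10.pb",
        "../keras_to_tensorflow-master/DenseNet_fronze-CIFAR10.pb",
        ["input_1"],
        ["dense_1/Softmax"],
        TRANSFORMS,
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The same list can be passed to subprocess.run(cmd, check=True) from inside the tensorflow source directory, which avoids shell quoting problems altogether.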

The model shrank from about 5 MB to about 1 MB, which is a substantial reduction.
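
A rough back-of-the-envelope check is consistent with these sizes. summarize_graph reported 1078056 constant parameters; stored as 32-bit floats they take 4 bytes each, and after quantize_weights most of them take 1 byte each. The estimate below ignores the graph structure itself and the small tensors that stay in float, so it only approximates the file sizes.

```python
# Constant parameter count reported by summarize_graph above.
NUM_PARAMS = 1078056

BYTES_FLOAT32 = 4  # each weight stored as a 32-bit float
BYTES_UINT8 = 1    # each weight stored as an 8-bit value


def weights_megabytes(num_params, bytes_per_param):
    """Size of the raw weight data in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


if __name__ == "__main__":
    print("float32 weights: %.2f MB" % weights_megabytes(NUM_PARAMS, BYTES_FLOAT32))
    print("8-bit weights:   %.2f MB" % weights_megabytes(NUM_PARAMS, BYTES_UINT8))
    # prints 4.31 MB and 1.08 MB
```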

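remove_nodes: removes nodes from the graph; the arguments give the op types to remove. Note that this can remove nodes the model actually needs.
fold_constants: finds expressions in the model that always evaluate to a constant and replaces them with that constant.
fold_batch_norms: when batch normalization was used during training, optimizes away the Mul introduced after a Conv2D or MatMul by folding it into the weights. It must run after fold_constants. (fold_old_batch_norms does the same job for the older single-op forms of batch normalization, such as BatchNormWithGlobalNormalization or FusedBatchNorm.)
quantize_weights: stores float weights as 8-bit values, with a Dequantize op converting them back to float at run time (by default, tensors with fewer than 1024 elements are left unchanged). This is the main source of the size reduction.
strip_unused_nodes: removes nodes that are not used between the given inputs and outputs. It helps a great deal with missing-kernel errors on mobile.
merge_duplicate_nodes: merges nodes that are duplicates of one another.
sort_by_execution_order: sorts the nodes so that the inputs of any given node always appear before that node.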
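
To make the idea behind quantize_weights concrete, the sketch below maps a list of floats linearly onto 8-bit codes using the tensor's minimum and maximum, then maps them back. It illustrates min/max linear quantization in general and is not the tool's actual implementation; the function names are hypothetical.

```python
def quantize(values):
    """Map floats linearly onto integers 0..255; return (codes, lo, hi)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant tensor: every value maps to code 0.
        return [0] * len(values), lo, hi
    scale = 255.0 / (hi - lo)
    return [int(round((v - lo) * scale)) for v in values], lo, hi


def dequantize(codes, lo, hi):
    """Recover approximate floats from 8-bit codes and the stored range."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    weights = [-0.5, -0.1, 0.0, 0.25, 0.5]
    codes, lo, hi = quantize(weights)
    restored = dequantize(codes, lo, hi)
    # The round-trip error should not exceed half a quantization step.
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("codes:", codes)
    print("worst round-trip error:", worst)
```

Each weight then costs one byte instead of four, plus two floats per tensor for the range, at the price of a small rounding error.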

For more transforms, see the official documentation on GitHub.