TensorFlow 2 Libraries and Extensions

lijingle, Deep Learning Frameworks, 2022-1-28 11:15

In this article, we introduce some of the key extensions and libraries in TensorFlow 2.x (TF 2.x): TF Datasets, TF Hub, XLA, Model Optimization, TensorBoard, TF Probability, Neural Structured Learning, TF Serving, TF Federated, TF Graphics, and MLIR.

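TensorFlow Datasets

TensorFlow Datasets (TFDS) supports loading many popular datasets. For the complete list, see the TF Datasets catalog. Here is a code sample that loads the MNIST data.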

import tensorflow as tf
import tensorflow_datasets as tfds

# Construct a tf.data.Dataset
# Data loaded into ~/tensorflow_datasets/mnist
ds = tfds.load('mnist', split='train', shuffle_files=True)

# Build your input pipeline: shuffle, batch into groups of 32, and prefetch.
# tf.data.AUTOTUNE is available from TF 2.4; older 2.x releases expose it
# as tf.data.experimental.AUTOTUNE.
ds = ds.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)
for example in ds.take(1):
    image, label = example["image"], example["label"]

assert image.shape == (32, 28, 28, 1)
assert label.shape == (32,)

TensorFlow Hub

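TensorFlow Hub is a repository of trained machine learning models (such as BERT) that are ready for fine-tuning and deployment. For the complete list, see TF Hub. The code below loads a token-based text embedding model trained on the English Google News 200B corpus.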

#pip install --upgrade tensorflow_hub
import tensorflow_hub as hub

model = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2")
embeddings = model(["The rain in Spain.", "falls",
                      "mainly", "In the plain!"])

assert embeddings.shape == [4, 128]

Model Optimization

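Weight pruning and/or quantization can further optimize a trained model with little or no loss of accuracy. Weight pruning, for example, can be applied to a model that has already been trained.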
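To make the idea concrete, here is a minimal sketch of magnitude-based weight pruning in plain Python. It is not the TensorFlow Model Optimization Toolkit API (in TF this kind of pruning is provided by the `tensorflow_model_optimization` package, for example through `tfmot.sparsity.keras.prune_low_magnitude`). The function name `prune_by_magnitude`, the `sparsity` parameter, and the sample weights are assumptions made for illustration only.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `sparsity` is the fraction of weights to set to zero (0.0 to 1.0).
    This is an illustrative helper, not a TensorFlow API.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    num_to_prune = int(len(weights) * sparsity)

    # Indices of the weights, ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))

    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights of one small layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.45, -0.1]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)
```

With a sparsity of 0.5, the four weights closest to zero are removed and the four largest are kept unchanged. In practice pruning is usually applied gradually during fine-tuning so the remaining weights can adapt.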

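Weight clustering can also reduce the memory footprint. It first groups the weights of each layer into N clusters, and then every weight in a cluster shares that cluster's centroid value.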
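The two steps described above (group the weights into N clusters, then share each cluster's centroid) can be sketched with a small one-dimensional k-means in plain Python. This is only an illustration under stated assumptions, not the toolkit's implementation (TF offers this through `tfmot.clustering.keras.cluster_weights`). The helper names `nearest_centroid` and `cluster_weights_1d`, the evenly spaced centroid initialization, and the sample weights are all hypothetical.

```python
def nearest_centroid(weight, centroids):
    """Return the index of the centroid closest to `weight`."""
    return min(range(len(centroids)), key=lambda k: abs(weight - centroids[k]))


def cluster_weights_1d(weights, num_clusters, num_iters=10):
    """Cluster scalar weights with k-means and share centroids.

    Returns (shared_weights, centroids), where every entry of
    shared_weights is the centroid of the cluster its weight fell into.
    """
    if num_clusters < 2:
        raise ValueError("num_clusters must be at least 2")

    # Assumption: start with centroids evenly spaced over the weight range.
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * k / (num_clusters - 1)
                 for k in range(num_clusters)]

    for _ in range(num_iters):
        # Step 1: assign each weight to its nearest centroid.
        assignments = [nearest_centroid(w, centroids) for w in weights]
        # Step 2: move each centroid to the mean of its members.
        for k in range(num_clusters):
            members = [w for w, a in zip(weights, assignments) if a == k]
            if members:
                centroids[k] = sum(members) / len(members)

    # Replace every weight by the centroid of its cluster.
    assignments = [nearest_centroid(w, centroids) for w in weights]
    shared_weights = [centroids[a] for a in assignments]
    return shared_weights, centroids


# Hypothetical weights of one layer, grouped into N = 3 clusters.
layer_weights = [-1.0, -0.9, 0.0, 0.1, 0.9, 1.0]
shared_weights, centroids = cluster_weights_1d(layer_weights, num_clusters=3)
print(shared_weights)
```

After clustering, the layer only needs to store the N centroid values plus a small cluster index per weight, which is where the memory saving comes from.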

TensorBoard

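TensorBoard is the visualization tool for TF applications. It displays scalar metrics logged by the application (such as accuracy and loss), input data, the computation graph, and the distributions and histograms of trainable parameters.

In a TF application, we write this information to files that the tensorboard application can then read.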
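As a minimal sketch of that workflow, the snippet below writes a scalar metric to an event file with `tf.summary`. The temporary log directory and the synthetic "loss" values are assumptions for illustration; a real application would log metrics from its own training loop.

```python
import glob
import os
import tempfile

import tensorflow as tf

# Assumption: a temporary directory stands in for the real log directory.
log_dir = tempfile.mkdtemp()

# The file writer creates the event file that TensorBoard reads.
writer = tf.summary.create_file_writer(log_dir)

with writer.as_default():
    for step in range(100):
        # Synthetic, steadily decreasing value used as a stand-in for a loss.
        fake_loss = 1.0 / (step + 1)
        tf.summary.scalar("loss", fake_loss, step=step)

# Make sure everything is written to disk before TensorBoard reads it.
writer.flush()

event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(log_dir, len(event_files))
```

The logged values can then be viewed by pointing TensorBoard at the same directory, for example with `tensorboard --logdir` followed by the printed path.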

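TensorFlow Probability (TFP)

TFP provides a library for modeling probability distributions, variational inference, Markov chain Monte Carlo, and more.

The code below draws 100K samples from a normal distribution and uses them as logits to draw 100K samples from a Bernoulli distribution. With the data collected, the code fits a Bernoulli model to the data and recovers the model parameter.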

import tensorflow as tf
import tensorflow_probability as tfp

# Pretend to load synthetic data set.
features = tfp.distributions.Normal(loc=0., scale=1.).sample(int(100e3))
labels = tfp.distributions.Bernoulli(logits=1.618 * features).sample()

# Specify model.
model = tfp.glm.Bernoulli()

# Fit model given data.
coeffs, linear_response, is_converged, num_iter = tfp.glm.fit(
    model_matrix=features[:, tf.newaxis],
    response=tf.cast(labels, dtype=tf.float32),
    model=model)
# ==> coeffs is approximately [1.618] (We're golden!)

Neural Structured Learning (NSL)

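In computer vision, information is encoded in images. In NLP, it is contained in text. However, rich information describing the relationships between samples can also be encoded in a graph. The Cora dataset is a citation graph in which nodes represent machine learning papers and edges represent citations between pairs of papers. We can use both the nodes (paper content) and the links (citations) to better classify each paper into one of seven categories. In the figure below, we want the embedded features of neighbors to be similar.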
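The requirement that linked papers have similar embeddings can be expressed as an extra penalty added to the usual supervised loss. The following plain-Python sketch shows one way to write such a graph-regularized loss. It is not the NSL API: the function names, the squared-distance measure, the weight `alpha`, and the toy embeddings are all assumptions chosen for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(supervised_loss, embeddings, edges, alpha=0.1):
    """Add a neighbor-similarity penalty to a supervised loss.

    embeddings: dict mapping a node id to its embedding vector.
    edges: list of (node_i, node_j) pairs, e.g. citations between papers.
    alpha: hypothetical weight of the neighbor penalty.
    """
    if not edges:
        return supervised_loss

    # Average distance between the embeddings of linked nodes.
    neighbor_loss = sum(
        squared_distance(embeddings[i], embeddings[j]) for i, j in edges
    ) / len(edges)

    return supervised_loss + alpha * neighbor_loss


# Toy example: three papers and two citation links.
embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [1.0, 1.0],
    "paper_c": [0.0, 0.0],
}
edges = [("paper_a", "paper_b"), ("paper_a", "paper_c")]

total_loss = graph_regularized_loss(
    supervised_loss=0.5, embeddings=embeddings, edges=edges, alpha=0.1)
print(total_loss)
```

Minimizing this combined loss pushes the model to classify each paper correctly while also pulling the embeddings of cited and citing papers closer together.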