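Integrating Aspose.Words with Azure Data Lake
Aspose.Words is an advanced Word document processing API for performing a wide range of document management and manipulation tasks. The API supports generating, modifying, converting, rendering, and printing documents in cross-platform applications without using Microsoft Word directly.
Aspose APIs handle popular file formats and can export or convert documents to fixed-layout file formats and to the most common image and multimedia formats.
Aspose.Words can be integrated with the Microsoft Azure Data Lake services: Azure Data Lake Analytics (ADLA) and Azure Data Lake Storage (ADLS). This lets you combine the big data analytics capabilities of the Azure Data Lake cloud storage solution with Aspose.Words, so that an application can programmatically generate, modify, render, read, or convert documents between formats.
This article describes how to configure a C# project in Visual Studio for use with ADLA and provides an example that demonstrates the integration of Aspose.Words and Azure Data Lake.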
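Prerequisites
- An active Microsoft Azure subscription. If you do not have one, create a free account before you begin.
- Visual Studio 2019 or Visual Studio 2017 with the Azure development workload installed.
- Azure Data Lake Tools for Visual Studio installed.
- Visual Studio configured with your ADLA account.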
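Creating a Document from Azure Data Lake Data
This topic demonstrates how to use Aspose.Words to generate a document containing a table from a database on Azure Data Lake. Doing so requires creating a simple database and implementing the IOutputter interface to build a user-defined outputter that writes data from ADLS in a format supported by Aspose.Words.
Creating a Database in Azure Data Lake Storage (ADLS)
The sample Customers table resides in the sample_db database on ADLS. To create this sample database, sign in to your ADLA account, click "New Job", and submit the following script: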
CREATE DATABASE IF NOT EXISTS sample_db;
USE DATABASE sample_db;
CREATE SCHEMA IF NOT EXISTS dbo;
DROP TABLE IF EXISTS dbo.Customers;
CREATE TABLE dbo.Customers
(
    Customer_id int,
    Customer_name string,
    Customer_domain string,
    Customer_city string,
    INDEX idx_customer_id CLUSTERED (Customer_id ASC)
)
DISTRIBUTED BY RANGE (Customer_id);
INSERT INTO sample_db.dbo.Customers
    (Customer_id, Customer_name, Customer_domain, Customer_city)
VALUES
    (1, "John Smith", "History", "Boston"),
    (2, "Lisa Jaine", "Chemistry", "LA"),
    (3, "James Johnson", "Heraldry", "Milwaukee"),
    (4, "Sara Soyer", "IT", "Miami");
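Implementing the IOutputter Interface
In Visual Studio, create a new project by adding a C# Class Library (For U-SQL Application), and add a NuGet reference to Aspose.Words.
The following code example shows how to implement the IOutputter interface: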
using Microsoft.Analytics.Interfaces;
using System;
using System.IO;
using System.Linq;
using Aspose.Words;

namespace AsposeWordsOutputterUSql
{
    [SqlUserDefinedOutputter(AtomicFileProcessing = true)]
    public class AsposeWordsOutputer : IOutputter
    {
        public AsposeWordsOutputer(SaveFormat saveFormat)
        {
            // Pass the specified save format.
            mSaveFormat = saveFormat;

            // Create an instance of DocumentBuilder, which will be used to build the document.
            mDocumentBuilder = new DocumentBuilder();
        }

        /// <summary>
        /// The Close method is used to write the document to the file. It is executed only once, after all rows.
        /// </summary>
        public override void Close()
        {
            // End the table.
            mDocumentBuilder.EndTable();

            // The stream passed from IUnstructuredWriter.BaseStream does not support seeking.
            // This causes an exception when saving to PDF.
            // To avoid problems, save the output document into MemoryStream first
            // and then write its content to the IUnstructuredWriter.BaseStream.
            using (BinaryWriter writer = new BinaryWriter(mOutputStream))
            {
                // Save the document and close the stream.
                using (MemoryStream ms = new MemoryStream())
                {
                    mDocumentBuilder.Document.Save(ms, mSaveFormat);
                    writer.Write(ms.ToArray());
                }
            }
        }

        public override void Output(IRow row, IUnstructuredWriter output)
        {
            // Table with header row output--runs only once.
            if (mIsHeaderRow)
                ProcessHeaderRow(row.Schema);

            ProcessRow(row);

            // Reference to the instance of the IO.Stream object for saving document.
            mOutputStream = output.BaseStream;
        }

        /// <summary>
        /// Create HeaderRow of the table.
        /// </summary>
        private void ProcessHeaderRow(ISchema schema)
        {
            // Start the table before building it.
            mDocumentBuilder.StartTable();

            // Build the table.
            for (int i = 0; i < schema.Count(); i++)
            {
                IColumn col = schema[i];
                mDocumentBuilder.InsertCell();

                // Write a header with bold font.
                mDocumentBuilder.Font.Bold = true;
                mDocumentBuilder.Write(col.Name);
            }

            mDocumentBuilder.EndRow();

            // Write data with normal font.
            mDocumentBuilder.Font.Bold = false;

            // Table with header row output--runs only once.
            mIsHeaderRow = false;
        }

        /// <summary>
        /// Create Row of the table.
        /// </summary>
        private void ProcessRow(IRow row)
        {
            // Metadata schema initialization to enumerate column names.
            ISchema schema = row.Schema;

            // Data row output.
            for (int i = 0; i < schema.Count(); i++)
            {
                IColumn col = schema[i];
                string val = "";
                Type type = col.Type;

                // Get the cell value in the current row by column name and cast it to the column type.
                if (type == typeof(string))
                    val = row.Get<string>(col.Name);
                else if (type == typeof(int))
                    val = row.Get<int>(col.Name).ToString();
                else
                    val = "Column type is not supported.";

                mDocumentBuilder.InsertCell();
                mDocumentBuilder.Write(val);
            }

            mDocumentBuilder.EndRow();
        }

        private readonly DocumentBuilder mDocumentBuilder;
        private readonly SaveFormat mSaveFormat;
        private Stream mOutputStream;
        private bool mIsHeaderRow = true;

        static AsposeWordsOutputer()
        {
            // Note: The Aspose.Words license needs to be applied only once before any Document instance is created.
            // To execute the code only once, a static constructor is used. The below code will find and activate the license.
            // Uncomment the following code and add your license file as an embedded resource in the project.
            // Aspose.Words.License lic = new Aspose.Words.License();
            // lic.SetLicense("Aspose.Words.lic");
        }
    }
}
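The comment in Close explains that the output stream cannot seek, so the document is saved to a MemoryStream first. The following is a minimal sketch of that buffering pattern using only the .NET base class library. The names BufferedSaveSketch, ForwardOnlyStream, and SaveBuffered are hypothetical and exist only for illustration; ForwardOnlyStream is an assumed stand-in for a write-only, non-seekable stream such as IUnstructuredWriter.BaseStream, not the real ADLA type.

```csharp
using System;
using System.IO;

public static class BufferedSaveSketch
{
    // Hypothetical stand-in for a forward-only output stream:
    // it accepts writes but rejects reading, seeking, and length queries.
    public sealed class ForwardOnlyStream : Stream
    {
        private readonly Stream inner;

        public ForwardOnlyStream(Stream inner) { this.inner = inner; }

        public override bool CanRead { get { return false; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return true; } }

        public override long Length
        {
            get { throw new NotSupportedException(); }
        }

        public override long Position
        {
            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }

        public override void Flush() { inner.Flush(); }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException();
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException();
        }

        public override void SetLength(long value)
        {
            throw new NotSupportedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            inner.Write(buffer, offset, count);
        }
    }

    // Runs a save routine that may seek against an in-memory buffer,
    // then copies the finished bytes to the target in a single forward pass.
    public static void SaveBuffered(Action<Stream> save, Stream target)
    {
        using (var buffer = new MemoryStream())
        {
            save(buffer);
            buffer.Position = 0;
            buffer.CopyTo(target);
        }
    }
}
```

In the outputter above, the save routine would correspond to the call to mDocumentBuilder.Document.Save(ms, mSaveFormat). The trade-off of this approach is that the whole document is held in memory before it is written.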
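Registering the Assembly in Azure Data Lake Analytics (ADLA)
To integrate the project's C# class library with your ADLA account, register the assembly with the account:
- In Visual Studio, right-click the project name and select "Register Assembly".
- Select the ADLA account name and the database name.
- Expand the "Managed Dependencies" panel and select Aspose.Words.
Running the U-SQL Job in the Azure Portal
To start the application, run the following U-SQL code in ADLA. It contains the required assembly references and calls the user-defined outputter: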
USE DATABASE [sample_db];
REFERENCE ASSEMBLY AsposeWordsOutputterUSQL;
REFERENCE ASSEMBLY [Aspose.Words];
@test =
    SELECT *
    FROM dbo.Customers;
OUTPUT @test
TO "/output/Customers_AW.docx"
USING new AsposeWordsOutputterUSql.AsposeWordsOutputer(Aspose.Words.SaveFormat.Docx);
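You can output the document in whichever format suits your project, such as Docx, Doc, Pdf, Rtf, Text, or Jpeg. For details, see the SaveFormat enumeration.
Find the file in the output folder on ADLS and download it.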
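When switching to another of the formats listed above, both the file extension in the TO clause and the SaveFormat member passed to the outputter have to change together. The sketch below builds the OUTPUT statement text for a given format so the two stay in sync. The class OutputStatementSketch and the method BuildOutputStatement are hypothetical helpers, and the file extensions paired with each format name are assumptions based on common conventions, not values taken from Aspose.Words.

```csharp
using System;
using System.Collections.Generic;

public static class OutputStatementSketch
{
    // SaveFormat member names mentioned in this article, paired with
    // conventional file extensions (the extensions are assumptions).
    private static readonly Dictionary<string, string> Extensions =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "Docx", ".docx" },
            { "Doc", ".doc" },
            { "Pdf", ".pdf" },
            { "Rtf", ".rtf" },
            { "Text", ".txt" },
            { "Jpeg", ".jpeg" }
        };

    // Builds the U-SQL OUTPUT statement used in this article for the given
    // rowset variable, output path (without extension), and format name.
    public static string BuildOutputStatement(
        string rowset, string pathWithoutExtension, string saveFormat)
    {
        string extension;
        if (!Extensions.TryGetValue(saveFormat, out extension))
            throw new ArgumentException("Unknown save format: " + saveFormat, "saveFormat");

        return string.Format(
            "OUTPUT {0} TO \"{1}{2}\" USING new AsposeWordsOutputterUSql.AsposeWordsOutputer(Aspose.Words.SaveFormat.{3});",
            rowset, pathWithoutExtension, extension, saveFormat);
    }
}
```

For example, BuildOutputStatement("@test", "/output/Customers_AW", "Pdf") produces a statement that writes /output/Customers_AW.pdf using Aspose.Words.SaveFormat.Pdf. The resulting text replaces the OUTPUT statement in the U-SQL job shown earlier.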