Pentaho Data Integration prepares and blends data to create a complete picture of your business that drives actionable insights. The complete data integration platform delivers accurate, "analytics ready" data to end users from any source. With visual tools to eliminate coding and complexity, Pentaho puts Big Data and all data sources at the fingertips of business and IT users alike.
Simple Visual Designer for Drag and Drop Development
Empower developers with visual tools to minimize coding and achieve greater productivity.
Drag and Drop Visual Design Approach
- Graphical extract-transform-load (ETL) tool to load and process big data sources in familiar ways.
- Rich library of pre-built components to access and transform data from a full spectrum of sources.
- Visual interface to call custom code and to analyze image and video files, creating meaningful metadata.
- Dynamic transformations, using variables to determine field mappings, validation and enrichment rules (see the sketch after this list).
- Integrated debugger for testing and tuning job execution.
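The dynamic-transformation item above hinges on Kettle variables. As a minimal sketch of the idea, the following assumes PDI's embeddable Kettle engine (the org.pentaho.di libraries) is on the classpath; the file name etl_example.ktr and the variable TARGET_TABLE are hypothetical, standing in for whatever a real transformation would reference as ${TARGET_TABLE}.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();                          // boot the Kettle engine and its plugins
        TransMeta meta = new TransMeta("etl_example.ktr"); // a transformation built in the visual designer
        Trans trans = new Trans(meta);
        trans.setVariable("TARGET_TABLE", "sales_fact");   // variables drive mappings and rules at runtime
        trans.execute(null);                               // null = no extra arguments
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            throw new IllegalStateException(
                "Transformation failed with " + trans.getErrors() + " error(s)");
        }
    }
}
```

The same .ktr file runs unchanged against different targets simply by changing the variable value, which is what makes the transformations "dynamic".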
Big Data Integration with Zero-Coding Required
Pentaho's intuitive tools accelerate the design, development and deployment of big data analytics by as much as 15x.
Big Data Integration Made Easy
- Complete visual development tools eliminate coding in SQL or writing MapReduce Java functions (the kind of boilerplate sketched after this list).
- Broad connectivity to any type or source of data with native support for Hadoop, NoSQL and analytic databases.
- Parallel processing engine to ensure high performance and enterprise scalability.
- Extract and blend existing and diverse data to produce consistent high quality ready-to-analyze data.
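For contrast, this is the kind of hand-written Hadoop MapReduce code the visual tools make unnecessary: a standard word-count mapper and reducer against the vanilla Hadoop API. The class names are illustrative; the point is that every new aggregation written this way is another Java class to compile, package and ship to the cluster.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Emits (token, 1) for every whitespace-delimited token in a line of input.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Sums the counts for each token across all mappers.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }
}
```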
Native and Flexible Support for All Big Data Sources
A combination of deep native connections and an adaptive big data layer ensures accelerated access to the leading Hadoop distributions, NoSQL databases, and other big data stores.
Broadest and Deepest Big Data Support
- Support for the latest Hadoop distributions from Cloudera, Hortonworks, MapR and Intel.
- Simple plugins for NoSQL databases such as Cassandra and MongoDB, as well as connections to specialized data stores like Amazon Redshift and Splunk.
- Adaptive big data layer saves enterprises considerable development time as they leverage new versions and capabilities.
- Greater flexibility, reduced risk, and insulation from changes in the big data ecosystem.
- Reporting and analysis on growing volumes of user- and machine-generated data, including web content, documents, social media and log files.
- Integration of Hadoop data tasks into overall IT/ETL/BI solutions with scalable distribution across the cluster.
- Support for parallel bulk data loader utilities to load data with maximum performance (a generic sketch of the pattern follows this list).
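The bulk-loading item above refers to dedicated loader utilities; purely as a generic illustration of the underlying pattern (not PDI's internal code), the sketch below splits rows across threads and writes each partition with batched JDBC inserts. The connection URL, table and credentials are hypothetical.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBulkLoad {
    private static final String URL = "jdbc:postgresql://localhost/dw"; // hypothetical target

    // Each partition of rows is loaded by its own thread; errors surface via
    // the Futures returned by submit() (ignored in this sketch).
    public static void load(List<List<String[]>> partitions) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        for (List<String[]> partition : partitions) {
            pool.submit(() -> {
                try (Connection c = DriverManager.getConnection(URL, "etl", "secret")) {
                    c.setAutoCommit(false);
                    try (PreparedStatement ps = c.prepareStatement(
                            "INSERT INTO sales (region, amount) VALUES (?, ?)")) {
                        for (String[] row : partition) {
                            ps.setString(1, row[0]);
                            ps.setBigDecimal(2, new BigDecimal(row[1]));
                            ps.addBatch();       // buffer rows client-side
                        }
                        ps.executeBatch();       // one round trip per batch, not per row
                    }
                    c.commit();                  // one commit per partition
                }
                return null;                     // lambda resolves to Callable
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```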
Powerful Administration and Management
Simplified, out-of-the-box capabilities for managing the operations of a data integration project.
Easy-to-Use Schedule Management
- Manage security privileges for users and roles.
- Restart jobs from last successful checkpoint and roll back job execution on failure.
- Integrate with existing security definitions in LDAP and Active Directory.
- Set permissions to control user actions: read, execute or create.
- Schedule data integration flows for organized process management (see the sketch after this list).
- Monitor and analyze the performance of data integration processes.
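In production, the scheduling above lives in the PDI server, but the same jobs can also be driven from plain Java. A minimal sketch, assuming the Kettle libraries are on the classpath; the file load_warehouse.kjb and the 24-hour cadence are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class NightlyJob {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                JobMeta meta = new JobMeta("load_warehouse.kjb", null); // job built in the designer
                Job job = new Job(null, meta);                          // null = no repository
                job.start();
                job.waitUntilFinished();
                if (job.getResult().getNrErrors() > 0) {
                    System.err.println("Job finished with errors");     // hook for alerting
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 24, TimeUnit.HOURS);  // run immediately, then every 24 hours
    }
}
```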
Data Profiling and Data Quality
Profile data and ensure data quality with comprehensive capabilities for data managers.
Data Quality Management
- Identify data that fails to comply with business rules and standards.
- Standardize, validate, de-duplicate and cleanse inconsistent or redundant data (illustrated in the sketch after this list).
- Manage data quality with partners such as Human Inference and Melissa Data.
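As a generic illustration of the standardize/validate/de-duplicate triad (not the partner tooling, which adds fuzzy matching and reference data), the sketch below cleanses a list of e-mail addresses; the rule and field are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class CustomerCleanser {
    // A deliberately simple business rule standing in for real validation logic.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    public static Set<String> cleanse(List<String> rawEmails) {
        Set<String> unique = new LinkedHashSet<>();   // de-duplicates while keeping input order
        for (String raw : rawEmails) {
            String email = raw.trim().toLowerCase();  // standardize case and whitespace
            if (EMAIL.matcher(email).matches()) {     // validate against the business rule
                unique.add(email);
            } else {
                System.err.println("Rejected: " + raw); // route non-compliant rows aside
            }
        }
        return unique;
    }
}
```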