Solver International

Solver International aims to educate and empower the business analyst who likely uses spreadsheets, business intelligence and visualization tools, but is increasingly being asked to apply predictive and prescriptive analytics methods to solve business problems.

Terms of The Trade: Glossary

Ambari
A web interface for managing Hadoop services and components

Apache Kafka
A distributed streaming platform for building real-time data pipelines and streaming apps.

Apache Spark
Open-source cluster computing framework with highly performant in-memory analytics and a growing number of related projects

Cassandra
A distributed database system

Cubes
A cube is a set of related measures and dimensions that is used to analyze data.
• A measure is a transactional value or measurement that a user may want to aggregate. Measures are sourced from columns in one or more source tables, and are grouped into measure groups.
• A dimension is a group of attributes that represent an area of interest related to the measures in the cube, and which are used to analyze the measures in the cube. The attributes within each dimension can be organized into hierarchies to provide paths for analysis.
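
As a rough illustration of how measures and dimensions relate, the sketch below aggregates one measure across a chosen set of dimensions. The sample rows, the column names (`region`, `year`, `amount`) and the `aggregate` helper are all hypothetical, and a real cube engine does far more (pre-computed aggregates, hierarchies, measure groups).

```python
from collections import defaultdict

# Hypothetical fact rows: "region" and "year" act as dimension attributes,
# "amount" is the measure to be aggregated.
sales = [
    {"region": "East", "year": 2017, "amount": 100.0},
    {"region": "East", "year": 2018, "amount": 150.0},
    {"region": "West", "year": 2017, "amount": 80.0},
]

def aggregate(rows, dimensions, measure):
    """Sum `measure` for each distinct combination of `dimensions`."""
    totals = defaultdict(float)
    for row in rows:
        # The key is the tuple of dimension values identifying one cube cell.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

by_region = aggregate(sales, ["region"], "amount")
by_region_year = aggregate(sales, ["region", "year"], "amount")
```

Grouping by `["region"]` and then by `["region", "year"]` mimics drilling down a path of analysis from a coarser level to a finer one.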

Edge computing
Edge computing is a method of optimizing cloud systems by performing data processing at the edge of the network, near the source of the data. Edge computing covers a range of technologies that includes mobile data acquisition and signature analysis, wireless sensor networks, and cooperative distributed peer-to-peer ad hoc networking and processing.

Flume
Software for streaming data into HDFS

Google BigQuery
BigQuery is Google’s fully managed, petabyte-scale enterprise data warehouse for analytics. BigQuery is serverless: there is no infrastructure to manage, and no database administrator is needed.

Hadoop
The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop Distributed File System (HDFS)
The scalable file system that stores data across multiple machines without prior organization.

HBase
A non-relational, distributed database that runs on top of Hadoop

HCatalog
A table and storage management layer

Hive
A data warehousing system with a SQL-like query language

MapReduce
A parallel processing software framework that takes inputs, partitions them into smaller problems and distributes them to worker nodes
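
The map, group and reduce steps can be sketched with a word count, the customary introductory example. This is a minimal single-process illustration, not Hadoop's API: the function names are hypothetical, and in a real cluster each chunk would be handed to a separate worker node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk stands in for the partition a worker node would process.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data", "Big clusters"])
```

Because each call to `map_phase` depends only on its own chunk, the map work can be spread across machines without coordination; only the shuffle step needs to bring matching keys together.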

ODBC
ODBC stands for Open Database Connectivity, a standard interface for connecting applications to data sources.

Oozie
A Hadoop job scheduler

Pig
A platform for manipulating data stored in HDFS

Python
Python is a high-level programming language for general-purpose programming. Python emphasizes code readability and a syntax which allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.

R
R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment. The S language is often the vehicle of choice for research in statistical methodology, and R provides an open-source route to participation in that activity.

Solr
A scalable search tool

Sqoop
Moves data between Hadoop and relational databases

Welch’s Test
Welch’s Test for Unequal Variances (also called Welch’s t-test, Welch’s adjusted t or the unequal variances t-test) is used to see if two sample means are significantly different. The null hypothesis for the test is that the means are equal. The alternative hypothesis is that the means are not equal.
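
A minimal sketch of the calculation behind the test, using only the Python standard library. The function name `welch_t` is hypothetical. It returns the t statistic and the Welch-Satterthwaite approximation to the degrees of freedom; converting these to a p-value needs a t distribution, which the standard library does not provide (SciPy's `scipy.stats.ttest_ind` with `equal_var=False` is one option).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variances t-test on samples a and b."""
    n1, n2 = len(a), len(b)
    # Sample variances (n - 1 denominator), each divided by its sample size.
    se1 = variance(a) / n1
    se2 = variance(b) / n2
    # Difference in sample means scaled by its estimated standard error.
    t = (mean(a) - mean(b)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike the pooled-variance t-test, the degrees of freedom here are generally not a whole number and shrink when one sample is much noisier than the other.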

YARN
(Yet Another Resource Negotiator) provides resource management for the processes running on Hadoop

Zookeeper
An application that coordinates distributed processing

Filed Under: Departments, February 2018

Copyright © 2019 Frontline Systems Inc.
Frontline Systems Inc. | P.O. Box 4288 | Incline Village, NV 89450
Lionheart Publishing, Inc. | 1635 Old 41 Hwy. | Suite 112-361 | Kennesaw, GA 30152 | Ph: 770.431.0867 | Fax: 770.432.6969