By Dan Fylstra
“Analytics in the Cloud” is a hot topic these days in news articles, blog posts, and vendor webinars. But what does this mean for you? What exactly is “the Cloud,” anyway? What are its advantages and drawbacks for analytic modelers?
What is “the Cloud”?
“The Cloud” refers to computing services that you can use remotely over the internet, usually – but not always – through a web browser. The term “cloud” in computer networking dates back to at least the mid-1990s, but was popularized in 2006 when Amazon introduced its “Elastic Compute Cloud” service. Inside “the cloud,” there’s a vast array of computers, memory storage, communication channels and more – but you typically don’t have to deal with any of that complexity to use a cloud-based service.
When you run analytics software “on premise,” you are using your own or your company’s computer hardware, network lines, electricity, physical space, and technicians who must maintain that equipment. When you run analytics software “in the cloud,” all those elements are provided as part of the service – though your keyboard input, display output, and some computing still happen on your own computer.
You are most likely to use specific software applications that are “hosted in the cloud” – that’s called SaaS or “Software as a Service.” In a 2017 survey from Okta, the top three general-purpose SaaS offerings were Office 365, Salesforce.com, and Box.com – we’ll discuss analytic SaaS offerings later.
These applications run on “virtual servers” (more on that later) in a public cloud service – referred to as “PaaS” (Platform as a Service) or “IaaS” (Infrastructure as a Service) offerings. In a 2017 survey from Synergy Research, Amazon Web Services was the market leader, followed by Microsoft Azure, Google Cloud Platform, and IBM Cloud.
Cloud Benefits and Drawbacks
Cloud computing is having a huge impact. For most companies, it’s a lot of trouble to purchase, install, and maintain computer equipment – they would rather focus on their own business. Because cloud computing services are effectively “rented” by the hour, day, month, or year, what was once a “capital investment” is now an “operating expense” for many firms. The major public cloud providers reap economies of scale by operating millions of server computers, and competition means that many of these economies are “passed through” to cloud users.
Information security – once thought of as a major drawback of cloud computing, since it relies heavily on the internet – is now emerging as a benefit, when public cloud services are realistically compared to on-premise information systems. In eight major security breaches in 2013-2015, suffered by leading firms such as Target, Home Depot, JP Morgan, and eBay as well as the federal government, every breach occurred in on-premise data centers – not one in a public cloud. Even the CIA has been using a private version of the Amazon cloud for the last three years.
This doesn’t mean you can ever take security for granted! The public cloud services provide leading information security features, but you and your application provider have to use them. Using easily guessed passwords, or the same login and password on multiple systems, makes you the weak link in information security. Paying attention to the “padlock” icon in your browser, using https: (encrypted with SSL, Secure Sockets Layer, or its successor TLS, Transport Layer Security) instead of http: (no encryption), not opening “phishing” emails, and keeping your own antivirus software up to date are other everyday measures you need to take.
Availability is another possible drawback – or benefit – of cloud computing. When internet access is “down,” or if your cloud service provider needs to perform maintenance, you cannot use a public cloud service. But this must be compared to “downtime” on your own equipment, and availability when you are traveling. All in all, cloud computing is a “better idea,” and its impact continues to grow.
How Does it Work?
Cloud computing became feasible because of two key technologies – high-speed data communications and the internet, and virtualization of computing hardware – plus industry standards. People over 50 can probably remember terminals and modems that operated at “1200 baud,” about 120 characters per second; today’s connections carry data 1,000 to 10,000 times faster, even for consumers. IT professionals can remember when a physical computer could run only one operating system, and one or two applications at a time; today, a physical computer, which occupies less than an inch vertically in a “rack,” can run eight to 16 “virtual computers,” each with its own operating system and applications.
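As a rough check on these figures, the short Python calculation below reproduces the “about 120 characters per second” estimate and translates the 1,000x to 10,000x range into modern units. It assumes (these details are not from the article) that a simple early modem sent one bit per symbol, so 1200 baud equals 1200 bits per second, and that each character took 10 bits on the wire: one start bit, eight data bits, and one stop bit.

```python
# Rough arithmetic behind "1200 baud is about 120 characters per second".
# Assumptions: 1 bit per symbol (so 1200 baud = 1200 bit/s) and
# 10 bits per character (start bit + 8 data bits + stop bit).
BAUD_RATE = 1200        # symbols per second
BITS_PER_SYMBOL = 1     # typical of simple early modems
BITS_PER_CHAR = 10      # asynchronous serial framing


def chars_per_second(baud: int, bits_per_symbol: int, bits_per_char: int) -> float:
    """Characters per second for an asynchronous serial link."""
    return baud * bits_per_symbol / bits_per_char


old_bps = BAUD_RATE * BITS_PER_SYMBOL
old_cps = chars_per_second(BAUD_RATE, BITS_PER_SYMBOL, BITS_PER_CHAR)
print(f"1200 baud: {old_cps:.0f} characters per second")

# The article's "1,000 to 10,000 times faster", expressed in megabits per second.
for factor in (1_000, 10_000):
    print(f"{factor:,}x faster: {old_bps * factor / 1e6:.1f} Mbit/s")
```

Under these assumptions the range works out to roughly 1.2 to 12 megabits per second, which is in line with ordinary consumer broadband.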
Virtualization means that your analytic problem – say data mining, optimization, or simulation – can be run in a remote data center, using only a “slice” of the CPU time and memory on a physical computer – and high-speed communications means that results can be brought back to you in seconds.
Where is the Data?
Applications of all kinds – but especially analytics applications – make use of models and data, and both must be accessible to the analytics software, wherever it is running – on your own computer or server, or in the cloud. If your model and data are on your own desktop PC, you can either run the analytics software on that same PC, or transfer them (for example via file upload) to a cloud-based application.
High-speed data communications has made this an easy, everyday experience for many users – who hasn’t used Office 365, Google Docs, Box.com, or Dropbox.com? Excel workbooks and CSV (Comma Separated Value) files are very common ways to store modest-size data sets. And increasingly, large databases and other data sources are maintained online (“in the cloud”), so it’s as easy – or easier – to access such data in a cloud application as in a desktop application. This is true even of “Big Data” – data sets too large to fit in a single traditional database. For example, Frontline Systems, sponsor of this magazine, operates an Apache Spark Big Data cluster on Amazon Web Services, and currently offers free use of this cluster to universities teaching analytics using Frontline’s other tools.
Business Intelligence in the Cloud
Analytics models often use data found in BI or “business intelligence” systems, or in “data warehouses.” These systems collect, and usually summarize, data drawn from day-to-day “transactional” database systems. They may offer a relational, tabular, or multidimensional “view” of the data. In the BI world, it is common to speak of “analytics,” but this usually means “slicing and dicing” and “drilling down into” data – the analysis is usually limited to “sum and group by.” To clarify this, Gartner has begun referring to mathematical methods as “advanced analytics.”
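To make the distinction concrete, here is a minimal Python sketch of the “sum and group by” operation that underlies most BI slicing and dicing. The sales records and field names are hypothetical; a real BI tool would typically run the equivalent of a SQL SUM ... GROUP BY query against a data warehouse rather than loop over records in memory.

```python
from collections import defaultdict

# Hypothetical transactional records, as they might be drawn from an order-entry system.
sales = [
    {"region": "East", "product": "Widgets", "amount": 120.0},
    {"region": "East", "product": "Gadgets", "amount": 80.0},
    {"region": "West", "product": "Widgets", "amount": 200.0},
    {"region": "West", "product": "Widgets", "amount": 50.0},
]


def sum_by(records, *keys, value="amount"):
    """Total `value` over records grouped by the given key fields (like SQL SUM ... GROUP BY)."""
    totals = defaultdict(float)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec[value]
    return dict(totals)


# "Slice" by region, then "drill down" to region and product.
by_region = sum_by(sales, "region")
by_region_product = sum_by(sales, "region", "product")
print(by_region)  # {('East',): 200.0, ('West',): 250.0}
print(by_region_product)
```

Every result here is a total over a group. The mathematical methods Gartner calls advanced analytics, such as forecasting, simulation, or optimization, would use data like this as input to a model rather than stopping at the totals.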
BI systems were traditionally run in on-premise servers, but increasingly they are moving to the cloud. Amazon Web Services offers a highly scalable relational database called Amazon Redshift, and recently introduced a BI service called Amazon QuickSight. Microsoft offers Power BI, a cloud-based service that works with desktop Excel and Power BI Designer. Tableau offers a cloud service called Tableau Online, which complements its Tableau Server and Tableau Desktop products. All of these tools can “slice and dice” and “drill down into” data, and create sophisticated visualizations and dashboards.
Advanced Analytics in the Cloud
With cloud computing services, data increasingly hosted in the cloud, and BI and data visualization tools available in the cloud, it makes sense that analytics tools – for forecasting, data mining, simulation and risk analysis, decision analysis, and mathematical optimization – should move to the cloud as well. Partly because they are more compute-intensive than BI or general office applications, advanced analytics tools have remained “on-premise” for somewhat longer. But the shift is clearly underway.
Microsoft offers a popular cloud-based service called Azure ML for data mining and machine learning. IBM’s offerings include “BigInsights on Cloud” and “IBM Analytics for Apache Spark.” SAS Institute offers SAS Cloud Analytics, and FICO Inc. offers FICO Analytic Cloud to its large customers.
Frontline Systems, sponsor of this magazine, operates AnalyticSolver.com, a cloud-based SaaS offering that is hosted on Microsoft Azure. It includes tools for forecasting, data mining and text mining; Monte Carlo simulation and risk analysis; decision tree analysis; and conventional and stochastic optimization.
Developing Cloud-Based Applications
You might be wondering: How are cloud-based applications developed? What about mobile apps? Can an application developed for a desktop or laptop run in the cloud, or on a mobile device? A complete answer would require a full-length article, but in brief: Developers need new skills to create cloud and mobile apps – but those skills can be learned, and modern developer tools ease the way. Where a desktop or server application typically uses callable libraries through an API (Application Programming Interface), a cloud-based app typically uses services through a REST – Representational State Transfer – API. (Older apps may use a SOAP – Simple Object Access Protocol – API.) Cloud-based APIs for analytics are relatively new, but Microsoft and others offer REST APIs for machine learning, and Frontline Systems’ Rason.com service is a developer portal offering data mining, simulation and optimization REST APIs.
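For readers new to REST, here is a minimal Python sketch of how a cloud app might submit a model to an analytics service: an HTTP POST carrying a JSON body, with the result typically returned as JSON. The endpoint URL, the JSON model format, and the bearer-token header are all hypothetical placeholders, not the actual API of Rason.com, Azure ML, or any other service mentioned here; each provider documents its own interface. The sketch uses Python’s standard urllib to build the request but does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/optimize"


def build_solve_request(model: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a REST call: POST a JSON-encoded model with auth headers."""
    body = json.dumps(model).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
    )


# A tiny hypothetical model format: maximize 3x + 2y subject to x + y <= 4.
model = {
    "variables": {"x": {"lower": 0}, "y": {"lower": 0}},
    "objective": {"sense": "maximize", "expression": "3*x + 2*y"},
    "constraints": {"capacity": "x + y <= 4"},
}
req = build_solve_request(model, token="YOUR-API-TOKEN")
print(req.get_method(), req.full_url)

# Sending it would look like this (requires a real service and credentials):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

The point of the design is that the client needs nothing beyond HTTP and JSON. No vendor library has to be installed on the user’s machine, which is one reason REST suits browser-based and mobile apps.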
What about Desktop Analytic Software?
The advent of cloud-based analytics doesn’t mean that desktop-based analytics is going away – after all, modern desktops and laptops are more powerful and capable than ever, and you certainly can use them to create and solve analytic models. But you don’t have to choose – you can use the best of both.
Chances are good – especially if you work in a large company – that you’re already using both desktop Microsoft Office and cloud-based Office 365. If so, you’ve already seen that it’s easy to work on the same Excel workbooks, Word documents, or PowerPoint presentations in both environments. If you use Google Sheets, you know that you can upload and download Excel workbooks from/to the desktop. Tableau Desktop and Tableau Online, and Power BI Designer and PowerBI.com also interoperate easily.
Frontline Systems’ AnalyticSolver.com cloud service is designed to work easily with its desktop Analytic Solver software for Microsoft Excel – the same forecasting and data mining, simulation and risk analysis, and optimization models work in both versions.
Analytics in the cloud isn’t just coming, it’s here. If you’re reading this magazine, there’s a decent chance that you’ve heard about or tried Solver, Risk Solver, or XLMiner Analysis ToolPak for Excel Online or Google Sheets, XLMiner.com, or AnalyticSolver.com – more than 300,000 users had tried them as of this writing, and the total is rising every day.
You can count on cloud-based analytic software getting more and more capable, and desktop software doing the same. So by all means, use them! If you haven’t already, now’s the time to add cloud-based analytics to your arsenal of tools.