Cloud-Native Data Science Blog

Have you ever wondered what cloud-native actually is? Are you confused about how to use data science to improve your business? Check out these blog posts to help you get started.

Local Jenkins Development Environment on Minikube on OSX

Developing Jenkinsfile pipelines is hard. I think my world record for the number of attempts to get a working Jenkinsfile is around 20. When you have to continually push and run your pipeline on a managed Jenkins instance, the feedback cycle is long. And the primary bottleneck to developer productivity is the length of the feedback cycle.

Scikit Learn to Pandas: Data types shouldn't be this hard

Nearly everyone using Python for Data Science has used or is using the Pandas data analysis and preprocessing library. It is as much of a mainstay as Scikit-Learn. Despite this, one continuing bugbear is that the two use different core data types: pandas.DataFrame and numpy.ndarray. Wouldn’t it be great if we didn’t have to worry about converting DataFrames to NumPy types and back again? Yes, it would. Step forward Scikit Pandas.
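To make the problem concrete, here is a minimal sketch of the manual round trip that a library like Scikit Pandas aims to remove. It uses only pandas and NumPy: `standardise` stands in for a Scikit-Learn transformer, and `apply_to_frame` is a hypothetical helper, not part of either library.

```python
import numpy as np
import pandas as pd


def standardise(values: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance (population std),
    similar to what a Scikit-Learn scaler returns: a bare ndarray."""
    return (values - values.mean(axis=0)) / values.std(axis=0)


def apply_to_frame(df: pd.DataFrame, func) -> pd.DataFrame:
    """Hypothetical helper: run an ndarray -> ndarray function on a DataFrame,
    then restore the column names and index that the conversion throws away."""
    result = func(df.to_numpy(dtype=float))
    return pd.DataFrame(result, columns=df.columns, index=df.index)


df = pd.DataFrame(
    {"age": [20, 30, 40], "income": [1.0, 2.0, 3.0]},
    index=["a", "b", "c"],
)
scaled = apply_to_frame(df, standardise)
print(scaled)
```

Every transformer call needs this kind of bookkeeping, which is exactly the friction described above.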

7 Reasons Why You Shouldn't Use Helm in Production

Helm is billed as “the package manager for Kubernetes”. The goal was to provide a high-level package management-like experience for Kubernetes. This was a goal for all the major containerisation platforms. For example, Apache Mesos has Mesos Frameworks. And given the standardisation on package management at an OS level (yum, apt-get, brew, choco, etc.) and an application level (npm, pip, gem, etc.), this makes total sense, right?

A Comparison of Serverless Frameworks for Kubernetes: OpenFaaS, OpenWhisk, Fission, Kubeless and more

The term Serverless has become synonymous with AWS Lambda. Decoupling from AWS has two benefits: it avoids lock-in and improves flexibility.

Serverless is a misnomer: it is a set of techniques and technologies that abstract away the underlying hardware completely. Obviously these functions still run on “servers” somewhere, but the point is that we don’t care. Developers only need to provide code as a function. Functions are then consumed via an API, usually REST, but also through message bus technologies (Kafka, Kinesis, NATS, SQS, etc.).
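As an illustration of “code as a function”, here is a minimal Python sketch. The `handle(req)` signature is modelled on the handler that OpenFaaS’s classic Python template expects, but treat it as an assumption: each framework compared here defines its own entry-point convention, and the JSON payload shape is made up for the example.

```python
import json


def handle(req: str) -> str:
    """FaaS-style handler: the platform passes in the request body and
    returns this function's result to the caller (e.g. as a REST response)."""
    payload = json.loads(req) if req else {}
    name = payload.get("name", "world")
    return json.dumps({"greeting": f"Hello, {name}!"})


# Locally the function is just a function: no server code is needed to test it.
print(handle('{"name": "Kubernetes"}'))
```

The framework supplies everything else (HTTP server, routing, scaling), which is what makes functions portable between platforms once you adapt the entry point.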

This post compares Serverless frameworks for the Kubernetes platform and recommends one.

How to Test Terraform Infrastructure Code

Infrastructure as code has become a standard paradigm, but infrastructure scripts are often written and run only once. This works for simple infrastructure requirements (e.g. k8s deployments). But when the infrastructure is more varied or must be more resilient, testing the infrastructure code becomes essential. This blog post describes the tools and patterns that a current project has found to deal with this problem.
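One common pattern, offered here as an illustration and not necessarily the one the project uses, is to test the *plan* rather than live infrastructure: save a plan, export it as JSON with `terraform show -json`, and assert properties of the proposed changes. The rule checked below (every `aws_instance` must be tagged) and the function name are hypothetical.

```python
import json


def untagged_instances(plan: dict) -> list[str]:
    """Return addresses of aws_instance resources that the plan would create
    or update without any tags. `plan` is the parsed `terraform show -json` output."""
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue  # deletions and no-ops leave nothing to check
        after = change.get("after") or {}
        if not after.get("tags"):
            offenders.append(rc["address"])
    return offenders


# In CI, something like:
#   terraform plan -out=tfplan && terraform show -json tfplan > plan.json
# then: plan = json.load(open("plan.json"))
example_plan = {
    "resource_changes": [
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["create"], "after": {"tags": {"Name": "web"}}}},
        {"address": "aws_instance.worker", "type": "aws_instance",
         "change": {"actions": ["create"], "after": {"tags": None}}},
    ]
}
print(untagged_instances(example_plan))  # ['aws_instance.worker']
```

Because the check runs against a plan, it is fast and needs no cloud resources, which keeps the feedback loop short.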

Cloud Native Data Science: Best Practices

Following the Cloud Native best practices of immutability, automation and provenance will serve you well in a Cloud Native Data Science (CNDS) project. But working with data brings its own subtle challenges around these themes.
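One way to apply provenance to data, sketched here as an assumption rather than a prescribed method, is to identify each dataset by a hash of its contents and record that hash alongside the code version and parameters used for a run. `fingerprint` and `provenance_record` are hypothetical names; `code_version` would typically be a Git commit hash.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 of a file's bytes: identical data always yields the same ID."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path: Path, code_version: str, params: dict) -> dict:
    """Hypothetical record tying a run to its exact data, code and parameters."""
    return {
        "data_sha256": fingerprint(data_path),
        "code_version": code_version,
        "params": params,
    }


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("x,y\n1,2\n")
    record = provenance_record(data, code_version="abc123", params={"alpha": 0.1})
    print(json.dumps(record, indent=2))
```

Treating data files as immutable and content-addressed means any silent change to the data shows up as a different hash.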

Cloud Native Data Science: Technology

Technology choices in data-driven products are, as you would expect, largely directed by the type and amount of data. The first and most crucial decision to make is whether the data will be processed in a batch or streaming fashion.
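The batch versus streaming distinction shows up even in something as simple as an average. This toy Python sketch, not tied to any particular batch or streaming technology, computes the same mean both ways.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch: all of the data is available up front and processed at once."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming: each record updates a running result as it arrives,
    so memory use stays constant and an answer is always available."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        mean += (x - mean) / count  # incremental update of the running mean
        yield mean


readings = [4.0, 8.0, 6.0, 2.0]
print(batch_mean(readings))            # 5.0
print(list(streaming_mean(readings)))  # [4.0, 6.0, 6.0, 5.0]
```

Both arrive at the same final answer, but the streaming version has to be written as an incremental update, and that constraint carries through to the choice of tools.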

Cloud Native Data Science: Strategy

Data Science has become an important part of many businesses because it can provide a competitive advantage. Very early on, Amazon’s data on book purchases allowed them to deliver personalised recommendations whilst customers were browsing their site. Their main competitor in the US at the time was Borders, who mainly operated in physical stores. This physicality prevented them from seamlessly providing customers with personalised recommendations [1]. This example highlights how strategic business decisions and data science are inextricably linked.

How to List all AMIs for each region in AWS

A current project required a list of Amazon Machine Images (AMIs) for all regions for use in Terraform. I couldn’t find a script to do this, so here you will find one that uses the AWS CLI, jq and a bit of Bash.
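The post’s own script uses Bash and jq; as a complement, here is a Python sketch of just the processing step, assuming the `Images` list from `aws ec2 describe-images` has already been collected for each region. It keeps the newest image per region and renders a Terraform map variable, assuming Terraform 0.12+ syntax. The variable name `amis` and the AMI IDs in the example are made-up placeholders.

```python
def latest_ami(images: list[dict]) -> str:
    """Pick the newest image. CreationDate is an ISO 8601 string,
    so lexical order matches chronological order."""
    newest = max(images, key=lambda image: image["CreationDate"])
    return newest["ImageId"]


def terraform_ami_map(images_by_region: dict[str, list[dict]]) -> str:
    """Render a Terraform variable holding one AMI ID per region."""
    lines = ['variable "amis" {', "  type = map(string)", "  default = {"]
    for region in sorted(images_by_region):
        lines.append(f'    "{region}" = "{latest_ami(images_by_region[region])}"')
    lines += ["  }", "}"]
    return "\n".join(lines)


# Per region, the input could come from e.g.:
#   aws ec2 describe-images --region <region> --owners <owner-id> --output json
# taking the "Images" list from the response.
example = {
    "eu-west-1": [
        {"ImageId": "ami-0aaa", "CreationDate": "2018-01-10T09:00:00.000Z"},
        {"ImageId": "ami-0bbb", "CreationDate": "2018-03-02T09:00:00.000Z"},
    ],
    "us-east-1": [
        {"ImageId": "ami-0ccc", "CreationDate": "2018-02-20T09:00:00.000Z"},
    ],
}
print(terraform_ami_map(example))
```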

Introduction to Monitoring Microservices with Prometheus

Prometheus (https://prometheus.io) is an open source time series database that focuses on capturing measurements and exposing them via an API. I love Prometheus because it is so simple; its minimalism is its greatest feature. It achieves this by pulling metrics from instrumented applications, rather than having applications push metrics to it as many of its competitors do. In other words, Prometheus “scrapes” the metrics from the application.

This means that it works very well in a distributed, cloud-native environment: services are not burdened by load on the monitoring system. This has knock-on effects: high availability (HA) is supported through simple duplication, and scaling is supported through segmentation.
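To make the pull model concrete, here is a stdlib-only Python sketch of the application side: it exposes request counters in Prometheus’ plain-text exposition format on `/metrics`, which Prometheus then scrapes on its own schedule. In practice you would use an official client library rather than hand-writing the format; the metric name `http_requests_total`, the `method` label and port 8000 are illustrative assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Incremented by the application's own request handlers (not shown here).
REQUEST_COUNTS = {"get": 0, "post": 0}


def render_metrics(counts: dict[str, int]) -> str:
    """Render counters in Prometheus' plain-text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'http_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves /metrics. The app never pushes; Prometheus pulls from here."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve it: HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

Because the endpoint only reports in-memory counters, a slow or absent Prometheus server costs the application nothing, which is the property described above.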

Email

web@WinderResearch.com

Registered Address

Winder Research and Development Ltd.,
Adm Accountants Ltd, Windsor House,
Cornwall Road,
Harrogate,
North Yorkshire,
HG1 2PW,
UK

Registration Number

08762077

VAT Number

GB214263735
© Winder Research and Development Ltd. 2016-2018; all rights reserved.