AWS Partner Network (APN) Blog

Three new AWS Training specialty courses now available

AWS Training can help APN Partners deepen their AWS knowledge and skills and better serve customers. Based on feedback from our customers, we are adding three of our most popular event bootcamps to our permanent instructor-led training portfolio. These one-day courses are intended for individuals who want to dive deeper into a specialized topic with an expert trainer. The three new courses are:

Building a Serverless Data Lake: Teaches you how to design, build, and operate a serverless data lake solution with AWS services.
Secrets to Successful Cloud Transformations: Teaches you how to select the right strategy, people, migration plan, and financial management methodology needed when moving your workloads to the cloud. Does not require advanced technical expertise.
Running Container-Enabled Microservices on AWS: Teaches you how to manage and scale container-enabled applications by using Amazon EC2 Container Service (ECS).

You can explore our complete course catalog here, and you can search for a public class near you by logging into the AWS Training and Certification Portal with your APN Portal credentials. APN Partners are eligible for a 20% discount on public AWS Training delivered by AWS. You can also request private onsite training for your team by contacting us.

New AWS Marketplace IoT Discovery Webpage Accelerates IoT Innovation

AWS Marketplace now has an IoT discovery webpage that makes it easier for you to buy IoT software, from popular software vendors, that is integrated with or running on AWS Cloud services. This page features 17 IoT software providers.

IoT is a complex industry represented by connected devices and the data they produce, supported by a variety of interrelated technologies across hardware and software platforms. The IoT value chain consists of several categories, including hardware (sensors, edge devices, gateways), connectivity, cloud and infrastructure, applications, and professional services. The IoT space is growing at a rapid-fire pace, and presents a nearly overwhelming selection for customers who want to find the right products to integrate into their AWS IoT projects. Customers look to AWS Marketplace for IoT software solutions, and the new IoT discovery webpage will help them make sense of the fragmented environment of products and software by placing these services in one easy-to-find location.

AWS Marketplace is a sales channel that software companies use to offer software solutions to AWS customers. You can easily find and buy software as a service (SaaS) products, Amazon Machine Images (AMIs), or AWS CloudFormation template-based software deployments from popular software vendors. The software solutions listed on the IoT discovery webpage integrate with AWS IoT or other AWS services, and are billed to the customer’s AWS account rather than being billed by the vendor.

AWS Marketplace vendors offer over 60 products with IoT use cases, across networking, security, database, business intelligence, and other categories. The AWS Marketplace IoT discovery webpage helps customers select the right products faster by showcasing products within the following subcategories, to reduce the time and resources required to discover, procure, and implement an IoT project:

Edge, gateway, and connectivity: Includes software to manage data ingestion, device certificates/security, edge processing on the gateway, and global connectivity.
Development tools: Offers solutions to help partners and customers build best-in-class applications, reducing the friction developers face today when building IoT applications.
Data analytics and machine learning: Offers solutions to turn data into meaningful information to support business insights and outcomes.

Today’s featured partners who have earned the AWS IoT Competency include Eseye, Bsquare, ThingLogix, Splunk and Bright Wolf.

Pinacl is a Consulting Partner that leveraged the AWS Marketplace IoT selection to deliver IoT services quickly to Newport City Council, in Wales.

“ConnectThing.io on AWS Marketplace made it possible for Pinacl to very quickly launch a smart city proof of concept for Newport that is powered by AWS,” says Mark Lowe, strategic relations director at Pinacl. “If you’re setting up infrastructure the traditional way, in phase one, you have to set up to handle thousands of sensors when you might only want to start with 10. Using ConnecThing.io on AWS meant Newport could start small with very little investment or risk and figure out which projects delivered the most value.”

“Our experience in dealing with industrial IoT deployments across a number of market segments shows data is the primary determinant in achieving the business outcomes our customers seek,” said Dave McCarthy, Bsquare Senior Director of Products. “By making DataV Discover available in AWS Marketplace, businesses can quickly determine IoT use cases that their data will support, thereby reducing risk and maximizing the probability of success.”

Now you can more easily navigate, discover, and purchase the software and services you need to build successful IoT solutions and applications that fuel innovation in your business.

Get started building your IoT solution by visiting: https://aws.amazon.com/mp/iot.

Inside SendGrid’s Expanded Relationship with AWS

In today’s world of technology dependencies, ensuring your application’s messages reach the intended recipients is critical. That’s why some of the largest and most successful software-defined organizations such as Airbnb, Spotify, Uber, and Yelp rely on SendGrid as their customer communication platform for their most valuable email.

Unlike most high-growth tech start-ups, SendGrid was born in the data center. Following the company’s launch in 2009, it built a high-performing and highly scalable private, internal cloud, achieving a cost-efficient price-to-performance ratio and four-hour server deploy times. Despite this, SendGrid chose to aggressively expand its relationship with AWS as its single, strategic cloud partner. Important factors in this decision included the business and marketing advantages of the AWS Partner Network (APN) and AWS Marketplace, and the technology enablement advantages derived from elastic capabilities and managed services. So we are excited to officially announce that SendGrid is joining the AWS family as an APN Partner and customer.


By JR Jasperson, Chief Architect at SendGrid, an APN Technology Partner

I am excited to unveil two related blockbuster projects we’ve been working on for a long time. The first is a robust, expanded relationship with AWS. At SendGrid, we have been running many critical workloads with many cloud providers – including AWS – for many years. However, through our relationship with AWS, we will be deepening and broadening our collaboration and aggressively expanding our footprint with AWS. Below, we’ll share additional details into why, and what this will mean for our customers.

The second announcement is that we’ve been quietly working on a complete re-architecture of our mail processing pipeline for more than a year, and will roll these changes out throughout the remainder of 2017. While worthy of its own blog post, the new architecture was created and intended from day one to lay the groundwork for the SendGrid-AWS relationship. Our new pipeline is truly cloud-native and designed from the ground up to run optimally on AWS. In so doing, we’ve avoided several pitfalls that have beset other mail providers, so it’s helpful to tell both stories to highlight the advantages of AWS that we will be passing along to our customers.

Some Background

When SendGrid was founded in 2009, we created the notion of a transactional Email Service Provider (ESP), sometimes referred to as a Cloud Mail Transfer Agent (MTA). Over the years, we have developed unique expertise in solving the interesting challenges that arise from serving an incredibly high-volume (we surpassed the 1 trillion emails sent mark in March 2017), multi-tenant SaaS email delivery platform. Many of these challenges are unique to this sector, especially when running in a cloud environment such as the AWS Cloud. Experience with single-tenant and/or on-premises solutions is only a fraction of what it takes to be successful as a true Cloud MTA.

At SendGrid we strive to live by our 4 H values: Humble, Honest, Hungry and Happy and to be unwavering champions for our customers. So we’re proud that our customers put more value on our differentiators: ease of use, simple integration, deliverability, proven availability, scalability, and performance—we feel that these numbers speak for themselves.

Why Now?

The adage “success lives at the intersection of preparation and opportunity” immediately comes to mind when considering the question “Why become an APN Partner now?” In our case, preparation meant re-architecting to a cloud-native platform. Two examples immediately stand out to illustrate why we viewed this as a prerequisite.

Email is delivered through a store-and-forward mechanism which naturally implies statefulness in the system. As other “Cloud MTAs” have discovered, stateful compute is a poor fit for the cloud as it erodes many of its advantages. This is why we have decoupled state from our MTAs in the new architecture which allows us to take full advantage of AWS’ compute elasticity.
Regarding IPs: deliverability is a measure of how many emails dispatched by a sender actually arrive in the recipient’s inbox as intended. It is perhaps the most important metric for an Email Service Provider. Deliverability is surprisingly arduous to maintain, and it is one area in which SendGrid especially excels. It depends heavily on the reputation of the specific IP address from which a given email is sent, and it can also be hurt by bleed-over from dubious activity or poor reputation on other IPs in the same subnet. This is why SendGrid controls the entirety of the subnets from which we send, and we have ensured this will always remain the case as we transition additional workloads into AWS. We’ve spent years cultivating our IPs, and the 50,000-plus paying customers who use these IPs are a testament to their viability. Our move to AWS takes into account that reputations are hard to build and easy to destroy, which is why we will retain full control of all of our IPs during and after our move to AWS. Surprisingly, this is not the case with all “Cloud MTAs.”

Customer Advantages

With much of the prep work now behind us, we are well positioned to turbocharge our mission to become the world’s most trusted communication platform; our relationship with AWS has been born not of necessity, but of opportunity. While the most commonly cited benefits of AWS certainly apply in our case, I’ll skip these well-covered facets and instead highlight several that are specific to a Cloud MTA and, most importantly, how we expect our customers to benefit from these advantages.

For years, SendGrid has maintained distributed Points of Presence (PoPs) around the globe to quickly respond to API and SMTP requests. The performance and responsiveness of these PoPs is a function of their proximity to our customers. We will be leveraging AWS’ immense global reach to create additional PoPs in many AWS regions, further reducing latency and increasing throughput for our customers.
Running a high-scale Cloud MTA requires near real-time analysis of a massive flow of data behind the scenes—challenges that require stream processing and machine learning technologies. We manage the infrastructure for these systems today, which naturally require overhead to build and maintain. Leveraging AWS capabilities such as Amazon Kinesis Streams, Amazon EMR, and Amazon Machine Learning will enable us to do more faster, allowing us to spend more time developing features for our customers.
The ability to visualize or deeply analyze facets of messaging campaigns (such as deliverability by recipient domain or click/open rates of A/B tests) is critical to maximizing the value that SendGrid provides to our customers. Supporting and continuously enhancing these capabilities is a nontrivial engineering challenge for SendGrid, due both to our substantial volume (finding and associating discrete events in an enormous sea of data) and to the flexibility needed to meet diverse requirements. AWS offers a number of analytics capabilities, such as Amazon Athena, Amazon Redshift, and AWS Data Pipeline, that we have already begun to explore as more efficient and advanced ways for our customers to use SendGrid to optimize their messaging.
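To give a feel for the kind of analysis described above, here is a minimal sketch that submits an Amazon Athena query computing delivery rate by recipient domain. It is not SendGrid's implementation: the `email_events` table and its `recipient_domain` and `event_type` columns are hypothetical, as are the database name and S3 output location in the usage note. The only real API used is the boto3 Athena client's `start_query_execution`.

```python
# Hypothetical schema: one row per email event, where event_type takes values
# such as 'processed' and 'delivered'. Adjust names to match your own data.
DELIVERABILITY_BY_DOMAIN_SQL = """
SELECT recipient_domain,
       SUM(CASE WHEN event_type = 'delivered' THEN 1 ELSE 0 END) AS delivered,
       SUM(CASE WHEN event_type = 'processed' THEN 1 ELSE 0 END) AS processed,
       CAST(SUM(CASE WHEN event_type = 'delivered' THEN 1 ELSE 0 END) AS DOUBLE)
         / NULLIF(SUM(CASE WHEN event_type = 'processed' THEN 1 ELSE 0 END), 0)
         AS delivery_rate
FROM email_events
GROUP BY recipient_domain
ORDER BY processed DESC
LIMIT 20
"""


def start_deliverability_query(athena_client, database, output_location):
    """Submit the query to Athena and return its execution ID.

    Athena runs queries asynchronously and writes results to output_location
    (an s3:// URI).
    """
    response = athena_client.start_query_execution(
        QueryString=DELIVERABILITY_BY_DOMAIN_SQL,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Usage, assuming AWS credentials are configured: `start_deliverability_query(boto3.client("athena"), "email_analytics", "s3://example-bucket/athena-results/")`. Because the query is asynchronous, you would then poll `get_query_execution` and read rows with `get_query_results`.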

We will be leveraging AWS to allow us to focus on delivering even more value to our customers at an accelerated rate. While we’ve already been working on these projects for quite some time, we know these are just the first steps in an exciting journey. If you haven’t already, come join us and help shape the future of messaging!

Partner SA Roundup – July 2017

This month, Juan Villa, Pratap Ramamurthy, and Roy Rodan from the Emerging Partner SA team highlight a few of the partners they work with. They’ll be exploring Microchip, Domino, and Cohesive Networks.

Microchip Zero Touch Secure Provisioning Kit, by Juan Villa

AWS IoT is a managed cloud platform that enables connected devices to easily and securely interact with cloud applications and other devices. In order for devices to interact with AWS IoT via the Message Queuing Telemetry Transport (MQTT) protocol, they must first authenticate using Transport Layer Security (TLS) mutual authentication. This process involves the use of X.509 certificates on both the devices and in AWS IoT. A certificate contains a public key; the matching private key is not part of the certificate and must be kept separately by its owner. An IoT device needs to store the private key that corresponds to its certificate in order to establish a TLS connection with mutual authentication.
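To make the mutual-authentication step concrete, here is a minimal sketch using only Python's standard `ssl` and `socket` modules. It assumes the device certificate, its private key, and a CA bundle are available as PEM files (the paths are placeholders) and that MQTT over TLS uses port 8883. It stops at the TLS handshake; the MQTT CONNECT exchange would normally come from an MQTT client library or the AWS IoT Device SDK. On a device with a secure element, the private key would stay inside the chip instead of in a file.

```python
import socket
import ssl


def make_mutual_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Build a client-side TLS context for X.509 mutual authentication.

    ca_file:   CA bundle used to verify the server certificate (placeholder path).
    cert_file: the device certificate (PEM) registered with AWS IoT.
    key_file:  the private key matching the device certificate (PEM).
    """
    # PROTOCOL_TLS_CLIENT verifies the server certificate and hostname by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        context.load_verify_locations(cafile=ca_file)
    else:
        context.load_default_certs()
    if cert_file and key_file:
        # Presenting our own certificate is the "mutual" half of the handshake.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def open_tls_connection(endpoint, context, port=8883):
    """Open a TLS connection to an AWS IoT endpoint; MQTT framing is not handled here."""
    raw_sock = socket.create_connection((endpoint, port), timeout=10)
    return context.wrap_socket(raw_sock, server_hostname=endpoint)
```

Usage, with your account's AWS IoT endpoint and your own files substituted: `ctx = make_mutual_tls_context("root-ca.pem", "device.pem.crt", "private.pem.key")`, then `tls_sock = open_tls_connection("<your-endpoint>", ctx)`.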

Private keys can be somewhat difficult to store securely on IoT devices. It’s easy to simply store data on a device’s local memory, but this is not enough to protect the key from tampering. It’s quite easy, and affordable, to purchase the necessary hardware to read the content of the memory from most microcontrollers and memory components used on IoT devices. This means that private keys used for authentication and establishing trust need to be stored in a secure manner.

This is where a secure element chip comes in! Microchip, an APN Advanced Technology Partner, is a silicon manufacturer that makes a secure element chip called the ATECC508A. This chip has a hardware-based secure key storage mechanism that is tamper-proof. In fact, once a key is stored in the ATECC508A, it cannot be read back out. Because the key never leaves the chip, the ATECC508A performs the cryptographic operations itself, using hardware-based cryptographic acceleration that makes them fast and power-efficient. When considering the ATECC508A for your product, keep in mind that Microchip can preload certificates on the secure element during manufacturing, before delivery. Combining this feature with AWS IoT’s support for custom certificate authorities and just-in-time registration can streamline device provisioning and security.
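With just-in-time registration, the first connection from a device whose certificate was signed by your registered CA produces a certificate in `PENDING_ACTIVATION` state, and AWS IoT publishes a notification on `$aws/events/certificate/registered/<caCertificateId>`. A common pattern is an AWS IoT rule that forwards that message to an AWS Lambda function, which activates the certificate and attaches a policy. The sketch below shows that function under stated assumptions: the policy name `ExampleDevicePolicy` is hypothetical and must already exist, and a production version would also validate the certificate before activating it. It uses the real boto3 IoT calls `update_certificate` and `attach_policy`.

```python
import os


def activate_jitr_certificate(event, iot_client=None, policy_name="ExampleDevicePolicy"):
    """Activate a device certificate reported by AWS IoT just-in-time registration.

    `event` is the JITR notification forwarded by an AWS IoT rule; it carries
    certificateId and awsAccountId. `policy_name` is a hypothetical,
    pre-created AWS IoT policy.
    """
    if iot_client is None:
        import boto3  # imported lazily so the function can be exercised with a stub
        iot_client = boto3.client("iot")

    certificate_id = event["certificateId"]
    # Lambda sets AWS_REGION; the fallback is only for running outside Lambda.
    region = os.environ.get("AWS_REGION", "us-east-1")
    certificate_arn = "arn:aws:iot:{}:{}:cert/{}".format(
        region, event["awsAccountId"], certificate_id
    )

    # Move the certificate from PENDING_ACTIVATION to ACTIVE.
    iot_client.update_certificate(certificateId=certificate_id, newStatus="ACTIVE")
    # Grant the device its permissions by attaching a policy to the certificate.
    iot_client.attach_policy(policyName=policy_name, target=certificate_arn)
    return certificate_arn


def lambda_handler(event, context):
    return activate_jitr_certificate(event)
```

The device's first connection attempt fails while the certificate is pending, so device firmware should be prepared to reconnect once activation completes.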

To make this secure element chip easy for you to try out, Microchip makes an evaluation kit called the Zero Touch Secure Provisioning Kit. This kit includes a SAM G55 Cortex-M4 microcontroller, the ATECC508A secure element, and an ATWINC1500 power-efficient 802.11 b/g/n module, and comes with instructions on how to get started with AWS IoT. With this combination of silicon products you can begin testing and developing your next IoT product in a secure fashion.

Before you work on your next IoT project, I recommend that you consider a secure element in your design. For more information on ATECC508A, please read the datasheet on the Microchip website.

Domino Data Science Platform, by Pratap Ramamurthy

Machine learning, artificial intelligence, and predictive analytics are all data science techniques. Data scientists analyze data, search for insights that can be extracted, and build predictive models to solve business problems. To help data scientists with these tasks, a new set of tools, like Jupyter notebooks, as well as a wide variety of software packages ranging from deep learning neural network frameworks, like MXNet, to CUDA drivers, are becoming popular. Data science as a field is growing rapidly as companies increase their reliance on these new technologies.

However, supporting a team of data scientists can be challenging. They need access to different tools and software packages, as well as a variety of servers connected to the cloud. They want to collaborate by sharing projects, not just code or results. They want to be able to publish models with minimal friction. While data scientists want flexibility, companies need to ensure security and compliance. Companies also need to understand how resources like data and compute power are being used.

Domino, an APN Advanced Technology Partner, solves these challenges by providing a convenient platform for data scientists to spin up interactive workspaces using the tools that they already know and love, such as Jupyter, RStudio, and Zeppelin, as well as commercial languages like SAS and MATLAB, as seen in the diagram below.

Image used with permission

In the Domino platform, users can run experiments on a wide variety of instances that mirror the latest Amazon EC2 options provided by AWS, as seen in the screenshot. Customers can run a notebook on instances with up to 2 TB of RAM with the Amazon EC2 X1 instance family. If more computational power is needed, you can switch the same notebook to GPU instances or connect it to a Spark cluster.

Because the software used for data science and machine learning has several layers, and new software technologies are introduced and adopted rapidly, the data science environment is often difficult to deploy and manage. Domino solves this problem by storing the notebooks, along with the software dependencies, inside a Docker image. This allows the same code to be rerun consistently in the future. There is no need to manually reconstruct the software, and this saves valuable time for data scientists.
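Domino captures the whole environment in a Docker image. A much lighter, Python-only analogue of the same idea is to record the exact version of every installed package next to the notebook so the environment can be rebuilt later. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). It is not how Domino works internally, and it does not capture system libraries or drivers such as CUDA, which is exactly the gap a container image fills.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            lines.add("{}=={}".format(name, dist.version))
    return sorted(lines, key=str.lower)
```

Usage: write `"\n".join(pinned_requirements())` to a lock file alongside the notebook, then recreate the environment later with `pip install -r` on that file.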

Domino helps data scientists share and collaborate. Domino has seamlessly introduced the software development concepts of code sharing, peer review, and discussion into the data science platform.

For companies that have not yet started their cloud migration, Domino on AWS makes data science an excellent first project. Domino runs entirely on AWS and integrates well with many AWS services. Customers who have stored large amounts of data in Amazon S3 can easily access it from within Domino. After training their models on this data, they can deploy their machine learning model into AWS with the click of a button and, within minutes, access it using an API. All of these features help data scientists focus on data science and not the underlying platform.
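As a generic illustration of reading S3 data from a notebook, here is a minimal boto3 sketch; it is not a Domino-specific API. It assumes the notebook environment has AWS credentials with read access and that the object is a UTF-8 CSV file small enough to load into memory; the bucket and key names in the usage note are placeholders.

```python
import csv
import io


def load_csv_from_s3(s3_client, bucket, key):
    """Read a CSV object from Amazon S3 and return its rows as dictionaries."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # The body is a stream of bytes; decode it before handing it to the csv module.
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Usage: `rows = load_csv_from_s3(boto3.client("s3"), "example-training-data", "churn/train.csv")`. For very large datasets, a streaming or columnar approach would be more appropriate than reading the whole object at once.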

Today, the Domino Data Science Platform is available as a SaaS offering at the Domino website. Additionally, if you prefer to run the Domino software in your own virtual private cloud (VPC), you can install the supporting software by using an AWS CloudFormation template that will be provided to you. If you prefer a dedicated VPC setting, Domino also offers a managed service, which runs the Data Science Platform in a separate VPC. Before considering those options, get a quick feel for the platform by signing up for a free trial.

Cohesive Networks, by Roy Rodan

Many AWS customers have a hybrid network topology where part of their infrastructure is on premises and part is within the AWS Cloud. Most IT experts and developers aren’t concerned with where the infrastructure resides—all they want is easy access to all their resources, remote or local, from their local networks.

So how do you manage all these networks as a single distributed network in a secure fashion? The configuration and maintenance of such a complex environment can be challenging.

Cohesive Networks, an APN Advanced Technology Partner, has a product called VNS3:vpn, which helps alleviate some of these challenges. The VNS3 product family helps you build and manage a secure, highly available, and self-healing network between multiple regions, cloud providers, and/or physical data centers. VNS3:vpn is available as an Amazon Machine Image (AMI) in AWS Marketplace, and can be deployed on an Amazon EC2 instance inside your VPCs.
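Any EC2 instance that routes traffic not addressed to itself, such as a VPN or router appliance, generally needs its source/destination check disabled, and VPC route tables need a route that sends the remote network's traffic to it. The sketch below shows those two generic EC2 steps with the real boto3 calls `modify_instance_attribute` and `create_route`. The instance ID, route table ID, and CIDR are placeholders, and Cohesive Networks' own configuration guides remain the authority on VNS3-specific setup.

```python
import ipaddress


def prepare_routing_instance(ec2_client, instance_id, route_table_id, remote_cidr):
    """Let an EC2 instance forward traffic and route a remote network through it.

    instance_id:    the EC2 instance running the router/VPN appliance.
    route_table_id: the VPC route table used by subnets that must reach remote_cidr.
    remote_cidr:    address range of the remote site or peered network.
    """
    # Fail fast on a malformed CIDR (raises ValueError) before touching AWS.
    remote_cidr = str(ipaddress.ip_network(remote_cidr))

    # By default EC2 drops packets whose source or destination is not the
    # instance itself; a router or VPN gateway must turn that check off.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        SourceDestCheck={"Value": False},
    )
    # Send traffic destined for the remote network to the appliance.
    ec2_client.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=remote_cidr,
        InstanceId=instance_id,
    )
```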

One of the interesting features of VNS3 is its ability to create meshed connectivity between multiple locations and run an overlay network on top. This effectively creates a single distributed network across locations by peering several remote VNS3 controllers.

Here is an example of a network architecture that uses VNS3 for peering:

The VNS3 controllers act as six machines in one, to address all your network needs:

Router
Switch
SSL/IPsec VPN concentrator
Firewall
Protocol redistributor
Extensible network functions virtualization (NFV)

The setup process is straightforward and well-documented with both how-to videos and detailed configuration guides.

Cohesive Networks also provides a web-based monitoring and management system called VNS3:ms, which runs on a separate server and lets you update your network topology, fail over between VNS3 controllers, and monitor the performance of your network and instances.

See the VNS3 family offerings from Cohesive Networks in AWS Marketplace, and start building your secured, cross-connected network. Also, be sure to head over to the Cohesive Networks website to learn more about the VNS3 product family.

APN Partner Webinar Series – APN Partners in DevOps, Security, Storage, and Big Data

Learn how businesses take advantage of AWS technologies to solve business problems in this series of upcoming webinars. They feature a broad selection of live, online presentations at varying technical levels that cover a wide range of topics, including Migration, Storage, Security, Big Data, and DevOps.

Each webinar is hosted by an AWS Solutions Architect and an AWS Competency Partner, going into a technical deep dive with live demonstrations, customer examples, and expert Q&A sessions. Join one of the webinars below to learn more!

DevOps Webinars

New Relic: New Relic helped MLBAM utilize the scalability of AWS and the visibility provided by New Relic to create the “gold standard” for digital streaming video infrastructure.

Register for Upcoming Webinar: July 25th, 2017 | 11am-12pm PT

Security Webinars

Splunk: Splunk offers a leading platform for Operational Intelligence, enabling AWS users to look closely at machine data and gain actionable insights that can help make your organization more productive, profitable, competitive, and secure.

Register for Upcoming Webinar: July 26th, 2017 | 9am-10am PT

Threat Stack: Threat Stack helps organizations detect possible intrusions in seconds and protect against data loss with continuous visibility.

Register for Upcoming Webinar: August 3rd, 2017 | 10am-11am PT

Trend Micro: Trend Micro Deep Security on AWS provides organizations with an automated, scalable solution with maximum workload security.

Register for Upcoming Webinar: August 16th, 2017 | 10am-11am PT

Sophos: Sophos UTM on Amazon Web Services (AWS) provides organizations with an all-in-one security solution that enables them to easily enforce usage policies, control outbound access, filter content, defend against malware, and more.

Register for Upcoming Webinar: August 17th, 2017 | 10am-11am PT

F5: F5 BIG-IP Virtual Edition addresses these three areas while also optimizing application traffic throughout your network. When paired with AWS, F5 BIG-IP VE enables customers to enhance security while balancing and managing comprehensive application traffic.

Register for Upcoming Webinar: August 23rd, 2017 | 10am-11am PT

Storage Webinars

CTERA: The CTERA Enterprise File Services Platform for Amazon Simple Storage Service (Amazon S3) provides enterprise IT-as-a-Service organizations and public cloud service providers a quick, uncomplicated way to leverage their cloud storage to launch a variety of storage-as-a-service offerings.

Register for Upcoming Webinar: July 25th, 2017 | 10am-11am PT

NetApp: NetApp ONTAP Cloud is a data management solution that provides protection, visibility, and control for your cloud-based workloads in a hybrid cloud environment.

Register for Upcoming Webinar: July 26th, 2017 | 10am-11am PT

Panzura: Using Panzura Freedom, enterprises can transition from an expensive and unwieldy traditional storage model to a simple, economical, and secure hybrid cloud storage infrastructure.

Register for Upcoming Webinar: August 2nd, 2017 | 10am-11am PT

Avere: Avere offers an agile, enterprise hybrid cloud platform that lets you leverage cloud compute, storage, or both and keep data where it makes the most sense—without lofty storage costs, latency, and security concerns.

Register for Upcoming Webinar: August 3rd, 2017 | 8am-9am PT

Big Data Webinars

Tableau, Matillion, 47Lining, NorthBay: This IPC highlights a newly developed Quick Start with Tableau, as well as Jumpstart consulting offers from 47Lining and NorthBay.

Register for Upcoming Webinar: August 22nd, 2017 | 10am-11am PT

Automation in the Cloud

This post continues our MSP Partner Spotlight series from last week’s post, Unlocking Hybrid Architecture’s Potential with DevOps. Automation is another critical capability for next-generation Managed Service Providers (MSPs). Automation that incorporates elements such as configuration templates, code deployment automation, and self-healing infrastructure reduces the need for manual intervention, the potential for errors, and operating costs for MSPs. This week we hear from Cloudreach (APN Premier and MSP Partner, with numerous AWS Competencies) about their perspective on the value of automation in the cloud.


By: Neil Stewart, Cloud Systems Engineer, Cloudreach

Before my life at Cloudreach, my understanding of many relevant technologies and terms was non-existent. I was inspired by a recent Cloudreach blog post about our placement as a Leader in the Gartner Managed Services Magic Quadrant, as well as the blog post about the flexibility of working here, and together they got me thinking about my experience so far and how things have progressed.

I joined Cloudreach fresh out of university in May 2014. From there I was given the opportunity to show what I could do, with a little time and bright people around me to learn from. I quickly began to learn the tricks of the trade when working in the cloud and, more importantly, when working in a managed services environment such as a Cloud Operations team. I learned to do a variety of things that were totally new to me, such as navigating and using Linux, diagnosing a Microsoft SQL Server mirroring setup, and writing my first Ruby script to delete old AMIs in AWS. I learned to appreciate the command line over the GUI, and how much you can do with code and scripting. That leads me to the point of this post.

Automating all the things

I love automation. I have smart lights, smart speakers, and a smart kettle that all have automation involved at home. It can be as simple as turning a light on when I walk into a room or boiling a kettle in the morning when I wake up. Automation is fantastic.

While automation in my personal life is fun, efficient, useful, and awesome, automation in the cloud, especially from a cloud operations perspective, is essential. For example, rebooting a single instance after an update is fine the first time, but doing it more than 30 times is painful!
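Rebooting 30 instances is exactly where a few lines of boto3 pay off. A minimal sketch, assuming the instances to reboot share a tag (the `PatchGroup`/`web` tag in the usage note is hypothetical): it pages through `describe_instances`, keeps only running instances, and reboots them all with a single `reboot_instances` call.

```python
def reboot_instances_by_tag(ec2_client, tag_key, tag_value, list_only=False):
    """Reboot every running instance carrying the given tag; return their IDs.

    Set list_only=True to see which instances would be rebooted without
    rebooting them.
    """
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[
            {"Name": "tag:{}".format(tag_key), "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids and not list_only:
        # One API call requests all the reboots; EC2 performs them asynchronously.
        ec2_client.reboot_instances(InstanceIds=instance_ids)
    return instance_ids
```

Usage: `reboot_instances_by_tag(boto3.client("ec2", region_name="eu-west-1"), "PatchGroup", "web")`.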

I love to approach these requests with a “Let’s automate that” frame of mind. One example of automation we often use at Cloudreach is a script, run on a fresh AWS account, that identifies the default VPC in every region and deletes its associated resources as well as the VPC itself. Sounds simple? It is. However, as AWS adds more regions, this task takes longer. Repeat that across lots of new customer accounts and… you get where this is going.

Writing some code to perform a task like this is not difficult; when you approach other tasks in this way, it only becomes easier. Consider the example below:

import boto3

# Any one region can be used to list the regions enabled for this account.
client = boto3.client('ec2', region_name='eu-west-1')
regions = [region['RegionName'] for region in client.describe_regions()['Regions']]

for region in regions:

    print("Finding VPCs in {}".format(region))
    client = boto3.client('ec2', region_name=region)
    vpcs = client.describe_vpcs()['Vpcs']

    # Each region has at most one default VPC.
    default_vpc = [x for x in vpcs if x['IsDefault']]

    if default_vpc:
        vpc_id = default_vpc[0]['VpcId']
        print("Found Default VPC {}".format(vpc_id))

        delete = input("Would you like to delete {}? (Y/N) ".format(vpc_id)).strip().lower()

        if delete == 'y':
            print("Deleting {}".format(vpc_id))

            subnets = [x['SubnetId'] for x in client.describe_subnets(
                Filters=[{
                    'Name': 'vpc-id',
                    'Values': [vpc_id]
                }]
            )['Subnets']]

            internet_gateways = [x['InternetGatewayId'] for x in client.describe_internet_gateways(
                Filters=[{
                    'Name': 'attachment.vpc-id',
                    'Values': [vpc_id]
                }]
            )['InternetGateways']]

            # An internet gateway must be detached from the VPC before it can be deleted.
            for internet_gateway in internet_gateways:
                client.detach_internet_gateway(
                    VpcId=vpc_id,
                    InternetGatewayId=internet_gateway
                )
                client.delete_internet_gateway(
                    InternetGatewayId=internet_gateway
                )

            # Subnets must go before the VPC itself. The default route table,
            # network ACL, and security group are deleted along with the VPC.
            for subnet in subnets:
                client.delete_subnet(
                    SubnetId=subnet
                )

            client.delete_vpc(
                VpcId=vpc_id
            )

        else:
            print("Not Deleting {}".format(vpc_id))

    else:
        print("No Default VPC found in {}".format(region))

OK, this could go on for pages, but you get the idea. Easy, right? There are a lot of improvements you could make to this, but in its simplest form, this is a great example of automating a small and simple task that you don’t need to do manually. Lovely.

Automation at an MSP Level

Simple scripts are great. The power of automation in an MSP environment really shines through when you have lots of these simple scripts that all trigger and run when they need to. This is the difference between working on simple and small environments versus the management and monitoring of multiple large-scale, growing, and sophisticated environments. As our customers shift towards highly scalable and serverless applications and away from more monolithic architecture, automation is less “nice to have” and more “you had better get on the wagon before the wagon runs you over.”

Looking at this in a more real-world sense, let’s imagine we have some applications running in the cloud that we want to apply automation to.

Backup taking and retention

Backups and retention of backups is automation 101. We need to be able to back up servers that are not stateless, such as database servers. This can be as simple or as sophisticated as you like. Implementing something like AWS Lambda and an Amazon CloudWatch event to trigger a backup function as often as needed is simple. A function that generates a list of required instances to be backed up, and then fires a process to back each of them up in parallel, is more effective.
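The "more effective" version described above can be sketched as a Lambda function on a scheduled CloudWatch Events rule. It lists instances that opt in through a tag and requests EBS snapshots of their volumes in parallel. The `Backup=true` opt-in tag and the `CreatedBy=auto-backup` marker tag are hypothetical conventions. Snapshots taken this way are crash-consistent; database servers usually need an application-level step (such as flushing or freezing writes) to get application-consistent backups.

```python
from concurrent.futures import ThreadPoolExecutor

BACKUP_TAG_KEY = "Backup"  # hypothetical opt-in tag on instances
BACKUP_TAG_VALUE = "true"
CREATED_BY_TAG = {"Key": "CreatedBy", "Value": "auto-backup"}  # hypothetical marker


def find_volumes_to_back_up(ec2_client):
    """Return (instance_id, volume_id) pairs for every EBS volume on opted-in instances."""
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "tag:" + BACKUP_TAG_KEY, "Values": [BACKUP_TAG_VALUE]}]
    )
    pairs = []
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                for mapping in instance.get("BlockDeviceMappings", []):
                    if "Ebs" in mapping:
                        pairs.append((instance["InstanceId"], mapping["Ebs"]["VolumeId"]))
    return pairs


def snapshot_volume(ec2_client, instance_id, volume_id):
    """Request a snapshot of one volume, tagged so a retention job can find it."""
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description="Automated backup of {} from {}".format(volume_id, instance_id),
        TagSpecifications=[{"ResourceType": "snapshot", "Tags": [CREATED_BY_TAG]}],
    )
    return response["SnapshotId"]


def lambda_handler(event, context, ec2_client=None):
    """Entry point for a scheduled CloudWatch Events rule."""
    if ec2_client is None:
        import boto3
        ec2_client = boto3.client("ec2")
    pairs = find_volumes_to_back_up(ec2_client)
    # Issue the snapshot requests in parallel; boto3 low-level clients are thread-safe.
    with ThreadPoolExecutor(max_workers=8) as pool:
        snapshot_ids = list(pool.map(lambda pair: snapshot_volume(ec2_client, *pair), pairs))
    return {"snapshots": snapshot_ids}
```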

Retention of backups is an important part of this solution too. This can be another AWS Lambda function, configured to run daily, that checks every backup already taken to determine whether it has passed its use-by date. If it has, the function deletes it.
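
A retention job along these lines could look like the Python sketch below. It assumes that automated snapshots carry a hypothetical `CreatedBy=automated-backup` tag and that one 14-day retention period applies to all of them. Both values are placeholders for your own policy. The age check is a pure function, so it can be tested without calling AWS.

```python
"""Sketch: delete backup snapshots that have passed a retention window.

Assumptions (hypothetical): automated snapshots are tagged
CreatedBy=automated-backup, and a single retention period applies to all.
"""
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 14  # hypothetical policy


def expired_snapshots(snapshots, now, retention_days=RETENTION_DAYS):
    """Return IDs of snapshots whose StartTime is older than the window.

    `snapshots` has the shape of the "Snapshots" list returned by EC2
    DescribeSnapshots: each item has a SnapshotId and a timezone-aware StartTime.
    """
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


def purge_expired(ec2, now=None):
    """Delete expired automated snapshots owned by this account."""
    now = now or datetime.now(timezone.utc)
    paginator = ec2.get_paginator("describe_snapshots")
    pages = paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:CreatedBy", "Values": ["automated-backup"]}],
    )
    deleted = []
    for page in pages:
        for snapshot_id in expired_snapshots(page["Snapshots"], now):
            ec2.delete_snapshot(SnapshotId=snapshot_id)
            deleted.append(snapshot_id)
    return deleted
```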

Without much effort, you can have a quick and simple backup solution in place: once it is set up, no manual work is required, and it scales. You could tie all of this together with Amazon API Gateway and a Describe function, and you have a new API for taking and reporting on backups.

At Cloudreach, we work with customers to implement backup solutions that work within their requirements. This might take shape as AWS Lambda functions as explained above, as third-party products, or custom solutions developed for the customer. Within the Cloud Operations team, we also use in-house tools that allow us to easily automate backups and deal with retention too.

Security Compliance

Automation and security are a perfect match. Where you enable automation within security can vary greatly. A great example of this in place would be security group auditing.

Keeping your resources secure in AWS is important, and there are plenty of ways to do it. Security groups and their rules are one of the simplest but also one of the most powerful security features in AWS, and an important layer to control. Whether it is accidentally leaving remote access open to any IP address, or a developer opening access from a coffee shop IP address so they can work more easily, these situations are not just bad; they may also violate security policies and compliance standards.

These are both examples of where we can automate to mitigate.

Cloudreach has helped customers implement functionality that alerts and reports on security group changes. We can restrict users through IAM so that security group creation has to go through an approval process. This works well but can be time-consuming to implement. More simply, we can implement an AWS Lambda function that is triggered, through AWS Config or Amazon CloudWatch Events, each time a security group is created or changed. Once triggered, the function inspects that security group and checks whether the ports and sources in each rule are valid, possibly against a configuration file in Amazon S3 or an Amazon RDS table of allowed IPs and sources. If a rule is not allowed, the function removes it, whether the rule was added to an existing group or included when the group was created.
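
The core of such a function might look like the Python sketch below. The allowlist is hardcoded here as a hypothetical port-to-CIDR mapping; in practice it would be loaded from Amazon S3 or Amazon RDS as described. The sketch checks only IPv4 `IpRanges`. The rule shapes follow the EC2 `IpPermissions` structure, and the revoke step uses the real `revoke_security_group_ingress` call. Everything else, including the policy itself, is illustrative.

```python
"""Sketch: find and revoke security group ingress rules not on an allowlist.

Assumptions (hypothetical): the allowlist maps a port to the IPv4 networks
allowed to reach it; IPv6 ranges and prefix lists are not checked.
"""
import ipaddress

ALLOWED = {  # hypothetical policy: port -> allowed source networks
    443: ["0.0.0.0/0"],
    22: ["10.0.0.0/8"],
}


def is_allowed(port_from, port_to, cidr, allowed=ALLOWED):
    """A rule passes only if every port in its range permits the source CIDR."""
    source = ipaddress.ip_network(cidr)
    for port in range(port_from, port_to + 1):
        nets = [ipaddress.ip_network(n) for n in allowed.get(port, [])]
        if not any(source.subnet_of(n) for n in nets):
            return False  # stop at the first port that does not permit this source
    return True


def violations(ip_permissions, allowed=ALLOWED):
    """Return IpPermissions entries (one CIDR each) that break the policy."""
    bad = []
    for perm in ip_permissions:
        if perm["IpProtocol"] == "-1":  # all protocols and all ports
            port_from, port_to = 0, 65535
        else:
            port_from, port_to = perm["FromPort"], perm["ToPort"]
        for ip_range in perm.get("IpRanges", []):
            if not is_allowed(port_from, port_to, ip_range["CidrIp"], allowed):
                entry = {"IpProtocol": perm["IpProtocol"],
                         "IpRanges": [{"CidrIp": ip_range["CidrIp"]}]}
                if perm["IpProtocol"] != "-1":
                    entry["FromPort"], entry["ToPort"] = port_from, port_to
                bad.append(entry)
    return bad


def enforce(ec2, group_id, allowed=ALLOWED):
    """Revoke offending rules from one group and return what was removed."""
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    bad = violations(group["IpPermissions"], allowed)
    if bad:
        ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=bad)
    return bad
```

The list returned by `enforce` is what you would publish to Amazon SNS or send to a logging tool as the record of the "breach".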

Either way, we can report on the “breach” through something like Amazon SNS or a logging tool such as Splunk. Most importantly, the time spent in violation of security policy is reduced to seconds, rather than lasting until an alert is triggered and investigated by an engineer.

Code deployments

Introducing a CI/CD platform that integrates with your source control system is an awesome way to bring automation into your development cycle. This is an exciting area to get involved in. An effective and deep pipeline integration can enable your team to push minor code changes to dev/pre-prod, and can also be expanded to full-on deployments to production.

Cloudreach helps customers manage their CI/CD pipelines by working with them to ensure the infrastructure behind the scenes is running as it should and, if issues arise, they are resolved. We also work with customers from very early on in a cloud enablement or agile ops project to figure out where we can incorporate CI/CD automation as well as how they can manage the risk of moving to automated deployments. We encourage our customers to keep this in mind from day one and push the subject as a must-have rather than a nice-to-have.

AWS Infrastructure changes

Similar to code deployments, infrastructure changes paired with Jenkins and a source control system are powerful and fast.

Here you want to look at using AWS CloudFormation as much as possible; we recommend adopting Sceptre, Cloudreach’s open-source tool for AWS CloudFormation template development and deployment. It has commands that can be used in the testing, approval, and deployment of new and updated infrastructure in AWS.

This setup is useful for changes to sensitive resources, such as IAM, Security Groups or VPC components. With a CD pipeline in place, you can restrict changes to these resources to only people who are allowed and only to changes that pass a set of standards and approval.
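
One way to enforce that restriction is a pipeline gate that inspects each template before deployment and routes any change touching sensitive resource types to a manual approval step. The Python sketch below assumes the pipeline has already loaded the CloudFormation template into a dict. The list of "sensitive" type prefixes is a hypothetical policy. This is not a Sceptre feature.

```python
"""Sketch: a pipeline gate that flags templates touching sensitive resources.

Assumptions (hypothetical): templates are loaded into a Python dict before
deployment, and any resource whose type starts with one of these prefixes
needs manual approval. Prefix matching also catches related types such as
AWS::EC2::SecurityGroupIngress and AWS::EC2::VPCGatewayAttachment.
"""
SENSITIVE_PREFIXES = (
    "AWS::IAM::",
    "AWS::EC2::SecurityGroup",
    "AWS::EC2::VPC",
    "AWS::EC2::Subnet",
    "AWS::EC2::RouteTable",
    "AWS::EC2::InternetGateway",
)


def sensitive_resources(template):
    """Return the sorted logical IDs of resources with a sensitive type."""
    return sorted(
        logical_id
        for logical_id, resource in template.get("Resources", {}).items()
        if resource.get("Type", "").startswith(SENSITIVE_PREFIXES)
    )


def requires_approval(template):
    """True if the template changes anything the policy treats as sensitive."""
    return bool(sensitive_resources(template))
```

A Jenkins stage could call `requires_approval` and, when it returns True, pause for an approver instead of deploying straight through.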

Moving on

I hope it has been helpful to see how you can easily automate some key areas of working in the cloud. Focusing on automation helps deliver financial, security, and innovation benefits to a business and its teams. Pipelines allow you to control how changes are implemented and in which environments, keeping things secure and keeping costs down when you need to roll back a change that went wrong. Imagine the revenue that could be lost if a manually deployed production change caused your application to fail. From an innovation perspective, automating tasks allows your teams to focus on more challenging and exciting work, such as improving application features or fixing bugs that may have been overlooked while teams were stretched thin by tedious and often boring tasks.

Hopefully, this post will encourage you to implement automation in your cloud environments or at least look into how automation can help your business work more effectively in the cloud. At Cloudreach, automation is fundamental to working successfully and keeping up with the pace of change in the cloud. We’d love to hear some examples of how automation has been implemented by others and also hear your thoughts on where you think automation could be seen next.

Neil Stewart

Cloud Systems Engineer

Cloudreach

Financial Services meets FRTB Regulations using AWS

Learn how Financial Services partners like IHS Markit are using AWS to address financial regulatory requirements for financial risk management.

FRTB Regulation

New Basel financial risk regulations released in 2016 are transforming global bank risk systems. In January 2016, the Basel Committee on Banking Supervision published the Fundamental Review of the Trading Book (FRTB) standards, which introduced a new approach to evaluating risk across asset holdings, and will be mandated by 2020.

The FRTB regulation establishes a new expected shortfall (ES) measurement of risk under stress, which replaces previous value-at-risk (VaR) based models for internally modeled assets. This ES measurement is expected to require a tenfold increase in calculations, in part due to the requirement that risk factor shifts be assessed over multiple time horizons based on their liquidity. Furthermore, FRTB mandates daily risk attribution and back-testing requirements as well as intra-day monitoring of risk exposures that may put pressure on existing platforms as computational needs increase.
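
To show where the extra calculations come from, here is a simplified Python sketch of the liquidity-horizon cascade as we read the January 2016 FRTB text. The approach uses a 97.5% expected shortfall over a 10-day base horizon T for the full portfolio. It then adds ES recomputed with only the risk factors whose liquidity horizon is at least LH_j shocked, each term scaled by sqrt((LH_j − LH_{j−1}) / T), and combines the terms as a square root of a sum of squares. Several elements are assumptions made for illustration: the historical-simulation estimator, the function names, and the omission of stress-period calibration and the other parts of the full rules. Verify all of these against the rule text before relying on them.

```python
"""Simplified sketch of FRTB-style expected shortfall with liquidity horizons.

Illustrative only: historical-simulation ES at 97.5% and the liquidity-horizon
cascade, without stress calibration or the other elements of the full rules.
"""
import math

LIQUIDITY_HORIZONS = [10, 20, 40, 60, 120]  # days; the first entry is the base horizon T
BASE_HORIZON = 10


def expected_shortfall(pnl, alpha=0.975):
    """Historical ES: mean of the worst ceil(n * (1 - alpha)) losses (one simple estimator)."""
    losses = sorted((-x for x in pnl), reverse=True)  # largest loss first
    # round() guards against float error, e.g. 40 * (1 - 0.975) = 1.0000000000000009
    tail = max(1, math.ceil(round(len(losses) * (1 - alpha), 9)))
    return sum(losses[:tail]) / tail


def liquidity_adjusted_es(pnl_by_horizon):
    """Combine ES across liquidity horizons.

    pnl_by_horizon[0] holds 10-day P&L scenarios with every risk factor shocked;
    pnl_by_horizon[j] shocks only risk factors whose liquidity horizon is
    >= LIQUIDITY_HORIZONS[j], holding the others constant.
    """
    total = expected_shortfall(pnl_by_horizon[0]) ** 2
    for j in range(1, len(LIQUIDITY_HORIZONS)):
        gap = LIQUIDITY_HORIZONS[j] - LIQUIDITY_HORIZONS[j - 1]
        scale = math.sqrt(gap / BASE_HORIZON)
        total += (expected_shortfall(pnl_by_horizon[j]) * scale) ** 2
    return math.sqrt(total)
```

Each entry of `pnl_by_horizon` needs its own full set of revaluations. That is one reason the calculation load grows so sharply compared with a single VaR run.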

Meanwhile, capital restrictions are driving banks to seek increasingly cost-effective solutions that provide the agility to keep pace with regulatory changes while keeping costs under control.

In this post, we will explore some of the ways that the scalability and elasticity of the AWS Cloud are being used to support the need for large-scale financial calculations that may be required by many banks as they prepare to support FRTB and other financial risk requirements.

Key Changes to FRTB and their implications

A fundamental change from existing Basel 2.5/3 calculations is the use of more granular liquidity time horizons when determining risk via the Internal Models Approach (IMA). An asset held on a trading desk requiring IMA will now need profit and loss vectors at the risk factor level, with additional calculations aggregating time horizon permutations. Furthermore, the Standardized Approach (SA) to risk calculation is required even for desks using IMA.

Another challenge comes with the required validation of IMA calculations against real-world economic impact. Banks will need to prove that internal models are within a strict tolerance, or risk being held to the more conservative SA capital requirement.
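
Back-testing is one way this validation is done. The Python sketch below counts back-testing exceptions, meaning days when the actual loss exceeded the VaR forecast for that day, over the most recent 250 trading days. It applies desk-level limits that we assume from the January 2016 FRTB text: more than 12 exceptions at 99% VaR, or more than 30 at 97.5% VaR, sends the desk to the SA. Treat the window and thresholds as parameters to check against the rule that applies to you. The function names are hypothetical.

```python
"""Sketch: desk-level back-testing exception count.

Assumed limits (check against the applicable rule text): a desk falls back to
the Standardized Approach if, over the last 250 trading days, it has more than
12 exceptions at 99% VaR or more than 30 exceptions at 97.5% VaR.
"""
WINDOW = 250
LIMITS = {0.99: 12, 0.975: 30}  # confidence level -> maximum allowed exceptions


def count_exceptions(actual_pnl, var_forecasts):
    """Days in the window on which the actual loss exceeded that day's VaR (VaR is positive)."""
    recent = list(zip(actual_pnl, var_forecasts))[-WINDOW:]
    return sum(1 for pnl, var in recent if -pnl > var)


def desk_eligible_for_ima(actual_pnl, var_by_level):
    """var_by_level maps each confidence level in LIMITS to that level's daily VaR forecasts."""
    return all(
        count_exceptions(actual_pnl, var_by_level[level]) <= limit
        for level, limit in LIMITS.items()
    )
```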

These changes are estimated to require 10 times as many calculations for the average portfolio, which means that a significant increase in server capacity will be required to produce results in the expected timeframe.

FRTB Solution Suite from IHS Markit

One solution provider is taking a cloud-first approach to meet this challenge. IHS Markit, an Advanced Technology Partner and an APN Partner with AWS Financial Services Competency, launched its FRTB Solution Suite to address the significant expansion of risk measures and more stringent modeling methodologies required by this regulation. The FRTB Solution Suite is a next-generation risk management platform that uses the scalable computational power provided by AWS.

Why Amazon Web Services?

When developing the FRTB Solution Suite, IHS Markit evaluated AWS and other prominent cloud providers. “AWS meets our needs in terms of scalability, affordability, security, and its track record of successful implementations,” says Andrew Eisen, managing director, global head of cloud strategy and hosting services, at IHS Markit. “Using AWS, we can instantly deploy capacity for risk scenarios and dynamically scale the system to account for the complexity and data volumes those scenarios demand, providing us with the flexibility we need. Furthermore, being able to leverage a combination of reserved instances and spot pricing allows us to do so at a reasonable cost.” The FRTB Solution Suite uses AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Auto Scaling, Amazon EMR, and Amazon Simple Storage Service (Amazon S3) to optimize its platform.

Scalable, Secure, and Cost Efficient

Although banks require more compute capacity to meet regulations like FRTB, capacity demand is not static. Risk calculations may spike overnight and in highly volatile markets, and may be moderate otherwise. Auto Scaling enables dynamic resource use by monitoring CPU and memory utilization to detect when additional horsepower is needed, and removing that capacity when it isn't. And because you only pay for what you use, costs are kept under control.
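
As an illustration, the Python sketch below builds arguments for the Auto Scaling `PutScalingPolicy` API. It creates a target tracking policy that adds instances when average CPU rises above a target and removes them when it falls below. The group name and the 60% target are hypothetical. Memory is not one of the predefined target tracking metrics; scaling on it requires publishing memory as a custom Amazon CloudWatch metric, which this sketch does not do.

```python
"""Sketch: target tracking on average CPU for an Auto Scaling group.

Assumptions (hypothetical): the group name and the CPU target. Target tracking
scales out above the target and scales in below it.
"""


def cpu_target_tracking_policy(group_name, target_cpu=60.0):
    """Build keyword arguments for the Auto Scaling PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }


def apply_policy(group_name, target_cpu=60.0):
    import boto3  # imported lazily so the builder can be tested without AWS
    client = boto3.client("autoscaling")
    return client.put_scaling_policy(**cpu_target_tracking_policy(group_name, target_cpu))
```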

Assessing liquidity for an asset at the risk factor level requires mining petabytes of historical transactions and prices. IHS Markit selected Apache Spark for its analytics engine because Spark is a distributed processing system that scales horizontally and is optimized for high-performance processing of large datasets. To ensure that computation costs are optimized and to free up developers to focus on developing the business rules specific to FRTB risk, Amazon EMR was chosen to run and manage the Spark cluster. Amazon EMR offers increased speed and agility by automatically scaling to meet demand, adding nodes when performance is constrained, and removing nodes that are underutilized. Amazon EMR is also integrated with the Amazon EC2 Spot Instance market, which keeps costs to a minimum by accessing unused Amazon EC2 capacity at steep discounts relative to On-Demand Instance prices.
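
For readers who want to experiment, here is a Python sketch of `RunJobFlow` arguments for a Spark cluster on Amazon EMR that uses Spot capacity for task nodes. This layout is our assumption, not a description of IHS Markit's configuration. Core nodes stay On-Demand because they host HDFS, while task nodes run on Spot, so losing one costs compute rather than data. The cluster name, release label, instance types, and counts are placeholders, and scaling rules are omitted.

```python
"""Sketch: RunJobFlow arguments for a Spark cluster with Spot task nodes.

Assumptions (hypothetical): cluster name, release label, instance types and
counts, and the default EMR roles. Scaling rules are not shown.
"""


def spark_cluster_request(name, release_label="emr-6.15.0", spot_task_nodes=10):
    def group(role, market, count, instance_type="m5.xlarge"):
        return {
            "Name": f"{role.lower()}-{market.lower()}",
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                group("MASTER", "ON_DEMAND", 1),
                group("CORE", "ON_DEMAND", 2),  # HDFS lives here, so keep it On-Demand
                group("TASK", "SPOT", spot_task_nodes),  # no BidPrice: defaults to the On-Demand price
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(name):
    import boto3  # lazy import: the request builder above needs no AWS access
    return boto3.client("emr").run_job_flow(**spark_cluster_request(name))
```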

IHS Markit wanted to ensure that its risk results were durable and available across Availability Zones, so the company chose Amazon S3 to store its risk data. It was able to take advantage of Amazon EMR’s decoupling of storage and compute by using Spark to query data directly in Amazon S3, without loading it into on-cluster HDFS.

By operating Spark as a cluster using Amazon EMR, IHS Markit avoids the administrative complexity of capacity management, data security management, and software deployment. These capabilities enable IHS Markit to focus time and energy on delivering business value to its customers rather than the undifferentiated heavy lifting of operating foundational components at scale.

Summary

As Financial Services organizations use AWS to simplify access to technologies such as big data analytics, high performance computing, and deep learning, a qualitative change in how they address business needs has become possible. Capitalizing on this newfound elasticity and velocity, firms are able to respond to regulatory and customer needs at an unprecedented pace. IHS Markit’s FRTB Solution Suite demonstrates how organizations are taking advantage of these capabilities to address key financial market drivers.

If you have any questions or comments about cloud solutions for financial services, please contact us at apn-blog@amazon.com.

New AWS CloudFormation Stack Quick-Create Links further simplify customer onboarding

Post by Ian Scofield and Erin McGill

We recently wrapped up a series (Parts 1, 2, 3, and 4) on using AWS CloudFormation to ease the creation of cross-account roles during customer onboarding. It takes the reader through creating custom launch stack URLs for AWS CloudFormation, using an AWS Lambda function to generate a custom template with individualized parameters, and automatically sending the Amazon Resource Name (ARN) of the created cross-account role back to the SaaS owner. The process removes many of the manual steps involved in the creation of a cross-account role and the associated policy documents, reducing the chances of failure.

Although this solution simplified the workflow and helped reduce failure rates during onboarding, there were still two areas open to improvement:

We required the SaaS owner to customize each customer’s template and hardcode values. These templates needed to be stored, shared publicly, and then promptly deleted.
The AWS CloudFormation wizard contained multiple pages, and partners told us they wanted to streamline this process.

At AWS, we listen to our customers and partners to learn where we can improve, and our roadmap is almost exclusively driven by customer feedback. Based on the feedback we received on the customer onboarding process, I am pleased to announce that the AWS CloudFormation team has added the Stack Quick-Create Links feature which solves the issues we outlined above.

Embedding parameters in the launch stack URL – The AWS CloudFormation team has removed the need to store customized templates by adding the ability to embed parameter values directly in the launch stack URL.
Streamlined launch stack wizard – Users will now be directed to an AWS CloudFormation wizard that has been reduced to a single page.

Embedding Parameters in the launch stack URL

A launch stack URL makes it easy for customers to launch AWS CloudFormation templates by sending them straight to the AWS CloudFormation wizard with the template location and stack name pre-populated.

As a refresher, the URL looks like this:

https://console.aws.amazon.com/cloudformation/home?region=region#/stacks/new?stackName=stack_name&templateURL=template_location

In the scenario we outlined in our series, we used a launch stack URL to help customers launch an AWS CloudFormation template and create a cross-account role in their AWS account. The template associated with the URL contained unique, customer-specific values for the trusted account ID and external ID, and needed to be generated for each customer. The template was then hosted in an S3 bucket until the customer launched it. We also required a cleanup method to ensure that templates didn’t remain accessible post-launch. However, this process was burdensome on the partner and required additional infrastructure, including multiple Lambda functions and an S3 bucket, to execute.

We discussed these challenges with the AWS CloudFormation service team, and they worked hard to resolve this problem and released a feature that lets you embed parameter values in the launch stack URL. This enables us to specify unique values for the trusted account ID and the external ID directly in the URL, which allows for the template to be generated on the fly. The partner no longer has to create, store, and ultimately delete the templates. To embed a parameter, prefix its name with param_ and append it to the URL as a name=value pair.

The new syntax looks like this:

https://console.aws.amazon.com/cloudformation/home?region=region#/stacks/create/review?stackName=stack_name&templateURL=template_location&param_name1=value1&param_name2=value2…

Here’s an example URL that we can use in our Cross-Account Role scenario:

https://console.aws.amazon.com/cloudformation/home?region=region#/stacks/create/review?stackName=stack_name&templateURL=template_location&param_ExternalId=abcd1234&param_TrustedAccount=123456789012
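
Because the customer-specific values now live in the URL, the SaaS side only needs to generate a URL per customer. The Python sketch below uses only the standard library. The function name is ours. Percent-encoding the values while keeping `:` and `/` readable is our own choice rather than something this post specifies. Remember that NoEcho and password parameters cannot be passed this way.

```python
"""Sketch: build a CloudFormation quick-create (launch stack) URL.

Each parameter is passed as param_<Name>=<value>. The function name and the
encoding choice are illustrative.
"""
from urllib.parse import quote, urlencode


def quick_create_url(region, template_url, stack_name, parameters):
    query = {"templateURL": template_url, "stackName": stack_name}
    query.update({f"param_{name}": value for name, value in parameters.items()})
    # Keep ':' and '/' readable in the template URL; percent-encode everything else.
    encoded = urlencode(query, safe=":/", quote_via=quote)
    return (
        "https://console.aws.amazon.com/cloudformation/home"
        f"?region={region}#/stacks/create/review?{encoded}"
    )


if __name__ == "__main__":
    print(quick_create_url(
        "us-east-1",
        "https://s3-us-west-2.amazonaws.com/isco/wizard.yml",
        "CrossAccountRoleSetup",
        {"TrustedAccount": "123456789012", "ExternalId": "abcd1234"},
    ))
```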

Streamlined Launch Stack Wizard

You may have noticed above that another part of the URL has changed. The /stacks/new part of the URL has changed to /stacks/create/review. This new feature streamlines the AWS CloudFormation wizard to remove additional pages for certain use cases. Every partner strives to make the onboarding experience as quick and smooth as possible to reduce the risk that the customer will abandon the onboarding process.

Our earlier process required the customer to navigate through four separate sections, like this:

Figure 1: Traditional AWS CloudFormation launch stack wizard

When you change the /stacks/new part of your launch stack URL to /stacks/create/review, customers will be greeted by a single review screen that doesn’t require them to click Next on any pages. If there are any additional parameters that aren’t pre-populated, they will be present here as well for the user to fill out. All they need to do is click the Create button at the bottom of the screen.

Figure 2: Streamlined AWS CloudFormation launch stack review page

As you can see, this drastically streamlines the process and enables a much quicker and smoother workflow for onboarding customers.

Here’s an example URL that we can use in our cross-account role scenario to generate the screenshot in Figure 2:

https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://s3-us-west-2.amazonaws.com/isco/wizard.yml&stackName=CrossAccountRoleSetup&param_TrustedAccount=123456789012&param_ExternalId=abcd1234

Note: This feature doesn’t currently support NoEcho or password parameters for security reasons.

Try it Out

With the addition of embedding parameters in the URL and the streamlined wizard, customers have a faster, smoother onboarding experience, and partners need less infrastructure to manage custom workflows. To learn more about these features, check out the AWS CloudFormation documentation.

Feel free to try out these two new features and let us know your thoughts in the comments below. If you have any ideas on how to make any of our services better or to improve the customer experience, please reach out and let us know!

Unlocking Hybrid Architecture’s Potential with DevOps

Last week in our MSP Partner Spotlight series, we heard from Jeff Aden at 2nd Watch and learned about the value that next gen MSPs can bring to their customers through well managed migrations and through 2nd Watch’s Four Key Steps of Optimizing Your IT Resources. Another area of new value that AWS MSPs can bring to their customers is management of their hybrid IT architecture, allowing customers at any stage of the cloud adoption journey to best leverage the AWS Cloud. This week we hear from Datapipe (APN Premier Partner, MSP Partner and holder of several AWS Competencies and AWS Service Delivery designations) as they discuss their approach and considerations in supporting their customers’ hybrid architectures.

By David Lucky, Director of Product Management at Datapipe

Hybrid IT architecture, or what many customers call hybrid cloud, is increasingly prevalent in today’s fast-paced technology industry. Over the past few years, Datapipe has seen an initial reluctance towards cloud adoption transform into excitement, and hybrid architecture is emerging as a go-to solution for enterprise organizations looking for a way to manage their complex operations and run AWS as a seamless extension of their on-premises infrastructure.

Hybrid architecture gives organizations Application Programming Interface (API) accessibility, providing developers with programmatic access to control their environments through well-defined methods. APIs, commonly defined as “code that allows two software programs to communicate with each other,” are increasing in popularity in part due to the rise of cloud computing, and have steadily improved software quality over the last decade. Now, instead of having to custom develop software for a specific purpose, developers often write software that references APIs with widely useful features, which cuts development time and cost and reduces the risk of error.

With API accessibility, developers can easily repurpose proven APIs to build new applications instead of building and managing every component by hand. This gives them more room to experiment and innovate, and creates a culture of curiosity. In this way, the API accessibility of hybrid architecture leads to a necessary rebalancing of development and operations teams, who can now solve problems earlier and more automatically than was previously possible with purely on-premises solutions.

To maintain the culture of curiosity that’s enabled by API accessibility through hybrid environments, we recommend organizations remove the silos that traditionally separate development and operations teams, and encourage open communication and collaboration – better known as DevOps. Implementing a DevOps culture helps organizations take advantage of a hybrid infrastructure to increase efficiencies along the entire software development lifecycle (SDLC). At Datapipe, we understand how critical the adoption of DevOps methodologies and agile workflows is for IT organizations to remain competitive and respond to the constantly evolving technology landscape. It’s the reason we expanded our professional services to include DevOps, and why we help organizations make the cultural switch to DevOps the right way, starting with people.

Individuals Over Tools

While many people conflate DevOps with an increase in automation tools, an organization can’t fully realize DevOps culture without starting with its people. A DevOps culture fosters open communication and constant collaboration between team members. It dissolves barriers between operations and development departments, giving everyone ownership over the SDLC as a whole, beyond their traditional, individual responsibilities. Being able to see the big picture allows team members to transition from being reactive to being proactive. That, in turn, involves shifting away from addressing problems as they arise to determining the root cause of the problem and finding a solution as a part of a continuous improvement mindset. Organizations that fully embrace this full-stack DevOps approach can provision a server in minutes instead of weeks, which is a vast improvement on the traditional SDLC model.

This mindset also means moving from a reactionary approach and solving problems through “closing tickets” to a proactive approach that involves consistently searching for inefficiencies and addressing them in real-time, so an organization’s software is continually improving at the most fundamental levels. Of course, addressing inefficiencies in the software also means addressing inefficiencies in workflows, which leads to the use of DevOps tools such as automation and writing reusable utilities.

However, productivity tools won’t increase efficiency on their own. An effective DevOps culture starts with open collaboration between team members, and then is reinforced by tools. At Datapipe, we see incorporating a DevOps culture through the lens of the “Agile Manifesto,” which promotes “individuals and interactions over processes and tools.” When you combine agile working practices with DevOps, you can manage change in a feature-focused manner, providing faster interaction and response. Managing change in this way means that organizations achieve their goals through a strong DevOps culture that automates the majority of the overall development and delivery process, enabling teams to focus on areas that create a differentiated experience. This takes time – and collaboration among team members – to set up. The real-time collaboration that marks a full-stack DevOps approach reduces the number of handoffs in an SDLC, thus accelerating the entire process and decreasing an application’s time-to-market.

Looking Ahead

Hybrid architecture growth is expected to continue. Industry analyst firm IDC predicts that 80 percent of all enterprise IT organizations will commit to hybrid architecture by the end of this year. This prediction is in line with what we’re seeing from our customers. As a next-gen MSP, we’ve seen an increase in enterprise companies looking for guidance on incorporating a DevOps culture to complement their digital transformations.

Take our work with British Medical Journal (BMJ), for example. BMJ started out over 170 years ago as a medical journal. Now, as a global online brand, BMJ has expanded to encompass 60 specialist medical and allied sciences journals with millions of readers. As a result of their dramatic growth, their old infrastructure could no longer support their application release process. In addition, as an increasingly global organization, BMJ’s capacity for allowing downtime – scheduled or otherwise – was diminishing. To solve this problem, BMJ needed to move to a sustainable development cycle of continuous integration and automation, which is only possible through a shift to a DevOps type culture. We helped BMJ implement this culture while assisting with changes to their infrastructure. The switch to a more open, collaborative culture not only allowed BMJ to implement a sustainable development cycle, complete with continuous integration and automation, but it also made them feel better prepared to take their next planned step of moving workloads to the AWS Cloud and embracing a hybrid environment. (More about how we helped BMJ move to a DevOps-oriented culture can be found here).

If you’re interested in leveraging DevOps to get the most out of your hybrid environment, we recommend starting with the following considerations:

Leverage object-oriented programming principles such as abstraction and encapsulation to build re-usable and parameterized components that can be assembled like building blocks. This can be done in configuration management with Chef Recipes, Puppet Modules, and Ansible Roles, or through infrastructure building blocks like Terraform Modules and AWS CloudFormation templates.
When automating infrastructure management, test the destruction process as thoroughly as the creation process. This will give you the ability to iterate and test cleanly.
Balance the effort being put into upfront engineering against operational management activities. More upfront engineering unlocks some great features with Auto Scaling on AWS. For steady-state applications, however, the effort to set up and configure resources manually can sometimes be much less than the effort to automate them. In those cases, it is worthwhile to look for open-source modules that reduce the cost of automation in your infrastructure and configuration management workflows.
For Auto Scaling groups within AWS, consider, as you engineer your process, how much time your workload can tolerate between the moment AWS detects the need for a new instance and the moment that instance is fully operational. Fully baked Amazon Machine Images tend to be the fastest to become operational, but this approach requires building an image for every version of your application. Packer is a great tool for this purpose. In addition, the more you rely on user data or configuration management processes, the longer your instance will take to reach an operational state. Finally, keep in mind that processes like domain joins and renaming of instances, which require reboots, can add time to the launch process; use them as sparingly as possible.
For a low-latency link between your resources in and out of the cloud, consider taking advantage of higher-level services like AWS Direct Connect, which provides a virtual interface directly to public AWS services and allows you to bypass Internet service providers in your network path. Datapipe client ScreenScape used Direct Connect to link their on-premises environment to Amazon CloudFront for a cloud environment that’s highly available, fully managed, and able to scale over time with proven capability. (Learn more here.)
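
To make the first consideration concrete, here is a small Python sketch of a parameterized infrastructure building block. It is a function that encapsulates one AWS CloudFormation resource definition, which can then be composed into larger templates. The function names, the versioning default, and the `CostCenter` tag are illustrative assumptions. In practice the same idea is more often expressed as a Terraform module, a nested CloudFormation stack, or a Chef, Puppet, or Ansible unit.

```python
"""Sketch: reusable, parameterized CloudFormation building blocks in Python.

Assumptions (hypothetical): versioning is always on and every bucket carries
a CostCenter tag.
"""
import json


def versioned_bucket(cost_center):
    """One S3 bucket resource with versioning enabled and a cost-center tag."""
    return {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "VersioningConfiguration": {"Status": "Enabled"},
            "Tags": [{"Key": "CostCenter", "Value": cost_center}],
        },
    }


def template(description, **resources):
    """Assemble building blocks into a template keyed by logical ID."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": description,
        "Resources": resources,
    }


if __name__ == "__main__":
    doc = template(
        "Two buckets built from the same block",
        LogsBucket=versioned_bucket("ops"),
        ReportsBucket=versioned_bucket("finance"),
    )
    print(json.dumps(doc, indent=2))
```

Because the policy (versioning, tagging) lives in one function, changing it once updates every template that uses the block.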

Hybrid architecture offers organizations the power of both on-premises and cloud environments like AWS, giving them the tools to grow and innovate at a lower cost. For companies to fully capitalize on the benefits of these mixed environments, a culture change is necessary. By shifting to a DevOps culture and enabling teams to work together in a full-stack perspective, organizations can not only increase efficiency in their SDLCs, but also open up opportunities for immense engagement and creativity – qualities necessary for innovation. A next-generation MSP, with DevOps and Software-as-a-Service (SaaS) capabilities, can be a valuable guide for IT teams on their hybrid cloud journey. At Datapipe, we pride ourselves on being a next-generation MSP, and our proficiency with DevOps was a key differentiator that led to our position as a leader in the 2017 Gartner Magic Quadrant for Public Cloud Infrastructure Managed Service Providers, Worldwide. By partnering with a next-gen MSP, like those included in the AWS Managed Service Provider Program, organizations don’t have to make the shift to DevOps on their own.

To get started or for assistance on your cloud journey, contact us at www.datapipe.com.

David Lucky

Director of Product Management

www.Datapipe.com

Why Next-Generation MSPs Need Next-Generation Monitoring

We wrote a couple of months ago about how ISVs are rapidly evolving their capabilities and products to meet the growing needs of next generation Managed Service Providers (MSPs), and we heard from CloudHealth Technologies about how they are Enabling Next-Generation MSPs with cloud management tools that span the breadth of customer engagements from Plan & Design to Build & Migrate to Run & Operate and to Optimize. Today we are sharing a guest post from APN Advanced Technology and SaaS Partner, Datadog, as they address the shift from traditional to next gen monitoring and how these capabilities elevate the level of value that an MSP can deliver to their customers.

Let’s hear from Emily Chang, Technical Author at Datadog.

To stay competitive in today’s ever-changing IT landscape, managed service providers (MSPs) need to demonstrate that they can consistently deliver high-performance solutions for their customers. Rising to that challenge is nearly impossible without the help of a comprehensive monitoring platform that provides insights into customers’ complex environments.

Many next-generation MSPs team with Datadog to gain insights into their customers’ cloud-based infrastructure and applications. In this article, we’ll highlight a few of the ways that MSPs use Datadog’s monitoring and alerting capabilities to proactively manage their customers’ increasingly dynamic and elastic workloads with:

Full visibility into rapidly scaling infrastructure and applications.
Alerting that automatically detects abnormal changes.
Analysis of historical data to gain insights and develop new solutions.
Continuous compliance in an era of infrastructure-as-code.

Full visibility into rapidly scaling infrastructure and applications

As companies continuously test and deploy new features and applications, MSPs need to be prepared to monitor just about any type of environment and technology at a moment’s notice. Whether their customers are running containers, VMs, bare-metal servers, or all of the above, Datadog provides visibility into all of these components in one place.

Datadog’s integration for Amazon Web Services (AWS) automatically collects default and custom Amazon CloudWatch metrics from dozens of AWS services, including Amazon Elastic Compute Cloud (Amazon EC2), Elastic Load Balancing, and Amazon Relational Database Service (Amazon RDS). In total, Datadog offers more than 200 turnkey integrations with popular infrastructure technologies. Many integrations include default dashboards that display key health and performance metrics, such as the AWS overview dashboard shown below.

MSPs need the ability to monitor every dimension of their customers’ modern applications—as well as their underlying infrastructure. As customers continuously deploy new features and applications in the cloud, MSPs can consult a global overview of the infrastructure with Datadog, and then drill down into application-level issues with Application Performance Monitoring (APM), without needing to switch contexts. Datadog APM traces individual requests across common libraries and frameworks, and enables users to identify and investigate bottlenecks and errors by digging into interactive flame graphs like this one:

Infrastructure-aware APM gives MSPs full-stack observability for their customers’ applications, which is critical for troubleshooting bottlenecks in complex environments.
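To make the idea behind a flame graph concrete, here is a minimal Python sketch (not Datadog’s APM code) of how the spans in a single trace can be reduced to per-span "self time": a span’s duration minus the time spent in its direct children. The span with the largest self time is usually the first place to look for a bottleneck. The `Span` fields and the sample span names are hypothetical, and the sketch assumes a span’s children run one after another rather than concurrently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span: one timed operation within a traced request."""
    span_id: int
    parent_id: Optional[int]  # None for the root span of the trace
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans: list[Span]) -> dict[int, float]:
    """Map span_id -> duration minus the summed duration of its direct children.

    Assumes every parent_id refers to a span in the list and that sibling
    spans do not overlap in time.
    """
    child_time = {s.span_id: 0.0 for s in spans}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] += s.duration_ms
    return {s.span_id: s.duration_ms - child_time[s.span_id] for s in spans}


def slowest_by_self_time(spans: list[Span]) -> Span:
    """Return the span that spends the most time in its own code."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])
```

For a request that takes 120 ms and spends 90 ms in a database query, the query stands out even though the request span is longer overall.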

Alerting that automatically detects abnormal changes

Because today’s dynamic cloud environments are constantly in a state of flux, MSPs can benefit immensely from sophisticated alerts that can distinguish abnormal deviations from normal, everyday fluctuations. As customers’ infrastructure rapidly scales to accommodate changing workloads, what constitutes a normal, healthy threshold will often need to scale accordingly. Customers may also wish to track critical business metrics, such as transactions processed, which often exhibit normal, user-driven fluctuations that correlate with the time of day or the day of the week.

Both scenarios illustrate why threshold-based alerts, while helpful for many types of metrics, are not ideal for detecting more complex issues with modern-day applications. To meet these challenges, next-generation MSPs need a monitoring solution that uses machine learning to automatically detect issues in their customers’ metrics. Datadog’s anomaly detection algorithms are designed to distinguish between normal and abnormal behavior in metrics while accounting for directional trends (such as a steady increase in transaction volume over time) and for seasonal fluctuations.
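The post does not describe how Datadog’s algorithms work, so the following is only a deliberately simple Python sketch of the seasonal idea: each point is compared with the values at the same phase of earlier periods (for hourly data with `period=24`, the same hour on previous days) rather than with one fixed threshold. It handles seasonality only, not long-term trends, and the function name and its `k`, `min_history`, and `min_spread_ratio` parameters are illustrative assumptions.

```python
import statistics


def seasonal_anomalies(
    series: list[float],
    period: int,
    k: float = 3.0,
    min_history: int = 2,
    min_spread_ratio: float = 0.05,
) -> list[int]:
    """Return indices of points that deviate from the same phase of earlier periods.

    The baseline for index i is the values at i - period, i - 2*period, ...
    A point is flagged if it lies more than k "spreads" from the baseline mean.
    """
    anomalies = []
    for i, value in enumerate(series):
        if i < period:
            continue  # no earlier period to compare against yet
        history = series[i - period::-period]  # same phase, walking backward
        if len(history) < min_history:
            continue
        mean = statistics.fmean(history)
        # Floor the spread so a perfectly flat history does not flag tiny wiggles.
        spread = max(statistics.pstdev(history), abs(mean) * min_spread_ratio, 1e-9)
        if abs(value - mean) > k * spread:
            anomalies.append(i)
    return anomalies
```

With a repeating low/high/high/low pattern, a static threshold set below the normal peak would fire every cycle, while this baseline flags only the point that breaks the pattern.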

Datadog also uses machine learning for outlier detection—algorithms that determine when a host or group of hosts behaves differently from its peers. This effectively enables MSPs to make sense of how resources are being used within a customer’s infrastructure, even as it rapidly scales to accommodate varying workloads. Whenever an outlier monitor is triggered, MSPs can consult the monitor status page, like the one shown below, to quickly understand when the outlier was detected, and which component(s) of the infrastructure it may impact.
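As a rough illustration of peer-based outlier detection, the Python sketch below uses the median absolute deviation (MAD), one common robust approach; it is not presented as Datadog’s implementation. Each host’s current value is compared with the median of its group, scaled by the group’s MAD. The host names, the `threshold` default, and the handling of a zero MAD are assumptions made for the example.

```python
import statistics


def mad_outliers(host_values: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Return hosts whose value is far from the group median, in scaled-MAD units."""
    values = list(host_values.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the hosts report exactly the median; any host that differs stands out.
        return sorted(h for h, v in host_values.items() if v != median)
    # 1.4826 makes the MAD comparable to a standard deviation for normally distributed data.
    scaled_mad = 1.4826 * mad
    return sorted(
        h for h, v in host_values.items() if abs(v - median) / scaled_mad > threshold
    )
```

The median and MAD are used instead of the mean and standard deviation so that the outlier itself does not drag the baseline toward it.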

Analyzing historical data to gain insights and develop new solutions

As their customers’ environments scale and grow increasingly complex, MSPs need an effective way to visualize how all of those components change over time. For historical analysis, all data is retained at full granularity for 15 months. This allows MSPs to analyze how their customers’ infrastructure and applications have evolved and to make informed, strategic decisions going forward. In addition to visualizing AWS services and other common infrastructure technologies in default dashboards, MSPs can create custom visualizations that deliver deeper insights into their metrics. These visualizations include:

Trend lines: Use regression functions to visualize metric trends
Change graphs: Display how a metric’s value has changed compared to a point in the past (an hour ago, a week ago, etc.)
Heat maps: Use color intensity to identify patterns and deviations across many separate entities. In the example below, a Datadog heat map shows Docker CPU usage steadily trending upward across a large ensemble of containers.
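The arithmetic behind the first two visualizations is straightforward. This Python sketch fits an ordinary least-squares trend line to evenly spaced samples and computes a simple change series against the value `lag` samples earlier. The function names are hypothetical, and the sketch does not reproduce Datadog’s own regression or change functions.

```python
def linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares fit y = slope * x + intercept, with x = 0, 1, 2, ... for each sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def change(values: list[float], lag: int) -> list[float]:
    """Absolute change of each sample versus the sample `lag` steps earlier."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]
```

For hourly samples, `change(values, lag=168)` would compare each hour with the same hour one week earlier.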

Ensuring continuous compliance in an era of infrastructure-as-code

Infrastructure-as-code has revolutionized the way that organizations deploy new assets and manage their existing resources, enabling them to become more agile, continuously deploy new features, and quickly scale resources to respond to changing workloads. However, as these tools are more widely adopted, they also require organizations to monitor their assets more carefully, in order to meet compliance requirements.

Datadog integrates with key infrastructure-as-code tools like Chef, Puppet, and Ansible to provide MSPs with a real-time record of configuration changes to each customer’s infrastructure. Datadog also ingests AWS CloudTrail logs to help MSPs track API calls made across AWS services and aggregates them in the event stream for easy reference. In the example below, you can see that CloudTrail reports any successful and failed logins to the AWS Management Console, as well as any EC2 instances that have been terminated—and who terminated them.
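To show the kind of information those CloudTrail events carry, this Python sketch turns trimmed CloudTrail records into one-line summaries of console logins and instance terminations. It reads the record fields `eventName`, `userIdentity.arn`, `responseElements.ConsoleLogin`, and `requestParameters.instancesSet.items`; real records contain many more fields, the function name is hypothetical, and the sample ARNs and instance ID in the usage example are made up.

```python
def summarize_cloudtrail(records: list[dict]) -> list[str]:
    """Summarize console logins and EC2 terminations from CloudTrail records."""
    lines = []
    for record in records:
        who = record.get("userIdentity", {}).get("arn", "unknown principal")
        name = record.get("eventName")
        if name == "ConsoleLogin":
            # responseElements can be null in some records, hence the `or {}`.
            outcome = (record.get("responseElements") or {}).get("ConsoleLogin", "Unknown")
            lines.append(f"console login {outcome.lower()} by {who}")
        elif name == "TerminateInstances":
            params = record.get("requestParameters") or {}
            for item in params.get("instancesSet", {}).get("items", []):
                lines.append(f"{item.get('instanceId')} terminated by {who}")
        # Other event types are ignored in this sketch.
    return lines
```

A failed login by one user and a termination by another would produce two summary lines that name the instance and the principal responsible.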

With all of this data readily available, MSPs can track critical changes as they occur in real time and set up monitors to proactively audit and enforce continuous compliance of their customers’ AWS environments. They can also search and filter for specific types of changes in the event stream and then overlay them on dashboards for correlation analysis, as shown below.

Event-based alerts help MSPs automatically detect unexpected changes and/or immediately notify their customers about events that may endanger compliance requirements. These alerts can also be configured to trigger actions in other services through custom webhooks. By making all of this information available in one central location, Datadog prepares MSPs with the data they need to respond quickly to compliance issues.
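The sketch below illustrates the general pattern of an event-based alert that forwards matching events to a webhook. It is a hypothetical, standalone Python example, not Datadog’s monitor or webhook configuration: the `Event` and `EventRule` types, the tag convention, and the JSON payload shape are all assumptions. Only the standard library is used, and `notify` performs a real HTTP POST, so it is kept separate from the pure matching and payload logic.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event as it might appear in an event stream."""
    title: str
    source: str
    tags: list[str]


@dataclass
class EventRule:
    """Hypothetical rule: fire when an event from `source` carries every required tag."""
    name: str
    source: str
    required_tags: set[str]

    def matches(self, event: Event) -> bool:
        return event.source == self.source and self.required_tags.issubset(event.tags)


def build_payload(rule: EventRule, event: Event) -> bytes:
    """Serialize the alert as JSON for the receiving service."""
    body = {"rule": rule.name, "title": event.title, "tags": sorted(event.tags)}
    return json.dumps(body).encode("utf-8")


def notify(url: str, payload: bytes, timeout: float = 5.0) -> int:
    """POST the payload to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping matching and payload construction free of network calls makes the rule logic easy to test before it is wired to a real endpoint.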

Next steps for next-generation MSPs

Datadog is pleased to provide monitoring capabilities that help MSPs navigate the challenges of delivering high-performance solutions for dynamic infrastructure and applications. To learn more about how Datadog helps fulfill the checklist items needed to apply for the AWS Managed Service Provider (MSP) Partner Program, download our free eBook. You can also view a recording of our recent webinar with AWS and CloudHesive, “What It Means to Be a Next-Generation Managed Service Provider,” here.