From bare-metal to Kubernetes: an amateur translation (from virtual hosts to k8s)

Original article: http://highscalability.com/blog/2019/4/8/from-bare-metal-to-kubernetes.html

This is a guest post by Hugues Alary, Lead Engineer at Betabrand,  a retail clothing company and crowdfunding platform, based in San Francisco. This article was originally published here.

retail: selling goods directly to consumers
crowdfunding: raising money from a large number of small backers

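This is a guest post (translator's note: what exactly is a guest post?) by Hugues Alary, Lead Engineer at Betabrand, a clothing retailer and crowdfunding platform based in San Francisco. The article was originally published here.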

  • Early infrastructure
    • VPS
    • Rackspace
    • OVH
  • Hardware infrastructure
    • The scalability and maintainability issue
    • Scaling development processes
  • The advent of Docker
  • Kubernetes
    • Learning Kubernetes
    • Officially migrating
    • The development/staging environments
    • A year after

How migrating Betabrand's bare-metal infrastructure to a Kubernetes cluster hosted on Google Container Engine solved many engineering issues—​from hardware failures, to lack of scalability of our production services, complex configuration management and highly heterogeneous development-staging-production environments—​and allowed us to achieve a reliable, available and scalable infrastructure.

This post will walk you through the many infrastructure changes and challenges Betabrand met from 2011 to 2018.

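migrating: moving a system from one platform to another
bare: (adj.) uncovered, minimal; (v.) to expose
metal: (n.) metal; a "bare-metal" server is a physical machine with no virtualization layer
infrastructure: the underlying facilities and structure
Kubernetes cluster: a K8s cluster
failures: (n.) breakdowns, faults
scalability: (n.) the ability to scale
heterogeneous: (adj.) varied, mixed
staging: (n.) a pre-release environment; more generally, putting on a show or proceeding in stages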

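How Betabrand's bare-metal infrastructure was migrated, step by step, to a K8s cluster hosted on Google Container Engine, and how that solved many engineering problems (hardware failures, poorly scalable production services, complex configuration management, and highly heterogeneous development, staging and production environments), giving us a reliable, available and scalable infrastructure.

This post walks you through the many infrastructure changes and challenges Betabrand went through from 2011 to 2018.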


Early infrastructure

VPS

Betabrand’s infrastructure has changed many times over the course of the 7 years I’ve worked here.

In 2011, the year our CTO hired me, the website was hosted on a shared server with a Plesk interface and no root access (of course). Every newsletter send—​to at most a few hundred people—​would bring the website to its knees and make it crawl, even completely unresponsive at times.

My first order of business was to find a replacement and move the website to its own dedicated server.

course: (n.) progression; "over the course of" means "during"
newsletter: a bulletin emailed to subscribers
dedicated: reserved for a single purpose or customer

Rackspace

After a few days of online research, we settled on a VPS (8GB RAM, 320GB disk, 4 virtual CPUs, 150Mbps of bandwidth) at Rackspace. A few more days and we were live on our new infrastructure composed of… 1 server, running your typical Linux, Apache, PHP, MySQL stack, with a hint of Memcached.

Unsurprisingly, this infrastructure quickly became obsolete.

Not only did it not scale at all but, more importantly, every part of it was a Single Point Of Failure. Apache down? Website down. Rackspace instance down? Website down. MySQL down… you get the idea.

Another aspect of it was its cost.

Our average monthly bill quickly climbed over $1,000, which was quite a price tag for a single machine and the low amount of traffic we generated at the time.

After a couple of years running this stack, in mid-2013, I decided it was time to make our website more scalable and redundant, but also more cost-effective.

I estimated that we needed a minimum of 3 servers to make our website somewhat redundant, which would amount to a whopping $14,400/year at Rackspace. Being a really small startup, we couldn't justify that "high" of an infrastructure bill; I kept looking.

The cheapest option ended up being to run our stack on bare-metal servers.

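Rackspace: a managed hosting and cloud computing provider founded in 1998 and headquartered in the United States, with offices in the UK, Australia, Switzerland, the Netherlands and Hong Kong
settled: (adj.) fixed, stable; (v.) past participle of "settle"
settled on: decided on
composed: (adj.) calm; (v.) made up of
typical: (adj.) representative, characteristic
Unsurprisingly: as expected
obsolete: (adj.) outdated, no longer adequate
scale: (n.) size, range; (v.) to grow to handle more load
aspect: (n.) facet, side
generate: to produce
redundant: (adj.) surplus; here, duplicated so that a backup server can take over and keep the system stable
estimate: (v.) to approximate; (n.) an approximation
whopping: (adj.) huge
justify: (v.) to show to be reasonable
cheapest: least expensive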

OVH

I had worked in the past with OVH and had always been fairly satisfied (despite mixed reviews online). I estimated that running 3 servers at OVH would amount to $3240/year, almost 5 times less expensive than Rackspace.
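The comparison can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two annual estimates quoted in the text; the variable names are illustrative.

```python
# Annual estimates quoted in the text, in US dollars, for 3 servers each.
rackspace_per_year = 14400
ovh_per_year = 3240

# How many times more expensive Rackspace would have been.
ratio = rackspace_per_year / ovh_per_year

# Absolute yearly difference and the implied OVH price per server per month.
savings = rackspace_per_year - ovh_per_year
ovh_per_server_month = ovh_per_year / 3 / 12

print(f"Rackspace / OVH cost ratio: {ratio:.2f}")
print(f"Yearly savings: ${savings}")
print(f"OVH cost per server per month: ${ovh_per_server_month:.0f}")
```

The ratio works out to about 4.4, and the OVH estimate corresponds to $90 per server per month.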

Not only was OVH cheaper, but their servers were also 4 times more powerful than Rackspace’s: 32GB RAM, 8 CPUs, SSDs and unlimited bandwidth.

To top it off they had just opened a new datacenter in North America.

A few weeks later Betabrand.com was hosted at OVH in Beauharnois, Canada.

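The earliest infrastructure

VPS

In the 7 years I have worked here, Betabrand's infrastructure has been overhauled many times.

In 2011, when our CTO hired me, the website ran on a shared server managed through a Plesk interface, with no root access. Every newsletter, sent to a few hundred people at most, brought the site to its knees: it slowed to a crawl and sometimes stopped responding altogether. My first task was therefore to find a replacement and move the website to its own dedicated server.

Rackspace

After a few days of online research we chose a VPS (virtual private server) at Rackspace: 8GB of RAM, a 320GB disk, 4 virtual CPUs and 150Mbps of bandwidth. A few days later we were live on our new infrastructure, which consisted of a single server running Linux, Apache, PHP and MySQL, plus a little Memcached.

Unsurprisingly, this setup soon became obsolete.

It did not scale at all and, more importantly, every component was a single point of failure: if Apache went down, the site went down; if the Rackspace instance went down, the site went down; if MySQL went down... you get the idea.

Its cost was another issue.

Our average monthly bill soon passed $1,000, a steep price for a single machine given how little traffic we generated at the time.

After running this stack for a couple of years, in mid-2013 I decided it was time to make the website more scalable and redundant, and also more cost-effective.

I estimated that we needed at least 3 servers to make the website somewhat redundant, which would cost about $14,400 a year at Rackspace. As a very small startup we could not justify an infrastructure bill that "high", so I kept looking. In the end, the cheapest option was to run our stack on bare-metal servers.

OVH

I had worked with OVH before (translator's note: OVH is a French cloud computing company founded in 1999, known mainly for strong DDoS protection and traffic scrubbing) and had been fairly satisfied. I estimated that 3 servers at OVH would cost $3,240 a year, almost 5 times cheaper than Rackspace.
Besides being cheaper, their servers were 4 times more powerful than Rackspace's: 32GB of RAM, 8 CPUs, SSDs and unlimited bandwidth. To top it off, they had just opened a new datacenter in North America. A few weeks later, Betabrand.com was hosted at OVH in Beauharnois, Canada.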

Hardware infrastructure

Between 2013 and 2017, our hardware infrastructure went through a few architectural changes.

Towards the end of 2017, our stack was significantly larger than it used to be, both in terms of software and hardware.

Betabrand.com ran on 17 bare-metal servers:

  • 2 HAProxy machines in charge of SSL Offloading configured as hot-standby

  • 2 varnish-cache machines configured as hot-standby, load-balancing to our webservers

  • 5 machines running Apache and PHP-FPM

  • 2 redis servers, each running 2 separate instances of redis. 1 instance for some application caching, 1 instance for our PHP sessions

  • 3 MariaDB servers configured as master-master, though used in a master-slave manner

  • 3 Glusterd servers serving all our static assets

Each machine would otherwise run one or multiple processes like keepalived, Ganglia, Munin, logstash, exim, backup-manager, supervisord, sshd, fail2ban, prerender, rabbitmq and… docker.

However, while this infrastructure was very cheap, redundant and had no single point of failure, it still wasn’t scalable and was also much harder to maintain.

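architectural: (adj.) relating to the architecture of a building or system
architectural changes: changes to the structure of the system
significantly: (adv.) considerably
Offloading: handing work off to another component; "SSL offloading" means terminating SSL at the proxy
hot-standby: a backup that is already running and ready to take over
separate: (adj.) distinct; (v.) to split apart
instances: individual running copies of a program
assets: (n.) resources; "static assets" are files such as images, CSS and JavaScript

Varnish Cache is a web application accelerator and an HTTP reverse proxy.
HAProxy is free, open-source software written in C that provides high availability, load balancing and proxying for TCP- and HTTP-based applications.
SSL is a security protocol that provides confidentiality and data integrity for network communication.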

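Hardware infrastructure

Between 2013 and 2017 our hardware infrastructure went through several architectural changes.

By the end of 2017 our stack was much larger than before, in both software and hardware.

Betabrand.com ran on 17 bare-metal servers:

  • 2 HAProxy machines handling SSL offloading, configured as hot-standby (translator's note: presumably a high-availability setup)
  • 2 varnish-cache machines configured as hot-standby, load-balancing to our web servers
  • 5 machines running Apache and PHP-FPM
  • 2 redis servers, each running 2 separate instances: one for application caching, one for PHP sessions
  • 3 MariaDB servers configured as master-master, though used in a master-slave manner
  • 3 Glusterd servers serving all our static assets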

The scalability and maintainability issue

Administering our server "fleet" now involved writing a set of Ansible scripts and maintaining them, which, despite Ansible being an amazing piece of software, was no easy feat.

Even though it will make its best effort to get you there, Ansible doesn’t guarantee the state of your system.

fleet: (n.) a group of ships or, here, of servers; (adj.) swift
involved: (v.) required, entailed; (adj.) complicated
guarantee: (v.) to promise, to ensure; (n.) an assurance

For example, running your Ansible scripts on a server fleet made of heterogeneous OSes (say Debian 8 and Debian 9) will bring all your machines to a state close to what you defined, but you will most likely end up with discrepancies. The first is that you are still running both Debian 8 and Debian 9; beyond that, software versions and configurations will differ from one server to another.
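One way to make such discrepancies visible is to compare what each host actually has installed with what was intended. The sketch below is not part of Ansible and is not the author's tooling; the host names, package names and version numbers are all hypothetical, and the inventory is hard-coded rather than collected from real machines.

```python
from typing import Dict


def find_drift(desired: Dict[str, str],
               hosts: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return, per host, the packages whose installed version differs from the desired one."""
    drift = {}
    for host, installed in hosts.items():
        diffs = {
            pkg: installed.get(pkg, "missing")
            for pkg, version in desired.items()
            if installed.get(pkg) != version
        }
        if diffs:
            drift[host] = diffs
    return drift


# Hypothetical desired state and per-host inventory.
desired = {"php": "7.0", "mysql-client": "5.5"}
hosts = {
    "web1": {"php": "7.0", "mysql-client": "5.5"},  # imagined Debian 8 host
    "web2": {"php": "7.0", "mysql-client": "5.7"},  # imagined Debian 9 host
}

print(find_drift(desired, hosts))
```

Only hosts that deviate appear in the result, so an empty dictionary means the fleet matches the desired state for the packages listed.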

I searched quite often for an Ansible replacement, but never found better.

I looked into Puppet but found its learning curve too steep, and, from reading other people’s recipes, was taken aback by what seemed to be too many different ways of doing the same thing. Some people might think of this as flexibility, I see it as complexity.

SaltStack caught my eye, but I also found it very hard to learn; despite their extensive, in-depth documentation, their nomenclature choices (mine, pillar, salt, etc.) never stuck with me, and it seemed to suffer from the same complexity issue as Puppet.

Nix package manager and NixOS sounded amazing, except that I didn't feel comfortable learning a whole new OS (I've been using Debian for years) and was worried that, despite their huge package selection, I would eventually need packages not already available, which would then become something new to maintain.

Those are the only 3 I looked at, but I'm sure there are many other tools out there I've never heard of.

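heterogeneous: (adj.) varied, mixed
Puppet: an automation tool for managing IT infrastructure
curve: (n.) a curve; a "learning curve" describes how hard something is to learn
steep: (adj.) sharply rising; (of a learning curve) hard
recipe: (n.) a cooking recipe; here, a set of configuration instructions
flexibility: (n.) adaptability
complexity: (n.) the state of being complicated
taken aback: surprised
aback: (adv.) backward; by surprise
SaltStack: a platform for centralized management of server infrastructure
caught: (v.) past tense of "catch"
extensive: (adj.) wide-ranging, large in amount
nomenclature: (n.) naming scheme, terminology
stuck: (v.) past tense of "stick"; "never stuck with me" means "I never remembered it"
suffer: (v.) to be affected by something bad
regarding: (prep.) concerning
Nix: a package manager, with NixOS being the Linux distribution built on it (not to be confused with *nix, the collective name for Unix and Unix-like systems)
exception: (n.) something excluded from a general statement
eventually: (adv.) in the end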

Writing Ansible scripts and maintaining them, however, wasn’t our only issue; adding capacity was another one.

With bare-metal, it is impossible to add and remove capacity on the fly. You need to plan your needs well in advance: buy a machine (usually leased for a minimum of 1 month), wait for it to be ready (which can take from 2 minutes to 3 days), install its base OS, install Ansible's dependencies (mainly Python and a few other packages), then, finally, run your Ansible scripts against it.

For us this entire process was wholly impractical, and what usually happened is that we'd add capacity for an anticipated peak load but never remove it afterwards, which in turn added to our costs.

It is worth noting, however, that even though having unused capacity in your infrastructure is akin to setting cash on fire, it is still an order of magnitude less expensive on bare-metal than in the cloud. On the other hand, the engineering headaches that come with using bare-metal servers simply shift the cost from material to administrative.

In our bare-metal setup, capacity planning, server administration and Ansible scripting were just the tip of the iceberg.

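capacity: (n.) the amount of load a system can handle
in advance: (adv.) ahead of time
leased: (adj.) rented
entire: (adj.) whole
wholly: (adv.) completely
unpractical: (adj.) impractical, unworkable
anticipated: (adj.) expected
peak: (n.) the highest point; "peak load" is the maximum load
infrastructure: the underlying facilities and structure
akin: (adj.) similar
magnitude: (n.) size; "an order of magnitude" means roughly a factor of ten
shift: (v.) to move, to transfer; (n.) a change
purely: (adv.) entirely, only
iceberg: a floating mass of ice; "the tip of the iceberg" is the small visible part of a larger problem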

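The scalability and maintainability issue

Administering our server "fleet" now meant writing and maintaining a set of Ansible scripts, which was no easy feat even though Ansible is an amazing piece of software.

Ansible makes its best effort to get you there, but it does not guarantee the state of your system.

For example, running our Ansible scripts against a fleet of servers with different operating systems (say Debian 8 and Debian 9) brings every machine close to the state we defined, but discrepancies remain: first, the machines still run Debian 8 and Debian 9, and beyond that, software versions and configurations differ from one server to another.

I often searched for a replacement for Ansible but never found anything better.

I looked at Puppet, but its learning curve was too steep, and when reading other people's recipes I was taken aback by how many different ways there were to do the same thing. Some people may see that as flexibility; I see it as complexity.

SaltStack caught my eye, but I found it hard to learn too. Despite its extensive, detailed documentation, its terminology never stuck with me, and it seemed to share Puppet's complexity problem.

The Nix package manager and NixOS looked great, except that I was reluctant to learn a whole new OS (I have used Debian for years), and I worried that, despite the large package selection, I would eventually need packages that were not available, which would become one more thing to maintain.

Those are the only 3 I looked at, but I am sure there are many other tools I have never heard of.

Writing and maintaining Ansible scripts was not our only problem, however; adding capacity was another.

With bare-metal you cannot add or remove capacity on the fly. You have to plan well ahead: buy a machine (usually leased for at least one month), wait for it to be ready (anywhere from 2 minutes to 3 days), install the base OS, install Ansible's dependencies (mainly Python and a few other packages), and finally run your Ansible scripts against it, possibly adjusting them and starting over.

For us the whole process was impractical. What usually happened was that we added capacity for an anticipated peak load and never removed it afterwards, which increased our costs.

It is worth noting that, although unused capacity is like setting cash on fire, it is still far cheaper on bare-metal than in the cloud. On the other hand, the engineering headaches of bare-metal servers simply shift the cost from hardware to administration.

In our bare-metal setup, capacity planning, server administration and Ansible scripting were only the tip of the iceberg.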

Scaling development processes

In early 2017, while our infrastructure had grown, so had our team.

We hired 7 more engineers, making us a small 9-person team, with skillsets distributed all over the spectrum from backend to frontend and varying levels of seniority.

Even in a small 9-person team, being productive and limiting the number of bugs deployed to production warrants a simple, easy-to-set-up and easy-to-use development-staging-production trifecta.

Setting up your development environment as a new hire shouldn’t take hours, neither should upgrading or re-creating it.

Moreover, a company-wide accessible staging environment should exist and match 99% of your production, if not 100%.

Unfortunately, in our hardware infrastructure reaching this harmonious trifecta was impossible.

Scaling: (v.) growing to handle more load (from "scale")
infrastructure: (n.) the underlying facilities and structure
distributed: (adj.) spread out
spectrum: (n.) range
frontend: the user-facing part of an application
seniority: (n.) level of experience or rank
productive: (adj.) getting a lot done
warrants: (v.) calls for, justifies
trifecta: (n.) a bet on the top three finishers in a race; here, a set of three things
Moreover: in addition
company-wide: across the whole company
accessible: (adj.) easy to reach or use
harmonious: (adj.) well-coordinated

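Scaling development processes

In early 2017, as our infrastructure had grown, so had our team.

We hired 7 more engineers, making us a small team of 9 whose skills ranged from backend to frontend, at varying levels of seniority.

Even in a small team of 9, being productive and limiting the number of bugs deployed to production calls for a simple, easy-to-set-up and easy-to-use trio of development, staging and production environments.

Setting up a development environment as a new hire should not take hours, and neither should upgrading or re-creating it.

In addition, there should be a staging environment that the whole company can access and that matches production 99%, if not 100%.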

The development environment

First of all, everybody in our engineering team uses MacBook Pros, which is an issue since our stack is Linux-based.

However, asking everybody to switch to linux and potentially change their precious workflow wasn’t really ideal. This meant that the best solution was to provide a development environment agnostic of developers' personal preferences in machines.

I could only see two obvious options:

Either provide a Vagrant stack that would run multiple virtual machines (17 potentially, though, more realistically, 1 machine running our entire stack), or reuse the already written Ansible scripts and run them against our local MacBooks.

After investigating Vagrant, I felt that using virtual machines would hinder performance too much and wasn't worth it. I decided, for better or worse, to go the Ansible route (in hindsight, this probably wasn't the best decision).

We would use the same set of Ansible scripts on production, staging and dev. The caveat being of course that our development stack, although close to production, was not a 100% match.

This worked well enough for a while; however, the mismatch caused issues later when, for example, our development and production MySQL versions weren't aligned. Some queries that ran on dev wouldn't run on production.

potentially: (adv.) possibly
precious: (adj.) valued, treasured
agnostic: (adj.) independent of, not tied to a particular choice
obvious: (adj.) clear, evident
realistically: (adv.) in practice
investigating: looking into
hinder: (v.) to hold back, to hamper
hindsight: (n.) understanding gained after the event
caveat: (n.) a warning or limitation

 

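The development environment

First of all, everyone on our engineering team uses a MacBook Pro, which is a problem because our stack runs on Linux.

Asking everyone to switch to Linux and possibly change their precious workflow was not ideal, however. The best solution was therefore to provide a development environment that did not depend on each developer's choice of machine.

I could see only two obvious options:

Either provide a Vagrant stack running multiple virtual machines (potentially 17, though more realistically 1 machine running the entire stack), or reuse the Ansible scripts we had already written and run them against our local MacBooks.

After investigating Vagrant, I felt that virtual machines would hurt performance too much and were not worth it. For better or worse, I decided to go the Ansible route (in hindsight, probably not the best decision).

We would use the same set of Ansible scripts for production, staging and development. The caveat was that our development stack, although close to production, was not a 100% match.

This worked well at first, but the mismatch caused problems later, for example when our development and production MySQL versions differed: some queries that ran in development would not run in production.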

The staging environment

Secondly, having development and production environments running on widely different software (macOS versus Debian) meant that we absolutely needed a staging environment.

Not only because of potential bugs caused by version mismatches, but also because we needed a way to share new features with external members before launch.

Once again I had multiple choices:

  • buy 17 servers and run Ansible against them. This would double our costs though and we were trying to save money.

  • set up our entire stack on a single Linux server, accessible from the outside. Cheaper solution, but once again not providing an exact replica of our production system.

I decided to implement the cost-saving solution.

An early version of the staging environment involved 3 independent Linux servers, each running the entire stack. Developers would then yell across the room (or on HipChat) "taking over dev1", "is anybody using dev3?", "dev2 is down :/".

Overall, our development-staging-production setup was far from optimal: it did the job; but definitely needed improvements.

absolutely: (adv.) completely, without question
replica: an exact copy
implement: (v.) to put into effect; (n.) a tool
independent: (adj.) separate, standing alone
definitely: (adv.) certainly

 

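The staging environment

Secondly, running our development and production environments on very different systems (macOS versus Debian) meant that we absolutely needed a staging environment.

This was not only because of potential bugs caused by version mismatches, but also because we needed a way to share new features with people outside the team before launch.

Once again I had several options:

  • Buy 17 servers and run Ansible against them. This would double our costs, though, and we were trying to save money.
  • Put our entire stack on a single Linux server accessible from the outside. This was cheaper, but still not an exact replica of production.

I went with the cost-saving option.

An early version of the staging environment consisted of 3 independent Linux servers, each running the entire stack. Developers would call out across the room (or on HipChat) "taking over dev1", "is anybody using dev3?", "dev2 is down".

Overall, our development, staging and production setup was far from ideal: it did the job, but it definitely needed improvement.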

The advent of Docker

In 2013 Dotcloud released Docker.

The Betabrand use case for Docker was immediately obvious. I saw it as the solution to simplify our development and staging environments by getting rid of the Ansible scripts (well, almost; more on that later).

Those scripts would now only be used for production.

At the time, one main pain point for the team was competing for our three physical staging servers: dev1, dev2 and dev3; and for me maintaining those 3 servers was a major annoyance.

After observing docker for a few months, I decided to give it a go in April 2014.

After installing docker on one of the staging servers, I created a single docker image containing our entire stack (haproxy, varnish, redis, apache, etc.) then over the next few months wrote a tool (sailor) allowing us to create, destroy and manage an unlimited number of staging environments, each accessible via its own unique URL.

Worth noting that docker-compose didn’t exist at that time; and that putting your entire stack inside one docker image is of course a big no-no but that’s an unimportant detail here.

From this point on, the team wasn’t competing anymore for access to the staging servers. Anybody could create a new, fully configured, staging container from the docker image using sailor. I didn’t need to maintain the servers anymore either; better yet, I shut down and cancelled 2 of them.
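The article does not describe how sailor assigns its URLs. As a rough illustration of the idea of one unique URL per staging environment, the sketch below derives a hostname from an environment name; the function, the base domain and the naming rule are all assumptions made for this example, not sailor's actual behavior.

```python
import re

# Hypothetical wildcard domain under which staging containers would be exposed.
BASE_DOMAIN = "staging.example.com"


def staging_url(env_name: str) -> str:
    """Turn an arbitrary environment name into a DNS-safe, unique-looking URL."""
    # Lowercase the name and collapse every run of unsupported characters into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", env_name.lower()).strip("-")
    if not slug:
        raise ValueError("environment name has no usable characters")
    return f"https://{slug}.{BASE_DOMAIN}"


print(staging_url("Feature/New Checkout"))
```

A reverse proxy in front of the containers could then route each hostname to the matching container, which is one common way to avoid developers competing for a fixed set of servers.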

Our development environment, however, still was running on macos (well, "Mac OS X" at the time) and using the Ansible scripts.

Then, sometime around 2016 docker-machine was released.

Docker machine is a tool taking care of deploying a docker daemon on any stack of your choice: virtualbox, aws, gce, bare-metal, azure, you name it, docker-machine does it; in one command line.

I saw it as the opportunity to easily and quickly migrate our ansible-based development environment to a docker based one. I modified sailor to use docker-machine as its backend.

Setting up a development environment was now a matter of creating a new docker-machine then passing a flag for sailor to use it.

At this point, our development-staging process had been simplified tremendously; at least from a dev-ops perspective: anytime I needed to upgrade any software of our stack to a newer version or change the configuration, instead of modifying my ansible scripts, asking all the team to run them, then running them myself on all 3 staging servers; I could now simply push a new docker image.

Ironically enough, I ended up needing virtual machines (which I had deliberately avoided) to run docker on our MacBooks. Using Vagrant instead of Ansible would have been a better choice from the get-go. Hindsight is always 20/20.

Using docker for our development and staging systems paved the way to the better solution that Betabrand.com now runs on.

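immediately: (adv.) at once
rid: (v.) to free of; "get rid of" means to remove
annoyance: (n.) irritation, nuisance
infinite: (adj.) unlimited

Worth noting: deserving a mention
compose: (v.) to make up, to put together
opportunity: a chance
modified: changed
tremendously: enormously
perspective: (n.) point of view
modify: to change
Ironically: (adv.) in a way contrary to what was expected
deliberately: (adv.) on purpose
Hindsight: (n.) understanding gained after the event
20/20: perfect vision; "hindsight is 20/20" means things are obvious in retrospect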

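The advent of Docker

In 2013, Dotcloud released Docker.

Betabrand's use case for Docker was immediately obvious: I saw it as the way to simplify our development and staging environments by getting rid of the Ansible scripts (well, almost; more on that later).

Those scripts would now be used only in production.

At the time, a major pain point for the team was competing for our three staging servers, dev1, dev2 and dev3, and maintaining those 3 servers was a major annoyance for me.

After watching Docker for a few months, I decided to give it a go in April 2014.

I installed Docker on one of the staging servers and created a single Docker image containing our entire stack (haproxy, varnish, redis, apache, etc.). Over the next few months I wrote a tool, sailor, that let us create, destroy and manage any number of staging environments, each reachable at its own unique URL.

It is worth noting that docker-compose did not exist at the time, and that putting an entire stack in one Docker image is of course a big no-no, but that is an unimportant detail here.

From then on, the team no longer competed for access to the staging servers. Anyone could use sailor to create a new, fully configured staging container from the Docker image. I no longer needed to maintain the servers either; better still, I shut down and cancelled 2 of them.

Our development environment, however, still ran on macOS ("Mac OS X" at the time) and used the Ansible scripts.

Then, sometime around 2016, docker-machine was released.

docker-machine is a tool that deploys a Docker daemon on the platform of your choice (VirtualBox, AWS, GCE, bare-metal, Azure, you name it) with a single command.

I saw it as a chance to migrate our Ansible-based development environment to a Docker-based one quickly and easily, and modified sailor to use docker-machine as its backend.

Setting up a development environment was now a matter of creating a new docker-machine and passing a flag telling sailor to use it.

At this point our development and staging process had been simplified enormously, at least from a dev-ops perspective. Whenever I needed to upgrade any software in our stack or change its configuration, I no longer had to modify my Ansible scripts, ask the whole team to run them, and then run them myself on all 3 staging servers; I could simply push a new Docker image.

Ironically, I ended up needing virtual machines (which I had deliberately avoided) to run Docker on our MacBooks. Using Vagrant instead of Ansible would have been the better choice from the start. Hindsight is always 20/20.

Using Docker for our development and staging systems paved the way for the better solution that Betabrand.com runs on today.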

Kubernetes

Because Betabrand is primarily an e-commerce platform, Black Friday loomed over our website more and more each year.

To our surprise, the website had handled increasingly higher loads since 2013 without any major catastrophe, but it did require a month-long preparation beforehand: adding capacity, load testing and optimizing our checkout code paths as much as we possibly could.

After preparing for Black Friday 2016, however, it became evident the infrastructure wouldn't scale for Black Friday 2017; I worried the website would become inaccessible under the load.

Luckily, sometime in 2015, the release of Kubernetes 1.0 caught my attention.

Just as I had seen an obvious use case for docker, I knew k8s was what we needed to solve many of our issues. First of all, it would finally allow us to run almost identical dev, staging and production environments. It would also solve our scalability issues.

I also evaluated 2 other solutions, Nomad and Docker Swarm, but Kubernetes seemed to be the most promising.

For Black Friday 2017, I set out to migrate our entire infra to k8s.

Although I considered it, I quickly ruled out using our current OVH bare-metal servers for our k8s nodes, since it would work against my goal of getting rid of Ansible and of all the issues that come with hardware servers. Moreover, soon after I started investigating Kubernetes, Google released their managed Kubernetes (GKE) offer, which I rapidly came to choose.

loom: (v.) to appear as a large, threatening presence
increasingly: (adv.) more and more
catastrophe: (n.) disaster
preparation: (n.) getting ready
evident: (adj.) clear
infrastructure: (n.) the underlying facilities and structure
inaccessible: (adj.) unreachable (translator's note: spelled "inacessible" in the original article, presumably a typo)
identical: (adj.) exactly the same
scalability: (n.) the ability to scale

 

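Kubernetes

Because Betabrand is primarily an e-commerce platform, Black Friday loomed larger over our website every year.

To our surprise, the website had handled ever higher loads since 2013 without any major catastrophe, but it did require a month of preparation: adding capacity, load testing, and optimizing our checkout code paths as much as we could.

After preparing for Black Friday 2016, however, it became clear that the infrastructure would not scale for Black Friday 2017; I worried the website would become inaccessible under the load.

Luckily, sometime in 2015, the release of Kubernetes 1.0 caught my attention.

Just as I had seen an obvious use case for Docker, I knew k8s could solve many of our problems. First, it would finally let us run almost identical development, staging and production environments. It would also solve our scalability issues.

I also evaluated 2 other solutions, Nomad and Docker Swarm, but Kubernetes seemed the most promising.

For Black Friday 2017, I set out to migrate our entire infrastructure to k8s.

Although I considered using our existing OVH bare-metal servers as k8s nodes, I quickly ruled it out, because it would work against my goal of getting rid of Ansible and of all the issues that come with hardware servers. Moreover, soon after I started investigating Kubernetes, Google released its managed Kubernetes offering (GKE), which I quickly chose.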

Learning Kubernetes

Migrating to k8s first involved gaining a strong understanding of its architecture and its concepts, by reading the online documentation.

The most important were containers, Pods, Deployments and Services, and how they all fit together; then, in order, ConfigMaps, Secrets, DaemonSets, StatefulSets, Volumes, PersistentVolumes and PersistentVolumeClaims.

Other concepts are important, though less necessary to get a cluster going.

Once I assimilated those concepts, the second, and hardest, step involved translating our bare-metal architecture into a set of YAML manifests.

From the beginning I set out to have one, and only one, set of manifests to be used for the creation of all three environments: development, staging and production. I quickly ran into needing to parameterize my YAML manifests, which isn't supported out of the box by Kubernetes. This is where Helm [1] comes in handy.

From the Helm website: Helm helps you manage Kubernetes applications. Helm Charts help you define, install, and upgrade even the most complex Kubernetes application.
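To illustrate the idea of one manifest rendered with different values per environment, here is a minimal sketch. It uses Python's standard `string.Template` as a stand-in for Helm's templating (Helm itself uses Go templates and a values file), and the resource name, replica counts and manifest fragment are hypothetical, not Betabrand's actual chart.

```python
from string import Template

# A deliberately incomplete Deployment fragment; a real manifest also needs
# a selector and a pod template. $env and $replicas are the parameters.
MANIFEST = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-$env
spec:
  replicas: $replicas
""")

# Hypothetical per-environment values, playing the role of Helm values files.
VALUES = {
    "development": {"replicas": 1},
    "staging": {"replicas": 2},
    "production": {"replicas": 5},
}


def render(env: str) -> str:
    """Render the single shared manifest with the values of one environment."""
    return MANIFEST.substitute(env=env, **VALUES[env])


print(render("production"))
```

The point of the design is that the three environments can only differ where a parameter allows it, which keeps development and staging close to production.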

Helm markets itself as a package manager for Kubernetes, though I originally used it solely for its templating feature. I have since come to appreciate its package manager aspect too, and used it to install Grafana [2] and Prometheus [3].

After a bit of sweat and a few tears, our infrastructure was now neatly organized into 1 Helm package, 17 Deployments, 9 ConfigMaps, 5 PersistentVolumeClaims, 5 Secrets, 18 Services, 1 StatefulSet, 2 StorageClasses, 22 container images.

All that was left was to migrate to this new infrastructure and shut down all our hardware servers.

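gain: (v.) to acquire; (n.) an increase
concepts: (n.) ideas
assimilate: (v.) to absorb and understand
architecture: (n.) the structure of a building or system
manifest: (n.) a list or declaration; in Kubernetes, a file describing resources
set out: to begin with a goal in mind
parameterized: made configurable through parameters
out-of-the-box: working immediately, without extra setup
handy: (adj.) convenient, useful
sweat: perspiration; hard work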

 

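Learning k8s

Migrating to k8s first meant gaining a solid understanding of its architecture and concepts by reading the online documentation.

The most important were containers, Pods, Deployments and Services, and how they fit together; then, in order, ConfigMaps, Secrets, DaemonSets, StatefulSets, Volumes, PersistentVolumes and PersistentVolumeClaims.

Other concepts matter too, but are less necessary for getting a cluster running.

Once I had absorbed these concepts, the second and hardest step was translating our bare-metal architecture into a set of YAML manifests.

From the start I set out to have one, and only one, set of manifests for creating all three environments: development, staging and production. I soon needed to parameterize my YAML manifests, which Kubernetes does not support out of the box. This is where Helm comes in handy.

From the Helm website: Helm helps you define, install and upgrade even the most complex Kubernetes application.

Helm positions itself as a package manager for Kubernetes, but at first I used it only for its templating feature. I have since come to appreciate the package manager side as well, and used it to install Grafana [2] and Prometheus [3].

After some sweat and a few tears, our infrastructure was neatly organized into 1 Helm package, 17 Deployments, 9 ConfigMaps, 5 PersistentVolumeClaims, 5 Secrets, 18 Services, 1 StatefulSet, 2 StorageClasses and 22 container images.

All that remained was to migrate to this new infrastructure and shut down all our hardware servers.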

Officially migrating

October 5th 2017 was the night.

Pulling the trigger was extremely easy and went without a hitch.

I created a new GKE cluster, ran helm install betabrand --name production, imported our MySQL database to Google Cloud SQL, then, after what actually took about 2 hours, we were live in the Clouds.

The migration was that simple.

What helped a lot, of course, was the ability to create multiple clusters in Google GKE: before migrating our production, I was able to rehearse through many test migrations, jotting down every step needed for a successful launch.

Black Friday 2017 was very successful for Betabrand and the few technical issues we ran into weren't related to the migration.

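pull: to draw toward oneself; "pull the trigger" means to commit to an action
trigger: (n.) the lever that fires a gun; (v.) to set off
extremely: to a very high degree
hitch: (n.) a snag, a problem
rehearse: to practice beforehand
jotting: writing down briefly
associated: connected, related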

 

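October 5th, 2017 was the night.

Pulling the trigger was extremely easy and went without a hitch.

I created a new GKE cluster, ran helm install betabrand --name production, and imported our MySQL database into Google Cloud SQL. About 2 hours later, we were live in the cloud.

The migration was that simple.

What helped a lot, of course, was the ability to create multiple clusters in GKE: before migrating production, I rehearsed with many test migrations and jotted down every step needed for a successful launch.

Black Friday 2017 was very successful for Betabrand, and the few technical issues we ran into were unrelated to the migration.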

The development/staging environments

Our development machines run a Kubernetes cluster via Minikube [4].

The same YAML manifests are being used to create a local development environment or a "production-like" environment.

Everything that runs on Production, also runs in Development. The only difference between the two environments is that our development environment talks to a local MySQL database, whereas production talks to Google Cloud SQL.

Creating a staging environment is exactly the same as creating a new production cluster: all that is needed is to clone the production database instance (which is only a few clicks or one command line) then point the staging cluster to this database via a --set database parameter in helm.
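As a sketch of that last step, the snippet below only assembles the command line and prints it; it does not run helm. The chart name and the `--name` flag come from the command quoted earlier in the article (Helm 2 syntax), `database` is the value key the article mentions, and the release name and database identifier are hypothetical.

```python
import shlex
from typing import List


def helm_install_args(release: str, database: str, chart: str = "betabrand") -> List[str]:
    """Build the helm command for one environment, overriding only the database value."""
    return [
        "helm", "install", chart,
        "--name", release,                  # Helm 2 style release name, as in the article
        "--set", f"database={database}",    # point this release at its own database
    ]


# Hypothetical release name and cloned database instance.
args = helm_install_args("staging", "staging-db-clone")
print(shlex.join(args))
```

Keeping the arguments in a list rather than a single string makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.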

parameter: a named value passed to a command or function

 

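The development/staging environments

Our development machines run a Kubernetes cluster via Minikube.

The same YAML manifests are used to create a local development environment or a "production-like" environment.

Everything that runs in production also runs in development. The only difference is that development uses a local MySQL database, whereas production uses Google Cloud SQL.

Creating a staging environment is exactly the same as creating a new production cluster: clone the production database instance (a few clicks or one command), then point the staging cluster at that clone with a --set database parameter in helm.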

A year after

It’s now been a year and 2 months since we moved our infrastructure to Kubernetes and I couldn’t be happier.

Kubernetes has been rock solid in production and we have yet to experience an outage.

In anticipation of a lot of traffic for Black Friday 2018, we were able to create an exact replica of our production services in a few minutes and do a lot of load testing. Those load tests revealed specific code paths performing extremely poorly that only a lot of traffic could reveal and allowed us to fix them before Black Friday.

As expected, Black Friday 2018 brought more traffic than ever to Betabrand.com, but k8s delivered on its promises, and features like the HorizontalPodAutoscaler coupled with GKE's node autoscaling allowed our website to absorb peak loads without any issues.
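The core of the HorizontalPodAutoscaler is a simple proportional rule documented by Kubernetes: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below reproduces only that rule plus the min/max bounds; it leaves out the tolerance band, stabilization windows and readiness handling of the real controller, and the numbers in the example are made up.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule: ceil(current * currentMetric / targetMetric), then clamp."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical example: 5 pods averaging 90% CPU against a 50% target.
print(desired_replicas(5, 90.0, 50.0, min_replicas=2, max_replicas=20))
```

When the HPA asks for more pods than the existing nodes can hold, node autoscaling is what adds machines, which is why the two features are used together.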

K8s, combined with GKE, gave us the tools we needed to make our infrastructure reliable, available, scalable and maintainable.

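solid: (adj.) firm; "rock solid" means extremely reliable
outage: a period when a service is unavailable
anticipation: expectation
replica: an exact copy
perform: to carry out; (of code) to run at a given speed
extremely: to a very high degree
reveal: to expose, to make visible
promises: commitments
coupled: (adj.) combined, paired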

 

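A year after

It has been a year and 2 months since we moved to K8s, and I could not be happier.

K8s has been rock solid in production, and we have yet to experience an outage.

Anticipating heavy traffic for Black Friday 2018, we created an exact replica of our production services in a few minutes and ran extensive load tests. They revealed specific code paths that performed very poorly, which only heavy traffic could expose, and let us fix them before Black Friday.

As expected, Black Friday 2018 brought more traffic than ever to Betabrand.com, but K8s delivered on its promises. Features such as the HorizontalPodAutoscaler, combined with GKE's node autoscaling, let our website absorb peak loads without any issues.

K8s combined with GKE gave us the tools we needed to make our infrastructure reliable, available, scalable and maintainable.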


1. https://helm.sh/
2. https://grafana.com/
3. https://prometheus.io/
4. https://github.com/kubernetes/minikube