Deploying a Service Fabric cluster to run Windows containers

From a container perspective, Service Fabric is a container orchestrator that supports both Windows and Linux containers. In legacy application lift-and-shift scenarios, we usually containerize the legacy application with minimal code changes, and Service Fabric is a good platform for running these containers.

To deploy a Service Fabric cluster on Azure that is suitable for running containers, we can use an ARM template. I created a template with the following special settings:

1 – An additional data disk is attached to the VMs in the cluster to host the downloaded container images. We need this disk because, by default, all container images are downloaded to the C drive of the VMs, and the C drive may run out of space if several large images are downloaded.

"dataDisks": [
    {
        "lun": 0,
        "createOption": "Empty",
        "caching": "None",
        "managedDisk": {
            "storageAccountType": "Standard_LRS"
        },
        "diskSizeGB": 100
    }
]

2 – A custom script extension is used to run a script that formats the data disk and changes the configuration of the dockerd service.

{
    "properties": {
        "publisher": "Microsoft.Compute",
        "type": "CustomScriptExtension",
        "typeHandlerVersion": "1.9",
        "autoUpgradeMinorVersion": true,
        "settings": {
            "fileUris": [
                "https://gist.githubusercontent.com/chunliu/8b3c495f7ff0289c19d7d359d9e14f0d/raw/2fdcd207f795756dd94ad7aef4cdb3a97e03d9f8/config-docker.ps1"
            ],
            "commandToExecute": "powershell -ExecutionPolicy Unrestricted -File config-docker.ps1"
        }
    },
    "name": "VMCustomScriptVmExt_vmNodeType0Name"
}

The custom script that the extension runs is available at the gist linked above.

Create authorization header for Cosmos DB with Go

I started a side project to create a client package for the Cosmos DB SQL API in Go, so I can try Go in a real project. My plan is to implement something similar to the .NET Core SDK in Go. As this is a project for learning and practice, I will do it little by little, and there is no timeline for when it will be done.

I am building the project on the SQL API via REST. To access resources in Cosmos DB through the SQL API via REST, an authorization header is required on the requests. The value of the authorization header has the following format, as mentioned in this document.

type={typeoftoken}&ver={tokenversion}&sig={hashsignature}

In the above string, the values of type and version are simple: the type is either master or resource, and the current version is 1.0. The value of the signature is a bit more complex: it is a hash of several other values, using the access key of the Cosmos DB account as the hash key. The document has all the details, and better still, it includes a sample written in C#.

So, following the document and the sample, I implemented a Go equivalent. It is a good exercise in base64 encoding and HMAC hashing in Go.

The date in the signature is required to be in the HTTP-date format defined by RFC 7231. The time package in the Go standard library doesn't support this format out of the box, but it provides an easy way to create a custom format. The utcNow() function in my code formats the time in the RFC 7231 format.

Notes on the Go Language

I recently took on a side task: introducing the Go language to colleagues who are interested in it.

To be honest, I'm no Go expert myself. I first heard of Go around 2011, when word got out that Google had developed a new language. I took one look, and my first impression was that it was ugly. At the time I was busy figuring out how Bitcoin worked and had no time to play with a new language. Around 2013, Go suddenly took off in China, and many websites started using it for their backends. Curious about what made it special, I spent some time learning it, but I never had a chance to use it at work, so I can hardly claim mastery. Preparing this tech talk gave me a reason to study Go in more depth.

While preparing the material, I thought about how to introduce Go to people with experience in other languages. For an introduction, I figured that answering the following three questions well would be good enough.

What is Go?

Unlike C# or Java, which depend on a virtual machine, or interpreted dynamic languages like Python, Go is a compiled, statically typed language that is closer to C. In fact, C is one of the languages that directly influenced Go. A look at the backgrounds of Go's three creators tells you a lot about its genes. The programming language world has come full circle: when I started working more than a decade ago, C/C++ dominated; then Java and C# rose to solve memory management problems; later, as machines got faster and interpreters improved, JavaScript and Python took off in their respective domains. Now, with the spread of cloud computing, statically compiled languages such as Go and Rust are becoming popular again, the difference being that they offer better memory management than C/C++.

Go's most prominent features are: it is compiled; it is statically typed, with some type inference; it has garbage collection, which most other compiled, statically typed languages lack; and it supports concurrency based on CSP (Communicating Sequential Processes).

The Go community has grown rapidly over the past two years. Go was reportedly the language with the largest user growth on GitHub in 2017, and in Stack Overflow's 2017 developer survey it ranked fifth among the most loved languages and third among the most wanted, which shows how hot it is.
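As a tiny illustration of the CSP style, here is a minimal sketch (my own example, not from the talk) in which goroutines communicate over channels instead of sharing memory:

```go
package main

import "fmt"

// square reads numbers from in, squares them, and sends the results
// to out, closing out once in is drained.
func square(in <-chan int, out chan<- int) {
	for n := range in {
		out <- n * n
	}
	close(out)
}

func main() {
	in := make(chan int)
	out := make(chan int)
	go square(in, out) // run the worker concurrently

	// Feed the worker from another goroutine.
	go func() {
		for i := 1; i <= 3; i++ {
			in <- i
		}
		close(in)
	}()

	for v := range out {
		fmt.Println(v) // prints 1, 4, 9
	}
}
```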

Most Wanted vs. Most Loved

Why do we need Go?

With countless programming languages already available, why do we need Go? The answer lies in the problems Go set out to solve. According to Rob Pike, one of Go's creators, the language was designed to address two things:

  1. Google's problems: big hardware, big software. Slow compilation; complex dependencies; every programmer having their own style, which hampers collaboration; lack of documentation; difficult upgrades; constant reinvention of the wheel; and so on.
  2. Making the daily work of Go's designers easier, and their lives better.

To that end, Go's design philosophy follows two core principles.

  1. Minimalism: a Pascal-like syntax that is simple and has few keywords; no support for classes, inheritance, generics, and similar language features.
  2. Orthogonality: data structures and methods are kept separate and connected through composition rather than inheritance; type abstraction is achieved through interfaces; both data structures and interfaces can be extended by embedding.

Different people may list different items when describing Go's design philosophy, but minimalism and orthogonality are the two most important. They make Go simple to learn and easy to pick up, which is a major reason for its popularity.

But these same two principles have, in my view, kept Go, at least Go 1, from fully solving all the problems its designers set out to solve. For example, Go manages dependencies through packages, but that has not solved the dependency problem for large projects. Go has attempted to address this several times with different package tools, and it looks like another restart is coming: Go 1.11 will ship a new dependency management tool. Likewise, the lack of generics makes reinventing the wheel unavoidable. These are the issues Go is most criticized for, even as it enjoys great popularity.

Where is Go used today?

So far, Go is used mostly in server-side backend programs. In the container world, it has become the de facto standard language: Docker, Kubernetes, and others are all written in Go. According to the results of the Go 2017 Survey, other popular domains for Go include middleware and microservices. Go is less suitable for writing desktop GUI applications.

Install Minikube on Ubuntu Server 17.10

I have some experience with Docker and containers, but I had never played with Kubernetes before. I started to explore Kubernetes recently, as I may need a container orchestration solution in upcoming projects. Kubernetes is supported by Azure AKS, and even Docker has announced support for it. It looks like it is going to be the major container orchestration solution in the market for the coming years.

I started by deploying a local Kubernetes cluster with Minikube on an Ubuntu 17.10 server on Azure. Kubernetes has a document on its site about installing Minikube, but it is very brief. So in this post, I will document the step-by-step procedure, both for my own future reference and for others who are new to Kubernetes.

Install a Hypervisor

To install Minikube, the first step is to install a hypervisor on the server. On Linux, both VirtualBox and KVM are supported hypervisors. I chose to install KVM and followed the guidance here. The steps are as follows.

  • Make sure VT-x or AMD-V virtualization is enabled. In Azure, this requires a VM size that supports nested virtualization. To check, run the command egrep -c '(vmx|svm)' /proc/cpuinfo; if the output is 1 or more, virtualization is enabled.
  • Install the KVM packages with the following command:
sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils
  • Use the following command to add the current user to the libvirt group, then log out and log back in for it to take effect. Note: in the guidance the group name is libvirtd, but on Ubuntu 17.10 the name has changed to libvirt.
sudo adduser `id -un` libvirt
  • Test if your install has been successful with the following command:
virsh list --all
  • Install virt-manager so that we have a UI to manage VMs
sudo apt-get install virt-manager

Install kubectl

Follow the instructions here to install kubectl. The commands are:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl

Install Minikube

Follow the instructions in the release notes of Minikube to install it. I used the following command:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.25.0/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/

When you finish this step, according to the official document, the installation of Minikube is complete. But before you can use it, there are several other components that need to be installed as well.

Install Docker, Docker-Machine, and KVM driver

Minikube can run natively on the Ubuntu server without a virtual machine. To do so, Docker needs to be installed on the server. Docker CE has its own installation procedure, and Docker provides a document for it.

Docker Machine can be installed with the following commands:

curl -L https://github.com/docker/machine/releases/download/v0.13.0/docker-machine-`uname -s`-`uname -m` >/tmp/docker-machine && \
sudo install /tmp/docker-machine /usr/local/bin/docker-machine

Finally, we need to install a VM driver for the Docker machine. The Kubernetes team ships a KVM2 driver that is supposed to replace the KVM driver created by others. However, I failed to make Minikube work with the KVM2 driver. There is a bug report for this issue, and hopefully the Kubernetes team will fix it soon.

So I installed the KVM driver with the following command:

curl -LO https://github.com/dhiltgen/docker-machine-kvm/releases/download/v0.10.0/docker-machine-driver-kvm-ubuntu16.04
sudo cp docker-machine-driver-kvm-ubuntu16.04 /usr/local/bin/docker-machine-driver-kvm
sudo chmod +x /usr/local/bin/docker-machine-driver-kvm

Test if Minikube Works

With all the above steps completed, we can now test Minikube.

minikube start --vm-driver kvm

It will create a VM named minikube in KVM and configure a local Kubernetes cluster based on it. With kubectl, you should be able to see the cluster info and node info.

kubectl cluster-info
kubectl get nodes

With that, you can start to explore Kubernetes.

Running Linux Containers on Windows Server 2016

I never thought running Linux containers on Windows Server would be a big deal. One reason I run Docker for Windows on my Windows 10 laptop is to run some Linux-based containers. I assumed I just needed to install Docker on a Windows Server 2016 machine with the Containers feature enabled, and then I would be able to run both Linux and Windows containers. I didn't know that wasn't the case until I tried it yesterday.

It turns out that Linux Containers on Windows (LCOW) is a preview feature of both Windows Server, version 1709 and Docker EE. It won't work on versions of Windows Server 2016 older than 1709. As a side learning from this topic, I also got some ideas about the Windows Server Semi-Annual Channel. An interesting change.

So here is a summary of how to enable LCOW on Windows Server, version 1709.

  1. First of all, you need to get a Windows Server, version 1709 machine up and running. You can get the installation media of Windows Server, version 1709 from here. As I use Azure, I provisioned a server based on the Windows Server, version 1709 with Containers image. Version 1709 is only offered as a Server Core installation; it doesn't have the desktop environment.

  2. Once you have the server up and running, enable the Hyper-V and Containers features on it and install the Docker EE preview, which can be done with a short PowerShell script.

    As I use the Azure image, the Containers feature and Docker EE were already enabled, and the Docker daemon was already configured as a Windows service, so I didn't have to run the script.

  3. Now you can follow the instructions here to configure LCOW; I used a script to do so. I also updated the configuration file in C:\ProgramData\Docker\config\daemon.json to enable the experimental LinuxKit feature when the Docker service starts.
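For reference, the experimental flag in daemon.json looks like this (a minimal sketch; your file may contain other settings alongside it):

```json
{
    "experimental": true
}
```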

  4. Once you finish all the above configuration, LCOW is enabled on Windows Server, version 1709. To test it, simply run

docker run --platform linux --rm -ti busybox sh

That is it. If you want, you can also try running Ubuntu containers by following the instructions here.

Creating API Management instances in Parallel with Automation Runbook

Provisioning an Azure API Management (APIM) service instance is a somewhat time-consuming task: it usually takes 20 to 30 minutes to get an instance created. In most cases this is fine, because you rarely need to create many APIM instances. For most customers, 2 or 3 instances are enough for their solutions, and provisioning APIM instances is not day-to-day work.

But recently I was preparing a lab environment for an APIM-related lab session that I am going to deliver at an event. Given that provisioning an APIM instance takes 20 to 30 minutes, it is impractical to have attendees create the instances during the lab session, so I had to provision an APIM instance for each attendee beforehand. As there could be more than 40 attendees, I had to do it with a script rather than by clicking around in the Azure portal.

APIM supports creating instances with PowerShell; it doesn't support the Azure CLI at the moment. The cmdlet for instance creation is New-AzureRmApiManagement, and as mentioned in the documentation, it is a long-running operation that can take up to 15 minutes. If I simply wrote a PowerShell script to run this operation sequentially, it would take many hours to create all the APIM instances, which is not acceptable. I had to run the operations in parallel.

I ended up creating a PowerShell Workflow runbook in Azure Automation to do the task. PowerShell Workflow has several ways to support parallel processing, and Azure Automation provides enough compute resources to run all the operations in parallel.

The following code snippet shows the key part of the workflow.
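The snippet itself is not reproduced above, so here is a sketch of what such a workflow can look like, based on the description: foreach -parallel runs the branches concurrently, and each branch authenticates on its own. All names, the instance count, and parameter values are illustrative, and a Run As connection is assumed:

```powershell
workflow New-ApimLabInstances
{
    param([int]$Count, [string]$ResourceGroup, [string]$Location)

    foreach -parallel -throttlelimit 10 ($i in 1..$Count)
    {
        # Each parallel branch runs in its own process,
        # so it must authorize on its own.
        $conn = Get-AutomationConnection -Name "AzureRunAsConnection"
        Add-AzureRmAccount -ServicePrincipal -TenantId $conn.TenantId `
            -ApplicationId $conn.ApplicationId `
            -CertificateThumbprint $conn.CertificateThumbprint | Out-Null

        # Long-running creation; branches proceed concurrently.
        New-AzureRmApiManagement -ResourceGroupName $ResourceGroup `
            -Name ("apim-lab-{0:d2}" -f $i) -Location $Location `
            -Organization "Contoso" -AdminEmail "admin@contoso.com" `
            -Sku Developer
    }
}
```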

The code is quite straightforward. I need to include the Azure authorization code in each of the parallel operations because, when operations run in parallel, each one runs in its own process, so each of them needs to be authorized before it can access Azure resources.

For the complete code, you can get it from here. To run this workflow runbook in Azure Automation, the AzureRM.ApiManagement module needs to be imported into Azure Automation. That's all.

Why the Surface Book Is My Ideal Form of PC

When I traveled to the US last month, I went to a Microsoft Store and bought a Surface Book 2. After a few weeks of use, I think it is the best Windows 10 PC I have ever used. Its hybrid laptop-tablet design is, to me, the ideal form of a PC. I don't know what Microsoft is thinking, but the Surface Book line is sold in only a handful of countries. Were it not for Microsoft's market strategy and the rather high price, it ought to be a best-seller.

So what makes the Surface Book good?

First, it is powerful, yet light. My previous machine was a ThinkPad W540: i7, 32 GB RAM, dual graphics, very powerful but far too heavy; its charger alone weighed more than some people's entire laptops, so it wasn't something to carry around. My Surface Book is the 13-inch model, i7 with 16 GB RAM, with the discrete GPU in the base. Visual Studio, Docker, and the like all run without a problem, and Civilization VI feels even better than on the W540. The point is that, with comparable performance, the Surface Book weighs less than half of what the W540 does, barely more than the W540's charger, which is definitely good for my shoulders.

Second, the Surface Book's base is more stable and its keyboard feels better. I also used to own a Surface Pro 3; back then I used the Surface Pro on the go and the W540 at home. The Surface Pro's mediocre keyboard feel was a minor issue. Once, in a customer's server room with no table next to the servers, I sat on a high stool with the Surface Pro open on my lap. Being typically top-heavy, it slipped off, and I only managed to grab the keyboard. It fell at a bad angle, corner first, the screen cracked, and the touchscreen was done for. That forced me to lug the W540 around afterwards. The Surface Book's screen and keyboard are better balanced, and its hinge is much more secure than the Surface Pro's connection.

Third is its support for Windows 10. With the detach key, the tablet portion can be removed without powering down. It has two batteries, and the tablet's smaller one makes it even lighter to hold than a Surface Pro. This is extremely convenient: when I'm halfway through reading something and don't want to sit at the desk anymore, I can detach the tablet and keep reading on the sofa, a completely seamless switch. Away from the base, the tablet's battery only lasts about two or three hours, but that is enough for light use.

In short, the Surface Book works great for me, and I hope Microsoft adjusts its strategy and makes it a hit.

Dockerfile in Practice

I have been playing with OpenCV lately and, along the way, built a Docker image for OpenCV 3.2.0. The image is built on Ubuntu 16.04 from the OpenCV 3.2.0 source code, with the Python 3 bindings built in as well. It is well suited as a base image for developing and testing server-side programs based on OpenCV. Since it contains almost all OpenCV components, the build process is quite time-consuming and the image is fairly large, so I pushed it to Docker Hub. If you need it, you can pull it with

docker pull chunliu/docker-opencv

To trim components, or to build another version of OpenCV, modify the Dockerfile and rebuild.

Truth be told, I hadn't used Docker much before: I had installed it in a VM, worked through Docker's official tutorial, and skimmed a few official docs, and that was all. I roughly understood what a Dockerfile is, but had never written a complete, complex one. As it turns out, you don't know how hard something is until you've done it: writing this Dockerfile had its share of pitfalls.

First, writing such a Dockerfile with nothing but a plain text editor is difficult; a helper tool makes it easier. I used VS Code with the Docker extension, which provides keyword highlighting and IntelliSense, but nothing more. A tool that did syntax checking would be even better, for instance catching a missing line-continuation character at the end of a line. In my first few attempts, I only discovered that some line was missing a continuation character after the build failed.

Also, I haven't found a good way to debug and test a Dockerfile. At first, I would modify the Dockerfile, run the build, and investigate on failure. But the build is time-consuming, so this wasn't very efficient. Later, I started running the Dockerfile's commands one by one inside a container, making sure each command succeeded before running the build. The problem with this approach is that even if all the commands succeed in a single bash session, the build can still fail after they are organized into RUN instructions in the Dockerfile.

This comes down to how Docker executes RUN. The Docker documentation says that every RUN creates a new layer. At first I didn't quite understand what a layer meant; after working through it, I realized a layer is essentially an intermediate container. The code after RUN executes in that container, the container is then committed to the image, and the container is deleted. The next RUN starts a new container based on the newly committed image.

So, if two pieces of code need to run in the same bash session, they must be in the same RUN instruction. For example, when building OpenCV, you typically run make like this:

mkdir build
cd build
cmake ......
make ......

If you split cd, cmake, and make into separate RUN instructions, cmake and make will fail because the working directory is wrong. In fact, the working directory of a RUN is set by WORKDIR: each RUN starts in the directory set by the nearest WORKDIR above it. So if you really want to split the code above across multiple RUN instructions, you can insert WORKDIR between them to set the path, though hopping between paths gets confusing.
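To make the two options concrete, here is a sketch (the paths and cmake flags are illustrative, not taken from my actual Dockerfile):

```dockerfile
# Option 1: chain the commands in a single RUN so they share one shell session.
RUN mkdir build && cd build && \
    cmake -D CMAKE_BUILD_TYPE=RELEASE .. && \
    make -j4

# Option 2: split into several RUNs, using WORKDIR to set the directory.
WORKDIR /opencv/build
RUN cmake -D CMAKE_BUILD_TYPE=RELEASE ..
RUN make -j4
```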

Reading Docker's two official documents carefully is very helpful for avoiding some of the pitfalls in Dockerfiles.


Open Live Writer

Back when blogs were popular, Microsoft's Windows Live Writer was a very popular tool for writing blog posts offline. I used WLW for a long time. At first it was installed alongside MSN Messenger, mainly to support MSN Spaces. Later, Microsoft dropped Spaces, but I kept WLW because it also supported WordPress. After MSN Messenger was retired, WLW could still be installed through Live Essential Tools. Even after WLW 2012, I kept using the tool for a long time, until its development and support ended.

After WLW, I never used a desktop editor for blogging again. Mainly I blogged less, and for the occasional post the browser was enough; besides, I never found another tool as handy. Then today someone in a discussion group mentioned that WLW now has an open source version, published in the Windows Store no less. I downloaded it right away to try. The new OLW's interface is identical to WLW's; it doesn't look like a UWP app, more like a desktop app wrapped with the Desktop Bridge. OLW is backed by the .NET Foundation; the official site is http://openlivewriter.org/ and the source is open on GitHub. I have already forked it. Its readme includes a bit of OLW's history, which is quite interesting.

When I used WLW back in the day, I wrote plugins for it and wondered how some of its features were implemented. Now that it is open source, and such a nostalgic toy, I really should dig into it properly when I have time.

Ubuntu 16.04

A couple of days ago I was notified that the VM I host on Azure could be upgraded to Ubuntu 16.04. Since I had some free time, I upgraded it.

Come to think of it, this VM has been through quite a few version upgrades. The OS started out as Ubuntu 13.04, was upgraded to 13.10, and later to 14.04. Every upgrade ran into some problems that took time to troubleshoot. To avoid the hassle, I skipped 15.10 after reaching 14.04 and stayed there for more than two years.

Because of those past problems, I searched around before upgrading today, and sure enough quite a few people have hit issues upgrading to 16.04. Just in case, I figured a backup first would be safer, and this is where using Azure pays off. Azure has a VM Backup service that greatly simplifies backing up and restoring VMs in the cloud; it also supports a hybrid mode that can back up on-premises VMs to the cloud. Given how cheap cloud storage is, it is a genuinely useful feature. I had already created a backup policy to protect this VM, so all I had to do was run the predefined job and the backup was done. If the upgrade went wrong, restoring would be a one-button affair as well.

With the backup done, I started the upgrade. It went more smoothly than expected: apart from mysql failing to upgrade, I hit no errors that would make the upgrade fail. Checking the logs showed that mysql failed because AppArmor was protecting certain paths. During the upgrade I chose to keep all my old configuration files, which left some new file paths that mysql needed under AppArmor protection. Changing the AppArmor settings solved the problem. fail2ban hit a similar issue: a jail rule I had modified in the old version contained a bug that the old version ignored, but the new version errored out on it and failed to start. Fixing that bug resolved it. No other services had problems; they were all usable right after the upgrade.

All in all, the upgrade went quite smoothly and took just one morning; I had originally set aside a whole day for it.