Author Archives: Alex Muntada

Paving the road from CRITICAL to OK

Imagine you have an issue on your mail server. It’s an intricate issue, one of those whose origin may be difficult to track down because it lies in the lowest levels of the system: the mail server runs on a legacy operating system inside a virtual machine, on top of a modern host that attaches disk images from a clustered storage system, connected to a fast network and receiving user requests via load-balanced tunnels, etc.

There are so many parts in this setup that figuring out what happened could take a lot of effort, so the first thing that comes to mind is the metrics collected by Munin (aha! let’s find some correlation of events in those graphs). So you open the Munin web pages and discover that it’s been 3 days since the graphs were last updated. You think that maybe the state files were updated and you just need to rebuild the graphs… Bad luck, again.

How could this happen without anybody noticing? You think that you need to add some Nagios checks for the Munin lock files, so you get a warning or critical alert if it’s been too long since the last update, right? Then you discover that the check is already there and that it’s been moaning on every single one of those 3 days that Munin wasn’t being updated. As it happens, Munin is not a core service: it’s only useful once in a while, though it can be very frustrating not to have it when you actually need it. Thus, nobody paid attention to that alert, because nobody needed Munin at the time and there were other, more pressing alerts. So, what will happen next time?

Okay, you promise that next time you’ll pay more attention to those critical alerts and see that Munin needs an intervention. Are you happy with that? I’m not. I believe that errors caused by humans reveal defects in the underlying system being operated. I also believe that repetitive work is a bad approach to solving any issue. Automation makes IT better.

In this particular case, Munin sometimes fails to remove its locks for whatever reason (it may well be something caused by our configuration, but it’s not worth the trouble to debug because it only happens a few times a year). When it happens, though, the Nagios checks send alerts and the person on call should connect to the Munin server and just remove the locks. That’s it: remove the locks. Therefore, I added an event handler to those Nagios checks that will remove the lock the next time it gets too old. The critical alert will be triggered anyway, and that’s good because it’s a critical event that should be recorded, but the system will take care of it immediately and very soon it will be OK again.
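
An event handler of this kind could look roughly like the following Python sketch. It is not the handler actually deployed: the lock directory, the rule that lock files have “lock” in their name, the one-hour threshold and all function names are assumptions made for illustration, as is the choice to act only on HARD states. The only Nagios-specific part is the usual convention of passing the $SERVICESTATE$ and $SERVICESTATETYPE$ macros as arguments to the handler command.

```python
import os
import time


def remove_stale_locks(lock_dir, max_age_seconds, now=None):
    """Delete lock files in lock_dir older than max_age_seconds.

    Returns the sorted list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(lock_dir)):
        path = os.path.join(lock_dir, name)
        # Assumption: lock files are regular files with "lock" in their name.
        if "lock" not in name or not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(path)
    return removed


def handle_event(state, state_type, lock_dir, max_age_seconds=3600):
    """Act only on a confirmed (HARD) CRITICAL state; otherwise do nothing.

    state and state_type are the values of the $SERVICESTATE$ and
    $SERVICESTATETYPE$ macros; the one-hour default is an assumption.
    """
    if state == "CRITICAL" and state_type == "HARD":
        return remove_stale_locks(lock_dir, max_age_seconds)
    return []
```

A wrapper script would call handle_event with the two macro values and the lock directory (for example /var/run/munin, which is again an assumption about the local setup). Because the handler only reacts after the state has gone CRITICAL, the alert is still recorded before the lock is removed.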

Now I’m wondering how many of our critical alerts could be automated like this. I’m sure that not all of them will be that easy to automate, but what if most of them are? Well, I won’t start adding random event handlers everywhere. I’ll wait for the next issue that raises an automation opportunity because I’m a huge fan of small steps. I hope you’ll do the same.

Leadership and arrogance

At the Universitat Politècnica de Catalunya we have been going through changes for some time: the accumulated deficit and an ageing staff pose hard challenges for everyone, and change in public bodies takes a lot of time and effort. But these changes are also opportunities and, for the past few months, I have been asking myself what role I want to play in all of this.

It all started with the reorganisation of part of the staff which, in my case, means that I will shortly stop being part of the Departament d’Arquitectura de Computadors (Computer Architecture Department) and join a Unitat Transversal de Gestió (cross-departmental management unit) that will serve this same department and other units in the ICT area at the UPC’s Campus Nord. This reorganisation, together with others carried out in the Serveis Generals (central services) and on other campuses of the university, has led to transfers of ICT colleagues, which opens the door to changes that had not happened for years for lack of staff budget.

In this scenario I have been wondering for months whether I would be interested in applying for a supervisor position. On the one hand, in a position like that I would have more freedom to experiment with leadership and team management models, a topic that has interested me for a long time. On the other hand, I worry about having to give up the technical side of my job, which I enjoy so much and which lets me be creative while helping to improve the services we provide to the university. Although I am full of doubts, I am not overly worried about feeling capable of being a supervisor, because I have spent many years leading initiatives in different communities. Nor am I too worried about losing my edge in ICT, because I am still very involved in several free software communities. What worries me is ending up in a job that does not make me happy. Over these 20 years in the department, the work I do has come to weigh heavily on my happiness and on my personal and professional fulfilment.

Some colleagues encourage me to apply for the supervisor positions and I thank them for their confidence. They tell me I am a leader and that they can see me doing this job, but I wonder whether being a good leader automatically makes me a good supervisor. I like to think that I am a transformational leader, one who inspires by example and rolls up his sleeves with the rest of the team to tackle challenges, but I worry about falling into micromanagement and not delegating enough. In fact, one can be a transformational leader without being a manager. Perhaps the supervisor sits halfway between the two.

As if I did not have enough doubts, some colleagues recently told me about past occasions on which I was perceived as having an arrogant attitude towards someone on a subject I know well. Whenever someone tells me something like that, I remember the time a friend who was visiting the department said to me, “dude, you’re really hard on the intern, aren’t you?”. I remember it perfectly: to this day I can see Cristina sitting in her chair, looking at the floor after I made a cynical remark because she did not know something that I had learnt on my own outside the university. I feel ashamed every time I remember it. I think I now try to be more empathetic, to speak less violently and to understand that everyone has a different experience, and that I have learnt a little about treating people better. Even so, I surely do not always succeed, and for that I ask your forgiveness. If I have ever said, or ever say, anything that hurts you, tell me without fear and I will remember Cristina again.

Deep down I am full of doubts and insecurity like everyone else. I am a bit vain, because I seek other people’s recognition for the things I am proud of. So you will find me bragging that at 44 I get thrown through the air when I practise aikido, or that I practise iaido with a blunt Japanese sword. In the same way, I speak with great confidence about the technical topics I know well, about the experiences I have had, or about my work in communities. At the same time, I meditate and reflect on my virtues and my flaws, so if you give me feedback about my interactions with you I will be enormously grateful, because you will help me become a better leader and a better person.

Thank you very much, and forgive me if I have ever hurt you ♥

My Free Software Activities in Jul-Sep 2017

If you read Planet Debian often, you’ve probably noticed a trend of Free Software activity reports at the beginning of the month. At first, those reports seemed a bit dull and lengthy, but since I started taking the time to read them I’ve learnt a lot of things, and now I’m amazed at the amount of work that people are doing for Free Software. I already knew that many people do lots of work, but reading those reports gives you an actual view of how much it is.

Then I decided that I should do the same and write some kind of report, since I became a Debian Developer in July. I think it’s a nice way to share your work with others and maybe inspire them, as happened to me. So I asked some of the people who have been inspiring me how they do it. I mean, I was curious to know how they keep track of the work they do and how long it takes to write their reports. It seems that it takes quite some time: it’s mostly manual work and usually starts at the end of the month, reviewing their contributions in mailing lists, bug trackers, e-mail folders, etc.

Here I am now, writing my first report about my Free Software activities from July to September 2017. I hope you like it:

  • Filed bug #867068 in nm.debian.org: Cannot claim account after former SSO alioth cert expired.
  • Replied to a private e-mail asking me to become the maintainer of the Monero Wallet; I declined and suggested filing an RFP.
  • Attended DebConf17 DebCamp but I missed most of Open Day and the rest of the Debian conference in Montreal.
  • Rebuilt libdbd-oracle-perl after it was removed from testing, to enable the transition to perl 5.26.
  • Filed bug #870872 in tracker.debian.org: Server Error (500) when using a new SSO cert.
  • Filed bug #870876 in tracker.debian.org: make subscription easier to upstreams with many packages.
  • Filed bug #871767 in lintian: [checks/cruft] use substr instead of substring in example.
  • Filed bug #871769 in reportbug: man page mentions -a instead of -A.
  • Suggested removing libmail-sender-perl in bug #790727, since it’s been deprecated upstream.
  • Mentioned the -n option for dpt-takeover in the pkg-perl manual on how to adopt packages.
  • Fixed a broken link to HCL in https://wiki.debian.org/Hardware.
  • Adopted libapache-admin-config-perl into the pkg-perl team, upgraded it to 0.95-1 and closed bug #615457.
  • Fixed bug #875835 in libflickr-api-perl: don’t add quote marks in SYNOPSIS.
  • Removed 50 inactive accounts from the pkg-perl team on Alioth as part of our annual membership ping.

Happy hacking!

 

Notes from FOSDEM

Going to FOSDEM has always been a mix of feelings: it’s that time of year when you meet many friends from the Free Software community, you learn some interesting things that you didn’t know about, you share some knowledge, and you may have a fair amount of chocolate and beer in what is usually cold weather.

Sometimes talks are not what they seem, and oftentimes you can’t get into a room because it’s full. But there’s always the chance to learn something new, so here’s my list of notes:

  • Play etcd lets you try etcd and see what happens when you make changes.
  • Minikube: mini Kubernetes for developing on your laptop.
  • Software Heritage API is publicly available.
  • OpsTheater offers a stack for IaaS with Puppet, Foreman, GitLab, Icinga, ELK+Grafana, Mattermost (integrates easily with GitLab).
  • Recommendation: move things from Hiera to Foreman smart parameters. Debugging Hiera can be a nightmare if you have hundreds of YAML files.
  • octocatalog-diff compares two Puppet catalogs without deploying the changes. Facts are not live and changes in providers won’t show. A Foreman plugin is available too as a proof of concept.
  • Puppeteer helps find configuration smells that violate recommended best practices.
  • Legacy docs are big, comprehensive, and feature based. Modular docs are lean, concise, targeted, and user-story based. Content rot makes docs hard to find and navigate. Document only what users need, as user stories.
  • Perl6 grammars make it easy to implement informal DSLs. Reading recommendation: Domain Specific Languages, by Martin Fowler (2010).

Happy hacking!

Config Management Camp

This was my first time at the Config Management Camp in Gent and I had a great time. As you’ll see from my notes below, it was definitely worth it.

Day 1

  • Recommendation: use find-nodes from PuppetDB with parallel SSH (pssh).
  • Service resilience depends on human resilience (HumanOps).
  • Tiny Puppet installs applications on any OS (slides):
    • e.g. tp install puppetdb
    • tinydata is the default source for application data.
  • Vox Pupuli maintains abandoned Puppet modules.
  • Reading recommendation: Thinking in Systems: A Primer, by Donella Meadows.
  • Puppet extensions:
    • Ruby functions can take lambda arguments.
    • The query_resources function from dalen-puppetdbquery finds other nodes’ resources.
    • Puppet Faces allow adding new puppet subcommands (dalen-puppetls).
  • Foreman unknown gems (slides):
    • Foreman hooks plugin.
    • Trends show changes over time.
    • Bookmark searches. Puppet can ask the search API for information with the puppet-foreman module.
    • Class import has rules to hide things in the UI.
    • hammer ssh -c 'uptime' -s 'architecture=…'
    • foreman-rake hosts:scan_out_of_sync
    • There are several Foreman UI themes.
    • API docs are available in your Foreman instance at http://foreman/apidoc
  • Types and providers:
    • require "wirble" in ~/.irbrc
    • Pro tip: use Puppet types and providers for managing web APIs.

Day 2

  • Inspiring story by Annie Hedgpeth, My Journey Into Technology Through Inspec (video).
  • Getting data to the end user:
    • Memex maps the Dark Web.
    • NASA beards like GitHub, sysadmins don’t.
    • Juju lets users choose their applications, then configure and scale them.
  • Someone mentioned that libral (a native Resource Abstraction Layer) seemed interesting.
  • Quality automation with rudder-dev (slides)
  • undef: refactoring old puppet code (slides)
    • Puppet 3.x is EOL.
    • Hiera overload, bloated YAML. Clean it up!
    • Lack of validation/CI:
      • Syntax error should not be deployable.
      • Fix style with puppet-lint -f
      • rspec-puppet to test special cases.
      • Beaker or Test Kitchen for acceptance tests.
    • VCS top notch:
      • Make it as easy as possible to avoid mistakes.
      • Put full context in the commit message.
      • Use the body to explain what and why, not how.
      • Commit often, perfect later, publish once.
      • The git pickaxe shows you how to find any text in the commits.
      • GitMagic helps set contribution guidelines.
    • Make the newbie experience better:
      • Start with control-repo.
      • Pick supported forge modules, then pick approved ones.
      • puppet module skeleton
      • Write as little as possible.
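
For reference, the pickaxe mentioned in the VCS notes above is the -S option of git log, which lists the commits that change the number of occurrences of a string. A small Python wrapper, assuming git is installed and using hypothetical function names, might look like this:

```python
import subprocess


def pickaxe_command(text, path=None):
    """Build a `git log -S` command that finds commits adding or removing text."""
    command = ["git", "log", "--oneline", "-S" + text]
    if path is not None:
        # Everything after "--" is treated as a path, not a revision.
        command += ["--", path]
    return command


def pickaxe(text, repo_dir=".", path=None):
    """Run the search in repo_dir and return the matching commit lines."""
    result = subprocess.run(
        pickaxe_command(text, path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling pickaxe("some_function_name") inside a repository returns one line per commit that introduced or removed that text.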

Hope you find the notes useful. Let me know if you have any questions.

 

Barcelona.pm June meeting

For last June’s meeting of the Barcelona Perl Mongers we ran an experiment that we called Testing Open Space, a kind of unconference in which the central theme would be testing and the topics to discuss would be decided at the meeting itself. I am sharing here the summary of the meeting that I sent to the list, because I think it may also be of interest to people outside the mongers community.

After the usual introductions (we had new faces), we described different cases in which we need to introduce tests, especially integration tests, into legacy systems. We gave the following examples:

  • Introducing tests into a non-modular system that creates user accounts on my department’s services. The code was originally written to solve a specific problem and has grown out of control (one script per service), with no tests.
  • Introducing tests into a tool that automates pull requests to the upstreams of the Perl modules we package in Debian. We already have a way to send the diffs of the changes we need to make to build the Debian packages, but for upstreams that host their repositories on GitHub we want to create the pull requests directly.
  • How to write integration tests for a system that uses Amazon Web Services (AWS) without replicating the whole production environment.

At this point we briefly explained the differences between functional or unit tests and integration tests. We also talked about mocking and how to avoid it by having separate environments for production and testing.

Next, we discussed how small refactorings that gradually add an abstraction layer over the AWS services would make testing easier: at first this middleware would call the AWS services exactly as before (ensuring that no design change affects behaviour), and it could then evolve gradually until it allows tests to run without touching the AWS services. We compared it with the Model-View-Controller pattern and with other middlewares such as DBIC.
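
This gradual refactoring can be sketched in Python as follows. It is only an illustration under assumptions, not code from the meeting: ObjectStore, AwsObjectStore, InMemoryObjectStore and archive_report are hypothetical names, and the AWS-backed class takes an already configured client (assumed to behave like a boto3 S3 client) instead of importing any SDK, so the example stays self-contained.

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """The thin layer the application talks to instead of AWS directly."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class AwsObjectStore:
    """Step 1: forward every call unchanged to an S3-style client.

    Assumption: `client` behaves like a boto3 S3 client.
    """

    def __init__(self, client, bucket: str) -> None:
        self._client = client
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        response = self._client.get_object(Bucket=self._bucket, Key=key)
        return response["Body"].read()


class InMemoryObjectStore:
    """Step 2: a stand-in used by tests, so no AWS service is touched."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_report(store: ObjectStore, name: str, body: str) -> str:
    """Application code depends only on the ObjectStore interface."""
    key = "reports/" + name + ".txt"
    store.put(key, body.encode("utf-8"))
    return key
```

In production the application would receive an AwsObjectStore, so behaviour stays the same as before the refactoring; tests pass an InMemoryObjectStore to archive_report and check the stored bytes without any network access.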

Then we had a bit of group therapy, talking about the reasons why tests do not get written and why code quality is not what one would wish. We talked about the iron triangle (resources, scope, time and quality) and its pick two version.

Finally, when we were already at the door about to leave, the topic of Behaviour-Driven Development came up and we very quickly discussed what it does and how it differs from Test-Driven Development: the former is business-oriented and the latter development-oriented.

I recommend these two books:

You may also find interesting this video of Xavi Gost’s talk La economia del refactoring at CAS2014 (I do not agree with everything he says, but I find it interesting all the same).

Packaging SoftiWARP kernel module for Debian

One of the HPC clusters we have at work has a mixed set of nodes: a few of them have InfiniBand interfaces and the others don’t. A few weeks ago we were asked to install the SoftiWARP kernel module on the nodes that lack an InfiniBand interface. We had already tried building the module from source and it worked well, but now the challenge was to install it as a Debian package with DKMS, so that it would be built for all installed kernel versions on each node.

We use Puppet to manage cluster node configuration, so you may wonder why we didn’t just use Puppet instead. Well, for one, we’re talking about installing the source of a kernel module, with no configuration at all. But there’s also the fact that Puppet delegates software management to package providers, which know much better how to deal with software upgrades. Lastly, there’s the challenge of learning something new: though I had previous knowledge of DKMS, I had no idea how to make a Debian package out of it.

Fortunately, I found Evgeni Golov’s DKMS playground on the Debian wiki. With those tips and my recently refreshed experience packaging Perl modules for Debian, I was confident enough to try my first DKMS Debian package. Actually, it turned out to be quite easy: I just had to adapt debian/rules a bit to follow modern debhelper best practices:

#!/usr/bin/make -f
# Binary package that ships the DKMS source tree
pdkms:=siw-dkms
# Kernel module name
sname:=siw
# Upstream version: changelog Version without epoch and Debian revision
sversion:=$(shell dpkg-parsechangelog|grep "^Version:"|cut -d" " -f2|rev|cut -d- -f2-|rev|cut -d':' -f2)

%:
	dh $@

# Copy the module source and a versioned dkms.conf into usr/src
override_dh_auto_install:
	dh_installdirs -p$(pdkms) usr/src/$(sname)-$(sversion)
	cp -a *.txt Makefile *.c *.h debian/$(pdkms)/usr/src/$(sname)-$(sversion)
	sed "s/__VERSION__/$(sversion)/g" debian/dkms.conf.in > debian/$(pdkms)/usr/src/$(sname)-$(sversion)/dkms.conf

Funnily enough, I spent more time filling in the details of the debian/copyright and debian/control files than actually setting up DKMS, so big kudos to Evgeni!
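
For readers who find the sversion pipeline in debian/rules hard to follow: it takes the Version field from the changelog, then drops the Debian revision (everything after the last hyphen) and the epoch (everything up to the colon), leaving the upstream version. The Python function below, with the hypothetical name upstream_version, is a sketch of the same idea and is not part of the package. It differs slightly from the shell version in that it keeps any further colons after the epoch, whereas cut -f2 would keep only the second field.

```python
def upstream_version(debian_version):
    """Return the upstream part of a Debian version string."""
    # Drop the Debian revision: the text after the last hyphen, if any.
    without_revision = debian_version.rsplit("-", 1)[0]
    # Drop the epoch: the text up to the first colon, if any.
    return without_revision.split(":", 1)[-1]
```

For example, upstream_version("1:2.0-rc1-2") returns "2.0-rc1", and a version with neither epoch nor revision, such as "1.0", is returned unchanged.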

Take a look at the full Debian packaging for further details. You may notice that this package has a dependency on libsiw-dev (the SoftiWARP userland library), which I had to package first and which was a bit trickier. More on that next time.