Some time ago I reached “ten” years at Linaro. Why the quotes? Because it was
3 + 7 rather than 10 years straight: the first three years as a Canonical
contractor, now seven years as a Red Hat employee assigned as a Member Engineer.
My first three years at Linaro
NewCo or NewCore? Or Ubuntu on ARM?
In 2010 I signed a contract with Canonical as a “Foundation OS Engineer”. Once
there I signed another paper which moved me to the NewCo project (also called
NewCore, but NewCo is the name on the paper I signed).
On 30th April 2010 I got the “Welcome to Linaro” e-mail.
Many things have changed during the last years. Qualcomm released the
Snapdragon SoC line for laptops. All were running Microsoft Windows (for Arm)
with a bastardized version of ACPI tables. It took some time for the Linux
community to get Linux running on those systems. In Device Tree mode, as the
ACPI tables were so bad that it was easier to ignore them.
Apple released laptops and mini desktops with the M1 SoC (in many variants) and
then the M2 (also with variants), with macOS on them. Again, the Linux community
started working on getting Linux running there. Device Tree again.
There are moments when developers want to make changes in other FOSS projects.
Which usually means dealing with patch reviews: on a mailing list, on Gerrit,
GitHub or any other collaboration platform. And it can take quite a long time.
One note at the start: I am giving code patches as examples but the post
applies to any type of contribution (code, docs, examples, tests etc.).
Each time I update my Fedora desktop to a new release (usually around Beta) I
give Wayland a try. Which shows that I still use X11.
My setup
My desktop has a Ryzen CPU and an NVidia GTX 1050 Ti graphics card. Only one
monitor (34” 3440x1440). I use binary blobs as this generation of GPU chipset
is not really usable with the FOSS driver (nouveau).
For the desktop environment I use KDE. Which means Plasma desktop/panel,
Konsole and a few KDE apps. Firefox and Chrome as web browsers, Thunderbird for
mail, Steam for gaming and Zoom (or Google Meet) for most video calls.
About two and a half years have passed since the start of the COVID-19
pandemic. A time when most conferences got cancelled or went online (with
different levels of success).
A conference which should have been a YouTube playlist
There is a saying: “a meeting which should have been an email”. It describes
meetings which were a complete waste of time for most attendees. During the
pandemic several events were “a conference which should have been a YouTube
playlist”. Never mind whether the talks were interesting or not.
Far too often there was no way to chat with speakers or other “attendees”. Or
there was one chat channel for the whole conference or track. Also without
threading (just one long list of messages) and no way to mention people by name.
Life kicks in
Some time ago an online conference took place. Three days of talks about things
which interest me. I had plans to attend several of them. Then life happened:
video calls and local stuff. Also, the several hours of time difference was a
problem.
The good part is that the organizers recorded all talks, so I can watch them
later. I “just” missed the chance to take part in chat and the Q&A session at
the end.
Started to ignore
I started to ignore most invites to such events. Sooner or later videos from
them land online, so I pick those which interest me.
Sitting in front of a screen and watching people talk is boring. Especially
after over two years of doing it because there was no other way.
It was yet another boring week. I got some thanks for the work we did to get
TensorFlow running on AArch64 and was asked whether there is any other Python
project which could use our help.
I already had some projects on my list of things to check. And then I found the
“Top PyPI Packages” website…
Let’s install 5000 Python packages
So I got an idea: let me grab the list of the top 5000 PyPI packages and check
how many of them have issues on AArch64.
The plan was simple. Loop over the list and do 3 tasks:
create virtualenv
install package
destroy virtualenv
This way each package had the same environment and I did not have to worry
about version conflicts.
Virtualenv preparation
To avoid repeating the creation of a virtualenv five thousand times I did it once:
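# the same steps as at the start of the full script shown at the end of this post
python3 -mvenv venvs/test
. venvs/test/bin/activate
pip install -U pip wheel setuptools
deactivate
cp -a venvs/test venvs/clean
Phase 1 then tried each package with prebuilt wheels only. A sketch of the
per-package call (the “--only-binary” and “--no-compile” options are the ones
dropped later in phase 2):
pip install --no-input --only-binary :all: --no-compile ${package}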
So if a package provided only a source tarball/zip then it was marked as failed.
There were 1569 packages which failed to pass this phase. Common issues (other
than missing some development headers):
INFO: pip is looking at multiple versions of PACKAGE_NAME to
determine which version is compatible with other requirements. This could
take a while.
INFO: pip is looking at multiple versions of <Python from Requires-Python>
to determine which version is compatible with other requirements. This could
take a while.
INFO: This is taking longer than usual. You might need to provide the
dependency resolver with stricter constraints to reduce runtime. See
https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort
this run, press Ctrl + C.
ERROR: Cannot install OTHER_PACKAGE_NAME because these package versions
have conflicting dependencies.
ERROR: Could not find a version that satisfies the requirement
OTHER_PACKAGE_NAME (from versions: none)
ERROR: Could not find a version that satisfies the requirement
OTHER_PACKAGE_NAME==N.V.R (from ANOTHER_PACKAGE_NAME) (from versions: x.y,
x.y.z, x.z.z)
ERROR: No matching distribution found for OTHER_PACKAGE_NAME
Note that failure at this phase is allowed as I just want ready-to-use wheel files.
The whole process took about 20 hours on HoneyComb.
Phase 2
The main difference was getting rid of the “--only-binary” and “--no-compile”
options from the pip install calls.
Still no additional development packages were installed. The cache from phase 1
was in use to avoid re-downloading/re-building existing wheel files.
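So the per-package call became just this (a sketch; the complete script is at
the end of the post):
pip install --no-input -U --upgrade-strategy=only-if-needed ${package}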
The main issue is how single-threaded pip install is. Never mind that HoneyComb
has 16 CPU cores: only one is used (and this is a Cortex-A72 so nothing fancy).
This makes build times longer than they are supposed to be:
Building wheels for collected packages: pandas, typing
Building wheel for pandas (setup.py): started
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
Building wheel for pandas (setup.py): still running...
There were 313 packages which failed to pass this phase. Issues were similar to
those in phase 1 with one exception (as building packages was allowed):
ERROR: Could not build wheels for OTHER_PACKAGE_NAME, which is required to
install pyproject.toml-based projects
This phase took about 13 hours on HoneyComb.
Phase 3
About 6% of packages were left. Now it was time to install some development
headers (the resulting yum call is sketched after the list):
blas-devel
bzip2-devel
cairo-devel
cyrus-sasl-devel
gmp-devel
gobject-introspection-devel
graphviz-devel
gtk3-devel
httpd-devel
krb5-devel
lapack-devel
libcap-devel
libcurl-devel
libicu-devel
libjpeg-devel
libmemcached-devel
mariadb-devel
ncurses-devel
openldap-devel
openssl-devel
poppler-cpp-devel
postgresql-devel
protobuf-compiler
unixODBC-devel
xmlsec1-devel
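Inside the “manylinux2014” container this is a single yum call. A sketch,
assuming the list above was saved to a file (the filename is just an example):
# devel-packages.txt holds the list above, one package name per line
xargs yum install -y < devel-packages.txt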
I created this list by checking how packages failed to build. It should be
longer, but CentOS 7 (the base of the “manylinux2014” container image) does not
provide everything needed (for example an up-to-date Rust compiler or LLVM).
Before starting the phase 3 run I removed all entries related to “pyobjc” as
they are macOS related, so there was no need to waste time on them again.
After 3.5 hours I had another 54 packages built.
Phase 4
Some packages are not present in CentOS 7 but are present in the EPEL
repository. So after enabling EPEL (yum install -y epel-release) I installed
another set of development packages:
augeas-devel
boost-devel
cargo
gdal-devel
leptonica-devel
leveldb-devel
suitesparse-devel
portaudio-devel
proj
protobuf-devel
rust
zbar-devel
Some of those packages should have been installed in the previous step. I did
not catch them because the build processes failed earlier.
Before starting this round I went through the logs and removed every package
which (a grep sketch follows the list):
failed with “No matching distribution for PACKAGE_NAME”
failed with “use_2to3 is invalid” (aka “I need old setuptools”)
required Bazel
required tensorflow
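Finding those was simple grep work over the phase logs; a sketch (the patterns
are illustrative):
# list the log files matching the known, uninteresting failures
grep -l -e "No matching distribution" -e "use_2to3 is invalid" \
    -e "bazel" -e "tensorflow" logs/*.log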
At the end I had about one hundred packages which failed to build, for
different reasons:
missing build dependencies
expecting newer libraries than “manylinux2014” (CentOS 7) has
not listing all dependencies (everyone has “numpy” installed, right?)
being Python 2.7 only
using removed modules or classes
breaking install to say “this module is deprecated, use OTHER_NAME”
not supporting AArch64 architecture
Summary
One hundred out of the top five thousand packages equals a two percent failure
rate. There were 13 failures in the top 1000 and another 14 in the second
thousand.
Is 2% an acceptable amount? I think that it is. Some improvements can still be
made, but nothing critical. OK, it would be nice to get TensorFlow for AArch64
released by upstream under the same name (instead of the “tensorflow_aarch64”
builds done by the team at Linaro).
How to run it?
After my tweet I got several comments, and people wanted to run this test on
other architectures, operating systems or devices. So I wrote a simple script:
#!/bin/bash
echo "cleanup after previous runs"
rm -rf venvs/* logs/*
echo "Prepare clean virtualenv"
python3 -mvenv venvs/test
. venvs/test/bin/activate
pip install -U pip wheel setuptools
deactivate
cp -a venvs/test venvs/clean
echo "fetch and prepare top5000 list"
rm -f top-pypi-packages-30-days.*
wget https://hugovk.github.io/top-pypi-packages/top-pypi-packages-30-days.json
grep project top-pypi-packages-30-days.json \
|sed -e 's/"project": "\(.*\)"/\1/g' > top-pypi-packages-30-days.text
echo "go through packages"
mkdir -p logs
for package in `cat top-pypi-packages-30-days.text`; do
    echo "processing ${package}"
    rm -rf venvs/test
    cp -a venvs/clean venvs/test
    source venvs/test/bin/activate
    pip install --no-input \
        -U --upgrade-strategy=only-if-needed \
        ${package} | tee logs/${package}.log
    deactivate
    echo "-----------------------------------------------------------------"
done
It should work on any operating system capable of running Python. All build
dependencies need to be installed first. I suggest mounting “tmpfs” over the
“venvs/” directory as there will be a lot of temporary I/O going on there.
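For example (the size is an arbitrary choice):
# back the throw-away virtualenvs with RAM
sudo mount -t tmpfs -o size=4g tmpfs venvs/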
Once it finishes, just run grep to check how many packages were installed with
success:
grep "^Successfully installed" logs/*|wc -l
Please share your results. The contact page lists several ways to reach me.
Due to current events it has been hard to concentrate on work lately. So I
decided to learn something new, but still work related.
In the OpenStack Kolla project we provide images with Collectd, Grafana,
InfluxDB, Prometheus and Telegraf to give people options for monitoring their
deployments. I had never used any of them. Until now.
Local setup
At home I have some devices I could monitor for random data:
TrueNAS based NAS
OpenWRT based router
OpenWRT based access point
HoneyComb arm devbox
other Linux based computers
Home Assistant
All machines can ping each other so sending data is not a problem.
Software
For the software stack I decided to go for PIG: Prometheus, InfluxDB, Grafana.
I used instructions from blog posts written by Chris Smart:
Read both; they have more details than I mention here. Also more graphics.
OpenWRT
It turned out that OpenWRT devices can use either “collectd” or the Prometheus
node exporter to gather metrics. I had the first one installed already (to have
some graphs in the webUI). If you do not, then all you need is a set of commands
along these lines (package names assumed; they may differ between OpenWRT
releases):
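# collectd plus the network plugin used below; luci-app-statistics draws graphs in the webUI
opkg update
opkg install collectd collectd-mod-network luci-app-statistics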
TrueNAS
TrueNAS already has reporting in the webUI. It can also send data to a remote
Graphite server. So again I had everything in place for gathering metrics.
The next step was sorting out the data collector and visualisation. There is a
community provided “Grafana & Influxdb” plugin. I installed it, gave the jail a
name, set it to request its own IP and got “grafana.lan” running in a minute.
Configuration
At this moment everything is installed for both gathering and visualisation of
data. Time for some configuration.
I logged into the jail (“sudo iocage console grafana”) and the fun started.
InfluxDB
First, InfluxDB needed to be configured to accept both “collectd” data from the
OpenWRT nodes and “graphite” data from TrueNAS. A simple edit of the
“/usr/local/etc/influxd.conf” file to have this:
[[collectd]]
enabled = true
bind-address = ":25826"
database = "collectd"
# retention-policy = ""
#
# The collectd service supports either scanning a directory for multiple types
# db files, or specifying a single db file.
typesdb = "/usr/local/share/collectd"
#
security-level = "none"
# auth-file = "/etc/collectd/auth_file"
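For the Graphite data from TrueNAS a matching listener is needed in the same
file. A minimal sketch (the values shown are the InfluxDB 1.x defaults):
[[graphite]]
  enabled = true
  bind-address = ":2003"
  database = "graphite"
  protocol = "tcp"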
The second one is a simple edit of the “/etc/collectd.conf” file on each
OpenWRT node:
LoadPlugin network
<Plugin network>
Server "192.168.202.183" "25826"
</Plugin>
No idea why it wants an IP address instead of a FQDN.
TrueNAS
TrueNAS is simple: visit webUI -> System -> Reporting and give the address (IP
or FQDN) of the remote Graphite server.
TrueNAS System Reporting configuration
I enabled both the “Report CPU usage in percent” and “Graphite separate
instances” checkboxes because of the Grafana dashboard I use.
Grafana
I logged into http://grafana.lan:3000, set up an admin account and then set up
data sources. Both will be of the “InfluxDB” type: one for the ‘collectd’
database keeping metrics from the OpenWRT nodes, the second for the ‘graphite’
one with data from TrueNAS. Use “http://localhost:8086” as the URL, provide the
database names and name each data source in a way that tells you which one is
which.
Visualisation
OK, software is installed on all nodes and the configuration files are edited.
Metrics flow to InfluxDB (you can check them using “show series” in the Influx
shell tool). Time to set up some dashboards in Grafana.
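From the jail’s shell that check can look like this (assuming the InfluxDB 1.x
“influx” client):
influx -database collectd -execute 'SHOW SERIES'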
The dashboard I found looks like a good base. It also suggests I should visit
my server wardrobe and check the fans in the NAS box.
Future
I need to alter both dashboards to make them show some really usable data. Then
add some alerts. And more nodes: there is still no Prometheus in use here
gathering metrics from my generic Linux machines. Nor my server.