Export Jupyter Notebook as Tex on CentOS

Today I tried to figure out how to print a Jupyter Notebook as Tex file including all images on a CentOS 7 installation. After a while I found the following packages which are required to succesfully generate the files:

yum -y install texlive texlive-latex texlive-xetex
yum -y install texlive-collection-latex
yum -y install texlive-collection-latexrecommended
yum -y install texlive-xetex-def
yum -y install texlive-collection-xetex
yum -y install texlive-collection-latexextra
yum -y install texlive-adjustbox
yum -y install texlive-upquote

After that, just call:

pdflatex mytex.tex

Building OpenCV 3.4 on Raspberry Pi

I’ve read many tutorials about building and installing OpenCV on a Raspberry Pi but they either did not work or were outdated. Hence, I decided to write down all steps required to build and install OpenCV 3.4.

 
# Update your system
sudo rpi-update
sudo apt-get update
sudo apt-get upgrade
 
# Change swap size
sudo nano /etc/dphys-swapfile
# Change this... CONF_SWAPSIZE=100 to this
CONF_SWAPSIZE=1024
 
# Restart service
sudo /etc/init.d/dphys-swapfile stop
sudo /etc/init.d/dphys-swapfile start
 
# Install packages
sudo apt-get install build-essential cmake cmake-curses-gui pkg-config
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install \
  libjpeg-dev \
  libtiff5-dev \
  libjasper-dev \
  libpng12-dev \
  libavcodec-dev \
  libavformat-dev \
  libswscale-dev \
  libeigen3-dev \
  libxvidcore-dev \
  libx264-dev \
  libgtk2.0-dev
 
sudo apt-get -y install libv4l-dev v4l-utils
 
sudo apt-get install python2.7-dev
sudo apt-get install python3-dev
 
pip install numpy
pip3 install numpy
 
# Install OpenCV
wget https://github.com/opencv/opencv/archive/3.4.0.zip -O opencv.zip
wget https://github.com/opencv/opencv_contrib/archive/3.4.0.zip -O opencv_contrib.zip
 
unzip opencv.zip
unzip opencv_contrib.zip
 
cd opencv-3.4.0
mkdir build
cd build
 
sudo cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D BUILD_WITH_DEBUG_INFO=OFF \
    -D BUILD_DOCS=OFF \
    -D BUILD_EXAMPLES=OFF \
    -D BUILD_TESTS=OFF \
    -D BUILD_opencv_ts=OFF \
    -D BUILD_PERF_TESTS=OFF \
    -D INSTALL_C_EXAMPLES=OFF \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib-3.4.0/modules \
    -D ENABLE_NEON=ON \
    -D WITH_LIBV4L=ON \
        ..
 
sudo make -j3
sudo make install
sudo ldconfig
 
# Fix library file name
cd /usr/local/lib/python3.5/dist-packages/
sudo mv cv2.cpython-35m.so cv2.so

Quickly executing multiple hbase commands from shell

We tend to setup our hbase test tables by shell scripts. Hence we have to execute multiple puts and deletes in a row. When you add an hbase shell [command] for each single operation you will have to wait a long time until all commands are processed. A more performant proceeding is to put all commands in one single <<EOF … EOF construct which basically allows to pass a multi line parameter to the shell.

hbase shell <<EOF
put 'shared:ITEMS','item0001','c:value','1.0'
put 'shared:ITEMS','item0002','c:value','2.0'
put 'shared:ITEMS','item0003','c:value','3.0'
put 'shared:ITEMS','item0004','c:value','4.0'
put 'shared:ITEMS','item0005','c:value','5.0'
put 'shared:ITEMS','item0006','c:value','6.0'
EOF

Doing so hbase will only start one shell instead of starting an instance for each single command.

Compiling the Alize speaker detection library in Visual Studio 2015

To compile the ALIZE speaker detection library download the projects alize-core and LIA_RAL from https://github.com/ALIZE-Speaker-Recognition. Unzip both in the same parent directory and rename the alize-core folder to ALIZE.

Open the SLN file in the ALIZE directory and build it, there should be no errors.

Afterwards open the SLN file in the LIA_RAL folder. Before you start the build process you have to make changes to two files. In Macros.h change line 301 from:

#if defined(_MSC_VER) && (!defined(__INTEL_COMPILER))

to:

#if defined(_MSC_VER) && (_MSC_VER < 1900) && (!defined(__INTEL_COMPILER))

Then change the following in MapBase.h. Add the following line after line 172 (before the keyword public:):

typedef MapBase<Derived, ReadOnlyAccessors> ReadOnlyMapBase;

Afterwards, change the following line:

Base::Base::operator=(other);

to:

ReadOnlyMapBase::Base::operator=(other);

And a few lines later change:

using Base::operator=;

to:

using ReadOnlyMapBase::Base::operator=;

Now open the SLN file in the LIA_RAL folder and build the project liatools. Make sure that you pick the same platform as you did before when compiling ALIZE.

ClassNotFoundException in Spark application using KryoSerializer

We frequently encoutered a ClassNotFoundException in our Java based Spark applications for classes that we verifiably included in our application’s JAR. Furthermore, we used the kryoSerializer (org.apache.spark.serializer.KryoSerializer) for performance reasons.

After some very annoying debugging sessions we found out that we can get rid of the exception by registering the apparently missing classes by adding them to the spark configration item org.apache.spark.serializer.KryoSerializer. This property is a simple comma separated list of full qualified class names. After adding each class the ClassNotFoundException disappeared.

Cross compiling OpenCV for ARM from Ubuntu

Similar to the compilation of dlib for ARM on Ubuntu (see https://www.jofre.de/?p=1494) I’d like to provide a short guilde on how to compile OpenCV for ARM on Ubuntu.

First, install all required packages:
sudo apt-get install build-essential git cmake pkg-config libjpeg8-dev libtiff5-dev libjasper-dev libpng12-dev libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libgtk2.0-dev libatlas-base-dev
gfortran

Download/clone OpenCV from github

Build OpenCV
mkdir build
cd build
cmake -DSOFTFP=ON -DCMAKE_TOOLCHAIN_FILE=/home/user1/opencv/platforms/linux/arm-gnueabi.toolchain.cmake ..

make

Cross compiling dlib for ARM on Ubuntu

Since I could not find any instruction to compile dlib for ARM (to use it e.g. on a raspberry pi) I decided to write one.

First, install all required packages:
sudo apt-get install libc6-armel-cross libc6-dev-armel-cross binutils-arm-linux-gnueabi pkg-config libx11-dev libatlas-base-dev libgtk-3-dev libboost-all-dev build-essential cmake libncurses5-dev gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf

Download dlib and upzip:
wget http://dlib.net/files/dlib-19.8.zip
unzip dlib-19.8.zip

Compile:
cmake -DCMAKE_C_FLAGS=”-O3 -mfpu=neon -fprofile-use -DENABLE_NEON” -DNEON=ON -DCMAKE_C_COMPILER=/usr/bin/arm-linux-gnueabihf-gcc -DCMAKE_CXX_COMPILER=/usr/bin/arm-linux-gnueabihf-g++ -DCMAKE_CXX_FLAGS=”-std=c++11″ –build –config Release ..

make

sudo make install

Calculate prime numbers using spark

Hi, for a test I wrote a short java application that calculates prime numbers on a distributed spark cluster between 0 and 1000000. Since spark 2.x examples are rare on the internet I just leave this here. Prime number code is by Oscar Sanchez.

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;
 
public class Main {
 
  private static boolean isPrime(long n) {
    for (long i = 2; 2 * i < n; i++) {
      if (n % i == 0) {
        return false;
      }
    }
    return true;
  }
 
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("PrimeApp").getOrCreate();
    Dataset<Tuple2<Long, Boolean>> rnd = spark.range(0L, 1000000L).map(
      (MapFunction<Long, Tuple2<Long, Boolean>>) x -> new Tuple2<Long, Boolean>(x, isPrime(x)), Encoders.tuple(Encoders.LONG(), Encoders.BOOLEAN()));
    rnd.show(false);
    spark.stop();
  }
 
}

Reading Parquet Files from a Java Application

Recently I came accross the requirement to read a parquet file into a java application and I figured out it is neither well documented nor easy to do so. As a consequence I wrote a short tutorial. The first task is to add your maven dependencies.

<dependencies>
 <dependency>
 <groupId>org.apache.parquet</groupId>
 <artifactId>parquet-hadoop</artifactId>
 <version>1.9.0</version>
 </dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
 <artifactId>hadoop-common</artifactId>
 <version>2.7.0</version>
 </dependency>
</dependencies>

To write the java application is easy once you know how to do it. Instead of using the AvroParquetReader or the ParquetReader class that you find frequently when searching for a solution to read parquet files use the class ParquetFileReader instead. The basic setup is to read all row groups and then read all groups recursively.

Continue reading “Reading Parquet Files from a Java Application”

Installing VirtualBox Guest Additions and fix the KERN_DIR issue

When installing the VirtualBox Guest Additions on CentOS 7 you might get an error saying the sources of the linux kernel could not be found.

Error: unable to find the sources of your current Linux kernel. Specify KERN_DIR= and run Make again. Stop.

This error can be solved by installing the kernel developer package and setting the KERN_DIR environment variable.

$ sudo yum install kernel-devel
$ sudo uname -r   # store the variable
$ echo export KERN_DIR=/usr/src/kernels/[uname output] >> ~/.bashrc

So what we do here is to intall the kernel-developer package and, get the current kernel version and set the environment variable to KERN_DIR. Now restart the virtual machine and install the Guest Additions again. The error should have gone.

Furthermore, developer tools need to be installed via yum groupinstall ‘Developer Tools’. The group name may differ if you are using a different language but english.