Author: Alexey Milovidov, 2018-12-02.
— almost unknown to anyone;
— developed by one person;
— abandoned.
«EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and JavaScript queries».
Open-source since July 26, 2016
https://github.com/eventql/eventql (963 stars)
Written in C++11
Uses ZooKeeper for coordination
No dependencies except ZooKeeper
MPP, Distributed, Column-Oriented...
Scales to petabytes. Fast range scans...
Almost complete SQL 2009 support.
Real-time Inserts & Updates.
Automatic distributed partitioning.
ChartSQL.
Last commit May 4, 2017.
Website http://eventql.io/ doesn't load.
Last GitHub issue asking about development — unanswered.
Belongs to company DeepCortex, Berlin.
One C++ developer, one frontend developer.
Active development since 2014.
AGPL license.
Abandoned after less than a year as open source.
— developer moved to another company?
— company changed priorities?
— life circumstances changed?
— released as open source because internal development at the company had stalled?
— just got bored?
ChartSQL inspired the implementation of charting in the Tabix interface for ClickHouse.
Interesting articles about system architecture in the blog
(can be read through web.archive.org or in source tree).
Well-organized code — there's something to learn.
Originally — ålenkå.
GPU database engine
https://github.com/antonmks/Alenka (1103 stars)
Written in CUDA, C++
One developer — Anton Starobinskiy (antonmks), Minsk
Apache 2.0 license
Has JDBC driver from Technica Corporation
Open-source since January 26, 2012
Last commit — November 2016
Personal project
System is a research prototype
Poorly extensible codebase
Mark Litwintschik tests:
http://tech.marksblogg.com/alenka-open-source-gpu-database.html
Why abandoned?
— developer joined NVIDIA.
Increased interest in GPU database technologies
Possibility to use for research
See also:
MapD (now called OmniSci):
https://github.com/mapd/ (Apache 2.0)
Open-source since May 8, 2017
https://www.mapd.com
PG-Strom: https://github.com/heterodb/pg-strom (GPLv2)
BrytlytDB: https://www.brytlyt.com/ (closed source)
Kinetica DB:
https://www.kinetica.com/ (closed source)
Polymatica BI:
https://www.polymatica.ru/ (closed source)
FPGA. Example: Kickfire (company closed)
DAX instruction set (SQL in Silicon) in SPARC processors
(decompression + filtering)
Offload filtering to SSD level:
https://www.vldb.org/pvldb/vol9/p924-jo.pdf
«Analytical database for unsorted data»
https://github.com/viyadb/viyadb (Apache 2.0)
Written in C++17
Open-source since February 28, 2018
One developer — Michael Spector
Good launch preparation:
https://habrahabr.ru/post/350154/
Medium, LinkedIn, Hacker News...
Last commit — April 26, 2018
Data entirely in RAM
Works on aggregated data
Weak SQL support (originally — queries in JSON)
Dynamically generates C++ code for query processing
Has cluster, uses Consul for coordination
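To make "dynamically generates C++ code" concrete, here is a minimal sketch of the source-emission step only. It is not ViyaDB's actual generator: QuerySpec, generate_source and run_query are hypothetical names, and the query shape (one metric summed under one comparison filter) is an assumption chosen for brevity. A real system would then compile the emitted text into a shared library and load it, which is omitted here.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical description of a tiny query:
//   SELECT sum(<metric>) WHERE <filter_column> <op> <constant>
struct QuerySpec {
    std::string metric;
    std::string filter_column;
    std::string op;       // one of ">", "<", "=="
    long long constant;
};

// Emits C++ source for a function specialized to this query.
// The operator and the constant are baked into the loop body, so the
// compiled code has no per-row interpretation overhead.
// Column names appear only in a comment; a real system must validate them.
std::string generate_source(const QuerySpec& q) {
    if (q.op != ">" && q.op != "<" && q.op != "==")
        throw std::invalid_argument("unsupported operator: " + q.op);

    std::ostringstream out;
    out << "// SELECT sum(" << q.metric << ") WHERE " << q.filter_column
        << " " << q.op << " " << q.constant << "\n"
        << "extern \"C\" long long run_query(const long long* metric,\n"
        << "                                const long long* filter,\n"
        << "                                unsigned long n) {\n"
        << "    long long sum = 0;\n"
        << "    for (unsigned long i = 0; i < n; ++i)\n"
        << "        if (filter[i] " << q.op << " " << q.constant << "LL)\n"
        << "            sum += metric[i];\n"
        << "    return sum;\n"
        << "}\n";
    return out.str();
}
```

Calling generate_source on a spec with metric "revenue", filter column "country_id", operator "==" and constant 7 yields a function whose inner condition reads `filter[i] == 7LL`.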
There exists a proprietary system with a very similar name:
SAS Viya
I couldn't figure out whether this is a coincidence or not.
Comes from contradictory assumptions:
«Only in-memory database can handle random writes accompanied with analytical queries, which require full table scans».
— https://medium.com/viyadb/analyzing-mobile-users-activity-with-viyadb-c88a02104269
In other words: only an in-memory database can handle a continuous stream of events arriving out of time order while simultaneously serving analytical queries.
???
Is the system worth studying?
Example: C++ code generation
... but, see also:
DBToaster:
https://dbtoaster.github.io/ (Apache 2.0)
EPFL research development (Switzerland)
C++ code generation vs. LLVM
Example: MemSQL switched its code generation mechanism from C++ to LLVM
in version 5 (March 30, 2016)
http://blog.memsql.com/memsql-5-ships/
Example: Cloudera Impala has used LLVM for code generation from the start
Example: ClickHouse has a rudimentary C++ code generation mechanism but relies mainly on vectorized query execution.
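The difference between these execution strategies can be sketched on one toy query, SELECT sum(x) WHERE x > threshold. This is an illustration under stated assumptions, not code from any of the systems named above: all function names are hypothetical, the interpreted variant uses std::function as a stand-in for an expression tree, and the "generated" variant is a template standing in for source text or LLVM IR that a real engine would emit at run time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// (1) Interpreted, row at a time: the predicate is an opaque callable,
// so every row pays for an indirect call.
int64_t sum_interpreted(const std::vector<int32_t>& column,
                        const std::function<bool(int32_t)>& predicate) {
    int64_t sum = 0;
    for (int32_t x : column)
        if (predicate(x))
            sum += x;
    return sum;
}

// (2) Vectorized: each operator processes a whole block in a tight loop.
// The filter writes a byte mask, then the aggregate consumes it.
int64_t sum_vectorized(const std::vector<int32_t>& column, int32_t threshold,
                       std::size_t block_size = 1024) {
    assert(block_size > 0);
    std::vector<uint8_t> mask(block_size);
    int64_t sum = 0;
    for (std::size_t begin = 0; begin < column.size(); begin += block_size) {
        const std::size_t n = std::min(block_size, column.size() - begin);
        const int32_t* data = column.data() + begin;
        for (std::size_t i = 0; i < n; ++i)   // filter operator
            mask[i] = data[i] > threshold;
        for (std::size_t i = 0; i < n; ++i)   // aggregate operator
            sum += mask[i] ? data[i] : 0;
    }
    return sum;
}

// (3) Stand-in for generated code: one fused loop specialized for the
// query, with the constant known at compile time.
template <int32_t Threshold>
int64_t sum_generated(const std::vector<int32_t>& column) {
    int64_t sum = 0;
    for (int32_t x : column)
        if (x > Threshold)
            sum += x;
    return sum;
}
```

All three return the same result for the same input. The vectorized form amortizes interpretation overhead over a block without needing a compiler at query time, while the generated form removes the overhead entirely at the cost of compilation latency.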
«LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence».
https://github.com/LucidDB (Apache 2.0, previously GPLv2)
Company: The Eigenbase Project (USA), non-profit organization
+ LucidEra company (BI provider)
Java, some C++
Last commit 6 years ago
What was 6 years ago?
Well-extensible codebase
More than one developer
Good documentation (http://www.eigenbase.org/ doesn't load, part available on web.archive.org)
Rich functionality, good SQL support
Why did it die?
— lack of funding;
— no enthusiasts;
— LucidEra company closed;

Apache Calcite — "frontend" for SQL DBMS
(parsing, query analysis, optimization,
query plan, JDBC)
Used in Hive, Drill, Kylin, Samza, Storm, MapD...

Originally closed-source
Developed by Calpont company
October 2013 — open-source release, GPL 2.0
October 2014 — Calpont bankruptcy
https://github.com/infinidb/infinidb
Last commit — September 2014
MariaDB ColumnStore
https://github.com/mariadb-corporation/mariadb-columnstore-server
«Extreme Scale Transaction Processing»
http://www.infinisql.org/ (site available)
https://github.com/infinisql/infinisql (GPL 3.0, was AGPL)
Written in C++
Two developers
Open-source — November 25, 2013
Last commit — January 12, 2014
OLTP, in-memory
Has cluster. No fault tolerance.
Basic SQL support
Personal project.
Incomplete, abandoned.
Why abandoned?
— the open-source release was motivated by the hope of attracting enthusiasts to the project, a hope that was doomed to fail;
— developing a DBMS is complex, time-consuming and expensive.
«The open-source database for the realtime web»
Document-oriented (JSON)
Properly implemented replication (Raft) and sharding
Support for realtime update subscriptions
Convenient ReQL query language and client libraries
Written in C++
Cool website: https://rethinkdb.com/
https://github.com/rethinkdb/rethinkdb/
In development since 2009
Decent number of developers
Excellent documentation
Active community
20,938 stars on GitHub!
2009 — company foundation, investments
Difficulties with positioning,
lack of commercial success.
October 2016 — company closure,
development team moves to Stripe
February 2017 — thanks to donations, the rights to RethinkDB were bought and transferred to The Linux Foundation.
License changed from AGPL to Apache 2.0.
2017-2018 — development continues, but much slower.
Story about mistakes from company founder:
http://www.defmacro.org/2017/01/18/why-rethinkdb-failed.html
«Native XML Database System»
Developed by ISP RAS
https://github.com/sedna/sedna (Apache 2.0)
Last commit — 2013
GOODS, POST++, ShMem, FastDB, GigaBASE, MiniDB, PERST, DyBASE...
IMCS (In-Memory Columnar Store)
https://github.com/knizhnik/imcs
PostgreSQL extension for storing
and processing time series
Use case: stock exchange data.
Weak SQL integration (essentially its own language inside Postgres).
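As a rough illustration of why a columnar layout suits this kind of time-series processing, here is a generic sketch of a moving average over one column. It does not reproduce the IMCS API or its query language; QuoteColumns and window_avg are hypothetical names, and the attribute set is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical columnar layout for quotes: one contiguous array per
// attribute, all arrays of equal length and ordered by time.
struct QuoteColumns {
    std::vector<int64_t> timestamp;
    std::vector<double> close;
};

// Moving average over a sliding window of `window` consecutive elements.
// The result has values.size() - window + 1 elements; it is empty when
// window is zero or there are fewer than `window` inputs.
std::vector<double> window_avg(const std::vector<double>& values,
                               std::size_t window) {
    std::vector<double> result;
    if (window == 0 || values.size() < window)
        return result;
    result.reserve(values.size() - window + 1);

    double sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
        if (i >= window)
            sum -= values[i - window];   // drop the element leaving the window
        if (i + 1 >= window)
            result.push_back(sum / window);
    }
    return result;
}
```

Because the function touches only the `close` array, it scans memory sequentially and never loads the other attributes, which is the main benefit of storing each attribute separately.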
Personal project: changing circumstances, loss of interest, underestimation of effort.
Startup: lack of niche, difficulty in market positioning, loss of funding.
Company side-product:
— departure of key developers;
— cessation of open-source development support;
— release to open-source due to bankruptcy;
— release to open-source by mistake.
Institute: research project, research completed.
1. Scaling development.
2. Clear positioning.
3. Focus on specific niche.
4. Reliable support from parent company.
5. Non-restrictive license.
6. Advantages must come from fundamental reasons.
7. Community development support.
