Developers Remained Unknown

Author: Alexey Milovidov, 2018-12-02.

"Developers Remained Unknown"

Craft Databases

— known to almost no one;

— developed by one person;

— abandoned.

EventQL

«EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and JavaScript queries».

Open-source since July 26, 2016

https://github.com/eventql/eventql (963 stars)

Written in C++11

Uses ZooKeeper for coordination

No dependencies except ZooKeeper

EventQL

MPP, Distributed, Column-Oriented...

Scales to petabytes. Fast range scans...

Almost complete SQL 2009 support.

Real-time Inserts & Updates.

Automatic distributed partitioning.

ChartSQL.

EventQL

Last commit May 4, 2017.

Website http://eventql.io/ doesn't load.

Last GitHub issue asking about development — unanswered.

EventQL

Belongs to DeepCortex, a company based in Berlin.

One C++ developer, one frontend developer.

Active development since 2014.

AGPL license.

Abandoned after less than a year as open source.

EventQL

— developer moved to another company?

— company changed priorities?

— life circumstances changed?

— was the open-source release prompted by a lack of internal development at the company?

— just got bored?

EventQL, Legacy

ChartSQL inspired the charting functionality in Tabix, a web interface for ClickHouse.

Interesting articles about the system architecture on the blog
(readable via web.archive.org or in the source tree).

Well-organized code; there is something to learn from it.

Alenka

Originally — ålenkå.

Alenka

GPU database engine

https://github.com/antonmks/Alenka (1103 stars)

Written in CUDA, C++

One developer — Anton Starobinskiy (antonmks), Minsk

Apache 2.0 license

Has a JDBC driver from Technica Corporation

Alenka

Open-source since January 26, 2012

Last commit — November 2016

Personal project

The system is a research prototype

The codebase is hard to extend

Benchmarks by Mark Litwintschik:
http://tech.marksblogg.com/alenka-open-source-gpu-database.html

Alenka

Why abandoned?

— developer joined NVIDIA.

Alenka, Legacy

Increased interest in GPU database technologies

Can still be used for research

Alenka

See also:

MapD (now called OmniSci):
https://github.com/mapd/ (Apache 2.0)
Open-source since May 8, 2017
https://www.mapd.com

PG-Strom:
https://github.com/heterodb/pg-strom (GPLv2)

BrytlytDB:
https://www.brytlyt.com/ (closed source)

Kinetica DB:
https://www.kinetica.com/ (closed source)

Polymatica BI:
https://www.polymatica.ru/ (closed source)

Other Hardware Options

FPGA. Example: Kickfire (company closed)

DAX instruction set (SQL in Silicon) in SPARC processors
(decompression + filtering)

Offload filtering to SSD level:
https://www.vldb.org/pvldb/vol9/p924-jo.pdf

ViyaDB

«Analytical database for unsorted data»

https://github.com/viyadb/viyadb (Apache 2.0)

Written in C++17

Open-source since February 28, 2018

One developer — Michael Spector

ViyaDB

Good launch preparation:

https://viyadb.com/

https://habrahabr.ru/post/350154/

Medium, LinkedIn, Hacker News...

Last commit — April 26, 2018

ViyaDB

Data entirely in RAM

Works on aggregated data

Weak SQL support (queries were originally written in JSON)

Dynamically generates C++ code for query processing

Has a cluster mode; uses Consul for coordination

ViyaDB

There exists a proprietary system with a very similar name:

SAS Viya

I couldn't figure out whether this is a coincidence or not.

ViyaDB

Built on contradictory assumptions:

«Only in-memory database can handle random writes accompanied with analytical queries, which require full table scans».

https://medium.com/viyadb/analyzing-mobile-users-activity-with-viyadb-c88a02104269

In other words: only an in-memory DB can keep ingesting events that arrive out of time order while simultaneously serving analytical queries.

???

ViyaDB

Is the system worth studying?

Example: C++ code generation

... but, see also:

DBToaster:
https://dbtoaster.github.io/ (Apache 2.0)
A research project from EPFL (Switzerland)

ViyaDB

C++ code generation vs. LLVM

Example: MemSQL switched its code generation from C++ to LLVM
in version 5 (March 30, 2016)
http://blog.memsql.com/memsql-5-ships/

Example: Cloudera Impala has used LLVM for code generation from the start

Example: ClickHouse has a rudimentary C++ code generation mechanism but relies mainly on vectorized query execution.
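To make the contrast concrete, here is a minimal sketch of the two styles for one made-up query. Everything in it is an assumption for illustration: the query, the column names (price, quantity), the function names and the block size are hypothetical and are not taken from ViyaDB, MemSQL, Impala or ClickHouse.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical query, used for illustration only:
//   SELECT sum(price * quantity) FROM t WHERE quantity > 10
// Both functions assume the two columns have the same length.

// Style 1: the kind of loop a C++ code generator might emit for this one
// query. Filter, multiplication and aggregation are fused into a single pass.
std::int64_t run_generated(const std::vector<std::int64_t>& price,
                           const std::vector<std::int64_t>& quantity) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < price.size(); ++i) {
        if (quantity[i] > 10) {
            sum += price[i] * quantity[i];
        }
    }
    return sum;
}

// Style 2: vectorized execution. Generic kernels are compiled once, ahead of
// time, and the query is run by chaining them over fixed-size column blocks.
constexpr std::size_t kBlockSize = 1024;  // assumed block size

// Writes the positions of values greater than threshold; returns their count.
std::size_t filter_greater(const std::int64_t* col, std::size_t n,
                           std::int64_t threshold, std::uint32_t* selected) {
    std::size_t k = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (col[i] > threshold) selected[k++] = static_cast<std::uint32_t>(i);
    }
    return k;
}

// Multiplies two columns at the selected positions into a dense output.
void multiply_selected(const std::int64_t* a, const std::int64_t* b,
                       const std::uint32_t* selected, std::size_t k,
                       std::int64_t* out) {
    for (std::size_t j = 0; j < k; ++j) out[j] = a[selected[j]] * b[selected[j]];
}

// Sums a dense array of k values.
std::int64_t sum_values(const std::int64_t* values, std::size_t k) {
    std::int64_t sum = 0;
    for (std::size_t j = 0; j < k; ++j) sum += values[j];
    return sum;
}

std::int64_t run_vectorized(const std::vector<std::int64_t>& price,
                            const std::vector<std::int64_t>& quantity,
                            std::int64_t threshold) {
    std::uint32_t selected[kBlockSize];
    std::int64_t products[kBlockSize];
    std::int64_t sum = 0;
    for (std::size_t offset = 0; offset < price.size(); offset += kBlockSize) {
        const std::size_t n = std::min(kBlockSize, price.size() - offset);
        const std::size_t k =
            filter_greater(quantity.data() + offset, n, threshold, selected);
        multiply_selected(price.data() + offset, quantity.data() + offset,
                          selected, k, products);
        sum += sum_values(products, k);
    }
    return sum;
}
```

The generated version avoids intermediate arrays but has to be compiled for every new query; the vectorized version pays for temporary buffers but needs no compiler at query time, because the per-call overhead of each kernel is spread over a whole block.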

LucidDB

«LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence».

https://github.com/LucidDB (Apache 2.0, previously GPLv2)

Backed by The Eigenbase Project (USA), a non-profit organization,
and by LucidEra, a BI provider

Java, some C++

Last commit 6 years ago (around 2012)

LucidDB

What did it have 6 years ago?

Easily extensible codebase

More than one developer

Good documentation (http://www.eigenbase.org/ doesn't load; partly available on web.archive.org)

Rich functionality, good SQL support

LucidDB

Why did it die?

— lack of funding;

— no enthusiasts;

— LucidEra shut down.

LucidDB, Legacy

Apache Calcite: a "frontend" for SQL DBMSs
(parsing, query analysis, optimization,
query planning, JDBC)

http://calcite.apache.org/

Used in Hive, Drill, Kylin, Samza, Storm, MapD...

InfiniDB

Originally closed-source

Developed by Calpont

October 2013 — open-source release, GPL 2.0

October 2014 — Calpont bankruptcy

https://github.com/infinidb/infinidb

Last commit — September 2014

InfiniDB, Legacy

MariaDB ColumnStore

https://github.com/mariadb-corporation/mariadb-columnstore-server

InfiniSQL

«Extreme Scale Transaction Processing»

http://www.infinisql.org/ (site available)

https://github.com/infinisql/infinisql (GPL 3.0, was AGPL)

Written in C++

Two developers

Open-source since November 25, 2013

Last commit — January 12, 2014

InfiniSQL

OLTP, in-memory

Supports clustering, but without fault tolerance.

Basic SQL support

Personal project.

Incomplete, abandoned.

InfiniSQL

Why abandoned?

— the open-source release was motivated by the hope of attracting enthusiasts to the project, a hope that was doomed to fail;

— developing a DBMS is complex, time-consuming and expensive.

RethinkDB

«The open-source database for the realtime web»

Document-oriented (JSON)

Properly implemented replication (Raft) and sharding

Support for realtime update subscriptions

Convenient ReQL query language and client libraries

Written in C++

RethinkDB

Cool website: https://rethinkdb.com/

https://github.com/rethinkdb/rethinkdb/

In development since 2009

Decent number of developers

Excellent documentation

Active community

20,938 stars on GitHub!

RethinkDB

2009 — company foundation, investments

Difficulties with positioning,
lack of commercial success.

October 2016 — company closure,
development team moves to Stripe

February 2017 — thanks to donations, the rights to RethinkDB were bought and transferred to The Linux Foundation.
License changed from AGPL to Apache 2.0.

RethinkDB

2017-2018 — development continues, but much slower.

The company founder's account of the mistakes made:

http://www.defmacro.org/2017/01/18/why-rethinkdb-failed.html

Sedna

«Native XML Database System»

Developed by ISP RAS (Institute for System Programming of the Russian Academy of Sciences)

https://www.sedna.org/

https://github.com/sedna/sedna (Apache 2.0)

Last commit — 2013

DBMSs by Konstantin Knizhnik

http://garret.ru/

GOODS, POST++, ShMem, FastDB, GigaBASE, MiniDB, PERST, DyBASE...

DBMSs by Konstantin Knizhnik

IMCS (In-Memory Columnar Store)

https://github.com/knizhnik/imcs

PostgreSQL extension for storing
and processing time series

Use case: stock exchange data.

Weak SQL integration (essentially its own language inside Postgres).

Conclusions

Why Open-Source Projects Get Abandoned

Personal project: changing circumstances, loss of interest, underestimation of effort.

Startup: lack of niche, difficulty in market positioning, loss of funding.

Company side-product:
— departure of key developers;
— cessation of open-source development support;
— release to open-source due to bankruptcy;
— release to open-source by mistake.

Institute: research project, research completed.

What Does a Living Open-Source Project Need?

1. Scaling development.

2. Clear positioning.

3. Focus on specific niche.

4. Reliable support from parent company.

5. Non-restrictive license.

6. Advantages that rest on fundamental reasons, not incidental ones.

7. Community development support.



?