What about those 19 Oracle features that I thought I would miss? ⭕🧪🚀 — a comparison with YugabyteDB
In 2019 I made a “19 features you will miss if you leave Oracle Database” presentation, about the features that are used every day with Oracle, and that may not be available in other databases. Because, as a consultant, I’ve seen many enterprises thinking about moving to open-source databases, especially PostgreSQL. They look at the marketing slides listing the features, but miss the real details. The goal of the talk was not to influence any decision. Moving from commercial database is not really about covering a full set of features, but a strategy to stay in control of the software that processes and stores the enterprise data. However, knowing the features that you may miss, or have to workaround, is important to avoid surprises.
The slides are available from the DOAG conference:
and a recording from the presentation for AskTom:
19 Critical Features Customers Lose When the Leave the Oracle Database
Now, I did this move myself. I’m now Developer Advocate for Yugabyte, the open source Distributed SQL database. I’ve been asked if, on my turn, I miss those features or not. I’ll take them one by one. And flag them with:
- 🚀 for the features that are also there, with equivalent behavior, in YugabyteDB,
- with ⭕ the one where we still have to catch-up with Oracle to get at the same level,
- and with 🧪 those where I’m not sure we need them, or at least in the same way. Your feedback in comments will be appreciated, especially for those points.
🚀 1- lock-free SELECT
This one is easy… we have no locks 😎… YugabyteDB has its roots in optimistic locking, which avoids surprises with response time, and prevents non-scalable waits. In case of a transaction conflict, the user gets an error and has to retry the transaction. This means that the application code has to handle it. Anyway, if application is not ready for that, you may have issues in a Highly Available system where, in case of failover to another node, the current transaction must be cancelled and retried when re-connecting. The YugabyteDB default isolation level is SERIALIZABLE (which is higher than the Oracle one with the same name — see https://blog.dbi-services.com/oracle-serializable-is-not-serializable). Of course, this approach may not be the one expected by legacy applications that heavily rely on pessimistic locking. And this is why we will support the lower isolation level READ COMMITTED (https://github.com/yugabyte/yugabyte-db/issues/5680). But the idea is still non-blocking reads. Like Oracle or PostgreSQL
⭕ 2- fast load INSERT
This is subject to discussion. On one hand, many databases have implemented some fast-load operations at some points, like Oracle direct-path inserts. But there is a drawback: two write paths, which increases the complexity of code, and risk of bugs. YugabyteDB stores table rows, and index entries, in DocDB, which is similar to RocksDB. We could imagine building directly the SST files, rather than going though the Raft consensus and the memtable. But this requires an alternative code path for it, and synchronization with metadata. This increases the complexity, and and open-source software should stay simple to allow contributions. And, finally, do we need that? the LSM Tree structure is optimized for writes. We have already some ways to improve COPY when there’s no need to go to the transactional layer — which is the case for the first load into empty tables. I’ve gathered some tips for initial loading: https://dev.to/yugabyte/copy-progression-in-yugabytedb-4ghb. But we definitely need to document it better and see how to improve the bulk load experience.
🧪 3- in-place UPDATE
The YugabyteDB storage is LSM Tree (Log-structure merge-tree), where changes are appended, rather than modifying existing blocks. There is no in-place update. However, we don’t have the same problems as in PostgreSQL, for many reasons. PostgreSQL copies the whole row when a column is updated, and generates the whole block in the WAL. YugabyteDB generates a new change only for the updated column, and the WAL is at DocDB level and protects the memtable only. When it comes to disk, the flush to SST file doesn’t need more protection because it writes the whole file from beginning to end. And further writes during compaction are similar. There is no need of full-block logging in this case. So, no in-place update, but without the same level of bloat that we see in PostgreSQL. Oracle needs to avoid row movement, because this changes the ROWID and all indexes have to be updated. Because rows are stored in heap tables. YugabyteDB tables are stored in key-value datastore, and secondary indexes use the primary key to address them. Then, out-of-place updates do have to change the index pointers. And when changes are appended as a new version, there’s no need to generate rollback segments.
🚀 4- index only access
YugabyteDB supports Index Only access. MVCC implementation is for table rows and index entries, so there is no need to look at the table to get visibility, like with PostgreSQL. I’ve written about this in https://blog.yugabyte.com/how-a-distributed-sql-database-boosts-secondary-index-queries-with-index-only-scan. Index Only Scan is even more important in a distributed database where it saves the expensive RPC to get table rows.
🚀 5- no need to reorg
Oracle does a lot of work during writes in order to leave data structure optimized for queries. PostgreSQL does the minimum and requires VACUUM to cleanup, and maybe some reorganization, and index rebuilt, later. With YugabyteDB this reorganization is automated by the SST file compaction. There is nothing like VACUUM and index fragmentation. LSM Tree and SST Files is the modern structure for databases that need to scale and and have files in SSD, like BigTable, Cassandra, HBASE, RocksDB…
🧪 6- cursor sharing
We use the PostgreSQL query planner. Generic plans can be shared within a session, with prepared statements, but not among sessions. Some years ago, I was convinced that the absence of a shared library cache was a problem in open source databases. I’ve changed my mind on this. Because applications change, and because hardware change. With Oracle, you pay an expensive license per CPU and you don’t want to over-provision the capacity to compile and re-compile always the same statements. But you pay the price for it: shared memory structures are sources of memory errors, because of fragmentation (remember those ORA-4031), and latch bottleneck (library cache wait events). With PostgreSQL, the CPU consumption is a smaller problem. And with distributed database, you can scale out if one server is at full CPU usage. On the other side, contention on shared structure is something to avoid at all prices on high-performance applications, because it cannot scale. So, no cursor sharing between sessions, here, but that’s by design. We will improve the caching of dictionary metadata, but I don’t think we will have to implement a shared pool.
🚀 7- partitioning
I mentioned partitioning as a feature that can be missed when moving from Oracle to PostgreSQL because of the many advanced features of Oracle. Mainly global indexes, which are not possible in PostgreSQL, but required when scaling OLTP. You can partition (declarative partitioning) and even shard (CitusDB extension), but an index cannot cover multiple partitions. With YugabyteDB, we have the the same with YSQL partitioning. In addition to that, we have sharding at the storage level, into tablets. At this level, all indexes are global: they are hashed and range sharded on their columns, which can be different than the table primary key. So, basically, we have partitioning at two levels. This is the foundation of scale-out and geo-partitioning. I’ll publish soon about this in https://blog.yugabyte.com/author/fpachot
⭕ 8- optimizer features enable
Clearly, I’ve seen no software with the same ability as the Oracle optimizer to run as-of any previous version. This is important for a database where each upgrade may bring severe execution plan regressions. But it comes with a price: the complexity of code. An open-source database must have algorithms and code that are understandable by the large majority of contributors. This is the case with PostgreSQL and we have the same in YugabyteDB. We have other features to reduce the risk of regression. Because we can do rolling upgrades, you can run one node with the new version and connect to it from a few session. I don’t think we need this in an open-source database, but I’m still full of admiration of Oracle optimizer developers for that.
🚀 9- hints
Yes, I’m 100% convinced that we need hints. When testing and understanding the query planner behavior, and also to workaround some situations where the software cannot guess our distribution of data. Hints exists in PostgreSQL, but with an extension, and finally it is not there when we need it. We have this enabled by default in YugabyteDB. Here is an example to refine the cardinality estimations: https://dev.to/yugabyte/pghintplan-and-single-table-cardinality-correction-3j45
🚀 10- crash/media recovery
This is the most important for a database: never corrupt data whatever failure happens. I’ve never had any doubts after an Oracle database crash. Even if I avoid to do it in production, “startup force”, which does a “shutdown abort”, is ok to restart the instance because the recovery process is reliable. Resilience is also at the heart of YugabyteDB: the writes to memory are protected by WAL, and on multiple nodes through the Raft consensus. You can kill a node and, not only the data is not corrupt, but many connections don’t even have to re-connect. Similar to RAC, but without the complexity of interconnect and shared disks.
🧪 11- RMAN
The backups, in YugabyteDB, work differently, because SST files are best suited to snapshots. They can be run manually, or be automated by the managed service, in our cloud or the management platform. And there is a repository in the yb-master to get consistent snapshots for the whole cluster. This is for Point-In-Time Recovery, in case of user or application error. For infrastructure failures, the RPO=0 (recovery of all transactions until failure) is fully automated by the distributed nature of the database, replication factor and Raft protocol. I consider this at the same level of RMAN. What is missing currently is the ability do duplicate a database with a single command. And, I think we will need to provide thin clones on those snapshots.
🧪 12- My Oracle Support
This point was about the knowledge base, for a database that has 30 years of history. YugabyteDB is younger. We still have a lot to document, but we are also much more responsive on tickets. It is just not comparable between commercial and open-source, old database kernel vs fast growing new database. Sharing problems and solutions between users and with contributors is on on github and slack.
⭕ 13- wait events
Yes, this is the one I miss the most. Oracle reached a level of instrumentation, with timed wait events, and active session sampling. As we are open-source and running on Linux, there are many ways to profile the processes. You can read Frits Hoogland https://dev.to/fritshooglandyugabyte/time-is-on-my-side-wait-events-1mn8 to get an example. But, yes, I miss timed wait events, AWR reports and SQL Monitor. We are also limited by the compatibility with PostgreSQL where the proposals for better instrumentation are regularly rejected. Let’s see what will come in future versions. Probably, the “workload repository” will not be stored in the database. Cloud native applications export to Prometheus and display in Grafana.
🚀 14- upgrade and migration
This is where a distributed database, active on all nodes, shows operational excellence. You upgrade without application interruption, restarting the nodes with the new version, one at a time. And you migrate by adding nodes to the new platform, which can even be another could provider.
🧪 15- multiplatform
Oracle used to run on many platforms but they focus mainly on Exadata today,. YugabyteDB runs on Linux, and this is probably the only OS that makes sense today. I run it on Windows with Docker. Written in C and C++, the portability is possible. For example, the support for Arm processors is coming with lower cost on AWS Graviton 2. But we will probably never compile for AIX or Solaris.
⭕ 16- materialized views
We don’t have materialized views. I see four reasons why it is not a high priority for YugabyteDB:
- We focus on OLTP. One major use-case for mviews is building aggregates in data warehouse
- We are a distributed database. One common use-case for mviews is distribute data through database links
- In cloud-native applications, Change Data Capture is the preferred way of propagating changes to reporting services. No need for those old snapshot logs. CDC, Debezium and Kafka, can stream changes to the right cloud service for analytics.
- Finally, the few users who thought they needed materialized views have solved their problem with secondary indexes. Because we have Index Only access and Global Indexes (see above).
🚀 17- flashback
Many flashback features of Oracle are based on MVCC (Multi-Version Concurrency Control) with SCN (System Change Number) which can be mapped to a timestamp. PostgreSQL is MVCC but only the tables (not the index) are versioned, and with a xid (transaction ID) rather than a timestamp. The old “Time Travel” has been abandoned since Postgres 6.2. YugabyteDB is versioning all changes to the table rows and index entries, and with the Hybrid Logical Clock (HLC) that maps to timestamps. Even if not all is implemented yet, all flashback features are possible natively.
🚀 18- Data Guard
I was mentioning Data Guard because Oracle has, in my opinion, one of the most mature log-streaming replication, from the performance and reliability of log-shipping, to the easiness to avoid split-brain and reinstate after a failover. But, by nature, YugabyteDB with Replication Factor RF=3 goes further because it is a distributed SQL database. All nodes are read-write active, failover is transparent to the majority of connections, and the others can just reconnect immediately (RPO=0 RT=3 seconds)
19- Other features
The 19th slide was a pot-pourri of many ones. I’ll cover them quickly:
🚀Online operations to save your week-ends
One goal of distributed databases is to have the DB always up. Many operations default to online. For examples a simple CREATE INDEX is online, and the additional syntax NONCONCCURENTLY is there to allow faster DML on empty tables. Restarting the database can be done during business hour with rolling restart.
🚀Parallel Query to scale on multi-core processors
YugabyteDB is multi-threaded: multiple CPU are used even without specifying parallel DML. For the moment, we don’t use the PostgreSQL parallel query possibilities. But, being sharded at storage level, it is easy process tablets in parallel using the yb_hash_code() function.
🧪 Data Pump to copy, migrate, archive, convert,…
We use pg_dump for logical export/import, to allow full PostgreSQL compatibility. There are some features missing when compared to Oracle Data Pump, but also other advantages. For example, Oracle has never implemented a data unloader (the Data Pump format is proprietary) by fear of migrations to competitors. A YugabyteDB export is easy to import in PostgreSQL. This is why you can start your new application with Yugabyte and still decide to move to PostgreSQL if you want later.
🚀Audit without any additional development
This comes with the pgaudit extension from PostgreSQL: https://docs.yugabyte.com/latest/secure/audit-logging/
🚀 VPD to secure and isolate data without changing the code
This uses the PostgreSQL RLS (Row Level Security) and is very useful for multi-tenancy: https://dev.to/yugabyte/hyper-scale-multi-tenant-for-saas-an-example-with-pgbench-33a3
🚀CMAN to have a single network address with HA and DR
There are many options for this. PostgreSQL has some extension to load-balance. We can also use HA Proxy, or a Load Balancer. On Kubernetes, we recommend a Headless Service, but a proxy is possible. And we also provide the JDBC Smart Driver to simplify the application configuration (https://docs.yugabyte.com/latest/integrations/jdbc-driver). Everything is easier when when all nodes are equal, with no need to isolate read-only workloads.
🧪 PL/SQL, EBR, APEX… to rapidly build efficient applications
About applications, we benefit from all the PostgreSQL features and the ecosystem of frameworks and middleware. There are differences of course, but everything is possible, running the code in the database, or in the application.
🧪 DRCP so scale the many micro-service connection pools
We have no connection pool in the database, but you can run pgbouncer on each node, which is similar.
🚀 JSON support for new data with same performance and reliability
PostgreSQL is, with Oracle Database, very advanced with JSON datatype. This is available in the YSQL API. YugabyteDB also provides a Cassandra-like API with YCQL, adding a native JSON datatype to it. And in the open-source area, there are interesting projects for providing a document API on top of the PostgreSQL protocol, like this MongoDB proxy: https://dev.to/yugabyte/open-sourcemongodb-api-to-yugabytedb-with-mangodb-proxy-22ka
🧪 20- Block Chain table
I was presenting this at Oracle Open World, where the Block Chain table was announced for 20c. In a cloud-native environment, this special use case will probably be a micro-services using a dedicated database. As an AWS Solution Architect, I would use Amazon Quantum Ledger Database (QLDB) for this. Should we have a Foreign Data Wrapper to query the PartiQL API of QLB? Or add the PL/Python language extension to call the procedural API? Again, when there is a need for it, the right way is git issue and pull request.
If some feature is missing in your conversion from commercial databases to open-source distributed SQL database, YugabyteDB is open-source and PostgreSQL compatible. This means that you can contribute, or make an extension work. Here is a an example where Radek Gruchalski submitted a PR to get the Foreign Data Wrapper working: https://gruchalski.com/posts/2021-11-08-yugabytedb-postgres-foreign-data-wrapper. He made the feature working, and we all benefit from it. I’ve mentioned many points that are not yet there but in the roadmap. Priority is driven by the customers and the community of users. And if you want to go further, like I did, we are hiring at Yugabyte!