Scalability with MariaDB and MaxScale covers MariaDB 10 and MaxScale, a pluggable router for your queries. Both technologies were developed at MariaDB Corporation and released as open source, and they help scale your MariaDB and MySQL workloads.
[db tech showcase Tokyo 2014] B15: Scalability with MariaDB and MaxScale by MariaDB Corporation Colin Charles
1. Scalability with MariaDB and MaxScale
Colin Charles, Team MariaDB, MariaDB Corporation
colin@mariadb.org | http://mariadb.org/
http://bytebot.net/blog/ | @bytebot on Twitter
db tech showcase, Tokyo, Japan
11 November 2014
2. whoami
• Work on MariaDB at MariaDB Corporation (SkySQL Ab)
• Merged with Monty Program Ab, makers of MariaDB
• Formerly MySQL AB (exit: Sun Microsystems)
• Past lives include Fedora Project (FESCO), OpenOffice.org
4. MariaDB Introduction
• Drop-in compatible MySQL replacement
Available!
TODAY!
• Community developed, Foundation & Corporation backed, feature enhanced,
backwards compatible, GPLv2 licensed
• Steady stream of releases in 4 years 9 months: 5.1, 5.2, 5.3, 5.5, 10.0,
MariaDB Galera Cluster 5.5, MariaDB Galera Cluster 10.0, MariaDB with
TokuDB 5.5
• MySQL Enterprise features made open: PAM authentication plugin,
threadpool, audit plugin
• Default in Red Hat Enterprise Linux, Fedora, openSUSE, SUSE Enterprise, etc.
5. InfiniDB
• MariaDB Corporation has inherited assets of InfiniDB Corporation
• Full support for InfiniDB, continued consulting & engineering work
• Looking into integrating it with MariaDB Enterprise
8. MariaDB 5.2+
Virtual Columns
• A column in a table that has its value automatically calculated either
with a pre-calculated/deterministic expression or values of other
fields in the table
• VIRTUAL - computed on the fly when data is queried (like a VIEW)
• PERSISTENT - computed when data is inserted and stored in a table
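A minimal sketch of the two kinds of virtual column, assuming the MariaDB 5.2 to 10.1 syntax. The table and column names (order_lines, qty, unit_price, and so on) are hypothetical and only serve the illustration:

```sql
-- Hypothetical table: both computed columns derive from qty * unit_price.
CREATE TABLE order_lines (
  id          INT UNSIGNED NOT NULL PRIMARY KEY,
  qty         INT NOT NULL,
  unit_price  DECIMAL(10,2) NOT NULL,
  -- VIRTUAL: not stored, evaluated each time the row is read
  line_total        DECIMAL(12,2) AS (qty * unit_price) VIRTUAL,
  -- PERSISTENT: evaluated on INSERT/UPDATE and stored with the row
  line_total_stored DECIMAL(12,2) AS (qty * unit_price) PERSISTENT
);

-- Only the base columns are supplied; the computed ones fill themselves in.
INSERT INTO order_lines (id, qty, unit_price) VALUES (1, 3, 9.99);

SELECT id, line_total, line_total_stored FROM order_lines;
```

A PERSISTENT column costs storage but can be indexed, while a VIRTUAL column costs CPU at read time.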
9. MariaDB 10.0+
PCRE Regular Expressions
• Powerful REGEXP/RLIKE operator
• New functions:
• REGEXP_REPLACE(sub,pattern,replace)
• REGEXP_INSTR(sub,pattern)
• REGEXP_SUBSTR(sub,pattern)
• Works with multi-byte character sets that MariaDB supports, including
East-Asian sets
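The three functions can be tried directly from a client session. This is a small sketch with made-up input strings; the patterns avoid backslashes so that SQL string escaping does not get in the way:

```sql
-- Replace every run of digits with the letter N.
SELECT REGEXP_REPLACE('MariaDB 10.0', '[0-9]+', 'N');   -- 'MariaDB N.N'

-- 1-based position of the first digit in the subject.
SELECT REGEXP_INSTR('abc123', '[0-9]');                 -- 4

-- The part of the subject that matches the pattern.
SELECT REGEXP_SUBSTR('user@example.com', '@.*');        -- '@example.com'

-- The REGEXP/RLIKE operator itself returns 1 or 0.
SELECT 'MariaDB' REGEXP '^Maria';                       -- 1
```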
10. GIS
• MariaDB implements a subset of SQL with Geometry Types
• No longer just minimum bounding rectangles (MBR) - shapes
considered
CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g))
ENGINE=MyISAM;
• ST_ prefix - as per OpenGIS requirements
MariaDB 5.3+
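As an illustration of shape-aware (rather than MBR-only) predicates, the sketch below stores two points and asks which one lies inside a square. The table name places and the coordinates are assumptions made for the example:

```sql
-- Hypothetical table with a spatial index, as on the slide.
CREATE TABLE places (
  name VARCHAR(50) NOT NULL,
  g    GEOMETRY NOT NULL,
  SPATIAL INDEX(g)
) ENGINE=MyISAM;

INSERT INTO places VALUES
  ('inside',  ST_GeomFromText('POINT(1 1)')),
  ('outside', ST_GeomFromText('POINT(5 5)'));

-- ST_Contains tests the actual shapes, not just their bounding rectangles.
SELECT name
FROM places
WHERE ST_Contains(
        ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))'),
        g);
```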
11. MariaDB 5.3+
Dynamic columns
• Allows you to create virtual columns with dynamic content for each row in
table. Store different attributes for each item (like a web store).
• Basically a BLOB with handling functions: COLUMN_CREATE,
COLUMN_ADD, COLUMN_GET, COLUMN_DELETE, COLUMN_EXISTS,
COLUMN_LIST, COLUMN_CHECK, COLUMN_JSON
• In MariaDB 10.0: name support (instead of referring to columns by numbers,
name it), convert all dynamic column content to JSON array, interface with
Cassandra
INSERT INTO tbl SET
dyncol_blob=COLUMN_CREATE("column_name", "value");
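A slightly longer sketch of the handling functions, assuming MariaDB 10.0 so that columns can be addressed by name. The items table and its attribute names are hypothetical:

```sql
-- Hypothetical web-store table: one BLOB holds the per-item attributes.
CREATE TABLE items (
  id    INT UNSIGNED NOT NULL PRIMARY KEY,
  attrs BLOB
);

-- Each row can carry a different set of attributes.
INSERT INTO items VALUES
  (1, COLUMN_CREATE('color', 'blue', 'size', 'XL')),
  (2, COLUMN_CREATE('voltage', 220));

-- Add an attribute to an existing row.
UPDATE items SET attrs = COLUMN_ADD(attrs, 'price', 19.99) WHERE id = 1;

-- Read one attribute back with an explicit type (NULL where it is absent).
SELECT id, COLUMN_GET(attrs, 'color' AS CHAR) AS color FROM items;

-- Inspect which attributes a row has, and dump them all as JSON.
SELECT id, COLUMN_LIST(attrs), COLUMN_JSON(attrs) FROM items;
```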
13. What is SphinxSE?
• SphinxSE is just the storage engine that still depends on the Sphinx
daemon
• It doesn’t store any data itself
• It's just a built-in client to allow MariaDB to talk to Sphinx searchd,
run queries, obtain results
• Indexing, searching is performed on Sphinx
14. Sphinx search table
CREATE TABLE t1
(
id INTEGER UNSIGNED NOT NULL,
weight INTEGER NOT NULL,
query VARCHAR(3072) NOT NULL,
group_id INTEGER,
INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/test";
SELECT * FROM t1 WHERE query='test it;mode=any';
15. MariaDB 10.0+
Query Cassandra
• Data is mapped: rowkey, static columns, dynamic columns
• super columns aren’t supported
• No 1-1 direct map for data types
• Read from and write to Cassandra from SQL (SELECT, INSERT, UPDATE, DELETE)
16. CONNECT
• Target: ETL for BI or analytics
• Import data from CSV, XML, ODBC, MS Access, etc.
• WHERE conditions pushed to ODBC source
• DROP TABLE just removes the stored definition, not data itself
• “Virtual” tables cannot be indexed
MariaDB 10.0+
17. SPIDER
MariaDB 10.0+
• Horizontal partitioning, built on top of PARTITIONs
• Associates a partition with a remote server
• Transparent to user, easy to expand
• Has index condition pushdown support enabled
18. TokuDB
• Opensource - separate MariaDB 5.5+TokuDB/integrated in 10.0.5
• Improved insert (10-20x faster) & query speed, compression (up to
90% space reduction), replication performance and online schema
flexibility
• Uses Fractal Tree Indexes instead of B-Tree
• Tests & builds of TokuDB on multiple platforms
19. Threadpool
• Modified from 5.1 (libevent based), great for CPU bound
loads and short running queries
• Windows (threadpool), Linux (epoll), Solaris (event ports),
FreeBSD/OSX (kevents)
• No minimization of concurrent transactions with dynamic
pool size
• thread_handling=pool-of-threads
• https://mariadb.com/kb/en/thread-pool-in-mariadb-55/
MariaDB 5.5+
20. PAM Authentication
• Authentication using /etc/shadow
• Authentication using LDAP, SSH pass phrases, password expiration,
username mapping, logging every login attempt, etc.
• INSTALL PLUGIN pam SONAME 'auth_pam.so';
• CREATE USER foo@host IDENTIFIED VIA pam;
• Remember to configure PAM (/etc/pam.d or /etc/pam.conf)
• http://www.mysqlperformanceblog.com/2013/02/24/using-two-factor-authentication-with-percona-server/
MariaDB 5.2+
21. MariaDB 5.5+
SQL Error Logging Plugin
• Log errors sent to clients in a log file that can be analysed later. Log
file can be rotated (recommended)
• a MYSQL_AUDIT_PLUGIN
install plugin SQL_ERROR_LOG soname 'sql_errlog.so';
22. Audit Plugin
• Log server activity - who connects to the server, what queries run,
what tables touched - rotating log file or syslogd
• a MYSQL_AUDIT_PLUGIN
INSTALL PLUGIN server_audit SONAME 'server_audit.so';
MariaDB 10.0+
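Once the plugin is installed, logging still has to be switched on. The following is a sketch of one possible configuration using the plugin's server_audit_* system variables; the rotation size and count are arbitrary example values, not recommendations:

```sql
-- Choose what to record: connections, queries, and tables touched.
SET GLOBAL server_audit_events = 'CONNECT,QUERY,TABLE';

-- Write to a rotating file (the alternative output type is 'syslog').
SET GLOBAL server_audit_output_type = 'file';
SET GLOBAL server_audit_file_rotate_size = 1000000;  -- bytes, example value
SET GLOBAL server_audit_file_rotations = 9;          -- files kept, example value

-- Nothing is logged until this is turned on.
SET GLOBAL server_audit_logging = ON;

-- Review the effective settings.
SHOW GLOBAL VARIABLES LIKE 'server_audit%';
```

Settings made with SET GLOBAL are lost on restart, so a permanent setup would also put them in the server configuration file.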
23. Group commit in MariaDB 10
• Remove commit in slow part of InnoDB commit (stage 4 - third
fsync())
• Reduce cost of crash-safe binlog
• A binlog checkpoint is a point in the binlog before which no crash
recovery is needed. In InnoDB, you wait for the flush + fsync of
its redo log at commit
24. crash-safe binlog
• MariaDB 5.5 checkpoints after every commit -> quite expensive!
• 5.5/5.6 stalls commits around binlog rotate, waiting for all prepared
transactions to commit (since crash recovery can only scan latest
binlog file)
25. crash-safe binlog 10.0
• 10.0 makes binlog checkpoints asynchronous
• A binlog can have no checkpoints at all
• Ability to scan multiple binlogs during crash recovery
• Remove stalls around binlog rotates
29. Extensions to the SE API
• prepare() - write prepared trx in parallel w/group commit
• prepare_ordered() - called serially, in commit order
• commit_ordered() - called serially, in commit order; fast commit
to memory
• commit() - commit to disk in parallel, w/group commit
30. group commit in 10.1
• Tricky locking issues hard to change without getting deadlocks sometimes
• mysql#68251, mysql#68569
• New code? Binlog rotate in background thread (further reducing stalls). Split
transactions across binlogs, so big transactions do not lead to big binlog files
• Enhanced semi-sync replication (wait for slave before commit on the master
rather than after commit)
31. START TRANSACTION WITH CONSISTENT SNAPSHOT
• START TRANSACTION WITH CONSISTENT SNAPSHOT
• mysqldump --single-transaction --master-data - full non-blocking
backup
• No need for FLUSH TABLES WITH READ LOCK
• No stalls for long running queries
• Consistent snapshot sees all of a transaction, or nothing, also for multi-engine
transactions.
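What mysqldump does under the hood can be sketched by hand. The example assumes binary logging is enabled and uses a hypothetical InnoDB table named accounts; the Binlog_snapshot_* status variables report the binlog coordinates that match the snapshot:

```sql
-- Open a snapshot that is consistent across engines and with the binlog.
START TRANSACTION WITH CONSISTENT SNAPSHOT;

-- Binlog file and position corresponding to this snapshot,
-- obtained without FLUSH TABLES WITH READ LOCK.
SHOW STATUS LIKE 'Binlog_snapshot_%';

-- Read the data as of the snapshot (hypothetical table).
SELECT * FROM accounts;

COMMIT;
```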
32. Multi-source replication
• Multi-source replication - (real-time) analytics, shard provisioning,
backups, etc.
• @@default_master_connection contains current connection
name (used if connection name is not given)
• All master/slave commands take a connection name now (like
CHANGE MASTER 'connection_name', SHOW SLAVE
'connection_name' STATUS, etc.)
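A sketch of a slave that replicates from two masters at once. The connection names (shard1, shard2), host names, and credentials are placeholders invented for the example:

```sql
-- One named connection per master.
CHANGE MASTER 'shard1' TO
  MASTER_HOST = 'db1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_USE_GTID = slave_pos;

CHANGE MASTER 'shard2' TO
  MASTER_HOST = 'db2.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_USE_GTID = slave_pos;

-- Start and inspect a single connection by name...
START SLAVE 'shard1';
SHOW SLAVE 'shard1' STATUS;

-- ...or set the default so that unnamed commands apply to it.
SET @@default_master_connection = 'shard2';
START SLAVE;

-- Overview of every connection.
SHOW ALL SLAVES STATUS;
```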
33. Global Transaction ID (GTID)
• Supports multi-source replication
• GTID can be enabled or disabled independently and online for masters or slaves
• Slaves using GTID do not have to have binary logging enabled.
• Supports multiple replication domains (independent binlog streams)
• Queries in different domains can be run in parallel on the slave.
• Simpler, more robust design compared to MySQL 5.6
34. Automatic binlog position for master failover
• On Server2: CHANGE MASTER TO master_host='server2', master_use_gtid=current_pos;
35. Why different GTID compared to 5.6?
• MySQL 5.6 GTID does not support multi-source replication
• Supports --log-slave-updates=0 for efficiency
• Enabled by default, with self-healing capabilities
36. Binlog (size matters!)
• Example query: INSERT INTO t1 VALUES (10, "foo");
• MySQL 5.6… 265 bytes
• MariaDB 10.0… 161 bytes
• Do you want a roughly 65% larger binlog size?
37. Crash-safe slave (w/InnoDB DML)
• Replace non-transactional file relay_log.info with transactional
mysql.rpl_slave_state
• Changes to rpl_slave_state are transactionally recovered after
crash along with user data.
38. Replication domains
• Keep central concept that replication is just applying events in-order from a
serial binlog stream.
• Allow multi-source replication with multiple active masters
• Lets the DBA configure multiple independent binlog streams (one per active
master: mysqld --gtid-domain-id=#)
• Events within one stream are ordered the same across entire replication topology
• Events between different streams can be in different order on different servers
• Binlog position is one ID per replication domain
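The "one ID per replication domain" idea is visible in the GTID system variables. A minimal sketch, assuming a MariaDB 10.0 server; the domain number 1 is an arbitrary example:

```sql
-- Give this active master its own replication domain
-- (same effect as starting mysqld with --gtid-domain-id=1).
SET GLOBAL gtid_domain_id = 1;

-- A MariaDB GTID has the form domain-server_id-sequence, for example 1-101-42.
SELECT @@gtid_domain_id, @@server_id;

-- Each position is a comma-separated list with one GTID per domain.
SELECT @@gtid_binlog_pos, @@gtid_slave_pos, @@gtid_current_pos;
```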
40. Parallel replication
• Multi-source replication from different masters executed in parallel
• Queries from different domains are executed in parallel
• Queries that are run in parallel on the master are run in parallel on
the slave (based on group commit).
• Transactions modifying the same table can be applied in parallel
on the slave!
• Supports both statement based and row based replication.
41. MariaDB 5.3+
New KILL syntax
• HARD | SOFT & USER USERNAME are MariaDB-specific (5.3.2)
• KILL QUERY ID query_id (10.0.5) - kill by query id, rather than thread id
• SOFT ensures things that may leave a table in an inconsistent state
aren’t interrupted (like REPAIR or INDEX creation for MyISAM or Aria)
KILL [HARD | SOFT] [CONNECTION | QUERY] [thread_id |
USER user_name]
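A few concrete forms of the statement, as a sketch. The thread id 42, the query id 12345, and the user name app_user are hypothetical values of the kind you would read from SHOW PROCESSLIST or INFORMATION_SCHEMA.PROCESSLIST:

```sql
-- Classic form: abort the running statement of thread 42, keep the connection.
KILL QUERY 42;

-- SOFT: do not interrupt operations that could leave a table inconsistent.
KILL SOFT CONNECTION 42;

-- Terminate every connection belonging to one user.
KILL HARD CONNECTION USER app_user;

-- MariaDB 10.0.5+: target a specific query by its query id.
KILL QUERY ID 12345;
```

Killing by query id avoids the race where the thread has already moved on to a different statement.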
42. Statistics
MariaDB 5.2+
• Understand server activity better to understand database loads
• SET GLOBAL userstat=1;
• SHOW CLIENT_STATISTICS; SHOW USER_STATISTICS;
• # of connections, CPU usage, bytes received/sent, row statistics
• SHOW INDEX_STATISTICS; SHOW TABLE_STATISTICS;
• # rows read, changed, indexes
MariaDB 10.0+
• INFORMATION_SCHEMA.PROCESSLIST has MEMORY_USED, EXAMINED_ROWS
(similar to SHOW STATUS output)
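The same statistics are also exposed as INFORMATION_SCHEMA tables, which makes them easy to sort and filter. A sketch, assuming userstat has just been enabled; the LIMIT values are arbitrary:

```sql
-- Start collecting (counters accumulate from this point on).
SET GLOBAL userstat = 1;

-- Busiest tables by rows read.
SELECT TABLE_SCHEMA, TABLE_NAME, ROWS_READ, ROWS_CHANGED
FROM INFORMATION_SCHEMA.TABLE_STATISTICS
ORDER BY ROWS_READ DESC
LIMIT 10;

-- Indexes and how many rows were read through each; indexes that never
-- show up here after a representative period are candidates for review.
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, ROWS_READ
FROM INFORMATION_SCHEMA.INDEX_STATISTICS
ORDER BY ROWS_READ DESC
LIMIT 10;

-- MariaDB 10.0: per-connection memory and examined rows.
SELECT ID, USER, MEMORY_USED, EXAMINED_ROWS
FROM INFORMATION_SCHEMA.PROCESSLIST;
```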
43. MariaDB 10.0+
EXPLAIN enhanced
• Explain analyser: https://mariadb.org/explain_analyzer/analyze/
• SHOW EXPLAIN FOR <thread_id>
• EXPLAIN output in the slow query log
• EXPLAIN not just for SELECT but INSERT/UPDATE/DELETE
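A short sketch of the three enhancements. The thread id 42 and the accounts table are hypothetical, and the slow-log part assumes MariaDB 10.0.5 or later, where log_slow_verbosity accepts the explain option:

```sql
-- Plan of a statement that is running right now in thread 42.
SHOW EXPLAIN FOR 42;

-- EXPLAIN for data-modifying statements (the statement is not executed).
EXPLAIN UPDATE accounts SET balance = balance - 10 WHERE id = 7;
EXPLAIN DELETE FROM accounts WHERE balance = 0;

-- Record the plan alongside each entry in the slow query log.
SET GLOBAL slow_query_log = 1;
SET GLOBAL log_slow_verbosity = 'query_plan,explain';
```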
44. Roles
MariaDB 10.0+
• Bundles users together, with similar privileges - follows the SQL
standard
CREATE ROLE audit_bean_counters;
GRANT SELECT ON accounts.* to audit_bean_counters;
GRANT audit_bean_counters to ceo;
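Granting a role does not activate it. Continuing the slide's example, the grantee (here the user ceo) has to enable the role in the session, as sketched below:

```sql
-- Run as the user who was granted the role.
SET ROLE audit_bean_counters;

-- Confirm which role is active and what it allows.
SELECT CURRENT_ROLE();
SHOW GRANTS;

-- Drop back to the user's own privileges.
SET ROLE NONE;
```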
45. FusionIO
MariaDB 10.0+
• If you have nvmfs (formerly DirectFS), you can disable the
innodb_doublewrite buffer
• page level compression in background threads (reduces I/O, saves
the life of your device)
46. What else is there
• Engines: Aria, OQGRAPH, FederatedX
• Segmented MyISAM keycaches
• Progress reporting for ALTER/LOAD DATA INFILE
• Table Elimination
• HandlerSocket
• SHUTDOWN functionality
• And a lot more….
47. Connectors
• The MariaDB project provides LGPL connectors (client libraries) for:
• C
• Java
• ODBC
• Embedding a connector? Makes sense to use these LGPL licensed
ones…
48. MariaDB Galera Cluster
• MariaDB Galera Cluster is made for today’s cloud based
environments. It is fully read-write scalable, comes with synchronous
replication, allows multi-master topologies, and guarantees no lag or
lost transactions.
• 5.5 or 10.0 based
• We’ve seen migrations from Oracle RAC to MariaDB Galera Cluster
— look for a case study by Greetz as an example
49. MariaDB MaxScale
• “Pluggable router” that offers connection & statement based routing
for load balancing, query rewriting, filtering, etc. (full regex support)
• Simplifies complex replication schemes for massive scale, high
availability, manages performance with logging, safeguards data
through firewall filtering, connects diverse clients and databases
with multiple protocols, query transformations.
• MaxScale as binlog server @ Booking - to replace intermediate
masters (downloads binlog from master, saves to disk, serves to
slave as if served from master)
50. MariaDB MaxScale
• Extensible through filters (which you can write)
• Current admin interface: CLI-based
• Release Candidate now, expected to go GA
by January 2015
• Binlog Streaming Server — register for the
special build!
[Architecture diagram. Labels: Routing; Filter/Log; Client Protocol; Message Core & State Machine; Server Protocol; .log]
51. Trusted by many
• Google
• Wikipedia
• Tumblr
• SpamExperts
• Limelight Networks
• KakaoTalk
• Paybox Services
52. Quality matters
• security@mariadb.org is now commonly on CC when it comes to
MySQL bugs
• Selective (not blind) merging
• Tests (mysql-test/)
• MySQL 5.5: 2,466
• MySQL 5.6: 3,603
• MariaDB 10.0: 3,812
53. Well supported
• Everyone who supports MySQL tends to support MariaDB
• MariaDB Corporation, Percona, FromDual, etc.
• Some PaaS services (like Jelastic) support it
• Many web hosting companies
• Rackspace Cloud
• Oracle Enterprise Linux 7
• All GA releases supported for 5 years from release
54. Going forward
• column level & block level encryption (Eperi, Google - InnoDB, Aria)
• Kerberos authentication plugin
• Full 5.6 compatibility + 5.7 features (so syntax will match for
duplicated functionality)
• Integrate mroonga
• More work on POWER8 (with IBM)
56. Resources
• We moved to github! https://github.com/MariaDB/server
• We’re still on launchpad for older branches: https://launchpad.net/maria
• maria-discuss@lists.launchpad.net
• maria-developers@lists.launchpad.net
• #maria on freenode
• facebook.com/MariaDB.dbms
• @mariadb / +MariaDB
57. Books!
1. MariaDB Crash Course, Ben Forta (September 2011)
2. Getting Started with MariaDB, Daniel Bartholomew (October 2013)
3. MariaDB Cookbook, Daniel Bartholomew (March 2014)
4. Building a Web Application with PHP & MariaDB: A Reference Guide, Sai Srinivas
Sriparasa (June 2014)
5. MariaDB: Beginners Guide, Rodrigo Ribeiro (August 2014)
6. Mastering MariaDB, Federico Razzoli (September 2014)
7. MariaDB High Performance, Pierre Mavro (September 2014)