
Planet MySQL

Planet MySQL - https://planet.mysql.com
  • How to Deal with Triggers in Your MySQL Database When Using Tungsten Replicator
    Overview
    Over the past few days we have been working with a number of customers on the best way to handle triggers within their MySQL environment when combined with Tungsten Replicator. We looked at situations where Tungsten Replicator was either part of a Tungsten Clustering installation or a standalone replication pipeline. This blog dives head first into the minefield of triggers and replication.
    Summary and Recommendations
    The conclusion was that there is no easy one-answer-fits-all solution – it really depends on the complexity of your environment and the amount of flexibility you have in being able to adjust. Our top-level summary and recommendations are as follows:
    - If using Tungsten Clustering and you need to use triggers: switch to ROW-based binary logging, and either recode triggers to only fire when read_only=ON or based on user(), or use the replicate.ignore filter.
    - If using Tungsten Replicator only, and you need to use triggers:
      - If the source instance is running in ROW-based binary logging mode: drop triggers on the target, or recode triggers to only fire when read_only=ON or based on user(), or use the replicate.ignore filter.
      - If the source instance is running in MIXED-based binary logging mode: use the replicate.ignore filter if possible, or switch to ROW-based logging.
    Read on for more information on why we made these recommendations…
    Deep Analysis
    Running with ROW Based Binary Logging
    Let’s create two simple tables, one for employees and one as an audit table. We’ll then create a trigger that will fire after an INSERT. The trigger will write into the audit table the id and employee name from the employee table, along with the action 'INSERT' and a timestamp.

      CREATE TABLE employees (
        employee_id INT(6) PRIMARY KEY,
        first_name VARCHAR(20),
        last_name VARCHAR(25) NOT NULL,
        hire_date DATE NOT NULL
      );

      CREATE TABLE employee_audit (id INT(6) AUTO_INCREMENT PRIMARY KEY, employee_id INT(6), employeename VARCHAR(50), recstate CHAR(6), auditdate DATE);

      CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees FOR EACH ROW
      BEGIN
        INSERT INTO employee_audit (employee_id, employeename, recstate, auditdate)
        VALUES (NEW.employee_id, NEW.first_name, 'INSERT', NOW());
      END;

    Our source database is in ROW-based logging, and the triggers exist and are active on the slave. Let’s insert a record into the employees table:

      INSERT INTO employees VALUES (100, 'Steven', 'King', '2019-06-01');

    All good on our master, but nothing on our slave, and our replicator is in an error state:

      pendingError           : Event application failed: seqno=50 fragno=0 message=java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'PRIMARY'
      pendingErrorCode       : NONE
      pendingErrorEventId    : mysql-bin.000004:0000000000017892;-1
      pendingErrorSeqno      : 50
      pendingExceptionMessage: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'PRIMARY'
      Failing statement      : INSERT INTO `sample`.`employee_audit` ( `id` , `employee_id` , `employeename` , `recstate` , `auditdate` ) VALUES ( ? , ? , UNHEX( ? ) , UNHEX( ? ) , ? )

    If we look at the THL we see we have extracted the INSERT on the employee table, but we have also extracted the INSERT on the audit table, and this has come through as a complete transaction. When the INSERT on the employee table is applied by the replicator, the trigger fires and does the INSERT on the audit table for us, but then the replicator also tries to INSERT the same row.

      SEQ# = 78 / FRAG# = 0 (last frag)
      - FILE = thl.data.0000000001 - TIME = 2019-12-05 11:51:48.0 - EPOCH# = 0
      - EVENTID = mysql-bin.000004:0000000000027015;-1 - SOURCEID = mysql01
      - METADATA = [mysql_server_id=101;dbms_type=mysql;tz_aware=true;service=alpha;shard=sample]
      - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
      - OPTIONS = [foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00', ##charset = ISO-8859-1]
      - SQL(0) = - ACTION = INSERT - SCHEMA = sample - TABLE = employees - ROW# = 0
        - COL(1: ) = 100 - COL(2: ) = Steven - COL(3: ) = King - COL(4: ) = 2019-06-01
      - OPTIONS = [foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00', ##charset = ISO-8859-1]
      - SQL(1) = - ACTION = INSERT - SCHEMA = sample - TABLE = employee_audit - ROW# = 0
        - COL(1: ) = 1 - COL(2: ) = 100 - COL(3: ) = Steven King - COL(4: ) = INSERT - COL(5: ) = 2019-12-05

    If we skip the error on the replicator then we roll back and lose the initial insert on the employee table too, so now we have a data discrepancy. If we DROP the trigger on the slave and bring the replicator online to retry, then everything goes through and the tables match.
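    If dropping the triggers on the target is the route you take (an option above when the source is running ROW-based logging), reviewing and removing them is quick. This is a sketch against the example schema and trigger names used in this post:

      -- List the triggers defined in the example schema, then drop the audit trigger on the slave.
      SHOW TRIGGERS FROM sample;
      DROP TRIGGER IF EXISTS sample.trgInsEmployees;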
    If for some reason we have no primary key on our audit table though, we wouldn’t have seen any error, and you would be fooled into thinking everything was OK. In fact what you would end up with is a doubling up of data or, depending upon the complexity of your trigger, data corruption on an even greater scale!
    What if you need to have triggers on the slaves because your slave could become a MASTER at some point, or perhaps you have consistency checks and you need to ensure the entire structure matches? In this scenario the safest option is to add some simple checks to your triggers. Typically, your slave databases should be in read-only mode, therefore a simple test could check and only execute the statements within the trigger if the database is read/write. However, this could be flawed if, for example, you are only replicating a subset of data into your target, and in fact you need the target to be read/write for applications that work with other schemas. In this instance you could check, inside the trigger, which user caused the trigger to fire. If you stop the trigger firing when the value of user() is the account you configured the replicator to use (i.e. tungsten), then you know the trigger will only fire when the initial call is a genuine insert, not a result of the replicator applying the data. This could look something like the following:

      CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees FOR EACH ROW
      BEGIN
        IF user() != 'tungsten@db1' THEN
          INSERT INTO employee_audit (employee_id, employeename, recstate, auditdate)
          VALUES (NEW.employee_id, CONCAT(NEW.first_name, ' ', NEW.last_name), 'INSERT', '2019-12-05');
        END IF;
      END;
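    The read_only-based variant mentioned in the recommendations is not shown in the post, but a minimal sketch could look like the following. It assumes the replica runs with read_only=ON and that the replicator applies changes with a privileged account that bypasses read_only, so the audit INSERT only runs on the server where the genuine INSERT happens; the DELIMITER lines are just for the mysql client:

      DELIMITER //
      CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees FOR EACH ROW
      BEGIN
        -- Only write the audit row when this server is writable, i.e. not a read_only replica.
        IF @@global.read_only = 0 THEN
          INSERT INTO employee_audit (employee_id, employeename, recstate, auditdate)
          VALUES (NEW.employee_id, CONCAT(NEW.first_name, ' ', NEW.last_name), 'INSERT', NOW());
        END IF;
      END//
      DELIMITER ;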
    However, what happens if you have hundreds of tables or very complex triggers? This would be a lot of coding. Sadly, there is no simple answer; the three options above need to be assessed and the right course of action taken to suit your environment!
    Running with MIXED Based Binary Logging
    Let’s now look at what happens when we are in MIXED logging mode. Using the same structure and the same trigger code, we see the same result, because in MIXED mode MySQL has decided to log the event as a ROW event, so the same situation arises as before. It logged the entire transaction as ROW because the trigger code was non-deterministic due to the use of the NOW() function, and also because the table has an AUTO_INCREMENT column.
    MySQL’s decision making on whether to switch between ROW or STATEMENT when in MIXED mode has a number of conditions; there is more detail on those rules at the link below, but specifically I want to call out this line: “When one or more tables with AUTO_INCREMENT columns are updated and a trigger or stored function is invoked..” Taken from https://dev.mysql.com/doc/refman/8.0/en/binary-log-mixed.html
    So let’s change our table to force MySQL to NOT switch to ROW-based logging, by removing the AUTO_INCREMENT and removing the NOW() from our code:

      CREATE TABLE employee_audit (id INT(6) PRIMARY KEY, employee_id INT(6), employeename VARCHAR(50), recstate CHAR(6), auditdate DATE);

      CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees FOR EACH ROW
      BEGIN
        INSERT INTO employee_audit (id, employee_id, employeename, recstate, auditdate)
        VALUES (NEW.employee_id, NEW.employee_id, NEW.first_name, 'INSERT', '2019-12-05');
      END;

    Now let’s run our insert and see what happens… This time, the initial INSERT is logged as a statement, and in STATEMENT mode MySQL does NOT log the data changes that result from a trigger firing, therefore we don’t replicate them either. The THL shows this:

      SEQ# = 110 / FRAG# = 0 (last frag)
      - FILE = thl.data.0000000001 - TIME = 2019-12-05 12:25:52.0 - EPOCH# = 79
      - EVENTID = mysql-bin.000005:0000000000010296;17 - SOURCEID = mysql01
      - METADATA = [mysql_server_id=101;dbms_type=mysql;tz_aware=true;service=alpha;shard=sample]
      - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
      - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 1, unique_checks = 1, sql_mode = 'NO_ENGINE_SUBSTITUTION,NO_AUTO_CREATE_USER,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE', character_set_client = 33, collation_connection = 33, collation_server = 8]
      - SCHEMA = sample
      - SQL(0) = INSERT INTO employees VALUES ( 100 , 'Steven' , 'King' , '2019-06-01' )

    In this situation you need to have the trigger enabled on the target, otherwise you will end up missing data!! If you have a mix of triggers that will cause MySQL to behave differently each time, MIXED mode could cause a lot of confusion and data drift/corruption if not handled with care. Using the check against user() or read_only in this case won’t help either, because you may need the trigger to fire as a result of the replicator’s preceding insert; sadly there is no way to know if the trigger was fired as a result of a ROW or STATEMENT based action. When in MIXED mode there really is no safe option unless you are 100% confident that all of your triggers would be non-deterministic and result in a ROW-based binlog entry.
    Can the replicator help? Sadly not! Because MySQL doesn’t flag in the binlog whether the DML is the result of a trigger firing, and doesn’t log it at all in some situations, we have no way of making a judgement on what to do. However, one final option that I haven’t covered could be to use the replicator filters. This would only help if the tables affected by your triggers are solely maintained as a result of the triggers’ code. In these cases you could consider excluding them from replication by using the replicate.ignore filter. This would ensure no changes associated with these tables are applied and you would rely 100% on the triggers in the target database to maintain them; however, for non-deterministic statements you could still end up with data differences between these tables in your source and target!!
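    Where the recommendation above is to switch the source to ROW-based logging, checking and changing the mode is straightforward. This is a sketch; SET GLOBAL only affects sessions that connect after the change, so the setting should also be made permanent in the server configuration:

      -- Check the current logging mode, then switch new sessions to ROW.
      SELECT @@global.binlog_format;
      SET GLOBAL binlog_format = 'ROW';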
    The Wrap-Up
    In this blog post we discussed how to handle triggers correctly when replicating between MySQL databases with Tungsten Replicator. Tungsten Clustering is the most flexible, performant global database layer available today – use it underlying your SaaS offering as a strong base upon which to grow your worldwide business! For more information, please visit https://www.continuent.com/solutions. Want to learn more or run a POC? Contact us.

  • NDB Parallel Query, part 2
    In part 1 we showed how NDB can parallelise a simple query with only a single table involved. In this blog we will build on this and show how NDB can parallelise only some parts of a two-way join query. As an example we will use Q12 in DBT3:

      SELECT
          l_shipmode,
          SUM(CASE
                  WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH' THEN 1
                  ELSE 0
              END) AS high_line_count,
          SUM(CASE
                  WHEN o_orderpriority <> '1-URGENT' AND o_orderpriority <> '2-HIGH' THEN 1
                  ELSE 0
              END) AS low_line_count
      FROM
          orders,
          lineitem
      WHERE
          o_orderkey = l_orderkey
          AND l_shipmode IN ('MAIL', 'SHIP')
          AND l_commitdate < l_receiptdate
          AND l_shipdate < l_commitdate
          AND l_receiptdate >= '1994-01-01'
          AND l_receiptdate < DATE_ADD('1994-01-01', INTERVAL '1' year)
      GROUP BY
          l_shipmode
      ORDER BY
          l_shipmode;

    This query, when seen through the relational operators, will first pass through a SELECT operator and a PROJECT operator in the data nodes. The JOIN operator will be executed on the lineitem and orders tables and the result of the JOIN operator will be sent to the MySQL Server. The MySQL Server will thereafter handle the GROUP BY operator with its aggregation function and also the final SORT operator. Thus we can parallelise the filtering, projection and join, but the GROUP BY aggregation and sorting will be implemented in the normal MySQL execution of GROUP BY, SUM and sorting.
    This query will be executed by first performing a range scan on the lineitem table and evaluating the condition that limits the amount of rows to send to the join with the orders table. The join is performed on the primary key of the orders table. So the access in the orders table is a primary key lookup for each row that comes from the range scan on the lineitem table.
    In the MySQL implementation of this join one will fetch one row from the lineitem table and for each such row perform a primary key lookup in the orders table. This means that we can only handle one primary key lookup at a time unless we do something in the NDB storage engine. The execution of this query without pushdown join would make it possible to run the scans towards the lineitem table in parallel. The primary key lookup on the orders table would however execute serially, fetching only one row at a time. This would increase the query time in this case by a factor of around 5x. So by pushing the join down into the NDB data nodes we can make sure that the primary key lookups on the orders table are parallelised as well.
    To handle this the MySQL Server has the ability to push an entire join execution down to the storage engine. We will describe this interface in more detail in a later blog part.
    To handle this query in NDB we have implemented a special scan protocol that enables performing complex join operations. The scan will be presented with a parameter part for each table in the join operation that describes the dependencies between the tables and the conditions to be pushed to each table. This is implemented in the TC threads in the NDB data node. The TC threads in this case act as parallel JOIN operators. The join is parallelised on the first table in the join, in this case the lineitem table. For each node in the cluster a JOIN operator will be created that takes care of scanning all partitions that have their primary partition in the node.
    This means that the scan of the first table and the join operator are always located on the same node. The primary key lookup is sent to the node where the data resides; in a cluster with 2 replicas and 2 nodes where the table uses READ BACKUP, we will always find the row locally. With larger clusters the likelihood that this lookup is sent over the network increases.
    Compared to a single-threaded storage engine this query scales almost 30x using 2 nodes with 8 LDM threads each. NDB’s implementation is, as mentioned in the previous blog, very efficient, so the speedup benefits from this.
    This query is more efficiently implemented in MySQL Cluster 8.0.18 since we implemented support for comparing two columns, both from the same table and from different tables, provided they have the same data type. This improved performance of this query by 2x. Previous to this the NDB interpreter could only handle comparisons of the type col_name COMPARATOR constant, e.g. l_receiptdate >= '1994-01-01'.
    Query Execution
    In the figure below we show the execution flow for this query in NDB. As described above we have a module called DBSPJ in the TC threads that handles the JOIN processing. We have shown in the figure below the flow for the scan of the lineitem table in blue arrows. The primary key lookups have been shown with red arrows. In the figure below we have assumed that we’re not using READ BACKUP. We will describe the impact of READ BACKUP in more detail in a later part of this blog series.
    Query Analysis
    The query will read the lineitem table in parallel using a range scan. This scan will evaluate 909,844 rows when using scale factor 1 in TPC-H. Of those rows there will be 30,988 rows that evaluate to true. Each of those 30,988 rows will be sent to the NDB API but will also be reported to the DBSPJ module to issue parallel key lookups towards the orders table.
    As a matter of fact this query will actually execute faster than Q6, although it does more work compared to the previous query we analysed (Q6 in TPC-H). Most of the work is done in the lineitem table; both Q6 and Q12 do almost the same amount of work in the range scan on lineitem. However, since there are fewer records to report back to the MySQL Server, parallelism is improved due to the batch handling in NDB.
    Scalability impact
    This query will scale very well with more partitions of the lineitem table and the orders table. As the cluster grows, some scalability impact will come from a higher cost of the primary key lookups that have to be sent over the network to other nodes.
    Next Part
    In part 3 we will discuss how the MySQL Server and the NDB storage engine work together to define the query parts pushed down to NDB.
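    As a practical footnote to the join pushdown described above: you can confirm from the SQL layer whether a join really was pushed down to the data nodes. This is a sketch; ndb_join_pushdown is an optimizer_switch flag in MySQL Cluster 7.2 and later, and the exact wording EXPLAIN uses for pushed tables varies between versions:

      -- Confirm that NDB join pushdown is enabled (it is on by default in recent versions).
      SELECT @@optimizer_switch LIKE '%ndb_join_pushdown=on%' AS ndb_join_pushdown_enabled;

      -- For a pushed join, EXPLAIN marks the joined tables as part of a pushed join in the Extra column.
      EXPLAIN SELECT COUNT(*)
      FROM lineitem
      JOIN orders ON o_orderkey = l_orderkey
      WHERE l_shipmode IN ('MAIL', 'SHIP');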

  • NDB Parallel Query, part 1
    I will describe how NDB handles complex SQL queries in a number of blogs. NDB has the ability to parallelise parts of join processing. Ensuring that your queries make the best possible use of these parallelisation features enables applications to boost their performance significantly. It will also be a good base to explain any improvements we add to the query processing in NDB Cluster.
    NDB was designed from the beginning for extremely efficient key lookups and for extreme availability (less than 30 seconds of downtime per year, including time for software changes, meta data changes and crashes). Originally the model was single-threaded and optimised for 1-2 CPUs. The execution model uses an architecture where messages are sent between modules. This made it very straightforward to extend the architecture to support multi-threaded execution when CPUs with many cores became prevalent. The first multi-threaded version of NDB was version 7.0. This supported up to 7 threads working in parallel plus a large number of threads handling interaction with the file system.
    With the introduction of 7.0 the scans of a table, either using a range scan on an index or scanning the entire table, were automatically parallelised. So NDB has supported a limited form of parallel query since the release of 7.0 (around 2011 I think).
    Now let’s use an example query, Q6 from DBT3 that mimics TPC-H.

      SELECT
          SUM(l_extendedprice * l_discount) AS revenue
      FROM
          lineitem
      WHERE
          l_shipdate >= '1994-01-01'
          AND l_shipdate < DATE_ADD('1994-01-01', INTERVAL '1' year)
          AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
          AND l_quantity < 24;

    The execution of this will use a range scan on the index on l_shipdate. This range is a perfectly normal range scan in NDB. Since range scans are parallelised, this query will execute using 1 CPU for each partition of the table. Assuming that we set up a cluster with the default setup and with 8 LDM threads, the table will be partitioned into 16 partitions. Each of those partitions will have a different CPU for the primary partition. This means that the range scans will execute on 16 CPUs in parallel.
    LDM (Local Data Manager) is the name of the threads in the NDB data nodes that manage the actual data in NDB. An LDM thread contains a distributed hash index for the primary keys and unique keys, an ordered index implemented as a T-tree, a query handler that controls execution of lookups and scans and checkpointing and also handles the REDO log. Finally the LDM thread contains the row storage, which has 4 parts: fixed-size parts of the row in memory, variable-sized parts of the row in memory, dynamic parts of the row (the absence of a column here means that it is NULL, so this provides the ability to ADD a column as an online operation) and finally a fixed-size part that is stored on disk using a page cache. The row storage also contains an interpreter that can evaluate conditions and perform simple operations like add, to support efficient auto increment.
    Now, the first implementation of the NDB storage engine was implemented such that all condition evaluation was done in the MySQL Server. This meant that although we could scan the table in parallel, we still had a single thread to evaluate the conditions. This meant that to handle this query efficiently a condition pushdown is required. Condition pushdown was added to the MySQL storage engine API a fairly long time ago as part of the NDB development and can also benefit any other storage engine that can handle condition evaluation.
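    A quick way to verify that the filter really is pushed to the data nodes is to look at the plan. This is a sketch; engine_condition_pushdown has been an optimizer_switch flag (enabled by default) for a long time, and for a pushed filter NDB reports "Using pushed condition" in the Extra column of EXPLAIN:

      -- Condition pushdown is controlled by an optimizer_switch flag.
      SELECT @@optimizer_switch LIKE '%engine_condition_pushdown=on%' AS condition_pushdown_enabled;

      -- Expect "Using pushed condition" in Extra when the filter is evaluated in the data nodes.
      EXPLAIN SELECT SUM(l_extendedprice * l_discount) AS revenue
      FROM lineitem
      WHERE l_shipdate >= '1994-01-01'
        AND l_shipdate < DATE_ADD('1994-01-01', INTERVAL '1' year)
        AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
        AND l_quantity < 24;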
    So the above contains 3 parts that can be parallelised individually: scanning the data, evaluating the condition and finally performing the sum on the rows that match the condition. NDB currently parallelises the scan part and the condition evaluation part. The sum is handled by the MySQL Server. In this case the filtering factor is high, so the sum part is not a bottleneck in this query. The bottleneck in this query is scanning the data and evaluating the condition.
    In the terminology of relational algebra this means that NDB supports a parallelised SELECT operator for some filters. NDB also supports a parallel PROJECT operator. NDB doesn’t yet support a parallel AGGREGATE function.
    The bottleneck in this query is how fast one can scan the data and evaluate the condition. In version 7.6 we made a substantial optimisation of this part where we managed to improve a simple query by 240% through low-level optimisations of the code. With this optimisation NDB can handle more than 2 million rows per second per CPU with a very simple condition to evaluate. This query greatly benefits from this greater efficiency. Executing this query with scale factor 10 (60M rows in the lineitem table) takes about 1.5 seconds with the configuration above, where 16 CPUs concurrently perform the scan and condition evaluation. A single-threaded storage engine is around 20x slower. With more CPUs available in the LDM threads the parallelisation will be even higher.
    Obviously there are other DBMSs focused on analytical queries that can handle this query even faster; NDB is focused on online applications with high write scalability and the highest availability. But we are working to also make query execution of complex SQL much faster, so that online applications can analyze data in real-time.
    Query Execution
    In the figure below we describe the execution flow for this query. As usual the query starts with parsing (unless it is a prepared statement) and after that the query is optimised. This query is executed as a single range scan against the lineitem table. Scans are controlled by a TC thread that ensures that all the fragments of the table are scanned. It is possible to control the parallelism of the query through the NDB API. In most cases the parallelism will be full parallelism. Each thread has a real-time scheduler and the scan in the LDM threads will be split up into multiple executions that will be interleaved with execution of other queries running in parallel. This means that in an idle system this query will be able to execute at full speed. However, even if there are lots of other queries going on in parallel, the query will execute almost as fast as long as the CPUs are not overloaded. In the figure below we also show that control of the scan goes through the TC thread, but the result row is sent directly from the LDM thread to the NDB API. In the MySQL Server the NDB storage engine gets the row from the NDB API and returns it to the MySQL Server for the sum function and result processing.
    Query Analysis
    The query reads the lineitem table, which has about 6M rows at scale factor 1. It reads them using an index on l_shipdate. The range consists of 909,455 rows to analyse and of those 114,160 rows are produced to calculate the result of the sum. In the above configuration it takes about 0.15 seconds for NDB to execute the query. There are some limitations to getting full use of all CPUs involved even in this query, related to batch handling. I will describe this in a later blog.
    Scalability impact
    This query is only positively impacted by any type of scaling. The more fragments the lineitem table is partitioned into, the more parallelism the query will use. So the only limitation to scaling is when the sum part starts to become the bottleneck.
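    Related to the point above about fragments: a rough way to see from the SQL layer how many partitions a table has is to count them in information_schema. This is a sketch, and it assumes ndbcluster exposes its implicit partitions there; the ndb_desc utility that ships with NDB is the authoritative way to inspect fragment distribution:

      -- Assumption: ndbcluster lists one row per partition here; use ndb_desc to verify.
      SELECT COUNT(*) AS partition_count
      FROM information_schema.PARTITIONS
      WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'lineitem';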
    Next part
    In the next part we will discuss how NDB can parallelise a very simple 2-way join from the DBT3 benchmark. This is Q12 from TPC-H, which looks like this:

      SELECT
          l_shipmode,
          SUM(CASE
                  WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH' THEN 1
                  ELSE 0
              END) AS high_line_count,
          SUM(CASE
                  WHEN o_orderpriority <> '1-URGENT' AND o_orderpriority <> '2-HIGH' THEN 1
                  ELSE 0
              END) AS low_line_count
      FROM
          orders,
          lineitem
      WHERE
          o_orderkey = l_orderkey
          AND l_shipmode IN ('MAIL', 'SHIP')
          AND l_commitdate < l_receiptdate
          AND l_shipdate < l_commitdate
          AND l_receiptdate >= '1994-01-01'
          AND l_receiptdate < DATE_ADD('1994-01-01', INTERVAL '1' year)
      GROUP BY
          l_shipmode
      ORDER BY
          l_shipmode;

    This query introduces 3 additional relational algebra operators: a JOIN operator, a GROUP BY operator and a SORT operator.

  • How to Save on AWS RDS MySQL Costs by Instance Right-sizing
    Right-sizing database instances is the process of adjusting your database instances’ size to match the workload generated by the application. In most cases, the incentive to right-size your database instances will be to lower the cloud infrastructure’s costs, ideally without compromising on performance. In this post we’ll guide you through how to analyze the instance’s used resources, in an effort to identify opportunities to save on your RDS costs.
    How to identify memory related down-sizing opportunities
    MySQL has a lot of “moving parts” which may contribute to the amount of memory it needs to operate efficiently. Just to list a few of the most impactful factors: fixed-size buffers (query cache, InnoDB buffer pool size), the database’s workload (query connections, query buffers), replication internals (replication connections, binary log caches) and more. When looking for down-sizing opportunities, we’re searching for instances with too much memory, which isn’t really used by the database for the current workload. So how can we tell whether the database really needs the amount of memory that was allocated to it? Looking at the memory usage of MySQL’s process at the OS level isn’t a good indicator, as large portions of the memory (such as the InnoDB buffer pool) are pre-allocated by the process, but not necessarily used. A better indicator can be found by analyzing the usage patterns of what is usually the largest memory consumer - the InnoDB buffer pool.
    The InnoDB buffer pool is a fixed-size memory area where the database’s engine caches the table data and the index data. Ideally, the InnoDB buffer pool should have enough space to hold the data and indexes which are commonly accessed by the database, so that all queries will be served from memory, with a limited amount of disk access required. So if your database needs to access 5GB of data and indexes on a regular basis, it doesn’t make a lot of sense to preallocate 100GB to the InnoDB buffer pool.
    To analyze the usage patterns of the InnoDB buffer pool, you can use MySQL’s built-in command `show engine innodb status`. In the command’s output, you should look at:
    Free buffers - The total size (in pages) of the buffer pool free list. What this actually means is that if your system is fully warmed up and this number is rather high, it probably means you’re allocating too much memory for your InnoDB buffer pool, as some of it isn’t really utilized. Tracking this indicator over time, in both regular hours and during workload peaks, will show you the full picture. A high value of free buffers over time may be an indicator of a memory down-sizing opportunity.
    Pages evicted without access - The per-second average of the pages evicted from the buffer pool without being accessed. If this number is larger than 0, it means that there are times where data is loaded into the buffer pool and pulled out from it without ever actually being used. Having a number larger than zero for this indicator over time may mean that you didn’t allocate enough memory to the InnoDB buffer pool, which may in turn indicate that you shouldn’t down-size this instance, as it may even need more memory to perform optimally.
    This is an example of the output shown by the innodb status command:

      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 6596591616
      Dictionary memory allocated 415137
      Buffer pool size   393216
      Free buffers       378927
      Database pages     14285
      Old database pages 5428
      ....
      0.00 reads/s, 0.00 creates/s, 0.00 writes/s
      No buffer pool page gets since the last printout
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      ....

    This sample demonstrates an output that may indicate that the database has too much memory allocated to it.
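    The same two indicators can also be tracked with a query instead of parsing the status output; this is a sketch assuming a MySQL version (5.6 or later, as on current RDS MySQL) that exposes the buffer pool statistics in information_schema:

      -- POOL_SIZE and FREE_BUFFERS are in pages (16KB by default);
      -- READ_AHEAD_EVICTED_RATE is assumed to correspond to "evicted without access" per second.
      SELECT POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES, READ_AHEAD_EVICTED_RATE
      FROM information_schema.INNODB_BUFFER_POOL_STATS;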
    How to identify CPU related down-sizing opportunities
    Matching the right amount of vCPUs to efficiently serve your application’s workload can be a difficult task. To locate opportunities for instance down-sizing, we should look at the CPU usage over time, to see whether the amount of vCPUs matches the actual requirements of our workload. Amazon RDS Performance Insights allows you to quickly assess the existing load on your database. By looking at the CPU usage over time, you can identify which instances are not actually using their allocated resources and can be considered for down-sizing. For example, looking at the database load chart for this instance, we can see that there is no more than one vCPU used simultaneously at any given time, while the instance has 2 vCPUs allocated to it (as indicated by the dotted flat line at the top of the chart). If you’re seeing a similar pattern for your instances over time (in both regular hours and workload peaks), that’s a strong indication that you may consider down-sizing the instance by reducing the amount of vCPUs.
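    If you want to cross-check the Performance Insights picture from inside MySQL, the number of sessions actively executing at a given moment is a rough proxy for how many vCPUs the workload can keep busy. A sketch, meant to be sampled repeatedly over time:

      -- Threads_running counts sessions currently executing a statement (idle connections are excluded);
      -- consistently seeing values well below the vCPU count supports down-sizing.
      SHOW GLOBAL STATUS LIKE 'Threads_running';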
    Resizing instances based on users’ behavior patterns
    By analyzing your workload, you can identify opportunities for down-sizing instances based on your users’ behavior patterns. For example, if your users are all from a specific time zone, maybe you can consider down-sizing the instance after a certain hour, when the workload reduces significantly. In the load charts above, Performance Insights can show us that around 19:30 the workload on the instance reduces significantly. Tracking this behavior over time can help us understand our users’ behavior patterns and apply automatic resizing actions. Please keep in mind that re-sizing the instance may require downtime when working with a single instance for each database. To avoid any downtime, you can consider working with a multi-AZ RDS environment, which will allow you to apply the changes to a reader instance before actually switching it (failover) to act as your production writer instance.
    Actively driving CPU and memory usage down
    In some cases, after analyzing the CPU and memory usage on your instances, you may find that you can’t down-size the instance without actually reducing the workload on that instance. Poorly optimized SQL queries tend to require a significant amount of resources, which in turn forces you to choose a ‘larger’ instance type than you would need if those queries were tuned for optimal performance. AWS RDS Performance Insights can help you identify which SQL queries take up most of your resources, so you can focus on optimizing those queries. As the query optimization process can be complex and time consuming, you can integrate Performance Insights with EverSQL’s SQL query optimization product, which will allow you to optimize slow queries automatically, directly from the Performance Insights interface.
    Wrapping up
    Right-sizing RDS instances is a great way to drive RDS costs down. It can be done by analyzing the actual resource usage on your instances to identify down-sizing opportunities which do not require compromising on performance or availability. Also, you can take actions to actively optimize your SQL queries and your database workload and drive the CPU and memory usage down, which in turn can allow you to safely down-size your RDS instances while keeping the same service level.
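    As a starting point for the query-tuning step discussed above, the statement digest tables give a resource-ordered list of queries without any external tooling; this is a sketch assuming performance_schema is enabled (the default on recent RDS MySQL versions):

      -- Top 10 statement digests by total execution time (SUM_TIMER_WAIT is in picoseconds).
      SELECT DIGEST_TEXT,
             COUNT_STAR AS executions,
             ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_exec_seconds,
             SUM_ROWS_EXAMINED
      FROM performance_schema.events_statements_summary_by_digest
      ORDER BY SUM_TIMER_WAIT DESC
      LIMIT 10;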

  • MySQL Workbench now using Casmine for unit and integration testing
    Starting with version 8.0.18 the MySQL Workbench source package finally also ships all of our GPL unit and integration tests, which we use internally to control code quality. For that we first had to replace our old, outdated testing framework with something new and more appropriate. We evaluated quite a few C++ testing frameworks but found them either insufficient or difficult to use. Instead we had something in mind that comes close to the Jasmine framework, which is widely used among JS developers. The way it hides all the boring test management details, and the clear structure it uses, were quite an inspiration for us, and we decided to develop our own testing framework modeled after that.
    Casmine – C++17 BDD Testing Framework
    Casmine is a C++ unit and integration testing framework written in C++17 that enables testing C and C++ code with matchers, similar to the way Jasmine does for Javascript (hence the similar name). Care has been taken to make the framework very flexible but still easy to write tests with and later read them with ease. The framework runs on any platform with a C++17 compiler.
    Casmine hides most of the boilerplate code necessary to run the tests and comes with a set of elements that help to structure your test code in a way that is easily consumable by humans. Tests are organized in spec files (one test case per file) and are self-registering. After you add such a spec file to your testing project it will automatically execute on run. Two main elements structure a spec file: $describe and $it. The first covers an entire test case and consists of zero or more $it blocks, which will be executed in definition order ($describe execution is random, because of static initialization). In addition to that, you can specify blocks that are executed before and after all tests, or before/after each test – good for setup and shutdown of tests (or test groups). Each test case and test can be individually disabled or focused on (which disables all non-focused tests). Focused tests in disabled test cases are still disabled, however. Additionally, a test can be marked as pending, which causes Casmine to ignore any result from it and produces a record in the test output that allows you to identify it easily. This helps greatly to start with spec skeletons and fill them in later without missing a test, or to specify a reason why a test is not executed right now (which $xit does not allow).
    Main features
    - Written in C++17.
    - Type-safe tests with type-specific matchers.
    - Comprehensive collection of matchers and expects that should cover most of your needs.
    - Very simple, yet powerful and flexible test spec structure, using only $describe and $it.
    - Run test management code by using $beforeAll, $afterAll, $beforeEach and $afterEach.
    - Focus on specific tests or entire test cases by using $fit and $fdescribe.
    - Exclude specific tests or entire test cases by using $xit and $xdescribe.
    - Mark tests explicitly as succeeded or failed by using $success() and $fail().
    - Mark tests as pending by using $pending().
    - Built-in console reporter with colored output.
    - Built-in JSON reporter, which generates a JSON result file for easy integration with any CI system.
    - Easily extend the matchers and the expects, in case your objects don’t fit the stock classes.
    Getting the Code + Compiling It
    Currently Casmine is part of MySQL Workbench and you can find the code for it in the MySQL Workbench Github repository. It consists of only a handful of files, which are easy to compile (VC++ 2017, clang 11 or gcc 8.0).
    On macOS you can only build Casmine on macOS 10.15, because we use the C++17 std::filesystem namespace for test data and output folder preparation. There’s a cmake file for Casmine in the folder, which you can use, or simply add all files in that folder to your project/solution to compile the testing framework. The only dependency Casmine has is on RapidJSON, which is used to parse config files. You have to have it in your header + linker search path for compilation.
    Getting Started
    In its simplest form a test spec file looks like this:

      #include "casmine.h"

      using namespace casmine;

      namespace {

      $ModuleEnvironment() {};

      $describe("Test Case Description") {
        $it("Test Description", []() {
          $expect(1).toBe(1);
        });
      });

      }

    The $ModuleEnvironment macro ensures the necessary structures are in place for the spec file, which are then used by the $describe and $it blocks. Since this implements the same class in each spec file, it is necessary to wrap the entire test spec with an anonymous namespace to avoid linker problems. On the other hand, this macro enables you to easily extend Casmine for your own project structure (more about that below). Macros are used to record the file name + line number of the executing code block. This is also important for possible (unexpected) exceptions, which otherwise would not have any source location attached.
    The $it block takes a std::function for execution, which consists of $expect calls that do the actual test steps (checks). Each $expect call creates a temporary expectation object, runs the associated matcher (here toBe) and records the result. After that the expectation object is freed. Typically a test case also consists of setup and shutdown code, which could be like this:

      $describe("Test Case Description") {
        $beforeAll([]() { myServer.start(); });
        $afterAll([]() { myServer.stop(); });
        $beforeEach([]() { myServer.reset(); });
        $afterEach([]() {
          // Remove temporary data.
        });

        $it("Test Description", []() {
          $expect(1).toBe(1);
        });
      });

    To run the test suite, call the Casmine context, which manages everything, in your main.cpp file:

      #include "casmine.h"

      using namespace casmine;

      int main(int argc, const char *argv[]) {
        auto context = CasmineContext::get();
      #ifdef _MSC_VER
        SetConsoleOutputCP(CP_UTF8);
      #endif
        context->runTests(/* specify the base path here */);
        return 0;
      }

    The runTests path parameter is used to locate data and output directories. Additionally, all file names reported in the results are made relative to this path. Use CasmineContext::basePath() to get this path in your spec files. Casmine expects a data folder in the base directory (<base>/data) and will copy its entire content to a temporary data dir before running any test. The member Casmine::tmpDataDir() can be used to get the name of that folder when you want to access your data. Because this is a copy of the original data, it is allowed to be modified without affecting the original data. Casmine creates an output folder during the test execution, which you can use to store any output your tests might produce. The actual path to that can be read from Casmine::outputDir(). The temporary data dir as well as the output dir are removed when Casmine finishes (and also on start up, when they were left over from a previous run, which would indicate a crash of the application).
    Overview
    A test step (which is part of an $it block) has a pretty simple form:

      $expect(<actual value>).[Not.]toXXX(<expected value>, <optional message>);

    $expect is a macro that records the line number and file name where it is called from and then calls one of the factory functions (depending on the type of <actual value>), which instantiates the associated Expect class, executes the check, records the result and frees the Expect instance. The toXXX call usually performs a relational operation, but can in special cases also do more complex processing (like matching a regular expression or comparing to file content). The member Not inverts the expectation. Results are not stored in Casmine, but only sent to the registered reporters. These can then just count success + failure or do something else with the results. All calls to record results are synchronized (and hence thread-safe), to avoid logging results to the wrong test spec or test. Other access is not guarded because it is either read-only or uses temporary objects.
    Casmine catches every exception that might occur in the test code, be it wanted or unexpected. Exceptions are logged with the line info and file name of the last successful call to either $describe, $it or $expect, whichever was called last. Additionally, you can do checks for exceptions by placing the code that throws them in a std::function and using that in an $expect call. By default test execution is stopped if an unexpected exception occurs; however, this can be changed with the setting continueOnException (see also the Configuration section).
    It is not necessary that $expect calls are written in the same file where their surrounding $describe or $it block is located. Casmine is a stateful implementation that records test results from anywhere to the currently executing test. This enables sharing of test code between tests, by moving it out to a separate file and then calling that in different places.
    Important: while you can write normal code in a $describe block, outside of any $it, $beforeAll etc. call, you must not access any of it in these calls (via capture by reference), because the outer $describe block is executed during registration and variables in it are no longer available when the inner blocks are later executed. Code like the following will crash:

      $describe("Test Case Description") {
        size_t count = 42;

        $it("Test Description", [&]() {
          // Crash here because this closure is executed
          // outside of the outer block.
          $expect(count).toBe(42);
        });
      });

    See the “Test Data” section below for details on how you can hold test values etc. for use in tests.
    Note: it is not necessary for either the actual or expected value to be copy-assignable. Internally, all provided values are held as const references, making it possible to also check non-copyable objects like unique_ptr. However, for strings, sometimes a copy is necessary to allow a comparison of different string types (string against char* and char[] etc.). It’s clear that both actual and expected values must stay valid during a call to $expect.
    Casmine comes with a number of predefined Expect classes, which are automatically selected via type deduction and provide a number of matcher functions specific to the deduced type.
The framework contains Expect classes for: Scalars (integral types, pointers (except string pointers) and object references) Strings (basic_string, as well as char, wchar_t, char16_t and char32_t pointers and arrays) Classes (all classes/objects, except standard containers, exceptions and strings as they have specialized matchers) Exceptions (exceptions thrown in a std::function) Containers (array, vector, deque, forward_list, list, set, multiset and their unordered variants) Associative Containers (map, multi_map and their unordered variants) Smart Pointers (unique_ptr, shared_ptr, weak_ptr) Each of the Expect classes is derived from one or more of the following predefined matchers, which provide the type-dependent match functions: MatcherScalar (used by the scalar and pointer Expects) toBe (value equality) toEqual (value equality or object identity for classes that override the == operator) toBeLessThan (< comparison, also for classes overriding that operator) toBeLessThanOrEqual (<= comparison) toBeGreaterThan (> comparison) toBeGreaterThanOrEqual (>= comparison) toBeTrue (actual value cast to boolean) toBeFalse (actual value cast to boolean) toBeCloseTo (numeric value with a maximum distance to a given value) toBeOddNumber (for integral types, see std::is_integral) toBeEvenNumber (for integral types) toBeWholeNumber (for arithmetic types, see std::is_arithmetic) toBeWithinRange (numeric value within a range) toBeInf (infinitive number) toBeNan (not a number) MatcherString (used by the string Expect) toBe (alias to toEqual) toEqual (string equality using std::basic_string::compare) toBeLessThan (< string comparison using the string compare function) toBeLessThanOrEqual (<= string comparison) toBeGreaterThan (> string comparison) toBeGreaterThanOrEqual (>= string comparison) toContain (find text in the actual value) toStartWith (test that the actual value begins with a string) toEndWith (test that the actual value ends with a string) toContainOnlyWhitespaces (only tab, vertical tab, space, CR/LF and form feed) toContainNoWhitespaces (all but the aforementioned whitespaces) toBeSameLengthAs (actual value length == expected value length) toBeLongerThan (actual value length > expected value length) toBeShorterThan (actual value length < expected value length) toMatch (actual value matches regular expression, given either as string or regex) toEqualContentOfFile (loads a text file and compares it line-by-line to the test value) MatcherTypeSupport (used by class and pointer Expects) toBeInstanceOf (object is an instance of a specific class) toBeSameType (class type derivation check) MatcherException (used by exception Expect) toThrow (function throws any exception) toThrowError (function throws a specific exception) MatcherContainer (used by container Expect) toContain (actual value contains a specific element) toContainValues (actual value contains a list of values with no implied order) toHaveSize (actual value size) toEqual (actual value content comparison via == operator) MatcherAssociativeContainer (used by associative container Expect) toContainKey (key search) toContainValue (value lookup for a given key) toHaveSize (container size) toEqual (content comparison via == operator) MatcherNull (used by pointer and smart pointer Expects) toBeNull (equals nullptr) toBeValid (non-null value for smart pointers or nullptr equality) MatcherClass (used by class Expect) toEqual (object identity via == operator) toBeLessThan (< comparison) toBeLessThanOrEqual (<= comparison) toBeGreaterThan (> 
    comparison) toBeGreaterThanOrEqual (>= comparison)
    Matcher functions generate nice human-readable output in case of test failures, including source location information. The failure message can be overridden by specifying a custom message in the matcher function, for example:

      $expect(1).toBe(1, "Surprising result");

    To invert a test check, use the Not member of the Expect class:

      $expect(1).Not.toBe(1, "Less surprising");

    Test Data
    Most of the time, when you run tests, you probably need some data to test on or to share between tests. Casmine supports that with a separate macro, named $TestData. This is actually a simple struct that can hold whatever you need. A typical use case looks like this:

      #include "casmine.h"

      using namespace casmine;

      namespace {

      $ModuleEnvironment() {};

      $TestData {
        std::string testValue = "ABC";
      };

      $describe("Test Case Description") {
        $it("Test Description", [this]() {
          $expect(data->testValue).toBe("ABC");
        });

        $it("Test Description 2", [this]() {
          $expect(data->testValue).Not.toBe("XYZ");
        });
      });

      }

    As you can see, this struct is made available as the member data in the object behind $describe. As such, you can access all values via data->member, provided you capture the this pointer in the $it call. The data member is not available and produces a compiler error if there’s no definition of the $TestData struct. The $TestData macro can be placed anywhere in the file provided it appears before the $describe call (where a member of this type is declared, as explained above).
    Configuration
    Casmine provides two types of configuration: settings to control the execution of Casmine itself, and configuration values that are used by the tests.
    Settings
    The following settings are used by Casmine:
    - continueOnException (bool, default: false) see below
    - verbose (bool, default: false) free to use in your tests, for example for debug messages
    - no-colors (bool, default: false) do not use terminal colors in the console reporter
    - only-selected (bool, default: false) see below
    Even though all unexpected exceptions are caught during a test run (to ensure proper shutdown), Casmine will still stop execution if one occurred, unless the value continueOnException is set to true. This does not apply to exception checks in your test code, of course, which are expected exceptions. It’s also guaranteed that $afterAll and $afterEach are called if an unexpected exception came up during the run of a test spec.
    Normally test specs run in random order and according to their type (normal, focused, disabled). If you want to change the type, you have to recompile your tests. Sometimes it might be necessary to run specs in a certain order (for example while working on a specific spec) or when switching between specs frequently. In this case you can enable the only-selected setting and call CasmineContext::forceSpecList() with a list of spec file names (no path, no extension). This could be implemented as an application parameter, taking the user input and forwarding it to this function. For a list of registered specs call CasmineContext::getSpecList().
    Casmine settings are available in the CasmineContext::settings member, which contains key/value pairs of the form:

      typedef std::variant<std::string, int, double, bool> ConfigMapVariableTypes;

      std::map<std::string, ConfigMapVariableTypes> settings;

    This member is flexible enough to introduce further settings in the future.
    JSON Configuration
    For test configuration you sometimes need sensitive data (like passwords).
    Passing them on the command line or in the test code is not recommended and unsafe. Instead Casmine comes with a JSON document member (provided by RapidJSON) that can be loaded from an external JSON file. A typical approach is to allow the user to specify the path to such a config file as an application parameter and then load it directly into Casmine, like this:

      auto context = CasmineContext::get();

      std::ifstream configStream;
      configFile = expandPath(configFile);
      if (!configFile.empty())
        configStream.open(configFile);

      if (configStream.good()) {
        rapidjson::IStreamWrapper streamWrapper(configStream);
        rapidjson::ParseResult parseResult = context->configuration.ParseStream(streamWrapper);
        if (parseResult.IsError()) {
          const RAPIDJSON_ERROR_CHARTYPE *message = GetParseError_En(parseResult.Code());
          std::cerr << "Error while parsing the configuration file: " << message << std::endl;
          return 1;
        }

        if (!context->configuration.IsObject()) {
          std::cerr << "The configuration file does not have the correct format" << std::endl;
          return 1;
        }
      } else {
        std::cerr << "Error while opening the configuration file " << configFile << ": " << strerror(errno) << std::endl;
        return 1;
      }

    This is how we use RapidJSON in our unit tests, as you can see in the main.cpp file for wb-tests. Casmine supports easy access to this config data via CasmineContext::configuration. Additionally, you can use helper functions to get values from a specific path through the JSON file (nothing fancy like XPath, just a simple recursive name lookup, which limits this to named elements, i.e. no array elements). The names of these helpers are pretty self-explanatory:

      std::string getConfigurationStringValue(std::string const& path, std::string const& defaultValue = "") const;
      int getConfigurationIntValue(std::string const& path, int defaultValue = 0) const;
      double getConfigurationDoubleValue(std::string const& path, double defaultValue = 0.0) const;
      bool getConfigurationBoolValue(std::string const& path, bool defaultValue = false) const;

    Customizations
    Casmine enables you to customize several aspects in a consistent way, without compromising the overall readability or handling.
    Reporters
    There are 2 predefined reporters currently:
    - Terminal/Console with color support (where possible + meaningful, which means no ugly codes in log files)
    - JSON result file (for easy consumption by CI and other automated systems)
    Color support is automatically determined (but can also be switched off by assigning true to the settings value no-colors or setting the environment variable CASMINE_NO_COLORS=1). It’s easy to add your own reporters (for example to log results in a database) by deriving from the abstract Reporter class and implementing the necessary functions that do the specific handling. The new reporter must be registered via CasmineContext::addReporter. Registered reporters can be removed using CasmineContext::clearReporters.
    Expects and Matchers
    Casmine already covers a large part of possible test types, but sometimes it is necessary to expand on that for specific handling, for example to provide your own matcher functions for either already covered cases or your own custom types. The main anchor point for extensions is the already mentioned $ModuleEnvironment macro. It consists of two parts: local extensions and library extensions. Local extensions are valid only in the file where you specify them, while library extensions are reusable by all test specs.
    The technique behind this extension mechanism works with variadic base types, which is a variation of parameter packs (see the paragraph titled Base specifiers and member initializer lists). In short: the factory struct for the Expect class creation functions is composed of three parts:
    - A struct with all default factory functions (named DefaultExpects).
    - A struct in the local file (the { } block of the $ModuleEnvironment macro).
    - A list of custom structs which can be implemented outside the current spec file. (The names of these structs must be specified in the parentheses of the $ModuleEnvironment macro.)
    A full environment call can be similar to this:

      #include "casmine.h"

      using namespace casmine;

      namespace {

      $ModuleEnvironment(MyExpects1, MyExpects2) {
        static auto makeExpect(const char *file, size_t line, MyType const& value) {
          return ExpectTemplate<MyExpect1<MyType>>(file, line, value);
        }
      };

      ...

      }

    The two library extensions (MyExpects1 and MyExpects2), as well as the code in the environment block, are all combined together with the default expects into a single local environment struct and take part in type deduction for your custom types automatically. The implementation of a custom Expect class now offers you a way to add standard matchers or your own matcher functions by defining it (for example) as:

      template<typename T>
      class MyExpect1 : public Expect<T>, public MatcherClass<T>, public MySuperMatcher<T> {
      public:
        MyExpect1(const char *file, size_t line, T const& value, bool inverse) :
          MatcherBase<T>(file, line, value, inverse),
          Expect<T>(),
          MatcherClass<T>(file, line, value, inverse),
          MySuperMatcher<T>(file, line, value, inverse) {}
      };

    A custom Expect class always has to derive from 2 classes: Expect, which is the base for all Expect classes, and MatcherBase, which is the base for all Matcher classes. Due to the virtual inheritance used for the matchers, it is necessary to explicitly call the MatcherBase constructor, even if the sub Expect class does not directly derive from it. In addition to these 2 base classes you can derive from any number of your own (or built-in) matcher classes. Each of them can provide a specific set of possible matcher functions. Take a closer look at the built-in matchers to learn how to write such a class. In summary: the actual testing functionality is provided by the matcher classes, while the Expect classes are used to select those matchers that provide the correct functionality for a given C/C++ type and provide the environment to easily execute normal and inverted expectations.
    To supplement the previous description of Casmine, see the following concrete example for a custom class:

      // The class to test (implementation omitted for brevity).
      class FtpClass {
      public:
        void connect();
        void disconnect();
        bool isConnected();
        std::string toString();
      };

      // The matcher providing specific check functions for this custom class.
      template<typename T = FtpClass>
      class MatcherFtp : MatcherBase<T> {
      public:
        MatcherFtp(const char *file, size_t line, T const& actual, bool inverse) :
          MatcherBase<T>(file, line, actual, inverse) {}

        using MatcherBase<T>::actualValue;
        using MatcherBase<T>::invertComparison;
        using MatcherBase<T>::processResult;

        void toBeConnected() {
          bool result = actualValue.isConnected();
          if (invertComparison)
            result = !result;
          processResult(result, invertComparison ? "The connection should be closed" : "The connection should be open");
        }
      };

      // The wrapping Expect class for the FTP matcher.
      template <typename T>
      class ExpectFtp : public Expect<T>, public MatcherFtp<T> {
      public:
        ExpectFtp(const char *file, size_t line, T const& value, bool inverse) :
          MatcherBase<T>(file, line, value, inverse),
          Expect<T>(),
          MatcherFtp<T>(file, line, value, inverse) {}
      };

      $ModuleEnvironment() {
        // This factory method takes part in the type deduction process to find an Expect class
        // for a specific custom type.
        static auto makeExpect(const char *file, size_t line, const FtpClass &value) {
          return ExpectTemplate<ExpectFtp>(file, line, value);
        }
      };

      FtpClass ftp;

      $describe("Test FTP") {
        $beforeAll([]() { ftp.connect(); });

        $it("Test Description", []() {
          $expect(ftp).toBeConnected();
        });
      });

    Library Functions
    In addition to the main functionality for tests, there are some helper functions/structs to ease certain tasks. They are all used in Casmine, but are also available for general use in your tests or custom matchers.
    ANSI Styles
    If you want to print custom messages in your tests or reporters that are colored and/or have a certain style (bold, italic etc.), then you can use the output stream operators in the file ansi-styles.h. Here’s an example:

      std::cout << styleGreenBackgroundOn << styleItalicOn;
      std::cout << "Some italic text on a green background";
      std::cout << stylesReset << std::endl;

    These operators will automatically take care to print nothing if the current output pipe is not a terminal (for example when redirected to a file).
    Type Deduction
    Correct type deduction is a key aspect in Casmine to provide the right set of matching functions for a specific group of types. In the file common.h there are helper templates for more concise factory functions, for example EnableIf, EnableIfNot, IsContainer, IsSmartPointer and others. Here’s how the scalar Expect factory method uses some of them:

      template<
        typename T,
        typename = EnableIf<std::is_scalar<T>>,
        typename = EnableIfNot<std::is_pointer<T>>,
        typename = EnableIfNot<IsString<T>>
      >
      static auto makeExpect(const char *file, size_t line, T value, int *dummy = nullptr) {
        return ExpectTemplate<ExpectScalar<T>>(file, line, value);
      }

    It is possible to freely mix the helper templates and the standard ones (as shown here for the pointer switch).
    Others
    The header file helpers.h contains a number of other helper functions:
    - splitBySet: splits a string into components using the passed-in separators.
    - utf32ToUtf8: converts std::u32string to std::string.
    - utf16ToUtf8: converts std::u16string to std::string.
    - utf8ToUtf16: converts std::string to std::u16string.
    - randomString: generates a random string from ASCII characters + digits with a maximum length.
    - getEnvVar: safe variant to get an environment variable; returns the given default value if the variable doesn’t exist.
    - expandPath: expands all environment variables and special chars (like the tilde on Linux/macOS) in the given string. Converts only those values that are valid on the current platform.
    - relativePath: returns the second path relative to the given base path, provided both have a common ancestor.
    - length: determines the length of a string literal at compile time (for instance for static assertions).
    - typeToString(): platform-agnostic C++ type name demangling.
    - toString(): recursive value-to-string conversion beyond the abilities of std::to_string. Output is always UTF-8 encoded, also for UTF-16 and UTF-32 input.
    - containerToString: converts any container object (set, map etc.) into a string.