Calling all experts: a system design discussion

Original post: mitbbs

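I'm about to start a project and I don't have a clear direction, or rather I have too many possible directions. I'd really appreciate corrections and advice from the experts here.
The current development environment is C/C++ with Oracle. The details are below.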

There are two services, service_a and service_b.

service_a mainly runs a calculation for each entity it receives. If the
calculated result differs from what is persisted in the Oracle DB, it updates
the corresponding rows for that entity. Entities arrive continuously from
some queuing services.
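
To make the write path concrete, here is a minimal Python sketch of the compare-then-update step (Python only to keep it short; the real service is C/C++). Everything in it is an assumption for illustration: the `entity` table with its `result` and `updated_at` columns, the `calculate` placeholder, and an in-memory SQLite database standing in for Oracle.

```python
import sqlite3


def open_store():
    """Create an in-memory SQLite table as a stand-in for the Oracle table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity ("
        " id TEXT PRIMARY KEY,"
        " result TEXT NOT NULL,"
        " updated_at INTEGER NOT NULL)"
    )
    return conn


def calculate(payload):
    """Hypothetical placeholder for the real calculation done by service_a."""
    return str(sum(payload))


def process_entity(conn, entity_id, payload, now):
    """Recompute one entity and write it only if the result changed.

    Returns True when a row was inserted or updated, False otherwise.
    """
    new_result = calculate(payload)
    row = conn.execute(
        "SELECT result FROM entity WHERE id = ?", (entity_id,)
    ).fetchone()
    if row is not None and row[0] == new_result:
        return False  # unchanged: skip the write and leave updated_at alone
    conn.execute(
        "INSERT OR REPLACE INTO entity (id, result, updated_at) VALUES (?, ?, ?)",
        (entity_id, new_result, now),
    )
    conn.commit()
    return True
```

Skipping the write when nothing changed keeps `updated_at` meaningful, which is what a reader of recent changes would rely on. The read and the write here are separate statements, so if several workers could process the same entity at once, the real service would likely need a transaction or a conditional update instead.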

service_b mainly reads from the Oracle DB periodically, let us say every 4
hours. It reads all entities that service_a updated in the past 4 hours,
then writes them out in a specific text file format to Amazon S3 for
storage.
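
The periodic read can be sketched roughly as follows, again in Python for brevity. The `entity` table, the `updated_at` column, and the tab-separated output are all made up for illustration (the real text format is not described here), SQLite stands in for Oracle, and the upload to S3 is left out.

```python
import sqlite3


def read_delta(conn, since, until):
    """Return rows changed in the half-open window (since, until], ordered by id."""
    return conn.execute(
        "SELECT id, result, updated_at FROM entity "
        "WHERE updated_at > ? AND updated_at <= ? ORDER BY id",
        (since, until),
    ).fetchall()


def format_delta(rows):
    """Render rows as tab-separated lines (a made-up format for illustration)."""
    return "".join(f"{eid}\t{result}\t{ts}\n" for eid, result, ts in rows)


def demo():
    """Build a tiny table and export the rows updated in the window (100, 200]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity (id TEXT PRIMARY KEY, result TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?, ?)",
        [("a", "1", 50), ("b", "2", 150), ("c", "3", 200)],
    )
    return format_delta(read_delta(conn, since=100, until=200))
```

Using a half-open window, with each run starting from the previous run's `until`, means consecutive runs neither skip nor repeat a row, as long as rows are not committed later with an older timestamp.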

A little more background: in the past (before 2012), service_b read all
entities, not just the ones updated by service_a in the past 4 hours, to
create a complete snapshot of all entities for various clients. It then
turned out that Oracle could not handle that many concurrent writes and
reads, or its performance degraded too much. So right before I joined this
small company, someone changed the logic so that service_b only reads the
delta (updated entities) from the past 4 hours. They also created another
process that merges the delta into the base to produce the final snapshot
somewhere else.
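
I don't know exactly how that merge process is implemented, but the idea can be sketched like this, assuming both the base and the delta can be keyed by entity id and carry an update timestamp (both assumptions, as is the `merge_delta` name):

```python
def merge_delta(base, delta):
    """Return a new snapshot: the base overlaid with the delta rows.

    Both arguments map entity id -> (result, updated_at). A delta row
    replaces the base row unless the base row is strictly newer.
    """
    snapshot = dict(base)  # copy, so the old baseline stays untouched
    for eid, (result, ts) in delta.items():
        current = snapshot.get(eid)
        if current is None or ts >= current[1]:
            snapshot[eid] = (result, ts)
    return snapshot
```

For example, merging a delta `{"b": ("9", 150), "c": ("3", 200)}` into a base `{"a": ("1", 50), "b": ("2", 60)}` keeps `a`, replaces `b`, and adds `c`. Deleted entities are not handled in this sketch; a real merge would need some marker for them.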

Now the manager wants to go even further and get rid of Oracle completely,
maybe because Oracle is too costly to maintain. We are looking at whether
some open-source or managed SQL/NoSQL DB, such as MySQL, Cassandra, or
DynamoDB, can replace Oracle such that both service_a and service_b keep
working without interruption.

Furthermore, it would be even better if, after Oracle is replaced, service_b
could read ALL entities directly from the new DB, not just the UPDATED ones,
to create a complete snapshot without going through the current steps of
creating a delta, merging it with the baseline, etc. Basically, service_b
would go back to the way it worked before 2012.

Any good suggestions for a DB candidate? If it has a good, quick way to
create a replica of the current DB, that would be very nice. The new DB
should also support strong data consistency and high availability. Even if
its consistency cannot match Oracle's, we hope it can get as close as
possible.

Currently we are considering the following candidates: Cassandra, DynamoDB,
and MySQL. Are there other candidates?