I need a scalable NoSql
solution to store data as arrays for many fields & time stamps, where the key is a combination of a field
and a timestamp
.
Data would be stored in the following scheme:
KEY –> “FIELD_NAME.YYYYMMDD.HHMMSS”
VALUE –> [v1, v2, v3, v4, v5, v6] (v1..v6 are just floats
)
For instance, suppose that:
FIELD_NAME = “TOMATO”
TIME_STAMP = “20060316.184356”
VALUES = [72.34, -22.83, -0.938, 0.265, -2047.23]
I need to be able to retrieve VALUE (the entire array) given the combination of FIELD_NAME
& TIME_STAMP
.
The query VALUES[“TOMATO.20060316.184356“] would return the vector [72.34, -22.83, -0.938, 0.265, -2047.23]. Reads of arrays should be as fast as possible.
Yet, I also need a way to store (in-place) a scalar value within an array . Suppose that I want to assign the 1st element of TOMATO
on timestamp 2006/03/16.18:43:56
to be 500.867
. In such a case, I need to have a fast mechanism to do so — something like:
VALUES[“TOMATO.20060316.184356“][0] = 500.867 (this would update on disk)
Any idea which NoSql
solution would work best for this(big plus if it has python
interface)? I am looking for a fast yet a powerful solution. my data needs would grow to about 20[TB].
Advertisement
Answer
Your data is highly structured and regular; what benefit do you see in NoSQL vs a more traditional database?
I think MySQL Cluster sounds tailor-made for your problem.
Edit:
@user540009: I agree that there are serious slowdowns on single-machine or mirrored instances of MySQL larger than half a terabyte, and no-one wants to have to deal with manual sharding; MySQL Cluster is meant to deal with this, and I have read of (though not personally played with) implementations up to 110 terabytes.