
NoSQL Solution to Store Arrays?

I need a scalable NoSQL solution to store data as arrays for many fields and timestamps, where the key is a combination of a field name and a timestamp.

Data would be stored using the following schema:

KEY -> "FIELD_NAME.YYYYMMDD.HHMMSS"

VALUE -> [v1, v2, v3, v4, v5, v6] (v1..v6 are just floats)

For instance, suppose that:

FIELD_NAME = "TOMATO"

TIME_STAMP = "20060316.184356"

VALUES = [72.34, -22.83, -0.938, 0.265, -2047.23]

I need to be able to retrieve VALUE (the entire array) given the combination of FIELD_NAME & TIME_STAMP.

The query VALUES["TOMATO.20060316.184356"] would return the vector [72.34, -22.83, -0.938, 0.265, -2047.23]. Reads of arrays should be as fast as possible.

At the same time, I need a way to update (in place) a single scalar value within a stored array. Suppose that I want to set the first element of TOMATO at timestamp 2006/03/16 18:43:56 to 500.867. In that case, I need a fast mechanism to do so, something like:

VALUES["TOMATO.20060316.184356"][0] = 500.867 (this would update on disk)
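To make the required interface concrete, here is a minimal Python sketch of the two operations I need. The ArrayStore class and its in-memory dict backing are purely illustrative placeholders, not a specific product; any backend supporting these two operations would do.

from typing import Dict, List


class ArrayStore:
    """Illustrative wrapper for the two required operations.

    Backed by an in-memory dict only to show the interface; a real
    deployment would delegate these calls to the chosen storage backend.
    """

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def put(self, field: str, timestamp: str, values: List[float]) -> None:
        # Store the whole array under the key FIELD_NAME.YYYYMMDD.HHMMSS
        self._data[f"{field}.{timestamp}"] = list(values)

    def get(self, field: str, timestamp: str) -> List[float]:
        # Fast read of the entire array for one key
        return self._data[f"{field}.{timestamp}"]

    def set_element(self, field: str, timestamp: str, index: int, value: float) -> None:
        # In-place update of a single element of a stored array
        self._data[f"{field}.{timestamp}"][index] = value


store = ArrayStore()
store.put("TOMATO", "20060316.184356", [72.34, -22.83, -0.938, 0.265, -2047.23])
print(store.get("TOMATO", "20060316.184356"))
store.set_element("TOMATO", "20060316.184356", 0, 500.867)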

Any idea which NoSQL solution would work best for this (a big plus if it has a Python interface)? I am looking for a fast yet powerful solution. My data would grow to about 20 TB.


Answer

Your data is highly structured and regular; what benefit do you see in NoSQL vs a more traditional database?

I think MySQL Cluster sounds tailor-made for your problem.
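For illustration only, here is a rough sketch of how the key/array layout could map onto a MySQL Cluster table. It assumes the NDB storage engine and the mysql-connector-python driver; the table, column, and connection names are made up. Both the whole-array read and the single-element update become primary-key operations.

import mysql.connector  # assumes the mysql-connector-python package

# Connection parameters are placeholders.
conn = mysql.connector.connect(
    host="ndb-sql-node", user="app", password="...", database="metrics"
)
cur = conn.cursor()

# One row per (field_name, ts); v1..v6 hold the array elements.
cur.execute("""
    CREATE TABLE IF NOT EXISTS field_values (
        field_name VARCHAR(64) NOT NULL,
        ts         DATETIME    NOT NULL,
        v1 DOUBLE, v2 DOUBLE, v3 DOUBLE, v4 DOUBLE, v5 DOUBLE, v6 DOUBLE,
        PRIMARY KEY (field_name, ts)
    ) ENGINE=NDBCLUSTER
""")

# Read the whole array for one key -- a primary-key lookup.
cur.execute(
    "SELECT v1, v2, v3, v4, v5, v6 FROM field_values "
    "WHERE field_name = %s AND ts = %s",
    ("TOMATO", "2006-03-16 18:43:56"),
)
row = cur.fetchone()

# Update a single element in place -- also a primary-key operation.
cur.execute(
    "UPDATE field_values SET v1 = %s WHERE field_name = %s AND ts = %s",
    (500.867, "TOMATO", "2006-03-16 18:43:56"),
)
conn.commit()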

Edit:

@user540009: I agree that there are serious slowdowns on single-machine or mirrored MySQL instances larger than half a terabyte, and no one wants to deal with manual sharding; MySQL Cluster is meant to handle this, and I have read of (though not personally played with) deployments up to 110 terabytes.
