Clara Choi

To save or not to save data?

Posted by Clara Choi on October 08, 2013
I run into this a lot, and I often wonder what the proper answer to this question is.
For accounting it is easy: record everything, unless it is unconfirmed or rolling. You end up with patterns like db_amount || rolling_sum.
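One way to read the db_amount || rolling_sum pattern is: use the stored amount when one has been recorded, otherwise fall back to a sum computed over the confirmed entries. Here is a minimal Python sketch of that reading. The names `Entry`, `confirmed`, `rolling_sum` and `effective_amount` are hypothetical, picked for illustration, and not taken from any real accounting system.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class Entry:
    """One ledger line. `confirmed` is an assumed flag marking it as final."""
    amount: Decimal
    confirmed: bool


def rolling_sum(entries: Iterable[Entry]) -> Decimal:
    """Sum only the confirmed entries; unconfirmed ones are not counted yet."""
    return sum((e.amount for e in entries if e.confirmed), Decimal("0"))


def effective_amount(db_amount: Optional[Decimal], entries: Iterable[Entry]) -> Decimal:
    """Prefer the recorded amount; otherwise compute it from the entries."""
    # Compare against None explicitly so a recorded amount of 0 still counts
    # as "recorded" instead of silently falling through to the computed sum.
    return db_amount if db_amount is not None else rolling_sum(entries)
```

The explicit `None` check is a deliberate choice: in languages where `||` treats zero as falsy, a legitimately recorded zero would be replaced by the rolling sum, which is exactly the kind of quiet inconsistency this post worries about.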
But... what about other things? When do you record it, and what if you need to update it? Duplication of data isn't a problem for a data scientist, but what about when the duplication leads to incorrect data? Now what? What is atomic, and why?
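To make the "what is atomic" question concrete, here is a small sketch using Python's built-in sqlite3 module. It assumes a made-up schema in which a user's name is duplicated onto each comment (`users.name` and `comments.author_name` are hypothetical). Both copies are updated inside one transaction, so either both change or neither does.

```python
import sqlite3

# In-memory database with a deliberately duplicated column (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        author_name TEXT NOT NULL  -- duplicate of users.name
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO comments VALUES (10, 1, 'alice');
""")


def rename_user(conn: sqlite3.Connection, user_id: int, new_name: str) -> None:
    """Update the source value and its duplicate as one atomic unit."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if any statement inside it raises.
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute(
            "UPDATE comments SET author_name = ? WHERE user_id = ?",
            (new_name, user_id),
        )
```

This only covers duplicates that live in the same database. Once the copies sit in different systems, a single transaction no longer spans them, and the question of what is atomic is open again.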

Those are the things I think about a lot, and not just from a data analyst's point of view. Of course, the "ideal" world would be to duplicate all the things and event ALL the things... but what about billing? What about when your data cannot grow unbounded? How do you leave data in a clean state, much like you do with code?

There are answers to these questions, but every answer ultimately leads to more questions... and sometimes I am not sure I am happy with what I hear :)


You can reach me by emailing me AT clarachoi.ca