The wiki of Carlos Rodrigo

Big Data Now: 2012 Edition

The value of big data to an organization falls into two categories: analytical use and enabling new products.
The three Vs of volume, velocity, and variety are commonly used to characterize different aspects of big data.
It’s not just about input data. The velocity of a system’s outputs can matter too. The tighter the feedback loop, the greater the competitive advantage.
A common use of big data processing is to take unstructured data and extract ordered meaning, for consumption either by humans or as a structured input to an application.
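As a small illustration of extracting ordered meaning from unstructured input, the sketch below turns free-text log lines into structured records. It assumes a hypothetical log format ("<date> <level> user=<name> action=<verb>"). The pattern, the `Event` type, and the sample lines are invented for illustration. Lines that do not match are discarded rather than guessed at.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical log format assumed for illustration:
# "<date> <level> user=<name> action=<verb>"
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+"
    r"user=(?P<user>\w+)\s+action=(?P<action>\w+)"
)


class Event(NamedTuple):
    date: str
    level: str
    user: str
    action: str


def extract(line: str) -> Optional[Event]:
    """Return a structured Event, or None if the line does not match the pattern."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    # Named groups map one-to-one onto the Event fields.
    return Event(**match.groupdict())


raw = [
    "2012-03-01 INFO user=alice action=login",
    "garbage that carries no recognizable structure",
    "2012-03-01 WARN user=bob action=purchase",
]
# Keep only the lines that yielded a structured record.
events = [e for e in (extract(line) for line in raw) if e is not None]
for event in events:
    print(event)
```

The resulting `Event` tuples can then be consumed by a person or passed on as structured input to another application.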
A principle of big data: when you can, keep everything.
Different kinds of data suit different kinds of database. For instance, documents encoded as XML are most versatile when stored in a dedicated XML store such as MarkLogic. Social network relations are graphs by nature, and graph databases such as Neo4j make operations on them simpler and more efficient.
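To show why graph-shaped data favors graph-oriented storage, here is a minimal in-memory sketch of a "friends of friends" query as a breadth-first traversal over an adjacency list. This is not Neo4j's API or query language. The graph and the helper names (`within_hops`, `friends_of_friends`) are hypothetical. The traversal simply follows edges outward from one person, whereas a table-based layout would typically need one self-join per hop.

```python
from collections import deque

# Hypothetical social graph: each person maps to the set of people they know.
graph: dict[str, set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def within_hops(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search returning each reachable person and their hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, set()):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return distances


def friends_of_friends(person: str) -> set[str]:
    """People exactly two hops away, e.g. candidates for a 'people you may know' list."""
    return {p for p, d in within_hops(person, 2).items() if d == 2}


print(sorted(friends_of_friends("alice")))
```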
Most big data solutions are now provided in one of three forms: software-only, as an appliance, or cloud-based.
Many organizations opt for a hybrid solution: using on-demand cloud resources to supplement in-house deployments.
Big data practitioners consistently report that 80% of the effort involved in dealing with data is cleaning it up in the first place.
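The sketch below shows what routine cleanup often involves in practice: normalizing a key field, dropping records that lack it, removing duplicates that only become visible after normalization, and converting unparseable values to an explicit "missing" marker. The records and the field names (`email`, `age`) are hypothetical. Real pipelines apply many more rules, chosen according to the data at hand.

```python
from typing import Optional

# Hypothetical raw records with typical problems: stray whitespace,
# inconsistent casing, missing fields, unparseable numbers, and duplicates.
raw_records = [
    {"email": " Alice@Example.com ", "age": "34"},
    {"email": "alice@example.com", "age": "34"},
    {"email": "bob@example.com", "age": "n/a"},
    {"email": "", "age": "29"},
    {"email": "carol@example.com", "age": " 41 "},
]


def parse_age(value: str) -> Optional[int]:
    """Convert an age string to int, returning None when it cannot be parsed."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def clean(records: list[dict[str, str]]) -> list[dict[str, object]]:
    seen: set[str] = set()
    cleaned: list[dict[str, object]] = []
    for record in records:
        email = record.get("email", "").strip().lower()
        if not email:
            continue  # drop records missing the key field
        if email in seen:
            continue  # drop duplicates revealed by normalization
        seen.add(email)
        cleaned.append({"email": email, "age": parse_age(record.get("age", ""))})
    return cleaned


for row in clean(raw_records):
    print(row)
```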
Data marketplaces are a means of obtaining common data, and you are often able to contribute improvements back. Quality …
Data scientists are characterized as having the following qualities. Technical expertise: the best data scientists typically have deep expertise in some scientific discipline. Curiosity: a desire to go beneath the surface, discover a problem, and distill it down to a very clear set of hypotheses that can be tested. Storytelling: the ability to use data to tell a story and to communicate it effectively. Cleverness: the ability to look at a problem in different, creative ways.
MapReduce matters because of its ability to distribute computation over multiple servers.
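The classic word-count example makes the MapReduce model concrete. The sketch below is a single-process simulation, not Hadoop's API. Each entry in the hypothetical `shards` list stands in for the slice of input held by one server. In a real cluster, `map_phase` would run on each node in parallel, the framework would perform the shuffle across the network, and different keys would be reduced on different nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable

# Each shard stands in for the slice of input held by one hypothetical server.
shards = [
    "big data is big",
    "data about data",
    "keep everything",
]


def map_phase(shard: str) -> list[tuple[str, int]]:
    """Emit (word, 1) pairs for one shard; in a cluster this runs where the shard lives."""
    return [(word, 1) for word in shard.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key; independent keys can be reduced in parallel."""
    return key, sum(values)


mapped = chain.from_iterable(map_phase(s) for s in shards)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

The map calls share no state, and each reduce call sees only one key's values. That independence is what allows the work to be spread across many machines.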