Category Archives: Big Data

Why document databases are old news…


We’re going to store data the way it’s stored naturally in the brain.

This is a phrase being heard more often today. This blog post is inspired by a short rant that Babak Tourani (@2ndhalf_oracle) and I had on Twitter today.

How cool is that!!

This phrase is used by companies like MongoDB or graph database vendors to explain why they choose to store information / data in an unstructured format. It is new, it is cool, hip and happening. All the new compute power and storage techniques enable doing this.
How cool is that!!
Well, it is… for the specific use-cases that can benefit from such techniques. Think of analytical challenges, where individual bits of information basically have no meaning. If you are analyzing a big bunch of captured data coming from a single source, like a machine, a click-stream or social media, one single record basically has no meaning. If that is the case, and you are not really interested in having and retaining every individual bit of information but in “the bigger picture”, these solutions can really help you!

How cool is it, actually?

When it comes to the other situations where you want to store and process information… where you do care about the individual records (I mean, who wants to repopulate their shopping cart on a web-shop 3 times before all the items stick in the cart), there are some historical things that you should be aware of.
Back in the day when computers were invented, all information on computers was stored “the way it’s stored naturally in the brain”.
Back in the day when computers were invented, all we had were documents to store information.
This new cool hip and happening tech is, if anything, not new at all…
Sure, things have changed over the last 30 years, and with all the new compute power and storage techniques, the frayed ends of data processing have significantly improved. This makes executing data analysis, as described above, so much better!! Really, we can do things to data, using these cool new things, that we never dreamt possible 30 years ago.
But these things remain the “frayed ends of data processing”.
If you do have requirements like filling your shopping cart once, and it works all the way through check-out…
If you do have requirements where some kind of “transaction” is required (like buying something, like your bank account, like two actions that are dependent on each other)…
You need transactions…
I know, “transaction” is boring, old-fashioned and a seemingly surpassed entity…
But, I promise you, you will want those things, if you actually have to process something in your application in a way that makes real-world sense.
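To make “two actions that are dependent on each other” a bit more concrete, here is a minimal sketch of the bank-account case. It uses Python's built-in sqlite3 module purely because it runs anywhere; the `accounts` table, the `transfer` function and the names alice and bob are made up for this illustration. The point is only that either both updates happen or neither does.

```python
import sqlite3

# Hypothetical schema for illustration: a balance may never go below zero.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failing debit shows the rollback at work.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused an overdraft; the credit was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(ok, failed, balances)
```

Without the transaction, the failed transfer would have left bob with money that never left alice's account. That is exactly the kind of thing you end up rebuilding by hand in the application when the data store does not offer it.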

This was solved ages ago

For that, indeed more than 30 years ago (which is such a long time, most of the cool young dudes and dudettes developing applications today were not even born), relational database theory was invented to solve the inherent issues that document-based databases bring if you want to introduce these transactions to your application.
Document databases brought these issues back in the day… They bring these issues today!!!
Please believe me, they bring these issues today! This is the reason – contrary to the messages by non-relational database vendors – application developers find that they need to add actual transactional capabilities to their applications, either to make them work in real life or to bring any kind of scalability to them.
Imagine building an application and actually being successful with it! Isn’t that the dream of every application project? How boring is it then, to find that you are unable to meet demand? Not because you are understaffed or because you lack compute resources, but simply because your application, based on these data storage methodologies, cannot keep up? A document database is data storage, not data processing.
For that, you would need the likes of PostgreSQL. Postgres is (also) free, it is Open Source… it is even Community Open Source, how cool is that? No annoying vendor telling fantasy stories about what Postgres can do, unlike MongoDB for instance.

So…

Coming back to the opening phrase, “We’re going to store data the way it’s stored naturally in the brain.”
It is kind of dumb to use a computer to store data like it would be stored in the brain. The human brain is not designed to process YUGE amounts of data, simply because the structure is not designed to accommodate that. Period.
To process large amounts of data, you need structures, either when you store the data or at the moment you want to start doing stuff with it. Structuring data when you store it is by far the cheapest method. Technologies like JSON data storage add sufficient flexibility to that, and engines like Postgres have no trouble whatsoever processing such data.
Finally, the programs these vendors use to “store data the way it’s stored naturally in the brain” are written in computer-code, also not “naturally like the brain”. Would we need to revert to medieval clerks to start recording the data in these documents? No, I guess not.
Be smart,
Be modern,
Be hip and happening,
Be efficient and scalable,
Use relational database techniques…


Riga Dev Days 2017, new experiences in many ways.


General

It has been a while since my last blog-post.
One of the reasons is my shift from closed to open source software, databases more specifically. More on that in a later blog-post.

The reason for mentioning this already is the strange hybrid (what a popular word, these days) situation that I am in at the moment.
Thanks to the super enthusiastic, flexible and tenacious organization team of the Riga Dev Days, I was able to participate.
Having happily boarded the Air Baltic flight, I was on my way to Riga!!

Being new to the broader conference scene, I enjoyed being at a mixed-source developer conference. Besides the usual suspects – some of whom are my best friends – I got to meet many interesting new people.
One of the key phrases of the day is: “the more you learn, the more you realize you know nothing – Jon Snow…” and it’s true! You never stop to think about it, but the wealth of subjects is just tremendous and the combined knowledge at events like these is downright “Yuge, it’s awesome, tremendous!”

Day one

With a day like this, time flies. Between sessions (and during sessions) there are discussions, a bit of work and catching up to do.
Still I managed to catch a few sessions, like the one from Michael Hüttermann, who made a clear and well-rounded case regarding CI/CD in a DevOps world. A nice insight into the effort that goes into what’s behind the proverbial “push of a button”.
Another example was the one by Marcos Placona about the many (and very basic) things that you have to keep in mind when building apps. There is no silver bullet and the best you can achieve is to discourage the hacker so much that they move on. Much like securing your house, so to speak.

The day ended in the medieval basements of Riga, where we had some really good medieval food. Life is good…, well…, it has its moments!

Day two

The keynote address by Edson Yanaga, which kicked off day two of the Riga Dev Days, was quite interesting.
Shortening development and deployment cycles and shrinking feature release sets actually helps improve software and deployment quality by creating faster and more accurate feedback loops. By looking at these concepts in this way, buzzwords like DevOps and Agile actually get some hands and feet. One of the lessons, though, is that doing things this way does not eliminate work or automagically solve various issues for you! It will help in getting predictability and continuity into your software development processes.
A nice eye-opening remark finally, was… “no, I don’t pay you to make something work on your computer, I pay you to make something work on my computer(s)!!”

Another talk I was able to attend was about blockchains, something I knew nothing of and was actually quite interested in. Nick Zeeb took us through a very lively and very animated tour of what a blockchain actually is and what the awesome potential of this technology can be. I was impressed.

With this, the second day drew to an end, and with that came my turn “in the pit”. As this event is held in a movie theater, every room had a sloped tribune, which was often packed with enthusiastic participants. I had the opportunity to share my thoughts on the comparison between PostgreSQL and Oracle.
The session was very well attended, with a lot of questions regarding the possibilities of using these other technologies at scales that were not really considered before. You can find a recording of the actual presentation here as soon as it becomes available.

Riga Dev Days was a good conference. I would recommend everyone to either attend or submit an abstract for their event in 2018!!

My picks, no, Agenda… for UKOUG_Tech15

I went over the agenda for UKOUG_Tech15 and took my picks & suggestions.
Then I thought, why not share these…

MONDAY

The Oracle Database In-Memory Option: Challenges & Possibilities
Christian Antognini – Trivadis AG

Standard Edition Something for the Enterprise or the Cloud?
Ann Sjökvist – SE – JUST LOVE IT

All about Table Locks: DML, DDL, Foreign Key, Online Operations,…
Franck Pachot – DBi Services

Silent but Deadly : SE Deserves Your Attention
Philippe Fierens – FCP
Co-presenter(s): Jan Karremans – JK-Consult (Having a link here would be silly, right)

Oracle SE – RAC, HA and Standby are Still Available. Even Cloud!
Chris Lawless – Dbvisit

SE DBA’s Life a Bed of Roses?
Ann Sjökvist – SE – JUST LOVE IT

Oracle Standard Edition Round Table
Joel Goodman – Oracle
Co-presenter(s): Ann Sjokvist, Philippe Fierens, Jan Karremans

TUESDAY

Watch out for #RepAttack… all day long!!
And earn your RepAttack badge-ribbon…

Advanced ASH Analytics: ASHmasters
Kyle Hailey – Delphix

Community Keynote – Dominic Giles

Oracle BI Cloud Service – Moving Your Complete BI Platform to the Cloud
Mark Rittman – Rittman Mead

Infiniband for Engineered Systems
Klaas-Jan Jongsma – VX Company

Oracle Database In-Memory Option – Under the Hood
Maria Colgan – Oracle

Do an Oracle Data Guard Switchover without Your Applications Even Knowing
Marc Fielding – Pythian

Using Oracle NoSQL to Prioritise High Value Customers
James Anthony – RedStack tech

WEDNESDAY

HA for Single Instance Databases without Breaking the Bank
Niall Litchfield – Markit

Database Password Security
Pete Finnigan – PeteFinnigan.com

Connecting Oracle & Hadoop
Tanel Poder – PoderC LLC

Enterprise Use Cases for Internet of Things
Lonneke Dikmans – eProseed
Co-presenter(s): Luc Bors – eProseed

Bad Boys of On-line Replication – Changing Everything
Bjoern Rost – portrix Systems GmbH
Co-presenter(s): Jan Karremans – JK-Consult

RMAN 12c Live : It’s All About Recovery,Recovery,Recovery
René Antúnez – Pythian

Hopefully this will point you to some interesting sessions!

Big Data: Hadoop and Oracle technologies explained

Under the title “Hadoop and Oracle technologies on BI projects” Mark Rittman flew to The Netherlands on the 14th of July to visit the Oracle Usergroup Holland.

Although I had obviously heard a lot about Hadoop, I never really did anything further with it and left it to a synaptic link to Gwen Shapira. This lack of action created a kind of threshold in the understanding of the technology. When I heard about this session I realized this would be the moment to take a step further. It turned out to be the first real talk that puts “Big Data” in the perspective it needs to be consumable and realistic.

In these times of “The Internet of Things”, more and more social media and ever further digitization, we are heading for a Big Data Disruption. This is both a conceptual and a very real thing if you take a moment to think about it. According to real-world experience it is also not something “which will once be”, it is something which is actually here today!

On the technical side of things, data is captured in something that is called a “data reservoir” (or “data lake” or “data dump (yard)”). Compared with “regular” data storage, you can conclude that data governance, or a data structure, in a Big Data system is applied later. We are used to applying this structure, this governance, beforehand, by applying data definition. Using Hadoop in combination with NoSQL gives you “schema on read” capabilities, making querying of the Hadoop data reservoir possible.

Adding this structure later is harder! This leads to the following:

  • Data is much easier to get into Hadoop than into a star-schema
  • Data is much easier to get out of a star-schema than out of Hadoop

This could be one of the essential things to consider when thinking about engaging in a Big Data project!
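To illustrate the trade-off between applying structure beforehand and applying it later, here is a small sketch in plain Python. It is not Hadoop and not a star-schema; the sample events, the `read_page_times` function and the `page_views` table are all invented for this example, and sqlite3 merely stands in for “a database with a data definition”.

```python
import json
import sqlite3

# Made-up click-stream lines, including the mess real sources tend to produce.
raw_events = [
    '{"visitor": "ann", "page": "/home", "ms": 120}',
    '{"visitor": "bob", "page": "/cart"}',              # "ms" is missing
    '{"vis": "cat", "page": "/home", "ms": "95"}',      # other key, number as text
]

# Schema on read: getting data in is trivial, nothing is checked.
reservoir = list(raw_events)


def read_page_times(lines):
    """Apply the structure at query time; every reader has to handle the mess."""
    rows = []
    for line in lines:
        doc = json.loads(line)
        visitor = doc.get("visitor") or doc.get("vis")
        ms = doc.get("ms")
        if visitor is None or ms is None:
            continue  # the reader decides what to do with incomplete records
        rows.append((visitor, doc["page"], int(ms)))
    return rows


# Schema on write: the data definition is enforced when the data comes in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views ("
    " visitor TEXT NOT NULL, page TEXT NOT NULL, ms INTEGER NOT NULL)"
)
rejected = 0
for line in raw_events:
    doc = json.loads(line)
    try:
        with conn:
            conn.execute(
                "INSERT INTO page_views VALUES (?, ?, ?)",
                (doc.get("visitor"), doc.get("page"), doc.get("ms")),
            )
    except sqlite3.IntegrityError:
        rejected += 1  # harder to get in...

# ...but easy to get out: one declarative query, no per-reader cleanup.
averages = conn.execute(
    "SELECT page, AVG(ms) FROM page_views GROUP BY page"
).fetchall()
print(read_page_times(reservoir), rejected, averages)
```

In the first half all the effort sits in the reader, in the second half it sits in the loader. Which of the two is cheaper depends on how often the data is read versus written, which is the consideration the two bullets above point at.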

As Tanel Poder concluded: “High value, high density data will remain in the Oracle database” which I think is a very true conclusion. In the end, the high value conclusions (or the engineering of Big Data results) will also happen within the Oracle database.

On the horizon is “Oracle Big Data Discovery”, which will help with the time-consuming and tedious work of sorting and interpreting raw data in the data reservoir. The use of ‘R’, as the data exploration tool of choice, is expected to be replaced by this discovery tooling, over time…

To sum up the concept of the first half of the presentation, to my taste:

  • Hadoop changes business
  • NoSQL scales business
  • Oracle runs business

“It takes eons to list all names of the Buddha” nicely sums up the number of different applications that make up and are needed to execute a successful Big Data project.
Plus, “You’d better keep the 13 rules for relational databases close at hand”!


Part two of the evening was spent on mapping these concepts onto actual tools, disclosing data through Hadoop to Oracle SQL and making actual use of Big Data. The exercise was completed by demos and illustrated by screenshots from the slides (link below).
A special word of warning goes out to the security aspect of Big Data, which is something to really pay close attention to. Kerberos authentication and Apache Sentry are imperative things to implement in your Big Data environment.

All in all, this evening turned out to be 110% more informative and necessary than I expected when I embarked on the journey to Utrecht! Thank you for sharing, Mark!

Thanks to Piet de Visser for the nice quotes! And a great “hi there” to Klaas-Jan Jongsma, René Kuipers and Marti Koppelmans.

If you want to work with Big Data on your Small(er) Device, please download the Big Data Lite VM from OTN.

The link to the slides for anyone who wants to review the “extended remix”!