What do data scientists, data engineers and data architects do?
Data scientists, data engineers and data architects are three key roles in the Big Data universe, and they are often confused with each other. The tasks they carry out are quite different, although the work of data scientists and data engineers frequently overlaps. Let's say that the clearest and best-defined roles are those of the data scientist and the data architect.
Data scientists are usually associated with statistical models, machine learning and artificial intelligence. They are like unicorns in the business world: they know (or should know) a bit of everything. They have strong mathematical knowledge and can program in several languages. They also need business knowledge in order to translate business language into useful statistical models.
On the other hand, we have the data architect. These professionals are usually associated with system infrastructure and administration. They build and maintain the infrastructure (Big Data clusters, servers) so that data scientists can work on it. They have become especially relevant in the Big Data field, where running an on-premise infrastructure requires powerful hardware. Without that infrastructure, the statistical models of data scientists will become real unicorns, and by that we mean nonexistent. Data architects are the ones who make sure the foundation of the project is stable.
Here's where the confusion usually arises. Data engineers can be defined as the piece between data scientists and data architects. They are in charge of designing the pipelines: based on the specific needs of a project, they choose the best technologies and build the pipelines that take data from point A to point B.
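To make "taking data from point A to point B" a bit more concrete, here is a minimal sketch of a pipeline in Python. It is only an illustration: the function names (`extract`, `transform`, `load`, `run_pipeline`), the sample data and the choice of CSV in and JSON lines out are all assumptions made for this example, not a description of any particular project or tool.

```python
import csv
import io
import json


def extract(csv_text):
    """Read rows from CSV text (our hypothetical point A) into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Drop rows with an empty amount and convert the rest to floats."""
    return [
        {"customer": row["customer"].strip(), "amount": float(row["amount"])}
        for row in rows
        if row["amount"].strip()
    ]


def load(rows):
    """Serialize rows as JSON lines (our hypothetical point B)."""
    return "\n".join(json.dumps(row) for row in rows)


def run_pipeline(csv_text):
    """Chain the three stages: extract, then transform, then load."""
    return load(transform(extract(csv_text)))


# Made-up sample input: the second row has no amount and is filtered out.
SAMPLE = "customer,amount\nalice,10.5\nbob,\ncarol,3\n"

if __name__ == "__main__":
    print(run_pipeline(SAMPLE))
```

In a real project each stage would typically read from and write to actual systems (databases, message queues, file stores), and choosing those technologies is exactly the kind of decision the data engineer makes.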
One more role, no less important: the data analyst
The data analyst, or data modeller, is often confused with the data engineer. This role requires knowledge of ETL, APIs and database modelling. Data analysts define data encoding, schemas, compression and versioning. They also monitor ETL processes, validations and so on. The data engineer, by contrast, is more involved in design and engineering.
What do you think of these roles? Do you work in any of them?