by Mara Hermann & Marisa Mohr
The daily routine of a mathematician in the field of Data Management & Analytics can be diverse: Data collection, preparation and analysis, the design of artificial intelligence (AI) models, and much more. The opportunities to get involved in a data project are usually not limited to one’s own field. We, Mara (Senior Big Data Scientist) and Marisa (Senior Machine Learning Engineer), are two mathematicians who juggle data in a variety of ways every day. In this blog post, we describe what a day as a data juggler is like and how we use mathematics in our everyday lives.
If you study maths, you are faced with a wide range of possible career paths. But you should definitely take a look at the field of data management & analytics – not just because the Harvard Business Review called the data scientist’s profession the sexiest job of the 21st century [1]. In recent years, many specialised job titles have emerged, for example “Data Engineer”, “(Big) Data Scientist” and “Machine Learning (ML) Engineer”. However, they all have the same aim: to process data in such a way that useful information can be extracted (learned) from it and computers can act intelligently based on this knowledge. In particular, working with and implementing AI algorithms requires more than just AI experts – it’s a team sport. Whatever the job titles, it takes many different specialists who work together as a team and complement each other. Other areas of computer science such as database management or software engineering are also becoming increasingly important.
Marisa, what is your role as an ML engineer in the team and when do you still use maths?
Due to the above-mentioned diversity and the numerous connections to other team members, it is difficult to describe a typical day of an ML engineer because every day is characterized by new challenges – fortunately. However, even with the most complex challenges, our mathematical-analytical approach does not make us despair.
The mathematical modelling of data in a learning algorithm, be it through a slightly more applied, specialised linear regression, or through a fancy artificial neural network, usually takes up no more than the last 5-10% of a whole data project. For a prediction to work really well, the end-to-end idea is crucial. Where does the data actually come from? And what data do I need to arrive at a valid result? Do I have the right data? Can I get to more profitable data, or do I have to change the prediction goal? It’s crucial to understand the big picture. After all, you need exactly the data that fits the problem you want to solve.
All of AI […] has a proof-of-concept-to-production gap. […] The full cycle of a machine learning project is not just modeling. It is finding the right data, deploying it, monitoring it, feeding data back, showing safety — doing all the things that need to be done to be deployed.
Andrew Ng [2]
In general, an ML engineer is a person who helps deploy machine learning or artificial intelligence algorithms in a productive environment so that they can be used in the day-to-day business without difficulty. That sounds like a lot of infrastructure operations and software engineering, and yes, that can be a big part of an ML engineer’s job. You have to understand the existing IT landscapes and systems at the customer level to decide how to build a pipeline in those existing systems between the data and the output of a prediction, and how to deploy everything at the end. But as mentioned before, AI is a team sport. Of course, as an ML engineer, I’m not the specialist in everything, but it’s important to stay on top of everything.
Now, how much mathematics is needed in this interdisciplinary field as an ML engineer primarily depends on the level and interest of the individual in the mathematical-statistical techniques that are being used. There is this type of ML engineer who spends all day building infrastructures or programming software to make an intelligent algorithm run productively in the client system. This kind of ML engineer is certainly more influenced by computer science than I am as a mathematician. I admire that, but I could never get lost in coding, and the good thing about being an ML engineer is that you don’t have to. The profession is so multi-faceted and multi-dimensional that everyone can follow their own passion and take their personal role in the team – with the bonus of dabbling in other roles every now and then.
As a mathematician, I have taken on various roles over the years. During a project phase, I often take on the role of a general strategist or project manager, ensuring that the team follows the same vision to bring together input and intelligent output in the productive environment. Then, when data modelling specialists are required in the project, I have the opportunity to follow my mathematical passion, from smaller data explorations and visualizations, through the evaluation of mathematical relationships in the data, to the selection and training of learning algorithms. The latter also includes weighing up accuracy, training time, model complexity, number of parameters, and number of features. In addition, parameter settings and validation strategies have to be selected, underfitting and overfitting have to be identified by understanding the bias-variance trade-off, and confidence intervals have to be estimated. A deep dive into maths for ML can be found on Medium [3]. As a mathematically minded ML engineer, my role can therefore be similar to that of a data scientist from time to time.
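To make the talk of underfitting, overfitting and validation a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions of our own: the noisy sine data, the k-nearest-neighbour regressor and the 40/20 hold-out split are all made up for this example and are not taken from any real project.

```python
import math
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model (fitted on train) over data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Hypothetical toy data: a sine curve with Gaussian noise on an even grid.
rng = random.Random(42)
xs = [i * 2 * math.pi / 60 for i in range(60)]
points = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

# Simple hold-out validation: 40 points for training, 20 for validation.
rng.shuffle(points)
train, validation = points[:40], points[40:]

# Small k = very flexible model, k = len(train) = constant (mean) model.
results = {k: (mse(train, train, k), mse(train, validation, k))
           for k in (1, 5, len(train))}
for k, (train_err, val_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

With k=1 the model memorises the training points, so its training error is zero while its validation error is not, which is the signature of overfitting (high variance). With k equal to the size of the training set it predicts the same mean everywhere and misses the shape of the curve, which is underfitting (high bias). Comparing training and validation error across such settings is one simple way to spot both.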
This variety and the chance to change roles are what I love about working as an ML engineer, or working in a data project team in general. Another ML engineer could certainly take on many more technical roles, especially when it comes to gathering the appropriate data, without which no ML or AI model works. And that’s where Mara comes in.
Mara, what do you do all day as a data engineer and when do you still use mathematics?
After my studies in mathematics, I started working as a data scientist for an IT company. When I applied for the job, I was asked in the interview what title I would prefer: data engineer or data scientist. At the time, I was convinced that the latter was the only reasonable choice for a mathematician like me. Even during my studies, I had been a working student in the field of data science, and I had also attended lectures on data mining, neural networks and other related topics.
The connections between mathematics and data science are numerous – in fact, data science is mainly the application of mathematical models to various use cases. And I wish this fact were taught more often and more emphatically at university.
Have you ever wondered what all that mathematical theory is good for? If you are a mathematics student – have you ever been frustrated about all the types of matrix factorizations one has to learn in numerical mathematics? Or perhaps you are a high-school graduate contemplating the high art of analysis and algebra but you fear it will end in nothing?
I can reassure you: the use cases for mathematics and its theories are boundless.
One of my favourite examples, which I encountered during my job as a working student, is recommender systems. A great introductory article on this topic can be found on Medium, in which recommender systems are defined as “algorithms aimed at suggesting relevant items to users” [4]. Those items could be, for instance, products in an online shop or movies on a streaming platform. The interaction between items and users can be represented by a sparse matrix in which each entry describes, e.g., how a user rated a specific movie or whether a user bought a given product. One approach to retrieving information and learning recommendations from this matrix is to decompose it into two smaller and denser matrices whose product approximates the original, a technique known as matrix factorization. One matrix then describes the user representation and the other one the item representation – a great illustration of how a mathematical framework can be used in practice. Other mathematical methods are also used in the theory of recommender systems.
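For readers who like to see the idea in code, the following is a minimal sketch of matrix factorization in plain Python. Everything specific in it is an assumption for illustration: the tiny 4x4 rating data, the function names `factorize`, `predict` and `rmse`, the choice of stochastic gradient descent with a small regularization term, and all hyperparameter values. Real recommender systems typically rely on dedicated libraries and far larger data.

```python
import random

# Hypothetical observed ratings as (user, item, rating); all other entries
# of the 4x4 user-item matrix are unknown, i.e. the matrix is sparse.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 5), (1, 1, 4), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 5), (3, 3, 4),
]

def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P (n_users x k) and item factors Q (n_items x k)
    so that the dot product of P[u] and Q[i] approximates each observed rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Estimated rating of item i by user u."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

P, Q = factorize(RATINGS, n_users=4, n_items=4)
fit_error = rmse(RATINGS, P, Q)
```

Each row of `P` is the learned representation of one user and each row of `Q` the representation of one item. A call such as `predict(P, Q, 0, 3)` then gives an estimate for a rating that was never observed, which is the basis for a recommendation.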
So far I have raved about how varied and “sexy” [1] the applications of pure (and sometimes dry) mathematics in data science can be. But if you read this article carefully, you may have noticed that I wasn’t asked about my work as a data scientist but as a data engineer. Why?
As already mentioned, working on an AI project or, generally speaking, a data project is a team sport, and in the course of it you also get in touch with other roles and switch positions from time to time. With my mathematical background, I always had great respect for the role of a data engineer, which I thought would be reserved for “real” programmers with an IT background. At the beginning of my studies I wouldn’t have thought that I would ever be interested in coding and, like Marisa, I will probably never be as much into programming as someone who studied computer science. But data engineering is so much more than sitting in front of the laptop, producing green letters on a black screen while typing at the speed of light.
The “unsexy” sibling of data science certainly inherits more aspects from computer science than from mathematics [5]. As a data engineer, one designs, implements and monitors data pipelines which may feed a data scientist’s ML models. Additionally, data storage and data quality are a big piece of the pie. Programming skills and a willingness to keep learning new technologies are indispensable in this job.
With this role description in mind, it’s true that you don’t necessarily need maths to be a data engineer. But that doesn’t mean that mathematicians can’t be good or even excellent data engineers. Their education entails a lot more than knowledge of algebra, analysis and many other subjects. It is often said that mathematics and philosophy are closely interrelated; some universities like Oxford even offer courses combining both disciplines [6]. Even without attending such a course, a mathematics student acquires a lot of soft skills which are basic tools in the everyday life of a data engineer: One has to handle complex systems consisting of different data sources connected through various pipelines. With logical and analytical thinking one can better understand and design ETL (extract, transform and load) processes. Thoroughness and checking for accuracy are key to monitoring data pipelines and ensuring high data quality. Resilience, deduction and reasoning are of great help during performance tuning or debugging data pipelines. With some of these capabilities in your tool kit you have a great foundation for the role of a data engineer; practical experience comes with time.
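To give a feel for what an ETL process with a data quality check can look like, here is a deliberately tiny sketch in plain Python. The order data, the column names, the validation rules and the in-memory list standing in for a warehouse are all hypothetical; a real pipeline would read from and write to actual systems and would usually be built with dedicated tooling.

```python
import csv
import io

# Hypothetical raw export with typical quality problems:
# a missing amount, a negative amount and an inconsistent country code.
RAW_CSV = """order_id,amount,country
1,19.99,DE
2,,DE
3,-5.00,FR
4,42.50,fr
"""

def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types, normalise values and reject invalid rows."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append((row["order_id"], "amount missing or not numeric"))
            continue
        if amount < 0:
            rejected.append((row["order_id"], "negative amount"))
            continue
        clean.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "country": row["country"].upper(),
        })
    return clean, rejected

def load(rows, target):
    """Load: append the clean rows to the target store and report the count."""
    target.extend(rows)
    return len(rows)

warehouse = []  # stands in for a real database table
clean_rows, rejected_rows = transform(extract(RAW_CSV))
loaded = load(clean_rows, warehouse)
print(f"loaded {loaded} rows, rejected {len(rejected_rows)}: {rejected_rows}")
```

Keeping the rejected rows together with a reason, instead of silently dropping them, is one simple way to make data quality visible when monitoring a pipeline.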
Thus, the opportunities for a mathematician in the data sector are broad. Different types of people and skills are required and there are numerous further training possibilities. Also, data projects can be very diverse, since data is everywhere: e-commerce, food and fashion retail, logistics, mobility, smart buildings,… One can always find a use case which fits one’s taste. I can definitely recommend taking the chance and gaining an insight into this branch.
Regardless of which field of study or career path you choose, I can only encourage you to look beyond the horizon and also get a taste of roles and fields other than the ones you are already familiar with. Be it positive or negative, it will be a learning experience for you. And you will enrich every team if you can think outside the box.
Literature:
[1] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
[3] https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568
[4] https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada
[5] https://www.stitchdata.com/blog/5-things-you-should-know-for-career-in-data-engineering/
[6] https://www.ox.ac.uk/admissions/undergraduate/courses-listing/mathematics-and-philosophy