Getting started with Data Science
Introduction
- Did you recently decide to become a Data Scientist, but don't know what to do next or how to get started? Well, it's common to be confused about these things, especially about how to begin your journey in this amazing world of Data Science.
- Just like learning any other skill, it's better to approach Data Science from scratch and in multiple phases. Remember, this is by no means a magic formula to make you a Data Science expert overnight, and we don't provide any certifications either. The intention of this guide is to provide the essential knowledge and skills to get you started in the Data Science field and, most importantly, to help you plan your journey in the right direction.
- One more thing: the field of Data Science is quite vast. The learning approach can vary based on what you want to become, be it Data Engineer, Data Scientist, Research Scientist, ML Engineer, etc. In this guide, we will focus on the Data Scientist role while also touching on some peripheral topics.
But before we get started, let's have a look at the different designations in the Data Science field and what they are all about.
- Data Engineer: Data Engineers handle data, i.e., they are responsible for data processing and data analysis.
- Data Scientist: Data Scientists handle model training, i.e., they are responsible for leveraging the data to build predictive and actionable models.
- ML Engineer: ML Engineers deploy models, i.e., they are responsible for taking a trained model and deploying it in an accessible and scalable way.
Note
Research Scientist is another Data Science-related role that is usually more research- and academia-oriented than Data Scientist, which is more practical and industry-oriented.
- Now, let's get started by dividing the herculean task of mastering Data Science into three levels, namely L1, L2, and L3. L1 is the most basic level, and L3 is the most advanced level.
Note
Advancing from one level to another depends on your own understanding and progress. There is no explicit time-based or checkpoint-based grading system that I recommend. I believe that if you feel you are no longer learning anything new by following the steps of a certain level, you are ready to move on to the next level.
- "To learn is to read, understand and apply." With this in mind, each level is further divided into two subtasks: theory and practice. The idea is to have at least one action item for each subtask in each level. More details below:
Levels | Theory | Practice |
---|---|---|
L1 | Beginner Online Courses | Course Assignments and Code replication |
L2 | Domain Specialization Courses | Projects and Competitions |
L3 | Research Papers and Books | Create Products and Publish Papers |
L1: the absolute beginners
- This is the starting point of our journey. Here we want to get a basic understanding of the Data Science field and start getting our hands dirty with the basics. Let's try to go through the individual subtasks in detail.
- Theory: We will limit our scope to going through online courses that provide high level understanding. Here is a list of some free online courses or materials for you to go through.
- Practice: this can start with any assignments from the theory courses or materials. On top of that, we can also do code replication, i.e., look at basic code snippets and try to replicate them. Pick any existing EDA script, ML code, or other code snippet and try to rewrite it on your own. This gives you the opportunity for guided practice, as you can always refer back to the original code if you get stuck!
L2: the intermediate level
- Now we have moved on to the intermediate level. Data Science is a very vast field and includes multiple sub-domains like NLP (Natural Language Processing), CV (Computer Vision), RL (Reinforcement Learning), etc. By now, you should have some inclination towards one or more of these fields. It's time to get some domain-specific specialization. Let's go through the individual subtasks in detail.
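To make code replication a bit more concrete, here is a minimal sketch of the kind of basic EDA snippet you might pick and then try to rewrite on your own. It is only an illustration: it assumes nothing beyond Python's standard library (a real EDA script would more likely use pandas), and the small CSV dataset in it is hypothetical, made up purely for this example.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical toy dataset, made up purely for illustration.
RAW_CSV = """age,city,income
34,Berlin,52000
28,Paris,
45,Berlin,61000
,Madrid,48000
39,Paris,57000
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_counts(rows):
    """Count empty cells in each column."""
    columns = rows[0].keys()
    return {col: sum(1 for r in rows if r[col] == "") for col in columns}


def numeric_summary(rows, col):
    """Basic descriptive statistics for a numeric column, skipping missing values."""
    values = [float(r[col]) for r in rows if r[col] != ""]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def value_counts(rows, col):
    """Frequency of each category in a categorical column."""
    return Counter(r[col] for r in rows)


if __name__ == "__main__":
    rows = load_rows(RAW_CSV)
    print("rows:", len(rows))
    print("missing:", missing_counts(rows))
    print("age:", numeric_summary(rows, "age"))
    print("income:", numeric_summary(rows, "income"))
    print("city:", value_counts(rows, "city"))
```

Once you can reproduce something like this without looking, a natural next step is to redo the same checks with pandas (`read_csv`, `isna().sum()`, `describe()`, `value_counts()`) and compare the results.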
- Theory: based on different fields and generic topics, here is a list of introductory materials that you can refer to:
- Practice: in terms of practice, code assignments are a must. Another piece of advice is to participate in competitions and hackathons hosted on platforms like Kaggle. The plan could be to first participate in past competitions to get a feel for coding and solving problems, and then to participate in live competitions and compete with some of the best Data Science enthusiasts out there! Another thing you can do is create a simple AI/ML project from scratch; this will give you much-needed practical experience. Here is an article I wrote on how to ideate and publish your projects. You can also refer to this GitHub repo to get some inspiration or AI project ideas.
Note
Please be aware that the intention of participating should be to learn. If you enter a competition without proper preparation and practice, you might not get the best results. But that should never demotivate you. If your intention is to learn, I am sure you will come out of any competition having learned something new. Winning or losing is just a by-product.
L3: the experts-to-be
- We have reached the final, and never-ending, level. By now you have already mastered the basics of Data Science and even some advanced topics. But learning is a never-ending process, as knowledge is not static. Every day there is new research that pushes the cumulative knowledge one step forward. This progress is especially fast in the AI/ML field, so while learning the existing material is important, it is equally necessary to keep learning new things so you don't get left behind! With this in mind, let's look into the individual subtasks in detail.
- Theory: here you should read about the latest trends in the field. The intention is to stay up-to-date and even to be ahead of the curve. For this, you can refer to highly cited papers, trending blogs, and well-known books. Lots of researchers publish their work on arXiv, which is free for all, and with sites like arXiv Sanity you can find the trending papers. Several top AI labs have their own websites where they frequently publish their latest research, like Google AI, Meta AI, DeepMind, OpenAI, Uber AI, etc. Apart from this, people also follow influencers and groups to get the latest news, and even subscribe to AI newsletters for the same purpose.
Note
If this sounds like a challenge, you are not alone. It's quite a task to follow all of these sites and people to get the latest news in the field of AI and ML. That is why I created The ML Dojo. ML Dojo is a daily report on the latest and upcoming research, experiments, topics, articles, papers, … (you name it), in the field of Artificial Intelligence and Machine Learning. Give it a try, it's completely free!
- Practice: instead of just reading research papers, the intention should be to find your topic of interest, do research on it, and publish your own work. For this, you can even collaborate with your colleagues or friends. Writing a paper of your own is a challenging but educational process, as you get to ideate, experiment, and write up (and defend) the work. It gives you the ability to look at a problem from multiple perspectives and then solve it! Another thing to do is to take your projects to the next level by creating (and maybe open-sourcing) your own products. While projects are created for learning and your own usage, products are created for other people to use. This will teach you to look at a problem from someone else's perspective.