My Guide on Data Science Programs
Thu 01 December 2016
I recently got accepted to the Udacity Artificial Intelligence Nanodegree program. While I am planning to write about that when it starts, now is a good time to talk about the experience I've had with previous Data Science programs.
As a beginner, it's hard to figure out what's worthwhile and what's snake oil, and these days there seem to be hundreds of books, MOOCs, bootcamps, and traditional degrees vying for your money and time. For those doing research, I hope my experience gives you more information to make an informed decision.
In the past five years I've enrolled in the Thinkful Data Science course, the Coursera Data Science Specialization, and the Udacity Data Analyst Nanodegree, and I was accepted to Berkeley's Master of Information and Data Science (MIDS) program. Between them I've spent thousands of dollars and hours trying to improve my skills. Not all of that time was well spent, and I bet I could have spent about a third less to get where I am now.
The details are on my About Me page, but in short I do data science work for a living. The programs above did help me transition into the career I wanted, but as I mentioned, I could have made better decisions.
The six categories I've been exposed to are:
- Self Learning from the Internet
- Books
- Coursera Data Science Courses and Specialization
- Udacity Data Analyst Nanodegree
- Berkeley Masters in Data Science
- A PhD
Here are my thoughts on each one.
Self learning from the Internet is certainly the easiest, cheapest, and most flexible option. The hard part, though, is piecing together an education that is cohesive and increases in difficulty at an appropriate rate. You'll have to spend a lot of time figuring out what you want to learn, finding good resources on that topic, and actually learning it. Figuring out what you want to learn might seem easy until you realize how many buzzwords now exist in the data science space. Do you want to learn about Machine Learning or recommender systems? What about Deep Learning, or how about Convolutional Neural Networks? What about Theano, TensorFlow, or Caffe? When googling by yourself, it takes a lot of effort to figure out which topics are relevant and which ones are just buzzwords.
Sources in this category that I recommend are Kaggle and the articles that appear on Hacker News and DataTau, but there are literally thousands. You can also take the free courses from Coursera and Udacity; I'll talk more about their paid programs below.
Learning from books is similar to Internet learning in that you'll first have to find books appropriate to the topic you want to learn. The good news is that once you do find one, it likely covers the topic in much more detail than a series of blog posts does. The downside is that it's hard to get feedback from a book, and if you're left with questions, you have to fall back on self learning from the Internet.
One book I found informative is The Elements of Statistical Learning. Again, as with the Internet category, there are hundreds, if not thousands, of data science books on every topic, and I could not possibly list them all.
If you've read this far, you must really be interested in learning data science. That's good, because everything from here on requires more commitment and money.
Coursera offers two ways to learn. One is taking free courses, but the more notable is paying for their Specializations. For the money, Coursera's Specializations give you two things that books and self learning do not. First, they guide you through a series of data science topics over the span of a couple of months, reducing the need for you to piece together your own curriculum. Second, and much more important, they give you a community and feedback on your assignments. The Specializations require you to pass quizzes and finish small two-week projects that are peer reviewed. This is great because getting feedback from others is, in my opinion, the best way to learn.
The downside is that the reviewers are usually your peers, and unfortunately they typically don't know much more than you do. Another downside is that giving feedback is a requirement for passing, so the reviewers' main motive is usually just to pass the course themselves. As a result, I personally found the feedback to be pretty weak. I made it through 8 of the 10 courses in the Specialization before I stopped participating. The cost of the Coursera Data Science program was a fixed $500.
Udacity, like Coursera, has both free and guided courses. The major difference I found, however, was that Udacity gives you much more access to experienced mentors. The Udacity projects are much more substantial, some taking more than a month to complete, and throughout the process you get continuous feedback from those mentors. The feedback was critical and forced me to really learn and execute concepts correctly. This was very different from Coursera, where I felt my peers would just rubber-stamp my work so they could pass. The program cost $200 for each month enrolled, and it ended when you had passing reviews on all seven projects. If you finished within 12 months, Udacity refunded half the money, a clever way to keep you motivated.
A potential downside of the Udacity program was that it was self-paced, which is great if you have motivation but bad when you don't.
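To make the Udacity pricing concrete, here is a small Python sketch of the net cost. It assumes the terms as I described them: $200 per enrolled month, with half of everything paid refunded if you passed all projects within 12 months. The function name and parameters are my own illustration, not anything Udacity publishes, and the sketch counts every started month as a full month.

```python
def udacity_net_cost(months_enrolled: int,
                     monthly_fee: float = 200.0,
                     refund_window_months: int = 12,
                     refund_fraction: float = 0.5) -> float:
    """Net Nanodegree cost under the pricing described above (hypothetical helper).

    Assumes the refund is a fraction of the total tuition paid and applies
    only when all projects are passed within the refund window.
    """
    if months_enrolled < 1:
        raise ValueError("months_enrolled must be at least 1")
    total_paid = months_enrolled * monthly_fee
    if months_enrolled <= refund_window_months:
        # Finished inside the window: half of everything paid comes back.
        return total_paid * (1 - refund_fraction)
    # Finished after the window: no refund.
    return total_paid


for months in (6, 12, 15):
    print(f"{months:>2} months: ${udacity_net_cost(months):,.0f}")
```

Under these assumptions, finishing in 6 months nets out to $600 and finishing in 12 months to $1,200. Running to 15 months costs the full $3,000.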
Master's degree programs are some of the newest kids on the block in terms of data science credentials. Their allure is that they combine traditional education with newer data science topics. I enrolled in the Berkeley program because I felt I would get a comprehensive education with access to experts in the field, along with the Berkeley credential. After all, a number of the leading experts who pioneered the data methods that are now commonplace graduated from Berkeley, Stanford, and other high-pedigree schools.
However, these programs are expensive, REALLY expensive. Berkeley's program came out to $60,000, not including additional costs like application fees, travel to campus if you have to visit, books, etc. The barrier to entry is also high: you'll need to already have a bachelor's degree and complete a master's degree application.
Although I was accepted and attended roughly a month's worth of lectures, I ultimately dropped out because I felt the price-to-value ratio was too high, especially since I had already spent tens of thousands of dollars on my previous bachelor's and master's degrees. The program itself was well thought out and well put together, but given the much cheaper alternatives above, I couldn't justify staying in it after I learned more about it.
The last and hardest path is to get a PhD in a field related to data science. After all, nearly all the people who have forged the way in data methods have done so through research and doctorate-level work. I was enrolled in a PhD program myself, and those considering this option already know it's a very heavy investment requiring at least half a decade's worth of work. You're going to have to read a lot more than this blog post, and do a lot of soul searching, before deciding whether this option is really what you want to do.
There are also data science bootcamps run by companies like Metis. These typically involve studying on site, in places like New York, through an intensive 12-week program. The cost seems to run to several thousand dollars. For the money, though, you get to build a network and get in-person coaching.
I never applied for this option because of the opportunity cost: I would have had to quit a paying job, or a degree program I was already in, to participate. But if you have the time and money, a bootcamp may be a good option for you.
If you can find someone who will mentor you at your job, that is by far the best option. Nothing beats real-world experience for learning how to actually do the job. You may need to turn to some of the resources above to get the fundamentals, but experience is a great teacher.
If you're just starting out, I suggest you begin with a Udacity Nanodegree. I think it has the best balance of low cost, expert mentorship, and projects that are just difficult enough to help you grow. After that you'll also have enough experience to home in on useful Internet and book references without getting fooled by marketing pitches. And since a Nanodegree typically takes only about a year to complete, the time investment is pretty low.
I personally don't think Berkeley's MIDS program is worth the cost. A traditional statistics or mathematics master's costs a third of the price at most universities, and I believe it's much more valuable.
As for the PhD, if you're reading this blog post as your first reference: having a PhD is a significant achievement, but it's not a light decision, and you should read more and think carefully about whether it's ultimately the right choice for you.
As always, if you have any further questions, feel free to send me a message on LinkedIn or GitHub.