Wednesday, May 24, 2017

Skill-sets of Data Scientists/Analysts


Ten years ago, data science wasn’t a career you heard much about (if it even existed), but the job market follows the technology curve and new titles are born what seems like every day. A majority of those interested in the fields of data and market analytics are young and enthusiastic, have experience with technology and are only a handful of years removed from college. As so many of us try to carve out the trajectory for a fulfilling and prosperous career, it’s natural to wonder what skill set is required to succeed as a data scientist or data analyst.

The article above from KDnuggets does a great job of outlining what is needed to build a career in the data industry. Personally, I really like how the article combines both technical skills and personal attributes, since in reality, excelling at anything takes a mix of both. I want to briefly touch upon some of the areas the article outlines and talk about why they are so important in the grand scheme of things.
The first point is education. Simple, but true: one needs training, and the most formal training that everyone in the data industry has in common is a college degree; as the article states, most have a master’s as well. College is a time when we learn about the adult world, and spending any less than four years learning about business, marketing, technology and how they all intertwine would be an injustice to the subject.

The second point is only touched upon quickly, but I want to say a bit more about it: intellectual curiosity. I believe this is something that will allow someone to be great in any field. Someone with intellectual curiosity will look for a career instead of a job, something they enjoy rather than just a way to collect a paycheck. When you are curious about your field of work, it will always be rewarding and you will strive to be better than satisfactory. Though this is only a short statement in the article, it is something I feel isn’t spoken about enough in general.

Domain knowledge/business acumen is the third point, and it connects with the first. As outlined above, the field of data brings together so many different things that it’s necessary to understand both the specific domain and business as a whole. If you want to make data useful, you need to understand how businesses are going to want to use it and what they are going to get out of it, so you need to be able to easily put yourself in their shoes.

Communication skills are the next point, and I think they again show how the field of data is a crossroads of so many areas. Computer nerds are often characterized as antisocial and unable to speak to each other effectively, but data analysts and scientists can’t be that way. The ability to communicate, whether in writing or in speech (which often ties into giving presentations), is extremely valuable.

I won’t go into as much detail on the remaining points, but they are still crucial to one’s success. For example, being able to map your career and goals (Google Analytics isn’t the only place for goals) is crucial to long-term success. The technology and companies that are prevalent in this industry will shift quickly, so it’s important to recognize that the days of getting hired by one company and staying with them until retirement may have ended.


The remaining skills are more technical – such as knowing how to code, understanding machine learning and data mining, and working with processing platforms, SQL and unstructured data. These are the learned skills that only education and practice can give someone. To be frank, these are not fields that I am as well versed in, so feel free to read through the link and let me know what you think!

Spotify's Usage of Big Data

Spotify is another big-name company that has hugely benefitted from the use of data. According to the attached article, Spotify users generate over half a terabyte of data a day, which requires the company to have four data centers globally to house all of the information. Luigi, their Python framework, guides them through the majority of that data and helps create a valuable user experience. Spotify has features for its users such as recommending songs based on their history and organizing the user’s preferences into playlists for them to enjoy (many of the same features that companies like Netflix use, just translated into the world of music rather than video).


Although Spotify is not the only company that streams music, it has been part of the revolution in an industry that relied on CDs and tape players just about a decade ago. Now, by categorizing musicians, Spotify is able to predict the types of music and specific artists one will want to hear. This has translated to over six million paid Spotify users and massive amounts of ad money (users that do not pay for the service periodically hear ads between songs). This is just another example of how data helped shape an industry that turned digital.
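To make the idea of predicting artists by categorizing musicians a little more concrete, here is a minimal sketch in Python. It is not Spotify’s actual method (the article does not describe one in this detail); the artist names, genre labels and the `recommend` function are all made up for illustration. It simply finds the genre a listener plays most and suggests artists from that genre they haven’t heard yet.

```python
from collections import Counter

# Hypothetical artist-to-genre labels (made-up data, not from Spotify).
ARTIST_GENRES = {
    "Artist A": "jazz",
    "Artist B": "jazz",
    "Artist C": "rock",
    "Artist D": "rock",
    "Artist E": "jazz",
}


def recommend(history, catalog, limit=3):
    """Suggest unheard artists from the listener's most-played genre."""
    # Count how many plays fall into each genre.
    genre_counts = Counter(catalog[artist] for artist in history if artist in catalog)
    if not genre_counts:
        return []
    top_genre, _ = genre_counts.most_common(1)[0]
    heard = set(history)
    # Keep only artists in the favorite genre that the listener has not played.
    candidates = sorted(
        artist
        for artist, genre in catalog.items()
        if genre == top_genre and artist not in heard
    )
    return candidates[:limit]


# A listener who mostly plays jazz, with one rock track mixed in.
suggestions = recommend(["Artist A", "Artist A", "Artist C"], ARTIST_GENRES)
print(suggestions)
```

A real service would work with far richer signals than a single genre label, but the basic shape (categorize, count, suggest what is missing) is the same.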

https://datafloq.com/read/big-data-enabled-spotify-change-music-industry/391

Amazon Echo Collecting Big Data

https://www.technologyreview.com/s/603380/alexa-gives-amazon-a-powerful-data-advantage/

Amazon is known for many things, but small isn’t usually a word used to describe it. Recently, Amazon has been using a small device to collect massive amounts of data, and that device is the Echo. Wirelessly connected and Bluetooth enabled, the Echo comes with a virtual assistant named Alexa that is willing to assist you in almost any way that a digital friend can – such as searching the web for the weather, the score of tonight’s game or whether there will be traffic on the way to work this morning, all at your verbal request.

But the Echo isn’t just performing your requests, it’s also recording them; all of its actions are recorded, and through this process, Amazon is able to collect massive amounts of data. The device is unique because the range of its possible actions is so vast that it can collect data on so many different things.


The attached article also notes that the Echo goes beyond the normal limits of speech recognition and will only continue improving. As competitors such as Google launch similar hardware, Amazon, as it always does, will need to keep improving. The article even outlines the possibility that the Echo will soon be able to listen to two voices speaking to it simultaneously. It will be extremely interesting to watch what new types of devices companies use to collect data.

Tuesday, May 16, 2017

Facebook and Google Fighting Fake News

Many would consider today’s era the age of information, which can be a dangerous double-edged sword. Those under pressure to produce news and information often compromise their integrity and show clearer biases, making it normal for articles to have less credibility and blurring what the truth really is. That being said, Facebook and Google are teaming up to fight fake news, which makes sense as they are two of the largest sites that handle news traffic. According to the article below, Google will be releasing a fact-check feature that will allow it to spot-check news immediately, which will be a valuable tool. Facebook will not directly interfere in ways such as taking down posts, but will likely call attention to posts that seem questionable.


Although nobody is in favor of fake news, it’s worth keeping in mind that fake news still falls under freedom of speech, which makes it legal. This may be a major reason that these platforms are not taking down the posts they deem fake, but rather informing the viewer that the information in front of them is not reliable. As this continues, it will be interesting to follow the trends, possible legislation and other companies stepping up to confront the issue.

http://bgr.com/2017/04/09/facebook-vs-google-anti-fake-news/

Big Data Companies Going Public? Do They Succeed?

Some companies are known for their use of big data. In 2017, innovation is at the forefront, and those that are willing to be creative and do something new are being handsomely rewarded. That being said, with business models that are often harder to understand than traditional ones, how do these businesses fare when reaching the public markets? The recent IPO of Snapchat has been a hot topic in the past month and hasn’t led Wall Street to a consensus on how well the company will do. There are great examples both of big data companies that excel and grow after going public and of ones that become long-term underachievers.


It’s easy to compare Snapchat to Facebook, as both were led by highly educated youngsters who passionately pursued an idea born in college, but there’s nothing to tell us that Snapchat will replicate Facebook’s success. Twitter is another example of a big data social media company that gained traction and usage but has never matched the financial growth of Facebook, while a big data company like Amazon has become one of the biggest growth companies of all time. Though it may not be fair to compare all of these companies since they are vastly different in many areas, they are all household names that have made the best of big data. If past IPOs of big data companies tell us anything, it’s that we don’t know what to expect. As companies like Uber, Airbnb and others brace to possibly go public, they may want to hold their breath as they go.

http://money.cnn.com/2017/03/06/investing/snap-snapchat-unicorns-ipos/

Nate Silver on Big Data in the 2016 Presidential Election

It seemed that the world was stunned waking up the morning after the 2016 presidential election. After a drama-filled election season, statisticians had all but announced the race to be over in favor of Hillary Clinton. Just four years prior, Nate Silver had correctly predicted the outcome in all 50 states, allowing him to quite easily announce the winner, but four years later, his story was very different. Many tried to blame Silver, saying that his data, methods or beliefs were flawed, but among the least surprised was Nate Silver himself.

As the world of data grows so quickly, entering virtually every enterprise and business realm, it’s easy to get caught up in all of the stories that data can tell and all of the things it can predict, but that doesn’t mean it’s immune to flaws. Silver has established an empire with FiveThirtyEight, where he and his team use data to tell stories about politics, sports, current events and much more. Again, Silver is the first to address the limitations of such reporting.

I want to focus in particular on a recent interview with Nate Silver regarding the polls leading up to the election. He has been asked many times about how he, and so many others, were wrong about the outcome. His answers show a deep understanding of not only what data tells us, but what we need to keep in mind when using it. For example, he states that his model gave Donald Trump a 30% chance to win, which is far from an unlikely outcome. The favorite does not always win; rather, his forecast should have been read as a sign that the race would be competitive, which it was.
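To get a feel for how often a 30% outcome actually happens, here is a small simulation in Python. It is only an illustration, not Silver’s model: the `simulate_upsets` function, the number of trials and the fixed seed are my own assumptions. It replays an imaginary election many times and counts how often the underdog wins.

```python
import random


def simulate_upsets(win_probability, trials, seed=0):
    """Return the fraction of simulated elections won by the underdog."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(trials) if rng.random() < win_probability)
    return wins / trials


# Replay a race in which the underdog has a 30% chance, 100,000 times.
underdog_rate = simulate_upsets(win_probability=0.30, trials=100_000)
print(f"Underdog won {underdog_rate:.1%} of simulated elections")

# Exact arithmetic for comparison: the chance that a 70% favorite
# wins three independent races in a row is only about one in three.
favorite_sweeps_three = 0.7 ** 3
print(f"{favorite_sweeps_three:.3f}")
```

The underdog wins roughly three times out of every ten replays, which is about as common as a good baseball hitter getting a hit.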

In later questions, he also talked about stability (or the lack of it) in data. Throughout election season, projected vote percentages swung extremely often, particularly as certain news was released by the media. This variance shows how quickly these predictions can shift. A second thing that needs to be kept in mind about predicting elections is that you are predicting not only the way that people will vote but also the number of people that will vote. Obviously, as more variables are added (here there are two, but there are many, many more to be taken into account), it becomes more challenging to feel confident in any projection, because there are more ways it can err. Lastly, this projection attempted to predict human behavior, which holds an inherent problem: humans don’t always do the rational thing, they don’t always do what they say they will and they don’t always do what people think they will. These ideas and so, so many more are ones we need to keep in mind beyond just what the numbers say at their conclusion (here, the projected election winner).
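The point about stacking uncertainties can be sketched with another short Python simulation. Everything here is hypothetical: the `simulate_win_rate` function, the 52% poll share, the error sizes and the 60% baseline turnout are numbers I made up for illustration, not figures from the interview. The sketch compares how often the poll leader wins when only voter preference is uncertain versus when turnout is uncertain too.

```python
import random


def simulate_win_rate(poll_share, poll_error, turnout_error, trials=20_000, seed=1):
    """Estimate how often candidate A wins under the given uncertainties."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Uncertainty 1: the true share of A supporters differs from the poll.
        share_a = poll_share + rng.gauss(0, poll_error)
        # Uncertainty 2: each side's supporters turn out at an uncertain rate
        # (0.60 is an assumed baseline turnout, not a real figure).
        turnout_a = 0.60 + rng.gauss(0, turnout_error)
        turnout_b = 0.60 + rng.gauss(0, turnout_error)
        votes_a = share_a * turnout_a
        votes_b = (1 - share_a) * turnout_b
        if votes_a > votes_b:
            wins += 1
    return wins / trials


# Candidate A polls at 52% with a 2-point polling error.
only_polls = simulate_win_rate(0.52, 0.02, turnout_error=0.0)
polls_and_turnout = simulate_win_rate(0.52, 0.02, turnout_error=0.03)
print(f"Preference uncertain only:        {only_polls:.1%}")
print(f"Preference and turnout uncertain: {polls_and_turnout:.1%}")
```

Adding the second source of uncertainty pulls the leader’s chances closer to a coin flip, even though the poll number itself has not changed.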


Silver also talks about the problem that arises when data turns into reporting, as different ways of analyzing and interpreting data can lead to contradictory conclusions. It’s no secret that people take reports seriously, and when there is data backing them, their public credibility only becomes stronger. In logic, people are careful not to hunt for premises that support their conclusion, but rather to reach a conclusion that is supported by their premises, and this is no different; the story told from your data should be the story that stands out, not the one that you want to tell and can find evidence within your data to support. Silver also states that in these situations it is often better not to tell a story at all than to give a misrepresentation. Though he is more than qualified to provide his educated opinion, it isn’t always an opinion that is shared. In the age of fake news, integrity is not always at the forefront of people’s decision-making. Regardless, big data will be a huge part of politics moving forward, and it will be incredibly interesting to see the differences come the 2020 election.

http://data-informed.com/nate-silver-big-data-has-peaked-and-thats-a-good-thing/

Sunday, May 7, 2017

Big Data as a YouTuber

The days of careers needing to be dry and boring are nearly all gone. What started as a silly place to post videos quickly grew into a mecca of online traffic generation. Further, countless people have made careers on YouTube by showing their ability to gain viewership and developing it into a business, finding ways to enhance their videos and provide a full experience. YouTube also provides its users with a number of metrics, all uses of big data, so they can gain a deeper understanding of their channel.


Being a career YouTuber is by no means a 40-hour-workweek career and is definitely something on the entrepreneurial side – to put it simply, you get out of it as much as you put into it. The attached article mentions how YouTubers feel that they never have time off because of their rigorous filming and editing schedules. Of course, beyond the physical hours spent creating content, it can be draining on the creative end as well; constantly trying to come up with new ideas and ways to improve a channel can wear someone down. To some, having a platform that touches so many people, likely discussing content you are passionate about, is an incredible job to have, but that doesn’t make it an easy one. You will always have the data YouTube offers you, but finding new ways to interpret it and use it to execute new things will always prove to be a rewarding challenge.