Wednesday, May 24, 2017

Skill-sets of Data Scientists/Analysts


Ten years ago, data science wasn’t a career you heard much about (if it even existed), but the job market follows the technology curve, and new titles seem to be born every day. A majority of those interested in data and market analytics are young, enthusiastic, experienced with technology and only a handful of years removed from college. As so many of us try to carve out the trajectory for a fulfilling and prosperous career, it’s natural to wonder what skillset is required to succeed as a data scientist or data analyst.

The article above from KDnuggets does a great job of outlining what it takes to build a career in the data industry. Personally, I really like how the article combines technical skills and personal attributes, because in reality excelling at anything takes a mix of both. I want to briefly touch on some of the areas the article outlines and explain why they are so important in the grand scheme of things.
The first point is education. Simple, but true: one needs training, and the formal training that everyone in the data industry has in common is a college degree; as the article states, most also have a master’s degree. College is a time when we learn about the adult world, and spending any less than four years learning about business, marketing, technology and how they all intertwine would be an injustice to the subject.

The second point, intellectual curiosity, is only touched on briefly, but I want to say a bit more about it. I believe this is something that will allow someone to be great in any field. Someone with intellectual curiosity will look for a career instead of a job, something they enjoy rather than a way to collect a paycheck. When you are curious about your field, your work will always be rewarding and you will strive to be better than satisfactory. Though it gets only a short mention in the article, it is something I feel isn’t spoken about enough in general.

Domain knowledge/business acumen is the third point, and it connects back to the first. As outlined above, the field of data brings together so many different things that it’s necessary to understand both the domain you work in and business as a whole. If you want to make data useful, you need to understand how businesses are going to want to use it and what they are going to get out of it, so you need to be able to easily put yourself in their shoes.

Communication skills are the next point, and I think this again shows how the field of data is a crossroads of so many areas. Computer nerds are often characterized as antisocial and unable to communicate effectively, but data analysts and scientists can’t be that way. The ability to communicate, whether in writing or out loud (which often means giving presentations), is extremely valuable.

I won’t go into as much detail on the remaining points, but they are still crucial to one’s success. For example, being able to map out your career and goals (Google Analytics isn’t the only place for goals) is crucial to long-term success. The technologies and companies that are prevalent in this industry shift quickly, so it’s important to recognize that the days of getting hired by one company and staying there until retirement may be over.


The remaining skills are more technical, such as knowing how to code, understanding machine learning and data mining, understanding processing platforms, SQL and unstructured data. These are learned skills that only education and practice can give someone. To be frank, these are not fields that I am as well versed in, so feel free to read through the link and let me know what you think!

Spotify's Usage of Big Data

Spotify is another big-name company that has benefited hugely from the use of data. According to the attached article, Spotify users generate over half a terabyte of data a day, which requires four data centers around the globe to house all of the information. Luigi, their Python framework, guides them through the majority of that data and helps create a valuable user experience. Spotify offers features such as recommending songs based on a user’s listening history and organizing the user’s preferences into playlists for them to enjoy (many of the same features that companies like Netflix use, just translated into the world of music rather than video).


Although Spotify is not the only music streaming service, it has helped revolutionize an industry that relied on CDs and tape players only about a decade ago. By categorizing musicians, Spotify is able to predict the types of music and the specific artists a listener will want to hear. This has translated into over six million paid Spotify users and massive amounts of ad money (users who do not pay for the service periodically hear ads between songs). This is just another example of how data helped shape an industry as it went digital.

https://datafloq.com/read/big-data-enabled-spotify-change-music-industry/391

Amazon Echo Collecting Big Data

https://www.technologyreview.com/s/603380/alexa-gives-amazon-a-powerful-data-advantage/

Amazon is known for many things, but small isn’t usually a word used to describe them. Recently, Amazon has been using a small device to collect massive amounts of data, and that device is the Echo. Wirelessly connected and Bluetooth enabled, the Echo houses a virtual assistant named Alexa that is willing to help in almost any way a digital friend can, such as searching the web for the weather, the score of tonight’s game or whether there will be traffic on the way to work this morning, all at your verbal request.

But the Echo isn’t just performing your requests; it’s also recording them. All of its actions are recorded, and through this process Amazon is able to collect massive amounts of data. The device is unique because its range of possible actions is so vast that it can collect data on so many different things.


The attached article also explains how the Echo pushes past the usual limits of speech recognition, which will only continue to improve. As competitors such as Google launch similar hardware, Amazon, as it always does, will need to keep improving. The article even outlines the possibility that the Echo will soon be able to listen to two voices speaking to it simultaneously. It will be extremely interesting to watch what new types of devices companies use to collect data.

Tuesday, May 16, 2017

Facebook and Google Fighting Fake News

Many would consider today’s era the age of information, which can be a dangerous double-edged sword. Those under pressure to produce news and information often jeopardize their integrity and show clearer biases, making it normal for articles to have less credibility and blurring what the truth really is. With that in mind, Facebook and Google are teaming up to fight fake news, which makes sense, as they are two of the largest sites that deal with news traffic. According to the article below, Google will be releasing a fact-check feature that will allow it to spot-check news immediately, which will be a valuable tool. Facebook will not directly interfere, for example by taking down posts, but will likely call attention to posts that seem questionable.


Although nobody is in favor of fake news, it’s worth keeping in mind that fake news is still protected as free speech, making it legal. This may be a major reason these platforms are not taking down the posts they deem fake, instead informing viewers that the information in front of them is not reliable. As this continues, it will be interesting to follow the trends, possible legislation and other companies stepping up to confront the issue.

http://bgr.com/2017/04/09/facebook-vs-google-anti-fake-news/

Big Data Companies Going Public? Do They Succeed?

Some companies are known for their use of big data. In 2017, innovation is at the forefront, and those willing to be creative and do something new are being handsomely rewarded. Still, with business models that are often harder to understand than traditional ones, how do these businesses fare when they reach the public markets? Snapchat’s recent IPO has been a hot topic over the past month and hasn’t led Wall Street to a consensus on how well the company will do. There are great examples of big data companies that excel and grow after going public, and of ones that become long-term underachievers.


It’s easy to compare Snapchat to Facebook, as both were led by highly educated youngsters who pursued a college innovation passionately, but there’s nothing to tell us that Snapchat will replicate Facebook’s success. Twitter is another big data social media company that gained traction and usage but never matched Facebook’s financial growth, while a big data company like Amazon has become one of the biggest growth companies of all time. Though it may not be fair to compare all of these companies, since they differ vastly, they are all household names that have made the most of big data. If past IPOs of big data companies tell us anything, it’s that we don’t know what to expect. As companies like Uber, Airbnb and others brace to possibly go public, they may want to hold their breath.

http://money.cnn.com/2017/03/06/investing/snap-snapchat-unicorns-ipos/

Nate Silver on Big Data in the 2016 Presidential Election

The world seemed stunned waking up the morning after the 2016 presidential election. In the wake of a drama-filled election season, many statisticians had prematurely declared the race over in favor of Hillary Clinton. Just four years earlier, Nate Silver had correctly predicted the outcome in all 50 states, allowing him to call the winner quite easily, but four years later his story was very different. Many tried to blame Silver, saying that his data, methods or beliefs were flawed, but among the least surprised was Nate Silver himself.

As the world of data grows so quickly, entering virtually every enterprise and business realm, it’s easy to get caught up in all the stories data can tell and all the things it can predict, but that doesn’t make data immune to flaws. Silver and his team have built an empire with FiveThirtyEight, using data to tell stories about politics, sports, current events and much more. Again, Silver is the first to address the limitations of such reporting.

I want to focus in particular on a recent interview with Nate Silver about the polls leading up to the election. He has been asked many times how he, and so many others, got the outcome wrong. His answers show a deep understanding not only of what data tells us, but of what we need to keep in mind when using it. For example, he notes that his model gave Donald Trump a 30% chance to win, which is far from an unlikely outcome. The favorite does not always win; rather, his forecast should have been read as a sign that the race would be competitive, which it was.
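A quick way to see why a 30% chance is not a long shot is to replay a race with those odds many times. The Python sketch below is not Silver's model; it simply assumes a hypothetical underdog with a fixed 0.30 win probability (the function name and seed are made up for illustration) and counts how often that underdog wins.

```python
import random


def simulate_upsets(p_underdog: float, trials: int, seed: int = 2016) -> float:
    """Replay a race `trials` times; return the fraction the underdog wins.

    Assumes each replay is independent and the underdog's win probability
    stays fixed at `p_underdog` (a deliberate simplification).
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    wins = sum(1 for _ in range(trials) if rng.random() < p_underdog)
    return wins / trials


if __name__ == "__main__":
    rate = simulate_upsets(0.30, 100_000)
    print(f"The underdog won {rate:.1%} of the simulated races")
```

The underdog ends up winning close to three races in every ten, which is why a 30% forecast reads as "competitive" rather than "safe."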

In later questions, he also talked about stability (or the lack of it) in data. Throughout the election season, projected vote percentages swung very often, particularly as certain news was released by the media. This variance shows how quickly these predictions can shift. A second thing to keep in mind about predicting elections is that you are predicting not only the way people will vote but also how many people will vote. Naturally, the more variables are added (here there are two, but many, many more must be taken into account), the harder it becomes to feel confident in any projection. Lastly, this projection attempted to predict human behavior, which holds an inherent problem: humans don’t always do the rational thing, they don’t always do what they say they will, and they don’t always do what people think they will. These ideas and so, so many more need to be kept in mind beyond just what the numbers say at their conclusion (here, the projected election winner).
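The point about turnout can be illustrated with a toy Monte Carlo simulation. Everything in this Python sketch is hypothetical (the two voter groups, their support levels, turnout rates and spreads are invented, not polling data): it forecasts a popular-vote margin once with uncertainty only in voter preferences, and once with uncertainty in turnout as well.

```python
import random
import statistics


def simulate_margins(trials: int, share_sd: float, turnout_sd: float,
                     seed: int = 0) -> list[float]:
    """Return candidate A's vote margin (A's share minus B's share) per trial.

    Hypothetical electorate: two equal-sized groups, group 1 leaning toward A
    and group 2 leaning toward B. Each trial draws A's support in each group
    (spread `share_sd`) and each group's turnout rate (spread `turnout_sd`).
    """
    rng = random.Random(seed)
    margins = []
    for _ in range(trials):
        share1 = rng.gauss(0.60, share_sd)      # A's share of group 1's votes
        share2 = rng.gauss(0.42, share_sd)      # A's share of group 2's votes
        turnout1 = rng.gauss(0.55, turnout_sd)  # fraction of group 1 that votes
        turnout2 = rng.gauss(0.60, turnout_sd)  # fraction of group 2 that votes
        total = turnout1 + turnout2             # groups are equal in size
        a_share = (turnout1 * share1 + turnout2 * share2) / total
        margins.append(2 * a_share - 1)         # A's share minus B's share
    return margins


if __name__ == "__main__":
    preference_only = simulate_margins(20_000, share_sd=0.02, turnout_sd=0.0)
    with_turnout = simulate_margins(20_000, share_sd=0.02, turnout_sd=0.08)
    print(f"spread, preference only: {statistics.stdev(preference_only):.4f}")
    print(f"spread, plus turnout:    {statistics.stdev(with_turnout):.4f}")
```

Because the two groups lean in opposite directions, a swing in who shows up moves the margin on its own, so the second run produces a wider spread of outcomes even with the same preference uncertainty.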


Silver also talks about the problem that arises when data turns into reporting, as different ways of analyzing and interpreting data can lead to contradictory conclusions. It’s no secret that people take reports seriously, and when there is data backing them, their public credibility only grows. In logic, people are careful not to hunt for premises to support a conclusion they have already reached; instead, the conclusion should follow from the premises. This is no different: the story told by your data should be the story that stands out, not the one you want to tell and can find evidence for within your data. Silver also states that in these situations it is often better not to tell a story at all than to give a misrepresentation. Though he is more than qualified to offer his educated opinion, it isn’t an opinion everyone shares. In the age of fake news, integrity is not always at the forefront of people’s decision-making. Regardless, big data will be a huge part of politics moving forward, and it will be incredibly interesting to see the differences come the 2020 election.

http://data-informed.com/nate-silver-big-data-has-peaked-and-thats-a-good-thing/

Sunday, May 7, 2017

Big Data as a YouTuber

The days of careers needing to be dry and boring are nearly gone. YouTube, which started as a silly place to post videos, quickly grew into a mecca of online traffic. Countless people have made careers on YouTube by showing their ability to gain viewership, developing it into a business and finding ways to enhance their videos and provide a full experience. YouTube also provides its users with a number of metrics, all uses of Big Data, to help them gain a deeper understanding of their channels.


Being a career YouTuber is by no means a 40-hour-workweek job and is definitely on the entrepreneurial side; to put it simply, you get out of it what you put into it. The attached article mentions how YouTubers feel they never have time off because of their rigorous filming and editing schedules. Of course, beyond the physical hours spent creating content, it can be draining on the creative end as well: constantly trying to come up with new ideas and ways to improve a channel can wear someone down. To some, having a platform that reaches so many people, likely discussing content you are passionate about, is an incredible job to have, but that doesn’t make it an easy one. You will always have the data YouTube offers you, but finding new ways to interpret it and use it to execute new things will always prove to be a rewarding challenge.

Big Data in Chess: CAPS Score to Evaluate Players

I have previously written a post about big data and the relative value of pieces in the game of chess. Chess is an art and a science that computers have been involved in for decades, but people continue to question their objective capabilities, as some types of game situations still give them trouble. Of course, these are all things within the game, but smart people in high places have been able to take chess data a step further and use it to evaluate a player’s strength.


This past year, Chess.com, the world’s biggest chess website, hosted The Grandmaster Blitz Battles, pitting some of the most talented Grandmasters against each other in fast games and giving fans something to tune in for. The site posted this article before the final match, in which Carlsen and Nakamura, two of the fan favorites, would duke it out for the title. The article uses CAPS, or Computer Aggregated Precision Score, to score each player, which is completely different from their regular player rating. A regular rating only takes into account the outcomes of a player’s games and the ratings involved: win, loss or draw, and whether the opponent was rated higher or lower (you can probably tell this isn’t the deepest way to look into something). Though CAPS is not an official rating, it is used to analyze a player’s performance move by move. Finally, the data on these players is used to draw conclusions and make predictions prior to their epic match. Read through the article and let me know what you think!
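For comparison, here is a minimal Python sketch of a classic Elo-style update, the kind of outcome-only rating described above. The K-factor of 20 is an assumed value, and chess sites often run variants of this system, so treat it as an illustration rather than Chess.com's actual formula.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score (1 = win, 0.5 = draw, 0 = loss) against `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))


def update_rating(rating: float, opponent: float, score: float,
                  k: float = 20) -> float:
    """New rating after one game; only the result and the two ratings matter."""
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # Beating an equal opponent vs. beating a 200-point-higher opponent.
    print(update_rating(1500, 1500, 1.0))  # 1510.0
    print(round(update_rating(1400, 1600, 1.0), 1))
```

Nothing about how the game was played enters the formula: a win counts the same whether it came from flawless play or an opponent's blunder, which is the gap a move-by-move measure like CAPS tries to fill.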

Big Data in Sports: The NFL Draft

A few weeks ago, I wrote a short post about how sports use big data and wanted to follow up with a longer article. Sports are unique because they combine two dominant realms for data: games and business, which can also be thought of as entertainment and money. The game side touches on everything that happens off the field, such as personnel decisions, the way game plans are created and the way practices are structured, but also, obviously, the way things are handled within games. The business side often leads to a better understanding of a team’s fanbase, allowing teams to get to know their followers and provide a better experience for them (and make more money at the end of the day). This is all a topic I plan to discuss further in other articles, but here I want to focus on a recent event, the NFL Draft, and how Big Data is used in decision-making for it.

To provide basic context, each year the NFL holds a draft in which all eligible college football players can be selected by one of the 32 teams, in hopes of making their pro football dreams come true (most professional sports do this in some form). The event becomes highly competitive, as the worst teams get the earliest picks and, with them, the most highly touted prospects. Teams have a chance to fill their positional needs but also take into account other things, such as their opponents’ moves. Lastly, teams have the option to trade their picks, which puts a twist on the value of each selection. For this reason, teams weigh all their options, starting with the first overall pick, while being careful not to tip their hand to their competitors.

The basic idea of data-backed decision-making generally boils down to simplifying many things into numbers and working from there. The NFL draft is no exception: for years, the value of picks has been studied, and charts such as this one have been created to define the relative value of each pick. As you can see, the value declines quickly as the picks progress, especially at the top of the first round, so deciding to trade up becomes a huge risk because you need to give up significantly more to move up only a few spots. The risk is even larger than just what you give up, because nothing is promised with draft selections. Scouts get paid good money to review pro prospects and evaluate how successful they will be based on countless factors (size, speed, game IQ, in-game stats, etc.), but selections very frequently don’t turn out as expected. That being said, is the idea of the pick worth more than the pick itself? Is there a way to make player selection a sounder process? It shouldn’t be a surprise that this is where Big Data comes in.
Just about a year ago, an article gained a lot of traction after the Minnesota Vikings hired a strategist for a front-office position despite his lack of football background (available here). It showed that succeeding in today’s sports world takes more brains than brawn. He used countless variables and statistics to evaluate each available player and was able to draft an extremely successful class that year. By breaking football down into numbers in ways others weren’t always able to, he found success.
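The trade-up math can be sketched in a few lines of Python. The value curve here is hypothetical: a simple exponential decay from an assumed top value of 3000 points with an assumed decay rate of 0.9 per pick, chosen only to mimic the steep early drop described above, not the numbers from any real chart.

```python
def pick_value(pick: int, top_value: float = 3000.0, decay: float = 0.9) -> float:
    """Hypothetical value of draft pick `pick` (1 = first overall)."""
    return top_value * decay ** (pick - 1)


def cost_to_move_up(from_pick: int, to_pick: int) -> float:
    """Extra value a team would have to add to swap `from_pick` for `to_pick`."""
    return pick_value(to_pick) - pick_value(from_pick)


if __name__ == "__main__":
    # Moving up three spots near the top vs. three spots late in round one.
    print(f"5 -> 2:   {cost_to_move_up(5, 2):.0f} points")
    print(f"25 -> 22: {cost_to_move_up(25, 22):.0f} points")
```

Under any steadily decaying curve like this one, the same three-spot jump costs far more near the top of the draft, which is why trading up early is such a large bet.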


Fast-forward to the very recent 2017 NFL draft. Every year questionable decisions are made in the draft, and this year the Chicago Bears traded up to the number two selection and selected an unproven quarterback. Though this is a position of need for them, many, including pro scouts and their own fanbase, were not thrilled with the decision. Only time will tell if this was a bad decision, but without a doubt it was a risky one. It did not seem like a decision soundly backed by data, and it makes us wonder whether some teams resist data-informed trends in favor of the eye test.

Wednesday, May 3, 2017

Big Data in Healthcare

Healthcare will always be a pressing issue in our political world. Whether it’s figuring out the best way to pay for it or regulating drug costs set by privately owned companies, there are many things to take into consideration. The McKinsey article linked below breaks down how big data has already started to revolutionize the healthcare industry. Healthcare expenses currently account for roughly 17% of our country’s GDP, which is a staggering figure.
This all leads into the breakdown of how doctors make decisions with patients, as far as when to prescribe medicine and what to prescribe. When individual data is collected and compared to large amounts of clinical data, doctors are able to swiftly make educated decisions to help a patient.

As previously mentioned, there are a number of different factors to take into account when investigating the healthcare field, and the article outlines several of them: right living, right care, right provider, right value and right innovation. So much information is presented to us in all of these areas, but it’s extremely hard to make a good, informed decision on our own. Despite the capabilities of big data, I believe there will always be human challenges associated with the issue, no matter how much data we are able to collect.

http://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-big-data-revolution-in-us-health-care

Big Data in Education


The days of school teachers recording grades with pen and paper are long over. The article referenced above outlines how the Department of Education has invested over 200 million dollars in applying big data to our educational systems. The applications range from testing and grades to student behavior to career development.
The educational system, and the investment we make in it, will always be a hot-button political issue in our country. At the end of the day, if we are able to enhance student results and intelligence with big data, it will create a smarter society overall. The investment it takes to create things like effective software and algorithms is nothing compared to the profound effect they can have on the future of our country.

Further, the article talks about how the data can create a customized learning experience for each student. Although schools will never be able to afford a personal teacher for each student, these programs can provide a similar service, since they can cater to each child’s learning needs almost instantly. This can go further, reducing dropout rates while increasing test scores. Big data is the wave of the future, so there’s no reason our country should not be making a tremendous investment in applying it to our education systems.

Big Data in Customer Service

https://www.fastcodesign.com/1669551/how-companies-like-amazon-use-big-data-to-make-you-love-them

Most of the successful tech companies today leverage big data to control every aspect of the user experience. Here, I want to focus on everyone’s favorite online retailer, which can deliver home goodies to your doorstep with just the click of a button. Amazon has had one of the most successful stocks of the past decade, and it’s no surprise given how they have built an empire that will be nearly impossible to dethrone. The article at the top of the page describes one user’s experience with Amazon’s customer service, which was incredibly smooth and quick. We have all had a customer service experience that kept us on hold for an hour, only to be shuffled from department to department. Instead, Amazon used aggregated personal data on the writer’s account to assist him properly and quickly. This is an excellent example of how forward-thinking companies set their own standard for how things should be done in an industry rather than striving for a merely satisfactory job. Amazon is well aware that most consumers are conditioned to expect terrible things from a customer service department, so providing them with a great experience can secure future revenue in itself.
The article also outlines how big data is one of the most talked-about topics, alongside the growing capabilities for relationships between businesses and consumers, and there is an obvious link between these two subjects. As a company that sets trends rather than following them, Amazon is finding new ways to use big data to do new things from a business standpoint. Collecting so much data is a challenge, and organizing it so that a small department like customer service can use it to make a call quick and easy takes true dedication. It’s no secret that so many customer service departments perform poorly because they are marginalized within their organizations. Even as Amazon has grown tremendously, into a company valued at nearly half a trillion dollars, it has stayed fundamentally sound, as demonstrated here.
In addition, the article outlines three of the writer’s rules for using consumers’ personal data effectively: give employees the right tools to use it, let the customer know that you know and then listen to them, and give the customer a sense of control. Though these may seem like simple pointers, they are three keys to making a smooth transition from amassing large amounts of data to using that data to improve your business. The first step is the most straightforward: data is so frequently collected from every direction and then never meaningfully used, which makes it virtually pointless. This step connects the dots and allows your customer service representatives to deliver a quality experience for the customer.
Next, letting the customer know that you know and then listening to them can help in a handful of ways. It combines efficiency, by showing the customer that you know what the issue is and can deal with it (which can save them the time of telling the full story), with the human element of then listening to them. Even if the consumer knows you can help, venting is still a very real part of customer service.

Lastly, giving the customer a sense of control goes further with the human element. At this point, it is likely established that a solution is very possible, but the more control a consumer feels, the more positive their experience will be. Overall these three steps mix business effectiveness with connecting on a human level to show how businesses like Amazon are so successful in areas such as customer service.