Getting to know the R programming language

The R programming language is an essential tool for data scientists. Its versatility at handling big data has given R a place in a variety of industries. Don't be intimidated by its academic origins: beginners can get the hang of it too, writes WorkingNation's Jaimie Stevens.

This article was inspired by my younger sister, Patience Stevens, who is learning the R programming language as part of getting her Ph.D. in Psychology at Carnegie Mellon University.

R first appeared in 1993. It is a language and environment for statistical computing and graphics.

Out of the top 20 programming languages, R has shown the most consistent upwards movement over time. In April 2018, R ranked 12th overall in popularity, according to the TIOBE index.

It is named partly after the first letter of the first names of its creators, Ross Ihaka and Robert Gentleman, and also partly as a play on the name of the programming language S, of which R is an implementation.

R and its libraries implement a wide variety of statistical and graphical techniques. It is effective for handling and storing data.

Its analytical capabilities include linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. It is also a foundational tool for machine learning.
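To make two of those capabilities concrete, here is a minimal sketch of a linear model and a classical statistical test. It assumes the mtcars dataset that ships with R; the choice of variables is purely illustrative and does not come from the article.

```r
# Fit a simple linear model on the built-in mtcars dataset:
# fuel economy (mpg) as a function of car weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope of the fitted line
coefs <- coef(fit)
print(coefs)

# A classical statistical test: Welch two-sample t-test comparing mpg
# between automatic (am == 0) and manual (am == 1) cars.
test <- t.test(mpg ~ am, data = mtcars)
print(test$p.value)
```

Both calls use the same formula notation (response ~ predictor), which is one reason statistical code in R tends to stay short.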

R is commonly referred to as the “lingua franca” (or common language) of statistics. As technology improves, and the information that data companies and research institutions collect grows more complex, R has become one of the most prominent languages for analyzing data.

R has stronger object-oriented programming facilities than most statistical computing languages, which means its code is organized around objects rather than actions, and around data rather than logic.

Data scientists commonly debate whether R is stronger than its main alternative, Python. (You can read more about Python here.)

Where is R used?

R’s versatility means that it is used for a variety of functions suiting the big data demands of companies.

Social media sites use it to analyze user behavior. Tech giants use it to evaluate online advertising and e-commerce. Weather services use it for weather forecasts. Travel websites use it to suggest the best hotels for their users. It is a fundamental tool for analytics-driven organizations, such as Google, Facebook, LinkedIn, and Twitter.

Facebook has said that it uses R for analyses ranging from correlations with News Feed numbers to Facebook friends, and that R lets its teams move quickly because there is no need to invest time developing new tools and writing code: the language and its packages enable its data scientists to just clean and explore the data.

On the academic end, my sister, for example, uses it to study reading and language comprehension in a developmental context.

Through the graphs and statistical tests that R facilitates, Patience can understand how children’s comprehension of passages about specific topics improves as they gain more knowledge about that topic, and model the development of comprehension abilities from linguistic and visual input through the simulation of fake language.

Because R's applications range from theoretical and computational statistics to the hard sciences, it is used in fields from chemistry, astronomy, and genomics to drug development, business, healthcare, marketing, medicine, manufacturing, sports, retail, and supply chain.

What are the advantages of R?

The greatest advantages of R for Patience are that it is easy to learn and tidy to use, particularly for data analysis.

By “tidy to use,” she means that it has built-in structures for saving data in a format that can be easily manipulated and plotted while allowing the code to be easy to interpret for someone reading the code for the first time.
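As a rough illustration of such a built-in structure, the sketch below uses a data frame, R's standard table type. The column names and values are invented for this example and are not taken from Patience's research.

```r
# Hypothetical data: reading scores for four children (all values invented).
scores <- data.frame(
  child = c("A", "B", "C", "D"),
  grade = c(1, 1, 2, 2),
  score = c(52, 61, 70, 78)
)

# Keep only the rows that match a condition
second_grade <- scores[scores$grade == 2, ]

# Summarize: mean score per grade
mean_by_grade <- aggregate(score ~ grade, data = scores, FUN = mean)
print(mean_by_grade)
```

Each column keeps its name throughout, so a first-time reader can usually tell what a line such as scores$grade == 2 is selecting.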

R also has strong static graphics and can produce publication-quality graphs, including mathematical symbols. One of data analysts' and data scientists' favorite parts of R is its data visualization capabilities.

R is a vector language, so a single operation can act on many values at once. It is also interpreted: unlike C or Java, R code runs directly without a separate compile step, which makes developing the code easier.
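A small sketch of what "vector language" means in practice follows. The temperature values are made up for illustration; the arithmetic and comparison operators are standard R.

```r
# Temperatures in Celsius (illustrative values)
celsius <- c(-5, 0, 12.5, 30)

# One expression converts every element; no explicit loop is needed
fahrenheit <- celsius * 9 / 5 + 32
print(fahrenheit)

# Comparisons are vectorized too, yielding a logical vector
above_freezing <- celsius > 0
print(sum(above_freezing))  # sum() counts the TRUE values
```

Because R is interpreted, these lines can be typed one at a time at the R prompt and each result inspected immediately.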

The best part of R, of course, is that it is an open-source project and supported by an active developer community.

This allows for its extensive user-created packages, which enable further reporting tools, graphs, statistical techniques, and import and export capabilities.

When someone creates a new predictive model or visualization, they can publish it as an open-source package that anyone can access and use. There are more than 400 R Meetup groups and 95,000+ members of LinkedIn’s R Group. It’s a happening language, with the support of leading statisticians.

What are the disadvantages of R?

The main problem Patience encounters with R is that it is pretty slow at doing loops. It has different forms of an “apply” function that users are encouraged to implement instead of loops wherever possible, which can be tedious.
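For readers who have not met the "apply" family, here is a minimal sketch comparing an explicit loop with sapply(), one member of that family. The input values are arbitrary, and this is only one of several ways to write such code.

```r
values <- c(1, 4, 9, 16)

# Explicit loop: preallocate the result, then fill it element by element
roots_loop <- numeric(length(values))
for (i in seq_along(values)) {
  roots_loop[i] <- sqrt(values[i])
}

# The same computation with sapply(), which applies a function to each element
roots_apply <- sapply(values, sqrt)

# For functions that are already vectorized, like sqrt(), neither is needed
roots_vec <- sqrt(values)

# All three approaches should agree
print(all.equal(roots_loop, roots_apply))
print(all.equal(roots_loop, roots_vec))
```

The apply form pays off mainly when the function being applied is more involved than a single built-in call.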

R's syntax differs from that of many other languages, so it’s harder to move into R from them. It reads less like English and is considered more difficult for beginners to wrap their heads around.

R is slow overall, which can be problematic when looking at large datasets. It also keeps the data it works on in physical memory, so large datasets can use up the available RAM and slow down computation.

What kind of job can I get with R?

The demand for people who know R is growing, and it doesn’t show any sign of plateauing, with companies like Oracle and Microsoft reaching out to R coders.

As R provides support functions for data science applications and statistical analysis, it is naturally an excellent option for data scientists and researchers. To truly master data science, you’ll also need to study fields such as probability, statistics, data visualization, data manipulation, and machine learning.

Though R was developed initially for academia, today it is increasingly used in other settings. R is used in fields as diverse as finance, genetics, medicine, real estate, advertising, and biology. This is possible in part because R is a Turing-complete language, meaning any computable task can, in principle, be programmed in R.

According to Glassdoor, the average salary of a data scientist is $120,931.

How do I get started with R?

My sister’s first recommendation for learning R was DataCamp’s free introduction to R tutorial and the follow-up course Intermediate R programming. These allow you to learn, for free, from home!

I also found these two alternatives online: edX’s Introduction to R Programming by Microsoft and the R Programming course by Johns Hopkins on Coursera.

If you benefit from reading, you could also check out Jared Lander’s R for Everyone, or R in Action by Robert Kabacoff. R-Bloggers is great for knowing what is happening in the R world.

There is some debate over whether R is a great first language for beginners, mainly because of its difficult syntax. It was developed by statisticians, so falling into that mind frame may be difficult for some.

Despite these limitations, R is a great option for coders pursuing data science and research. Though R pertains to specific career paths, its fundamental concepts can be beneficial to understand no matter what industry you pursue. It has a strong community, and it’s open source, so join up and take advantage of this learning opportunity.

This article is part of WorkingNation Associate Producer Jaimie Stevens’ “Starting Out in Tech” series where she shares her insight into becoming a computer programmer. Catch up on her previous articles here.
