New Study Finds and Fixes Flaws in Database-Backed Web Applications
Just beneath the surface of today’s internet, there’s a deep ocean of data. Many of the most popular websites for online discussion, maps, and shopping now sit on top of massive databases. Yet users still expect near instantaneous load times regardless of the complex queries running behind the scenes.
A new study from University of Chicago and University of Washington researchers provides assistance to web developers confronting these growing performance challenges. By detecting and analyzing hundreds of inefficiencies in how a common framework for web applications interacts with databases, the study found they could make sites run several times faster with just small changes to code.
Many developers today use Object Relational Mapping (ORM) frameworks such as Ruby on Rails, Django, or Hibernate to build web applications that run on databases. Instead of learning an additional language, such as SQL, to handle calls to the database where persistent data for the application is stored, developers can use the ORM framework within their favored language to translate those interactions. However, this abstraction can lead to costly mistakes.
“If you want to create a web application or you have an idea and you want to do it quickly, ORM is a kind of middleman that does this translation, that's why this is so popular,” said Shan Lu, associate professor at UChicago CS and co-author of the study. “The problem is, now you are no longer aware of all the operations that are going on, and these interactions with the database are usually the most expensive part.”
To look for misapplications of ORM that could slow performance, Lu’s graduate student Junwen Yang looked at a dozen of the most popular open source web applications using Ruby on Rails, the most popular ORM framework. The team, which also included Pranav Subramaniam of UChicago and Washington’s Cong Yan and Alvin Cheung, reviewed code and previously-reported bugs from Github for each project, and built synthetic databases for each application.
Their analysis examined more than 200 performance issues from past and current versions of the applications, which were then grouped into 9 “anti-patterns” of ORM misuse. Some of these stemmed from simple confusion over the appropriate API call, such as the choice between “any?” versus “exists?” when querying the database.
“The documentation describes them as doing the same thing, but the performance sometimes can be ten times different or more,” Lu said. “So it looks simple, but when you actually look at it, I understand why there are so many problems, because these options can be so deceiving for developers.”
“Sometimes this is due to abstraction,” said Cheung, assistant professor of computer science & engineering at Washington. “It's very hard for a library developer to anticipate all possible uses of their functions, so they might have used one version in their library implementation, only to find out later that programs that call their library function could run much faster if they implemented their function some other way.”
Other performances issues could be traced back to the design of the database, or even human decisions about the way content is displayed by the application, such as returning all results for a search instead of breaking results into pages. Even simple features that only require a few lines of code can create serious performance issues if the underlying database query is very complex.
Charitably, the project didn’t stop at identifying these ORM problems, but also provided fixes for the 64 new issues the research team discovered. On average, these fixes improved web page load time from over 4 seconds to less than a second; the most extreme upgrade loaded a problematic page 39 times faster than before. These fixes were submitted via Github to the web application teams, and in many cases, have already been implemented into the latest versions, Yang said.
The fixes also were merged into the research team’s Hyperloop project, an ongoing effort to discover and correct common ORM-related inefficiencies in both software and databases. Rails developers can now download a plug-in called PowerStation for their coding environment that highlights ORM misuse and offers suggestions on how to speed up application performance. Because of the popularity of Ruby on Rails in the web development community, the paper has already received widespread attention, including from the Morning Paper blog and Hacker News.
“These web applications are so popular, and to some extent reasonably well developed, so all of us were just amazed that it was so easy to find hundreds and hundreds of performance problems in them,” Lu said. “That was fascinating to me.”
Read more about the study and the Powerstation analyzer in a blog post from the University of Washington Database Group.