Facebook has open-sourced one of Instagram’s secret tools for finding and fixing bugs in the app’s vast Python codebase.
Called ‘Pysa’, it is a security-focused static analysis tool that Facebook built following the success of Zoncolan, a tool that helps Facebook scan more than 100 million lines of Hack code for potential security issues.
Pysa, short for ‘Python Static Analyzer’, is built on top of ‘Pyre’, Facebook’s existing type checker for Python.
It has proven useful for analyzing data flows to uncover many security and privacy issues.
How data flows through a program’s code is very important. Since most modern security exploits take advantage of unfiltered or uncontrolled data flows, Pysa can be a versatile tool to find those invisible bugs.
According to Facebook in its post:
“Pysa helps us detect a wide range of issues. For example, we use it to check whether our Python code properly makes use of certain internal frameworks, which are designed to prevent access to, or disclosure of, user data based on technical privacy policies. Pysa also detects common web app security issues, like XSS and SQL injection.”
Pysa tracks flows of data through a program, after users define the ‘sources’ (places where important data originates) as well as the ‘sinks’ (places where the data from the source shouldn’t end up).
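To make the two terms concrete, here is a minimal sketch of a source and a sink in plain Python. The names get_search_term, run_query and search are hypothetical stand-ins invented for this illustration; they are not part of Pysa or of Instagram's code, and the "sink" only returns the SQL string instead of executing it.

```python
def get_search_term(request: dict) -> str:
    """Hypothetical SOURCE: returns data the user controls."""
    return request.get("q", "")


def run_query(sql: str) -> str:
    """Hypothetical SINK: a real version would send `sql` to a database.

    Here it only returns the string so the example stays self-contained.
    """
    return sql


def search(request: dict) -> str:
    term = get_search_term(request)  # user-controlled data enters here
    # The value is concatenated into SQL and reaches the sink unfiltered.
    # This source-to-sink flow is the kind of path a data-flow analyzer
    # is meant to flag as a possible SQL injection.
    return run_query("SELECT * FROM photos WHERE tag = '" + term + "'")
```

Calling search with a request such as {"q": "x' OR '1'='1"} shows the attacker-supplied text ending up inside the query string, which is exactly the flow from source to sink described above.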
Pysa then starts from the places where user-controlled data enters the application.
From there, the tool performs iterative rounds of analysis to build summaries that determine which functions return data from a source and which functions have parameters that eventually reach a sink.
It does this by scanning the code in its “static” form, that is, without running it.
As a static analyzer, Pysa works by looking for known patterns that may indicate a bug. If it finds that a source eventually connects to a sink, it reports the flow as an issue.
Visualizing this process creates a tree, with the issue at the apex and sources and sinks at the leaves.
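The summary-building idea can be sketched with a toy analysis. This is not Pysa's implementation: the Function record, the fixpoint helper and the example program are all hypothetical, and every function is assumed to take a single parameter. The sketch repeats rounds until the two summaries stop growing, then reports any call site that joins a function returning source data to a function whose parameter reaches a sink.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Toy description of one function's data flow (hypothetical model)."""
    is_source: bool = False   # return value is user-controlled
    is_sink: bool = False     # parameter must not receive source data
    returns_from: list = field(default_factory=list)     # callees whose result is returned
    param_passed_to: list = field(default_factory=list)  # callees that receive the parameter
    flows: list = field(default_factory=list)            # (producer, consumer) call pairs


def fixpoint(seed, edges):
    """Grow the seed set in repeated rounds until nothing more can be added."""
    summary = set(seed)
    changed = True
    while changed:
        changed = False
        for name, deps in edges.items():
            if name not in summary and any(d in summary for d in deps):
                summary.add(name)
                changed = True
    return summary


def find_issues(program):
    # Summary 1: functions whose return value may carry source data.
    returns_source = fixpoint(
        (n for n, f in program.items() if f.is_source),
        {n: f.returns_from for n, f in program.items()},
    )
    # Summary 2: functions whose parameter may eventually reach a sink.
    reaches_sink = fixpoint(
        (n for n, f in program.items() if f.is_sink),
        {n: f.param_passed_to for n, f in program.items()},
    )
    # An issue is any call site that connects the two summaries.
    return [
        (name, producer, consumer)
        for name, f in program.items()
        for producer, consumer in f.flows
        if producer in returns_source and consumer in reaches_sink
    ]


PROGRAM = {
    "request_param": Function(is_source=True),
    "execute_sql": Function(is_sink=True),
    "get_username": Function(returns_from=["request_param"]),
    "lookup_user": Function(param_passed_to=["execute_sql"]),
    "page_title": Function(),
    "profile_view": Function(flows=[("get_username", "lookup_user")]),
    "static_page": Function(flows=[("page_title", "lookup_user")]),
}

print(find_issues(PROGRAM))  # [('profile_view', 'get_username', 'lookup_user')]
```

Only profile_view is reported, because get_username indirectly returns source data and lookup_user indirectly reaches the sink. static_page passes data to the same function, but that data does not come from a source, so it is not flagged.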
Pysa was developed internally and has matured through constant refinement. Facebook said that the tool detected 44% of all security bugs in Instagram’s server-side Python code in the first half of 2020.
This concept isn’t new for Facebook.
“Like Zoncolan has done for Hack code, Pysa has helped us scale our application security efforts for Python, most notably the codebase that powers Instagram’s servers,” said Facebook.
Pysa’s main objective is to scan large codebases very quickly.
According to Facebook security engineer Graham Bleaney, Pysa’s ability to find security issues wouldn’t be that useful if it took days to scan Instagram’s entire codebase.
This is why Pysa was built for speed.
It can run through millions of lines of code in anywhere from 30 minutes to a few hours. This lets Pysa find bugs in near real time and lets development teams integrate the tool into their regular workflows without fearing that it might delay shipping their code or cause them to miss hard deadlines.
What’s more, Pysa is also extensible.
Instagram, which mostly runs on Python, was never developed as a cohesive unit from the get-go. Like most other platforms, its codebase was put together and improved as the social network grew. As a result, the codebase includes many Python frameworks and libraries for different Instagram components and features.
Because of this, Pysa was built on a plug-and-play model, meaning that the tool can be extended to cover new frameworks on the fly.
“Because we use open source Python server frameworks such as Django and Tornado for our own products, Pysa can start finding security issues in projects using these frameworks from the first run,” Bleaney said.
“Using Pysa for frameworks we don’t already have coverage for is generally as simple as adding a few lines of configuration to tell Pysa where data enters the server.”
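The idea of covering a new framework with “a few lines of configuration” can be pictured with the sketch below. It does not use Pysa's actual model or configuration syntax: the registry format, the add_framework helper and the myframework entry are all hypothetical, and only illustrate declaring where data enters the server.

```python
# Hypothetical registry: maps a fully qualified name to a taint role.
# This is NOT Pysa's real configuration format, only an illustration.
MODELS = {
    "django.http.request.HttpRequest.GET": "source:UserControlled",
    "tornado.web.RequestHandler.get_argument": "source:UserControlled",
}


def add_framework(models: dict, new_entries: dict) -> dict:
    """Return a new registry extended with entries for another framework."""
    merged = dict(models)  # copy so the original registry is left untouched
    merged.update(new_entries)
    return merged


# Covering a new (made-up) framework: declare where user data enters.
MY_FRAMEWORK = {
    "myframework.Request.params": "source:UserControlled",
}

EXTENDED = add_framework(MODELS, MY_FRAMEWORK)
```

The design point is that the analysis itself does not change; only the list of known entry points grows.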
While the tool is versatile, it does have some limitations.
For example, Facebook said that there is no way to build a perfect static analyzer.
“Pysa has limitations based on its choice to address security issues related to data flow, together with design decisions that trade off performance for precision and accuracy. Python, as a dynamic language, has unique features that underlie some of those design decisions.”
Pysa is also built to discover only security issues related to data flow, which means it cannot catch every security or privacy issue.
Pysa is also not a good tool for authorization checking.
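A small sketch shows why authorization is a poor fit for data-flow analysis. All names here (USERS, user_is_admin, delete_user, admin_operation) are hypothetical and chosen for illustration. The flow from the request parameters into delete_user is visible as data moving from a source to a sink, but the permission check is control flow: no data passes through it, so an analysis that follows data has no way to confirm that the check runs first.

```python
USERS = {"alice", "bob"}


def user_is_admin(session: dict) -> bool:
    """Hypothetical authorization check."""
    return session.get("role") == "admin"


def delete_user(name: str) -> None:
    """Hypothetical privileged operation."""
    USERS.discard(name)


def admin_operation(session: dict, params: dict) -> str:
    # The check guards the call below, but the user-supplied name never
    # flows through user_is_admin, so removing these two lines would not
    # change the source-to-sink data flow at all.
    if not user_is_admin(session):
        return "404"
    delete_user(params["user_to_delete"])  # request data reaches the operation
    return "ok"
```

With or without the check, the data path from the request to delete_user looks the same, which is why a missing authorization check cannot be expressed as a flow from a source to a sink.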
Facebook has formally open-sourced Pysa on GitHub, along with several definitions required to help it find security issues.