by Beyang Liu

Sourcegraph: Universal code search and intelligence

Jan 15, 2020

How open source Sourcegraph helps developers increase programming productivity and improve code quality


The amount of code in the world is exploding. As software becomes the fundamental driver of innovation in nearly every industry, software developers find themselves dealing with larger, more interdependent codebases. Most organizations set new records for the size of their codebase every day.

In this world, traditional developer tools such as editors and IDEs fall short. They were designed for individual developers working on individual pieces of code, rather than for software teams developing large codebases at scale. In modern software organizations, searching across massive codebases, comprehending unfamiliar code, and sharing institutional knowledge become first-order concerns. Software teams need a tool that enables this universal code intelligence.

Code search must be universal to be effective—it must encompass all languages, all repositories, all code hosts, and all configuration files. Search that is limited to only Python or only GitHub is like Google indexing only websites built with Ruby on Rails or Apache HTTP Server—a nonstarter for development teams that work in the modern universe of code.

Leading technology companies such as Uber, Lyft, and Yelp are using Sourcegraph to wrangle this universe of code. Companies like Google and Facebook have spent hundreds of millions of dollars to build internal tools similar to Sourcegraph. GitLab, the code hosting and devops company, recently announced a partnership with Sourcegraph to natively integrate some of Sourcegraph’s features into GitLab’s UI.

Top reasons to use Sourcegraph

Sourcegraph is a developer platform designed to tackle the problems faced by modern software teams. Sourcegraph addresses critical pain points felt by software engineers and engineering leaders.

For individual developers, here are the top reasons to use Sourcegraph:

  1. Stay in flow and avoid death by a thousand context switches
  2. Find the needle in the codebase haystack
  3. Make code reviews fast, thorough, and less painful—no more TL;DR
  4. Learn by example instead of relying on poor or nonexistent documentation
  5. Make big refactors and code changes tractable
  6. Share and discuss code easily, especially with remote colleagues
  7. It’s open source

And here are the most common reasons engineering leaders introduce Sourcegraph to their organization:

  1. Boost the day-to-day productivity of the team
  2. Encourage knowledge sharing
  3. Drive organization-wide adoption of new tools
  4. Accelerate the onboarding of new engineers
  5. Reduce incident response time
  6. Maintain and spread code quality standards
  7. Build better internal developer tools with the code-as-data API
  8. It’s easy to deploy and scales with your team and codebase

Stay in flow

Programming productivity often dies a death by a thousand context switches. A familiar scenario is one where a developer is in the middle of implementing a feature or bug fix, but suddenly needs to jump into a different part of the codebase. Perhaps they need to look up a certain library function or figure out how to use it. Perhaps a colleague has a question about some other piece of code. Now, the developer has to open up those files in their IDE and, in doing so, destroy their current working state, which will have to be painfully recalled and reconstructed later.

These interruptions are destructive because they take the developer out of a state of flow, which significantly hurts productivity. Sourcegraph’s browser-based code search and exploration interface lets a developer maintain their editor state while exploring other parts of the code. This preservation of working state makes context switches far less costly, letting individual developers get more done with less aggravation.

Find the needles in the haystack

A common task in day-to-day software engineering is looking up a specific string or pattern in code. This could be an error message that’s showing up in production logs, an anti-pattern that should be removed, or simply some unique string that the developer associates with a particular point of interest in the source code.

Finding these needles is often painful. IDEs have search capabilities, but the code in question may exist outside what the IDE has opened. Command-line tools can’t reach code outside the local filesystem and can be cumbersome to use. Code hosts search only the code they host, and that search is often slow or low-quality. Code search must be universal to be effective.

With Sourcegraph, developers have code search that spans their entire universe of code, with full support for regular expressions and more advanced pattern matching like the Comby syntax. Sourcegraph’s search engine is optimized for source code, so it is incredibly fast. It was also designed from the ground up to scale to large codebases and organizations. Some organizations have hundreds of thousands of repositories, and Sourcegraph puts them all at every developer’s fingertips.

An expressive and powerful search syntax lets the user filter results by file, language, repository, and myriad other attributes. Sourcegraph is also aware of code semantics and allows searching directly for symbols.
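
To make filtered search concrete, here is a toy sketch in Python. It is not how Sourcegraph’s engine works: it is a brute-force scan that would not scale to hundreds of thousands of repositories. It simply runs a regular expression over a small in-memory corpus and applies repository, file, and language filters, loosely mirroring the repo:, file:, and lang: filters in Sourcegraph queries. The corpus, the extension-to-language map, and the search() function are all hypothetical.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class Match:
    repo: str
    path: str
    line_no: int
    line: str


# Hypothetical "universe of code": repository name -> {file path -> file contents}.
CORPUS = {
    "github.com/acme/api": {
        "server/handler.go": 'func Handle() {\n\tlog.Error("connection reset by peer")\n}\n',
        "README.md": "# API\n",
    },
    "gitlab.example.com/acme/web": {
        "src/client.ts": 'throw new Error("connection reset by peer");\n',
    },
}

# Hypothetical mapping from file extension to language, used by the lang filter.
LANG_BY_EXT = {".go": "go", ".ts": "typescript", ".py": "python", ".md": "markdown"}


def search(pattern, repo=None, file=None, lang=None, corpus=CORPUS):
    """Brute-force regex search over every file in every repo, with optional filters.

    `repo` and `file` are regexes matched against repository names and file paths;
    `lang` is a language name looked up via the file extension."""
    rx = re.compile(pattern)
    repo_rx = re.compile(repo) if repo else None
    file_rx = re.compile(file) if file else None
    hits = []
    for repo_name, files in sorted(corpus.items()):
        if repo_rx and not repo_rx.search(repo_name):
            continue
        for path, text in sorted(files.items()):
            if file_rx and not file_rx.search(path):
                continue
            if lang and LANG_BY_EXT.get(os.path.splitext(path)[1]) != lang:
                continue
            for line_no, line in enumerate(text.splitlines(), start=1):
                if rx.search(line):
                    hits.append(Match(repo_name, path, line_no, line.strip()))
    return hits
```

For example, search(r"connection reset", lang="go") returns only the Go match on line 2 of server/handler.go, while the same pattern with repo=r"gitlab" returns only the TypeScript file in the other code host.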

Learn by example

“How do I use this?” is a question developers ask dozens of times per day. More often than not, the best documentation is a usage example. Sourcegraph’s global find-references feature lets a developer look up usage examples across the universe of code, even if the ideal usage example exists in another repository. This is especially helpful in codebases that are old, unfamiliar, or poorly documented.
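
Sourcegraph’s find-references relies on its code intelligence. As a much cruder illustration of the idea, the Python sketch below uses the standard ast module to list every call site of a function name across files from several repositories. It matches by name only and does not resolve imports or types, so it can return false positives that a precise implementation would not. The repositories, files, and the netlib.retry function are hypothetical.

```python
import ast

# Hypothetical sources from three repositories: (repo, path) -> Python source text.
SOURCES = {
    ("acme/netlib", "netlib/__init__.py"): (
        "def retry(fn, attempts=1, backoff=1.0):\n"
        "    return fn()\n"
    ),
    ("acme/billing", "charge.py"): (
        "from netlib import retry\n"
        "\n"
        "retry(charge, attempts=3)\n"
    ),
    ("acme/search", "indexer.py"): (
        "import netlib\n"
        "\n"
        "def run():\n"
        "    netlib.retry(index, attempts=5, backoff=2.0)\n"
    ),
}


def find_references(sources, func_name):
    """Return sorted (repo, path, line) tuples for every call to `func_name`.

    Syntactic approximation: matches bare calls like retry(...) and attribute
    calls like netlib.retry(...) by name only, without resolving imports."""
    refs = []
    for (repo, path), text in sources.items():
        for node in ast.walk(ast.parse(text, filename=path)):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = None
            if name == func_name:
                refs.append((repo, path, node.lineno))
    return sorted(refs)
```

Here, find_references(SOURCES, "retry") returns the call in acme/billing and the attribute call in acme/search, but not the definition in acme/netlib. Each hit doubles as a usage example: one caller passes attempts=3, the other also sets a custom backoff.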

Make code reviews fast and thorough

A common quip about code review says that if you submit a 10-line changeset, you’ll get 10 comments, but if you submit a thousand-line changeset, you’ll get no comments—and an automatic approval.

Quality code reviews are often painful and slow because traditional tools lack essential features that help the reviewer quickly understand code changes. Sourcegraph adds IDE-like code navigation and tooltips to developers’ existing code review workflow.

Sourcegraph hover tooltips let the reviewer quickly peek at function definitions and documentation without having to pull down the changeset into a local IDE. Without leaving the code review interface, Sourcegraph lets you jump to a definition to more fully understand how a referenced piece of code works.

Sourcegraph integrates these code navigation features directly into the UI of popular code review tools like GitHub Pull Requests, GitLab Merge Requests, and Phabricator, so the developer experience improves without any switching cost.

Better code reviews reduce bugs, uphold code quality standards, and increase the spread of institutional knowledge across the engineering organization.

Make big refactors tractable

As codebases grow, large-scale refactors become an unavoidable bottleneck to improving code quality and implementing new features. For example, the API of a shared library may need to be updated to support a new feature, but doing so may require updates to dozens or even hundreds of downstream dependents. The number of places in code that must change as a result of updating one shared dependency can easily balloon to thousands of points spread across different components owned by different teams.

Sourcegraph not only aids developers in understanding the impact of a refactor (by letting them search and discover all places a particular library function is used), it also provides an apparatus to execute the refactor and manage the campaign of changesets and code reviews. Sourcegraph Campaigns is the first tool of its kind accessible to all software enterprises. Like Sourcegraph code search, Campaigns supports the new Comby pattern matching syntax, which is more user-friendly and expressive than regular expressions.
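
Why is structural matching more expressive than regular expressions for refactors? Python’s re module, like classical regular expressions, cannot match arbitrarily nested parentheses, so a pattern meant to capture a call’s arguments stops at the first closing parenthesis. Comby patterns such as retry(:[args]) use holes that respect balanced delimiters. The Python sketch below imitates just that one property with a hypothetical helper, match_call_args(); it is not Comby and ignores strings and comments.

```python
import re


def match_call_args(src, func_name):
    """Return the argument text of every `func_name(...)` call in `src`.

    Hypothetical helper imitating one property of a Comby hole such as
    retry(:[args]): the captured text extends to the *matching* closing
    parenthesis. Unlike Comby, it does not understand strings or comments."""
    results = []
    # \b keeps do_retry( from matching when func_name is "retry".
    for m in re.finditer(r"\b" + re.escape(func_name) + r"\(", src):
        open_idx = m.end() - 1  # index of the "(" that opens the argument list
        depth = 0
        for k in range(open_idx, len(src)):
            if src[k] == "(":
                depth += 1
            elif src[k] == ")":
                depth -= 1
                if depth == 0:  # found the parenthesis that balances open_idx
                    results.append(src[open_idx + 1:k])
                    break
    return results


code = "x = retry(fetch(url), attempts=3)\ny = do_retry(z)\n"

# A lazy regex stops at the first ")" and truncates the arguments.
naive = re.findall(r"retry\((.*?)\)", code)
balanced = match_call_args(code, "retry")
```

On the line x = retry(fetch(url), attempts=3), the lazy regex captures only fetch(url, while match_call_args() returns the full argument list and, thanks to the word-boundary check, skips do_retry(z). That difference is what makes structural patterns safer when rewriting thousands of call sites.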

Encourage knowledge sharing across your organization

Modern software teams collaborate to share institutional knowledge of code. But discussing code is often hard for these reasons:

  • You can’t share hyperlinks to files you have open in your IDE
  • Traditional browser code viewing tools don’t have good code navigation

Sourcegraph offers the best of both worlds: precise and accurate code navigation in a web interface. Developers can easily share links, and recipients can immediately start exploring and understanding the linked code, without the hassle and friction of pulling it up in a local IDE.

Code link sharing becomes even more important for remote engineering teams. Sourcegraph links are shared hundreds of times per day over chat, on issue trackers, and in official documentation and wikis. These links become essential conduits of knowledge, especially when it is impossible to call a colleague over to one’s desk.

It’s open source

Sourcegraph is open source software. The issue tracker is public and the team is very responsive to bug reports and feature requests. Modern software developers should favor open tools for the same reasons they favor open source libraries: foundational knowledge upon which your software and team are built should be open to all, so that all may understand how it works and all may help improve it.

Boost the overall productivity of your team

How does a software project get to be a year behind? One day at a time. Sourcegraph helps your team stay ahead of deadlines by streamlining day-to-day tasks. It lets developers minimize the impact of context switches, stay in flow, do faster code reviews, and find the answer to questions like “How do I use this?” that are asked dozens of times each day. These efficiency boosts quickly add up. 

Drive organization-wide adoption of new tools

Most Sourcegraph users use it multiple times per day, but many developer tools are used much less frequently. It can be a challenge for CIOs and Directors of Developer Productivity to drive adoption of new tools.

Observability and performance monitors, distributed application tracers, and code coverage analyzers are all tools that may not be easily discoverable by, or accessible to, every member of your team.

Sourcegraph’s extension API lets third-party tools add annotations into the Sourcegraph web UI and the UI of code hosts such as GitHub and GitLab. Because developers already spend much of their day in these interfaces, surfacing other tools there makes those tools far easier to discover and use. Extensions exist for popular off-the-shelf tools like Codecov, Datadog, and Sentry, and internal developer tools teams can create private extensions for in-house tools as well.

Accelerate the onboarding of new engineers

It can be a struggle to onboard new engineers, especially if the engineering organization or codebase is large. Sourcegraph reduces the time between start date and first commit by enabling faster comprehension of existing code. New hires often spend the majority of their time jumping around unfamiliar parts of the codebase to build a mental model of the organization’s code. Sourcegraph’s universal code navigation lets them explore the entire codebase with minimal context-switching, and the ability to share links lets them ask specific questions that don’t waste the time of senior engineers.

Reduce incident response time

Every minute counts when responding to a production incident. Sourcegraph code search cuts down on the time it takes to root-cause an issue by making it easy to locate error messages in the source code. Oftentimes, the error message originates from an upstream dependency and is therefore hard to find using an IDE or command-line search tool. Sourcegraph indexes all of the code relevant to your organization and makes error messages instantly findable.

The Sourcegraph extension API also enables integration of devops tools into Sourcegraph. For example, the Sentry extension displays the number of production alerts a particular line of instrumentation code is generating. This provides valuable contextual knowledge when debugging incidents.
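
Line-level annotations like these come down to joining per-line data with the file being viewed. As a hypothetical sketch (not the Sentry extension’s code, and not the Sourcegraph extension API), the Python function below counts production alerts per line of one file, assuming each alert record carries the file path and line number of the code that raised it.

```python
from collections import Counter


def line_annotations(alerts, path):
    """Map each line of `path` that produced alerts to a short annotation string.

    `alerts` is a hypothetical list of dicts with "file" and "line" keys; real
    error-tracking payloads are richer and differ by vendor."""
    counts = Counter(a["line"] for a in alerts if a["file"] == path)
    # Sort by line number so annotations render top to bottom.
    return {
        line: f"{n} production alert{'s' if n != 1 else ''}"
        for line, n in sorted(counts.items())
    }
```

Given two alerts from line 42 of db/conn.go and one from line 7, line_annotations(alerts, "db/conn.go") yields {7: "1 production alert", 42: "2 production alerts"}; an extension would then render each string next to its line.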

Maintain and spread code quality standards

Sourcegraph enables organizations to maintain and spread code quality standards through a few vectors:

  • Efficient but thorough code review, with Sourcegraph code navigation and tooltips, prevents poor-quality code from being merged.
  • Automated code quality checkers (e.g., Codecov) can be integrated into code review through the Sourcegraph extension API. Sourcegraph adds these annotations to the existing code review tool.
  • Code link sharing and code navigation in the browser enable developers to reference examples of patterns to be emulated and anti-patterns to be discouraged.

Expose your codebase as a dataset via API

Sourcegraph exposes a powerful GraphQL API. The API is used by internal developer tools teams to build internal tools that leverage Sourcegraph capabilities such as universal code search, code navigation, and code statistics. Access tokens enable trusted tools to authenticate to Sourcegraph securely. Sourcegraph ships with an interactive API explorer, which makes it easy to learn and experiment with the API.
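
As a minimal sketch of calling the API from a script, the Python code below builds an authenticated POST to an instance’s /.api/graphql endpoint using only the standard library, passing an access token in an "Authorization: token ..." header. The instance URL and token are placeholders, and the fields in the search query are an assumption to verify against the schema in your instance’s API explorer.

```python
import json
import urllib.request

# Assumption: the search/results/matchCount fields are illustrative; confirm
# them in the instance's interactive API explorer before relying on them.
SEARCH_QUERY = """
query Search($query: String!) {
  search(query: $query) {
    results { matchCount }
  }
}
"""


def build_search_request(base_url, access_token, query):
    """Build (but do not send) a GraphQL POST for a Sourcegraph search query."""
    body = json.dumps({"query": SEARCH_QUERY, "variables": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/.api/graphql",
        data=body,
        headers={
            "Authorization": "token " + access_token,  # access token for a trusted tool
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_search(base_url, access_token, query):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(base_url, access_token, query)) as resp:
        return json.load(resp)
```

Keeping request construction separate from run_search() makes it easy to inspect or test the request without network access; an internal tool would call run_search() with its own instance URL and token.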

Installing and scaling Sourcegraph

Sourcegraph can be run with a single Docker command:

docker run --publish 7080:7080 --publish 2633:2633 --publish 127.0.0.1:3370:3370 --rm --volume ~/.sourcegraph/config:/etc/sourcegraph --volume ~/.sourcegraph/data:/var/opt/sourcegraph sourcegraph/server:3.10.0

(See https://docs.sourcegraph.com for the latest version.)

For larger codebases and organizations, Sourcegraph can be deployed as a Kubernetes cluster composed of modular services that can be independently scaled to meet the enterprise’s needs.

Sourcegraph also ships with observability and monitoring tools like Jaeger tracing, Prometheus monitoring, and Grafana dashboards. These allow Sourcegraph administrators to meet the uptime and availability requirements that enterprises demand of critical tools.

Over 10,000 paid developers and tens of thousands of free open source developers are using Sourcegraph to increase developer productivity, improve code quality, explore the universe of code more effectively—and ship software robustly and quickly.

To try Sourcegraph for free in seconds, visit sourcegraph.com.

Beyang Liu is CTO and co-founder of Sourcegraph. Prior to Sourcegraph, Beyang was a software engineer at Palantir Technologies, where he developed new data analysis software on a small, customer-facing team working with Fortune 500 companies.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.