At Databricks, our engineers work on some of the most actively developed Scala codebases in the world, including our own internal repo called "universe" as well as the various open source projects we contribute to, e.g. Apache Spark and Delta Lake. This guide draws from our experience coaching and working with our engineering teams as well as the broader open source community.
Code is written once by its author, but read and modified multiple times by lots of other engineers. As most bugs actually come from future modification of the code, we need to optimize our codebase for long-term, global readability and maintainability. The best way to achieve this is to write simple code.
Scala is an incredibly powerful language that is capable of many paradigms. We have found that the following guidelines work well for us on projects with high velocity. Depending on the needs of your team, your mileage might vary.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- Case Classes and Immutability
- apply Method
- override Modifier
- Destructuring Binds
- Call by Name
- Multiple Parameter Lists
- Symbolic Methods (Operator Overloading)
- Type Inference
- Return Statements
- Recursion and Tail Recursion
- Implicits
- Exception Handling (Try vs try)
- Options
- Monadic Chaining
- Symbol Literals
- 2015-03-16: Initial version.
- 2015-05-25: Added override Modifier section.
- 2015-08-23: Downgraded the severity of some rules from "do NOT" to "avoid".
- 2015-11-17: Updated apply Method section: apply method in companion object should return the companion class.
- 2015-11-17: This guide has been translated into Chinese. The Chinese translation is contributed by community member Hawstein. We do not guarantee that it will always be kept up-to-date.
- 2015-12-14: This guide has been translated into Korean. The Korean translation is contributed by Hyukjin Kwon and reviewed by Yun Park, Kevin (Sangwoo) Kim, Hyunje Jo and Woochel Choi. We do not guarantee that it will always be kept up-to-date.
- 2016-06-15: Added Anonymous Methods section.
- 2016-06-21: Added Variable Naming Convention section.
- 2016-12-24: Added Case Classes and Immutability section.
- 2017-02-23: Added Testing section.
- 2017-04-18: Added Prefer existing well-tested methods over reinventing the wheel section.
- 2019-12-18: Added Symbol Literals section.
- 2022-08-05: Updated Monadic Chaining section: do not monadic-chain with an if-else block.
We mostly follow Java's and Scala's standard naming conventions.
- Classes, traits, objects should follow Java class convention, i.e. PascalCase style.

class ClusterManager

trait Expression

- Packages should follow Java package naming conventions, i.e. all-lowercase ASCII letters.

package com.databricks.resourcemanager

- Methods/functions should be named in camelCase style.

- Constants should be all uppercase letters and be put in a companion object.

object Configuration {
  val DEFAULT_PORT = 10000
}
- An enumeration class or object which extends the Enumeration class shall follow the convention for classes and objects, i.e. its name should be in PascalCase style. Enumeration values shall be in upper case with words separated by the underscore character _. For example:

private object ParseState extends Enumeration {
  type ParseState = Value

  val PREFIX,
    TRIM_BEFORE_SIGN,
    SIGN,
    TRIM_BEFORE_VALUE,
    VALUE,
    VALUE_FRACTIONAL_PART,
    TRIM_BEFORE_UNIT,
    UNIT_BEGIN,
    UNIT_SUFFIX,
    UNIT_END = Value
}

- Annotations should also follow Java convention, i.e. PascalCase. Note that this differs from Scala's official guide.

final class MyAnnotation extends StaticAnnotation
- Variables should be named in camelCase style, and should have self-evident names.

val serverPort = 1000
val clientPort = 2000

- It is OK to use one-character variable names in small, localized scope. For example, "i" is commonly used as the loop index for a small loop body (e.g. 10 lines of code). However, do NOT use "l" (as in Larry) as the identifier, because it is difficult to differentiate "l" from "1", "|", and "I".
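The naming rules above can be seen together in one small sketch. The names `NamingExample`, `ConnectionPool`, `isWithinLimit` and `DEFAULT_MAX_CONNECTIONS` are hypothetical and exist only for illustration; they are not taken from any Databricks codebase.

```scala
object NamingExample {
  // Class names use PascalCase.
  class ConnectionPool(val maxConnections: Int) {
    // Method and parameter names use camelCase and are self-evident.
    def isWithinLimit(activeConnections: Int): Boolean = {
      activeConnections <= maxConnections
    }
  }

  // Constants are all uppercase and live in the companion object.
  object ConnectionPool {
    val DEFAULT_MAX_CONNECTIONS = 8
  }

  def main(args: Array[String]): Unit = {
    val pool = new ConnectionPool(ConnectionPool.DEFAULT_MAX_CONNECTIONS)
    println(pool.isWithinLimit(3))
  }
}
```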
- Limit lines to 100 characters.
- The only exceptions are import statements and URLs (although even for those, try to keep them under 100 chars).
"If an element consists of more than 30 subelements, it is highly probable that there is a serious problem" - Refactoring in Large Software Projects.
In general:
- A method should contain less than 30 lines of code.
- A class should contain less than 30 methods.
- Put one space before and after operators, including the assignment operator.

def add(int1: Int, int2: Int): Int = int1 + int2

- Put one space after commas.

Seq("a", "b", "c") // do this

Seq("a","b","c") // don't omit spaces after commas

- Put one space after colons.

// do this
def getConf(key: String, defaultValue: String): String = {
  // some code
}

// don't put spaces before colons
def calculateHeaderPortionInBytes(count: Int) : Int = {
  // some code
}

// don't omit spaces after colons
def multiply(int1:Int, int2:Int): Int = int1 * int2
- Use 2-space indentation in general.

if (true) {
  println("Wow!")
}

- For method declarations, use 4-space indentation for the parameters and put each parameter on its own line when they don't fit in two lines. The return type can either be on the same line as the last parameter, or start a new line with a 2-space indent.

def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
  // method body
}

def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration = hadoopConfiguration)
  : RDD[(K, V)] = {
  // method body
}
- For classes whose header doesn't fit in two lines, use 4-space indentation for the parameters, put each parameter on its own line, put the extends clause on the next line with a 2-space indent, and add a blank line after the class header.

class Foo(
    val param1: String,  // 4 space indent for parameters
    val param2: String,
    val param3: Array[Byte])
  extends FooInterface  // 2 space indent here
  with Logging {

  def firstMethod(): Unit = { ... }  // blank line above
}
- For method and class constructor invocations, use 2-space indentation for the arguments and put each argument on its own line when they don't fit in two lines.

foo(
  someVeryLongFieldName,  // 2 space indent here
  andAnotherVeryLongFieldName,
  "this is a string",
  3.1415)

new Bar(
  someVeryLongFieldName,  // 2 space indent here
  andAnotherVeryLongFieldName,
  "this is a string",
  3.1415)
- Do NOT use vertical alignment. It draws attention to the wrong parts of the code and makes the aligned code harder to change in the future.

// Don't align vertically
val plus     = "+"
val minus    = "-"
val multiply = "*"

// Do the following
val plus = "+"
val minus = "-"
val multiply = "*"
- A single blank line appears:
- Between consecutive members (or initializers) of a class: fields, constructors, methods, nested classes, static initializers, instance initializers.
- Exception: A blank line between two consecutive fields (having no other code between them) is optional. Such blank lines are used as needed to create logical groupings of fields.
- Within method bodies, as needed to create logical groupings of statements.
- Optionally before the first member or after the last member of the class (neither encouraged nor discouraged).
- Use one or two blank line(s) to separate class or object definitions.
- Excessive number of blank lines is discouraged.
- Methods should be declared with parentheses, unless they are accessors that have no side-effect (state mutation, I/O operations are considered side-effects).
class Job {
  // Wrong: killJob changes state. Should have ().
  def killJob: Unit

  // Correct:
  def killJob(): Unit
}
- Callsite should follow method declaration, i.e. if a method is declared with parentheses, call with parentheses.
Note that this is not just syntactic. It can affect correctness when apply is defined in the return object.

// The method bodies below are placeholders so that the sketch compiles.
class Foo {
  def apply(args: String*): Int = args.length
}

class Bar {
  def foo: Foo = new Foo
}

new Bar().foo  // This returns a Foo
new Bar().foo()  // This returns an Int!
Put curly braces even around one-line conditional or loop statements. The only exception is if you are using if/else as a one-line ternary operator that is also side-effect free.
// Correct:
if (true) {
  println("Wow!")
}

// Correct:
if (true) statement1 else statement2

// Correct:
try {
  foo()
} catch {
  ...
}

// Wrong:
if (true)
  println("Wow!")

// Wrong:
try foo() catch {
  ...
}

Suffix long literal values with uppercase L. It is often hard to differentiate lowercase l from 1.
val longValue = 5432L // Do this

val longValue = 5432l // Do NOT do this

Use Java docs style instead of Scala docs style.
/** This is a correct one-liner, short description. */

/**
 * This is correct multi-line JavaDoc comment. And
 * this is my second line, and if I keep typing, this would be
 * my third line.
 */

/** In Spark, we don't use the ScalaDoc style so this
  * is not correct.
  */

If a class is long and has many methods, group them logically into different sections, and use comment headers to organize them.
class DataFrame {

  ///////////////////////////////////////////////////////////////////////////
  // DataFrame operations
  ///////////////////////////////////////////////////////////////////////////

  ...

  ///////////////////////////////////////////////////////////////////////////
  // RDD operations
  ///////////////////////////////////////////////////////////////////////////

  ...
}

Of course, the situation in which a class grows this long is strongly discouraged, and is generally reserved only for building certain public APIs.
- Avoid using wildcard imports, unless you are importing more than 6 entities, or implicit methods. Wildcard imports make the code less robust to external changes.

- Always import packages using absolute paths (e.g. scala.util.Random) instead of relative ones (e.g. util.Random).

- In addition, sort imports in the following order:
  - java.* and javax.*
  - scala.*
  - Third-party libraries (org.*, com.*, etc)
  - Project classes (com.databricks.* or org.apache.spark if you are working on Spark)
- Within each group, imports should be sorted in alphabetic ordering.

- You can use IntelliJ's import organizer to handle this automatically, using the following config:

java
javax
_______ blank line _______
scala
_______ blank line _______
all other imports
_______ blank line _______
com.databricks  // or org.apache.spark if you are working on Spark
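As a rough illustration of the ordering rules above, the following sketch shows the import groups at the top of a file. Only JDK and Scala standard library imports are real here; the third-party and project imports are shown as comments because their names (`org.example.json.JsonParser`, `com.databricks.example.Widget`) are made up for this example. The object `ImportOrderExample` and its method are likewise hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.util.UUID

import scala.collection.mutable
import scala.util.Random

// Third-party group would go here, e.g. (hypothetical):
// import org.example.json.JsonParser

// Project group goes last, e.g. (hypothetical):
// import com.databricks.example.Widget

object ImportOrderExample {
  // Uses every import above so that none of them is unused.
  def shuffledIds(count: Int, seed: Long): List[String] = {
    val ids = mutable.ArrayBuffer.empty[String]
    (1 to count).foreach { i =>
      val bytes = s"id-$i".getBytes(StandardCharsets.UTF_8)
      ids += UUID.nameUUIDFromBytes(bytes).toString
    }
    new Random(seed).shuffle(ids.toList)
  }
}
```

Each import uses an absolute path, no wildcard is needed, and each group is sorted alphabetically and separated by a blank line.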
- For a method whose entire body is a pattern match expression, put the match on the same line as the method declaration if possible to reduce one level of indentation.

def test(msg: Message): Unit = msg match {
  case ...
}

- When calling a function with a closure (or partial function), if there is only one case, put the case on the same line as the function invocation.

list.zipWithIndex.map { case (elem, i) =>
  // ...
}

If there are multiple cases, indent and wrap them.

list.map {
  case a: Foo => ...
  case b: Bar => ...
}

- If the only goal is to match on the type of the object, do NOT expand fully all the arguments, as it makes refactoring more difficult and the code more error prone.

case class Pokemon(name: String, weight: Int, hp: Int, attack: Int, defense: Int)
case class Human(name: String, hp: Int)

// Do NOT do the following, because
// 1. When a new field is added to Pokemon, we need to change this pattern matching as well
// 2. It is easy to mismatch the arguments, especially for the ones that have the same data types
targets.foreach {
  case target @ Pokemon(_, _, hp, _, defense) =>
    val loss = math.max(0, myAttack - defense)
    target.copy(hp = hp - loss)
  case target @ Human(_, hp) =>
    target.copy(hp = hp - myAttack)
}

// Do this:
targets.foreach {
  case target: Pokemon =>
    val loss = math.max(0, myAttack - target.defense)
    target.copy(hp = target.hp - loss)
  case target: Human =>
    target.copy(hp = target.hp - myAttack)
}
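For readers who want to run the type-only match, here is a self-contained sketch. The `Target` trait, the `applyAttack` helper and the sample values are assumptions added for illustration; the guide itself does not define them.

```scala
object TypeMatchExample {
  // Hypothetical common parent so both case classes can share one collection.
  sealed trait Target
  case class Pokemon(name: String, weight: Int, hp: Int, attack: Int, defense: Int) extends Target
  case class Human(name: String, hp: Int) extends Target

  // Matches on the type only and reads fields by name, so adding a field
  // to Pokemon later does not require touching this method.
  def applyAttack(target: Target, myAttack: Int): Target = target match {
    case pokemon: Pokemon =>
      val loss = math.max(0, myAttack - pokemon.defense)
      pokemon.copy(hp = pokemon.hp - loss)
    case human: Human =>
      human.copy(hp = human.hp - myAttack)
  }

  def main(args: Array[String]): Unit = {
    val targets = Seq(Pokemon("Pikachu", 6, 35, 55, 40), Human("Ash", 20))
    targets.map(applyAttack(_, 50)).foreach(println)
  }
}
```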
Avoid infix notation for methods that aren't symbolic methods (i.e. operator overloading).
// Correct
list.map(func)
string.contains("foo")
// Wrong
list map (func)
string contains "foo"
// But overloaded operators should be invoked in infix style
arrayBuffer += elem

Avoid excessive parentheses and curly braces for anonymous methods.
// Correct
list.map { item =>
  ...
}

// Correct
list.map(item => ...)

// Wrong
list.map(item => {
  ...
})

// Wrong
list.map { item => {
  ...
}}

// Wrong
list.map({ item => ... })

Case classes are regular classes but extended by the compiler to automatically support:
- Public getters for constructor parameters
- Copy constructor
- Pattern matching on constructor parameters
- Automatic toString/hash/equals implementation
Constructor parameters should NOT be mutable for case classes. Instead, use copy constructor. Having mutable case classes can be error prone, e.g. hash maps might place the object in the wrong bucket using the old hash code.
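The hash-bucket hazard can be reproduced with a short sketch. `MutablePerson` and `MutableCaseClassHazard` are hypothetical names used only for this demonstration, and whether the lookup fails depends on the hash set implementation, so no output is claimed for the `contains` call.

```scala
import scala.collection.mutable

object MutableCaseClassHazard {
  // Hypothetical class for illustration only: `age` is deliberately a var.
  case class MutablePerson(name: String, var age: Int)

  def main(args: Array[String]): Unit = {
    val peter = MutablePerson("Peter", 15)
    val people = mutable.HashSet(peter)
    val hashWhenStored = peter.hashCode

    // Mutating a constructor parameter changes the generated hash code
    // after the object has already been placed in the set.
    peter.age = 16
    println(hashWhenStored != peter.hashCode)

    // The set still holds the object, but a lookup now uses the new hash
    // code and may therefore miss it.
    println(people.size)
    println(people.contains(peter))
  }
}
```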
// This is OK
case class Person(name: String, age: Int)
// This is NOT OK
case class Person(name: String, var age: Int)
// To change values, use the copy constructor to create a new instance
val p1 = Person("Peter", 15)
val p2 = p1.copy(age = 16)

Avoid defining apply methods on classes. These methods tend to make the code less readable, especially for people less familiar with Scala. It is also harder for IDEs (or grep) to trace. In the worst case, it can also affect correctness of the code in surprising ways, as demonstrated in Parentheses.
It is acceptable to define apply methods on companion objects as factory methods. In these cases, the apply method should return the companion class type.
object TreeNode {
  // This is OK
  def apply(name: String): TreeNode = ...

  // This is bad because it does not return a TreeNode
  def apply(name: String): String = ...
}

Always add override modifier for methods, both for overriding concrete methods and implementing abstract methods. The Scala compiler does not require override for implementing abstract methods. However, we should always add override to make the override obvious, and to avoid accidental non-overrides due to non-matching signatures.
trait Parent {
  def hello(data: Map[String, String]): Unit = {
    print(data)
  }
}

class Child extends Parent {
  import scala.collection.Map

  // The following method does NOT override Parent.hello,
  // because the two Maps have different types.
  // If we added "override" modifier, the compiler would've caught it.
  def hello(data: Map[String, String]): Unit = {
    print("This is supposed to override the parent method, but it is actually not!")
  }
}

Destructuring bind (sometimes called tuple extraction) is a convenient way to assign two variables in one expression.
val (a, b) = (1, 2)

However, do NOT use them in constructors, especially when a and b need to be marked transient. The Scala compiler generates an extra Tuple2 field that will not be transient for the above example.

class MyClass {
  // This will NOT work because the compiler generates a non-transient Tuple2
  // that points to both a and b.
  @transient private val (a, b) = someFuncThatReturnsTuple2()
}

Avoid using call by name. Use () => T explicitly.
Background: Scala allows method parameters to be defined by-name, e.g. the following would work:
def print(value: => Int): Unit = {
  println(value)
  println(value + 1)
}

var a = 0
def inc(): Int = {
  a += 1
  a
}

print(inc())

In the above code, inc() is passed into print as a closure and is executed (twice) in the print method, rather than being passed in as the value 1. The main problem with call-by-name is that the caller cannot differentiate between call-by-name and call-by-value, and thus cannot know for sure whether the expression will be executed or not (or maybe worse, multiple times). This is especially dangerous for expressions that have side effects.
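A minimal sketch of the recommended alternative, using an explicit () => Int parameter. The names `ExplicitThunkExample`, `printTwice` and `evaluations` are hypothetical; the example mirrors the print/inc code from this section.

```scala
object ExplicitThunkExample {
  private var counter = 0

  // Side-effecting expression, mirroring inc() from the section above.
  def inc(): Int = {
    counter += 1
    counter
  }

  // Exposes how many times inc() has run so far.
  def evaluations: Int = counter

  // The parameter type () => Int makes deferred evaluation part of the signature.
  def printTwice(value: () => Int): Unit = {
    println(value())
    println(value() + 1)
  }

  def main(args: Array[String]): Unit = {
    // The call site has to pass a function, so a reader can see that
    // inc() may be evaluated zero, one, or several times.
    printTwice(() => inc())
    println(s"inc() ran $evaluations times")
  }
}
```

The behavior is the same as with a by-name parameter (inc() still runs twice), but the `() =>` at the call site makes that possibility visible to the caller.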
Avoid using multiple parameter lists. They complicate operator overloading, and can confuse programmers less familiar with Scala. For example:
// Avoid this!
case class Person(name: String, age: Int)(secret: String)

One notable exception is the use of a 2nd parameter list for implicits when defining low-level libraries. That said, implicits should be avoided!
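As one illustration of how a second parameter list on a case class can surprise readers, the compiler-generated equals, hashCode and toString only take the first parameter list into account. The sketch below reuses the Person shape from this section; the `val` on `secret` and the wrapper object `SecondParameterListExample` are additions made so the field can be read in the example.

```scala
object SecondParameterListExample {
  // Same shape as the Person example above, with `val` added so `secret` is accessible.
  case class Person(name: String, age: Int)(val secret: String)

  def main(args: Array[String]): Unit = {
    val a = Person("Peter", 15)("red")
    val b = Person("Peter", 15)("blue")

    // Prints true: the second parameter list is ignored by the generated equals.
    println(a == b)
    // Prints Person(Peter,15): `secret` does not appear in toString either.
    println(a)
  }
}
```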
Do NOT use symbolic method names, unless you are defining them for natural arithmetic operations (e.g. +, -, *, /). Under no other circumstances should they be used. Symbolic method names make it very hard to understand the intent of the methods. Consider the following two examples:
// symbolic method names are hard to understand
channel ! msg
stream1 >>= stream2
// self-evident what is going on
channel.send(msg)
stream1.join(stream2)

Scala type inference, especially left-side type inference and closure inference, can make code more concise. That said, there are a few cases where explicit typing should be used:
- Public methods should be explicitly typed, otherwise the compiler's inferred type can often surprise you.
- Implicit methods should be explicitly typed, otherwise it can crash the Scala compiler with incremental compilation.
- Variables or closures with non-obvious types should be explicitly typed. A good litmus test is that explicit typ
