CIS120 OCaml Style Guide
One important goal in this class is to teach you how to program
elegantly. You have most likely spent many years in secondary school
learning style with respect to the English language -- programming
should be no different. Every programming language demands a
particular style of programming, and forcing one language's style upon
another can have disastrous results. Of course there are some
elements of writing a computer program that are shared between all
languages. You should be able to pick up these elements through
experience.
Listed below are some style guidelines for OCaml. Egregious
violation of these guidelines may result in loss of programming style
points. Note that these guidelines cover many more OCaml features than we
will be expecting you to use in CIS120.
Although the list below seems daunting, most of the suggestions
are common sense. Also, you should note that these rules come nowhere
near the style mandates you will likely come across in industry. Many
companies go so far as to dictate exactly where spaces can go. You
can rejoice that you do not have to learn Hungarian notation.
Acknowledgement: Much of this style guide is adapted from CS
312 at Cornell University.
File Submission Requirements:
- Code must compile
- 80 column limit
- No tab characters
Commenting:
- Comments go above the code they reference
- Avoid useless comments
- Avoid over-commenting
- Line breaks
- Proper multi-line commenting
Naming and Declarations:
- Use meaningful names
- Naming conventions
- Type annotations
- Avoid global mutable variables
- When to rename variables
- Order of declarations in a structure
Indentation:
- Indent two spaces at a time
- Indenting nested let expressions
- Indenting match expressions
- Indenting if expressions
- Indenting comments
Using Parentheses:
- Parenthesize to help indentation
- Wrap match expressions with parentheses
- Over parenthesizing
Pattern Matching:
- No incomplete pattern matches
- Pattern match in the function arguments when possible
- Function arguments should not use values for patterns
- Avoid using too many projections
- Pattern match with as few match expressions as necessary
- Don't use List.hd, List.tl, or List.nth
Code Factoring:
- Don't let expressions take up multiple lines
- Break up large functions into smaller functions
- Over-factoring code
Verbosity:
- Don't rewrite existing code
- Misusing if expressions
- Misusing match expressions
- Other common misuses
- Don't rewrap functions
- Avoid computing values twice
File Submission Requirements
- Code Must Compile: Any code you submit must
compile. If it does not compile, we won't grade the project and
you will lose all the points for the project. You should treat any
compiler warnings as errors.
- 80 Column Limit: No line of code should have more than
80 columns. Using more than 80 columns causes your code to wrap around
to the next line, which is devastating for readability.
Keep your lines within the 80 column limit as you write; it is
not something to fix up after you have finished programming.
- No Tab Characters: Do not use the tab character
(0x09). Instead, use spaces to control indenting. The Emacs
package from the OCaml website avoids using tabs (with the exception of pasting
text from the clipboard or kill ring). When in ml-mode, Emacs
uses the TAB key to control indenting instead of inserting the tab
character.
Commenting
- Comments Go Above the Code They Reference: Consider
the following:
let sum = List.fold_left (+) 0
(* Sums a list of integers. *)

(* Sums a list of integers. *)
let sum = List.fold_left (+) 0
The latter is the better style, although you may find some source
code that uses the first. We require that you use
the latter.
- Avoid Useless Comments: Comments that merely repeat
the code they reference or state the obvious are a travesty to
programmers. Comments should state the invariants, the non-obvious, or
any references that have more information about the code.
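As a rough illustration of the difference, the sketch below pairs a comment that only restates the code with one that records a precondition. The functions increment and dedup_sorted are hypothetical names chosen for this example, not part of any course library.

```ocaml
(* Useless: the comment only restates the code. *)
(* Adds 1 to x. *)
let increment x = x + 1

(* Useful: states a precondition and the non-obvious behavior.
 * Requires: sorted is in ascending order.
 * Returns sorted with adjacent duplicates removed. *)
let rec dedup_sorted sorted =
  match sorted with
  | x :: (y :: _ as rest) ->
    if x = y then dedup_sorted rest else x :: dedup_sorted rest
  | _ -> sorted
```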
- Avoid Over-commenting: Incredibly long comments are
not very useful. Long comments should only appear at the top of a file
-- here you should explain the overall design of the code and reference any
sources that have more information about the algorithms or data
structures. All other comments in the file should be as short as
possible; after all, brevity is the soul of wit. Most often the best
place for any comment is just before a function declaration. Rarely
should you need to comment within a function -- variable naming should be
enough.
- Line Breaks: Obviously the best way to stay within
the 80 column limit imposed by the rule above is pressing the enter key
every once in a while. Empty lines should be included
between value declarations within a struct block, especially
between function declarations. Often it is not necessary to have empty
lines between other declarations unless you are separating the different
types of declarations (such as structures, types, exceptions and
values). Unless function declarations within a let block are
long, there should be no empty lines within a let block. There should
never be an empty line within an expression.
- Proper Multi-line Commenting: When comments are
printed on paper, the reader lacks the advantage of color highlighting
performed by an editor such as Emacs. This makes it important for you
to distinguish comments from code. When a comment extends beyond one
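line, it should be preceded with a * similar to the following:
(* This is one of those rare but long comments
 * that need to span multiple lines because
 * the code is unusually complex and requires
 * extra explanation. *)
let complicatedFunction () = ...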
Naming and Declarations
- Use Meaningful Names: Variable names should
describe what they are for. Distinguishing what a variable references
is best done by following a particular naming convention (see suggestion
below). Variable names should be words or combinations of words.
One-letter variable names are acceptable in short let
blocks. Often it is the case that a function used in a fold, filter,
or map is bound to the name f. Here is an example for short
variable names:
let d = Unix.localtime (Unix.time ()) in
let m = d.Unix.tm_min in
let s = d.Unix.tm_sec in
let f n = (n mod 3) = 0 in
List.filter f [m;s]
- Naming Conventions: The following are
the naming guidelines that are followed by the OCaml library; try to
follow similar conventions:
Token | Convention | Example
Variables and functions | Symbolic or initial lower case. Use underscores for multiword names. | get_item
Constructors | Initial upper case. Use embedded caps for multiword names. Historic exceptions are true and false. Symbolic names like :: are rarely used. | Node, EmptyQueue
Types | All lower case. Use underscores for multiword names. | priority_queue
Module types | Initial upper case. Use embedded caps for multiword names. | PriorityQueue
Modules | Same as module type convention. | PriorityQueue
Functors | Same as module type convention. | PriorityQueue
These conventions are not enforced by the compiler, though
violations of the variable/constructor conventions ought to cause warning
messages because of the danger of a constructor turning into a variable when
it is misspelled.
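To see the conventions side by side, here is a small sketch. The names IntStack, ListStack, int_stack, push_item, and top_item are all made up for this illustration.

```ocaml
(* Module type: initial upper case with embedded caps. *)
module type IntStack = sig
  type int_stack
  val empty : int_stack
  val push_item : int -> int_stack -> int_stack
  val top_item : int_stack -> int option
end

(* Module: same convention as module types. *)
module ListStack : IntStack = struct
  (* Type: all lower case with underscores.
   * Constructors: initial upper case with embedded caps. *)
  type int_stack =
    | EmptyStack
    | Node of int * int_stack

  let empty = EmptyStack

  (* Functions and variables: lower case with underscores. *)
  let push_item item stack = Node (item, stack)

  let top_item stack =
    match stack with
    | EmptyStack -> None
    | Node (item, _) -> Some item
end
```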
- Type Annotations: Complex or potentially
ambiguous top-level
functions and values should be declared with types to aid the reader. Consider the following:
let get_bit bitidx n =
  let shb = Int32.shift_left 1l bitidx in
  Int32.logand shb n = shb

let get_bit (bitidx:int) (n:int32):bool =
  let shb = Int32.shift_left 1l bitidx in
  Int32.logand shb n = shb
The latter is considered better. Such type annotations can also
help significantly when debugging typechecking problems.
- Avoid Global Mutable Variables: Mutable values
should be local to closures and almost never declared as a structure's
value. Making a mutable value global causes many problems.
First, code that mutates the value cannot be sure that the value
is consistent with the algorithm, as it might be modified outside the
function or by a previous execution of the algorithm. Second, and more
importantly, having global mutable values makes it more likely that your
code is nonreentrant. Without proper knowledge of the ramifications,
declaring global mutable values can extend beyond bad style to incorrect
code.
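One way to keep mutable state local to a closure is sketched below. The function make_counter is a hypothetical example; the point is that the ref cell is reachable only through the function it returns, so no other code can modify it.

```ocaml
(* Each call creates a fresh counter whose state is reachable only
 * through the returned function. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    incr count;
    !count
```

Two counters created this way do not interfere with each other, which is exactly the guarantee a global ref cannot give.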
- When to Rename Variables: You should rarely need
to rename values; in fact, this is a sure way to obfuscate code.
Renaming a value should be backed up with a very good reason. One instance
where renaming a variable is common and encouraged is aliasing structures.
In these cases, other structures used by functions within the current
structure are aliased to one or two letter variables at the top of the struct
block. This serves two purposes: it shortens the name of the structure and
it documents the structures you use. Here is an example:
module H = Hashtbl
module L = List
module A = Array
...
- Order of Declarations in a Structure: When
declaring elements in a file (or nested module) you first alias the structures
you intend to use, followed by the types, followed by exceptions, and lastly
list all
the value declarations for the structure. Here is an example:
module L = List
type foo = unit
exception InternalError
let first list = L.nth list 0
Note that every declaration within the structure should be indented the same
amount.
Indenting
- Indent Two Spaces at a Time: Most lines that
indent code should only indent by two spaces more than the previous line of
code.
- Indenting nested let
expressions: Blocks of code that have nested let
expressions should not be indented.
Bad:
let x = exp1 in
  let y = exp2 in
    x + y
Good:
let x = exp1 in
let y = exp2 in
x + y
- Indenting match Expressions: Indent
similar to the following.
match expr with
| pat1 -> ...
| pat2 -> ...
- Indenting if Expressions: Indent similar
to the following.
if exp1 then exp2              if exp1 then
else if exp3 then exp4           exp2
else if exp5 then exp6         else exp3
else exp8

if exp1 then exp2 else exp3

if exp1 then exp2
else exp3
- Indenting Comments: Comments should be indented to
the level of the line of code that follows the comment.
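A minimal sketch of this rule, using a hypothetical function max_of: each comment starts in the same column as the line of code that follows it.

```ocaml
(* Returns the larger of the two arguments. *)
let max_of a b =
  (* Ties resolve to the second argument. *)
  if a > b then a
  else b
```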
Using Parentheses
- Parenthesize to Help Indentation: Indentation
algorithms are often assisted by added parenthesization. Consider the
following:
let x = "Long line..."^
  "Another long line."

let x = ("Long line..."^
         "Another long line.")
The latter is considered better style.
- Wrap match Expressions with Parentheses:
This avoids a common (and confusing) error that you get when you have a
nested match expression.
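The error arises because the cases that follow an inner match are parsed as belonging to that inner match. The sketch below shows the parenthesized form; lookup_int is a hypothetical function, while List.assoc_opt and int_of_string_opt are standard library functions (available since OCaml 4.05).

```ocaml
(* Without the parentheses around the inner match, the final
 * None case would be parsed as a case of the inner match. *)
let lookup_int (key : string) (table : (string * string) list) : string =
  match List.assoc_opt key table with
  | Some text ->
    (match int_of_string_opt text with
     | Some n -> "number " ^ string_of_int n
     | None -> "not a number")
  | None -> "missing key"
```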
- Over Parenthesizing: Parentheses have many
semantic purposes in ML, including constructing tuples, grouping sequences
of side-effect expressions, forcing higher precedence on an expression for
parsing, and grouping structures for functor arguments. Clearly,
parentheses must be used with care. Use parentheses only when
necessary or when they improve readability. Consider the following two
function applications:
let x = function1 (arg1) (arg2) (function2 (arg3)) (arg4)
let x = function1 arg1 arg2 (function2 arg3) arg4
The latter is considered better style. Parentheses should usually not appear on a
line by themselves, nor should they be the first graphical character --
parentheses do not serve the same purpose as brackets do in C or Java.
Pattern Matching
- No Incomplete Pattern Matches: Incomplete pattern
matches are flagged with compiler warnings. We strongly discourage compiler
warnings when grading; thus, if there is a compiler warning, the project
will get reduced style points.
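As a small sketch of a complete match (the type color and function color_name are hypothetical), every constructor gets its own case. Deleting any one of the cases would make the compiler report a non-exhaustive pattern match warning.

```ocaml
type color = Red | Green | Blue

(* Every constructor of color has a case, so the match is complete. *)
let color_name (c : color) : string =
  match c with
  | Red -> "red"
  | Green -> "green"
  | Blue -> "blue"
```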
- Pattern Match in the Function Arguments When Possible:
Tuples, records and datatypes can be deconstructed using pattern
matching. If you simply deconstruct the function argument before you
do anything useful, it is better to pattern match in the function argument.
Consider these examples:
Bad
let f arg1 arg2 =
  let x = fst arg1 in
  let y = snd arg1 in
  let z = fst arg2 in
  ...
Good
let f (x,y) (z,_) = ...
Bad
let f arg1 =
  let x = arg1.foo in
  let y = arg1.bar in
  let baz = arg1.baz in
  ...
Good
let f {foo=x; bar=y; baz} = ...
- Function Arguments Should Not Use Values for Patterns:
You should only deconstruct values with variable names and/or wildcards in
function arguments. If you want to pattern match against a specific
value, use a match expression or an if expression. We
include this rule because there are too many errors that can occur when you
don't do this exactly right. Consider the following:
let fact 0 = 1
  | fact n = n * fact(n-1)

let rec fact n =
  if n=0 then 1
  else n * fact(n-1)
The latter is considered better style.
- Avoid Using Too Many Projections: Frequently
projecting a value from a record or tuple causes your code to become
unreadable. This is especially a problem with tuple projection because
the value is not documented by a variable name. To prevent
projections, you should use pattern matching with a function argument or a
value declaration. Of course, using projections is okay as long as it
is infrequent and the meaning is clearly understood from the context.
The above rule shows how to pattern match in the function arguments.
Here is an example for pattern matching with value declarations.
Bad
let v = someFunction() in
let x = fst v in
let y = snd v in
x+y
Good
let x,y = someFunction() in
x+y
- Pattern Match with as Few match Expressions as
Necessary: Rather than nest match expressions, you can combine
them by pattern matching against a tuple. Of course, this doesn't work
if one of the nested match expressions matches against a value
obtained from a branch in another match expression.
Nevertheless, if all the values are independent of each other you should
combine the values in a tuple and match against that. Here is an
example:
Bad
let d = Date.fromTimeLocal(Unix.time()) in
match Date.month d with
| Date.Jan -> (match Date.day d with
               | 1 -> print "Happy New Year"
               | _ -> ())
| Date.Jul -> (match Date.day d with
               | 4 -> print "Happy Independence Day"
               | _ -> ())
| Date.Oct -> (match Date.day d with
               | 10 -> print "Happy Metric Day"
               | _ -> ())
| _ -> ()
Good
let d = Date.fromTimeLocal(Unix.time()) in
match (Date.month d, Date.day d) with
| (Date.Jan, 1) -> print "Happy New Year"
| (Date.Jul, 4) -> print "Happy Independence Day"
| (Date.Oct, 10) -> print "Happy Metric Day"
| _ -> ()
- Don't use List.hd,
List.tl, or List.nth:
The functions hd, tl, and nth are used to
deconstruct list types; however, they raise exceptions on
certain inputs. You should rarely use these functions. In the
case that you find it absolutely necessary to use these (something that
probably won't ever happen), you should handle any exceptions that can be
raised by these functions.
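Pattern matching usually replaces these functions directly. The sketch below uses a hypothetical helper, first_opt, that returns an option instead of raising an exception on the empty list.

```ocaml
(* Returns the first element, or None when the list is empty.
 * List.hd would instead raise Failure on []. *)
let first_opt (items : 'a list) : 'a option =
  match items with
  | [] -> None
  | x :: _ -> Some x
```

The caller is then forced by the type to handle the empty case, rather than having to remember to catch an exception.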
Code Factoring
- Don't Let Expressions Take Up Multiple Lines: If a
tuple consists of more than two or three elements, you should consider using
a record instead of a tuple. Records have the advantage of placing
each name on a separate line and still looking good. Constructing a
tuple over multiple lines makes your code look hideous -- the expressions
within the tuple construction should be extraordinarily simple. Other
expressions that take up multiple lines should be done with a lot of
thought. The best way to transform code that constructs expressions
over multiple lines to something that has good style is to factor the code
using a let expression. Consider the following:
Bad
let rec euclid ((m,n) : int * int) : (int * int * int) =
  if n=0
  then (1, 0, m)
  else ((fun (_,v,_) -> v) (euclid (n, m mod n)),
        (fun (u,_,_) -> u) (euclid (n, m mod n)) - (m / n) *
        (fun (_,v,_) -> v) (euclid (n, m mod n)),
        (fun (_,_,g) -> g) (euclid (n, m mod n)))
Good
let rec euclid ((m,n) : int * int) : (int * int * int) =
  if n=0
  then (1, 0, m)
  else
    let q = m / n in
    let r = m mod n in
    let (u,v,g) = euclid (n,r) in
    (v, u-(q*v), g)
- Break Up Large Functions into Smaller Functions:
One of the greatest advantages of functional programming is that it
encourages writing smaller functions and combining them to solve bigger
problems. Just how and when to break up functions is something that
comes with experience.
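As one possible illustration, the variance of a list of floats can be assembled from two smaller functions rather than written as a single loop. The names sum_floats, mean, and variance are hypothetical, and the sketch assumes a non-empty input list.

```ocaml
let sum_floats (values : float list) : float =
  List.fold_left (+.) 0.0 values

(* Requires: values is non-empty. *)
let mean (values : float list) : float =
  sum_floats values /. float_of_int (List.length values)

(* Requires: values is non-empty.
 * Population variance: the mean of the squared deviations. *)
let variance (values : float list) : float =
  let m = mean values in
  let squared_diffs = List.map (fun v -> (v -. m) *. (v -. m)) values in
  mean squared_diffs
```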
- Over-factoring code: In some situations, it's not
necessary to bind the results of an expression to a variable. Consider
the following:
Bad
let x = read_line () in
match x with
...
Good
match read_line () with
...
Here is another example of over-factoring (provided y is not a large
expression):
let x = y*y in x+z
y*y + z
The latter is considered better.
Verbosity
- Don't Rewrite Existing Code: The OCaml standard
libraries have a great number of functions and data structures
-- use them! Often students will recode List.filter,
List.map, and similar functions. Another common
way in which one can avoid recoding is to use the fold
functions. Writing a function that recursively walks down a
list can almost always make use of List.fold_left or
List.fold_right. Other data structures often have similar
folding functions; use them whenever they are available.
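For instance, counting the even numbers in a list can be written by hand or with library functions. The sketch below shows both; the three function names are hypothetical, while List.filter, List.length, and List.fold_left come from the standard library.

```ocaml
(* Hand-written recursion over the list. *)
let rec count_even_manual (items : int list) : int =
  match items with
  | [] -> 0
  | x :: rest ->
    if x mod 2 = 0 then 1 + count_even_manual rest
    else count_even_manual rest

(* The same computation using library functions. *)
let count_even (items : int list) : int =
  List.length (List.filter (fun x -> x mod 2 = 0) items)

(* Or with a single fold. *)
let count_even_fold (items : int list) : int =
  List.fold_left (fun n x -> if x mod 2 = 0 then n + 1 else n) 0 items
```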
- Misusing if Expressions: Remember that
the type of the condition in an if expression is bool. In
general, the type of an if expression is 'a, but in the
case that the type is bool, you should not be using if at
all. Consider the following:
Bad | Good
if e then true else false | e
if e then false else true | not e
if beta then beta else false | beta
if not e then x else y | if e then y else x
if x then true else y | x || y
if x then y else false | x && y
if x then false else y | not x && y
if x then y else true | not x || y
- Misusing match Expressions: The match
expression is misused in two common situations. First, match
should never be used in place of an if expression (that's why if
exists). Note the following:
match e with
| true -> x
| false -> y
if e then x else y
The latter expression is much better. Another situation where if
expressions are preferred over match expressions is as follows:
match e with
| c -> x
| _ -> y
if e=c then x else y
The latter expression is definitely better. The other misuse is using match
when pattern matching with a let declaration is enough. Consider
the following:
let x = match expr with (y,z) -> y
let x,_ = expr
The latter is considered better.
- Other Common Misuses: Here is a bunch of other
common mistakes to watch out for:
Bad | Good
l::[] | [l]
length + 0 | length
length * 1 | length
big exp * same big exp | let x = big exp in x*x
if x then f a b c1 else f a b c2 | f a b (if x then c1 else c2)
- Don't Rewrap Functions: When passing a function
around as an argument to another function, don't rewrap the function if it
already does what you want it to. Here's an example:
List.map (fun x -> sqrt x) [1.0; 4.0; 9.0; 16.0]
List.map sqrt [1.0; 4.0; 9.0; 16.0]
The latter is better. Another case for rewrapping a function is often
associated with infix binary operators. To prevent rewrapping the binary
operator, wrap it in parentheses to use it as a function. Consider this example:
List.fold_left (fun x y -> x + y) 0
List.fold_left (+) 0
The latter is considered better style.
- Avoid Computing Values Twice: When computing
values twice you waste CPU time and make your program ugly. The
best way to avoid computing things twice is to create a let
expression and bind the computed value to a variable name. This has the
added benefit of letting you document the purpose of the value with a
variable name -- which means less commenting.
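A short sketch of the idea, with hypothetical function names: the first version computes List.length twice, and the second binds it once to a name that also documents the value.

```ocaml
(* Computes List.length items twice. *)
let describe_twice (items : int list) : string =
  if List.length items > 3 then
    "long: " ^ string_of_int (List.length items)
  else "short"

(* Binds the length once; the name documents the value. *)
let describe_once (items : int list) : string =
  let item_count = List.length items in
  if item_count > 3 then "long: " ^ string_of_int item_count
  else "short"
```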