This crate contains parser combinators, roughly based on the Haskell libraries parsec and attoparsec.
A parser in this library can be described as a function which takes some input and if it
is successful, returns a value together with the remaining input.
A parser combinator is a function which takes one or more parsers and returns a new parser.
For instance, the `many` parser can be used to convert a parser for single digits into one that
parses multiple digits. By modeling parsers in this way, it becomes easy to compose complex
parsers in an almost declarative way.
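The following is a minimal sketch of the model described above. It is written in plain Rust without combine, and the functions `digit` and `many` are hypothetical stand-ins, not combine's API. A parser is a function that returns the parsed value together with the remaining input, and a combinator is a function that builds a new parser from an existing one:

```rust
/// Parses a single ASCII digit, returning it with the remaining input,
/// or `None` if the input does not start with a digit.
fn digit(input: &str) -> Option<(char, &str)> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => Some((c, chars.as_str())),
        _ => None,
    }
}

/// A combinator: takes a parser and returns a new parser that applies it
/// zero or more times, collecting every value it produces.
fn many<T, P>(parser: P) -> impl Fn(&str) -> Option<(Vec<T>, &str)>
where
    P: Fn(&str) -> Option<(T, &str)>,
{
    move |mut input| {
        let mut values = Vec::new();
        // Keep applying `parser` until it fails, threading the remaining input.
        while let Some((value, rest)) = parser(input) {
            values.push(value);
            input = rest;
        }
        // Zero matches still counts as success; the input is returned untouched.
        Some((values, input))
    }
}

fn main() {
    let digits = many(digit);
    // prints: Some((['1', '2', '3'], "abc"))
    println!("{:?}", digits("123abc"));
}
```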
Overview
combine limits itself to creating LL(1) parsers
(it is possible to opt in to LL(k) parsing using the `attempt` combinator), which makes the
parsers easy to reason about in both function and performance at the cost of
some generality. Besides making the parsers you construct easier to reason about,
being an LL parser also lets combine automatically construct good error messages.
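The LL(1) behaviour can be illustrated with a toy model. This sketch assumes a hypothetical `Reply` type that records whether a failing parser consumed input. Under this model, `or` only tries its second alternative when the first one failed without consuming anything. `attempt` rewrites a consumed failure into an empty one, which makes backtracking possible. The names and the commit rule used by `string` are illustrative only and are not combine's implementation:

```rust
/// Outcome of a parse in this toy model (hypothetical type, not combine's).
#[derive(Debug, PartialEq)]
enum Reply<'a, T> {
    /// Success: the value and the remaining input.
    Ok(T, &'a str),
    /// Failure before consuming input, so an alternative may still be tried.
    EmptyErr,
    /// Failure after consuming input: the parser has committed to this branch.
    ConsumedErr,
}

/// Matches the literal `expected`. For illustration it commits (counts as
/// having consumed input) as soon as the first character matches.
fn string<'a>(expected: &'static str) -> impl Fn(&'a str) -> Reply<'a, &'static str> {
    move |input: &'a str| {
        if input.starts_with(expected) {
            Reply::Ok(expected, &input[expected.len()..])
        } else if !expected.is_empty() && input.chars().next() == expected.chars().next() {
            Reply::ConsumedErr
        } else {
            Reply::EmptyErr
        }
    }
}

/// Tries `second` only if `first` failed without consuming input.
fn or<'a, T>(
    first: impl Fn(&'a str) -> Reply<'a, T>,
    second: impl Fn(&'a str) -> Reply<'a, T>,
) -> impl Fn(&'a str) -> Reply<'a, T> {
    move |input| match first(input) {
        Reply::EmptyErr => second(input),
        other => other,
    }
}

/// Acts as if `p` consumed nothing when it fails after consuming input.
fn attempt<'a, T>(p: impl Fn(&'a str) -> Reply<'a, T>) -> impl Fn(&'a str) -> Reply<'a, T> {
    move |input| match p(input) {
        Reply::ConsumedErr => Reply::EmptyErr,
        other => other,
    }
}

fn main() {
    // Without `attempt`: "let" commits on the shared 'l', so "loop" is never tried.
    let plain = or(string("let"), string("loop"));
    assert_eq!(plain("loop"), Reply::ConsumedErr);

    // With `attempt`: the consumed failure becomes an empty one and `or` backtracks.
    let backtracking = or(attempt(string("let")), string("loop"));
    assert_eq!(backtracking("loop"), Reply::Ok("loop", ""));
}
```

Here `string("let")` commits as soon as the shared leading `l` matches, so a plain `or` never tries `"loop"`. In that kind of case, the extra lookahead provided by `attempt` is needed.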
extern crate combine;
use combine::Parser;
use combine::stream::state::State;
use combine::parser::char::{digit, letter};
const MSG: &'static str = r#"Parse error at line: 1, column: 1
Unexpected `|`
Expected `digit` or `letter`
"#;
fn main() {
    // Wrapping a `&str` with `State` provides automatic line and column tracking. If `State`
    // was not used, the positions would instead only be pointers into the `&str`.
    if let Err(err) = digit().or(letter()).easy_parse(State::new("|")) {
        assert_eq!(MSG, format!("{}", err));
    }
}
This library is currently split into a few core modules:
- `parser` is where you will find all the parsers that combine provides. It contains the core `Parser` trait as well as several submodules such as `sequence` or `choice` which each contain several parsers aimed at a specific niche.
- `stream` contains the second most important trait next to `Parser`. Streams represent the data source which is being parsed, such as `&[u8]`, `&str` or iterators.
- `easy` contains combine's default "easy" error and stream handling. If you use the `easy_parse` method to start your parsing, these are the types that are used.
- `error` contains the types and traits that make up combine's error handling. Unless you need to customize the errors your parsers return, you should not need to use this module much.
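The following is a rough sketch of what a stream abstraction provides. It assumes a simplified, hypothetical `TokenStream` trait; combine's real `StreamOnce` and `Stream` traits carry more associated types, such as positions and errors. Implementing the trait for `&str` and `&[T]` lets one generic helper work over both text and bytes. Cloning the stream stands in for resetting it to an earlier position:

```rust
/// Hypothetical, simplified stream trait: a data source from which items can
/// be taken one at a time.
trait TokenStream {
    type Item;
    /// Removes and returns the next item, or `None` at the end of input.
    fn uncons(&mut self) -> Option<Self::Item>;
}

impl<'a> TokenStream for &'a str {
    type Item = char;
    fn uncons(&mut self) -> Option<char> {
        let s: &'a str = *self;
        let mut chars = s.chars();
        let c = chars.next()?;
        *self = chars.as_str();
        Some(c)
    }
}

impl<'a, T: Copy> TokenStream for &'a [T] {
    type Item = T;
    fn uncons(&mut self) -> Option<T> {
        let s: &'a [T] = *self;
        let (first, rest) = s.split_first()?;
        *self = rest;
        Some(*first)
    }
}

/// Takes items while `predicate` holds. Works for any cloneable stream and
/// restores a saved clone when an item is rejected, so that item is not consumed.
fn collect_while<S>(stream: &mut S, predicate: impl Fn(&S::Item) -> bool) -> Vec<S::Item>
where
    S: TokenStream + Clone,
{
    let mut items = Vec::new();
    loop {
        let checkpoint = stream.clone();
        match stream.uncons() {
            Some(item) if predicate(&item) => items.push(item),
            _ => {
                *stream = checkpoint;
                return items;
            }
        }
    }
}

fn main() {
    let mut text = "123abc";
    let digits = collect_while(&mut text, |c| c.is_ascii_digit());
    println!("{:?} then {:?}", digits, text);

    let mut bytes: &[u8] = b"\x01\x02\xff";
    let small = collect_while(&mut bytes, |b| *b < 0x10);
    println!("{:?} then {:?}", small, bytes);
}
```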
Examples
extern crate combine;
use combine::parser::char::{spaces, digit, char};
use combine::{many1, sep_by, Parser};
use combine::stream::easy;
fn main() {
    // Parse spaces first and use the `with` method to only keep the result of the next parser
    let integer = spaces()
        // Parse a string of digits into an i32
        .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));
    // Parse integers separated by commas, skipping whitespace
    let mut integer_list = sep_by(integer, spaces().skip(char(',')));
    // Call parse with the input to execute the parser
    let input = "1234, 45,78";
    let result: Result<(Vec<i32>, &str), easy::ParseError<&str>> =
        integer_list.easy_parse(input);
    match result {
        Ok((value, _remaining_input)) => println!("{:?}", value),
        Err(err) => println!("{}", err),
    }
}
If we need a parser that is mutually recursive, or if we want to export a reusable parser, the
`parser!` macro can be used. In effect, it makes it possible to return a parser without naming
the type of the parser (which can be very large due to combine's trait-based approach). While
it is possible to avoid naming the type without the macro, those solutions require either allocation
(`Box<dyn Parser<Input = I, Output = O, PartialState = P>>`) or `impl Trait`. Before Rust 1.26,
`impl Trait` was only available on nightly, and it cannot express a recursive parser on its own.
The macro thus threads the needle and makes it possible to have non-allocating, anonymous parsers
on stable Rust.
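The following sketch makes the allocation trade-off concrete. It uses the boxed approach in plain Rust, independent of combine. The hypothetical `BoxedParser` alias gives a recursive parser a nameable, type-erased return type. The cost is one heap allocation each time `nested()` builds its parser:

```rust
/// Hypothetical alias: a type-erased, heap-allocated parser.
type BoxedParser<T> = Box<dyn Fn(&str) -> Option<(T, &str)>>;

/// Boxes a closure. The `for<'a>` bound makes the compiler infer that the
/// remaining input returned by `f` borrows from its argument.
fn boxed<T: 'static, F>(f: F) -> BoxedParser<T>
where
    F: for<'a> Fn(&'a str) -> Option<(T, &'a str)> + 'static,
{
    Box::new(f)
}

/// Parses nested parentheses such as "(())" and returns the nesting depth.
/// The parser refers to itself. Its boxed return type is nameable, and each
/// call to `nested()` allocates a fresh box.
fn nested() -> BoxedParser<usize> {
    boxed(|input| match input.strip_prefix('(') {
        Some(rest) => {
            // Parse the inner nesting recursively, then require the closing ')'.
            let (depth, rest) = nested()(rest)?;
            let rest = rest.strip_prefix(')')?;
            Some((depth + 1, rest))
        }
        // No opening parenthesis: depth 0, nothing consumed.
        None => Some((0, input)),
    })
}

fn main() {
    println!("{:?}", nested()("((()))rest"));
}
```

The helper `boxed` exists to give the closure an expected higher-ranked signature. Passing such a closure straight to `Box::new` can fail to infer that the returned slice borrows from the argument.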
#[macro_use]
extern crate combine;
use combine::parser::char::{char, letter, spaces};
use combine::{between, choice, many1, parser, sep_by, Parser};
use combine::error::{ParseError, ParseResult};
use combine::stream::{Stream, Positioned};
use combine::stream::state::State;
#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Array(Vec<Expr>),
    Pair(Box<Expr>, Box<Expr>),
}
// `impl Parser` can be used to create reusable parsers with zero overhead
fn expr_<I>() -> impl Parser<Input = I, Output = Expr>
where
    I: Stream<Item = char>,
    // Necessary due to rust-lang/rust#24159
    I::Error: ParseError<I::Item, I::Range, I::Position>,
{
    let word = many1(letter());
    // A parser which skips past whitespace.
    // Since we aren't interested in knowing that our expression parser
    // could have accepted additional whitespace between the tokens, we also silence the error.
    let skip_spaces = || spaces().silent();
    // Creates a parser which parses a char and skips any trailing whitespace
    let lex_char = |c| char(c).skip(skip_spaces());
    let comma_list = sep_by(expr(), lex_char(','));
    let array = between(lex_char('['), lex_char(']'), comma_list);
    // We can use tuples to run several parsers in sequence.
    // The resulting type is a tuple containing each parser's output
    let pair = (lex_char('('),
                expr(),
                lex_char(','),
                expr(),
                lex_char(')'))
        .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));
    choice((
        word.map(Expr::Id),
        array.map(Expr::Array),
        pair,
    ))
    .skip(skip_spaces())
}
// As this expression parser needs to be able to call itself recursively, `impl Parser` can't
// be used on its own as that would cause an infinitely large type. We can avoid this by using
// the `parser!` macro, which erases the inner type and the size of that type entirely, which
// lets it be used recursively.
//
// (This macro does not use `impl Trait`, which means it can be used in rust < 1.26 as well to
// emulate `impl Parser`)
parser!{
    fn expr[I]()(I) -> Expr
    where [I: Stream<Item = char>]
    {
        expr_()
    }
}
fn main() {
    let result = expr()
        .parse("[[], (hello, world), [rust]]");
    let expr = Expr::Array(vec![
        Expr::Array(Vec::new()),
        Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                   Box::new(Expr::Id("world".to_string()))),
        Expr::Array(vec![Expr::Id("rust".to_string())]),
    ]);
    assert_eq!(result, Ok((expr, "")));
}
Re-exports
Modules
- `easy`: Stream wrapper which provides an informative and easy-to-use error type.
- `error`: Error types and traits which define what kind of errors combine parsers may emit.
- `parser`: A collection of both concrete parsers as well as parser combinators.
- `stream`: Traits and implementations of arbitrary data streams.
Macros
- Takes a number of parsers and tries to apply them each in order. Fails if all the parsers fail or if an applied parser consumes input before failing.
- Convenience macro over `opaque`.
- Declares a named parser which can easily be reused.
- Sequences multiple parsers and builds a struct out of them.
Traits
- Trait which defines a combine parse error.
- By implementing the `Parser` trait, a type says that it can be used to parse an input stream into the type `Output`.
- A type which has a position.
- A `RangeStream` is an extension of `Stream` which allows for zero-copy parsing.
- A `RangeStream` is an extension of `StreamOnce` which allows for zero-copy parsing.
- A stream of tokens which can be duplicated.
- `StreamOnce` represents a sequence of items that can be extracted one by one.
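Zero-copy parsing, as offered by `RangeStream`, means returning slices of the input instead of building new collections. The following sketch works under that reading. It uses hypothetical helper names, not combine's API. The pointer comparison shows that the matched range borrows the original input, while the owned variant allocates a new buffer:

```rust
/// Zero-copy helper (hypothetical name): splits `input` into the longest
/// prefix whose characters satisfy `predicate` and the remaining input.
/// Both halves borrow from `input`; nothing is allocated.
fn take_while_range(input: &str, predicate: impl Fn(char) -> bool) -> (&str, &str) {
    let end = input
        .char_indices()
        .find(|&(_, c)| !predicate(c))
        .map_or(input.len(), |(i, _)| i);
    input.split_at(end)
}

/// Copying counterpart: the matched prefix is copied into a new `String`.
fn take_while_owned(input: &str, predicate: impl Fn(char) -> bool) -> (String, &str) {
    let (matched, rest) = take_while_range(input, predicate);
    (matched.to_string(), rest)
}

fn main() {
    let input = "hello world";
    let (word, rest) = take_while_range(input, |c| c.is_alphabetic());
    // `word` starts at the same address as `input`: no copy was made.
    println!("{:?} {:?} {}", word, rest, std::ptr::eq(word.as_ptr(), input.as_ptr()));
    let (owned, _) = take_while_owned(input, |c| c.is_alphabetic());
    println!("{:?}", owned);
}
```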
Functions
- Parses any token.
- `attempt(p)` behaves as `p` except it acts as if the parser hadn't consumed any input if `p` fails after consuming input. (alias for `try`)
- Parses `open` followed by `parser` followed by `close`. Returns the value of `parser`.
- Parses `p` one or more times separated by `op`. The value returned is the one produced by the left associative application of the function returned by the parser `op`.
- Parses `p` one or more times separated by `op`. The value returned is the one produced by the right associative application of the function returned by `op`.
- Takes a tuple, a slice or an array of parsers and tries to apply them each in order. Fails if all the parsers fail or if an applied parser consumes input before failing.
- Parses `parser` from zero up to `count` times.
- Parses `parser` from `min` to `max` times (including `min` and `max`).
- Constructs a parser out of an environment and a function which needs the given environment to do the parsing. This is commonly useful to allow multiple parsers to share some environment while still allowing the parsers to be written in separate functions.
- Succeeds only if the stream is at end of input, fails otherwise.
- Takes a parser that outputs a string-like value (`&str`, `String`, `&[u8]` or `Vec<u8>`) and parses it using `std::str::FromStr`. Errors if the output of `parser` is not UTF-8 or if `FromStr::from_str` returns an error.
- `look_ahead(p)` acts as `p` but doesn't consume input on success.
- Parses `p` zero or more times, returning a collection with the values from `p`.
- Parses `p` one or more times, returning a collection with the values from `p`.
- Extracts one token and succeeds if it is not part of `tokens`.
- Succeeds only if `parser` fails. Never consumes any input.
- Extracts one token and succeeds if it is part of `tokens`.
- Parses `parser` and outputs `Some(value)` if it succeeds, `None` if it fails without consuming any input. Fails if `parser` fails after having consumed some input.
- Wraps a function, turning it into a parser.
- Parser which just returns the current position in the stream.
- Parses a token and succeeds depending on the result of `predicate`.
- Parses a token and passes it to `predicate`. If `predicate` returns `Some`, the parser succeeds and returns the value inside the `Option`. If `predicate` returns `None`, the parser fails without consuming any input.
- Parses `parser` zero or more times separated by `separator`, returning a collection with the values from `parser`.
- Parses `parser` one or more times separated by `separator`, returning a collection with the values from `parser`.
- Parses `parser` zero or more times separated and ended by `separator`, returning a collection with the values from `parser`.
- Parses `parser` one or more times separated and ended by `separator`, returning a collection with the values from `parser`.
- Parses `parser` from zero up to `count` times, skipping the output of `parser`.
- Parses `parser` from `min` to `max` times (including `min` and `max`), skipping the output of `parser`.
- Parses `p` zero or more times, ignoring the result.
- Parses `p` one or more times, ignoring the result.
- Parses a character and succeeds if the character is equal to `c`.
- Parses multiple tokens.
- Parses multiple tokens.
- `try` (deprecated): `try(p)` behaves as `p` except it acts as if the parser hadn't consumed any input if `p` fails after consuming input.
- Always fails with `message` as an unexpected error. Never consumes any input.
- Always fails with `message` as an unexpected error. Never consumes any input.
- Always returns the value `v` without consuming any input.
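The difference between the left- and right-associative chaining combinators described above can be sketched with hypothetical, self-contained helpers. The `chainl1` and `chainr1` below take the input directly and are not combine's signatures. With subtraction as the operator, parsing `8-3-2` gives `(8 - 3) - 2 = 3` when folding left and `8 - (3 - 2) = 7` when folding right:

```rust
/// Hypothetical operator type: a binary function on the operand type.
type Op = fn(i64, i64) -> i64;

/// Parses an unsigned decimal integer.
fn number(input: &str) -> Option<(i64, &str)> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((input[..end].parse::<i64>().ok()?, &input[end..]))
}

fn sub(a: i64, b: i64) -> i64 {
    a - b
}

/// Parses a '-' and returns subtraction as the operator.
fn minus(input: &str) -> Option<(Op, &str)> {
    let rest = input.strip_prefix('-')?;
    Some((sub as Op, rest))
}

/// Left fold: `a op b op c` is evaluated as `(a op b) op c`.
fn chainl1<'a, T>(
    input: &'a str,
    operand: impl Fn(&'a str) -> Option<(T, &'a str)>,
    op: impl Fn(&'a str) -> Option<(fn(T, T) -> T, &'a str)>,
) -> Option<(T, &'a str)> {
    let (mut acc, mut rest) = operand(input)?;
    while let Some((f, after_op)) = op(rest) {
        let (rhs, after_rhs) = operand(after_op)?;
        acc = f(acc, rhs);
        rest = after_rhs;
    }
    Some((acc, rest))
}

/// Right fold: `a op b op c` is evaluated as `a op (b op c)`.
fn chainr1<'a, T>(
    input: &'a str,
    operand: impl Fn(&'a str) -> Option<(T, &'a str)>,
    op: impl Fn(&'a str) -> Option<(fn(T, T) -> T, &'a str)>,
) -> Option<(T, &'a str)> {
    let (first, mut rest) = operand(input)?;
    let mut operands = vec![first];
    let mut ops = Vec::new();
    while let Some((f, after_op)) = op(rest) {
        let (rhs, after_rhs) = operand(after_op)?;
        ops.push(f);
        operands.push(rhs);
        rest = after_rhs;
    }
    // Combine from the right end: the last operand starts the accumulator.
    let mut acc = operands.pop()?;
    while let Some(f) = ops.pop() {
        let lhs = operands.pop()?;
        acc = f(lhs, acc);
    }
    Some((acc, rest))
}

fn main() {
    // prints: Some((3, ""))
    println!("{:?}", chainl1("8-3-2", number, minus));
    // prints: Some((7, ""))
    println!("{:?}", chainr1("8-3-2", number, minus));
}
```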
Type Aliases
- A `Result` type which has the consumed status flattened into the result. Conversions to and from `std::result::Result` can be done using `result.into()` or `From::from(result)`.
- A type alias over the specific `Result` type used by parsers to indicate whether they were successful or not. `O` is the type that is output on success. `I` is the specific stream type used in the parser.
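The following sketch shows what flattening the consumed status means. It assumes hypothetical `Consumed` and `FlatResult` types; combine's actual definitions differ in detail. The nested form wraps both the success value and the error in a consumed/empty marker. The flattened form folds those combinations into four variants. `From` impls provide the `into()` and `From::from` conversions in both directions:

```rust
/// Whether a parser consumed input (hypothetical type).
#[derive(Debug, PartialEq)]
enum Consumed<T> {
    Consumed(T),
    Empty(T),
}

/// Flattened form: the consumed flag and success/failure in one enum.
#[derive(Debug, PartialEq)]
enum FlatResult<T, E> {
    ConsumedOk(T),
    EmptyOk(T),
    ConsumedErr(E),
    EmptyErr(E),
}

impl<T, E> From<Result<Consumed<T>, Consumed<E>>> for FlatResult<T, E> {
    fn from(result: Result<Consumed<T>, Consumed<E>>) -> Self {
        match result {
            Ok(Consumed::Consumed(v)) => FlatResult::ConsumedOk(v),
            Ok(Consumed::Empty(v)) => FlatResult::EmptyOk(v),
            Err(Consumed::Consumed(e)) => FlatResult::ConsumedErr(e),
            Err(Consumed::Empty(e)) => FlatResult::EmptyErr(e),
        }
    }
}

impl<T, E> From<FlatResult<T, E>> for Result<Consumed<T>, Consumed<E>> {
    fn from(flat: FlatResult<T, E>) -> Self {
        match flat {
            FlatResult::ConsumedOk(v) => Ok(Consumed::Consumed(v)),
            FlatResult::EmptyOk(v) => Ok(Consumed::Empty(v)),
            FlatResult::ConsumedErr(e) => Err(Consumed::Consumed(e)),
            FlatResult::EmptyErr(e) => Err(Consumed::Empty(e)),
        }
    }
}

fn main() {
    let nested: Result<Consumed<i32>, Consumed<&str>> = Err(Consumed::Empty("expected a digit"));
    let flat = FlatResult::from(nested);
    println!("{:?}", flat);
    let back: Result<Consumed<i32>, Consumed<&str>> = FlatResult::ConsumedOk(5).into();
    println!("{:?}", back);
}
```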