December 2011

Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Parsing Numeric Constants

Dec 2011

As a personal project, I have started coding a scripting language. The first thing I want to implement is parsing of integer and floating-point numeric constants. My choice of syntax is based on the C++ language, with some modifications.

An integer constant in C++ can be written as:

123    Decimal       Starting with non-zero digit
0x7B   Hexadecimal   Starting with "0x"
0173   Octal         Starting with "0"

It can also be suffixed with "u" for an unsigned type, and with "l" for "long" or "ll" for "long long".

The "l" suffix makes no sense in Visual C++ because the "long" type is the same size as a normal "int": it has 32 bits, even in 64-bit code. So I'd prefer to use "long" as the type name and "ll" as the suffix for 64-bit numbers.

I also don't like the octal form. First, I can't see any use for it. In all of computer science I've seen only one situation where the octal system is used: file permissions in Unix. I haven't seen a single use of the octal form in C++ code. Second, I think preceding a number with zeros shouldn't change its meaning, so the choice of "0" as the prefix for octal (instead of, for example, "0o") is very unfortunate in my opinion.

It would be much more useful if we could write binary numbers in code. Java 7 introduces such a syntax with the "0b" prefix. It also has another interesting feature that I like: it allows underscores in numeric literals, so you can make long constants more readable by grouping digits, like "0b0011_1010".
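As a quick illustration of why the "0" prefix is easy to trip over, here is a small C++11 sketch. The constant names are made up for this example; the values follow directly from the rules listed above.

```cpp
// Leading zeros silently switch a C++ integer literal to base 8.
constexpr int decimal_value = 123;   // base 10
constexpr int hex_value     = 0x7B;  // base 16: 7*16 + 11 = 123
constexpr int octal_value   = 0173;  // base 8:  1*64 + 7*8 + 3 = 123
constexpr int padded_ten    = 010;   // looks like ten, but is eight

static_assert(decimal_value == hex_value, "0x7B is 123");
static_assert(decimal_value == octal_value, "0173 is 123");
static_assert(padded_ten == 8, "010 is octal, so it equals 8");
```

Padding a column of numbers with zeros for alignment is enough to change their values, which is the kind of surprise I would like to avoid.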
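To show how little work digit grouping needs on the parser side, here is a minimal C++ sketch that converts a binary literal with underscores to its value. The function name `parse_binary_literal` is hypothetical, and the rules it enforces (underscores only between digits, at most 64 digits) are my assumptions for this example, not a specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: converts text such as "0b0011_1010" to its value.
// Returns false for malformed input and leaves `value` untouched.
inline bool parse_binary_literal(const std::string& text, std::uint64_t& value) {
    // Require the "0b" or "0B" prefix followed by at least one character.
    if (text.size() < 3 || text[0] != '0' || (text[1] != 'b' && text[1] != 'B'))
        return false;

    std::uint64_t result = 0;
    int digits = 0;                    // counts 0/1 digits, including leading zeros
    bool last_was_underscore = false;

    for (std::size_t i = 2; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '_') {
            // Assumption: an underscore may not directly follow the prefix.
            if (digits == 0) return false;
            last_was_underscore = true;
        } else if (c == '0' || c == '1') {
            // Assumption: reject anything longer than 64 binary digits.
            if (++digits > 64) return false;
            result = (result << 1) | static_cast<std::uint64_t>(c - '0');
            last_was_underscore = false;
        } else {
            return false;              // any other character is invalid
        }
    }

    // Reject a trailing underscore.
    if (last_was_underscore) return false;
    value = result;
    return true;
}
```

With this sketch, "0b0011_1010" yields 58, while "0b_1", "0b1_" and "0b102" are rejected.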

I'd like to support decimal, hexadecimal and binary numbers in my language. Regular expressions that match these are:
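A minimal sketch of such patterns, written with C++ `std::regex`, might look like the following. The `IntKind` enum, the `classify_integer` function and the exact patterns are illustrative assumptions: they leave out suffixes and underscores, and they treat leading zeros as plain decimal digits, in line with my opinion about octal above.

```cpp
#include <regex>
#include <string>

// Hypothetical names, used only for this sketch.
enum class IntKind { None, Decimal, Hexadecimal, Binary };

// Classifies a complete token as one of the three supported integer forms.
inline IntKind classify_integer(const std::string& text) {
    static const std::regex decimal("[0-9]+");             // 123, also 0173
    static const std::regex hexadecimal("0[xX][0-9A-Fa-f]+"); // 0x7B
    static const std::regex binary("0[bB][01]+");          // 0b00111010

    // regex_match succeeds only if the whole string matches the pattern.
    if (std::regex_match(text, hexadecimal)) return IntKind::Hexadecimal;
    if (std::regex_match(text, binary))      return IntKind::Binary;
    if (std::regex_match(text, decimal))     return IntKind::Decimal;
    return IntKind::None;
}
```

Under these assumptions, "0173" is classified as decimal, and a bare prefix such as "0x" is not a valid constant at all.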


Floating-point numbers are more sophisticated. A constant that uses all possible features might look like this:


The question is which parts are required and which are optional. It may seem that floating-point numbers and their representation in code are something obvious, but there are actually subtle differences between programming languages. "111" is obviously an integer constant, but is the presence of a dot with no digits on the left, a dot with no digits on the right, an exponent part, or an "f" suffix enough to form a proper floating-point constant?

Constant   C++     HLSL    C#
111.222    OK      OK      OK
111.       OK      OK      Error
.222       OK      OK      OK
111e3      OK      OK      OK
111f       Error   Error   OK

I want to support all these options, so regular expressions that match floating-point constants in my language are:
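One way to express this with C++ `std::regex` is sketched below. The function name `is_float_literal` and the split into four alternative forms are my assumptions for illustration; the only requirement taken from the text is that a token needs at least one of a dot, an exponent or an "f" suffix to count as floating-point.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole token is a floating-point constant.
inline bool is_float_literal(const std::string& text) {
    static const std::regex forms[] = {
        // Digits, a dot, optional digits: 111.  111.222  111.222e-3f
        std::regex(R"([0-9]+\.[0-9]*([eE][+-]?[0-9]+)?[fF]?)"),
        // A dot followed by digits: .222  .222e3  .222f
        std::regex(R"(\.[0-9]+([eE][+-]?[0-9]+)?[fF]?)"),
        // Digits with an exponent but no dot: 111e3  111e-3f
        std::regex(R"([0-9]+[eE][+-]?[0-9]+[fF]?)"),
        // Digits with only the "f" suffix: 111f
        std::regex(R"([0-9]+[fF])"),
    };
    for (const std::regex& form : forms) {
        // regex_match requires the entire token to match.
        if (std::regex_match(text, form)) return true;
    }
    return false;
}
```

A plain "111" matches none of the four forms, so it stays an integer constant, and a lone "." is rejected as well.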



# C++/CLI Tutorial

Dec 2011

I have finished my latest production and would like to present it: a tutorial for the C++/CLI programming language. C++/CLI is an extension to C++ made by Microsoft and available in the Visual Studio / Visual C++ IDE. At the same time it is one of the languages of the .NET platform, next to C# and VB.NET. With it you can freely mix native and managed code, which gives extraordinary power in some applications. I used this language in my previous job and now I want to share this piece of knowledge. Here is the PDF document and ZIP archive with sample source code:


Copyright © 2004-2020