I made a Quick Data Format parser in 590 characters.

#include<ios>
#define $ S()
#define I if(
#define F for(
#define C c++
#define X throw;
using K=char;struct V{K*s;V*n;};struct W{V*a;W*c,*n;};K*c;K ${F;*c&&*c-33u>93;C);I!strncmp(c,"//",2)){F;*c&*c-10;C);$;}return*c;}K P(W&p){K f=0;W*l=0,*y;F;$;){I'}'==$){I!p.a)X return 0;}y=new W();V*h=0;K u=0;bool N=0;do{K q=$==34,*s=c+=q;I'}'==$|$==41)X I N||!(N=$==40&u==1?C:0)){F;q?*c&*c-34:*c-33u<94&!strchr("{}()/\"",*c);C);I q?!*c:0)X h=(h?h->n:y->a)=new V{(K*)memcpy(calloc(1+c-s,1),s,c-s)},u++;c+=q;}}while($&$-'{'&(N?$-41:u-2));c+=N;I'{'==$)C,P(*y),C;l=(l?l->n:p.c)=y;I!y->c&!y->a->n)X}I p.a)X}

and this is how it works


Background

It was a dark and not so stormy Wednesday night, September 23, 2020. A friend of mine was attempting to recreate his build system in C++, and my other friend, swissChili, and I were giving him advice on it. After realizing the syntax was a bit difficult to parse without context, we ended up coming up with a new file format for fun.
We named it Quick Data Format, or QDF for short, and it was good.

And then, disaster struck.

swissChili: i have done it
swissChili: a parser for the new syntax that is exactly 1500 bytes, short enough to fit in one ethernet packet
swissChili: for the enjoyment of the programmer, i have included a further minified version in 1113 chars

The golf was on.
You can, and should, read about his parser here. He's much better at brevity than I am.

Quick Data Format

Inspired by Valve's KeyValues format, QDF is meant for easily storing data as pairs of keys and values. A value can be a string or a list of strings, and more key value pairs can be attached to a pair as a sub block.

Here are some examples

// We can support comments, so I'll leave some
// But, we don't support multiline comments! How sad!
// Here's the most basic of key value pairs
Key Value
// But that's just its most basic form

// This is still just a basic key value pair with two nice strings.
// Keys and string values are treated the same as to how they are parsed. If they want control characters or whitespace, they need quotation marks!
// Our control characters are ( ) { } / "
"We can use quotes to have spaces and control characters in our keys and values!" ButWeCantUseWhitespaceIfWeDont
Key "Value"
"Key" "Value"

// The format is completely whitespace invariant. We can put anything anywhere as long as it's set up right
// This is why we really can't be going about putting control characters in our strings!
Key // Rad comment in between the key and value
Value

// We also support lists!
Key ( ListValue1 ListValue2 ListValue3 )

// Just like with our other values, we can use quotes!
Key ( "List Values!" Woo "More Values!" )

// Now for the fun stuff. Sub pairs of a key
// Each key can have more pairs within it, which in turn can have more pairs, and onward forever.
Key {
	SubKey SubValue
	SubKey ( ListValue1 ListValue2 )
	SubKey {
		MoreKeys MoreValues
	}
}

// But keys can have more than just a string, list, or sub pairs. They can have a string or list with sub pairs!
Key Value {
	SubKey SubValue
}
Key ( ListValue1 ListValue2 ) {
	SubKey SubValue
}

// What a fun format!

"I don't care! Tell me about the code!!"

Okay okay!
As we go, I'll be slowly decrypting the code for you. If you're running short on time, you can skip to the end and view the fully deobfuscated version.

Let's slice this apart.

Starting with our single #include, suggested by my friend, <ios>.
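We include just this one header since it ends up including <cstring> and <cstdlib> down the line and is super short. That's a quirk of the standard library implementation rather than something the standard promises, but it works.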

Next up, the defines!

#define $ S()
#define I if(
#define F for(
#define C c++
#define X throw;

These are just shorthands for commonly used snippets of code.
"Why have the ( at the end of if and for?"
Good question.
The ( is there as it's used in every instance of for and if, and most of the time, we can overcome the required delimiter after the macro with some clever code placement.

Up on homeplate, we've got the types declarations and a variable definition ready to bat!

using K=char;
struct V{K*s;V*n;};
struct W{V*a;W*c,*n;}; 
K*c;

Right out of the dugout, we've got using K=char; please don't hurt me I don't know anything about baseball.
This was suggested to me by my modern C++ loving friend, and is just an ever so slightly shorter way of declaring K as a char than a #define.

struct V{K*s;V*n;}; is really just struct stringValue_t and contains a string, K*s; being char* string;, and what's up next in our value linked list, V*n; or stringValue_t* next;.
We have to use this linked list style struct since a std::list would take up too many chars and we don't know how many values we're going to have to parse out of a list, if any at all.

We end up needing to use linked lists again for our tree like structure containing our actual key value pairs. This is struct W, or struct keyValueNode_t.
This struct contains three nice values for us: V*a or stringValue_t* strings, W*c or keyValueNode_t* child, and W*n aka keyValueNode_t* next.

"But where's the keys! Where are the values! Think about the children!"
Shhhh. Shh. They're still there, just hidden within stringValue_t* strings.
If our key is a string, and so are all of our values, we might as well save a few chars and abstract them all away into one linked list.
As for the children, they're stored as, you guessed it, a linked list, with the first one assigned to keyValueNode_t* child.

As for K*c;, this is actually char* currentCharacter;, but I'll shorten it to char* cur; so my fingers don't fall off.
This is our current position in our string, and also functions as our input. This variable is constantly moved throughout the code and is the only way to tell the code what we're actually parsing.


Currently, our understanding of the code looks like this.

// This includes cstring and cstdlib for us
#include<ios>

// This is used for our linked list string values
struct stringValue_t
{
	// This is the string of this value
	char* string;
	
	// The next stringValue_t in the linked list
	stringValue_t* next;
};

// This is used for our linked list pairs
struct keyValueNode_t
{
	// All of our strings for this node. First string is the key, all following are values
	stringValue_t* strings;
	
	// If we have a subblock, this is a linked list, otherwise this is null
	keyValueNode_t* child;
	
	// The next keyValueNode_t in the linked list. If this is null, we've reached the end of the list
	keyValueNode_t* next;
};

// Our input string and current position in that string
char* cur;

Much easier to read, eh?
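
To make the layout a bit more concrete, here's a rough sketch of what a single node for Key ( A B ) would look like if we built it by hand. The helpers stringsOf and makeExampleNode are made up for this illustration and aren't part of the parser.

```cpp
#include <string>
#include <vector>

// Same layout as the structs above, using the readable names
struct stringValue_t { char* string; stringValue_t* next; };
struct keyValueNode_t { stringValue_t* strings; keyValueNode_t* child; keyValueNode_t* next; };

// Hypothetical helper: collect every string of one node, key first
std::vector<std::string> stringsOf(const keyValueNode_t& node)
{
	std::vector<std::string> out;
	for (const stringValue_t* v = node.strings; v; v = v->next)
		out.push_back(v->string);
	return out;
}

// Hypothetical helper: hand-build the node that `Key ( A B )` describes
keyValueNode_t makeExampleNode()
{
	static char key[] = "Key", a[] = "A", b[] = "B";
	static stringValue_t vb{b, nullptr};  // last value, end of the list
	static stringValue_t va{a, &vb};      // first value
	static stringValue_t vk{key, &va};    // the key is simply the head of the list
	return keyValueNode_t{&vk, nullptr, nullptr};
}
```

Walking strings from the head gives the key first and then every value, which is all the parser needs to tell them apart.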


Now for some actual code!

K ${F;*c&&*c-33u>93;C);I!strncmp(c,"//",2)){F;*c&*c-10;C);$;}return*c;}

Ooh, that's a bit of a mess... Let's use some of the knowledge from above and clean this up.

char S()
{
	for(;*cur&&*cur-33u>93;cur++);
	if(!strncmp(cur,"//",2))
	{
		for(;*cur&*cur-10;cur++);
		S();
	}
	return*cur;
}

Hrmm... Still a bit hard to read. Let's try to understand this!

Starting from the top, what does S mean?
Turns out, S is really just short for SkipSpaceAndReturnCurrentChar, but that's a bit of a long name. I'll just call it Skip from now on.
It's a real handy tool for us as it skips over all of the extra whitespace and comments in its path and gives us a valid character, and if it's currently on a valid character, it just returns that.

"But how does it do that? What are all those weird numbers? Why is there a u at the end of that number???!!"
They really don't make readers like they used to... All this screaming at me, and so impatient too!

Let's look at this part by part.

for(;*cur&&*cur-33u>93;cur++);

Lots to unravel in this roll!
We start out with nothing...
Great! Less chars!
Then we perform a check for whether we've hit the end of the string. Passing the length takes extra chars, so we just check if we've hit the 0 at the end of the rainbow every so often.
The convenient part of this check, *cur, is that C++ treats 0 as false and anything else as true! We abuse this fact quite often and it allows us to skip the whole *cur=='\0' hocus pocus.

As for *cur-33u>93, this is a bit more involved and relies on how characters are laid out in ASCII.
Here's my favorite ASCII diagram. It's really quite good.
You might be able to tell that the first column and the first character of the second column of that chart, basically everything below decimal 33, and the very last character, 127 [DEL], are pretty much useless to us.
We want to skip those in the shortest possible way we can.
Unsigned numbers in a computer, numbers that are never negative, really have a hard time becoming negative. Actually, they flat out refuse to be negative!
Whenever we try to push them below zero, they wrap right around to the largest number they can hold within their size constraints. For the unsigned int this subtraction ends up using, -1 becomes 4294967295 on a typical platform with 32 bit ints.
We can abuse that fact and push all of that whitespace out of the way, and into the upper range via wraparound, by subtracting the index of the first useful character from our current character.
*cur-33
But, oh! Woe is us. A plain char is usually signed, and it gets promoted to a signed int before any math happens!
But, we've got a trick up our sleeves. We can force the subtraction to be unsigned by tacking a u onto the end of 33.
The u makes 33 an unsigned int, which converts our character to unsigned as well and turns everything below ! into a huge number.
Now that ! at decimal 33 is at 0 and [SPACE] at 32 has wrapped around to the very top, we can make a nice check to see if the number is at most ~'s new index, which we get by subtracting 33 from its index, 126, to get 93.
Now, if the number is greater than 93, it's whitespace, or at least something strange and unsupported, for sure, and we can skip forward, cur++, until the condition is false. Once it's done, we've hit land.
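
If you want to poke at the trick on its own, here's a tiny sketch. The function isSkipped is a made-up name for this illustration; the golfed parser just inlines the expression.

```cpp
// Hypothetical helper: true when the whitespace loop would step over ch
bool isSkipped(char ch)
{
	// ch is promoted to int and then converted to unsigned int because 33u is
	// unsigned, so anything below '!' wraps around to a huge value.
	return ch - 33u > 93;
}
```

Note that the terminating 0 counts as skippable too, which is exactly why the loop has to check *cur first.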

Now that we've hit something, we should probably make sure it's not a comment, otherwise we'd be handing out garbage we don't want later on.

if(!strncmp(cur,"//",2))
{
	for(;*cur&*cur-10;cur++);
	S();
}

We start this off with a simple check, if(!strncmp(cur,"//",2)), for whether there's a / on the current character and a / on the next. strncmp kindly checks for the end of the string for us, with the downside of returning 0 on a match. Normally returning 0 is nice, since a failed comparison then tells us which string sorts first, but at the moment we have to flip it with a logical not, !, so that a match is true and the following code gets executed.

"What is the following code?"
I thought you'd never ask!

If the condition was successful, we're now in a comment. Spooky! We have to keep trekking forward until we hit an end line if we ever want to escape this terrible place.

for(;*cur&*cur-10;cur++);

Just like our earlier whitespace skip, we start out by making sure we haven't hit the end of the string, followed by checking that we're not on top of an endline.
We check if we're not on top of an end line by subtracting the end line's decimal index, since it's shorter than the literal representation, from the current character, *cur-10. If the current character is equal to 10, an end line, then the value becomes 0.
You might remember from earlier that whenever C++ sees a 0, it thinks false. This means, when we hit an endline, the condition turns to false and the loop stops.

"But what's the deal with that weird single ampersand?!!"
Gah! I've been exposed!

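The single ampersand, or bitwise and, is a fun shortcut that occasionally works as a replacement for the normal double ampersand, logical and.
It compares the individual bits of the two values one by one rather than just asking whether each side is true or false, and can conveniently save us a char here. The catch is that it's only a safe swap when both sides are plain 0 or 1: two nonzero values that share no bits still and together to 0. Here, a space is 32 and 32-10 is 22, which have no bits in common, so this loop actually gives up at the first space in a comment instead of at the end of the line.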
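
Here's a small sketch that puts the golfed condition next to its logical cousin so you can see where they agree and where they don't. Both function names are made up for this illustration.

```cpp
// Hypothetical helpers comparing the two forms of the comment loop's condition.
// Subtraction binds tighter than &, so the golfed form reads as ch & (ch - 10).
bool keepGoingBitwise(char ch) { return ch & (ch - 10); }
bool keepGoingLogical(char ch) { return ch && ch - 10; }
```

Both stop on a newline and on the terminating 0, but the bitwise form also stops on any character whose bits don't overlap with itself minus 10, such as a space (32 & 22 is 0).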

Now that we're at the end of the line, we should check if there's any more whitespace in front of us so that we don't give out garbage.
We can do that real simply with recursion by calling S();. This will skip any whitespace or comments for us. Nice!


Now that we're finally out of the woods, we give the caller the character we landed on for their use, return*cur;.
This is really useful for character cutting as it allows us to just get the value rather than having to constantly check it ourselves and planning out where to skip whitespace and comments.

So what does this look like to us now that we understand it?

char Skip()
{
	// Skip whitespace
	for(; *cur && *cur-33u > 93; cur++);
	
	// Did we hit a comment?
	if(!strncmp(cur, "//", 2))
	{
		// Skip until endline
		for(;*cur & *cur - '\n';cur++);
		
		// Skip all whitespace after the comment
		Skip();
	}
	
	// Return the current character
	return *cur;
}

Little bit easier to understand.


Finally, the parser.
Whew, lotta writing for 217 characters. Only 373 characters to go...


K P(W&p){K f=0;W*l=0,*y;F;$;){I'}'==$){I!p.a)X return 0;}y=new W();V*h=0;K u=0;bool N=0;do{K q=$==34,*s=c+=q;I'}'==$|$==41)X I N||!(N=$==40&u==1?C:0)){F;q?*c&*c-34:*c-33u<94&!strchr("{}()/\"",*c);C);I q?!*c:0)X h=(h?h->n:y->a)=new V{(K*)memcpy(calloc(1+c-s,1),s,c-s)},u++;c+=q;}}while($&$-'{'&(N?$-41:u-2));c+=N;I'{'==$)C,P(*y),C;l=(l?l->n:p.c)=y;I!y->c&!y->a->n)X}I p.a)X}

Oh, my. What a mess. Let's clean that up using what we know from earlier.


char P(keyValueNode_t&p)
{
	char f=0;
	keyValueNode_t*l=0,*y;
	for(;Skip();)
	{
		if('}' == Skip())
		{
			if(!p.strings)
				throw;
			return 0;
		}
		y=new keyValueNode_t();
		stringValue_t*h=0;
		char u=0;
		bool N=0;
		do
		{
			char q=Skip()=='"',*s=cur+=q;
			if('}'==Skip()|Skip()==')')
				throw;
			if(N||!(N=Skip()=='('&u==1?cur++:0))
			{
				for(;q?*cur&*cur-34:*cur-33u<94&!strchr("{}()/\"",*cur);cur++);
				if(q?!*cur:0)
					throw;
				h=(h?h->next:y->strings)=new stringValue_t{(char*)memcpy(calloc(1+cur-s,1),s,cur-s)},u++;
				cur+=q;
			}
		}
		while(Skip()&&Skip()-'{'&&(N?Skip()-41:u-2));
		cur+=N;
		if('{'==Skip())
			cur++,P(*y),cur++;
		l=(l?l->next:p.child)=y;
		if(!y->child&!y->strings->next)
			throw;
	}
	if(p.strings)
		throw;
}

Easier to read, but still hard to understand...
Let's fix that!

First up, you might have noticed that a lot of numbers just magically became letters. As we spoke about earlier, each character has a decimal representation, and sometimes that decimal representation is shorter than typing out the actual character. How nice!
Looking at the code now, starting with our function declaration, we've got char P(keyValueNode_t&p). We have our return as char here since we declared K as char earlier, and K is shorter than void. It's a nice little save, though a golf-only one: running off the end of a non-void function without a return is undefined behavior.
Next we've got the name, P. This is just a short 1 char way to say Parse, but this doesn't apply to the next p!
The p in keyValueNode_t&p stands for parent, and we pass it as a reference to a keyValueNode_t because it's cheaper char count wise than a pointer due to not needing to use -> constantly later on and not having to null check it.


Next up we have char f=0;. This does nothing. And is a left over from an old error reporting system. :(

Now we've got a bit of a messy definition of two different, but related, variables keyValueNode_t*l=0,*y;.
l is shorthand for last and holds a pointer to the last keyValueNode_t we created. We need this as all of our data is linked together and we'll have to set its next later on.
y is really currentNode, but I'll call it node. It holds a pointer to the current keyValueNode_t.


Now that we're done with the definitions, let's talk about some logic.

for(;Skip();)

Front and center, we've got a for loop that takes Skip() as its condition. The great part about this is that it not only skips all whitespace and comments, but it also stops the loop when we hit the end of the string. Nice!
We use for rather than while since we made the definition of #define F for( earlier on which makes the obfuscated version of this nice and short!

Within this for loop, we've got a lot of code.
We'll be resolving out the shorthand versions of variables with their longer versions as we go for better readability.

if('}' == Skip())
{
	if(!parent.strings)
		throw;
	return 0;
}

This here's a nice little check to see if we've hit a }, one of our fancy control characters that means that our sub block is done.
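This is important since we have to actually get out of the subblock at some point, but if our parent doesn't have any strings, that means we're currently parsing for root.
If we're parsing for root, the layer that had no starting {, and we hit a }, that's a syntax error. Let's throw an exception. Well, sort of: a bare throw; with no exception in flight actually calls std::terminate, so this ends the program rather than handing the caller something to catch.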

Next on the shopping list, we've gotta pick up an explanation for the next block of code.

node=new keyValueNode_t();
stringValue_t*h=0;
char u=0;
bool N=0;

"What's with those parentheses? Couldn't you save some chars by dropping them?"
We can't! When you initialize a plain struct like this with parentheses in C++, it actually zeros all of the fields in that type. Super convenient because now we don't have to write out = 0 a bunch.
Too bad we can't magically zero all of the following variables...
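
Here's a quick sketch of that zeroing in action. The helper makeZeroedNode is a made-up name for this illustration.

```cpp
// Same layout as the structs above, using the readable names
struct stringValue_t { char* string; stringValue_t* next; };
struct keyValueNode_t { stringValue_t* strings; keyValueNode_t* child; keyValueNode_t* next; };

// Hypothetical helper: the parentheses ask for value-initialization,
// which zeroes every member of a struct with no constructors of its own.
keyValueNode_t* makeZeroedNode()
{
	return new keyValueNode_t();
}
```

Without the parentheses the three pointers would be left uninitialized, and the later null checks on child and next would be reading garbage.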

stringValue_t*h=0; is our current stringValue_t and it has to be zero so that we know when we're parsing the first element in the list. We'll call this curVal
Chhah Although, in retrospect, I should have just used u to detect that. AHhem. Sorry something caught in my throat.
char u=0; is the index of our current stringValue_t. We use this for error checking and figuring out when to stop parsing values. From now on, you'll see this called valIndex
bool N=0; is a kind and simple variable that informs us if we're currently parsing out a nice stringValue_t list, rather than a single value for our key. N was their father's name, please, call them inList

Now that we've got those names out of the way, this next block of code should be super easy to understand. Right?

do
{
	char q=Skip()=='"',*s=cur+=q;
	if('}'==Skip()|Skip()==')')
		throw;
	if(inList||!(inList=Skip()=='('&valIndex==1?cur++:0))
	{
		for(;q?*cur&*cur-34:*cur-33u<94&!strchr("{}()/\"",*cur);cur++);
		if(q?!*cur:0)
			throw;
		curVal=(curVal?curVal->next:node->strings)=new stringValue_t{(char*)memcpy(calloc(1+cur-s,1),s,cur-s)},valIndex++;
		cur+=q;
	}
}
while(Skip()&&Skip()-'{'&&(inList?Skip()-41:valIndex-2));
cur+=inList;

Oh, my. Let's talk about this mess...

This section of code is the string parser. It grabs all of our keys, values, list values, and etcetera.
First come, first serve, and do got here really quite early. We make use of do because no matter what, we're always parsing out a key, and the key and value parsing code got mixed up together.
Next in line, we have a strange variable definition, char q=Skip()=='"',*s=cur+=q;. This is really two separate variables of two separate types.
q is really a bool in disguise as a char that informs us of if the string we're parsing out is a quoted string or not. We'll call it isQuoteString in all future code.
s is a char* that points to the start of the string. You'll see it referred to as strVal. But before we set it to cur, we have to skip over the quotation if it's there. If isQuoteString ended up being true, we move cur forward one, over the quote.

Our logic has been patiently waiting in line, so let's go talk with it.

if('}'==Skip()|Skip()==')')
	throw;

This nice little section of code skips whitespace and checks if we're hitting a } or ). If we are, that's totally a syntax error and shouldn't be happening, so we throw a fit.
We make use of two nice shortcuts in this line: a logical or can be replaced with a bitwise or and, because of our macro #define $ Skip(), it's cheaper to Skip() twice than to dereference our position in the string.
In retrospect, this check should be done before we start out the string, as we could have a ) or } within a quoted string...

Now for the big mess.

if(inList||!(inList=Skip()=='('&valIndex==1?cur++:0))
{
	for(;isQuoteString?*cur&*cur-34:*cur-33u<94&!strchr("{}()/\"",*cur);cur++);
	if(isQuoteString?!*cur:0)
		throw;
	curVal=(curVal?curVal->next:node->strings)=new stringValue_t{(char*)memcpy(calloc(1+cur-strVal,1),strVal,cur-strVal)},valIndex++;
	cur+=isQuoteString;
}

Still messy, but at least the variable names are all cleared up at this point.

At the top of this onion of code, we have if(inList||!(inList=Skip()=='('&valIndex==1?cur++:0)). As with onions, it has some layers to it.
Not only does this if statement always pass when we're reading a list, it also figures out if we're reading into a list in its second half. Let's focus on that second half, !(inList=Skip()=='('&valIndex==1?cur++:0), going inside out.
At the core of this we have a ternary statement with the condition Skip()=='('&valIndex==1. This condition checks if the current character is our start list control character, ( and that we've just parsed out our key, valIndex==1. While doing this, it makes use of the logical to bitwise operator shortcut again.
If that condition turns out to be true, we set inList to be true in a backwards way by assigning it to the memory address of our position, since any non zero is true in C++, and we increment our position in the string by one to skip the control character, (.
Otherwise, if it's false, we carry on our merry way and set inList to be false by setting it to be 0 in the latter half of the ternary.
Now, if we just hit a (, we're not parsing a string, we're just switching modes to be parsing lists. Because of that, we need to make sure this if statement doesn't pass, so we plop a ! to invert our, newly assigned to be true, inList.
Because our first half of the if checked if inList was true, and got false, it's trying to check our other side of the ||. Now that we've hit a (, and inList is true, it gets inverted into a false thus not executing the if's code.
If we're in a list, that whole second half of the if statement never gets checked due to C++'s condition short-circuiting. Short-circuiting is great because it means that if the front half of the logical or, ||, is true, it doesn't check our second half, which would set inList back to false. It's like a nice always on switch. The only downside of using this trick is that we cannot use a bitwise or, |, to save us a char here like we have before.
Whew, I'm out of breath.
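
If that was a lot to hold in your head, here's a stripped down sketch of the same switch. The function shouldReadString is a made-up name, and it looks at *cur directly where the real code calls Skip(), so treat it as an illustration of the trick rather than the parser's actual code.

```cpp
// Hypothetical stand-in for the `if` guarding the string reading code.
// Returns true when the string reading body would run.
bool shouldReadString(bool& inList, const char*& cur, int valIndex)
{
	// Already in a list: || short-circuits and inList is left alone.
	// Otherwise: a '(' right after the key sets inList (the non-null pointer
	// converts to true), steps over the '(', and the ! keeps the body from running.
	return inList || !(inList = ((*cur == '(') & (valIndex == 1)) ? cur++ : 0);
}
```

Calling it on a ( right after the key returns false once and flips inList on; every call after that returns true without touching cur.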

Now, on to the inner part of the if.
We start out with a nice for loop that skips to the end of our string.

"Ugh. He's finally not explaining something in extreme detail."
Wh- I thought you loved extreme detail!
I feel lied to...
Whatever. I'm still explaining in extreme detail. You can't stop me! HAH!
This for loop skips out on the fun of its init statement, but it makes great use of its condition, isQuoteString?*cur&*cur-34:*cur-33u<94&!strchr("{}()/\"",*cur).
Once again, we're making use of a ternary. Here we're using it to change how this loop works depending on if it's a quoted string or not.

If it's a quoted string, we seek out either the end of the string, or our string ending control character, " or its numerical representation 34 for char conservation's sake. If we hit a ", it'll subtract 34 and get 0, or false, thus ending the loop.

But, if we aren't parsing a quoted string, it's a whole other ordeal.
We start out with the same whitespace check from earlier, *cur-33u<94, but flipped.
Now we want to end the loop if we're reading whitespace or if we've hit the end of the string, as quoteless strings are only delimited by the end of the string, whitespace, or a control character.
If we haven't managed to end the string reading yet, we want to check if we're hitting a control character, and luckily for us, strchr has just the solution.
strchr seeks through "{}()/\"", a list of all of our control characters, and checks if our current character matches it. If a match is found, that means we're on a control character, and the read needs to end. The return of strchr is a pointer, a non zero, which can be turned into a false to break the loop with a simple !.
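
Pulled out on its own, the unquoted half of that condition looks roughly like this. The name isBareStringChar is made up for this illustration.

```cpp
#include <cstring>

// Hypothetical helper: true while ch may be part of an unquoted string
bool isBareStringChar(char ch)
{
	// Left side: ch is somewhere in '!' through '~'.
	// Right side: ch is not one of the control characters.
	// Both sides are plain bools here, so the bitwise & is a safe swap for &&.
	return (ch - 33u < 94) & !std::strchr("{}()/\"", ch);
}
```

The terminating 0 fails the range check on the left, so the loop ends at the end of the string without needing a separate *cur test.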

Now that we're at the end of the string, we need to perform a quick error check to see if a quoted string ended without a quotation mark.

if(isQuoteString?!*cur:0)
	throw;

We perform this test by checking if we're a quoted string and if our current char is 0, meaning we ran out of input before finding the closing quote.


Lotta work for a little string eh?
Now to actually make use of it...


curVal=(curVal?curVal->next:node->strings)=new stringValue_t{(char*)memcpy(calloc(1+cur-strVal,1),strVal,cur-strVal)},valIndex++;
cur+=isQuoteString;

Yikes is that ever ugly. Let's get to boogieing and try to unravel this ball of yarn.
We start out with creating a new stringValue_t with new stringValue_t{(char*)memcpy(calloc(1+cur-strVal,1),strVal,cur-strVal)}.
With this, we create a new stringValue_t with a string of (char*)memcpy(calloc(1+cur-strVal,1),strVal,cur-strVal)
This snippet starts out by allocating a block of memory with an initial value of zero, due to the use of calloc instead of malloc. This zero initialization saves us the hassle, and extra chars, of assigning the last character to zero manually. The length of this block is how long the string is plus one extra char for the zero at the end of every string, 1+cur-strVal.
Now that we have our memory block, using memcpy, we copy out of the string we're parsing and into the memory we just allocated. Once again, we recalculate how long the string is, as it's cheaper char count wise to recalculate than it is to create a variable for this.
Luckily for us, calloc returns the pointer to its new block and memcpy returns the destination pointer it was handed, so we don't have to create new variables.
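
As a standalone sketch, the copy boils down to something like this. The function copySlice is a made-up name for this illustration, and like the golfed code it doesn't check whether calloc failed.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copy the characters in [begin, end) into a fresh,
// zero-terminated buffer. The caller frees it with std::free.
char* copySlice(const char* begin, const char* end)
{
	// calloc zero-fills, so the one extra byte is already the terminator
	return (char*)std::memcpy(std::calloc(1 + end - begin, 1), begin, end - begin);
}
```

So copySlice(text, text + 3) on "Key Value" hands back a fresh copy of "Key".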

Unluckily for us, we have to play C++'s silly casting game and make sure the type of our new element fits what's actually in the struct. O C, C! Wherefore art thou C?

Now that we have our string, we do a weird assignment for our new element, (curVal?curVal->next:node->strings)=. This is a really fun way to determine if we're working on our key or our values.
In our ternary, we check if curVal is still holding a zero, ready and waiting for our key. If we are parsing the key, we want to set the head of the linked list in our node, node->strings, to be our first element, the key. Otherwise, we want to tack our new string onto the end of the linked list, curVal->next.

Now with our head or end set, we need to update curVal to properly represent what our current value is. We do this with a simple curVal= right at the front of the mess.
Since we have a new value, we also have to increment our value count with a valIndex++
If the string we just parsed was quoted, we need to skip over that quote with cur+=isQuoteString;. The nice thing about bools is that when they're true, they're 1 and when they're false, they're 0; this means our position only increments by 1 when we're quoted.
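
Here's the linking trick by itself as a little sketch. The function append and its parameter names are made up for this illustration.

```cpp
// Same layout as stringValue_t above
struct stringValue_t { char* string; stringValue_t* next; };

// Hypothetical helper: append text to the list whose first element is head,
// keeping tail pointed at the newest element.
void append(stringValue_t*& head, stringValue_t*& tail, char* text)
{
	// Both arms of the ternary are lvalues of the same type, so the whole
	// ternary can be assigned to. The leftover `next` member is zeroed by
	// the braces.
	tail = (tail ? tail->next : head) = new stringValue_t{text};
}
```

The first call fills in head, and every later call hangs the new element off the old tail, which is the same shape the parser uses again for last and parent.child.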

Now with our string set, we need to determine if we need to keep looping.
We do this with our condition, Skip()&&Skip()-'{'&&(inList?Skip()-41:valIndex-2) in our while at the end of the loop.
Bit ugly, but nothing we can't explain!
We start out with a simple end the loop if end of string check and an end the loop if start subblock control character, {, check, Skip()&&Skip()-'{'.
The real meat and potatoes is in the next bit, inList?Skip()-41:valIndex-2. This code states that, if we're in a list and we hit an end of list control character, 41 or ), we need to drop the loop now. Otherwise, if we're not in a list, we just need to end on the second string parsed.
We perform these checks with two nice little subtraction not equals to skips.

Finally, if we were parsing a list, we just have to move our position over the end list control character, ). We do this with a simple cur+=inList; using the same true being 1 trick from earlier.

"That's gotta be it right? We have the strings set and the loop has been completed. What else is there possibly to do!"
Unfortunately, there's a decent bit more... Let's try to get into it before I collapse...

Now that our string parsing is done, we've got a bit more work to do for our potential subblock.

if('{'==Skip())
	cur++,Parse(*node),cur++;

This tiny snippet takes care of our subblock parsing. We kick it off with a if('{'==Skip()) to check if we actually need to parse a subblock, and then end it with cur++,Parse(*node),cur++;.
In this second section we use commas instead of semicolons so that we don't need to surround the code with { and }. We increment our position once at the start to skip over the start subblock control character, {, and once at the end to skip over the end subblock control character, }.
In between these two, we simply call Parse recursively to figure out the contents of that subblock.

Such a peaceful and tranquil piece of code. Ahh.


last=(last?last->next:parent.child)=node;
if(!node->child&!node->strings->next)
	throw;

WHAT! MORE CODE? I swear, I don't pay myself enough for this garbage...
Whatever. We're almost done anyways.

This bit of code is another one of those oddball linked list linkers, like what we saw above, and an error checker.
We start by checking if we're the first element, with a last of zero, or if we're midway through. If we're at the start, we link up the head of the linked list, parent.child, with our current element. Otherwise, we just link the current element to the back of the linked list, last->next.
Now, with our linked list all linked up, we need last to properly reflect the last element in our linked list. We do this with a nice little last= in front of our first assignment.

With all of our lists linked, strings parsed, and subblocks read, we actually have to check if we have any values at all...
We do this with if(!node->child&!node->strings->next).
In this, we check that our node has neither kids, a subblock, nor any strings past its key. If both are missing, the key has no value at all, which is a syntax error, so we throw and escape as quickly as we can.



Finally, our parse is done, our loops are complete.
And, we have one last syntax error check...

if(parent.strings)
	throw;

Way earlier on, we return 0; on an end subblock control character }, so no subblock should ever get here.
We check if the parent has a key, the first string in its list, since the root node has no key; only subblocks do.
If we do have a key, we're on a subblock with no end control character, so we throw and drop the parse.



Finally, we're done, and we can put all of this together.

// This includes cstring and cstdlib for us
#include<ios>

// This is used for our linked list string values
struct stringValue_t
{
	// This is the string of this value
	char* string;
	
	// The next stringValue_t in the linked list
	stringValue_t* next;
};

// This is used for our linked list pairs
struct keyValueNode_t
{
	// All of our strings for this node. First string is the key, all following are values
	stringValue_t* strings;
	
	// If we have a subblock, this is a linked list, otherwise this is null
	keyValueNode_t* child;
	
	// The next keyValueNode_t in the linked list. If this is null, we've reached the end of the list
	keyValueNode_t* next;
};

// Our input string and current position in that string
char* cur;

char Skip()
{
	// Skip whitespace
	for(; *cur && *cur-33u > 93; cur++);
	
	// Did we hit a comment?
	if(!strncmp(cur, "//", 2))
	{
		// Skip until endline
		for(;*cur & *cur - '\n';cur++);
		
		// Skip all whitespace after the comment
		Skip();
	}
	
	// Return the current character
	return *cur;
}

char Parse(keyValueNode_t& parent)
{
	// Unused...
	char f = 0;

	// Last node, current node
	keyValueNode_t* last = 0, *node;
	
	// Main parsing loop. Skips over whitespace and comments. Ends on end of string
	for(;Skip();)
	{
		// Did we hit an end of subblock?
		if('}' == Skip())
		{
			// If our parent has no string, and we're ending a subblock, there's an extra end subblock control character somewhere...
			if(!parent.strings)
				throw;
				
			// Stop parsing now that we've reached the end.
			return 0;
		}
		
		// Create a new node and make room for strings.
		node = new keyValueNode_t();
		stringValue_t* curVal = 0;
		
		// Index of the current string being parsed
		char valIndex = 0;
		// Are we parsing a list?
		bool inList = 0;
		
		do
		{
			// Is current string we're parsing a quoted string?
			char isQuoteString = Skip()=='"';
			// Start of string, excluding the quotation mark
			char *strVal = cur += isQuoteString;
			
			// Check if we hit a control character where we shouldn't
			if('}'==Skip() | Skip()==')')
				throw;
			
			// If we're in a list, keep going, otherwise check if we should be in a list or should be parsing the key.
			if(inList || !(inList = Skip()=='('&valIndex==1 ? cur++ : 0))
			{
				// If we're a quoted string, parse until the quote or end of file. Otherwise, parse until we hit a control character or end of file. 
				for(;isQuoteString ? *cur & *cur-34 : *cur-33u < 94 & !strchr("{}()/\"",*cur); cur++);
				
				// If we hit the end of the file as a quoted string, that's a syntax error. Throw and escape!
				if(isQuoteString ? !*cur : 0)
					throw;
				
				// Make a copy of the string and link it up to the linked list
				curVal = (curVal ? curVal->next : node->strings) = new stringValue_t{ (char*)memcpy(calloc(1+cur-strVal, 1), strVal, cur-strVal) }, valIndex++;
				
				// If we're parsing a quoted string, skip over the end string control character
				cur += isQuoteString;
			}
		}
		// Keep parsing until the end of file, end of subblock, end of list, or second argument out of a list
		while(Skip() && Skip()-'{' && (inList ? Skip()-41 : valIndex-2));
		
		// If we were in a list, move over the end list control character.
		cur += inList;
		
		// If we hit a start subblock control character, parse out a subblock and skip over its control characters
		if('{' == Skip())
			cur++, Parse(*node), cur++;

		// Link up our new node into the linked list
		last = (last ? last->next : parent.child) = node;
		
		// If we had no string values or children, that's a syntax error. Throw and run away!
		if(!node->child & !node->strings->next)
			throw;
	}
	
	// If we reached this point as a subblock, a node with a parent with strings, we failed to return 0 earlier on. There's a missing end subblock control character somewhere. Throw an exception!
	if(parent.strings)
		throw;
}


Conclusion

Wow. Lotta writing for little code, eh? Now we can finally use the parser with a simple cur = myInputString; keyValueNode_t root{}; Parse(root);. The {} matters, since Parse expects root's pointers to start out as zero. Not a really clean one call, but it is what it is.
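
Once Parse has filled in root, getting the data back out is just a matter of walking the two kinds of linked lists. Here's a rough sketch of one way to do that. The function dump is a made-up name for this illustration, and it writes plain indented text rather than valid QDF.

```cpp
#include <cstddef>
#include <string>

// Same layout as the structs above
struct stringValue_t { char* string; stringValue_t* next; };
struct keyValueNode_t { stringValue_t* strings; keyValueNode_t* child; keyValueNode_t* next; };

// Hypothetical helper: append every pair under parent to out, one node per
// line, indenting subblocks with one tab per level.
void dump(const keyValueNode_t& parent, std::string& out, int depth = 0)
{
	for (const keyValueNode_t* node = parent.child; node; node = node->next)
	{
		out.append(static_cast<std::size_t>(depth), '\t');
		
		// Key first, then every value, separated by single spaces
		for (const stringValue_t* v = node->strings; v; v = v->next)
		{
			out += v->string;
			if (v->next)
				out += ' ';
		}
		out += '\n';
		
		// Recurse into the subblock, if there is one
		dump(*node, out, depth + 1);
	}
}
```

Starting it off with dump(root, text) visits the top level pairs in file order, with each subblock printed underneath its key.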


Thank you for taking the time to read this. I hope this was an entertaining read for you, and that you enjoyed reading it as much as I enjoyed writing this code. If you haven't yet, please go read swissChili's write up on his excellent parser over here.

I hope you have an excellent day!
 - Ozxy


Published: December 26, 2020