XMLParser: possible bug in pos() method
Posted: Sat Jul 24, 2021 6:58 pm
Hi there
I'm using XMLParser to parse an XML document, and my goal is to get the full XML source of each listed item. I had quite some trouble getting that to work, but in the end I could pin it down to a possible bug in the pos() method, which seems to have trouble with special characters like ä, ö, ü, é, à, etc.
Here's a strongly simplified example that shows the bug. The first XML string (xml1$) doesn't contain any special characters and parses just fine, but the second one (xml2$), which contains some, does not:
@REQUIRE "RapaGUI", {Link = True}
@REQUIRE "xmlparser", {Link = True}
@APPTITLE "XmlParser-Test"
Global xml1$ = [[<root>
<item id="Q:01">
<dc:title>A Prayer For England</dc:title>
<r:narrator>Sinead O'Connor</r:narrator>
</item>
<item id="Q:02">
<dc:title>Remember</dc:title>
<r:narrator>aouaou</r:narrator>
</item>
<item id="Q:03">
<dc:title>Now Is The Time</dc:title>
<r:narrator>eaeeae</r:narrator>
</item>
</root>]]
Global xml2$ = [[<root>
<item id="Q:01">
<dc:title>A Prayer For England</dc:title>
<r:narrator>Sinéad O'Connor</r:narrator>
</item>
<item id="Q:02">
<dc:title>Remember</dc:title>
<r:narrator>äöüäöü</r:narrator>
</item>
<item id="Q:03">
<dc:title>Now Is The Time</dc:title>
<r:narrator>éàèéàè</r:narrator>
</item>
</root>]]
Function p_EventFunc(msg)
    Switch(msg.ID)
    Case "btnStart1":
        moai.DoMethod("ctrlLog", "clear")
        p_ParseItemList(xml1$)
    Case "btnStart2":
        moai.DoMethod("ctrlLog", "clear")
        p_ParseItemList(xml2$)
    EndSwitch
EndFunction

Function p_Log(text$)
    moai.DoMethod("ctrlLog", "insert", text$ .. "\n", "bottom")
EndFunction

Function p_ParseItemList(xml$)
    p_Log("----------------------------------------\nXML to parse:\n----------------------------------------")
    p_Log(xml$)
    p_Log("\n----------------------------------------\nStart parsing:\n----------------------------------------")
    ; One declaration per statement so that every variable really is local
    Local tracks = { }
    Local currentIndex, line, column, currentPos, endPos
    Local currentElement$ = ""
    Local currentAttributes
    callbacks = {
        StartElement = Function (parser, elementName$, attributes)
            currentElement$ = elementName$
            currentAttributes = attributes
            Switch elementName$
            Case "item":
                currentIndex = ListItems(tracks)
                line, column, currentPos = parser:pos()
                currentPos = currentPos - 1 ; currentPos is on char 'i', subtract 1 to include the opening <
                tracks[currentIndex] = { id = attributes.id }
            EndSwitch
        EndFunction,
        EndElement = Function (parser, elementName$)
            Switch elementName$
            Case "item":
                line, column, endPos = parser:pos()
                endPos = endPos + 6 ; endPos is on char '<', add 6 to include "/item>"
                tracks[currentIndex].item = MidStr(xml$, currentPos, endPos - currentPos)
                ; the second value logged is the length of the extracted item, not its end position
                p_Log("Start: " .. currentPos .. ", Length: " .. (endPos - currentPos))
                p_Log(tracks[currentIndex].item .. "\n")
            EndSwitch
        EndFunction
    }
    Local p = xmlparser.new(callbacks)
    p:parse(xml$)
    p:close()
    callbacks = Nil
    Return(tracks)
EndFunction

InstallEventHandler({RapaGUI = p_EventFunc})
moai.CreateApp([[<?xml version="1.0" encoding="iso-8859-1"?>
<application id="app">
<window id="mainWindow" width="400" height="800" title="XmlParser-Test">
<vgroup>
<button id="btnStart1">Start parsing XML 1</button>
<button id="btnStart2">Start parsing XML 2</button>
<texteditor id="ctrlLog" noWrap="true" />
</vgroup>
</window>
</application>]])
Repeat
    WaitEvent
Forever
It seems that with every special character the internal position-counter is off by one. This adds up, and as my real XML is quite large, by the end I'm only getting rubbish...
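One possible explanation, and this is only an assumption, not a confirmed diagnosis: if the plugin wraps the Expat library, pos() may be reporting a byte offset, while MidStr() counts characters in a UTF-8 string. In UTF-8 each of ä, ö, ü, é, à takes two bytes, which would give exactly the "off by one per special character" drift described above. The sketch below is not Hollywood code. It uses Python's xml.parsers.expat module (its CurrentByteIndex attribute is a byte offset) and a shortened copy of xml2$ purely to illustrate the effect; the names item_spans, by_bytes and by_chars are made up for this example.

```python
import xml.parsers.expat

# Shortened version of xml2$ from the post (two items, seven two-byte characters).
XML = (
    "<root>\n"
    '<item id="Q:01">\n'
    "<r:narrator>Sinéad O'Connor</r:narrator>\n"
    "</item>\n"
    '<item id="Q:02">\n'
    "<r:narrator>äöüäöü</r:narrator>\n"
    "</item>\n"
    "</root>"
)


def item_spans(xml_text):
    """Return the UTF-8 bytes of xml_text and the (start, end) byte span of each <item>."""
    data = xml_text.encode("utf-8")
    spans = []
    start = None
    # No namespace separator is given, so prefixes like "r:" are not resolved.
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        nonlocal start
        if name == "item":
            # Byte offset of the '<' that opens the start tag.
            start = parser.CurrentByteIndex

    def on_end(name):
        if name == "item":
            # Byte offset of the '<' of "</item>", extended to cover the whole end tag.
            spans.append((start, parser.CurrentByteIndex + len(b"</item>")))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    return data, spans


data, spans = item_spans(XML)

# Correct: apply the byte offsets to the byte string, then decode.
by_bytes = [data[s:e].decode("utf-8") for s, e in spans]

# Drifting: apply the same byte offsets to the character string.
by_chars = [XML[s:e] for s, e in spans]

for good, bad in zip(by_bytes, by_chars):
    print(repr(good))
    print(repr(bad))
```

If this assumption holds for the plugin as well, the offsets themselves would be correct and only the extraction step would need to work on bytes instead of characters (or convert the byte offset to a character offset first).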