2022 Session (22 - 26 June)
9:30am - 11:30am
Introducing Regular Expressions: patterns in text files
- Perl regular expression patterns in oXygen
- From “plain” text to XML: comparing source documents
- Sources of text for up-conversion:
Return by ~1:45 pm
2 - 4:15 pm
- Regular Expressions Workshop: Into the Weeds! Files to up-convert from the Digital Mitford Project [to be posted]
- Perform document analysis: study the text and look for patterns you can search for. Jot them down.
- Remove everything you don’t need at the start and end of the document.
- Simplify/reduce white spaces.
- Work from the inside out: try starting with what you can match readily and with the most numerous line-by-line tags.
- When working with larger structures (chapters, scenes, speeches, line groups), try the clopen (close-open) strategy: if you find the start of a thing, you have found the end of the previous thing. Remember that if you use clopen, you have to clean up afterwards: you’ll have an extra end tag at the start, and you’ll be missing the end tag at the end.
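The steps above can be tried outside oXygen as well. Here is a minimal Python sketch of the same sequence (trim, simplify white space, tag line-level units, clopen, clean up). The sample text, the `*** START ***` / `*** END ***` markers, and the element names `play`, `scene`, and `sp` are made up for illustration and are not taken from the Digital Mitford files. It also assumes the text contains no `&` or `<` characters that would need escaping first.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text source: scene headings and "SPEAKER: line" speeches.
raw = """Project header to discard
*** START ***
SCENE 1
ANNA:   Good   morning.


BEN: Is it?
SCENE 2
ANNA: It is.
*** END ***
Footer to discard
"""


def up_convert(text: str) -> str:
    # 1. Remove everything before the start marker and after the end marker.
    body = re.search(
        r"\*\*\* START \*\*\*\n(.*?)\*\*\* END \*\*\*", text, flags=re.DOTALL
    ).group(1)
    # 2. Simplify white space: collapse runs of spaces/tabs, drop blank lines.
    body = re.sub(r"[ \t]+", " ", body)
    body = re.sub(r"\n{2,}", "\n", body).strip()
    # 3. Inside out: tag the most numerous line-level units (speeches) first.
    body = re.sub(
        r"^([A-Z]+): (.+)$", r'<sp who="\1">\2</sp>', body, flags=re.MULTILINE
    )
    # 4. Clopen: every scene heading closes the previous scene and opens a new one.
    body = re.sub(
        r"^SCENE (\d+)$", r'</scene>\n<scene n="\1">', body, flags=re.MULTILINE
    )
    # 5. Clean up: remove the stray end tag at the start, add the missing one at the end.
    body = re.sub(r"\A</scene>\n", "", body)
    body = body + "\n</scene>"
    return "<play>\n" + body + "\n</play>"


result = up_convert(raw)
print(result)

# Parsing the result is a quick well-formedness check; it raises ParseError on failure.
root = ET.fromstring(result)
```

Skipping step 5 makes the parse fail, which is a useful way to see why clopen always needs the clean-up pass.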
4:30 - 5:30pm
Introduction to XPath
- Last full day: Housekeeping and travel arrangements
- Taking stock: Research questions, project ideas, applications.
- Some options we like for publishing TEI XML editions:
10 - 11:25am
XPath intensive with Digital Mitford Site Index
In oXygen open URL: https://digitalmitford.org/si.xml
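For practice away from oXygen, the sketch below runs a few XPath-style queries in Python on a tiny inline sample. The sample is hypothetical: its `listPerson`, `person`, `persName`, `listPlace`, and `place` elements are common TEI names, but the actual structure and content of `si.xml` are not assumed here. Note that Python's `xml.etree.ElementTree` supports only a limited subset of XPath, so expressions that work in oXygen's XPath toolbar may need rewriting.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a TEI-style index (placeholder names and ids).
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person xml:id="p1"><persName>Person One</persName></person>
      <person xml:id="p2"><persName>Person Two</persName></person>
    </listPerson>
    <listPlace>
      <place xml:id="pl1"><placeName>Place One</placeName></place>
    </listPlace>
  </body></text>
</TEI>"""

# TEI elements live in a namespace, so path steps need a bound prefix.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree stores xml:id under the expanded XML namespace name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

root = ET.fromstring(sample)

# Roughly //person/persName : the text of every person's name.
names = [n.text for n in root.findall(".//tei:person/tei:persName", TEI)]

# Roughly //person/@xml:id : the identifier of every person entry.
ids = [p.get(XML_ID) for p in root.findall(".//tei:person", TEI)]

# Roughly count(//place) : how many place entries the sample holds.
place_count = len(root.findall(".//tei:place", TEI))

print(names, ids, place_count)
```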
11:30am - 12:30pm
XQuery or XSLT demonstration: pulling and remixing data
Return by ~2 pm
2 - 3 pm
Document Data Modeling with the Digital Mitford Journal: Discussion
3 - 4:30 pm
Class choice or project-specific work
9:30am - 12:30pm
Conclusions/Farewells/Last Questions! For those who can stay, hands-on practice with anything we have introduced in this workshop.
Mitford Editors work session
Thanks to Syncro Soft for generously contributing complimentary extended trial licenses for their <oXygen/> XML editor for the use of our Coding School participants.
eXist-db is an open source native XML database and application platform. TEI Publisher is an open source product of eXist Solutions.