2022 Session (22 - 26 June)
Agenda
Thursday, 6/23
Time | Activity |
9:30 - 10:15am
|
Preliminaries:
- Introductions/nametags. Internet access working?
- Discussion of projects and interests bringing you all here.
- Digital Mitford intro:
|
10:15 - 10:45am
|
- Orientation to XML elements and attributes. For further reading, see Obdurodon’s What is XML and Why should humanists care?
- Ordered Hierarchy of Content Objects (OHCO): the XML Tree hierarchy
- Well-formedness vs. Validity
- Character entities: ampersands and angle bracket characters in XML
- Example XML files:
- The process of document analysis and document data modeling
- Issues with imposing an ordered hierarchy on messy documents (see
Ozymandias example): What do we do with overlapping hierarchies?
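
As a small illustration of the well-formedness and character-entity points above, here is a minimal sketch in Python (not the oXygen workflow used in the session). The sample string and the helper name `is_well_formed` are made up for this example; only standard-library calls are used.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample text containing a raw ampersand and angle brackets.
raw = "Mitford & Co. <London>"

# escape() replaces &, < and > with the entities &amp;, &lt; and &gt;.
escaped = escape(raw)

bad = f"<p>{raw}</p>"        # raw ampersand and a stray <London> start-tag
good = f"<p>{escaped}</p>"   # same text with reserved characters escaped

# Overlapping hierarchies are not well-formed: the tags cross instead of nesting.
overlapping = "<line><phrase>words</line></phrase>"

print(is_well_formed(bad), is_well_formed(good), is_well_formed(overlapping))
```

This only tests well-formedness. Checking validity would additionally require a schema, which this sketch does not attempt.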
|
10:45am - lunch
|
TEI XML Orientation
- What is the Text Encoding Initiative, and why work with it?
- Key concepts for document modeling:
- Mindful file management
- Distinguish semantics from display
- Cross-platform compatibility
- Long-range sustainability (outlasting changes to software)
- Microsoft Office Word as XML
- Surveying the TEI in the Mitford project:
- Planning a TEI project
- Working with TEI in oXygen
- associated schema lines
- syntax checking
|
Lunch Break
|
return by ~1:30pm
|
1:30pm - 2:30pm
|
TEI ODD walk-through exercise
|
2:30pm - 5pm
|
MS Paleography workshop: Mitford letters
|
5 - 5:30pm
|
Discussion and review of MS workshop code
|
Friday, 6/24
Time | Activity |
9:30am - 11:30am | Introducing Regular Expressions: patterns in text files
- Perl regular expression patterns in oXygen
- From “plain” text to XML: comparing source documents
- Sources of text for up-conversion:
|
Lunch Break | return by ~1:45 pm |
2 - 4:15 pm |
- Regular Expressions Workshop:
Into the Weeds! Files to Up-convert from the Digital Mitford Project [to be posted]
- Steps:
- Perform document analysis: study the document and look for patterns you can search for. Jot them down.
- Remove everything you don’t need at start and end of document
- Simplify/reduce white spaces.
- Work from the inside out: Try starting with what you can match readily and the most numerous line-by-line tags.
- When working with larger structures (chapters, scenes, speeches, line groups), try the clopen (close-open) strategy. If you find the start of a thing, you have found the end of the previous thing. Remember that if you use clopen, you have to clean up afterwards: you’ll have a stray end-tag at the start, and you’ll be missing the end-tag at the end.
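
The steps above can be sketched outside oXygen as well. The following is a minimal Python illustration, assuming a made-up two-chapter plain-text sample and assuming `div`, `head`, `p`, and `body` as target elements. It is not one of the Digital Mitford workshop files, and the patterns would need adjusting for real documents.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text source: chapter headings followed by prose lines.
plain = """CHAPTER I
It was a   fine morning.
CHAPTER II
The rain came at noon.
"""

# Simplify white space: collapse runs of spaces and tabs to a single space.
text = re.sub(r"[ \t]+", " ", plain)

# Inside out: wrap every line that is not a chapter heading in <p>.
text = re.sub(r"^(?!CHAPTER )(.+)$", r"<p>\1</p>", text, flags=re.MULTILINE)

# Clopen: each heading marks the end of the previous chapter and the start
# of the next, so replace it with a close-tag followed by an open-tag.
text = re.sub(
    r"^(CHAPTER [IVXLC]+)$",
    r"</div>\n<div>\n<head>\1</head>",
    text,
    flags=re.MULTILINE,
)

# Clean up: drop the stray end-tag at the very start,
# then supply the missing end-tag at the very end.
text = re.sub(r"\A</div>\n", "", text)
text = text.rstrip("\n") + "\n</div>\n"

xml_text = f"<body>\n{text}</body>"

# Parsing succeeds only if the result is well-formed.
root = ET.fromstring(xml_text)
print(xml_text)
```

The final parse is a quick sanity check that the cleanup step left matching start- and end-tags. In oXygen the same check happens automatically once the file is saved as XML.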
|
4:30 - 5:30pm | Introduction to XPath
|
Saturday, 6/25
Time | Activity |
9:30 - 10am |
Wrapping up!
- Last full day: Housekeeping and travel arrangements
- Taking stock: Research questions, project ideas, applications.
- Some options we like for publishing TEI XML editions:
|
10 - 11:25am | XPath intensive with Digital Mitford Site Index
In oXygen open URL: https://digitalmitford.org/si.xml
|
11:30am - 12:30pm | XQuery or XSLT demonstration: pulling and remixing data |
Lunch Break | return by ~2pm |
2 - 3 pm | Document Data Modeling with the Digital Mitford Journal: Discussion |
3 - 4:30 pm | Class choice or project-specific work |
Sunday, 6/26
Time | Activity |
9:30am - 12:30pm | Conclusions/Farewells/Last Questions! For those who can stay, hands-on practice with anything we have introduced in this workshop |
2pm onward | Mitford Editors work session |
Thanks to SyncroSoft for generously contributing complimentary extended trial licenses for their <oXygen/> XML editor for the use of our Coding School participants.
eXist-db is an open source native XML database and application platform. TEI Publisher is an open source product of eXist Solutions.