Working with the Sustainability Roadmap

Module A1:

Site Index Workflow

  1. Coders on the ground: tag named entities. Check whether each one is already in the Site Index (SI). If not, add it.
    • If problematic, write an update.
    • Known danger of producing duplicate entries.
  2. @lmwilson and @ebeshero: Collect (harvest) and “dedupe” (merge duplicates), revise, and add to the official SI for new releases of the SI.
    • We might want to start naming releases of the SI!
  3. @lmwilson and @ebeshero: Macro-level cleanup of entries, standardization of notes, creating a basis for linked open data.

  4. @ebeshero: Publishing from the SI on the website (the way annotations are pulled into edition files),
    • AND separate publications profiling significant names (not currently delivered, but will be).
  5. Lather, rinse, repeat: this is recursive. As we publish, we see issues to correct in the source code. We find duplicate entries, etc.

  6. Invade Wikipedia, the ODNB, etc. with our rich data!
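
The lookup-and-dedupe steps above could be sketched roughly as follows. This is only an illustration in Python, not our actual tooling: the entry shape (an id paired with a display name), the `normalize` rule, and the sample ids are all hypothetical. It flags entries whose names collapse to the same comparison key so that a human editor can decide whether they really are duplicates.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Reduce a name to a comparison key: no accents, case, punctuation, or extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", without_accents.lower())
    return " ".join(words)


def find_duplicates(entries):
    """Group entry ids whose names normalize to the same key.

    `entries` is an iterable of (entry_id, name) pairs (a hypothetical shape).
    Returns {key: [ids...]} for every key shared by two or more entries.
    """
    groups = defaultdict(list)
    for entry_id, name in entries:
        groups[normalize(name)].append(entry_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample entries, invented for illustration only.
sample = [
    ("Person_A", "Mary Russell Mitford"),
    ("Person_B", "mary russell  mitford"),
    ("Person_C", "Elizabeth Barrett"),
]
candidates = find_duplicates(sample)
print(candidates)
```

A match on the normalized key is only a candidate: two different people can share a name, so merging stays an editorial decision.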

Module A2

General notes: Digital projects

Phases:

1) and 2) Cathedral scale. The letters and the SI are forever; the lit editions are more finite and less pressing. However, in 10-15 years the senior editors will start retiring. We need to package up the project to train a new generation of scholar-coders who will sustain the letters and SI activity. And we need to Teach and Document our work for that to happen.

–Next generations may be to us as we are to Coles? Our interpretations will become dated. However, the data on named entities, document sources, etc. should be stable, though software support will change.

2) Lifespan: why? Because of economic (time), financial (no regular funding), and intellectual (scale of edition projects) constraints.

3) “Mid-morning phase”:

4) Next phase of development: (maybe after 5 years)

Module A3: Who’s visiting?

Who uses your project?

Why do they use it? What needs do they have?

What do you imagine they get out of it?

General notes: we need to be aware of other coding projects looking to learn from us or link with us

See our photo of post-it notes on our user communities

Module A4: What are the project’s sustainability priorities?

| Significant Property | Function on the Project | Designated Communities Served |
| --- | --- | --- |
| Box Workspace | Workspace and teaching/research resources | whole project team |
| Site Index | integration with editions, sharing new data about 19c contexts | builds linked open data, available knowledge on web |
| Schema Code and Codebook | Manage and guide the code, ease selection of tags, provide examples | project team and other coders |
| oXygen XML Editor | syntax awareness, guiding the code | project team |
| public-facing website (Apache on Digital Ocean) and eXist-db | publish our data, share code, “surface” SI data in search engines | all our user groups, project team members’ accreditation of work, proofreading. Data surfacing encourages “stumbling upon” our site in quest of named entities we’re coding. |
| Social Media 2: Blog / Listservs | Announcements, detailed posts about project process | reaches potential new editors, coding school participants, involves our students |
| Social Media 3: Ancestry.com | public genealogy trees | finds people with overlapping research interests on historical people and places |
| GitHub repos (Digital Mitford GitHub Organization) | share and refine code, with version control, our code Documentation repo as well as various development projects | serves project and other coders |
| Social Media 1: Instagram / Twitter / Facebook | Popularize / publicize | reaches librarians, archivists / rare books fans |

RANKINGS: We have organized the above properties from first to last in order of their priority to our project.

Eventually, in the course of this workshop, we’ll be matching up user groups with the properties we list in A4.

Module A5: Project Documentation Checklist

| Types of Documentation? | Designated a Reliable Site for the Project? | Accessible by whom? | Funded how? |
| --- | --- | --- | --- |
| Box comments | No(!) People whose institutions start Box accounts are forced to change accounts (we lose their identities) | Project team | Box corporate / university |
| XML comments and TEI Header | Yes (but file-by-file access or XPath-able by a few) | Project team | project team’s time and oXygen |
| Google Docs | No, these are ephemeral: for Lisa’s use to find info quickly | Project Mgr first, rest of team as we work together | Google |
| Codebook | Yes, but we need to phase this out as Digital Mitford 1.0. This Google Doc is inconsistent with our current GitHub and ODD-based documentation. A subproject is to curate and replace this. | Project team and public (coders) | Google |
| Box files in MRMS Project Support | Yes, reliable | Project team | Box corporate |
| File Directory Structure (in Box, but mappable elsewhere) | Yes, reliable | Project team | Us: Project team intellectual time and debate and energy |

Hey! We can now run Sustainability workshops on our own!

Part II

Module B1: Who is on the project team and what are their roles?

Tools and Tech roles (Tech Infrastructure = part of the institutional base of project)

People:

See our [Digital Mitford Staff](http://digitalmitford.org/staff.html) page, which reads from the records of our staff in our site index at http://digitalmitford.org/si.xml. We need to come back to this module in a lot more detail!

Google Sheet for Module B2 (Note: internal to Digital Mitford project team)

We just set up a Center for Open Science (osf.io) account to help centralize our various data streams for our project. This unifies all our work under a single ORCID! Digital Mitford on OSF: https://osf.io/he34x/

Module C2: File Formats and Metadata

What is your desired level? Why? At least level 3. Probably level 4. Because we need to curate the original codebook.mitford.pitt.edu for the project team to revisit its earliest decisions. We can’t have less than level 3 b/c everyone on this large-scale project needs the same kinds of access.

How high a priority is reaching your desired level? (Low / Medium / High) High. (maintain this level)

What is your current level? Why? We think 4, but we do need to update some massive Excel spreadsheets to a more tractable long-term database format.

What resources and actions are required to reach (and maintain) your desired level?

Module C3: File Formats and Metadata

(Note: Level 4 doesn’t necessarily make sense for people who are actively working on projects in progress. These standards are developed for people curating finished projects.)

Desired level: 4

How high a priority? Low to Medium. We should generate the inventory of files. We don’t necessarily need to act quickly on the format transfer from Excel. The key thing for us is that the Excel data is being regularly updated (by @ghbondar, our MS Archaeologist).

What is your current level and why? Level 1 b/c we lack the inventory. Level 3 b/c we routinely monitor for obsolescence.

Resources and actions? Time and studying up on code scripts.
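
As a starting point for the missing inventory of files, something like the following Python sketch might do. It is only a rough illustration under stated assumptions: the function names `inventory` and `count_by_format` are ours, the demo folder and file names are invented, and a real run would point at a local copy of the project files rather than a temporary directory.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def inventory(root):
    """Return sorted (relative_path, extension, size_in_bytes) rows for every file under root."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            relative = path.relative_to(root).as_posix()
            rows.append((relative, path.suffix.lower(), path.stat().st_size))
    return sorted(rows)


def count_by_format(rows):
    """Count files per extension, so that unusual or aging formats stand out."""
    return Counter(ext or "(none)" for _path, ext, _size in rows)


# Demo on a throwaway directory with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "letters").mkdir()
    (Path(tmp) / "letters" / "letter1.xml").write_text("<TEI/>", encoding="utf-8")
    (Path(tmp) / "notes.XLSX").write_bytes(b"")
    demo_rows = inventory(tmp)
    demo_counts = count_by_format(demo_rows)

for row in demo_rows:
    print(row)
print(dict(demo_counts))
```

The per-extension counts are the part that speaks to obsolescence monitoring: a format that shows up in only a handful of files is a candidate for review.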

Module C4: Permissions and Data Integrity

Note: project members are typically the biggest threat to data integrity in the project!

Fixity (keyword: data integrity): the actual quality of the bits of data as they’re embedded in the real world, and ensuring that the order of bits is maintained.
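
One common way to check fixity is to record a checksum for each file and compare checksums later. The Python sketch below assumes SHA-256 from the standard library and a simple path-to-checksum manifest; the function names and demo files are hypothetical, and none of this describes a system we currently run.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """Return paths whose checksum differs or that appear in only one manifest."""
    return sorted(path for path in old.keys() | new.keys() if old.get(path) != new.get(path))


# Demo with invented files: alter one character and see which file is flagged.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "letter.xml"
    target.write_bytes(b"<TEI>original</TEI>")
    (Path(tmp) / "untouched.xml").write_bytes(b"<TEI/>")
    before = build_manifest(tmp)
    target.write_bytes(b"<TEI>0riginal</TEI>")
    after = build_manifest(tmp)
    demo_changed = changed_files(before, after)

print(demo_changed)
```

A checksum mismatch only says that the bits changed, not whether the change was an intended edit, so the manifest is most useful alongside version control history.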

Our logs/logging systems include:

Understand: Box is a workspace, not a sustainable storage space.

What is your desired level and why? Level 4, defined for our project as visible via continuous integration

How high a priority is reaching your desired level in this area and why? Medium (because we’re kind of okay without it, but things would be easier with it. Takes time to learn the tech for continuous integration.)

What is your current level and why? Level 2.5? In between 2 and 3: We have fixity for file formats. We aren’t maintaining logs separate from the software systems (like Box or GitHub or server logs) that are automatically generated. Continuous integration would help this.

What resources and actions are required to reach your desired level? PI needs to learn and develop continuous integration, or recruit some willing and knowledgeable helper for this.
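
As a first step toward continuous integration, a check like the following could run automatically on each change to a repository. It is a minimal Python sketch under stated assumptions: it tests only XML well-formedness using the standard library, which is weaker than validating against our schema; `find_malformed` and `main` are hypothetical names; and the demo files are invented. Which CI service would run it is left open.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(root):
    """Return (relative_path, error_message) for each .xml file under root that fails to parse."""
    root = Path(root)
    problems = []
    for path in sorted(root.rglob("*.xml")):
        try:
            ET.parse(str(path))
        except ET.ParseError as error:
            problems.append((path.relative_to(root).as_posix(), str(error)))
    return problems


def main(root):
    """Print any problems and return an exit code: 0 if all files parse, 1 otherwise."""
    problems = find_malformed(root)
    for relative_path, message in problems:
        print(f"{relative_path}: {message}")
    return 1 if problems else 0


# Demo with invented files: one well-formed, one with a missing end tag.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "good.xml").write_text("<TEI><text/></TEI>", encoding="utf-8")
    (Path(tmp) / "bad.xml").write_text("<TEI><text></TEI>", encoding="utf-8")
    demo_problems = find_malformed(tmp)
    demo_exit_code = main(tmp)
```

In a CI job, the script would end with `sys.exit(main("."))` so that a non-zero exit code marks the build as failed and makes the problem visible to the whole team.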