>> From VM1.NoDak.EDU!LINES-L Sun Oct 4 19:10:59 1992
>> Reply-To: LifeLines Genealogical System
>> Sender: LifeLines Genealogical System
>> From: ttw@PETREL.ATT.COM
>> Subject: LifeLines Newsletter 1

- - - - - - - - -

LIFELINES NEWSLETTER 1
4 October 1992
T. T. Wetmore IV, ed.

<<>>

  o EDITOR'S NOTE
  o CLOSETS, USERS AND LIST SUBSCRIBERS
  o BRIEF HISTORY OF LIFELINES
  o PROPOSED EXTENSION TO THE LIFELINES MERGING FEATURE
  o HINTS: ENTERING A NEW FAMILY

<<EDITOR'S NOTE>>

I intend to e-publish an irregular sequence of LifeLines newsletters,
the operative words being "irregular" and "intend." Obvious topics
include LifeLines news, usage hints, useful report programs, feature
discussions, enhancement proposals, GEDCOM topics, user questions,
users' articles and so on. In this first newsletter all contributions
are mine, but I invite contributions from others.

As this is the first, I thought I would give you a brief status report
and report on the history of LifeLines. The newsletter ends with a
proposed change to LifeLines that will make database merging easier to
do. Database merging is a very complex subject, but a very important
one.

<<CLOSETS, USERS AND LIST SUBSCRIBERS>>

LifeLines is out of the closet. I have wanted to make it available to
UNIX users for some time. First I had to get a release from my employer
because they own my intellectual property rights. Secondly I had to
have some kind of documentation. And thirdly the system had to be
stable enough that maintenance would not become a nightmare.

On 23 September I announced the existence of LifeLines on the
soc.roots/ROOTS-L list, and I have been sending out copies of LifeLines
to interested UNIX users ever since. A flurry of problems and
suggestions quickly drove the version number from 2.1.5 to 2.1.11 in
little more than a week.

As of this writing there are 32 recipients of the LifeLines executable,
but only they know if they are users. The 32 are made up of 24 SUN4
systems, 6 386/486 systems, and 2 SUN3 systems.
And the 32 recipients come from all over the world, now including
Australia, New Zealand, Japan, Canada, Switzerland, France, and many
U.S. states.

Soon after I announced LifeLines, Cliff Manis negotiated a mailing list
for those interested in the system from the LISTSERV organization in
North Dakota, and we now have the LINES-L list dedicated to us. At last
count there were 41 subscribers to the list. Since four of the
recipients of the executable are not yet subscribed, the list has 13
members who don't have the software.

<<BRIEF HISTORY OF LIFELINES>>

I wrote the first version of LifeLines during the fall and winter of
1990/91. There were two main motivations behind LifeLines:

  o UNIX -- there were no genealogical systems available for UNIX
  o power -- existing systems were inadequate to meet my needs

Version one had the same basic functionality as today's system, but did
not have the curses/ETI panels-based user interface; it was a typical,
terse UNIX tool.

The primary goals of LifeLines were and still are:

  o flexibility in the data
  o flexibility of the generated reports
  o naturalness of use

These goals were met the way this unabashed UNIX programmer felt was
best for him. (Some might snicker to see UNIX and naturalness of use
mentioned in the same place!) These goals led to a number of
implementation decisions, including:

  o database -- a new B-Tree database was developed
  o GEDCOM -- GEDCOM was chosen as the transport and storage format
  o reports -- reports are interpreted from programs, not built-in
  o data entry -- input is direct through an editor, not via screens

The B-Tree database allows records to have essentially any size. The
decisions on GEDCOM, report generation, and data entry were fairly
radical, and they have been the source of some criticism. LifeLines
users seem characterized by a willingness to accept these three design
decisions as a justifiable tradeoff for the power that LifeLines
provides.

Since its development LifeLines has slowly evolved; its only major
phase shift was the introduction of the panels-based user interface.
There have been a number of performance improvements. LifeLines has
been subject to some creeping featurism, including many extensions to
the report generator's run-time library. If I were to do it again I
would make the report programming language less functional and more
like a normal programming language. I wrote the report programming
feature over a snowy weekend in January 1991, and was so intent on
getting it going that I took too many shortcuts in the language design,
and I never got back to changing the language.

For more than a year LifeLines had only two users, Cliff Manis and
myself, so it mostly reflects what Cliff and I think we need most in
genealogy software. As more users come on board, and their needs and
wants take shape, LifeLines will evolve further.

<<PROPOSED EXTENSION TO THE LIFELINES MERGING FEATURE>>

<>

The current merging feature in LifeLines is primitive. Using tandem
browsing you browse to the two persons you wish to merge. You then give
the 'j' command ('join') and the following happens:

  o The GEDCOM records of the two persons, minus linking information,
    are merged together, and you are placed in your chosen screen
    editor to edit this combined record.

  o When you return from the editor to LifeLines, and if you confirm
    that you want to go ahead with the merge, the top person is updated
    with the combined record, and the bottom person is removed from the
    database.

  o All links to the bottom person are removed from any families that
    he or she had belonged to as a child or a parent/spouse. If this
    causes a family to no longer refer to any persons in the database,
    that family's database record is silently removed from the
    database.

<>

Let's say we are given two genealogical databases, and let's presume
that within each database persons and families are not duplicated.
Let's also presume that between the databases there is duplication.
The genealogical merging problem is that of combining two databases and
then merging the resulting duplicate persons and families. In LifeLines
the merging problem generally occurs when a GEDCOM file (representing a
second database) is imported into a LifeLines database, and that GEDCOM
file contains person or family records that have duplicates in the
database.

In general, this is a difficult problem, and I invite you to think
about it for a while. A long-term solution might be a recursive pattern
matching algorithm that can apply heuristics, iteratively and
interactively suggest mergings, and perform the mergings you direct it
to. I consider algorithms like this to be too difficult and awkward for
the LifeLines style of interaction (and my ability to implement!). But
I also consider merging to be a major requirement of genealogical
systems, and one which needs better support than LifeLines now
provides.

<>

I am proposing to improve the LifeLines merging feature by removing the
current person merge function and replacing it with a new person merge
function and a family merge function. The two proposed merging
functions are described below.

<>

Two persons can be merged if they meet two criteria. First, if they are
each a parent in at least one family, then they must have the same sex.
Second, if they are both children, then they must be children in the
same family (don't scream yet -- wait until you read the rest). When
persons are merged, the combined person will become a spouse/parent in
exactly the same families that the two original persons were a
spouse/parent in. This means that merging persons has no effect on the
number of families. This function will probably be made available from
the tandem browsing mode, just as merging is done now.

<>

Two families can be merged together if they meet two criteria. First,
if they both have a father/husband, then he must be the same person in
both.
Second, if they both have a mother/wife, then she must be the same
person in both. It is okay for one or both families to be missing one
or both parents/spouses. The merged family will have the same
parents/spouses as the original families, while the children in the
merged family will be the simple catenation of the children from the
original families. This means that merging families has no effect on
the number of persons. The family merge function will probably be made
available from a new tandem family browsing mode in LifeLines.

<>

I claim that these two merging functions are sufficient for handling
all database merging, and can be used to merge in a logical and
methodical way. Here are the steps that can be followed. There is an
example at the end.

  o Start by assuming that a LifeLines database contains duplicate
    persons and families, whether from importing a GEDCOM file or from
    manual input; either way, the database has more than one record for
    some persons and families.

  o Find pairs of persons to merge where at least one of the two
    persons is an "end of line" (not a child), or where both persons
    are children in the same family. Merge these persons together.

  o Find pairs of families to merge that meet the criteria given above
    (if they both have a father/husband then he is the same person; if
    they both have a mother/wife then she is the same person). Merge
    them. As long as any merging remains to be done, it will always be
    possible to do either this step or the preceding step.

  o Repeat the two merging steps, in any order, until all merging is
    done. The two steps can always be carried out in some order that
    will accomplish any merging situation.

<>

Here is an example. On the left is a group of records from an original
database. On the right is a group of records from an imported GEDCOM
file.

   H1    W1            H3    W3
   |_____|             |_____|
      |                   |
      F1                  F3
      |                   |
      H2    W2      H4    W4
      |_____|       |_____|
         |             |
         F2            F4
        _|_           _|_
       |   |         |   |
       C1  C2        C3  C4

In this example, the H's are husband/father records, the W's are
wife/mother records, the F's are family records, and the C's are
children records. Now assume that H2 is the same person as H4, W2 is
the same person as W4, F2 is the same family as F4, and C2 is the same
person as C3. This means that H1, W1, H3, W3, F1, F3, C1 and C4 are
unique and are not to be merged.

The first two steps in merging would be the following:

  o Merge H2 and H4 (possible because H4 is an end of line).
  o Merge W2 and W4 (possible because W2 is an end of line).

At this point the database will have the following structure:

   H1    W1            H3    W3
   |_____|             |_____|
      |                   |
      F1                  F3
      |                   |
     H24                 W24
     | |_________________| |
     |          |          |
     |          F2         |
     |         _|_         |
     |        |   |        |
     |        C1  C2       |
     |_____________________|
                |
                F4
               _|_
              |   |
              C3  C4

The final two steps would be:

  o Merge F2 with F4 (possible because the parents/spouses are the same
    in both).
  o Merge C2 with C3 (possible because they are [after step 3] children
    in the same family).

The database is now properly merged:

   H1    W1            H3    W3
   |_____|             |_____|
      |                   |
      F1                  F3
      |                   |
     H24                 W24
      |___________________|
                |
               F24
            ____|____
           |    |    |
           C1  C23   C4

<>

The two merging functions described are sufficient to merge
lineage-linked databases as a series of individual person and family
merges. Neither function leaves a database in an unstable state. This
means that a large merging job can be started in one sitting and
continued in another. These are major plusses.

The negatives of this method involve the amount of work necessary to
merge databases with large overlap. Every person and every family that
is common to both databases must be individually merged.

I plan to add these functions to LifeLines, along with a more powerful
tandem browse feature, as time permits. This will take LifeLines
considerably beyond the PAF level of merging, but will still leave a
lot to be desired for the future.
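
The two proposed merge functions and the H/W/F/C example can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not LifeLines code: the names Database, Person, Family,
famc, fams, husb, wife and chil are hypothetical, the record text that a
real merge would let you edit is left out, and each person is assumed to
be a child in at most one family. The surviving record keeps the first
key given, so the merged records the text calls H24, W24, F24 and C23
appear here as H2, W2, F2 and C2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Person:
    sex: str = "U"                               # "M", "F" or "U" (unknown)
    famc: Optional[str] = None                   # family where a child
    fams: Set[str] = field(default_factory=set)  # families where a spouse


@dataclass
class Family:
    husb: Optional[str] = None
    wife: Optional[str] = None
    chil: List[str] = field(default_factory=list)


class Database:
    """Hypothetical in-memory model; not the LifeLines B-Tree database."""

    def __init__(self) -> None:
        self.persons: Dict[str, Person] = {}
        self.families: Dict[str, Family] = {}

    def add_family(self, fid, husb=None, wife=None, chil=()):
        """Create a family record plus the back-links from its members."""
        self.families[fid] = Family(husb, wife, list(chil))
        for pid, sex in ((husb, "M"), (wife, "F")):
            if pid is not None:
                person = self.persons.setdefault(pid, Person())
                person.sex = sex
                person.fams.add(fid)
        for pid in chil:
            self.persons.setdefault(pid, Person()).famc = fid

    def can_merge_persons(self, a, b):
        pa, pb = self.persons[a], self.persons[b]
        if pa.fams and pb.fams and pa.sex != pb.sex:
            return False          # both are parents: sexes must agree
        if pa.famc and pb.famc and pa.famc != pb.famc:
            return False          # both are children: same family required
        return True

    def merge_persons(self, keep, drop):
        if not self.can_merge_persons(keep, drop):
            raise ValueError(f"cannot merge persons {keep} and {drop}")
        pk, pd = self.persons[keep], self.persons.pop(drop)
        if pk.sex == "U":
            pk.sex = pd.sex
        if pd.famc is not None:   # repoint the child link at the survivor
            chil = self.families[pd.famc].chil
            if keep in chil:
                chil.remove(drop)
            else:
                chil[chil.index(drop)] = keep
            pk.famc = pd.famc
        for fid in pd.fams:       # survivor becomes spouse in drop's families
            fam = self.families[fid]
            if fam.husb == drop:
                fam.husb = keep
            if fam.wife == drop:
                fam.wife = keep
            pk.fams.add(fid)

    def can_merge_families(self, f, g):
        fa, fb = self.families[f], self.families[g]
        pairs = ((fa.husb, fb.husb), (fa.wife, fb.wife))
        return all(x is None or y is None or x == y for x, y in pairs)

    def merge_families(self, keep, drop):
        if not self.can_merge_families(keep, drop):
            raise ValueError(f"cannot merge families {keep} and {drop}")
        fk, fd = self.families[keep], self.families.pop(drop)
        fk.husb = fk.husb or fd.husb
        fk.wife = fk.wife or fd.wife
        fk.chil += fd.chil        # simple catenation of the children
        for pid in (fd.husb, fd.wife):
            if pid is not None:
                self.persons[pid].fams.discard(drop)
                self.persons[pid].fams.add(keep)
        for pid in fd.chil:
            self.persons[pid].famc = keep


# The example from the text: F1/F2 from the database, F3/F4 imported.
db = Database()
db.add_family("F1", "H1", "W1", ["H2"])
db.add_family("F2", "H2", "W2", ["C1", "C2"])
db.add_family("F3", "H3", "W3", ["W4"])
db.add_family("F4", "H4", "W4", ["C3", "C4"])

# Neither the children nor the families can be merged before the parents.
blocked_at_start = (db.can_merge_persons("C2", "C3"),
                    db.can_merge_families("F2", "F4"))
db.merge_persons("H2", "H4")   # H4 is an end of line
db.merge_persons("W2", "W4")   # W2 is an end of line
db.merge_families("F2", "F4")  # parents are now the same in both families
db.merge_persons("C2", "C3")   # now children in the same family

for fid, fam in sorted(db.families.items()):
    print(fid, fam.husb, fam.wife, fam.chil)
```

In this sketch the person merges leave the family count unchanged and
the family merge leaves the person count unchanged, which matches the
two invariants claimed for the proposed functions. Trying the merges in
the wrong order (C2 with C3 first, say) is refused, which is why the
procedure works from ends of line toward shared families.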

<<HINTS: ENTERING A NEW FAMILY>>

There are lots of ways to enter new persons and families into a
LifeLines database. You can start with recent people and work
backwards. You can start with distant ancestors and work forwards. Or
you can jump all around. But no matter how you do it, you should be
doing it from the person or family browse modes.

In this hints section I outline a quick procedure for adding a new
family to a database.

  o From the person or family browse modes use the 'n' command to add
    one of the parents to the database. When you return to LifeLines
    from the editor you will be in person browse mode displaying that
    person.

  o Now use the 'n' command again to add the other parent. When you
    return to LifeLines you will be in person browse mode displaying
    the second parent.

  o Now use the 'a' command to create a new family record; select
    option '2' to create a family from one or two spouses/parents.

  o LifeLines creates a family record with the displayed person as one
    of the parents/spouses. Because you just created this person,
    LifeLines also assumes that the previously displayed person is to
    be the second parent/spouse, and will ask you to confirm this.
    After you do so LifeLines has you edit the family record for the
    new family. When you return from the editor to LifeLines you will
    automatically be in the family browse mode with the new family.

  o Now use the 'n' command to create the oldest child (actually you
    can add the children in any order, but it's easy to add them from
    oldest to youngest). When you return from the editor you will be
    back in the family browse mode for the two parents.

  o Now use the 'a' command to add a child to the family. Because you
    just created a person, LifeLines assumes that that person is to
    become the child, and will ask you to confirm this. After you do so
    LifeLines adds the person as the only child (there is another
    confirmation step when adding the first child, but it's obvious).

  o Repeat the previous steps for each additional child. For the second
    and all other children, LifeLines gives you a chance to choose the
    particular place in the list of children where the new child should
    go. If you add the children from oldest to youngest you always
    choose the option to add the child to the end of the list.

By using this procedure LifeLines always guesses right about who to add
as a spouse and who to add as children. As a result you never have to
identify these persons to LifeLines, which greatly speeds up adding
information to the database.